1
Forni D, Pozzoli U, Mozzi A, Cagliani R, Sironi M. Depletion of CpG dinucleotides in bacterial genomes may represent an adaptation to high temperatures. NAR Genom Bioinform 2024; 6:lqae088. [PMID: 39071851 PMCID: PMC11282364 DOI: 10.1093/nargab/lqae088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 06/17/2024] [Accepted: 07/18/2024] [Indexed: 07/30/2024]
Abstract
Dinucleotide biases have been widely investigated in the genomes of eukaryotes and viruses, but not in bacteria. We assembled a dataset of bacterial genomes (>15 000), which are representative of the genetic diversity in the kingdom Eubacteria, and we analyzed dinucleotide biases in relation to different traits. We found that TpA dinucleotides are the most depleted and that CpG dinucleotides show the widest dispersion. The abundances of both dinucleotides vary with genomic G + C content and show a very strong phylogenetic signal. After accounting for G + C content and phylogenetic inertia, we analyzed different bacterial lifestyle traits. We found that temperature preferences associate with the abundance of CpG dinucleotides, with thermophiles/hyperthermophiles being particularly depleted. Conversely, the TpA dinucleotide displays a bias that only depends on genomic G + C composition. Using predictions of intrinsic cyclizability we also show that CpG depletion may associate with higher DNA bendability in both thermophiles/hyperthermophiles and mesophiles, and that the former are predicted to have significantly more flexible genomes than the latter. We suggest that higher bendability is advantageous at high temperatures because it facilitates DNA positive supercoiling and that, through modulation of DNA mechanical properties, local or global CpG depletion controls genome organization, most likely not only in bacteria.
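As a rough illustration of what "dinucleotide bias" can mean in practice, the sketch below computes a single-strand observed/expected ratio, f(XY) / (f(X) · f(Y)). This is a commonly used measure, but the abstract does not say which statistic the authors applied, so the formula, the function name `dinucleotide_odds_ratio`, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def dinucleotide_odds_ratio(seq: str, dinuc: str = "CG") -> float:
    """Observed/expected ratio f(XY) / (f(X) * f(Y)) on one strand.

    Hypothetical helper for illustration; not taken from the cited paper.
    """
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two bases")
    if len(dinuc) != 2:
        raise ValueError("dinuc must be exactly two bases")
    dinuc = dinuc.upper()

    base_counts = Counter(seq)
    # Count overlapping dinucleotides by sliding a width-2 window.
    pair_counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))

    f_x = base_counts[dinuc[0]] / len(seq)
    f_y = base_counts[dinuc[1]] / len(seq)
    if f_x == 0 or f_y == 0:
        # The expected frequency is zero, so the ratio is undefined.
        return float("nan")
    f_xy = pair_counts[dinuc] / (len(seq) - 1)
    return f_xy / (f_x * f_y)


if __name__ == "__main__":
    # Toy sequences only; real analyses would use whole genomes.
    for name, s in [("toy_a", "ACGTACGTTTAA"), ("toy_b", "GGCCGGCCTTAA")]:
        print(name, dinucleotide_odds_ratio(s, "CG"), dinucleotide_odds_ratio(s, "TA"))
```

A ratio below 1 indicates depletion relative to base composition and a ratio above 1 indicates enrichment. Normalizing by the mononucleotide frequencies is what separates a dinucleotide-specific bias from a simple effect of G + C content.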
Affiliation(s)
- Diego Forni
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, 23842 Bosisio Parini, Italy
- Uberto Pozzoli
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, 23842 Bosisio Parini, Italy
- Alessandra Mozzi
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, 23842 Bosisio Parini, Italy
- Rachele Cagliani
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, 23842 Bosisio Parini, Italy
- Manuela Sironi
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, 23842 Bosisio Parini, Italy
2
Qiu Y, Kang YM, Korfmann C, Pouyet F, Eckford A, Palazzo AF. The GC-content at the 5' ends of human protein-coding genes is undergoing mutational decay. Genome Biol 2024; 25:219. [PMID: 39138526 PMCID: PMC11323403 DOI: 10.1186/s13059-024-03364-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Accepted: 07/31/2024] [Indexed: 08/15/2024]
Abstract
BACKGROUND: In vertebrates, most protein-coding genes have a peak of GC-content near their 5' transcriptional start site (TSS). This feature promotes both the efficient nuclear export and translation of mRNAs. Despite the importance of GC-content for RNA metabolism, its general features, origin, and maintenance remain mysterious. We investigate the evolutionary forces shaping GC-content at the TSS of genes through both comparative genomic analysis of nucleotide substitution rates between different species and examination of human de novo mutations. RESULTS: Our data suggest that GC-peaks at TSSs were present in the last common ancestor of amniotes, and likely that of vertebrates. We observe that in apes and rodents, where recombination is directed away from TSSs by PRDM9, GC-content at the 5' end of protein-coding genes is currently undergoing mutational decay. In canids, which lack PRDM9 and perform recombination at TSSs, GC-content at the 5' end of protein-coding genes is increasing. We show that these patterns extend into the 5' end of the open reading frame, thus impacting synonymous codon position choices. CONCLUSIONS: Our results indicate that the dynamics of this GC-peak in amniotes is largely shaped by historic patterns of recombination. Since decay of GC-content towards the mutation rate equilibrium is the default state for non-functional DNA, the observed decrease in GC-content at TSSs in apes and rodents indicates that the GC-peak is not being maintained by selection on most protein-coding genes in those species.
Affiliation(s)
- Yi Qiu
  - Department of Biochemistry, University of Toronto, Toronto, Ontario, M5G1M1, Canada
- Yoon Mo Kang
  - Department of Biochemistry, University of Toronto, Toronto, Ontario, M5G1M1, Canada
- Christopher Korfmann
  - Department of Electrical Engineering and Computer Science, York University, Toronto, Ontario, M3J1P3, Canada
- Fanny Pouyet
  - Laboratoire Interdisciplinaire des Sciences du Numérique, Université Paris-Saclay, 91190, Gif-sur-Yvette, France
- Andrew Eckford
  - Department of Electrical Engineering and Computer Science, York University, Toronto, Ontario, M3J1P3, Canada
- Alexander F Palazzo
  - Department of Biochemistry, University of Toronto, Toronto, Ontario, M5G1M1, Canada
3
Radrizzani S, Kudla G, Izsvák Z, Hurst LD. Selection on synonymous sites: the unwanted transcript hypothesis. Nat Rev Genet 2024; 25:431-448. [PMID: 38297070 DOI: 10.1038/s41576-023-00686-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/04/2023] [Indexed: 02/02/2024]
Abstract
Although translational selection to favour codons that match the most abundant tRNAs is not readily observed in humans, there is nonetheless selection in humans on synonymous mutations. We hypothesize that much of this synonymous site selection can be explained in terms of protection against unwanted RNAs - spurious transcripts, mis-spliced forms or RNAs derived from transposable elements or viruses. We propose not only that selection on synonymous sites functions to reduce the rate of creation of unwanted transcripts (for example, through selection on exonic splice enhancers and cryptic splice sites) but also that high-GC content (but low-CpG content), together with intron presence and position, is both particular to functional native mRNAs and used to recognize transcripts as native. In support of this hypothesis, transcription, nuclear export, liquid phase condensation and RNA degradation have all recently been shown to promote GC-rich transcripts and suppress AU/CpG-rich ones. With such 'traps' being set against AU/CpG-rich transcripts, the codon usage of native genes has, in turn, evolved to avoid such suppression. That parallel filters against AU/CpG-rich transcripts also affect the endosomal import of RNAs further supports the unwanted transcript hypothesis of synonymous site selection and explains the similar design rules that have enabled the successful use of transgenes and RNA vaccines.
Affiliation(s)
- Sofia Radrizzani
  - Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, UK
  - Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK
- Grzegorz Kudla
  - MRC Human Genetics Unit, Institute for Genetics and Cancer, The University of Edinburgh, Edinburgh, UK
- Zsuzsanna Izsvák
  - Max-Delbrück-Center for Molecular Medicine in the Helmholtz Society, Berlin, Germany
- Laurence D Hurst
  - Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, UK
4
Forni D, Pozzoli U, Cagliani R, Clerici M, Sironi M. Dinucleotide biases in RNA viruses that infect vertebrates or invertebrates. Microbiol Spectr 2023; 11:e0252923. [PMID: 37800906 PMCID: PMC10714974 DOI: 10.1128/spectrum.02529-23] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/16/2023] [Accepted: 08/12/2023] [Indexed: 10/07/2023]
Abstract
IMPORTANCE Akin to a molecular signature, dinucleotide composition can be exploited by the zinc-finger antiviral protein (ZAP) to restrict CpG-rich (and UpA-rich) RNA viruses. ZAP evolved in tetrapods, and it is not encoded by invertebrates and fish. Because a systematic analysis is missing, we analyzed the genomes of RNA viruses that infect vertebrates or invertebrates. We show that vertebrate single-stranded (ss) RNA(+) viruses and, to a lesser extent, double-stranded RNA viruses tend to have stronger CpG bias than invertebrate viruses. Conversely, ssRNA(-) viruses have similar dinucleotide composition whether they infect vertebrates or invertebrates. Analysis of ssRNA(+) viruses that infect mammals, reptiles, and fish indicated that ZAP is unlikely to be a major driver of CpG depletion. We also show that, compared to other coronaviruses, the genome of SARS-CoV-2 is not homogeneously CpG-depleted. Our study provides new insights into virus evolution and strategies for recoding RNA virus genomes.
Affiliation(s)
- Diego Forni
  - Bioinformatics Lab, Scientific Institute IRCCS E. MEDEA, Bosisio Parini, Italy
- Uberto Pozzoli
  - Bioinformatics Lab, Scientific Institute IRCCS E. MEDEA, Bosisio Parini, Italy
- Rachele Cagliani
  - Bioinformatics Lab, Scientific Institute IRCCS E. MEDEA, Bosisio Parini, Italy
- Mario Clerici
  - Department of Physiopathology and Transplantation, University of Milan, Milan, Italy
  - Don C. Gnocchi Foundation ONLUS, IRCCS, Milan, Italy
- Manuela Sironi
  - Bioinformatics Lab, Scientific Institute IRCCS E. MEDEA, Bosisio Parini, Italy
5
Molteni C, Forni D, Cagliani R, Bravo IG, Sironi M. Evolution and diversity of nucleotide and dinucleotide composition in poxviruses. J Gen Virol 2023; 104. [PMID: 37792576 DOI: 10.1099/jgv.0.001897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/06/2023]
Abstract
Poxviruses (family Poxviridae) have long dsDNA genomes and infect a wide range of hosts, including insects, birds, reptiles and mammals. These viruses have substantial incidence, prevalence and disease burden in humans and in other animals. Nucleotide and dinucleotide composition, mostly CpG and TpA, have been largely studied in viral genomes because of their evolutionary and functional implications. We analysed here the nucleotide and dinucleotide composition, as well as codon usage bias, of a set of representative poxvirus genomes, with a very diverse host spectrum. After correcting for overall nucleotide composition, entomopoxviruses displayed low overall GC content, no enrichment in TpA and large variation in CpG enrichment, while chordopoxviruses showed large variation in nucleotide composition, no obvious depletion in CpG and a weak trend for TpA depletion in GC-rich genomes. Overall, intergenome variation in dinucleotide composition in poxviruses is largely accounted for by variation in overall genomic GC levels. Nonetheless, using vaccinia virus as a model, we found that genes expressed at the earliest times in infection are more CpG-depleted than genes expressed at later stages. This observation has parallels in betaherpesviruses (also large dsDNA viruses) and suggests an antiviral role for the innate immune system (e.g. via the zinc-finger antiviral protein ZAP) in the early phases of poxvirus infection. We also analysed codon usage bias in poxviruses and we observed that it is mostly determined by genomic GC content, and that stratification after host taxonomy does not contribute to explaining codon usage bias diversity. By analysis of within-species diversity, we show that genomic GC content is the result of mutational biases. Poxvirus genomes that encode a DNA ligase are significantly AT-richer than those that do not, suggesting that DNA repair systems shape mutation biases. Our data shed light on the evolution of poxviruses and inform strategies for their genetic manipulation for therapeutic purposes.
Affiliation(s)
- Cristian Molteni
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
- Diego Forni
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
- Rachele Cagliani
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
- Ignacio G Bravo
  - Laboratoire MIVEGEC (Univ Montpellier CNRS, IRD), Centre National de la Recherche Scientifique, Montpellier, France
- Manuela Sironi
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
6
King KM, Rajadhyaksha EV, Tobey IG, Van Doorslaer K. Synonymous nucleotide changes drive papillomavirus evolution. Tumour Virus Res 2022; 14:200248. [PMID: 36265836 PMCID: PMC9589209 DOI: 10.1016/j.tvr.2022.200248] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/29/2022] [Revised: 10/11/2022] [Accepted: 10/12/2022] [Indexed: 11/06/2022]
Abstract
Papillomaviruses have been evolving alongside their hosts for at least 450 million years. This review will discuss some of the insights gained into the evolution of this diverse family of viruses. Papillomavirus evolution is constrained by pervasive purifying selection to maximize viral fitness. Yet these viruses need to adapt to changes in their environment, e.g., the host immune system. It has long been known that these viruses evolved a codon usage that doesn't match the infected host. Here we discuss how papillomavirus genomes evolve by acquiring synonymous changes that allow the virus to avoid detection by the host innate immune system without changing the encoded proteins and associated fitness loss. We discuss the implications of studying viral evolution, lifecycle, and cancer progression.
Affiliation(s)
- Kelly M King
  - School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, AZ, USA
- Esha Vikram Rajadhyaksha
  - School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, AZ, USA
  - Department of Physiology and Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Isabelle G Tobey
  - Cancer Biology Graduate Interdisciplinary Program, University of Arizona, Tucson, AZ, USA
- Koenraad Van Doorslaer
  - School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, AZ, USA
  - Cancer Biology Graduate Interdisciplinary Program, University of Arizona, Tucson, AZ, USA
  - The BIO5 Institute, The Department of Immunobiology, Genetics Graduate Interdisciplinary Program, UA Cancer Center, University of Arizona Tucson, Arizona, USA
7
Odon V, Fiddaman SR, Smith AL, Simmonds P. Comparison of CpG- and UpA-mediated restriction of RNA virus replication in mammalian and avian cells and investigation of potential ZAP-mediated shaping of host transcriptome compositions. RNA 2022; 28:1089-1109. [PMID: 35675984 PMCID: PMC9297844 DOI: 10.1261/rna.079102.122] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 01/07/2022] [Accepted: 05/26/2022] [Indexed: 06/15/2023]
Abstract
The ability of zinc finger antiviral protein (ZAP) to recognize and respond to RNA virus sequences with elevated frequencies of CpG dinucleotides has been proposed as a functional part of the vertebrate innate immune antiviral response. It has been further proposed that ZAP activity shapes compositions of cytoplasmic mRNA sequences to avoid self-recognition, particularly mRNAs for interferons (IFNs) and IFN-stimulated genes (ISGs) expressed during the antiviral state. We investigated whether restriction of the replication of mutants of influenza A virus (IAV) and the echovirus 7 (E7) replicon with high CpG and UpA frequencies varied in different species of mammals and birds. Cell lines from different bird orders showed substantial variability in restriction of CpG-high mutants of IAV and E7 replicons, whereas none restricted UpA-high mutants, in marked contrast to universal restriction of both mutants in mammalian cells. Dinucleotide representation in ISGs and IFN genes was compared with those of cellular transcriptomes to determine whether potential differences in inferred ZAP activity between species shaped dinucleotide compositions of highly expressed genes during the antiviral state. While mammalian type 1 IFN genes typically showed often profound suppression of CpG and UpA frequencies, there was no oversuppression of either in ISGs in any species, irrespective of their ability to restrict CpG- or UpA-high mutants. Similarly, genome sequences of mammalian and avian RNA viruses were compositionally equivalent, as were IAV strains recovered from ducks, chickens and humans. Overall, we found no evidence for host variability in inferred ZAP function shaping host or viral transcriptome compositions.
Affiliation(s)
- Valerie Odon
  - Nuffield Department of Medicine, Peter Medawar Building for Pathogen Research, University of Oxford, Oxford OX1 3SY, United Kingdom
- Steven R Fiddaman
  - Department of Zoology, Peter Medawar Building for Pathogen Research, University of Oxford, Oxford OX1 3SY, United Kingdom
- Adrian L Smith
  - Department of Zoology, Peter Medawar Building for Pathogen Research, University of Oxford, Oxford OX1 3SY, United Kingdom
- Peter Simmonds
  - Nuffield Department of Medicine, Peter Medawar Building for Pathogen Research, University of Oxford, Oxford OX1 3SY, United Kingdom
8
Yi SV, Goodisman MAD. The impact of epigenetic information on genome evolution. Philos Trans R Soc Lond B Biol Sci 2021; 376:20200114. [PMID: 33866804 DOI: 10.1098/rstb.2020.0114] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 02/06/2023]
Abstract
Epigenetic information affects gene function by interacting with chromatin, while not changing the DNA sequence itself. However, it has become apparent that the interactions between epigenetic information and chromatin can, in fact, indirectly lead to DNA mutations and ultimately influence genome evolution. This review evaluates the ways in which epigenetic information affects genome sequence and evolution. We discuss how DNA methylation has strong and pervasive effects on DNA sequence evolution in eukaryotic organisms. We also review how the physical interactions arising from the connections between histone proteins and DNA affect DNA mutation and repair. We then discuss how a variety of epigenetic mechanisms exert substantial effects on genome evolution by suppressing the movement of transposable elements. Finally, we examine how genome expansion through gene duplication is also partially controlled by epigenetic information. Overall, we conclude that epigenetic information has widespread indirect effects on DNA sequences in eukaryotes and represents a potent cause and constraint of genome evolution. This article is part of the theme issue 'How does epigenetics influence the course of evolution?'
Affiliation(s)
- Soojin V Yi
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Michael A D Goodisman
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
9
Pichon F, Shen Y, Busato F, Jochems SP, Jacquelin B, Le Grand R, Deleuze JF, Müller-Trutwin M, Tost J. Analysis and annotation of DNA methylation in two nonhuman primate species using the Infinium Human Methylation 450K and EPIC BeadChips. Epigenomics 2021; 13:169-186. [PMID: 33471557 DOI: 10.2217/epi-2020-0200] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/15/2022]
Abstract
Aim: Nonhuman primates are essential for research on many human diseases. The Infinium Human Methylation450/EPIC BeadChips are popular tools for the study of the methylation state across the human genome at affordable cost. Methods: We performed a precise evaluation and re-annotation of the BeadChip probes for the analysis of genome-wide DNA methylation patterns in rhesus macaques and African green monkeys through in silico analyses combined with functional validation by pyrosequencing. Results: Up to 165,847 of the 450K and 261,545 probes of the EPIC BeadChip can be reliably used. The annotation files are provided in a format compatible with a variety of standard bioinformatic pipelines. Conclusion: Our study will facilitate high-throughput DNA methylation analyses in Macaca mulatta and Chlorocebus sabaeus.
Affiliation(s)
- Fabien Pichon
  - Laboratory for Epigenetics & Environment, Centre National de Recherche en Génomique Humaine, CEA-Institut de Biologie François Jacob, Evry, France
- Yimin Shen
  - Laboratory for Epigenetics & Environment, Centre National de Recherche en Génomique Humaine, CEA-Institut de Biologie François Jacob, Evry, France
  - Laboratory for Bioinformatics, Fondation Jean Dausset - Centre d'Etude du Polymorphisme Humain, 75010 Paris, France
- Florence Busato
  - Laboratory for Epigenetics & Environment, Centre National de Recherche en Génomique Humaine, CEA-Institut de Biologie François Jacob, Evry, France
- Simon P Jochems
  - Institut Pasteur, HIV Inflammation & Persistence Unit, Paris, France
  - Université Paris Diderot, Sorbonne Paris Cité, Paris, France
  - Leiden University Medical Center, Leiden, The Netherlands
- Roger Le Grand
  - Université Paris-Saclay, Inserm, CEA, Center for Immunology of Viral, Auto-immune, Hematological and Bacterial diseases (IMVA-HB/IDMIT), Fontenay-aux-Roses, France
- Jean-Francois Deleuze
  - Laboratory for Epigenetics & Environment, Centre National de Recherche en Génomique Humaine, CEA-Institut de Biologie François Jacob, Evry, France
  - Laboratory for Bioinformatics, Fondation Jean Dausset - Centre d'Etude du Polymorphisme Humain, 75010 Paris, France
- Jörg Tost
  - Laboratory for Epigenetics & Environment, Centre National de Recherche en Génomique Humaine, CEA-Institut de Biologie François Jacob, Evry, France
10
Sun JH, Ai SM, Liu SQ. Methylation-driven model for analysis of dinucleotide evolution in genomes. Theor Biol Med Model 2020; 17:3. [PMID: 32264909 PMCID: PMC7140373 DOI: 10.1186/s12976-020-00122-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2019] [Accepted: 03/10/2020] [Indexed: 11/16/2022]
Abstract
Background: CpGs, the major methylation sites in vertebrate genomes, exhibit a high mutation rate from the methylated form of CpG to TpG/CpA and, therefore, influence the evolution of genome composition. However, the quantitative effects of CpG to TpG/CpA mutations on the evolution of genome composition in terms of the dinucleotide frequencies/proportions remain poorly understood. Results: Based on the neutral theory of molecular evolution, we propose a methylation-driven model (MDM) that allows predicting the changes in frequencies/proportions of the 16 dinucleotides and in the GC content of a genome given the known number of CpG to TpG/CpA mutations. The application of MDM to the 10 published vertebrate genomes shows that, for most of the 16 dinucleotides and the GC content, a good consistency is achieved between the predicted and observed trends of changes in the frequencies and content relative to the assumed initial values, and that the model performs better on the mammalian genomes than it does on the lower-vertebrate genomes. The model's performance depends on the genome composition characteristics, the assumed initial state of the genome, and the estimated parameters, one or more of which are responsible for the different performance on the mammalian and lower-vertebrate genomes and for the large deviations of the predicted frequencies of a few dinucleotides from their observed frequencies. Conclusions: Despite certain limitations of the current model, the successful application to the higher-vertebrate (mammalian) genomes demonstrates its potential for facilitating studies aimed at understanding the role of methylation in driving the evolution of genome dinucleotide composition.
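To make concrete how CpG to TpG mutations shift dinucleotide counts and GC content, here is a toy sketch. It is not the authors' MDM. It deterministically converts the first few CpG sites on one strand to TpG and recounts, and every function name and the example sequence are assumptions made for illustration.

```python
from collections import Counter


def dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides on the given strand."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def mutate_cpg_to_tpg(seq: str, n_mutations: int) -> str:
    """Convert the first n_mutations CpG sites (left to right) to TpG.

    Toy stand-in for methylation-driven C-to-T changes; a real model
    would sample sites and also handle CpA on the opposite strand.
    """
    bases = list(seq.upper())
    done = 0
    i = 0
    while i < len(bases) - 1 and done < n_mutations:
        if bases[i] == "C" and bases[i + 1] == "G":
            bases[i] = "T"  # C -> T within a CpG context
            done += 1
        i += 1
    return "".join(bases)


if __name__ == "__main__":
    before = "ACGTCGACGGCGTA"  # arbitrary toy sequence
    after = mutate_cpg_to_tpg(before, 2)
    print(before, dinucleotide_counts(before)["CG"], gc_content(before))
    print(after, dinucleotide_counts(after)["CG"], gc_content(after))
```

Each converted site removes one CpG and one G/C base, and it also changes the dinucleotide that ends at the mutated position. This is why a single mutation class can move the frequencies of several of the 16 dinucleotides at once.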
Affiliation(s)
- Jian-Hong Sun
  - State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan & School of Life Sciences, Yunnan University, Kunming, 650091, China
  - College of Engineering, Honghe University, Mengzi, 661100, China
- Shi-Meng Ai
  - Department of Applied Mathematics, Yunnan Agricultural University, Kunming, 650201, China
- Shu-Qun Liu
  - State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan & School of Life Sciences, Yunnan University, Kunming, 650091, China
11
Pucci F, Rooman M. Relation between DNA ionization potentials, single base substitutions and pathogenic variants. BMC Genomics 2019; 20:551. [PMID: 31307386 PMCID: PMC6631442 DOI: 10.1186/s12864-019-5867-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/11/2022]
Abstract
Background: It is nowadays clear that single base substitutions that occur in the human genome, of which some lead to pathogenic conditions, are non-random and influenced by their flanking nucleobase sequences. However, despite recent progress, the understanding of these "non-local" effects is still far from complete. Results: To address this problem, we analyzed the relationship between the base mutability in specific gene regions and the electron hole transport along the DNA base stacks, as it is one of the mechanisms that have been suggested to contribute to these effects. More precisely, we studied the connection between the normalized frequency of single base substitutions and the vertical ionization potential (vIP) of the base and its flanking sequence, estimated using MP2/6-31G* ab initio quantum chemistry calculations. We found a statistically significant overall anticorrelation between these two quantities: the lower the vIP value, the more probable the substitution. Moreover, the slope of the regression lines varies. It is larger for introns than for exons and untranslated regions, and for synonymous than for missense substitutions. Interestingly, the correlation appears to be more pronounced when considering the flanking sequence of the substituted base in the 3' rather than in the 5' direction, which corresponds to the preferred direction of charge migration. A weaker but still statistically significant correlation is found between the ionization potentials and the pathogenicity of the base substitutions. Moreover, pathogenicity is also preferentially associated with larger changes in ionization potentials upon base substitution. Conclusions: With this analysis we gained new insights into the complex biophysical mechanisms that are at the basis of mutagenesis and pathogenicity, and supported the role of electron-hole transport in these matters.
Affiliation(s)
- Fabrizio Pucci
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Roosevelt Ave. 50, Bruxelles, 1050, Belgium
  - John von Neumann Institute for Computing, Jülich Supercomputer Centre, Forschungszentrum Jülich, Jülich, 52428, Germany
- Marianne Rooman
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Roosevelt Ave. 50, Bruxelles, 1050, Belgium
12
Krishnamurthy SR, Wang D. Origins and challenges of viral dark matter. Virus Res 2017; 239:136-142. [DOI: 10.1016/j.virusres.2017.02.002] [Citation(s) in RCA: 141] [Impact Index Per Article: 17.6] [Received: 12/05/2016] [Revised: 01/31/2017] [Accepted: 02/06/2017] [Indexed: 02/07/2023]
13
Tan B, Yang XL, Ge XY, Peng C, Liu HZ, Zhang YZ, Zhang LB, Shi ZL. Novel bat adenoviruses with low G+C content shed new light on the evolution of adenoviruses. J Gen Virol 2017; 98:739-748. [PMID: 28475035 DOI: 10.1099/jgv.0.000739] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 11/18/2022]
Abstract
Bats have been reported to carry diverse adenoviruses. However, most bat adenoviruses have been identified on the basis of partial genome sequences, and knowledge on the evolution of bat adenoviruses remains limited. In this study, we isolated and characterized four novel adenoviruses from two distinct bat species, and their full-length genomes were sequenced. Sequence analysis revealed that these isolates represented three distinct species of the genus Mastadenovirus. However, all isolates had an exceptionally low G+C content and relatively short genomes compared with other known mastadenoviruses. We further analysed the relationships among the G+C content, 5'-C-phosphate-G-3' (CpG) representation and genome size in the family Adenoviridae. Our results revealed that the CpG representation in adenoviral genomes depends primarily on the level of methylation, and the genome size displayed significant positive correlations with both G+C content and CpG representation. Since ancestral adenoviruses are believed to have contained short genomes, those probably had a low G+C content, similar to the genomes of these bat strains. Our results suggest that bats are important natural reservoirs for adenoviruses and play important roles in the evolution of adenoviruses.
Affiliation(s)
- Bing Tan
  - University of Chinese Academy of Sciences, Beijing, PR China
  - CAS Key Laboratory of Special Pathogens and Biosafety, Center for Emerging Infectious Diseases of Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, PR China
- Xing-Lou Yang
  - CAS Key Laboratory of Special Pathogens and Biosafety, Center for Emerging Infectious Diseases of Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, PR China
- Xing-Yi Ge
  - CAS Key Laboratory of Special Pathogens and Biosafety, Center for Emerging Infectious Diseases of Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, PR China
- Cheng Peng
  - CAS Key Laboratory of Special Pathogens and Biosafety, Center for Emerging Infectious Diseases of Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, PR China
- Hai-Zhou Liu
  - CAS Key Laboratory of Special Pathogens and Biosafety, Center for Emerging Infectious Diseases of Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, PR China
- Yun-Zhi Zhang
  - Yunnan Provincial Key Laboratory for Zoonosis Control and Prevention, Yunnan Institute of Endemic Diseases Control and Prevention, Dali, PR China
- Li-Biao Zhang
  - Guangdong Institute of Applied Biological Resource, Guangzhou, PR China
- Zheng-Li Shi
  - University of Chinese Academy of Sciences, Beijing, PR China
  - CAS Key Laboratory of Special Pathogens and Biosafety, Center for Emerging Infectious Diseases of Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, PR China
14
|
Kakou B, Angers B, Glémet H. Extensive length variation in the ribosomal DNA intergenic spacer of yellow perch (Perca flavescens). Genome 2016; 59:149-58. [PMID: 26841134 DOI: 10.1139/gen-2015-0114] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/01/2023]
Abstract
The intergenic spacer (IGS) is located between ribosomal RNA (rRNA) gene copies. Within the IGS, regulatory elements for rRNA gene transcription are found, as well as a varying number of other repetitive elements that are at the root of IGS length heterogeneity. This heterogeneity has been shown to have a functional significance through its effect on growth rate. Here, we present the structural organization of yellow perch (Perca flavescens) IGS based on its entire sequence, as well as the IGS length variation within a natural population. Yellow perch IGS structure has four discrete regions containing tandem repeat elements. For three of these regions, no specific length class was detected as allele size was seemingly normally distributed. However, for one repeat region, PCR amplification uncovered the presence of two distinctive IGS variants representing a length difference of 1116 bp. This repeat region was also devoid of any CpG sites despite a high GC content. Balanced selection may be holding the alleles in the population and would account for the high diversity of length variants observed for adjacent regions. Our study is an important precursor for further work aiming to assess the role of IGS length variation in influencing growth rate in fish.
Affiliation(s)
- Bidénam Kakou
  - a Département des sciences de l'environnement, Université du Québec à Trois-Rivières, Trois-Rivières, QC G9A 5H7, Canada
- Bernard Angers
  - b Department of Biological Sciences, Université de Montréal, Montréal, QC H3C 3J7, Canada; c GRIL - Groupe de recherche interuniversitaire en limnologie et en environnement aquatique
- Hélène Glémet
  - a Département des sciences de l'environnement, Université du Québec à Trois-Rivières, Trois-Rivières, QC G9A 5H7, Canada; c GRIL - Groupe de recherche interuniversitaire en limnologie et en environnement aquatique
|
15
|
Upadhyay M, Vivekanandan P. Depletion of CpG Dinucleotides in Papillomaviruses and Polyomaviruses: A Role for Divergent Evolutionary Pressures. PLoS One 2015; 10:e0142368. [PMID: 26544572 PMCID: PMC4636234 DOI: 10.1371/journal.pone.0142368] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 05/22/2015] [Accepted: 10/21/2015] [Indexed: 12/31/2022] Open
Abstract
Background: Papillomaviruses and polyomaviruses are small ds-DNA viruses infecting a wide range of vertebrate hosts. Evidence supporting co-evolution of the virus with the host does not fully explain the evolutionary path of papillomaviruses and polyomaviruses. Studies analyzing CpG dinucleotide frequencies in virus genomes have provided interesting insights on virus evolution. CpG dinucleotide depletion has not been extensively studied among papillomaviruses and polyomaviruses. We sought to analyze the relative abundance of dinucleotides and the relative roles of evolutionary pressures in papillomaviruses and polyomaviruses.
Methods: We studied 127 full-length sequences from papillomaviruses and 56 full-length sequences from polyomaviruses. We analyzed the relative abundance of dinucleotides, effective codon number (ENC), and differences in synonymous codon usage. We examined the association, if any, between the extent of CpG dinucleotide depletion and the evolutionary lineage of the infected host. We also investigated the contribution of mutational pressure and translational selection to the evolution of papillomaviruses and polyomaviruses.
Results: All papillomaviruses and polyomaviruses are CpG depleted. Interestingly, the evolutionary lineage of the infected host determines the extent of CpG depletion among papillomaviruses and polyomaviruses. CpG dinucleotide depletion was more pronounced among papillomaviruses and polyomaviruses infecting humans and other mammals than among those infecting birds. Our findings demonstrate that CpG depletion among papillomaviruses is linked to mutational pressure, while CpG depletion among polyomaviruses is linked to translational selection. We also present evidence suggesting that methylation of CpG dinucleotides may explain, at least in part, the depletion of CpG dinucleotides among papillomaviruses but not polyomaviruses.
Conclusions: The extent of CpG depletion among papillomaviruses and polyomaviruses is linked to the evolutionary lineage of the infected host. Our results highlight the existence of divergent evolutionary pressures leading to CpG dinucleotide depletion among small ds-DNA viruses infecting vertebrate hosts.
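The "relative abundance of dinucleotides" analyzed in this abstract can be illustrated with a minimal sketch. The abstract does not state the exact formula, so the code below assumes the commonly used observed/expected odds ratio, f(XY) / (f(X) · f(Y)), computed on a single strand. The function name `dinucleotide_odds_ratios` is hypothetical and not taken from the paper.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq: str) -> dict[str, float]:
    """Observed/expected ratio for each dinucleotide on one strand.

    Assumption (not from the abstract): ratio = f(XY) / (f(X) * f(Y)).
    Values well below 1 indicate depletion, values above 1 enrichment.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")

    base_counts = Counter(seq)
    # Overlapping dinucleotides: positions (0,1), (1,2), ...
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))

    ratios = {}
    for x in "ACGT":
        for y in "ACGT":
            fx = base_counts[x] / n
            fy = base_counts[y] / n
            if fx == 0 or fy == 0:
                continue  # expected frequency is zero, ratio undefined
            fxy = pair_counts[x + y] / (n - 1)
            ratios[x + y] = fxy / (fx * fy)
    return ratios
```

Under this assumed definition, a CpG-depleted genome would give a value for `ratios["CG"]` clearly below 1. Published analyses often also combine both strands before computing the ratio, which this sketch does not do.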
Affiliation(s)
- Mohita Upadhyay
  - Kusuma School of Biological Sciences, Indian Institute of Technology Delhi, New Delhi, 006, India
- Perumal Vivekanandan
  - Kusuma School of Biological Sciences, Indian Institute of Technology Delhi, New Delhi, 006, India
|
16
|
Sankar S, Upadhyay M, Ramamurthy M, Vadivel K, Sagadevan K, Nandagopal B, Vivekanandan P, Sridharan G. Novel Insights on Hantavirus Evolution: The Dichotomy in Evolutionary Pressures Acting on Different Hantavirus Segments. PLoS One 2015; 10:e0133407. [PMID: 26193652 PMCID: PMC4508033 DOI: 10.1371/journal.pone.0133407] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/01/2015] [Accepted: 06/26/2015] [Indexed: 01/01/2023] Open
Abstract
Background: Hantaviruses are important emerging zoonotic pathogens. The current understanding of hantavirus evolution is complicated by the lack of consensus on co-divergence of hantaviruses with their animal hosts. In addition, hantaviruses have long-term associations with their reservoir hosts. Analyzing the relative abundance of dinucleotides may shed new light on hantavirus evolution. We studied the relative abundance of dinucleotides and the evolutionary pressures shaping different hantavirus segments.
Methods: A total of 118 sequences were analyzed; this includes 51 sequences of the S segment, 43 sequences of the M segment and 23 sequences of the L segment. The relative abundance of dinucleotides, effective codon number (ENC), and codon usage biases were analyzed. Standard methods were used to investigate the relative roles of mutational pressure and translational selection on the three hantavirus segments.
Results: All three segments of hantaviruses are CpG depleted. Mutational pressure is the predominant evolutionary force leading to CpG depletion among hantaviruses. Interestingly, the S segment of hantaviruses is GpU depleted and, in contrast to CpG depletion, the depletion of GpU dinucleotides from the S segment is driven by translational selection. Our findings also suggest that mutational pressure is the primary evolutionary pressure acting on the S and the M segments of hantaviruses, while translational selection plays a key role in shaping the evolution of the L segment. Our findings highlight how different evolutionary pressures may contribute disproportionally to the evolution of the three hantavirus segments. These findings provide new insights on the current understanding of hantavirus evolution.
Conclusions: There is a dichotomy among evolutionary pressures shaping (a) the relative abundance of different dinucleotides in hantavirus genomes and (b) the evolution of the three hantavirus segments.
Affiliation(s)
- Sathish Sankar
  - Sri Sakthi Amma Institute of Biomedical Research, Sri Narayani Hospital and Research Centre, Sripuram, Vellore, 632 055, Tamil Nadu, India
- Mohita Upadhyay
  - Kusuma School of Biological Sciences, Indian Institute of Technology, New Delhi, 110 016, India
- Mageshbabu Ramamurthy
  - Sri Sakthi Amma Institute of Biomedical Research, Sri Narayani Hospital and Research Centre, Sripuram, Vellore, 632 055, Tamil Nadu, India
- Kumaran Vadivel
  - Sri Sakthi Amma Institute of Biomedical Research, Sri Narayani Hospital and Research Centre, Sripuram, Vellore, 632 055, Tamil Nadu, India
- Kalaiselvan Sagadevan
  - Sri Sakthi Amma Institute of Biomedical Research, Sri Narayani Hospital and Research Centre, Sripuram, Vellore, 632 055, Tamil Nadu, India
- Balaji Nandagopal
  - Sri Sakthi Amma Institute of Biomedical Research, Sri Narayani Hospital and Research Centre, Sripuram, Vellore, 632 055, Tamil Nadu, India
- Perumal Vivekanandan
  - Kusuma School of Biological Sciences, Indian Institute of Technology, New Delhi, 110 016, India
- Gopalan Sridharan
  - Sri Sakthi Amma Institute of Biomedical Research, Sri Narayani Hospital and Research Centre, Sripuram, Vellore, 632 055, Tamil Nadu, India
|
17
|
Wallberg A, Glémin S, Webster MT. Extreme recombination frequencies shape genome variation and evolution in the honeybee, Apis mellifera. PLoS Genet 2015; 11:e1005189. [PMID: 25902173 PMCID: PMC4406589 DOI: 10.1371/journal.pgen.1005189] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 12/19/2014] [Accepted: 04/01/2015] [Indexed: 01/10/2023] Open
Abstract
Meiotic recombination is a fundamental cellular process, with important consequences for evolution and genome integrity. However, we know little about how recombination rates vary across the genomes of most species and the molecular and evolutionary determinants of this variation. The honeybee, Apis mellifera, has extremely high rates of meiotic recombination, although the evolutionary causes and consequences of this are unclear. Here we use patterns of linkage disequilibrium in whole genome resequencing data from 30 diploid honeybees to construct a fine-scale map of rates of crossing over in the genome. We find that, in contrast to vertebrate genomes, the recombination landscape is not strongly punctate. Crossover rates strongly correlate with levels of genetic variation, but not divergence, which indicates a pervasive impact of selection on the genome. Germ-line methylated genes have reduced crossover rate, which could indicate a role of methylation in suppressing recombination. Controlling for the effects of methylation, we do not infer a strong association between gene expression patterns and recombination. The site frequency spectrum is strongly skewed from neutral expectations in honeybees: rare variants are dominated by AT-biased mutations, whereas GC-biased mutations are found at higher frequencies, indicative of a major influence of GC-biased gene conversion (gBGC), which we infer to generate an allele fixation bias 5 to 50 times the genomic average estimated in humans. We uncover further evidence that this repair bias specifically affects transitions and favours fixation of CpG sites. Recombination, via gBGC, therefore appears to have profound consequences on genome evolution in honeybees and interferes with the process of natural selection. These findings have important implications for our understanding of the forces driving molecular evolution.
Author summary: Evolution results from changes in allele frequencies in populations. The main forces that cause such changes are natural selection and random genetic drift. However, an additional process, GC-biased gene conversion (gBGC), associated with meiotic recombination, affects the probability that alleles are passed from one generation to the next. The honeybee, Apis mellifera, has extremely high recombination rates, more than 20 times those observed in humans. However, the reason for this is unknown and the effects of such high recombination rates on evolution are not well understood. Here we use patterns of genetic variation in the genomes of 30 honeybees to infer variation in the rate of recombination across the genome. We find that recombination rates and levels of genetic variation are strongly correlated, which is indicative of a pervasive impact of natural selection on genetic variation. We also infer a major role of DNA methylation in determining recombination rates in genes. Patterns of genetic variation appear to be strongly skewed due to the effects of gBGC, suggesting that recombination generates a bias in transmission of alleles during meiosis. This process seems to be interfering with the efficacy of selection at removing deleterious alleles and favouring beneficial ones. Recombination therefore has a huge impact on genetic variation and evolution in honeybees and appears to play a dominant role in genome evolution.
Affiliation(s)
- Andreas Wallberg
  - Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Sylvain Glémin
  - Institut des Sciences de l’Evolution (ISEM-UMR 5554 Université de Montpellier-CNRS-IRD-EPHE), France
  - Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Matthew T. Webster
  - Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
|
18
|
Necessary relations for nucleotide frequencies. J Theor Biol 2015; 374:179-82. [PMID: 25843217 DOI: 10.1016/j.jtbi.2015.03.025] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/11/2014] [Revised: 02/01/2015] [Accepted: 03/21/2015] [Indexed: 11/21/2022]
Abstract
Genome composition analysis of di-, tri- and tetra-nucleotide frequencies is known to be evolutionarily informative, and useful in metagenomic studies, where binning of raw sequence data is often an important first step. Patterns appearing in genome composition analysis may be due to evolutionary processes or purely mathematical relations. For example, the total number of dinucleotides in a sequence is equal to the sum of the individual totals of the sixteen types of dinucleotide, and this is entirely independent of any assumptions made regarding mutation or selection, or indeed any physical or chemical process. Before any statistical analysis can be attempted, a knowledge of all necessary mathematical relations is required. I show that 25% of di-, tri- and tetra-nucleotide frequencies can be written as simple sums and differences of the remainder. The vast majority of organisms have circular genomes, for which these relations are exact and necessary. In the case of linear molecules, the absolute error is very nearly zero, and does not grow with contiguous sequence length. As a result of the new, necessary relations presented here, the foundations of the statistical analysis of di-, tri- and tetra-nucleotide frequencies, and k-mer analysis in general, need to be revisited.
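The simplest of these purely mathematical relations can be checked directly. The sketch below is an illustration only and does not reproduce the paper's full set of relations: it verifies that, in a circular sequence, the number of dinucleotides starting with a base equals the number ending with it, and that both equal the count of that base. The function names are hypothetical.

```python
from collections import Counter


def circular_dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides, wrapping from the last base to the first."""
    seq = seq.upper()
    n = len(seq)
    return Counter(seq[i] + seq[(i + 1) % n] for i in range(n))


def marginal_relations_hold(seq: str) -> bool:
    """For every base X: #(X followed by anything) == #(anything followed by X) == #(X)."""
    pairs = circular_dinucleotide_counts(seq)
    singles = Counter(seq.upper())
    for base, count in singles.items():
        starts = sum(c for pair, c in pairs.items() if pair[0] == base)
        ends = sum(c for pair, c in pairs.items() if pair[1] == base)
        if not (starts == ends == count):
            return False
    return True
```

Because every base in a circular sequence is the first letter of exactly one dinucleotide and the second letter of exactly one, these equalities hold for any input, independent of mutation or selection. For a linear sequence the same check can fail by at most one count per base, which is consistent with the abstract's statement that the error does not grow with sequence length.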
|
19
|
Evolutionary consequences of DNA methylation on the GC content in vertebrate genomes. G3-GENES GENOMES GENETICS 2015; 5:441-7. [PMID: 25591920 PMCID: PMC4349097 DOI: 10.1534/g3.114.015545] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Indexed: 12/29/2022]
Abstract
The genomes of many vertebrates show a characteristic variation in GC content. To explain its origin and evolution, mainly three mechanisms have been proposed: selection for GC content, mutation bias, and GC-biased gene conversion. At present, the mechanism of GC-biased gene conversion, i.e., short-scale, unidirectional exchanges between homologous chromosomes in the neighborhood of recombination-initiating double-strand breaks in favor of GC nucleotides, is the most widely accepted hypothesis. We here suggest that DNA methylation also plays an important role in the evolution of GC content in vertebrate genomes. To test this hypothesis, we investigated one mammalian (human) and one avian (chicken) genome. We used bisulfite sequencing to generate a whole-genome methylation map of chicken sperm and made use of a publicly available whole-genome methylation map of human sperm. Inclusion of these methylation maps into a model of GC content evolution provided significant support for the impact of DNA methylation on the local equilibrium GC content. Moreover, two different estimates of equilibrium GC content, one that neglects and one that incorporates the impact of DNA methylation and the concomitant CpG hypermutability, give estimates that differ by approximately 15% in both genomes, arguing for a strong impact of DNA methylation on the evolution of GC content. Thus, our results suggest that previous estimates of equilibrium GC content, which neglect the hypermutability of CpG dinucleotides, need to be reevaluated.
|
20
|
Phylogeny and evolution of RNA structure. Methods Mol Biol 2014. [PMID: 24639167 DOI: 10.1007/978-1-62703-709-9_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Darwin's conviction that all living beings on Earth are related and the graph of relatedness is tree-shaped has been essentially confirmed by phylogenetic reconstruction first from morphology and later from data obtained by molecular sequencing. Limitations of the phylogenetic tree concept were recognized as more and more sequence information became available. The other path-breaking idea of Darwin, natural selection of fitter variants in populations, is cast into simple mathematical form and extended to mutation-selection dynamics. In this form the theory is directly applicable to RNA evolution in vitro and to virus evolution. Phylogeny and population dynamics of RNA provide complementary insights into evolution and the interplay between the two concepts will be pursued throughout this chapter. The two strategies for understanding evolution are ultimately related through the central paradigm of structural biology: sequence ⇒ structure ⇒ function. We elaborate on the state of the art in modeling both phylogeny and evolution of RNA driven by reproduction and mutation. Thereby the focus will be laid on models for phylogenetic sequence evolution as well as evolution and design of RNA structures with selected examples and notes on simulation methods. In the perspectives an attempt is made to combine molecular structure, population dynamics, and phylogeny in modeling evolution.
|
21
|
Atkinson NJ, Witteveldt J, Evans DJ, Simmonds P. The influence of CpG and UpA dinucleotide frequencies on RNA virus replication and characterization of the innate cellular pathways underlying virus attenuation and enhanced replication. Nucleic Acids Res 2014; 42:4527-45. [PMID: 24470146 PMCID: PMC3985648 DOI: 10.1093/nar/gku075] [Citation(s) in RCA: 137] [Impact Index Per Article: 12.5] [Indexed: 12/25/2022] Open
Abstract
Most RNA viruses infecting mammals and other vertebrates show profound suppression of CpG and UpA dinucleotide frequencies. To investigate this functionally, mutants of the picornavirus, echovirus 7 (E7), were constructed with altered CpG and UpA compositions in two 1.1–1.3 Kbase regions. Those with increased frequencies of CpG and UpA showed impaired replication kinetics and higher RNA/infectivity ratios compared with wild-type virus. Remarkably, mutants with CpGs and UpAs removed showed enhanced replication, larger plaques and rapidly outcompeted wild-type virus on co-infections. Luciferase-expressing E7 sub-genomic replicons with CpGs and UpAs removed from the reporter gene showed 100-fold greater luminescence. E7 and mutants were equivalently sensitive to exogenously added interferon-β, showed no evidence for differential recognition by ADAR1 or pattern recognition receptors RIG-I, MDA5 or PKR. However, kinase inhibitors roscovitine and C16 partially or entirely reversed the attenuated phenotype of high CpG and UpA mutants, potentially through inhibition of currently uncharacterized pattern recognition receptors that respond to RNA composition. Generating viruses with enhanced replication kinetics has applications in vaccine production and reporter gene construction. More fundamentally, the findings introduce a new evolutionary paradigm where dinucleotide composition of viral genomes is subjected to selection pressures independently of coding capacity and profoundly influences host–pathogen interactions.
Affiliation(s)
- Nicky J Atkinson
  - Infection and Immunity Division, Roslin Institute, University of Edinburgh, Easter Bush, Edinburgh EH25 9RG, UK and School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
|
22
|
CpG dinucleotide frequencies reveal the role of host methylation capabilities in parvovirus evolution. J Virol 2013; 87:13816-24. [PMID: 24109231 DOI: 10.1128/jvi.02515-13] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 01/05/2023] Open
Abstract
Parvoviruses are rapidly evolving viruses that infect a wide range of hosts, including vertebrates and invertebrates. Extensive methylation of the parvovirus genome has been recently demonstrated. A global pattern of methylation of CpG dinucleotides is seen in vertebrate genomes, compared to "fractional" methylation patterns in invertebrate genomes. It remains unknown if the loss of CpG dinucleotides occurs in all viruses of a given DNA virus family that infect host species spanning across vertebrates and invertebrates. We investigated the link between the extent of CpG dinucleotide depletion among autonomous parvoviruses and the evolutionary lineage of the infected host. We demonstrate major differences in the relative abundance of CpG dinucleotides among autonomous parvoviruses which share similar genome organization and common ancestry, depending on the infected host species. Parvoviruses infecting vertebrate hosts had significantly lower relative abundance of CpG dinucleotides than parvoviruses infecting invertebrate hosts. The strong correlation of CpG dinucleotide depletion with the gain in TpG/CpA dinucleotides and the loss of TpA dinucleotides among parvoviruses suggests a major role for CpG methylation in the evolution of parvoviruses. Our data present evidence that links the relative abundance of CpG dinucleotides in parvoviruses to the methylation capabilities of the infected host. In sum, our findings support a novel perspective of host-driven evolution among autonomous parvoviruses.
|
23
|
Simmonds P, Xia W, Baillie JK, McKinnon K. Modelling mutational and selection pressures on dinucleotides in eukaryotic phyla--selection against CpG and UpA in cytoplasmically expressed RNA and in RNA viruses. BMC Genomics 2013; 14:610. [PMID: 24020411 PMCID: PMC3829696 DOI: 10.1186/1471-2164-14-610] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.9] [Received: 06/20/2013] [Accepted: 09/04/2013] [Indexed: 11/10/2022] Open
Abstract
Background: Loss of CpG dinucleotides in genomic DNA through methylation-induced mutation is characteristic of vertebrates and plants. However, these and other eukaryotic phyla show a range of other dinucleotide frequency biases with currently uncharacterized underlying mutational or selection mechanisms. We developed a parameterized Markov process to identify what neighbour context-dependent mutations best accounted for patterns of dinucleotide frequency biases in genomic and cytoplasmically expressed mRNA sequences of different vertebrates, other eukaryotic groups and RNA viruses that infect them.
Results: Consistently, 11- to 14-fold greater frequencies of the methylation-associated mutation of C to T upstream of G (depicted as C→T,G) than other transitions best modelled dinucleotide frequencies in mammalian genomic DNA. However, further mutations such as G→T,T (5-fold greater than the default transversion rate) were required to account for the full spectrum of dinucleotide frequencies in mammalian sequence datasets. Consistent with modeling predictions for these two mutations, instability of both CpG and CpT dinucleotides was identified through SNP frequency analysis of human DNA sequences. Different sets of context-dependent mutations were modelled in other eukaryotes with non-methylated genomic DNA. In contrast to genomic DNA, best-fit models of dinucleotide frequencies in transcribed RNA sequences expressed in the cytoplasm from all organisms were dominated by mutations that eliminated UpA dinucleotides, observations consistent with cytoplasmically driven selection for mRNA stability. Surprisingly, mRNA sequences from organisms with methylated genomes showed evidence for additional selection against CpG through further context-dependent mutations (e.g. C→A,G). Similar mutation or selection processes were identified among single-stranded mammalian RNA viruses; these potentially account for their previously described but unexplained under-representations of CpG and UpA dinucleotides.
Conclusions: Methods we have developed identify mutational processes and selection pressures in organisms that provide new insights into nucleotide compositional constraints and a wealth of biochemical and evolutionarily testable predictions for the future.
Affiliation(s)
- Peter Simmonds
  - Division of Infection and Immunity, Roslin Institute, University of Edinburgh, Easter Bush, Edinburgh EH25 9RG, UK.
|
24
|
Lechner M, Marz M, Ihling C, Sinz A, Stadler PF, Krauss V. The correlation of genome size and DNA methylation rate in metazoans. Theory Biosci 2012; 132:47-60. [PMID: 23132463 DOI: 10.1007/s12064-012-0167-y] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Received: 05/17/2012] [Accepted: 10/03/2012] [Indexed: 12/12/2022]
Abstract
Total DNA methylation rates are well known to vary widely between different metazoans. The phylogenetic distribution of this variation, however, has not been investigated systematically. We combine here publicly available data on methylcytosine content with the analysis of nucleotide compositions of genomes and transcriptomes of 78 metazoan species to trace the evolution of abundance and distribution of DNA methylation. The depletion of CpG and the associated enrichment of TpG and CpA dinucleotides are used to infer the intensity and localization of germline CpG methylation and to estimate its evolutionary dynamics. We observe a positive correlation of the relative methylation of CpG motifs with genome size. We tested this trend successfully by measuring total DNA methylation with LC/MS in orthopteran insects with very different genome sizes: house crickets, migratory locusts and meadow grasshoppers. We hypothesize that the observed correlation between methylation rate and genome size is due to a dependence of both variables on long-term effective population size and is driven by the accumulation of repetitive sequences that are typically methylated during periods of small population sizes. This process may result in generally methylated, large genomes such as those of jawed vertebrates. In this case, the emergence of a novel demethylation pathway and of novel reader proteins for methylcytosine may have enabled the usage of cytosine methylation for promoter-based gene regulation. On the other hand, persistently large populations may lead to a compression of the genome and to the loss of the DNA methylation machinery, as observed, e.g., in nematodes.
Affiliation(s)
- Marcus Lechner
  - Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, Marbacher Weg 6, 35037, Marburg, Germany.
|
25
|
Berglund J, Nevalainen EM, Molin AM, Perloski M, André C, Zody MC, Sharpe T, Hitte C, Lindblad-Toh K, Lohi H, Webster MT. Novel origins of copy number variation in the dog genome. Genome Biol 2012; 13:R73. [PMID: 22916802 PMCID: PMC4053742 DOI: 10.1186/gb-2012-13-8-r73] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.8] [Received: 04/23/2012] [Accepted: 08/23/2012] [Indexed: 11/10/2022] Open
Abstract
Background: Copy number variants (CNVs) account for substantial variation between genomes and are a major source of normal and pathogenic phenotypic differences. The dog is an ideal model to investigate mutational mechanisms that generate CNVs as its genome lacks a functional ortholog of the PRDM9 gene implicated in recombination and CNV formation in humans. Here we comprehensively assay CNVs using high-density array comparative genomic hybridization in 50 dogs from 17 dog breeds and 3 gray wolves.
Results: We use a stringent new method to identify a total of 430 high-confidence CNV loci, which range in size from 9 kb to 1.6 Mb and span 26.4 Mb, or 1.08%, of the assayed dog genome, overlapping 413 annotated genes. Of CNVs observed in each breed, 98% are also observed in multiple breeds. CNVs predicted to disrupt gene function are significantly less common than expected by chance. We identify a significant overrepresentation of peaks of GC content, previously shown to be enriched in dog recombination hotspots, in the vicinity of CNV breakpoints.
Conclusions: A number of the CNVs identified by this study are candidates for generating breed-specific phenotypes. Purifying selection seems to be a major factor shaping structural variation in the dog genome, suggesting that many CNVs are deleterious. Localized peaks of GC content appear to be novel sites of CNV formation in the dog genome by non-allelic homologous recombination, potentially activated by the loss of PRDM9. These sequence features may have driven genome instability and chromosomal rearrangements throughout canid evolution.
|
26
|
Falconnet M, Behrens S. Accurate estimations of evolutionary times in the context of strong CpG hypermutability. J Comput Biol 2012; 19:519-31. [PMID: 22468680 DOI: 10.1089/cmb.2011.0135] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022] Open
Abstract
We consider the substitution model T92+CpG of DNA sequence evolution, which takes into account the hypermutability of CpG dinucleotides, an effect that can be especially observed in vertebrate genomes. We provide an exact method to simulate the evolution of finite DNA sequences under this model and numerical procedures to infer evolutionary times in two cases: between an ancestral and a present sequence and between two homologous sequences. We show on simulated data that our new numerical method yields very accurate estimations of divergence times. In a context of strong CpG hypermutability, it clearly outperforms the classical estimation procedure that is solely based on the model T92 without CpG influence. Supplementary Material is available at www.liebertonline.com/cmb.
Affiliation(s)
- Mikael Falconnet
- Institute for Evolution and Biodiversity, Westfälische Wilhelms-Universität, Münster, Germany.
|
27
|
UU/UA dinucleotide frequency reduction in coding regions results in increased mRNA stability and protein expression. Mol Ther 2012; 20:954-9. [PMID: 22434136 PMCID: PMC3345983 DOI: 10.1038/mt.2012.29] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Indexed: 11/30/2022] Open
Abstract
UU and UA dinucleotides are rare in mammalian genes and may offer natural selection against endoribonuclease-mediated mRNA decay. This study hypothesized that reducing UU and UA (UW) dinucleotides in the mRNA-coding sequence, including the codons and the dicodon boundaries, may promote resistance to mRNA decay, thereby increasing protein production. Indeed, protein expression from UW-reduced coding regions of enhanced green fluorescent protein (EGFP), luciferase, interferon-α, and hepatitis B surface antigen (HBsAg) was higher when compared to the wild-type protein expression. The steady-state level of UW-reduced EGFP mRNA was higher and the mRNA half-life was also longer. Ectopic expression of the endoribonuclease, RNase L, did not reduce the wild type or UW-reduced mRNA. A mutant form of the mRNA decay-promoting protein, tristetraprolin (TTP/ZFP36), which has a point mutation in the zinc-finger domain (C124R), was used. The wild-type EGFP mRNA but not the UW-reduced mRNA responded to the dominant negative action of the C124R ZFP36/TTP mutant. The results indicate the efficacy of the described rational approach to formulate a general scheme for boosting recombinant protein production in mammalian cells.
|
28
|
Bérard J, Guéguen L. Accurate estimation of substitution rates with neighbor-dependent models in a phylogenetic context. Syst Biol 2012; 61:510-21. [PMID: 22331438 DOI: 10.1093/sysbio/sys024] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022] Open
Abstract
Most models and algorithms developed to perform statistical inference from DNA data make the assumption that substitution processes affecting distinct nucleotide sites are stochastically independent. This assumption ensures both mathematical and computational tractability but is in disagreement with observed data in many situations, one well-known example being CpG dinucleotide hypermutability in mammalian genomes. In this paper, we consider the class of RN95 + YpR substitution models, which allows neighbor-dependent effects, including CpG hypermutability, to be taken into account, through transitions between pyrimidine-purine dinucleotides. We show that it is possible to adapt inference methods originally developed under the assumption of independence between sites to RN95 + YpR models, using a mathematically rigorous framework provided by specific structural properties of this class of models. We assess how efficient this approach is at inferring the CpG hypermutability rate from aligned DNA sequences. The method is tested on simulated data and compared against several alternatives; the results suggest that it delivers a high degree of accuracy at a low computational cost. We then apply our method to an alignment of 10 DNA sequences from primate species. Model comparisons within the RN95 + YpR class show the importance of taking into account neighbor-dependent effects. An application of the method to the detection of hypomethylated islands is discussed.
Affiliation(s)
- Jean Bérard
- Institut Camille Jordan, UMR CNRS 5208, Université Lyon 1, Villeurbanne F-69622 Cedex, Université de Lyon, Lyon 69003, France
|
29
|
Luo XL, Xu JG, Ye CY. Analysis of synonymous codon usage in Shigella flexneri 2a strain 301 and other Shigella and Escherichia coli strains. Can J Microbiol 2011; 57:1016-23. [DOI: 10.1139/w11-095] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/22/2022]
Abstract
In this study, we analysed synonymous codon usage in Shigella flexneri 2a strain 301 (Sf301) and performed a comparative analysis of synonymous codon usage patterns in Sf301 and other strains of Shigella and Escherichia coli. Although there was a significant variety in codon usage bias among different Sf301 genes, there was a slight but observable codon usage bias that could primarily be attributable to mutational pressure and translational selection. In addition, the relative abundance of dinucleotides in Sf301 was observed to be independent of the overall base composition but was still caused by differential mutational pressure; this also shaped codon usage. By comparing the relative synonymous codon usage values across different Shigella and E. coli strains, we suggested that the synonymous codon usage pattern in the Shigella genomes was strain specific. This study represents a comprehensive analysis of Shigella codon usage patterns and provides a basic understanding of the mechanisms underlying codon usage bias.
Affiliation(s)
- Xue Lian Luo
- State Key Laboratory for Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Changping, Beijing 102206, People’s Republic of China
- Jian Guo Xu
- State Key Laboratory for Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Changping, Beijing 102206, People’s Republic of China
- Chang Yun Ye
- State Key Laboratory for Infectious Disease Prevention and Control, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Changping, Beijing 102206, People’s Republic of China
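The relative synonymous codon usage (RSCU) values mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the usual definition (observed codon count divided by the mean count over that amino acid's synonymous codons) and the standard genetic code; the abstract does not state the exact procedure the authors used, and the function name `rscu` is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence (illustrative only)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families by encoded amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in family:
            # Observed count divided by the family's mean count.
            values[c] = counts[c] * len(family) / total
    return values
```

A value of 1 indicates that a codon is used as often as expected under equal usage within its family; values above or below 1 indicate over- or under-use.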
|
30
|
Context-Dependent Evolutionary Models for Non-Coding Sequences: An Overview of Several Decades of Research and an Analysis of Laurasiatheria and Primate Evolution. Evol Biol 2011. [DOI: 10.1007/s11692-011-9139-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/17/2022]
|
31
|
Korzinov OM, Astakhova TV, Vlasov PK, Roytberg MA. Statistical analysis of DNA sequences in the neighborhood of splice sites. Mol Biol 2011. [DOI: 10.1134/s0026893308010202] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
|
32
|
Zeng J, Yi SV. DNA methylation and genome evolution in honeybee: gene length, expression, functional enrichment covary with the evolutionary signature of DNA methylation. Genome Biol Evol 2010; 2:770-80. [PMID: 20924039 PMCID: PMC2975444 DOI: 10.1093/gbe/evq060] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Indexed: 01/06/2023] Open
Abstract
A growing body of evidence suggests that DNA methylation is functionally divergent among different taxa. The recently discovered functional methylation system in the honeybee Apis mellifera presents an attractive invertebrate model system to study evolution and function of DNA methylation. In the honeybee, DNA methylation is mostly targeted toward transcription units (gene bodies) of a subset of genes. Here, we report an intriguing covariation of length and epigenetic status of honeybee genes. Hypermethylated and hypomethylated genes in honeybee are dramatically different in their lengths for both exons and introns. By analyzing orthologs in Drosophila melanogaster, Acyrthosiphon pisum, and Ciona intestinalis, we show genes that were short and long in the past are now preferentially situated in hyper- and hypomethylated classes respectively, in the honeybee. Moreover, we demonstrate that a subset of high-CpG genes are conspicuously longer than expected under the evolutionary relationship alone and that they are enriched in specific functional categories. We suggest that gene length evolution in the honeybee is partially driven by evolutionary forces related to regulation of gene expression, which in turn is associated with DNA methylation. However, lineage-specific patterns of gene length evolution suggest that there may exist additional forces underlying the observed interaction between DNA methylation and gene lengths in the honeybee.
Affiliation(s)
- Jia Zeng
- School of Biology, Georgia Institute of Technology, USA
|
33
|
Abstract
The accumulation of base substitutions (mutations) not subject to natural selection is the neutral mutation rate. Because this rate reflects the in vivo processes involved in maintaining the integrity of genetic information, the factors that affect the neutral mutation rate are of considerable interest. Mammals exhibit two dramatically different neutral mutation rates: the CpG mutation rate, wherein the C of most CpGs (i.e., methyl-CpG) mutate at 10-50 times that of C in any other context or of any other base. The latter mutations constitute the non-CpG rate. The high CpG rate results from the spontaneous deamination of methyl-C to T and incomplete restoration of the ensuing T:G mismatches to C:Gs. Here, we determined the neutral non-CpG mutation rate as a function of CpG content by comparing sequence divergence of thousands of pairs of neutrally evolving chimpanzee and human orthologs that differ primarily in CpG content. Both the mutation rate and the mutational spectrum (transition/transversion ratio) of non-CpG residues change in parallel as sigmoidal (logistic) functions of CpG content. As different mechanisms generate transitions and transversions, these results indicate that both mutation rate and mutational processes are contingent on the local CpG content. We consider several possible mechanisms that might explain how CpG exerts these effects.
Affiliation(s)
- Jean-Claude Walser
- Section on Genomic Structure and Function, Laboratory of Molecular and Cellular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892-0830, USA
|
34
|
Tatarinova TV, Alexandrov NN, Bouck JB, Feldmann KA. GC3 biology in corn, rice, sorghum and other grasses. BMC Genomics 2010; 11:308. [PMID: 20470436 PMCID: PMC2895627 DOI: 10.1186/1471-2164-11-308] [Citation(s) in RCA: 105] [Impact Index Per Article: 7.0] [Received: 09/16/2009] [Accepted: 05/16/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The third, or wobble, position in a codon provides a high degree of possible degeneracy and is an elegant fault-tolerance mechanism. Nucleotide biases between organisms at the wobble position have been documented and correlated with the abundances of the complementary tRNAs. We and others have noticed a bias for cytosine and guanine at the third position in a subset of transcripts within a single organism. The bias is present in some plant species and warm-blooded vertebrates but not in all plants, or in invertebrates or cold-blooded vertebrates. RESULTS Here we demonstrate that in certain organisms the amount of GC at the wobble position (GC3) can be used to distinguish two classes of genes. We highlight the following features of genes with high GC3 content: they (1) provide more targets for methylation, (2) exhibit more variable expression, (3) more frequently possess upstream TATA boxes, (4) are predominant in certain classes of genes (e.g., stress responsive genes) and (5) have a GC3 content that increases from 5' to 3'. These observations led us to formulate a hypothesis to explain GC3 bimodality in grasses. CONCLUSIONS Our findings suggest that high levels of GC3 typify a class of genes whose expression is regulated through DNA methylation or are a legacy of accelerated evolution through gene conversion. We discuss the three most probable explanations for GC3 bimodality: biased gene conversion, transcriptional and translational advantage and gene methylation.
Affiliation(s)
- Tatiana V Tatarinova
- Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA.
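As an illustration of the GC3 measure described in the abstract above (the share of G or C at the third, wobble, codon position), here is a minimal sketch. It assumes an in-frame coding sequence; the function name `gc3` is hypothetical and this is not the authors' pipeline.

```python
def gc3(cds):
    """Fraction of codons whose third base is G or C (illustrative only)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    third_bases = cds[2:usable:3]     # bases at positions 3, 6, 9, ...
    if not third_bases:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)
```

Computed per gene across a genome, a histogram of these values is one way to look for the two gene classes (low and high GC3) that the abstract describes.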
|
35
|
Falconnet M. Phylogenetic distances for neighbour dependent substitution processes. Math Biosci 2010; 224:101-8. [PMID: 20064534 DOI: 10.1016/j.mbs.2009.12.010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/10/2008] [Revised: 12/24/2009] [Accepted: 12/31/2009] [Indexed: 10/20/2022]
Abstract
We consider models of nucleotidic substitution processes where the rate of substitution at a given site depends on the state of the neighbours of the site. We first estimate the time elapsed between an ancestral sequence at stationarity and a present sequence. Second, assuming that two sequences are issued from a common ancestral sequence at stationarity, we estimate the time since divergence. In the simplest non-trivial case of a Jukes-Cantor model with CpG influence, we provide and justify mathematically consistent estimators in these two settings. We also provide asymptotic confidence intervals, valid for nucleotidic sequences of finite length, and we compute explicit formulas for the estimators and for their confidence intervals. In the general case of an RN model with YpR influence, we extend these results under a proviso, namely that the equation defining the estimator has a unique solution.
Affiliation(s)
- Mikael Falconnet
- Université Joseph Fourier Grenoble 1, Institut Fourier UMR 5582 UJF-CNRS, 100 rue des Maths, BP 74, 38402 Saint Martin d'Hères, France.
|
36
|
Audit B, Zaghloul L, Vaillant C, Chevereau G, d'Aubenton-Carafa Y, Thermes C, Arneodo A. Open chromatin encoded in DNA sequence is the signature of 'master' replication origins in human cells. Nucleic Acids Res 2009; 37:6064-75. [PMID: 19671527 PMCID: PMC2764438 DOI: 10.1093/nar/gkp631] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Indexed: 12/31/2022] Open
Abstract
For years, progress in elucidating the mechanisms underlying replication initiation and its coupling to transcriptional activities and to local chromatin structure has been hampered by the small number (approximately 30) of well-established origins in the human genome and more generally in mammalian genomes. Recent in silico studies of compositional strand asymmetries revealed a high level of organization of human genes around 1000 putative replication origins. Here, by comparing with recently experimentally identified replication origins, we provide further support that these putative origins are active in vivo. We show that regions approximately 300-kb wide surrounding most of these putative replication origins that replicate early in the S phase are hypersensitive to DNase I cleavage, hypomethylated and present a significant enrichment in genomic energy barriers that impair nucleosome formation (nucleosome-free regions). This suggests that these putative replication origins are specified by an open chromatin structure favored by the DNA sequence. We discuss how this distinctive attribute makes these origins, further qualified as 'master' replication origins, privileged loci for future research to decipher the human spatio-temporal replication program. Finally, we argue that these 'master' origins are likely to play a key role in genome dynamics during evolution and in pathological situations.
|
37
|
Lemaitre C, Zaghloul L, Sagot MF, Gautier C, Arneodo A, Tannier E, Audit B. Analysis of fine-scale mammalian evolutionary breakpoints provides new insight into their relation to genome organisation. BMC Genomics 2009; 10:335. [PMID: 19630943 PMCID: PMC2722678 DOI: 10.1186/1471-2164-10-335] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.5] [Received: 02/26/2009] [Accepted: 07/24/2009] [Indexed: 11/21/2022] Open
Abstract
Background The Intergenic Breakage Model, which is the current model of structural genome evolution, considers that evolutionary rearrangement breakages happen with a uniform propensity along the genome but are selected against in genes, their regulatory regions and in-between. However, a growing body of evidence shows that there exist regions along mammalian genomes that present a high susceptibility to breakage. We reconsidered this question taking advantage of a recently published methodology for the precise detection of rearrangement breakpoints based on pairwise genome comparisons. Results We applied this methodology between the genome of human and those of five sequenced eutherian mammals which allowed us to delineate evolutionary breakpoint regions along the human genome with a finer resolution (median size 26.6 kb) than obtained before. We investigated the distribution of these breakpoints with respect to genome organisation into domains of different activity. In agreement with the Intergenic Breakage Model, we observed that breakpoints are under-represented in genes. Surprisingly however, the density of breakpoints in small intergenes (1 per Mb) appears significantly higher than in gene deserts (0.1 per Mb). More generally, we found a heterogeneous distribution of breakpoints that follows the organisation of the genome into isochores (breakpoints are more frequent in GC-rich regions). We then discuss the hypothesis that regions with an enhanced susceptibility to breakage correspond to regions of high transcriptional activity and replication initiation. Conclusion We propose a model to describe the heterogeneous distribution of evolutionary breakpoints along human chromosomes that combines natural selection and a mutational bias linked to local open chromatin state.
Affiliation(s)
- Claire Lemaitre
- Université de Bordeaux, Centre de Bioinformatique - Génomique Fonctionnelle Bordeaux, F-33000 Bordeaux, France.
|
38
|
Elango N, Hunt BG, Goodisman MAD, Yi SV. DNA methylation is widespread and associated with differential gene expression in castes of the honeybee, Apis mellifera. Proc Natl Acad Sci U S A 2009; 106:11206-11. [PMID: 19556545 PMCID: PMC2708677 DOI: 10.1073/pnas.0900301106] [Citation(s) in RCA: 229] [Impact Index Per Article: 14.3] [Received: 01/12/2009] [Indexed: 11/18/2022] Open
Abstract
The recent, unexpected discovery of a functional DNA methylation system in the genome of the social bee Apis mellifera underscores the potential importance of DNA methylation in invertebrates. The extent of genomic DNA methylation and its role in A. mellifera remain unknown, however. Here we show that genes in A. mellifera can be divided into 2 distinct classes, one with low-CpG dinucleotide content and the other with high-CpG dinucleotide content. This dichotomy is explained by the gradual depletion of CpG dinucleotides, a well-known consequence of DNA methylation. The loss of CpG dinucleotides associated with DNA methylation also may explain the unusual mutational patterns seen in A. mellifera that lead to AT-rich regions of the genome. A detailed investigation of this dichotomy implicates DNA methylation in A. mellifera development. High-CpG genes, which are predicted to be hypomethylated in germlines, are enriched with functions associated with developmental processes, whereas low-CpG genes, predicted to be hypermethylated in germlines, are enriched with functions associated with basic biological processes. Furthermore, genes more highly expressed in one caste than another are overrepresented among high-CpG genes. Our results highlight the potential significance of epigenetic modifications, such as DNA methylation, in developmental processes in social insects. In particular, the pervasiveness of DNA methylation in the genome of A. mellifera provides fertile ground for future studies of phenotypic plasticity and genomic imprinting.
Affiliation(s)
- Navin Elango
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332
- Brendan G. Hunt
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332
- Soojin V. Yi
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332
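The low-CpG versus high-CpG gene classes described in the abstract above rest on a normalized CpG content. A minimal sketch of one common form of that measure, the observed/expected CpG ratio, follows. The exact normalization used by the authors is not given in the abstract, so the formula here (CpG count times sequence length, divided by the product of the C and G counts) is an assumption, and the name `cpg_obs_exp` is hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio on one strand (illustrative only).

    Returns None when the expectation is undefined (no C or no G).
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    if n < 2 or c_count == 0 or g_count == 0:
        return None
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    cpg_count = seq.count("CG")
    return cpg_count * n / (c_count * g_count)
```

Ratios well below 1 are conventionally read as CpG depletion, which the abstract interprets as a footprint of germline methylation; ratios near or above 1 suggest little depletion.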
|
39
|
Nakken S, Rødland EA, Rognes T, Hovig E. Large-scale inference of the point mutational spectrum in human segmental duplications. BMC Genomics 2009; 10:43. [PMID: 19161616 PMCID: PMC2640414 DOI: 10.1186/1471-2164-10-43] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 09/02/2008] [Accepted: 01/22/2009] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND Recent segmental duplications are relatively large (≥ 1 kb) genomic regions of high sequence identity (≥ 90%). They cover approximately 4-5% of the human genome and play important roles in gene evolution and genomic disease. The DNA sequence differences between copies of a segmental duplication represent the result of various mutational events over time, since any two duplication copies originated from the same ancestral DNA sequence. Based on this fact, we have developed a computational scheme for inference of point mutational events in human segmental duplications, which we collectively term duplication-inferred mutations (DIMs). We have characterized these nucleotide substitutions by comparing them with high-quality SNPs from dbSNP, both in terms of sequence context and frequency of substitution types. RESULTS Overall, DIMs show a lower ratio of transitions relative to transversions than SNPs, although this ratio approaches that of SNPs when considering DIMs within most recent duplications. Our findings indicate that DIMs and SNPs in general are caused by similar mutational mechanisms, with some deviances at the CpG dinucleotide. Furthermore, we discover a large number of reference SNPs that coincide with computationally inferred DIMs. The latter reflects how sequence variation in duplicated sequences can be misinterpreted as ordinary allelic variation. CONCLUSION In summary, we show how DNA sequence analysis of segmental duplications can provide a genome-wide mutational spectrum that mirrors recent genome evolution. The inferred set of nucleotide substitutions represents a valuable complement to SNPs for the analysis of genetic variation and point mutagenesis.
Affiliation(s)
- Sigve Nakken
- Department of Informatics, University of Oslo, PO Box 1080 Blindern, NO-0316 Oslo, Norway.
|
40
|
Borštnik B, Oblak B, Pumpernik D. The Evolutionary Constraints in Mutational Replacements. Evol Biol 2009. [DOI: 10.1007/978-3-642-00952-5_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
41
|
Baele G, Van de Peer Y, Vansteelandt S. A model-based approach to study nearest-neighbor influences reveals complex substitution patterns in non-coding sequences. Syst Biol 2008; 57:675-92. [PMID: 18853356 DOI: 10.1080/10635150802422324] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 01/22/2023] Open
Abstract
In this article, we present a likelihood-based framework for modeling site dependencies. Our approach builds upon standard evolutionary models but incorporates site dependencies across the entire tree by letting the evolutionary parameters in these models depend upon the ancestral states at the neighboring sites. It thus avoids the need for introducing new and high-dimensional evolutionary models for site-dependent evolution. We propose a Markov chain Monte Carlo approach with data augmentation to infer the evolutionary parameters under our model. Although our approach allows for wide-ranging site dependencies, we illustrate its use, in two non-coding datasets, in the case of nearest-neighbor dependencies (i.e., evolution directly depending only upon the immediate flanking sites). The results reveal that the general time-reversible model with nearest-neighbor dependencies substantially improves the fit to the data as compared to the corresponding model with site independence. Using the parameter estimates from our model, we elaborate on the importance of the 5-methylcytosine deamination process (i.e., the CpG effect) and show that this process also depends upon the 5' neighboring base identity. We hint at the possibility of a so-called TpA effect and show that the observed substitution behavior is very complex in the light of dinucleotide estimates. We also discuss the presence of CpG effects in a nuclear small subunit dataset and find significant evidence that evolutionary models incorporating context-dependent effects perform substantially better than independent-site models and in some cases even outperform models that incorporate varying rates across sites.
Affiliation(s)
- Guy Baele
- Department of Applied Mathematics and Computer Science, Ghent University, Ghent, Belgium
|
42
|
Walser JC, Ponger L, Furano AV. CpG dinucleotides and the mutation rate of non-CpG DNA. Genome Res 2008; 18:1403-14. [PMID: 18550801 DOI: 10.1101/gr.076455.108] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Indexed: 11/25/2022]
Abstract
The neutral mutation rate is equal to the base substitution rate when the latter is not affected by natural selection. Differences between these rates may reveal that factors such as natural selection, linkage, or a mutator locus are affecting a given sequence. We examined the neutral base substitution rate by measuring the sequence divergence of approximately 30,000 pairs of inactive orthologous L1 retrotransposon sequences interspersed throughout the human and chimpanzee genomes. In contrast to other studies, we related ortholog divergence to the time (age) that the L1 sequences resided in the genome prior to the chimpanzee and human speciation. As expected, the younger orthologs contained more hypermutable CpGs than the older ones because of their conversion to TpGs (and CpAs). Consequently, the younger orthologs accumulated more CpG mutations than the older ones during the approximately 5 million years since the human and chimpanzee lineages separated. But during this same time, the younger orthologs also accumulated more non-CpG mutations than the older ones. In fact, non-CpG and CpG mutations showed an almost perfect (R² = 0.98) correlation for approximately 97% of the ortholog pairs. The correlation is independent of G + C content, recombination rate, and chromosomal location. Therefore, it likely reflects an intrinsic effect of CpGs, or mutations thereof, on non-CpG DNA rather than the joint manifestation of the chromosomal environment. The CpG effect is not uniform for all regions of non-CpG DNA. Therefore, the mutation rate of non-CpG DNA is contingent to varying extents on local CpG content. Aside from their implications for mutational mechanisms, these results indicate that a precise determination of a uniform genome-wide neutral mutation rate may not be attainable.
Affiliation(s)
- Jean-Claude Walser
- Section on Genomic Structure and Function, Laboratory of Molecular and Cellular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892-0830, USA
|
43
|
Low ETL, Alias H, Boon SH, Shariff EM, Tan CYA, Ooi LCL, Cheah SC, Raha AR, Wan KL, Singh R. Oil palm (Elaeis guineensis Jacq.) tissue culture ESTs: identifying genes associated with callogenesis and embryogenesis. BMC Plant Biol 2008; 8:62. [PMID: 18507865 PMCID: PMC2442076 DOI: 10.1186/1471-2229-8-62] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Received: 11/21/2007] [Accepted: 05/29/2008] [Indexed: 05/21/2023]
Abstract
BACKGROUND Oil palm (Elaeis guineensis Jacq.) is one of the most important oil bearing crops in the world. However, genetic improvement of oil palm through conventional breeding is extremely slow and costly, as the breeding cycle can take up to 10 years. This has brought about interest in vegetative propagation of oil palm. Since the introduction of oil palm tissue culture in the 1970s, clonal propagation has proven to be useful, not only in producing uniform planting materials, but also in the development of the genetic engineering programme. Despite considerable progress in improving the tissue culture techniques, the callusing and embryogenesis rates from proliferating callus cultures remain very low. Thus, understanding the gene diversity and expression profiles in oil palm tissue culture is critical in increasing the efficiency of these processes. RESULTS A total of 12 standard cDNA libraries, representing three main developmental stages in oil palm tissue culture, were generated in this study. Random sequencing of clones from these cDNA libraries generated 17,599 expressed sequence tags (ESTs). The ESTs were analysed, annotated and assembled to generate 9,584 putative unigenes distributed in 3,268 consensi and 6,316 singletons. These unigenes were assigned putative functions based on similarity and gene ontology annotations. Cluster analysis, which surveyed the relatedness of each library based on the abundance of ESTs in each consensus, revealed that lipid transfer proteins were highly expressed in embryogenic tissues. A glutathione S-transferase was found to be highly expressed in non-embryogenic callus. Further analysis of the unigenes identified 648 non-redundant simple sequence repeats and 211 putative full-length open reading frames. CONCLUSION This study has provided an overview of genes expressed during oil palm tissue culture. Candidate genes with expression that are modulated during tissue culture were identified. 
However, in order to confirm whether these genes are suitable as early markers for embryogenesis, the genes need to be tested on earlier stages of tissue culture and a wider range of genotypes. This collection of ESTs is an important resource for genetic and genome analyses of the oil palm, particularly during tissue culture development.
Affiliation(s)
- Eng-Ti L Low
- Advanced Biotechnology and Breeding Centre, Biology Division, Malaysian Palm Oil Board (MPOB), 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor DE, Malaysia
- Halimah Alias
- Advanced Biotechnology and Breeding Centre, Biology Division, Malaysian Palm Oil Board (MPOB), 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor DE, Malaysia
- Malaysia Genome Institute, Heliks Emas Block, UKM-MTDC Smart Technology Centre, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor DE, Malaysia
| | - Soo-Heong Boon
- Advanced Biotechnology and Breeding Centre, Biology Division, Malaysian Palm Oil Board (MPOB), 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor DE, Malaysia
- Asiatic Centre for Genome Technology Sdn Bhd (ACGT), Lot L3-I-1, Enterprise 4, Technology Park Malaysia, 57000 Kuala Lumpur, Malaysia
| | - Elyana M Shariff
- Advanced Biotechnology and Breeding Centre, Biology Division, Malaysian Palm Oil Board (MPOB), 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor DE, Malaysia
- Myagri Associates Sdn. Bhd., 25-2, Jalan Seri Putra 1/2, Bandar Seri Putra Bangi, 43000 Kajang, Selangor DE, Malaysia
| | - Chi-Yee A Tan
- Advanced Biotechnology and Breeding Centre, Biology Division, Malaysian Palm Oil Board (MPOB), 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor DE, Malaysia
- Thermo Fisher Scientific, 3, Jalan Sepadu 25/123, Taman Perindustrian Axis, Seksyen 25, 40400 Shah Alam, Selangor Darul Ehsan, Malaysia
| | - Leslie CL Ooi
- Advanced Biotechnology and Breeding Centre, Biology Division, Malaysian Palm Oil Board (MPOB), 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor DE, Malaysia
| | - Suan-Choo Cheah
- Advanced Biotechnology and Breeding Centre, Biology Division, Malaysian Palm Oil Board (MPOB), 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor DE, Malaysia
- Asiatic Centre for Genome Technology Sdn Bhd (ACGT), Lot L3-I-1, Enterprise 4, Technology Park Malaysia, 57000 Kuala Lumpur, Malaysia
| | - Abdul-Rahim Raha
- Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43300 UPM Serdang, Selangor DE, Malaysia
| | - Kiew-Lian Wan
- Malaysia Genome Institute, Heliks Emas Block, UKM-MTDC Smart Technology Centre, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor DE, Malaysia
- School of Biosciences and Biotechnology, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor DE, Malaysia
| | - Rajinder Singh
- Advanced Biotechnology and Breeding Centre, Biology Division, Malaysian Palm Oil Board (MPOB), 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor DE, Malaysia
| |
Collapse
|
44
Gesell T, Washietl S. Dinucleotide controlled null models for comparative RNA gene prediction. BMC Bioinformatics 2008; 9:248. [PMID: 18505553 PMCID: PMC2453142 DOI: 10.1186/1471-2105-9-248] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Received: 01/21/2008] [Accepted: 05/27/2008] [Indexed: 11/15/2022] Open
Abstract
Background Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. Results We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance based approach, a tree is estimated under this model which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm giving a new variant of a thermodynamic structure based RNA gene finding program that is not biased by the dinucleotide content. Conclusion SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as standalone RNA gene finding program. 
Other applications in comparative genomics that require randomization of multiple alignments can be considered. Availability SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: .
Affiliation(s)
- Tanja Gesell
  - Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria.
45
Simmen MW. Genome-scale relationships between cytosine methylation and dinucleotide abundances in animals. Genomics 2008; 92:33-40. [PMID: 18485662 DOI: 10.1016/j.ygeno.2008.03.009] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.9] [Received: 03/06/2008] [Accepted: 03/26/2008] [Indexed: 01/11/2023]
Abstract
In mammalian genomes CpGs occur at one-fifth their expected frequency. This is accepted as resulting from cytosine methylation and deamination of 5-methylcytosine leading to TpG and CpA dinucleotides. The corollary that a CpG deficit should correlate with TpG excess has not hitherto been systematically tested at a genomic level. I analyzed genome sequences (human, chimpanzee, mouse, pufferfish, zebrafish, sea squirt, fruitfly, mosquito, and nematode) to do this and generally to assess the hypothesis that CpG deficit, TpG excess, and other data are accountable in terms of 5-methylcytosine mutation. In all methylated genomes local CpG deficit decreases with higher G + C content. Local TpG surplus, while positively associated with G + C level in mammalian genomes but negatively associated with G + C in nonmammalian methylated genomes, is always explicable in terms of the CpG trend under the methylation model. Covariance of dinucleotide abundances with G + C demonstrates that correlation analyses should control for G + C. Doing this reveals a strong negative correlation between local CpG and TpG abundances in methylated genomes, in accord with the methylation hypothesis. CpG deficit also correlates with CpT excess in mammals, which may reflect enhanced cytosine mutation in the context 5'-YCG-3'. Analyses with repeat-masked sequences show that the results are not attributable to repetitive elements.
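The "expected frequency" comparison in this abstract can be illustrated with a short sketch. It assumes one common normalization (observed dinucleotide count divided by the count expected from the two mononucleotide frequencies); the abstract does not state which formula the author used, and the function name `dinucleotide_oe` is hypothetical. Note that the abstract stresses controlling for G + C content, which this simple ratio does not do by itself.

```python
def dinucleotide_oe(seq: str, dinuc: str) -> float:
    """Observed/expected ratio of a dinucleotide in a single sequence.

    Assumed normalization (not taken from the paper):
        expected = count(first base) * count(second base) / sequence length
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2 or len(dinuc) != 2:
        return float("nan")
    dinuc = dinuc.upper()
    # Count overlapping occurrences by sliding a 2-base window.
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)
    expected = seq.count(dinuc[0]) * seq.count(dinuc[1]) / n
    if expected == 0:
        return float("nan")  # ratio undefined when a base is absent
    return observed / expected


if __name__ == "__main__":
    toy = "ACGTACGT"  # toy sequence, not real genomic data
    print("CpG O/E:", dinucleotide_oe(toy, "CG"))
    print("TpG O/E:", dinucleotide_oe(toy, "TG"))
```

Under this normalization, a ratio near 0.2 would correspond to the "one-fifth" CpG deficit described for mammalian genomes, and a ratio above 1 for TpG would correspond to a TpG excess.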
Affiliation(s)
- Martin W Simmen
  - School of Biomedical Sciences, University of Edinburgh, Edinburgh EH8 9XD, UK.
46
Elango N, Yi SV. DNA methylation and structural and functional bimodality of vertebrate promoters. Mol Biol Evol 2008; 25:1602-8. [PMID: 18469331 DOI: 10.1093/molbev/msn110] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.8] [Indexed: 01/01/2023] Open
Abstract
Human promoters divide into 2 classes, the low CpG (LCG) and the high CpG (HCG), based on their CpG dinucleotide content. The LCG class of promoters is hypermethylated and is associated with tissue-specific genes, whereas the HCG class is hypomethylated and associated with broadly expressed genes. By analyzing several chordate genomes separated for hundreds of millions of years, here we show that the divide between low CpG and high CpG promoters is conserved in several distantly related vertebrate taxa (including human, chicken, frog, lizard, and fish) but not in close invertebrate outgroups (sea squirts). Furthermore, LCG and HCG promoters are distinctively associated with tissue-specific and broadly expressed genes in these distantly related vertebrate taxa. Our results indicate that the function of DNA methylation on gene expression is conserved across these vertebrate taxa and suggest that the 2 classes of promoters have evolved early in vertebrate evolution, as a consequence of the advent of global DNA methylation.
Affiliation(s)
- Navin Elango
  - School of Biology, Georgia Institute of Technology, USA
47
Bérard J, Gouéré JB, Piau D. Solvable models of neighbor-dependent substitution processes. Math Biosci 2007; 211:56-88. [PMID: 18001806 DOI: 10.1016/j.mbs.2007.10.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 05/29/2007] [Revised: 09/27/2007] [Accepted: 10/02/2007] [Indexed: 11/18/2022]
Abstract
We prove that a wide class of Markov models of neighbor-dependent substitution processes on the integer line is solvable. This class contains some models of nucleotidic substitutions recently introduced and studied empirically by molecular biologists. We show that the polynucleotidic frequencies at equilibrium solve some finite-size linear systems. This provides, for the first time up to our knowledge, explicit and algebraic formulas for the stationary frequencies of non-degenerate neighbor-dependent models of DNA substitutions. Furthermore, we show that the dynamics of these stochastic processes and their distribution at equilibrium exhibit some stringent, rather unexpected, independence properties. For example, nucleotidic sites at distance at least three evolve independently, and all the sites, when encoded as purines and pyrimidines, evolve independently.
Affiliation(s)
- Jean Bérard
  - Institut Camille Jordan - UMR 5208, Université Claude Bernard Lyon 1, 69622, Villeurbanne, France.
48
Dynamics of adipogenic promoter DNA methylation during clonal culture of human adipose stem cells to senescence. BMC Cell Biol 2007; 8:18. [PMID: 17535427 PMCID: PMC1892011 DOI: 10.1186/1471-2121-8-18] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.9] [Received: 01/03/2007] [Accepted: 05/29/2007] [Indexed: 02/08/2023] Open
Abstract
Background Potential therapeutic use of mesenchymal stem cells (MSCs) is likely to require large-scale in vitro expansion of the cells before transplantation. MSCs from adipose tissue can be cultured extensively until senescence. However, little is known on the differentiation potential of adipose stem cells (ASCs) upon extended culture and on associated epigenetic alterations. We examined the adipogenic differentiation potential of clones of human ASCs in early passage culture and upon senescence, and determined whether senescence was associated with changes in adipogenic promoter DNA methylation. Results ASC clones cultured to senescence display reduced adipogenic differentiation capacity in vitro, on the basis of limited lipogenesis and reduced transcriptional upregulation of FABP4 and LPL, two adipogenic genes, while LEP and PPARG2 transcription remains unaffected. In undifferentiated senescent cells, PPARG2 and LPL expression is unaltered, whereas LEP and FABP4 transcript levels are increased but not in all clones. Bisulfite sequencing analysis of DNA methylation reveals overall relative stability of LEP, PPARG2, FABP4 and LPL promoter CpG methylation during senescence and upon differentiation. Mosaicism in methylation profiles is maintained between and within ASC clones, and any CpG-specific methylation change detected does not necessarily relate to differentiation potential. One exception to this contention is CpG No. 21 in the LEP promoter, whose senescence-related methylation may impair upregulation of the gene upon adipogenic stimulation. Conclusion Senescent ASCs display reduced in vitro differentiation ability and transcriptional activation of adipogenic genes upon differentiation induction. These restrictions, however, cannot in general be attributed to specific changes in DNA methylation at adipogenic promoters. There also seems to be a correlation between CpGs that are hypomethylated and important transcription factor binding sites.
49
Gayral P, Caminade P, Boursot P, Galtier N. The evolutionary fate of recently duplicated retrogenes in mice. J Evol Biol 2007; 20:617-26. [PMID: 17305828 DOI: 10.1111/j.1420-9101.2006.01245.x] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
Inferences about the evolutionary impact of gene duplications often rely on the analysis of their long-term outcome. The fate of the majority of them must, however, be decided shortly after duplication. Here we analysed the evolutionary pattern of 10 mouse genes very recently duplicated by retrotransposition, by sequencing the retroposed copy in five to 10 closely related mouse species. In all cases the retroposed copy experienced accelerated nonsynonymous evolution whereas the divergence pattern of the source copy appeared unaffected by the duplication, consistent with the neofunctionalization model. The analysis further revealed that most retrogenes, including pseudogenes, did not experience a period of relaxed neutral evolution, but have been submitted to purifying selection ever since their retroposition. We propose that these duplicates play a biochemical role but are not indispensable. Purifying selection prevents them from acquiring a negative role until they are lost or silenced. This period of unnecessary redundancy could in rare cases give the time for new functions to evolve.
Affiliation(s)
- P Gayral
  - CNRS UMR -Génome, Populations, Interactions, Adaptation, Université Montpellier, Montpellier, France.
50
Subramanian S, Kumar S. Higher intensity of purifying selection on >90% of the human genes revealed by the intrinsic replacement mutation rates. Mol Biol Evol 2006; 23:2283-7. [PMID: 16982819 PMCID: PMC3072915 DOI: 10.1093/molbev/msl123] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 12/23/2022] Open
Abstract
For over 3 decades, the rate of replacement mutations has been assumed to be equal to, and estimated from, the rate of "strictly" neutral sequence divergence in noncoding regions and in silent-codon positions where mutations do not alter the amino acid encoded. This assumption is fundamental to estimating the fraction of harmful protein mutations and to identifying adaptive evolution at individual codons and proteins. We show that the assumption is not justifiable because a much larger fraction of codon positions is involved in hypermutable CpG dinucleotides as compared with the introns, leading to a higher expected replacement mutation rate per site in a vast majority of the genes. Consideration of this difference reveals a higher intensity of purifying natural selection than previously inferred in human genes. We also show that a much smaller number of genes are expected to be evolving with positive selection than that predicted using sequence divergence at intron and silent positions in the human genome. These patterns indicate the need for using new approaches for estimating rates of amino acid-altering mutations in order to find positively selected genes and codons in genomes that contain hypermutable CpG's.
Affiliation(s)
- Sankar Subramanian
  - Center for Evolutionary Functional Genomics, The Biodesign Institute, Arizona State University
  - School of Life Sciences, Arizona State University
- Sudhir Kumar
  - Center for Evolutionary Functional Genomics, The Biodesign Institute, Arizona State University
  - School of Life Sciences, Arizona State University