26
|
Polak P, Querfurth R, Arndt PF. The evolution of transcription-associated biases of mutations across vertebrates. BMC Evol Biol 2010; 10:187. [PMID: 20565875 PMCID: PMC2927911 DOI: 10.1186/1471-2148-10-187] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 07/07/2009] [Accepted: 06/18/2010] [Indexed: 02/03/2024] Open
Abstract
Background The interplay between transcription and mutational processes can lead to particular mutation patterns in transcribed regions of the genome. Transcription introduces several biases in mutational patterns; in particular it invokes strand-specific mutations. In order to understand the forces that have shaped transcripts during evolution, one has to study mutation patterns associated with transcription across animals. Results Using multiple alignments of related species we estimated the regional single-nucleotide substitution patterns along genes in four vertebrate taxa: primates, rodents, laurasiatheria and bony fishes. Our analysis is focused on intronic and intergenic regions and reveals differences in the patterns of substitution asymmetries between mammals and fishes. In mammals, the levels of asymmetries are stronger for genes starting within CpG islands than in genes lacking this property. In contrast to all other species analyzed, we found a mutational pressure in dog and stickleback, promoting an increase of GC-contents in the proximity of transcriptional start sites. Conclusions We propose that the asymmetric patterns in transcribed regions are results of transcription-associated mutagenic processes and transcription-coupled repair, which both seem to evolve in a taxon-related manner. We also discuss alternative mechanisms that can generate strand biases and involve error-prone DNA polymerases and reverse transcription. A localized increase of the GC content near the transcription start site is a signature of biased gene conversion (BGC) that occurs during recombination and heteroduplex formation. Since dog and stickleback are known to be subject to rapid adaptations due to population bottlenecks and breeding, we further hypothesize that an increase in recombination rates near gene starts has been part of an adaptive process.
|
27
|
Schütze T, Arndt PF, Menger M, Wochner A, Vingron M, Erdmann VA, Lehrach H, Kaps C, Glökler J. A calibrated diversity assay for nucleic acid libraries using DiStRO--a Diversity Standard of Random Oligonucleotides. Nucleic Acids Res 2009; 38:e23. [PMID: 19965765 PMCID: PMC2831324 DOI: 10.1093/nar/gkp1108] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 01/09/2023] Open
Abstract
We have determined diversities exceeding 10^12 different sequences in an annealing and melting assay using synthetic randomized oligonucleotides as a standard. For such high diversities, the annealing kinetics differ from those observed for low diversities, favouring the remelting curve after annealing as the best indicator of complexity. Direct comparisons of nucleic acid pools obtained from an aptamer selection demonstrate that even highly complex populations can be evaluated by using DiStRO, without the need for complicated calculations.
|
28
|
Polak P, Arndt PF. Long-range bidirectional strand asymmetries originate at CpG islands in the human genome. Genome Biol Evol 2009; 1:189-97. [PMID: 20333189 PMCID: PMC2817419 DOI: 10.1093/gbe/evp024] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Accepted: 07/22/2009] [Indexed: 12/24/2022] Open
Abstract
In the human genome, CpG islands (CGIs), which are GC- and CpG-rich sequences, are associated with transcription starting sites (TSSs); in addition, there is evidence that CGIs harbor origins of bidirectional replication (OBRs) and are preferred sites for heteroduplex formation during recombination. Transcription, replication, and recombination processes are known to induce specific mutational patterns in various genomes, and therefore, these patterns are expected to be found around CGIs. We use triple alignments of human, chimp, and macaque to compute the rates of nucleotide substitutions in intergenic regions up to 1 Mbp long on both sides of CGIs. Our analysis revealed that around a CGI there is an asymmetry between complementary substitution rates that is similar to the one found around the OBR in bacteria. We hypothesize that these asymmetries are induced by differences in the replication of the leading and lagging strand and that a significant number of CGIs overlap OBRs. Within CGIs, we observed a mutational signature of GC-biased gene conversion that is associated with recombination. We suggest that recombination has played a major role in the creation of CGIs.
|
29
|
Singh ND, Arndt PF, Clark AG, Aquadro CF. Strong evidence for lineage and sequence specificity of substitution rates and patterns in Drosophila. Mol Biol Evol 2009; 26:1591-605. [PMID: 19351792 DOI: 10.1093/molbev/msp071] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Indexed: 12/29/2022] Open
Abstract
Rates of single nucleotide substitution in Drosophila are highly variable within the genome, and several examples illustrate that evolutionary rates differ among Drosophila species as well. Here, we use a maximum likelihood method to quantify lineage-specific substitutional patterns and apply this method to 4-fold degenerate synonymous sites and introns from more than 8,000 genes aligned in the Drosophila melanogaster group. We find that within species, different classes of sequence evolve at different rates, with long introns evolving most slowly and short introns evolving most rapidly. Relative rates of individual single nucleotide substitutions vary approximately 3-fold among lineages, yielding patterns of substitution that are comparatively less GC-biased in the melanogaster species complex relative to Drosophila yakuba and Drosophila erecta. These results are consistent with a model coupling a mutational shift toward reduced GC content, or a shift in mutation-selection balance, in the D. melanogaster species complex, with variation in selective constraint among different classes of DNA sequence. Finally, base composition of coding and intronic sequences is not at equilibrium with respect to substitutional patterns, which primarily reflects the slow rate of the substitutional process. These results thus support the view that mutational and/or selective processes are labile on an evolutionary timescale and that if the process is indeed selection driven, then the distribution of selective constraint is variable across the genome.
|
30
|
Zemojtel T, Kielbasa SM, Arndt PF, Chung HR, Vingron M. Methylation and deamination of CpGs generate p53-binding sites on a genomic scale. Trends Genet 2008; 25:63-6. [PMID: 19101055 DOI: 10.1016/j.tig.2008.11.005] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Received: 09/29/2008] [Revised: 11/19/2008] [Accepted: 11/20/2008] [Indexed: 11/28/2022]
Abstract
The formation of transcription-factor-binding sites is an important evolutionary process. Here, we show that methylation and deamination of CpG dinucleotides generate in vivo p53-binding sites in numerous Alu elements and in non-repetitive DNA in a species-specific manner. In light of this, we propose that the deamination of methylated CpGs constitutes a universal mechanism for de novo generation of various transcription-factor-binding sites in Alus.
|
31
|
Kübler K, Arndt PF, Wardelmann E, Landwehr C, Krebs D, Kuhn W, van der Ven K. Genetic alterations of HLA-class II in ovarian cancer. Int J Cancer 2008; 123:1350-6. [PMID: 18561316 DOI: 10.1002/ijc.23624] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 12/22/2022]
Abstract
The immune system controls tumor formation through identification and elimination of cellular alterations. Consequently, cancer development in immune competent hosts depends on strategies to evade the immune system. Modulation of tumor antigen-specific immune responses by aberrant expression of HLA-class I and II molecules is well documented in a variety of carcinomas including ovarian cancer. To date, little data are available about molecular mechanisms responsible for altered HLA-class II phenotypes in tumors. In our sample of 10 Caucasian patients with ovarian carcinoma, a semiquantitative analysis was performed for HLA-class II loci DRB1 and DQB1 in malignant and normal ovarian tissue. Gene amplifications were identified in 62.5% of analyzed alleles and deletions in 17.5%, demonstrating that genomic aberrations of 6p21.3 are common and that copy number gain is more frequent than loss. Moreover, amplifications are most pronounced in advanced-stage tumors. To evaluate genotype-phenotype relation, immunohistochemical analyses were performed and revealed de novo expression of HLA-class II in 30% of tumors with an inverse association between antigen level and HLA copy number. It remains to be elucidated whether the profound changes of the latter quantities are the result of the host's immunological self-defense, indicate the presence of an oncogene located within the MHC-complex or merely reflect the increasing loss of differentiation of the tumor tissue.
|
32
|
Squartini F, Arndt PF. Quantifying the stationarity and time reversibility of the nucleotide substitution process. Mol Biol Evol 2008; 25:2525-35. [PMID: 18682605 DOI: 10.1093/molbev/msn169] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 01/08/2023] Open
Abstract
Markov models describing the evolution of the nucleotide substitution process, widely used in phylogeny reconstruction, usually assume the hypotheses of stationarity and time reversibility. Although these models give meaningful results when applied to biological data, it is not clear if the 2 assumptions mentioned above hold and, if not, how much sequence evolution processes deviate from them. To this aim, we introduce 2 sets of indices that can be calculated from the nucleotide distribution and the substitution rates. The stationarity indices (STIs) can be used to test the validity of the equilibrium assumption. The irreversibility indices (IRIs) are derived from the Kolmogorov cycle conditions for time reversibility and quantify the degree of nontime reversibility of a process. We have computed STIs and IRIs for the evolutionary process of 2 lineages, Drosophila simulans and Homo sapiens. In the latter case, we use a modified form of the indices that takes into account the CpG decay process. In both cases, we find statistically significant deviations from the ideal case of a process that has reached stationarity and is time reversible.
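The Kolmogorov cycle conditions mentioned in this abstract can be illustrated with a short sketch. It is not the IRI definition from the paper: it only assumes a 4×4 table of strictly positive substitution rates and reports, for every three-state cycle, the log ratio of the forward to the backward product of rates. For a time-reversible process every log ratio is zero. The function names and all example rates are hypothetical.

```python
import itertools
import math

def cycle_log_ratios(rates):
    """Return log(forward/backward) rate products for every 3-state cycle.

    rates[i][j] is the substitution rate from state i to state j (i != j);
    all off-diagonal rates are assumed to be strictly positive.
    """
    ratios = {}
    for i, j, k in itertools.combinations(range(len(rates)), 3):
        forward = rates[i][j] * rates[j][k] * rates[k][i]
        backward = rates[i][k] * rates[k][j] * rates[j][i]
        ratios[(i, j, k)] = math.log(forward / backward)
    return ratios

def reversible_rates(pi, exchange):
    """Build rates q_ij = exchange_ij * pi_j, which satisfy detailed balance
    whenever the exchange matrix is symmetric."""
    n = len(pi)
    return [[exchange[i][j] * pi[j] if i != j else 0.0 for j in range(n)]
            for i in range(n)]

# Hypothetical example values (states ordered A, C, G, T), not from the paper.
pi = [0.3, 0.2, 0.2, 0.3]
exchange = [[0, 1, 4, 1], [1, 0, 1, 4], [4, 1, 0, 1], [1, 4, 1, 0]]
rev = reversible_rates(pi, exchange)

# Break detailed balance by doubling the A->C rate only.
irrev = [row[:] for row in rev]
irrev[0][1] *= 2.0

print(max(abs(v) for v in cycle_log_ratios(rev).values()))
print(cycle_log_ratios(irrev)[(0, 1, 2)])
```

In this toy case only the cycles that contain both A and C pick up a non-zero log ratio, which shows how a cycle-based index localizes the source of irreversibility.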
|
33
|
Duret L, Arndt PF. The impact of recombination on nucleotide substitutions in the human genome. PLoS Genet 2008; 4:e1000071. [PMID: 18464896 PMCID: PMC2346554 DOI: 10.1371/journal.pgen.1000071] [Citation(s) in RCA: 254] [Impact Index Per Article: 15.9] [Received: 11/05/2007] [Accepted: 04/11/2008] [Indexed: 01/19/2023] Open
Abstract
Unraveling the evolutionary forces responsible for variations of neutral substitution patterns among taxa or along genomes is a major issue for detecting selection within sequences. Mammalian genomes show large-scale regional variations of GC-content (the isochores), but the substitution processes at the origin of this structure are poorly understood. We analyzed the pattern of neutral substitutions in 1 Gb of primate non-coding regions. We show that the GC-content toward which sequences are evolving is strongly negatively correlated to the distance to telomeres and positively correlated to the rate of crossovers (R² = 47%). This demonstrates that recombination has a major impact on substitution patterns in human, driving the evolution of GC-content. The evolution of GC-content correlates much more strongly with male than with female crossover rate, which rules out selectionist models for the evolution of isochores. This effect of recombination is most probably a consequence of the neutral process of biased gene conversion (BGC) occurring within recombination hotspots. We show that the predictions of this model fit very well with the observed substitution patterns in the human genome. This model notably explains the positive correlation between substitution rate and recombination rate. Theoretical calculations indicate that variations in population size or density in recombination hotspots can have a very strong impact on the evolution of base composition. Furthermore, recombination hotspots can create strong substitution hotspots. This molecular drive affects both coding and non-coding regions. We therefore conclude that along with mutation, selection and drift, BGC is one of the major factors driving genome evolution. Our results also shed light on variations in the rate of crossover relative to non-crossover events, along chromosomes and according to sex, and also on the conservation of hotspot density between human and chimp.
Mammalian genomes show a very strong heterogeneity of base composition along chromosomes (the so-called isochores). The functional significance of these peculiar genomic landscapes is highly debated: do isochores confer some selective advantage, or are they simply the by-product of neutral evolutionary processes? To resolve this issue, we analyzed the pattern of substitution in the human genome by comparison with chimpanzee and macaque. We show that the evolution of base composition (GC-content) is essentially determined by the rate of recombination. This effect appears to be much stronger in male than in female germline, which rules out selective explanations for the evolution of isochores. We show that this impact of recombination is most probably a consequence of the process of biased gene conversion (BGC). This neutral process mimics the action of selection and can induce strong substitution hotspots within recombination hotspots, sometimes leading to the fixation of deleterious mutations. BGC appears to be one of the major factors driving genome evolution. It is therefore essential to take this process into account if we want to be able to interpret genome sequences.
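The notion of a GC-content "toward which sequences are evolving" can be made concrete with a two-state sketch. It assumes a single AT→GC rate and a single GC→AT rate, which is a strong simplification of the full substitution patterns estimated in the study; under that assumption the stationary GC-content is the AT→GC rate divided by the sum of both rates. Function names and rate values are hypothetical.

```python
def equilibrium_gc(at_to_gc, gc_to_at):
    """Stationary GC fraction of a two-state (AT <-> GC) substitution model."""
    return at_to_gc / (at_to_gc + gc_to_at)

def evolve_gc(gc, at_to_gc, gc_to_at, steps, dt=1.0):
    """Iterate the GC fraction forward in time with simple Euler steps.

    Each step gains GC from the AT fraction and loses GC to AT.
    """
    for _ in range(steps):
        gc += dt * (at_to_gc * (1.0 - gc) - gc_to_at * gc)
    return gc

# Hypothetical per-step rates: GC->AT three times faster than AT->GC.
u, v = 0.01, 0.03
print(equilibrium_gc(u, v))        # stationary GC fraction
print(evolve_gc(0.60, u, v, 2000)) # a GC-rich sequence decays toward it
```

A process such as BGC that favors the fixation of AT→GC changes acts in this picture like an increase of the effective AT→GC rate, which raises the stationary value.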
|
34
|
Arndt PF, Vingron M. The Otto Warburg International Summer School and Workshop on Networks and Regulation. BMC Bioinformatics 2007. [PMCID: PMC1995547 DOI: 10.1186/1471-2105-8-s6-s1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
|
35
|
de la Chaux N, Messer PW, Arndt PF. DNA indels in coding regions reveal selective constraints on protein evolution in the human lineage. BMC Evol Biol 2007; 7:191. [PMID: 17935613 PMCID: PMC2151769 DOI: 10.1186/1471-2148-7-191] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Received: 03/04/2007] [Accepted: 10/12/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Insertions and deletions of DNA segments (indels), together with substitutions, are the major mutational processes that generate genetic variation. Here we focus on recent DNA insertions and deletions in protein coding regions of the human genome to investigate selective constraints on indels in protein evolution. RESULTS Frequencies of inserted and deleted amino acids differ from background amino acid frequencies in the human proteome. Small amino acids are overrepresented, while hydrophobic, aliphatic and aromatic amino acids are strongly suppressed. Indels are found to be preferentially located in protein regions that do not form important structural domains. Amino acid insertion and deletion rates in genes associated with elementary biochemical reactions (e.g. catalytic activity, ligase activity, electron transport, or catabolic process) are lower compared to those in other genes and are therefore subject to stronger purifying selection. CONCLUSION Our analysis indicates that indels in human protein coding regions are subject to distinct levels of selective pressure with regard to their structural impact on the amino acid sequence, as well as to general properties of the genes they are located in. These findings confirm that many commonly accepted characteristics of selective constraints for substitutions are also valid for amino acid insertions and deletions.
|
36
|
Messer PW, Bundschuh R, Vingron M, Arndt PF. Effects of long-range correlations in DNA on sequence alignment score statistics. J Comput Biol 2007; 14:655-68. [PMID: 17683266 DOI: 10.1089/cmb.2007.r008] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022] Open
Abstract
Long-range correlations in genomic base composition are a ubiquitous statistical feature among many eukaryotic genomes. In this article, these correlations are shown to substantially influence the statistics of sequence alignment scores. Using a Gaussian approximation to model the correlated score landscape, we calculate the corrections to the scale parameter lambda of the extreme value distribution of alignment scores. Our approximate analytic results are supported by a detailed numerical study based on a simple algorithm to efficiently generate long-range correlated random sequences. We find both the mean and the exponential tail of the score distribution for long-range correlated sequences to be substantially shifted compared to random sequences with independent nucleotides. The significance of measured alignment scores will therefore change upon incorporation of the correlations in the null model. We discuss the magnitude of this effect in a biological context.
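To see why a correction to lambda matters, the following sketch evaluates the standard extreme-value (Karlin-Altschul form) p-value, P(S ≥ s) = 1 − exp(−K·m·n·e^(−λs)), for two values of lambda. It does not reproduce the corrections derived in the article; the function name and every parameter value below are hypothetical and chosen only for illustration.

```python
import math

def alignment_p_value(score, lam, k, m, n):
    """P-value of a local alignment score under an extreme value distribution.

    lam and k are the scale and prefactor parameters of the distribution,
    m and n are the lengths of the two compared sequences.
    """
    expected_hits = k * m * n * math.exp(-lam * score)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-expected_hits)

# Hypothetical parameters: the same score judged under two null models.
p_independent = alignment_p_value(score=50, lam=0.25, k=0.1, m=1000, n=1000)
p_correlated = alignment_p_value(score=50, lam=0.20, k=0.1, m=1000, n=1000)
print(p_independent, p_correlated)
```

With these illustrative numbers a modest decrease of lambda makes the same score far less significant, which is the kind of shift the abstract refers to.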
|
37
|
Messer PW, Arndt PF. The majority of recent short DNA insertions in the human genome are tandem duplications. Mol Biol Evol 2007; 24:1190-7. [PMID: 17322553 DOI: 10.1093/molbev/msm035] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.4] [Indexed: 11/13/2022] Open
Abstract
Nucleotide substitutions, insertions, and deletions constitute the principal molecular mechanisms generating genetic variation on small length scales. In contrast to substitutions, the nature of short DNA insertions and deletions (indels) is far less understood. With the recent availability of whole-genome multiple alignments between human and other primates, detailed investigations on indel characteristics and origin have come within reach. Here, we show that the majority of short (1-100 bp) DNA insertions in the human lineage are tandem duplications of directly adjacent sequence segments with conserved polarity. Indels in microsatellites comprise only a small fraction. The underlying molecular processes generating indels do not necessarily rely on the presence of preexisting duplicates, as would be expected for unequal crossing over, as well as replication slippage. Instead, our findings point toward a mechanism that preferentially occurs in the male germline and is not recombination-mediated. Surprisingly, nonframeshifting tandem duplications and deletions in coding regions still occur at approximately 50% of their genomic background rates. As is already well established in the context of gene and segmental duplications, our results demonstrate that duplications are also likely to constitute the predominant process for rapid generation of new genetic material and function on smaller scales.
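What "tandem duplication of a directly adjacent segment with conserved polarity" means can be sketched as a simple string test: the inserted segment is an exact, same-orientation copy of the sequence immediately to its left or right. This is an illustrative simplification, not the classification procedure of the study; it ignores mismatches and alignment ambiguity, and the function name and example sequences are hypothetical.

```python
def is_tandem_duplication(left_flank, inserted, right_flank):
    """Return True if `inserted` exactly copies the directly adjacent sequence.

    left_flank:  sequence immediately upstream of the insertion point
    inserted:    the inserted segment
    right_flank: sequence immediately downstream of the insertion point
    Only same-strand (conserved polarity) copies are considered.
    """
    if not inserted:
        return False
    return left_flank.endswith(inserted) or right_flank.startswith(inserted)

# Hypothetical examples.
print(is_tandem_duplication("TTACG", "ACG", "GGA"))  # copy of the left neighbor
print(is_tandem_duplication("TTACG", "CGT", "GGA"))  # no adjacent copy
```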
|
38
|
Arndt PF. Reconstruction of ancestral nucleotide sequences and estimation of substitution frequencies in a star phylogeny. Gene 2006; 390:75-83. [PMID: 17223282 DOI: 10.1016/j.gene.2006.11.022] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 06/13/2006] [Revised: 11/15/2006] [Accepted: 11/15/2006] [Indexed: 10/23/2022]
Abstract
Maximum likelihood phylogeny reconstruction methods are widely used in uncovering and assessing the evolutionary history and relationships of natural systems. However, several simplifying assumptions commonly made in this analysis limit the explanatory power of the results obtained. We present an algorithm that performs the phylogenetic analysis without making the common assumptions for sequence data from at least three leaf nodes in a star phylogeny. In particular, the underlying nucleotide substitution model does not have to be reversible and may include neighbor-dependent processes like the CpG methylation deamination process (CpG-effect). The base composition of the sequences at the external nodes and that of the ancestral sequence may be different from each other and they do not have to be stationary state distributions of the corresponding substitution model. The algorithm is able to reconstruct the ancestral base composition and accurately estimate substitution frequencies in the branches of the star phylogeny. Extensive tests on simulated data validate the very favorable performance of the algorithm. As an application we present the analysis of aligned genomic sequences from human, mouse, and dog. Different substitution patterns can be observed in the three lineages.
|
39
|
Singh ND, Arndt PF, Petrov DA. Minor shift in background substitutional patterns in the Drosophila saltans and willistoni lineages is insufficient to explain GC content of coding sequences. BMC Biol 2006; 4:37. [PMID: 17049096 PMCID: PMC1626080 DOI: 10.1186/1741-7007-4-37] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 05/08/2006] [Accepted: 10/18/2006] [Indexed: 11/10/2022] Open
Abstract
Background Several lines of evidence suggest that codon usage in the Drosophila saltans and D. willistoni lineages has shifted towards a less frequent use of GC-ending codons. Introns in these lineages show a parallel shift toward a lower GC content. These patterns have been alternatively ascribed to either a shift in mutational patterns or changes in the definition of preferred and unpreferred codons in these lineages. Results and discussion To gain additional insight into this question, we quantified background substitutional patterns in the saltans/willistoni group using inactive copies of a novel, Q-like retrotransposable element. We demonstrate that the pattern of background substitutions in the saltans/willistoni lineage has shifted to a significant degree, primarily due to changes in mutational biases. These differences predict a lower equilibrium GC content in the genomes of the saltans/willistoni species compared with that in the D. melanogaster species group. The magnitude of the difference can readily account for changes in intronic GC content, but it appears insufficient to explain changes in codon usage within the saltans/willistoni lineage. Conclusion We suggest that the observed changes in codon usage in the saltans/willistoni clade reflect either lineage-specific changes in the definitions of preferred and unpreferred codons, or a weaker selective pressure on codon bias in this lineage.
|
40
|
Messer PW, Arndt PF. CorGen--measuring and generating long-range correlations for DNA sequence analysis. Nucleic Acids Res 2006; 34:W692-5. [PMID: 16845099 PMCID: PMC1538783 DOI: 10.1093/nar/gkl234] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022] Open
Abstract
CorGen is a web server that measures long-range correlations in the base composition of DNA and generates random sequences with the same correlation parameters. Long-range correlations are characterized by a power-law decay of the auto correlation function of the GC-content. The widespread presence of such correlations in eukaryotic genomes calls for their incorporation into accurate null models of eukaryotic DNA in computational biology. For example, the score statistics of sequence alignment and the performance of motif finding algorithms are significantly affected by the presence of genomic long-range correlations. We use an expansion-randomization dynamics to efficiently generate the correlated random sequences. The server is available at http://corgen.molgen.mpg.de.
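The quantity described here, the autocorrelation function of the GC-content, can be computed with a few lines of code. The sketch below is not the CorGen implementation; it is a plain estimator, C(r) = ⟨x_i·x_(i+r)⟩ − ⟨x⟩², where x_i is 1 for G or C and 0 otherwise, and it assumes that the largest lag is shorter than the sequence. The function name and the toy sequence are hypothetical.

```python
def gc_autocorrelation(sequence, max_lag):
    """Estimate the GC-content autocorrelation C(r) for r = 1..max_lag.

    Each base is mapped to 1.0 (G or C) or 0.0 (anything else); C(r) is the
    mean product of indicators r positions apart minus the squared GC mean.
    Assumes max_lag < len(sequence).
    """
    x = [1.0 if base in "GCgc" else 0.0 for base in sequence]
    n = len(x)
    mean = sum(x) / n
    result = []
    for lag in range(1, max_lag + 1):
        pairs = n - lag
        joint = sum(x[i] * x[i + lag] for i in range(pairs)) / pairs
        result.append(joint - mean * mean)
    return result

# Toy sequence with strict period 2: anticorrelated at lag 1, correlated at lag 2.
print(gc_autocorrelation("GA" * 50, 2))
```

For genomic sequences with long-range correlations, the estimated C(r) would be expected to fall off slowly, following a power law in r rather than an exponential decay.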
|
41
|
Lipatov M, Arndt PF, Hwa T, Petrov DA. A Novel Method Distinguishes Between Mutation Rates and Fixation Biases in Patterns of Single-Nucleotide Substitution. J Mol Evol 2006. [DOI: 10.1007/s00239-006-7207-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
|
42
|
Roepcke S, Zhi D, Vingron M, Arndt PF. Identification of highly specific localized sequence motifs in human ribosomal protein gene promoters. Gene 2006; 365:48-56. [PMID: 16343812 DOI: 10.1016/j.gene.2005.09.033] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Received: 03/24/2005] [Revised: 07/22/2005] [Accepted: 09/27/2005] [Indexed: 11/28/2022]
Abstract
For ribosomal protein (RP) genes the start of transcription is rigidly controlled to maintain the 5'-TOP signal on the messenger RNA. The responsible regulatory mechanism is not yet fully understood. Careful comparative analysis of their proximal promoter sequences reveals common characteristics and thus provides clues to the underlying mechanism. We have extracted the proximal promoters of the 80 human cytosolic ribosomal protein genes together with the orthologous mouse sequences. After annotating the set with transcription factor binding sites based on the available literature, we searched for over-represented sequence motifs. We uncovered a novel motif that is localized at a fixed distance downstream of the transcription start. 31 out of the 80 promoters contain the motif in the same orientation around position +62 (standard deviation 6). A second evolutionarily conserved and palindromic motif is found 13 times in the RP promoter set, 9 instances of which are located upstream around position -40. In addition, we see a characteristic profile of the GC-content and of the CpG dinucleotide frequencies. Our results support a model for the transcription of ribosomal protein genes in which the maintenance of the accurate start of transcription is provided by specific transcription factors. Such a factor binds the target DNA at a fixed location relative to the TSS, and possibly interacts directly with the basal transcription machinery.
|
43
|
Kübler K, Arndt PF, Wardelmann E, Krebs D, Kuhn W, van der Ven K. HLA-class II haplotype associations with ovarian cancer. Int J Cancer 2006; 119:2980-5. [PMID: 17016821 DOI: 10.1002/ijc.22266] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/11/2022]
Abstract
The development of cancer is a multistep process that is characterized by the accumulation of genetic alterations in cells and changed cellular interactions with the surrounding healthy tissues. The human immune system is believed to be intrinsically involved in this process. The correlation of certain human leukocyte antigen (HLA)-class I and II haplotypes with tumorigenesis is documented in a variety of tumors. However, few data exist on the possible association of specific HLA-class II alleles or haplotypes with ovarian cancer. In our sample of 52 Caucasian patients with primary ovarian carcinoma and 239 female healthy local controls, we observed a significantly increased incidence of the HLA-class II haplotypes DRB1*0301 - DQA1*0501 - DQB1*0201 (p < 0.001) and DRB1*1001 - DQA1*0101 - DQB1*0501 (p < 0.001) in the patients. Our data suggest that HLA-class II loci or individual HLA-class II haplotypes may be involved in the pathogenesis of ovarian cancer.
|
44
|
Lipatov M, Arndt PF, Hwa T, Petrov DA. A Novel Method Distinguishes Between Mutation Rates and Fixation Biases in Patterns of Single-Nucleotide Substitution. J Mol Evol 2005; 62:168-75. [PMID: 16362483 DOI: 10.1007/s00239-005-0207-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 07/06/2004] [Accepted: 06/20/2005] [Indexed: 10/25/2022]
Abstract
Analysis of the genome-wide patterns of single-nucleotide substitution reveals that the human GC content structure is out of equilibrium. The substitutions are decreasing the overall GC content (GC), at the same time making its range narrower. Investigation of single-nucleotide polymorphisms (SNPs) revealed that presently the decrease in GC content is due to a uniform mutational preference for A:T pairs, while its projected range is due to a variability in the fixation preference for G:C pairs. However, it is important to determine whether lessons learned about evolutionary processes operating at the present time (that is, those reflected in the SNP data) can be extended back into the evolutionary past. We describe here a new approach to this problem that utilizes the juxtaposition of forward and reverse substitution rates to determine the relative importance of variability in mutation rates and fixation probabilities in shaping long-term substitutional patterns. We use this approach to demonstrate that the forces shaping GC content structure over the recent past (since the appearance of the SNPs) extend all the way back to the mammalian radiation approximately 90 million years ago. In addition, we find a small but significant effect that has not been detected in the SNP data: relatively high rates of C:G→A:T germline mutation in low-GC regions of the genome.
|
45
|
Arndt PF, Hwa T, Petrov DA. Substantial regional variation in substitution rates in the human genome: importance of GC content, gene density, and telomere-specific effects. J Mol Evol 2005; 60:748-63. [PMID: 15959677 DOI: 10.1007/s00239-004-0222-5] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.2] [Received: 07/20/2004] [Accepted: 12/30/2004] [Indexed: 01/08/2023]
Abstract
This study presents the first global, 1-Mbp-level analysis of patterns of nucleotide substitutions along the human lineage. The study is based on the analysis of a large number of repetitive elements deposited into the human genome since the mammalian radiation, yielding a number of results that would have been difficult to obtain using the more conventional comparative method of analysis. This analysis revealed substantial and consistent variability of rates of substitution, with the variability ranging up to twofold among different regions. The rates of substitutions of C or G nucleotides with A or T nucleotides vary much more sharply than the reverse rates, suggesting that much of that variation is due to differences in mutation rates rather than in the probabilities of fixation of C/G vs. A/T nucleotides across the genome. For all types of substitution we observe substantially more hotspots than coldspots, with hotspots showing substantial clustering over tens of Mbp. Our analysis revealed that GC-content of surrounding sequences is the best predictor of the rates of substitution. The pattern of substitution appears very different near telomeres compared to the rest of the genome and cannot be explained by the genome-wide correlations of the substitution rates with GC content or exon density. The telomere pattern of substitution is consistent with natural selection or biased gene conversion acting to increase the GC-content of the sequences that are within 10-15 Mbp of the telomere.
|
46
|
Messer PW, Arndt PF, Lässig M. Solvable sequence evolution models and genomic correlations. Phys Rev Lett 2005; 94:138103. [PMID: 15904043 DOI: 10.1103/physrevlett.94.138103] [Received: 09/24/2004] [Indexed: 05/02/2023]
Abstract
We study a minimal model for genome evolution whose elementary processes are single site mutation, duplication and deletion of sequence regions, and insertion of random segments. These processes are found to generate long-range correlations in the composition of letters as long as the sequence length is growing; i.e., the combined rates of duplications and insertions are higher than the deletion rate. For constant sequence length, on the other hand, all initial correlations decay exponentially. These results are obtained analytically and by simulations. They are compared with the long-range correlations observed in genomic DNA, and the implications for genome evolution are discussed.
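The elementary processes named in this abstract can be sketched as a toy simulation. This is only an illustration under stated assumptions, not the authors' model or code: the fixed segment length `seg_len`, the per-step probabilities, and the function names `evolve` and `gc_correlation` are all hypothetical choices made for the example.

```python
import random

ALPHABET = "ACGT"

def evolve(seq, steps, p_mut, p_dup, p_del, p_ins, seg_len=5, rng=None):
    """Apply `steps` random events to `seq` and return the new sequence.

    Assumption: every duplication, deletion, and insertion acts on a
    segment of fixed length `seg_len`; the abstract does not specify this.
    """
    rng = rng or random.Random(0)
    seq = list(seq)
    for _ in range(steps):
        r = rng.random()
        pos = rng.randrange(len(seq))
        if r < p_mut:
            # single-site mutation: overwrite one letter
            seq[pos] = rng.choice(ALPHABET)
        elif r < p_mut + p_dup:
            # duplication: copy a segment and insert the copy next to it
            seq[pos:pos] = seq[pos:pos + seg_len]
        elif r < p_mut + p_dup + p_del:
            # deletion: remove a segment, but never empty the sequence
            if len(seq) > seg_len:
                del seq[pos:pos + seg_len]
        elif r < p_mut + p_dup + p_del + p_ins:
            # insertion: add a segment of random letters
            seq[pos:pos] = [rng.choice(ALPHABET) for _ in range(seg_len)]
    return "".join(seq)

def gc_correlation(seq, lag):
    """Covariance of the G/C indicator at distance `lag` along `seq`."""
    x = [1 if base in "GC" else 0 for base in seq]
    n = len(x) - lag
    mean = sum(x) / len(x)
    pair = sum(x[i] * x[i + lag] for i in range(n)) / n
    return pair - mean * mean
```

Comparing `gc_correlation` at increasing lags for a run with `p_dup + p_ins > p_del` against a run at constant length is one simple way to explore the growing versus constant-length regimes the abstract contrasts.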
|
47
|
Webster MT, Smith NGC, Hultin-Rosenberg L, Arndt PF, Ellegren H. Male-driven biased gene conversion governs the evolution of base composition in human Alu repeats. Mol Biol Evol 2005; 22:1468-74. [PMID: 15772377 DOI: 10.1093/molbev/msi136] [Indexed: 01/04/2023] [Open access]
Abstract
Regional biases in substitution pattern are likely to be responsible for the large-scale variation in base composition observed in vertebrate genomes. However, the evolutionary forces responsible for these biases are still not clearly defined. In order to study the processes of mutation and fixation across the entire human genome, we analyzed patterns of substitution in Alu repeats since their insertion. We also studied patterns of human polymorphism within the repeats. There is a highly significant effect of recombination rate on the pattern of substitution, whereas no such effect is seen on the pattern of polymorphism. These results suggest that regional biases in substitution are caused by biased gene conversion, a process that increases the probability of fixation of mutations that increase GC content. Furthermore, the strongest correlate of substitution patterns is found to be male recombination rates rather than female or sex-averaged recombination rates. This indicates that in addition to sexual dimorphism in recombination rates, the sexes also differ in the relative rates of crossover and gene conversion.
|
48
|
Arndt PF, Hwa T. Identification and measurement of neighbor-dependent nucleotide substitution processes. Bioinformatics 2005; 21:2322-8. [PMID: 15769841 DOI: 10.1093/bioinformatics/bti376] [Indexed: 11/14/2022] [Open access]
Abstract
MOTIVATION Neighbor-dependent substitution processes have generated specific patterns of dinucleotide frequencies in the genomes of most organisms. The CpG-methylation-deamination process, for example, is a prominent process in vertebrates (CpG effect). Such processes, often with unknown mechanistic origins, need to be incorporated into realistic models of nucleotide substitutions. RESULTS Based on a general framework of nucleotide substitutions, we developed a method that is able to identify the most relevant neighbor-dependent substitution processes, estimate their relative frequencies, and judge whether they are important enough to be included in the model. Starting from a model for neighbor-independent nucleotide substitution, we successively added neighbor-dependent substitution processes in the order of their ability to increase the likelihood of the model describing the given data. The analysis of neighbor-dependent nucleotide substitutions based on repetitive elements found in the genomes of human, zebrafish and fruit fly is presented. AVAILABILITY A web server to perform the presented analysis is freely available at: http://evogen.molgen.mpg.de/server/substitution-analysis
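The dinucleotide signature of the CpG effect mentioned here can be illustrated with the widely used observed/expected CpG ratio. This is a minimal sketch of that summary statistic only, not the likelihood-based method the abstract describes; the function name `cpg_obs_exp` is a hypothetical choice for the example.

```python
def cpg_obs_exp(seq):
    """Return the observed/expected CpG ratio of a DNA sequence.

    Observed: frequency of the dinucleotide CG among all adjacent pairs.
    Expected: product of the single-nucleotide frequencies of C and G,
    i.e. what a neighbor-independent model would predict.
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    if n < 2 or c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap with itself, so str.count finds every occurrence
    observed = seq.count("CG") / (n - 1)
    expected = (c_count / n) * (g_count / n)
    return observed / expected
```

A ratio well below 1 means CpG dinucleotides are rarer than the base composition alone would predict, which is the kind of neighbor-dependent pattern that motivates adding such processes to a substitution model.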
|
49
|
Dieterich C, Grossmann S, Tanzer A, Röpcke S, Arndt PF, Stadler PF, Vingron M. Comparative promoter region analysis powered by CORG. BMC Genomics 2005; 6:24. [PMID: 15723697 PMCID: PMC555765 DOI: 10.1186/1471-2164-6-24] [Received: 10/26/2004] [Accepted: 02/21/2005] [Indexed: 11/10/2022] [Open access]
Abstract
Background Promoters are key players in gene regulation. They receive signals from various sources (e.g. cell surface receptors) and control the level of transcription initiation, which largely determines gene expression. In vertebrates, transcription start sites and surrounding regulatory elements are often poorly defined. To support promoter analysis, we present CORG, a framework for studying upstream regions including untranslated exons (5' UTR). Description The automated annotation of promoter regions integrates two kinds of information. First, statistically significant cross-species conservation within upstream regions of orthologous genes is detected. Pairwise as well as multiple sequence comparisons are computed. Second, binding site descriptions (position-weight matrices) are employed to predict conserved regulatory elements with a novel approach. Assembled EST sequences and verified transcription start sites are incorporated to distinguish exonic from other sequences. So far, we have included 5 species in our analysis pipeline (human, mouse, rat, fugu and zebrafish). We characterized promoter regions of 16,127 groups of orthologous genes. All data are presented in an intuitive way via our web site. Users are free to export data for single genes or access larger data sets via our DAS server. The benefits of our framework are illustrated in the context of phylogenetic profiling of transcription factor binding sites and detection of microRNAs close to transcription start sites of our gene set. Conclusion The CORG platform is a versatile tool to support analyses of gene regulation in vertebrate promoter regions. Applications for CORG cover a broad range from studying evolution of DNA binding sites and promoter constitution to the discovery of new regulatory sequence elements (e.g. microRNAs and binding sites).
|
50
|
Abstract
MOTIVATION Substantial regional variations of substitutional processes have recently been reported from human/mouse comparisons. However, several features, including the C+G dependence and the CpG-based transition effect, remain obscure. RESULTS Utilizing the vast number of transposable elements in the human genome, we performed a detailed analysis of the substitutional and insertion/deletion patterns along the human lineage in a regional and time-resolved fashion. We observed a drastic increase in the CpG-based transition frequency at about the time of the mammalian radiation. We also observed clear regional biases of substitution patterns, most notably a bias to enrich the C+G content toward the telomeres. AVAILABILITY The programs used are available upon request from the authors.
|