1
Kyriacou RG, Mulhair PO, Holland PWH. GC Content Across Insect Genomes: Phylogenetic Patterns, Causes and Consequences. J Mol Evol 2024; 92:138-152. [PMID: 38491221 PMCID: PMC10978632 DOI: 10.1007/s00239-024-10160-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 02/06/2024] [Indexed: 03/18/2024]
Abstract
The proportions of A:T and G:C nucleotide pairs are often unequal and can vary greatly between animal species and along chromosomes. The causes and consequences of this variation are incompletely understood. The recent release of high-quality genome sequences from the Darwin Tree of Life and other large-scale genome projects provides an opportunity for GC heterogeneity to be compared across a large number of insect species. Here we analyse GC content along chromosomes, and within protein-coding genes and codons, of 150 insect species from four holometabolous orders: Coleoptera, Diptera, Hymenoptera, and Lepidoptera. We find that protein-coding sequences have higher GC content than the genome average, and that Lepidoptera generally have higher GC content than the other three insect orders examined. GC content is higher in small chromosomes in most Lepidoptera species, but this pattern is less consistent in other orders. GC content also increases towards subtelomeric regions within protein-coding genes in Diptera, Coleoptera and Lepidoptera. Two species of Diptera, Bombylius major and B. discolor, have very atypical genomes with ubiquitous increase in AT content, especially at third codon positions. Despite dramatic AT-biased codon usage, we find no evidence that this has driven divergent protein evolution. We argue that the GC landscape of Lepidoptera, Diptera and Coleoptera genomes is influenced by GC-biased gene conversion, strongest in Lepidoptera, with some outlier taxa affected drastically by counteracting processes.
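The two quantities this abstract compares, overall GC content and GC content at third codon positions, can be illustrated with a minimal sketch. The function names `gc_fraction` and `gc3_fraction` are hypothetical and are not taken from the paper; the sketch assumes an in-frame coding sequence and ignores ambiguous bases such as N.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")  # no unambiguous bases to count
    return (seq.count("G") + seq.count("C")) / acgt


def gc3_fraction(cds: str) -> float:
    """GC fraction at third codon positions of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3   # drop any trailing partial codon
    third_positions = cds[2:usable:3]  # every third base, starting at index 2
    return gc_fraction(third_positions)


# Hypothetical toy sequence: codons ATG, GCC, TAA
print(gc_fraction("ATGGCCTAA"))
print(gc3_fraction("ATGGCCTAA"))
```

A strongly AT-biased codon usage of the kind reported for the two Bombylius species would show up as a low third-position value relative to the whole-sequence value.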
Affiliation(s)
- Riccardo G Kyriacou
  - Department of Biology, University of Oxford, 11a Mansfield Road, Oxford, OX1 3SZ, UK
- Peter O Mulhair
  - Department of Biology, University of Oxford, 11a Mansfield Road, Oxford, OX1 3SZ, UK
- Peter W H Holland
  - Department of Biology, University of Oxford, 11a Mansfield Road, Oxford, OX1 3SZ, UK.
2
Borovská I, Vořechovský I, Královičová J. Alu RNA fold links splicing with signal recognition particle proteins. Nucleic Acids Res 2023; 51:8199-8216. [PMID: 37309897 PMCID: PMC10450188 DOI: 10.1093/nar/gkad500] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/16/2023] [Revised: 05/23/2023] [Accepted: 05/31/2023] [Indexed: 06/14/2023]
Abstract
Transcriptomic diversity in primates was considerably expanded by exonizations of intronic Alu elements. To better understand their cellular mechanisms we have used structure-based mutagenesis coupled with functional and proteomic assays to study the impact of successive primate mutations and their combinations on inclusion of a sense-oriented AluJ exon in the human F8 gene. We show that the splicing outcome was better predicted by consecutive RNA conformation changes than by computationally derived splicing regulatory motifs. We also demonstrate an involvement of SRP9/14 (signal recognition particle) heterodimer in splicing regulation of Alu-derived exons. Nucleotide substitutions that accumulated during primate evolution relaxed the conserved left-arm AluJ structure including helix H1 and reduced the capacity of SRP9/14 to stabilize the closed Alu conformation. RNA secondary structure-constrained mutations that promoted open Y-shaped conformations of the Alu made the Alu exon inclusion reliant on DHX9. Finally, we identified additional SRP9/14 sensitive Alu exons and predicted their functional roles in the cell. Together, these results provide unique insights into architectural elements required for sense Alu exonization, identify conserved pre-mRNA structures involved in exon selection and point to a possible chaperone activity of SRP9/14 outside the mammalian signal recognition particle.
Affiliation(s)
- Ivana Borovská
  - Institute of Molecular Physiology and Genetics, Centre of Biosciences, Slovak Academy of Sciences, Bratislava 840 05, Slovak Republic
- Igor Vořechovský
  - Faculty of Medicine, University of Southampton, HDH, MP808, Southampton SO16 6YD, United Kingdom
- Jana Královičová
  - Institute of Molecular Physiology and Genetics, Centre of Biosciences, Slovak Academy of Sciences, Bratislava 840 05, Slovak Republic
  - Institute of Zoology, Slovak Academy of Sciences, Bratislava 845 06, Slovak Republic
3
Johnson KE, Adams CJ, Voight BF. Identifying rare variants inconsistent with identity-by-descent in population-scale whole-genome sequencing data. Methods Ecol Evol 2022; 13:2429-2442. [PMID: 38938451 PMCID: PMC11210625 DOI: 10.1111/2041-210x.13991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Accepted: 09/12/2022] [Indexed: 12/01/2022]
Abstract
Analyses of genetic variation typically assume that rare variants within a population are inherited from a single common ancestral event, that is, identity-by-descent (IBD). However, there are genetic and technical processes through which rare variants in population genetic data may deviate from this simple evolutionary model, including recurrent mutations, gene conversions and genotyping error. All these processes can decrease the expected length of shared background haplotype surrounding a rare variant if that variant was inherited from a single event descending from a common ancestor. No method exists to computationally infer rare variants inconsistent with this simple model (denoted here as 'IBD-inconsistent') using unphased population sequencing data. We hypothesized that the difference in shared haplotype background length can distinguish variants consistent and inconsistent with this simple IBD transmission in population sequencing data without pedigree information. We implemented a Bayesian hierarchical model and used Gibbs sampling to estimate the posterior probability of IBD state for rare variants, using simulated recurrent mutations to demonstrate that our approach accurately distinguishes rare variants consistent and inconsistent with a simple IBD inheritance model. Applying our method to whole-genome sequencing data from 3,621 human individuals in the UK10K consortium, we found that IBD-inconsistent variants correlated with higher local mutation rates and genomic features like replication timing. Using a heuristic to categorize IBD-inconsistent variants as gene conversions, we found that potential gene conversions had expected properties such as enriched local GC content. By identifying IBD-inconsistent variants, we can better understand the spectrum of recent mutations in human populations, a source of genetic variation driving evolution and a key factor in understanding recent demographic history.
Affiliation(s)
- Kelsey E. Johnson
  - Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Christopher J. Adams
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Benjamin F. Voight
  - Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
4
Monroe JG, Srikant T, Carbonell-Bejerano P, Becker C, Lensink M, Exposito-Alonso M, Klein M, Hildebrandt J, Neumann M, Kliebenstein D, Weng ML, Imbert E, Ågren J, Rutter MT, Fenster CB, Weigel D. Mutation bias reflects natural selection in Arabidopsis thaliana. Nature 2022; 602:101-105. [PMID: 35022609 PMCID: PMC8810380 DOI: 10.1038/s41586-021-04269-6] [Citation(s) in RCA: 152] [Impact Index Per Article: 76.0] [Received: 11/09/2020] [Accepted: 11/17/2021] [Indexed: 12/24/2022]
Abstract
Since the first half of the twentieth century, evolutionary theory has been dominated by the idea that mutations occur randomly with respect to their consequences. Here we test this assumption with large surveys of de novo mutations in the plant Arabidopsis thaliana. In contrast to expectations, we find that mutations occur less often in functionally constrained regions of the genome: mutation frequency is reduced by half inside gene bodies and by two-thirds in essential genes. With independent genomic mutation datasets, including from the largest Arabidopsis mutation accumulation experiment conducted to date, we demonstrate that epigenomic and physical features explain over 90% of variance in the genome-wide pattern of mutation bias surrounding genes. Observed mutation frequencies around genes in turn accurately predict patterns of genetic polymorphisms in natural Arabidopsis accessions (r = 0.96). That mutation bias is the primary force behind patterns of sequence evolution around genes in natural accessions is supported by analyses of allele frequencies. Finally, we find that genes subject to stronger purifying selection have a lower mutation rate. We conclude that epigenome-associated mutation bias reduces the occurrence of deleterious mutations in Arabidopsis, challenging the prevailing paradigm that mutation is a directionless force in evolution.
Affiliation(s)
- J Grey Monroe
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany.
  - Department of Plant Sciences, University of California Davis, Davis, CA, USA.
- Thanvi Srikant
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
- Claude Becker
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
  - Faculty of Biology, Ludwig Maximilian University, Martinsried, Germany
- Mariele Lensink
  - Department of Plant Sciences, University of California Davis, Davis, CA, USA
- Moises Exposito-Alonso
  - Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA
  - Department of Biology, Stanford University, Stanford, CA, USA
- Marie Klein
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
  - Department of Plant Sciences, University of California Davis, Davis, CA, USA
- Julia Hildebrandt
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
- Manuela Neumann
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
- Daniel Kliebenstein
  - Department of Plant Sciences, University of California Davis, Davis, CA, USA
- Mao-Lun Weng
  - Department of Biology, Westfield State University, Westfield, MA, USA
- Eric Imbert
  - ISEM, University of Montpellier, Montpellier, France
- Jon Ågren
  - Department of Ecology and Genetics, EBC, Uppsala University, Uppsala, Sweden
- Matthew T Rutter
  - Department of Biology, College of Charleston, Charleston, SC, USA
- Charles B Fenster
  - Oak Lake Field Station, South Dakota State University, Brookings, SD, USA
- Detlef Weigel
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany.
5
Castillo AI, Almeida RPP. Evidence of gene nucleotide composition favoring replication and growth in a fastidious plant pathogen. G3-GENES GENOMES GENETICS 2021; 11:6170658. [PMID: 33715000 PMCID: PMC8495750 DOI: 10.1093/g3journal/jkab076] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/19/2021] [Accepted: 03/02/2021] [Indexed: 11/13/2022]
Abstract
Nucleotide composition (GC content) varies across bacterial species, genome regions, and specific genes. In Xylella fastidiosa, a vector-borne fastidious plant pathogen infecting multiple crops, GC content ranges between ∼51% and 52%; however, these values were gathered using limited genomic data. We evaluated GC content variations across X. fastidiosa subspecies fastidiosa (N = 194), subsp. pauca (N = 107), and subsp. multiplex (N = 39). Genomes were classified based on plant host and geographic origin; individual genes within each genome were classified based on gene function, strand, length, ortholog group, Core vs. Accessory, and Recombinant vs. Non-recombinant. GC content was calculated for each gene within each evaluated genome. The effects of genome and gene level variables were evaluated with a mixed effect ANOVA, and the marginal-GC content was calculated for each gene. Also, the correlation between gene-specific GC content vs. natural selection (dN/dS) and recombination/mutation (r/m) was estimated. Our analyses show that intra-genomic changes in nucleotide composition in X. fastidiosa are small and influenced by multiple variables. Higher AT-richness is observed in genes involved in replication and translation, and genes in the leading strand. In addition, we observed a negative correlation between high-AT and dN/dS in subsp. pauca. The relationship between recombination and GC content varied between core and accessory genes. We hypothesize that distinct evolutionary forces and energetic constraints both drive and limit these small variations in nucleotide composition.
Affiliation(s)
- Andreina I Castillo
  - Department of Environmental Science, Policy and Management, University of California, Berkeley, CA 94720, USA
- Rodrigo P P Almeida
  - Department of Environmental Science, Policy and Management, University of California, Berkeley, CA 94720, USA
6
Abstract
Sex differences in overall recombination rates are well known, but little theoretical or empirical attention has been given to how and why sexes differ in their recombination landscapes: the patterns of recombination along chromosomes. In the first scientific review of this phenomenon, we find that recombination is biased toward telomeres in males and more uniformly distributed in females in most vertebrates and many other eukaryotes. Notable exceptions to this pattern exist, however. Fine-scale recombination patterns also frequently differ between males and females. The molecular mechanisms responsible for sex differences remain unclear, but chromatin landscapes play a role. Why these sex differences evolve also is unclear. Hypotheses suggest that they may result from sexually antagonistic selection acting on coding genes and their regulatory elements, meiotic drive in females, selection during the haploid phase of the life cycle, selection against aneuploidy, or mechanistic constraints. No single hypothesis, however, can adequately explain the evolution of sex differences in all cases. Sex-specific recombination landscapes have important consequences for population differentiation and sex chromosome evolution.
Affiliation(s)
- Jason M. Sardell
  - Department of Integrative Biology, University of Texas at Austin, Austin, TX 78712
- Mark Kirkpatrick
  - Department of Integrative Biology, University of Texas at Austin, Austin, TX 78712
7
Kader F, Ghai M, Olaniran AO. Characterization of DNA methylation-based markers for human body fluid identification in forensics: a critical review. Int J Legal Med 2019; 134:1-20. [PMID: 31713682 DOI: 10.1007/s00414-019-02181-3] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 03/13/2019] [Accepted: 10/15/2019] [Indexed: 02/07/2023]
Abstract
Body fluid identification in crime scene investigations aids in reconstruction of crime scenes. Several studies have identified and reported differentially methylated sites (DMSs) and regions (DMRs) which differ between forensically relevant tissues (tDMRs) and body fluids. Diverse factors affect methylation patterns, including environment, diet, lifestyle, disease, ethnicity and genetic variation, amongst others. Thus, it is important to analyse the stability of markers employed for forensic identification. Furthermore, even though epigenetic modifications are described as stable and heritable, epigenetic inheritance of potential markers for body fluid identification needs to be assessed in the long term. Here, we discuss the current status of reported DNA methylation-based markers and their verification studies. Such thorough investigation is crucial to develop a stable panel of DNA methylation-based markers for accurate body fluid identification.
Affiliation(s)
- Farzeen Kader
  - Discipline of Genetics, School of Life Sciences, College of Agriculture, Engineering and Science, University of KwaZulu-Natal (Westville Campus), Private Bag X54001, Durban, Republic of South Africa
- Meenu Ghai
  - Discipline of Genetics, School of Life Sciences, College of Agriculture, Engineering and Science, University of KwaZulu-Natal (Westville Campus), Private Bag X54001, Durban, Republic of South Africa.
- Ademola O Olaniran
  - Discipline of Microbiology, School of Life Sciences, College of Agriculture, Engineering and Science, University of KwaZulu-Natal (Westville Campus), Private Bag X54001, Durban, Republic of South Africa
8
Ebler J, Haukness M, Pesout T, Marschall T, Paten B. Haplotype-aware diplotyping from noisy long reads. Genome Biol 2019; 20:116. [PMID: 31159868 PMCID: PMC6547545 DOI: 10.1186/s13059-019-1709-0] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 07/25/2018] [Accepted: 05/06/2019] [Indexed: 12/19/2022]
Abstract
Current genotyping approaches for single-nucleotide variations rely on short, accurate reads from second-generation sequencing devices. Presently, third-generation sequencing platforms are rapidly becoming more widespread, yet approaches for leveraging their long but error-prone reads for genotyping are lacking. Here, we introduce a novel statistical framework for the joint inference of haplotypes and genotypes from noisy long reads, which we term diplotyping. Our technique takes full advantage of linkage information provided by long reads. We validate hundreds of thousands of candidate variants that have not yet been included in the high-confidence reference set of the Genome-in-a-Bottle effort.
Affiliation(s)
- Jana Ebler
  - Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbrücken, 66123, Germany
  - Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbrücken, Germany
  - Graduate School of Computer Science, Saarland University, Saarland Informatics Campus E1.3, Saarbrücken, Germany
- Marina Haukness
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, 95064, CA, USA
- Trevor Pesout
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, 95064, CA, USA
- Tobias Marschall
  - Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbrücken, 66123, Germany.
  - Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbrücken, Germany.
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, 95064, CA, USA.
9
Qi WH, Jiang XM, Yan CC, Zhang WQ, Xiao GS, Yue BS, Zhou CQ. Distribution patterns and variation analysis of simple sequence repeats in different genomic regions of bovid genomes. Sci Rep 2018; 8:14407. [PMID: 30258087 PMCID: PMC6158176 DOI: 10.1038/s41598-018-32286-5] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 02/26/2018] [Accepted: 09/04/2018] [Indexed: 01/23/2023]
Abstract
In this first examination of the distribution, guanine-cytosine (GC) pattern, and variation of microsatellites (SSRs) in different genomic regions of six bovid species, SSRs were nonrandomly distributed across regions. SSR abundances are much higher in the introns, transposable elements (TEs), and intergenic regions than in the 3′-untranslated regions (3′UTRs), 5′UTRs and coding regions. Trinucleotide perfect SSRs (P-SSRs) were the most frequent in the coding regions, whereas mononucleotide P-SSRs were the most frequent in the introns, 3′UTRs, TEs, and intergenic regions. Trifold P-SSRs had higher GC content in the 5′UTRs and coding regions than in the introns, 3′UTRs, TEs, and intergenic regions, whereas mononucleotide P-SSRs had the lowest GC content in all genomic regions. The repeat copy numbers (RCN) of the same mono- to hexanucleotide P-SSRs showed significantly different distributions in different regions (P < 0.01). Except in the coding regions, mononucleotide P-SSRs had the most RCNs, followed in order by di- > tri- > tetra- > penta- > hexanucleotide P-SSRs in the same regions. Analysis of the coefficient of variability (CV) showed that the CV of RCN for the same mono- to hexanucleotide SSRs was relatively high in the intronic and intergenic regions, intermediate in the TEs, and relatively low in the 5′UTRs, 3′UTRs, and coding regions. Broad SSR analysis of different genomic regions has helped to reveal the biological significance of their distributions.
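The coefficient of variability mentioned in this abstract is conventionally the standard deviation divided by the mean. A minimal sketch follows, assuming the sample standard deviation is used; the abstract does not say which estimator the authors chose. The function name and the input values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(repeat_copy_numbers):
    """Sample standard deviation of the values divided by their mean."""
    values = list(repeat_copy_numbers)
    if len(values) < 2:
        raise ValueError("need at least two values for a sample standard deviation")
    average = mean(values)
    if average == 0:
        raise ValueError("mean is zero, so the CV is undefined")
    return stdev(values) / average


# Hypothetical repeat copy numbers for one SSR class in one genomic region
print(coefficient_of_variation([12, 15, 9, 30, 14]))
```

Because the CV is dimensionless, it allows the spread of repeat copy numbers to be compared between regions whose mean copy numbers differ.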
Affiliation(s)
- Wen-Hua Qi
  - College of Biology and Food Engineering, Chongqing Three Gorges University, Chongqing, 404100, P. R. China
- Xue-Mei Jiang
  - College of Environmental and Chemistry Engineering, Chongqing Three Gorges University, Chongqing, 404100, P. R. China
- Chao-Chao Yan
  - Key Laboratory of Bio-resources and Eco-environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, 610064, P. R. China
- Wan-Qing Zhang
  - College of Life Sciences, Sichuan Agricultural University, Ya'an, Sichuan Province, 625014, P. R. China
- Guo-Sheng Xiao
  - College of Biology and Food Engineering, Chongqing Three Gorges University, Chongqing, 404100, P. R. China
- Bi-Song Yue
  - Key Laboratory of Bio-resources and Eco-environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, 610064, P. R. China
- Cai-Quan Zhou
  - Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong, 637009, P. R. China.
10
Distinct patterns of simple sequence repeats and GC distribution in intragenic and intergenic regions of primate genomes. Aging (Albany NY) 2017; 8:2635-2654. [PMID: 27644032 PMCID: PMC5191860 DOI: 10.18632/aging.101025] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 06/08/2016] [Accepted: 08/22/2016] [Indexed: 01/23/2023]
Abstract
In this first systematic examination of simple sequence repeats (SSRs) and guanine-cytosine (GC) distribution in intragenic and intergenic regions of ten primates, SSRs and GC content were nonrandomly distributed in both intragenic and intergenic regions, suggesting potential roles in transcriptional or translational regulation. Our results suggest that the majority of SSRs are distributed in non-coding regions, such as the introns, TEs, and intergenic regions. In these primates, trinucleotide perfect (P) SSRs were the most abundant repeat type in the 5'UTRs and CDSs, whereas mononucleotide P-SSRs were the most abundant in the introns, 3'UTRs, TEs, and intergenic regions. GC content varied greatly among intragenic and intergenic regions (5'UTRs > CDSs > 3'UTRs > TEs > introns > intergenic regions), and high GC content was frequently found in exon-rich regions. Our results also showed that, within the same intragenic or intergenic region, the distribution of GC content was very similar across the different primates. Tri- and hexanucleotide P-SSRs had the highest GC content in the 5'UTRs and CDSs, whereas mononucleotide P-SSRs had the lowest GC content in the six genomic regions of these primates. The most frequent motifs of each length varied markedly among the genomic regions.
11
Inference of Distribution of Fitness Effects and Proportion of Adaptive Substitutions from Polymorphism Data. Genetics 2017; 207:1103-1119. [PMID: 28951530 PMCID: PMC5676230 DOI: 10.1534/genetics.117.300323] [Citation(s) in RCA: 87] [Impact Index Per Article: 12.4] [Received: 07/05/2017] [Accepted: 09/13/2017] [Indexed: 11/18/2022]
Abstract
The distribution of fitness effects (DFE) encompasses the fraction of deleterious, neutral, and beneficial mutations. It conditions the evolutionary trajectory of populations, as well as the rate of adaptive molecular evolution (α). Inferring DFE and α from patterns of polymorphism, as given through the site frequency spectrum (SFS) and divergence data, has been a longstanding goal of evolutionary genetics. A widespread assumption shared by previous inference methods is that beneficial mutations only contribute negligibly to the polymorphism data. Hence, a DFE comprising only deleterious mutations tends to be estimated from SFS data, and α is then predicted by contrasting the SFS with divergence data from an outgroup. We develop a hierarchical probabilistic framework that extends previous methods to infer DFE and α from polymorphism data alone. We use extensive simulations to examine the performance of our method. While an outgroup is still needed to obtain an unfolded SFS, we show that both a DFE, comprising both deleterious and beneficial mutations, and α can be inferred without using divergence data. We also show that not accounting for the contribution of beneficial mutations to polymorphism data leads to substantially biased estimates of the DFE and α. We compare our framework with one of the most widely used inference methods available and apply it on a recently published chimpanzee exome data set.
12
Sahakyan AB, Balasubramanian S. Single genome retrieval of context-dependent variability in mutation rates for human germline. BMC Genomics 2017; 18:81. [PMID: 28086752 PMCID: PMC5237266 DOI: 10.1186/s12864-016-3440-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/09/2016] [Accepted: 12/19/2016] [Indexed: 01/08/2023]
Abstract
BACKGROUND Accurate knowledge of the core components of substitution rates is of vital importance to understand genome evolution and dynamics. By performing a single-genome and direct analysis of 39,894 retrotransposon remnants, we reveal sequence context-dependent germline nucleotide substitution rates for the human genome. RESULTS The rates are characterised through rate constants in a time-domain, and are made available through a dedicated program (Trek) and a stand-alone database. Due to the nature of the method design and the imposed stringency criteria, we expect our rate constants to be good estimates for the rates of spontaneous mutations. Benefiting from such data, we study the short-range nucleotide (up to 7-mer) organisation and the germline basal substitution propensity (BSP) profile of the human genome; characterise novel, CpG-independent, substitution prone and resistant motifs; confirm a decreased tendency of moieties with low BSP to undergo somatic mutations in a number of cancer types; and, produce a Trek-based estimate of the overall mutation rate in human. CONCLUSIONS The extended set of rate constants we report may enrich our resources and help advance our understanding of genome dynamics and evolution, with possible implications for the role of spontaneous mutations in the emergence of pathological genotypes and neutral evolution of proteomes.
Affiliation(s)
- Aleksandr B Sahakyan
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK.
- Shankar Balasubramanian
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK.
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.
  - School of Clinical Medicine, University of Cambridge, Cambridge, CB2 0SP, UK.
13
Narang P, Wilson Sayres MA. Variable Autosomal and X Divergence Near and Far from Genes Affects Estimates of Male Mutation Bias in Great Apes. Genome Biol Evol 2016; 8:3393-3405. [PMID: 27702816 PMCID: PMC5203777 DOI: 10.1093/gbe/evw232] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/19/2022]
Abstract
Male mutation bias, when more mutations are passed on via the male germline than via the female germline, is observed across mammals. One common way to infer the magnitude of male mutation bias, α, is to compare levels of neutral sequence divergence between genomic regions that spend different amounts of time in the male and female germline. For great apes, including human, we show that estimates of divergence are reduced in putatively unconstrained regions near genes relative to unconstrained regions far from genes. Divergence increases with increasing distance from genes on both the X chromosome and autosomes, but increases faster on the X chromosome than autosomes. As a result, ratios of X/A divergence increase with increasing distance from genes and corresponding estimates of male mutation bias are significantly higher in intergenic regions near genes versus far from genes. Future studies in other species will need to carefully consider the effect that genomic location will have on estimates of male mutation bias.
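The comparison this abstract describes can be made concrete with the classic relationship between X/autosome divergence and the male-to-female mutation rate ratio. The sketch below assumes neutral divergence and the standard residence times (the X spends one third of its time in males, autosomes one half), which gives X/A = 2(2 + α) / (3(1 + α)). Whether the authors applied exactly this estimator is an assumption, and the function name is hypothetical.

```python
def alpha_from_xa_ratio(xa_ratio: float) -> float:
    """Solve X/A = 2(2 + alpha) / (3(1 + alpha)) for alpha.

    The relationship only yields a finite, non-negative alpha
    when 2/3 < X/A <= 4/3.
    """
    if not (2 / 3 < xa_ratio <= 4 / 3):
        raise ValueError("X/A ratio must lie in (2/3, 4/3] for a valid alpha")
    return (4 - 3 * xa_ratio) / (3 * xa_ratio - 2)


# Hypothetical ratios: a higher X/A ratio implies a lower estimate of alpha
print(alpha_from_xa_ratio(0.80))
print(alpha_from_xa_ratio(0.90))
```

Under this relationship, an X/A ratio that rises with distance from genes translates directly into a lower estimate of α far from genes, which is the direction of the effect reported above.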
Affiliation(s)
- Pooja Narang
  - School of Life Sciences, Arizona State University, Tempe
- Melissa A Wilson Sayres
  - School of Life Sciences, Arizona State University, Tempe
  - Center for Evolution and Medicine, The Biodesign Institute, Arizona State University, Tempe
14
Kenigsberg E, Yehuda Y, Marjavaara L, Keszthelyi A, Chabes A, Tanay A, Simon I. The mutation spectrum in genomic late replication domains shapes mammalian GC content. Nucleic Acids Res 2016; 44:4222-32. [PMID: 27085808 PMCID: PMC4872117 DOI: 10.1093/nar/gkw268] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 12/02/2015] [Revised: 03/10/2016] [Accepted: 03/30/2016] [Indexed: 11/14/2022]
Abstract
Genome sequence compositions and epigenetic organizations are correlated extensively across multiple length scales. Replication dynamics, in particular, is highly correlated with GC content. We combine genome-wide time of replication (ToR) data, topological domains maps and detailed functional epigenetic annotations to study the correlations between replication timing and GC content at multiple scales. We find that the decrease in genomic GC content at large scale late replicating regions can be explained by mutation bias favoring A/T nucleotide, without selection or biased gene conversion. Quantification of the free dNTP pool during the cell cycle is consistent with a mechanism involving replication-coupled mutation spectrum that favors AT nucleotides at late S-phase. We suggest that mammalian GC content composition is shaped by independent forces, globally modulating mutation bias and locally selecting on functional element. Deconvoluting these forces and analyzing them on their native scales is important for proper characterization of complex genomic correlations.
Affiliation(s)
- Ephraim Kenigsberg
  - Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
- Yishai Yehuda
  - Department of Microbiology and Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Lisette Marjavaara
  - Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, Sweden
- Andrea Keszthelyi
  - Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, Sweden
- Andrei Chabes
  - Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, Sweden
- Amos Tanay
  - Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
- Itamar Simon
  - Department of Microbiology and Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
|
15
|
Park L. Ancestral alleles in the human genome based on population sequencing data. PLoS One 2015; 10:e0128186. [PMID: 26020928 PMCID: PMC4447449 DOI: 10.1371/journal.pone.0128186] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2015] [Accepted: 04/23/2015] [Indexed: 12/03/2022] Open
Abstract
Ancestral allele information is useful for genetics studies. Previously, the identification of ancestral alleles was primarily based on sequence alignments between species. Alternative ways to identify ancestral alleles were proposed in this study based on population sequencing data. The methods described here utilized the diversity between haplotypes harboring ancestral and newly emerged alleles. Simulations showed that these methods were reliable for identifying ancestral alleles when the variants had not aged too greatly. Application to the human genome sequencing data suggested the role of indels in maintaining the GC content in the human genome. The deletion-to-insertion ratios and GC proportions were correlated depending on the sizes of insertions and deletions in the direction of increasing GC content. There were GC-biased fixations in single base-pair insertions and AT-biased fixations in single base-pair deletions in the results based on the proposed methods. In the current study, GC-biased gene conversions in nucleotide substitutions were very slight or insignificant. In the variants of several quantitative trait loci (QTLs), slight GC-biased gene conversion was observed in nucleotide substitutions. For the QTL indels, insertions were observed more often than deletions, and deletion-biased fixation was observed, providing new insights into the evolution of functional genes.
Affiliation(s)
- Leeyoung Park
  - Natural Science Research Institute, Yonsei University, Seoul, Korea
|
16
|
Francioli LC, Polak PP, Koren A, Menelaou A, Chun S, Renkens I, van Duijn CM, Swertz M, Wijmenga C, van Ommen G, Slagboom PE, Boomsma DI, Ye K, Guryev V, Arndt PF, Kloosterman WP, de Bakker PIW, Sunyaev SR. Genome-wide patterns and properties of de novo mutations in humans. Nat Genet 2015; 47:822-826. [PMID: 25985141 PMCID: PMC4485564 DOI: 10.1038/ng.3292] [Citation(s) in RCA: 252] [Impact Index Per Article: 28.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2014] [Accepted: 04/07/2015] [Indexed: 12/12/2022]
Abstract
Mutations create variation in the population, fuel evolution, and cause genetic diseases. Current knowledge about de novo mutations is incomplete and mostly indirect [1–10]. Here, we analyze 11,020 de novo mutations from whole-genomes of 250 families. We show that de novo mutations in offspring of older fathers are not only more numerous [11–13] but also occur more frequently in early-replicating, genic regions. Functional regions exhibit higher mutation rates due to CpG dinucleotides and reveal signatures of transcription-coupled repair, while mutation clusters with a unique signature point to a novel mutational mechanism. Mutation and recombination rates independently associate with nucleotide diversity, and regional variation in human-chimpanzee divergence is only partly explained by mutation rate heterogeneity. Finally, we provide a genome-wide mutation rate map for medical and population genetics applications. Our results reveal novel insights and refine long-standing hypotheses about human mutagenesis.
Affiliation(s)
- Laurent C Francioli
  - Department of Medical Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Paz P Polak
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Amnon Koren
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Androniki Menelaou
  - Department of Medical Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Sung Chun
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Ivo Renkens
  - Department of Medical Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Morris Swertz
  - University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, The Netherlands; University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, The Netherlands
- Cisca Wijmenga
  - University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen, The Netherlands; University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, The Netherlands
- Gertjan van Ommen
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- P Eline Slagboom
  - Section of Molecular Epidemiology, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands
- Dorret I Boomsma
  - Department of Biological Psychology, VU University Amsterdam, Amsterdam, The Netherlands
- Kai Ye
  - Section of Molecular Epidemiology, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands; The Genome Institute, Washington University, St. Louis, MO, USA
- Victor Guryev
  - European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Peter F Arndt
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Wigard P Kloosterman
  - Department of Medical Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Paul I W de Bakker
  - Department of Medical Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands; Department of Epidemiology, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Shamil R Sunyaev
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
|
17
|
Liu X, Chu Z. Genome-wide evolutionary characterization and analysis of bZIP transcription factors and their expression profiles in response to multiple abiotic stresses in Brachypodium distachyon. BMC Genomics 2015. [PMID: 25887221 DOI: 10.1186/s12864-015-1457-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/26/2023] Open
Abstract
BACKGROUND Plant basic leucine zipper (bZIP) transcription factors are one of the largest and most diverse gene families and play key roles in regulating diverse stress processes. Brachypodium distachyon is emerging as a widely recognized model plant for the temperate grass family and the herbaceous energy crops, however there is no comprehensive analysis of bZIPs in B. distachyon, especially those involved in stress tolerances. RESULTS In this study, 96 bZIP genes (BdbZIPs) were identified distributing unevenly on each chromosome of B. distachyon, and most of them were scattered in the low CpG content regions. Gene duplications were widespread throughout B. distachyon genome. Evolutionary comparisons suggested B. distachyon and rice's bZIPs had the similar evolutionary patterns. The exon splicing in BdbZIP motifs were more complex and diverse than those in other plant species. We further revealed the potential close relationships between BdbZIP gene expressions and items including gene structure, exon splicing pattern and dimerization features. In addition, multiple stresses expression profile demonstrated that BdbZIPs exhibited significant expression patterns responding to 14 stresses, and those responding to heavy metal treatments showed opposite expression pattern comparing to the treatments of environmental factors and phytohormones. We also screened certain up- and down-regulated BdbZIP genes with fold changes ≥2, which were more sensitive to abiotic stress conditions. CONCLUSIONS BdbZIP genes behaved diverse functional characters and showed discrepant and some regular expression patterns in response to abiotic stresses. Comprehensive analysis indicated these BdbZIPs' expressions were associated not only with gene structure, exon splicing pattern and dimerization feature, but also with abiotic stress treatments. It is possible that our findings are crucial for revealing the potentialities of utilizing these candidate BdbZIPs to improve productivity of grass plants and cereal crops.
Affiliation(s)
- Xiang Liu
  - Shanghai Chenshan Plant Science Research Center, Shanghai Chenshan Botanical Garden, Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 3888 Chenhua Road, 201602, Shanghai, Songjiang, China.
- Zhaoqing Chu
  - Shanghai Chenshan Plant Science Research Center, Shanghai Chenshan Botanical Garden, Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 3888 Chenhua Road, 201602, Shanghai, Songjiang, China.
|
18
|
Genome-wide evolutionary characterization and analysis of bZIP transcription factors and their expression profiles in response to multiple abiotic stresses in Brachypodium distachyon. BMC Genomics 2015; 16:227. [PMID: 25887221 PMCID: PMC4393604 DOI: 10.1186/s12864-015-1457-9] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2014] [Accepted: 03/09/2015] [Indexed: 12/31/2022] Open
|
19
|
Watson CT, Steinberg KM, Graves TA, Warren RL, Malig M, Schein J, Wilson RK, Holt RA, Eichler EE, Breden F. Sequencing of the human IG light chain loci from a hydatidiform mole BAC library reveals locus-specific signatures of genetic diversity. Genes Immun 2015; 16:24-34. [PMID: 25338678 PMCID: PMC4304971 DOI: 10.1038/gene.2014.56] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2014] [Revised: 09/03/2014] [Accepted: 09/03/2014] [Indexed: 12/24/2022]
Abstract
Germline variation at immunoglobulin (IG) loci is critical for pathogen-mediated immunity, but establishing complete haplotype sequences in these regions has been problematic because of complex sequence architecture and diploid source DNA. We sequenced BAC clones from the effectively haploid human hydatidiform mole cell line, CHM1htert, across the light chain IG loci, kappa (IGK) and lambda (IGL), creating single haplotype representations of these regions. The IGL haplotype generated here is 1.25 Mb of contiguous sequence, including four novel IGLV alleles, one novel IGLC allele, and an 11.9-kb insertion. The CH17 IGK haplotype consists of two 644 kb proximal and 466 kb distal contigs separated by a large gap of unknown size; these assemblies added 49 kb of unique sequence extending into this gap. Our analysis also resulted in the characterization of seven novel IGKV alleles and a 16.7-kb region exhibiting signatures of interlocus sequence exchange between distal and proximal IGKV gene clusters. Genetic diversity in IGK/IGL was compared with that of the IG heavy chain (IGH) locus within the same haploid genome, revealing threefold (IGK) and sixfold (IGL) higher diversity in the IGH locus, potentially associated with increased levels of segmental duplication and the telomeric location of IGH.
Affiliation(s)
- C T Watson
  - Department of Biological Sciences, Simon Fraser University, Burnaby, British Columbia, Canada
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY USA
- K M Steinberg
  - Department of Genome Sciences, University of Washington, Seattle, WA USA
  - The Genome Institute, Washington University, St Louis, MO USA
- T A Graves
  - The Genome Institute, Washington University, St Louis, MO USA
- R L Warren
  - Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia Canada
- M Malig
  - Department of Genome Sciences, University of Washington, Seattle, WA USA
- J Schein
  - Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia Canada
- R K Wilson
  - The Genome Institute, Washington University, St Louis, MO USA
- R A Holt
  - Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia Canada
- E E Eichler
  - Department of Genome Sciences, University of Washington, Seattle, WA USA
  - Howard Hughes Medical Institute, Seattle, WA USA
- F Breden
  - Department of Biological Sciences, Simon Fraser University, Burnaby, British Columbia, Canada
|
20
|
Provata A, Nicolis C, Nicolis G. DNA viewed as an out-of-equilibrium structure. Phys Rev E Stat Nonlin Soft Matter Phys 2014; 89:052105. [PMID: 25353737 DOI: 10.1103/physreve.89.052105] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/14/2013] [Indexed: 05/02/2023]
Abstract
The complexity of the primary structure of human DNA is explored using methods from nonequilibrium statistical mechanics, dynamical systems theory, and information theory. A collection of statistical analyses is performed on the DNA data and the results are compared with sequences derived from different stochastic processes. The use of χ² tests shows that DNA can not be described as a low order Markov chain of order up to r=6. Although detailed balance seems to hold at the level of a binary alphabet, it fails when all four base pairs are considered, suggesting spatial asymmetry and irreversibility. Furthermore, the block entropy does not increase linearly with the block size, reflecting the long-range nature of the correlations in the human genomic sequences. To probe locally the spatial structure of the chain, we study the exit distances from a specific symbol, the distribution of recurrence distances, and the Hurst exponent, all of which show power law tails and long-range characteristics. These results suggest that human DNA can be viewed as a nonequilibrium structure maintained in its state through interactions with a constantly changing environment. Based solely on the exit distance distribution accounting for the nonequilibrium statistics and using the Monte Carlo rejection sampling method, we construct a model DNA sequence. This method allows us to keep both long- and short-range statistical characteristics of the native DNA data. The model sequence presents the same characteristic exponents as the natural DNA but fails to capture spatial correlations and point-to-point details.
Affiliation(s)
- A Provata
  - Institute of Nanoscience and Nanotechnology, National Center for Scientific Research "Demokritos", 15310 Athens, Greece and Interdisciplinary Center for Nonlinear Phenomena and Complex Systems, Université Libre de Bruxelles, Campus Plaine, CP. 231, 1050 Bruxelles, Belgium
- C Nicolis
  - Institut Royal Météorologique de Belgique, 3 Avenue Circulaire, 1180 Bruxelles, Belgium
- G Nicolis
  - Interdisciplinary Center for Nonlinear Phenomena and Complex Systems, Université Libre de Bruxelles, Campus Plaine, CP. 231, 1050 Bruxelles, Belgium
|
21
|
Schaibley VM, Zawistowski M, Wegmann D, Ehm MG, Nelson MR, St. Jean PL, Abecasis GR, Novembre J, Zöllner S, Li JZ. The influence of genomic context on mutation patterns in the human genome inferred from rare variants. Genome Res 2013; 23:1974-84. [PMID: 23990608 PMCID: PMC3847768 DOI: 10.1101/gr.154971.113] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2013] [Accepted: 08/19/2013] [Indexed: 01/22/2023]
Abstract
Understanding patterns of spontaneous mutations is of fundamental interest in studies of human genome evolution and genetic disease. Here, we used extremely rare variants in humans to model the molecular spectrum of single-nucleotide mutations. Compared to common variants in humans and human-chimpanzee fixed differences (substitutions), rare variants, on average, arose more recently in the human lineage and are less affected by the potentially confounding effects of natural selection, population demographic history, and biased gene conversion. We analyzed variants obtained from a population-based sequencing study of 202 genes in >14,000 individuals. We observed considerable variability in the per-gene mutation rate, which was correlated with local GC content, but not recombination rate. Using >20,000 variants with a derived allele frequency ≤ 10⁻⁴, we examined the effect of local GC content and recombination rate on individual variant subtypes and performed comparisons with common variants and substitutions. The influence of local GC content on rare variants differed from that on common variants or substitutions, and the differences varied by variant subtype. Furthermore, recombination rate and recombination hotspots have little effect on rare variants of any subtype, yet both have a relatively strong impact on multiple variant subtypes in common variants and substitutions. This observation is consistent with the effect of biased gene conversion or selection-dependent processes. Our results highlight the distinct biases inherent in the initial mutation patterns and subsequent evolutionary processes that affect segregating variants.
Affiliation(s)
- Valerie M. Schaibley
  - Department of Human Genetics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Matthew Zawistowski
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48019, USA
- Daniel Wegmann
  - School of Life Sciences, École Polytechnique Fédérale de Lausanne, Switzerland
- Margaret G. Ehm
  - Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, North Carolina 27709, USA
- Matthew R. Nelson
  - Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, North Carolina 27709, USA
- Pamela L. St. Jean
  - Department of Quantitative Sciences, GlaxoSmithKline (GSK), Research Triangle Park, North Carolina 27709, USA
- Gonçalo R. Abecasis
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48019, USA
- John Novembre
  - Department of Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles, California 90095, USA
- Sebastian Zöllner
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48019, USA
  - Department of Psychiatry, University of Michigan, Ann Arbor, Michigan 48019, USA
- Jun Z. Li
  - Department of Human Genetics, University of Michigan, Ann Arbor, Michigan 48109, USA
|
22
|
Sverdlov ED, Mineev K. Mutation rate in stem cells: an underestimated barrier on the way to therapy. Trends Mol Med 2013; 19:273-80. [PMID: 23481596 DOI: 10.1016/j.molmed.2013.01.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2012] [Revised: 01/15/2013] [Accepted: 01/24/2013] [Indexed: 12/23/2022]
Abstract
Stem cells (SCs) are thought to have great therapeutic potential, but due to continuously and stochastically arising new mutations that unpredictably change the composition of a cell population, the large-scale manufacturing of SCs with uniform properties and predictable behavior is a challenge. Quantitative evaluation of the characteristic mutation rate of a given stem cell line could be an important criterion in making the decision to use the line in medical practice. Such an evaluation could provide a new quality standard for newly derived human embryonic stem cell (hESC) lines prior to depositing them in stem cell banks. Here, we substantiate this view with simple calculations showing the effect of the mutation rate on changes in the cell population composition due to amplification. Selection of SCs with low mutation rate could reduce the risk of negative side effects during treatment.
Affiliation(s)
- Eugene D Sverdlov
  - Institute of Molecular Genetics, Russian Academy of Sciences, 2 Kurchatov Sq., Moscow, 123182, Russia.
|
23
|
Lartillot N. Phylogenetic patterns of GC-biased gene conversion in placental mammals and the evolutionary dynamics of recombination landscapes. Mol Biol Evol 2012; 30:489-502. [PMID: 23079417 DOI: 10.1093/molbev/mss239] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
GC-biased gene conversion (gBGC) is a major evolutionary force shaping genomic nucleotide landscapes, distorting the estimation of the strength of selection, and having potentially deleterious effects on genome-wide fitness. Yet, a global quantitative picture, at large evolutionary scale, of the relative strength of gBGC compared with selection and random drift is still lacking. Furthermore, owing to its dependence on the local recombination rate, gBGC results in modulations of the substitution patterns along genomes and across time which, if correctly interpreted, may yield quantitative insights into the long-term evolutionary dynamics of recombination landscapes. Deriving a model of the substitution process at putatively neutral nucleotide positions from population-genetics arguments, and accounting for among-lineage and among-gene effects, we propose a reconstruction of the variation in gBGC intensity at the scale of placental mammals, and of its scaling with body-size and karyotypic traits. Our results are compatible with a simple population genetics model relating gBGC to effective population size and recombination rate. In addition, among-gene variation and phylogenetic patterns of exon-specific levels of gBGC reveal the presence of rugged recombination landscapes, and suggest that short-lived recombination hot-spots are a general feature of placentals. Across placental mammals, variation in gBGC strength spans two orders of magnitude, at its lowest in apes, strongest in lagomorphs, microbats or tenrecs, and near or above the nearly neutral threshold in most other lineages. Combined with among-gene variation, such high levels of biased gene conversion are likely to significantly impact mildly selected positions, and to represent a substantial mutation load. Altogether, our analysis suggests a more important role of gBGC in placental genome evolution, compared with what could have been anticipated from studies conducted in anthropoid primates.
Affiliation(s)
- Nicolas Lartillot
  - Centre Robert-Cedergren pour la Bioinformatique, Département de Biochimie, Université de Montréal, Québec, Canada.
|
24
|
Lartillot N. Interaction between selection and biased gene conversion in mammalian protein-coding sequence evolution revealed by a phylogenetic covariance analysis. Mol Biol Evol 2012; 30:356-68. [PMID: 23024185 DOI: 10.1093/molbev/mss231] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022] Open
Abstract
According to the nearly-neutral model, variation in long-term effective population size among species should result in correlated variation in the ratio of nonsynonymous over synonymous substitution rates (dN/dS). Previous empirical investigations in mammals have been consistent with this prediction, suggesting an important role for nearly-neutral effects on protein-coding sequence evolution. GC-biased gene conversion (gBGC), on the other hand, is increasingly recognized as a major evolutionary force shaping genome nucleotide composition. When sufficiently strong compared with random drift, gBGC may significantly interfere with a nearly-neutral regime and impact dN/dS in a complex manner. Here, we investigate the phylogenetic correlations between dN/dS, the equilibrium GC composition (GC*), and several life-history and karyotypic traits in placental mammals. We show that the equilibrium GC composition decreases with body mass and increases with the number of chromosomes, suggesting a modulation of the strength of biased gene conversion due to changes in effective population size and genome-wide recombination rate. The variation in dN/dS is complex and only partially fits the prediction of the nearly-neutral theory. However, specifically restricting estimation of the dN/dS ratio on GC-conservative transversions, which are immune from gBGC, results in correlations that are more compatible with a nearly-neutral interpretation. Our investigation indicates the presence of complex interactions between selection and biased gene conversion and suggests that further mechanistic development is warranted, to tease out mutation, selection, drift, and conversion.
Affiliation(s)
- Nicolas Lartillot
  - Centre Robert-Cedergren pour la Bioinformatique, Département de Biochimie, Université de Montréal, Québec, Canada.
|
25
|
Carrigan MA, Uryasev O, Davis RP, Zhai L, Hurley TD, Benner SA. The natural history of class I primate alcohol dehydrogenases includes gene duplication, gene loss, and gene conversion. PLoS One 2012; 7:e41175. [PMID: 22859968 PMCID: PMC3409193 DOI: 10.1371/journal.pone.0041175] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2012] [Accepted: 06/18/2012] [Indexed: 01/29/2023] Open
Abstract
BACKGROUND Gene duplication is a source of molecular innovation throughout evolution. However, even with massive amounts of genome sequence data, correlating gene duplication with speciation and other events in natural history can be difficult. This is especially true in its most interesting cases, where rapid and multiple duplications are likely to reflect adaptation to rapidly changing environments and life styles. This may be so for Class I of alcohol dehydrogenases (ADH1s), where multiple duplications occurred in primate lineages in Old and New World monkeys (OWMs and NWMs) and hominoids. METHODOLOGY/PRINCIPAL FINDINGS To build a preferred model for the natural history of ADH1s, we determined the sequences of nine new ADH1 genes, finding for the first time multiple paralogs in various prosimians (lemurs, strepsirhines). Database mining then identified novel ADH1 paralogs in both macaque (an OWM) and marmoset (a NWM). These were used with the previously identified human paralogs to resolve controversies relating to dates of duplication and gene conversion in the ADH1 family. Central to these controversies are differences in the topologies of trees generated from exonic (coding) sequences and intronic sequences. CONCLUSIONS/SIGNIFICANCE We provide evidence that gene conversions are the primary source of difference, using molecular clock dating of duplications and analyses of microinsertions and deletions (micro-indels). The tree topology inferred from intron sequences appear to more correctly represent the natural history of ADH1s, with the ADH1 paralogs in platyrrhines (NWMs) and catarrhines (OWMs and hominoids) having arisen by duplications shortly predating the divergence of OWMs and NWMs. We also conclude that paralogs in lemurs arose independently. Finally, we identify errors in database interpretation as the source of controversies concerning gene conversion. These analyses provide a model for the natural history of ADH1s that posits four ADH1 paralogs in the ancestor of Catarrhine and Platyrrhine primates, followed by the loss of an ADH1 paralog in the human lineage.
Affiliation(s)
- Matthew A Carrigan
- Foundation for Applied Molecular Evolution, Gainesville, Florida, United States of America.
|
26
|
Voelker RB, Erkelenz S, Reynoso V, Schaal H, Berglund JA. Frequent gain and loss of intronic splicing regulatory elements during the evolution of vertebrates. Genome Biol Evol 2012; 4:659-74. [PMID: 22619362 PMCID: PMC3606033 DOI: 10.1093/gbe/evs051] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Abstract
Splicing regulatory elements (SREs) are sequences bound by proteins that influence splicing of nearby splice sites. Constitutively spliced introns have evolved to utilize many different splicing factors. The evolutionary processes that influenced which splicing factors are used for splicing of individual introns are generally unclear. We demonstrate that in the lineage that gave rise to mammals, many introns lost U-rich sequences and gained G-rich sequences, both of which resemble known SREs. The apparent conversion of U-rich to G-rich SREs suggests that the associated splicing factors are functionally equivalent. In support of this we demonstrated that U-rich and G-rich SREs are both capable of promoting splicing of an SRE-dependent splicing reporter. Furthermore, we demonstrate, using the heterologous MS2 tethering system (bacterial MS2 coat fusion-protein and its RNA stem-loop binding site), that both the U-rich SRE-binding protein (TIA1) and the G-rich SRE-binding protein (HNRNPF) can promote splicing of the same intron. We also observed that gain of G-rich SREs is significantly associated with G/C-rich genomic isochores, suggesting that gain or loss of SREs was driven by the same processes that ultimately resulted in the formation of mammalian genomic isochores. We propose the following model for the gain and loss of mammalian SREs. Ancestral U-rich SREs located in genomic regions that were experiencing high rates of A/T to G/C conversion would have suffered frequent deleterious mutations. However, this same process resulted in increased formation of functionally equivalent G-rich SREs, and acquisition of new G-rich SREs decreased purifying selection on the U-rich SREs, which were then free to decay.
Affiliation(s)
- Rodger B Voelker
- Institute of Molecular Biology, Department of Chemistry, University of Oregon, OR, USA
|
27
|
Ma X, Rogacheva MV, Nishant KT, Zanders S, Bustamante CD, Alani E. Mutation hot spots in yeast caused by long-range clustering of homopolymeric sequences. Cell Rep 2012; 1:36-42. [PMID: 22832106 DOI: 10.1016/j.celrep.2011.10.003] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 09/09/2011] [Revised: 09/29/2011] [Accepted: 10/21/2011] [Indexed: 11/18/2022]
Abstract
Evolutionary theory assumes that mutations occur randomly in the genome; however, studies performed in a variety of organisms indicate the existence of context-dependent mutation biases. Sources of mutagenesis variation across large genomic contexts (e.g., hundreds of bases) have not been identified. Here, we use high-coverage whole-genome sequencing of a conditional mismatch repair mutant line of diploid yeast to identify mutations that accumulated after 160 generations of growth. The vast majority of the mutations accumulated as insertion/deletions (in/dels) in homopolymeric [poly(dA:dT)] and repetitive DNA tracts. Surprisingly, the likelihood of an in/del mutation in a given poly(dA:dT) tract is increased by the presence of nearby poly(dA:dT) tracts in up to a 1,000 bp region centered on the given tract. Our work suggests that specific mutation hot spots can contribute disproportionately to the genetic variation that is introduced into populations and provides long-range genomic sequence context that contributes to mutagenesis.
Affiliation(s)
- Xin Ma
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA
|
28
|
Kelkar YD, Eckert KA, Chiaromonte F, Makova KD. A matter of life or death: how microsatellites emerge in and vanish from the human genome. Genome Res 2011; 21:2038-48. [PMID: 21994250 DOI: 10.1101/gr.122937.111] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.5] [Indexed: 11/24/2022]
Abstract
Microsatellites (tandem repeats of short DNA motifs) are abundant in the human genome and have high mutation rates. While microsatellite instability is implicated in numerous genetic diseases, the molecular processes involved in their emergence and disappearance are still not well understood. Microsatellites are hypothesized to follow a life cycle, wherein they are born and expand into adulthood, until their degradation and death. Here we identified microsatellite births/deaths in human, chimpanzee, and orangutan genomes, using macaque and marmoset as outgroups. We inferred mutations causing births/deaths based on parsimony, and investigated local genomic environments affecting them. We also studied birth/death patterns within transposable elements (Alus and L1s), coding regions, and disease-associated loci. We observed that substitutions were the predominant cause for births of short microsatellites, while insertions and deletions were important for births of longer microsatellites. Substitutions were the cause for deaths of microsatellites of virtually all lengths. AT-rich L1 sequences exhibited elevated frequency of births/deaths over their entire length, while GC-rich Alus only in their 3' poly(A) tails and middle A-stretches, with differences depending on transposable element integration timing. Births/deaths were strongly selected against in coding regions. Births/deaths occurred in genomic regions with high substitution rates, protomicrosatellite content, and L1 density, but low GC content and Alu density. The majority of the 17 disease-associated microsatellites examined are evolutionarily ancient (were acquired by the common ancestor of simians). Our genome-wide investigation of microsatellite life cycle has fundamental applications for predicting the susceptibility of birth/death of microsatellites, including many disease-causing loci.
Affiliation(s)
- Yogeshwar D Kelkar
- Department of Biology, Penn State University, University Park, Pennsylvania 16802, USA
|
29
|
Mugal CF, Ellegren H. Substitution rate variation at human CpG sites correlates with non-CpG divergence, methylation level and GC content. Genome Biol 2011; 12:R58. [PMID: 21696599 PMCID: PMC3218846 DOI: 10.1186/gb-2011-12-6-r58] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Received: 02/06/2011] [Revised: 05/04/2011] [Accepted: 06/22/2011] [Indexed: 01/08/2023]
Abstract
BACKGROUND A major goal in the study of molecular evolution is to unravel the mechanisms that induce variation in the germ line mutation rate and in the genome-wide mutation profile. The rate of germ line mutation is considerably higher for cytosines at CpG sites than for any other nucleotide in the human genome, an increase commonly attributed to cytosine methylation at CpG sites. The CpG mutation rate, however, is not uniform across the genome and, as methylation levels have recently been shown to vary throughout the genome, it has been hypothesized that methylation status may govern variation in the rate of CpG mutation. RESULTS Here, we use genome-wide methylation data from human sperm cells to investigate the impact of DNA methylation on the CpG substitution rate in introns of human genes. We find that there is a significant correlation between the extent of methylation and the substitution rate at CpG sites. Further, we show that the CpG substitution rate is positively correlated with non-CpG divergence, suggesting susceptibility to factors responsible for the general mutation rate in the genome, and negatively correlated with GC content. We only observe a minor contribution of gene expression level, while recombination rate appears to have no significant effect. CONCLUSIONS Our study provides the first direct empirical support for the hypothesis that variation in the level of germ line methylation contributes to substitution rate variation at CpG sites. Moreover, we show that other genomic features also impact on CpG substitution rate variation.
Affiliation(s)
- Carina F Mugal
- Department of Evolutionary Biology, Uppsala University, Norbyvägen 18D, Uppsala, Sweden
|
30
|
Ying H, Huttley G. Exploiting CpG hypermutability to identify phenotypically significant variation within human protein-coding genes. Genome Biol Evol 2011; 3:938-49. [PMID: 21398426 PMCID: PMC3184784 DOI: 10.1093/gbe/evr021] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 12/11/2022]
Abstract
The CpG dinucleotide is disproportionately represented in human genetic variation due to the hypermutability of 5-methyl-cytosine (5mC). We exploit this hypermutability and a novel codon substitution model to identify candidate functionally important exonic nucleotides. Population genetic theory suggests that codon positions with high cross-species CpG frequency will derive from stronger purifying selection. Using the phylogeny-based maximum likelihood inference framework, we applied codon substitution models with context-dependent parameters to measure the mutagenic and selective processes affecting CpG dinucleotides within exonic sequence. The suitability of these models was validated on >2,000 protein coding genes from a naturally occurring biological control, four yeast species that do not methylate their DNA. As expected, our analyses of yeast revealed no evidence for an elevated CpG transition rate or for substitution suppression affecting CpG-containing codons. Our analyses of >12,000 protein-coding genes from four primate lineages confirm the systemic influence of 5mC hypermutability on the divergence of these genes. After adjusting for confounding influences of mutation and the properties of the encoded amino acids, we confirmed that CpG-containing codons are under greater purifying selection in primates. Genes with significant evidence of enhanced suppression of nonsynonymous CpG changes were also shown to be significantly enriched in Online Mendelian Inheritance in Man. We developed a method for ranking candidate phenotypically influential CpG positions in human genes. Application of this method indicates that of the ∼1 million exonic CpG dinucleotides within humans, ∼20% are strong candidates for both hypermutability and disease association.
Affiliation(s)
- Hua Ying
- Department of Genome Biology, John Curtin School of Medical Research, The Australian National University, Canberra, ACT 0200, Australia
|
31
|
Clément Y, Arndt PF. Substitution patterns are under different influences in primates and rodents. Genome Biol Evol 2011; 3:236-45. [PMID: 21339508 PMCID: PMC3068003 DOI: 10.1093/gbe/evr011] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022]
Abstract
There are large-scale variations of the GC-content along mammalian chromosomes that have been called isochore structures. Primates and rodents have different isochore structures, which suggests that these lineages exhibit different modes of GC-content evolution. It has been shown that, in the human lineage, GC-biased gene conversion (gBGC), a neutral process associated with meiotic recombination, acts on GC-content evolution by influencing A or T to G or C substitution rates. We computed genome-wide substitution patterns in the mouse lineage from multiple alignments and compared them with substitution patterns in the human lineage. We found that in the mouse lineage, gBGC is active but weaker than in the human lineage and that male-specific recombination better predicts GC-content evolution than female-specific recombination. Furthermore, we were able to show that G or C to A or T substitution rates are predicted by a combination of different factors in both lineages. A or T to G or C substitution rates are most strongly predicted by meiotic recombination in the human lineage but by CpG odds ratio (the observed CpG frequency normalized by the expected CpG frequency) in the mouse lineage, suggesting that substitution patterns are under different influences in primates and rodents.
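The CpG odds ratio mentioned in this abstract (observed CpG frequency normalized by expected CpG frequency) can be illustrated with a minimal sketch. The abstract does not state the exact normalization, so the code below assumes the common convention in which the expected frequency is the product of the C and G base frequencies; the function name `cpg_odds_ratio` is hypothetical and not taken from the paper.

```python
def cpg_odds_ratio(seq: str) -> float:
    """Observed/expected CpG ratio for one DNA sequence.

    Assumption (not from the source): expected CpG frequency is
    f(C) * f(G), i.e. what independent neighbouring bases would give.
    """
    s = seq.upper()
    n = len(s)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")

    c_count = s.count("C")
    g_count = s.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # no CpG is possible, so report zero

    # "CG" cannot overlap itself, so str.count gives the true dinucleotide count.
    cpg_count = s.count("CG")

    observed = cpg_count / (n - 1)              # frequency among the n-1 dinucleotides
    expected = (c_count / n) * (g_count / n)    # product of base frequencies
    return observed / expected


if __name__ == "__main__":
    for example in ("CGCG", "CCGG", "GGCC"):
        print(example, round(cpg_odds_ratio(example), 3))
```

Values well below 1 indicate CpG depletion relative to base composition; in practice the ratio would be computed per genomic window rather than on short strings like these.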
Affiliation(s)
- Yves Clément
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany.
|
32
|
Bhati J, Sonah H, Jhang T, Singh NK, Sharma TR. Comparative Analysis and EST Mining Reveals High Degree of Conservation among Five Brassicaceae Species. Comp Funct Genomics 2010; 2010:520238. [PMID: 20886055 PMCID: PMC2945637 DOI: 10.1155/2010/520238] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 01/30/2010] [Revised: 05/21/2010] [Accepted: 07/11/2010] [Indexed: 11/23/2022]
Abstract
Brassicaceae is an important family of the plant kingdom which includes several plants of major economic importance. The Brassica spp. and Arabidopsis share much-conserved colinearity between their genomes which can be exploited for the genomic research in Brassicaceae crops. In this study, 131,286 ESTs of five Brassicaceae species were assembled into unigene contigs and compared with Arabidopsis gene indices. Almost all the unigenes of Brassicaceae species showed high similarities with Arabidopsis genes except those of B. napus, where 90% of unigenes were found similar. A total of 9,699 SSRs were identified in the unigenes. PCR primers were designed based on this information and amplified across species for validation. Functional annotation of unigenes showed that the majority of the genes are present in metabolism and energy functional classes. It is expected that comparative genome analysis between Arabidopsis and related crop species will expedite research in the more complex Brassica genomes. This would be helpful for genomics as well as evolutionary studies, and DNA markers developed can be used for mapping, tagging, and cloning of important genes in Brassicaceae.
Affiliation(s)
- Jyotika Bhati
- Genoinformatics Laboratory, National Research Centre on Plant Biotechnology, Pusa Campus (IARI), New Delhi 110012, India
- Humira Sonah
- Genoinformatics Laboratory, National Research Centre on Plant Biotechnology, Pusa Campus (IARI), New Delhi 110012, India
- Tripta Jhang
- Genoinformatics Laboratory, National Research Centre on Plant Biotechnology, Pusa Campus (IARI), New Delhi 110012, India
- Nagender Kumar Singh
- Genoinformatics Laboratory, National Research Centre on Plant Biotechnology, Pusa Campus (IARI), New Delhi 110012, India
- Tilak Raj Sharma
- Genoinformatics Laboratory, National Research Centre on Plant Biotechnology, Pusa Campus (IARI), New Delhi 110012, India
|
33
|
Lin JY, Stupar RM, Hans C, Hyten DL, Jackson SA. Structural and functional divergence of a 1-Mb duplicated region in the soybean (Glycine max) genome and comparison to an orthologous region from Phaseolus vulgaris. Plant Cell 2010; 22:2545-61. [PMID: 20729383 PMCID: PMC2947175 DOI: 10.1105/tpc.110.074229] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 01/22/2010] [Revised: 07/21/2010] [Accepted: 07/30/2010] [Indexed: 05/03/2023]
Abstract
Soybean (Glycine max) has undergone at least two rounds of polyploidization, resulting in a paleopolyploid genome that is a mosaic of homoeologous regions. To determine the structural and functional impact of these duplications, we sequenced two ~1-Mb homoeologous regions of soybean, Gm8 and Gm15, derived from the most recent ~13 million year duplication event and the orthologous region from common bean (Phaseolus vulgaris), Pv5. We observed inversions leading to major structural variation and a bias between the two chromosome segments as Gm15 experienced more gene movement (gene retention rate of 81% in Gm15 versus 91% in Gm8) and a nearly twofold increase in the deletion of long terminal repeat (LTR) retrotransposons via solo LTR formation. Functional analyses of Gm15 and Gm8 revealed decreases in gene expression and synonymous substitution rates for Gm15, for instance, a 38% increase in transcript levels from Gm8 relative to Gm15. Transcriptional divergence of homoeologs was found based on expression patterns among seven tissues and developmental stages. Our results indicate asymmetric evolution between homoeologous regions of soybean as evidenced by structural changes and expression variances of homoeologous genes.
Affiliation(s)
- Jer-Young Lin
- Molecular and Evolutionary Genetics, Purdue University, West Lafayette, Indiana 47907
- Robert M. Stupar
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, Minnesota 55108
- Christian Hans
- Molecular and Evolutionary Genetics, Purdue University, West Lafayette, Indiana 47907
- David L. Hyten
- Soybean Genomics and Improvement Lab, U.S. Department of Agriculture–Agricultural Research Service, Beltsville, Maryland 20705
- Scott A. Jackson
- Molecular and Evolutionary Genetics, Purdue University, West Lafayette, Indiana 47907
|
34
|
Detection of heterozygous mutations in the genome of mismatch repair defective diploid yeast using a Bayesian approach. Genetics 2010; 186:493-503. [PMID: 20660644 DOI: 10.1534/genetics.110.120105] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022]
Abstract
DNA replication errors that escape polymerase proofreading and mismatch repair (MMR) can lead to base substitution and frameshift mutations. Such mutations can disrupt gene function, reduce fitness, and promote diseases such as cancer and are also the raw material of molecular evolution. To analyze with limited bias genomic features associated with DNA polymerase errors, we performed a genome-wide analysis of mutations that accumulate in MMR-deficient diploid lines of Saccharomyces cerevisiae. These lines were derived from a common ancestor and were grown for 160 generations, with bottlenecks reducing the population to one cell every 20 generations. We sequenced to between 8- and 20-fold coverage one wild-type and three mutator lines using Illumina Solexa 36-bp reads. Using an experimentally aware Bayesian genotype caller developed to pool experimental data across sequencing runs for all strains, we detected 28 heterozygous single-nucleotide polymorphisms (SNPs) and 48 single-nt insertion/deletions (indels) from the data set. This method was evaluated on simulated data sets and found to have a very low false-positive rate (∼6 × 10⁻⁵) and a false-negative rate of 0.08 within the unique mapping regions of the genome that contained at least sevenfold coverage. The heterozygous mutations identified by the Bayesian genotype caller were confirmed by Sanger sequencing. All of the mutations were unique to a given line, except for a single-nt deletion mutation which occurred independently in two lines. All 48 indels, composed of 46 deletions and two insertions, occurred in homopolymer (HP) tracts [i.e., 47 poly(A) or (T) tracts, 1 poly(G) or (C) tract] between 5 and 13 bp long. Our findings are of interest because HP tracts are present at high levels in the yeast genome (>77,400 for 5- to 20-nt HP tracts), and frameshift mutations in these regions are likely to disrupt gene function.
In addition, they demonstrate that the mutation pattern seen previously in mismatch repair defective strains using a limited number of reporters holds true for the entire genome.
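As a rough illustration of what counting homopolymer (HP) tracts of 5 to 20 nt involves, the following is a minimal sketch and not the authors' pipeline. The function name `homopolymer_tracts` and its default length bounds are assumptions chosen to match the range quoted in the abstract; it scans a single strand only and does not merge poly(A) with poly(T) or poly(G) with poly(C).

```python
import re
from typing import List, Tuple

# One or more repeats of the same unambiguous base.
_RUN = re.compile(r"([ACGT])\1*")


def homopolymer_tracts(
    seq: str, min_len: int = 5, max_len: int = 20
) -> List[Tuple[int, str, int]]:
    """Return (start, base, length) for each single-base run whose
    length lies in [min_len, max_len]. Positions are 0-based."""
    tracts = []
    for match in _RUN.finditer(seq.upper()):
        length = match.end() - match.start()
        if min_len <= length <= max_len:
            tracts.append((match.start(), match.group(1), length))
    return tracts


if __name__ == "__main__":
    example = "GGAAAAATTCCCCCCG"
    for start, base, length in homopolymer_tracts(example):
        print(f"poly({base}) tract of {length} nt at position {start}")
```

Runs longer than `max_len` are skipped rather than truncated, which is one possible reading of a "5- to 20-nt" tract count; a different convention would change the totals.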
|
35
|
Cooper DN, Ball EV, Mort M. Chromosomal distribution of disease genes in the human genome. Genet Test Mol Biomarkers 2010; 14:441-6. [PMID: 20642358 DOI: 10.1089/gtmb.2010.0081] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
Genes are nonrandomly distributed in the human genome, both within and between chromosomes. Thus, genes of similar function and common evolutionary origin are often clustered, as are genes with similar expression profiles. We now report that the >2400 genes known to underlie human monogenic inherited disease are non-randomly distributed in the genome over and above the general nonrandomness evident in the distribution of human genes. Further, a subset of 315 inherited disease genes subject to gross deletion was found to exhibit a degree of clustering that was twice that manifested by disease genes in general. The clustering of human disease genes is likely to have important implications for understanding the genotype-phenotype relationship in contiguous gene syndromes as well as those conditions characterized by multigene deletions or complex chromosomal rearrangements.
Affiliation(s)
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, United Kingdom.
|
36
|
Mugal CF, Wolf JBW, von Grünberg HH, Ellegren H. Conservation of neutral substitution rate and substitutional asymmetries in mammalian genes. Genome Biol Evol 2010; 2:19-28. [PMID: 20333222 PMCID: PMC2839347 DOI: 10.1093/gbe/evp056] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Accepted: 12/22/2009] [Indexed: 12/21/2022]
Abstract
Local variation in neutral substitution rate across mammalian genomes is governed by several factors, including sequence context variables and structural variables. In addition, the interplay of replication and transcription, known to induce a strand bias in mutation rate, gives rise to variation in substitutional strand asymmetries. Here, we address the conservation of variation in mutation rate and substitutional strand asymmetries using primate- and rodent-specific repeat elements located within the introns of protein-coding genes. We find significant but weak conservation of local mutation rates between human and mouse orthologs. Likewise, substitutional strand asymmetries are conserved between human and mouse, where substitution rate asymmetries show a higher degree of conservation than mutation rate. Moreover, we provide evidence that replication and transcription are correlated to the strength of substitutional asymmetries. The effect of transcription is particularly visible for genes with highly conserved gene expression. In comparison with replication and transcription, mutation rate influences the strength of substitutional asymmetries only marginally.
Affiliation(s)
- C F Mugal
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden.
|
37
|
Oldmeadow C, Mengersen K, Mattick JS, Keith JM. Multiple evolutionary rate classes in animal genome evolution. Mol Biol Evol 2009; 27:942-53. [PMID: 19955480 DOI: 10.1093/molbev/msp299] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 12/19/2022]
Abstract
The proportion of functional sequence in the human genome is currently a subject of debate. The most widely accepted figure is that approximately 5% is under purifying selection. In Drosophila, estimates are an order of magnitude higher, though this corresponds to a similar quantity of sequence. These estimates depend on the difference between the distribution of genomewide evolutionary rates and that observed in a subset of sequences presumed to be neutrally evolving. Motivated by the widening gap between these estimates and experimental evidence of genome function, especially in mammals, we developed a sensitive technique for evaluating such distributions and found that they are much more complex than previously apparent. We found strong evidence for at least nine well-resolved evolutionary rate classes in an alignment of four Drosophila species and at least seven classes in an alignment of four mammals, including human. We also identified at least three rate classes in human ancestral repeats. By positing that the largest of these ancestral repeat classes is neutrally evolving, we estimate that the proportion of nonneutrally evolving sequence is 30% of human ancestral repeats and 45% of the aligned portion of the genome. However, we also question whether any of the classes represent neutrally evolving sequences and argue that a plausible alternative is that they reflect variable structure-function constraints operating throughout the genomes of complex organisms.
Affiliation(s)
- Christopher Oldmeadow
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, QLD, Australia
|
38
|
Abstract
High-throughput DNA analyses are increasingly being used to detect rare mutations in moderately sized genomes. These methods have yielded genome mutation rates that are markedly higher than those obtained using pre-genomic strategies. Recent work in a variety of organisms has shown that mutation rate is strongly affected by sequence context and genome position. These observations suggest that high-throughput DNA analyses will ultimately allow researchers to identify trans-acting factors and cis sequences that underlie mutation rate variation. Such work should provide insights on how mutation rate variability can impact genome organization and disease progression.
Affiliation(s)
- Koodali T Nishant
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853-2703, USA
|
39
|
Duret L, Galtier N. Biased gene conversion and the evolution of mammalian genomic landscapes. Annu Rev Genomics Hum Genet 2009; 10:285-311. [PMID: 19630562 DOI: 10.1146/annurev-genom-082908-150001] [Citation(s) in RCA: 468] [Impact Index Per Article: 31.2] [Indexed: 11/09/2022]
Abstract
Recombination is typically thought of as a symmetrical process resulting in large-scale reciprocal genetic exchanges between homologous chromosomes. Recombination events, however, are also accompanied by short-scale, unidirectional exchanges known as gene conversion in the neighborhood of the initiating double-strand break. A large body of evidence suggests that gene conversion is GC-biased in many eukaryotes, including mammals and human. AT/GC heterozygotes produce more GC- than AT-gametes, thus conferring a population advantage to GC-alleles in high-recombining regions. This apparently unimportant feature of our molecular machinery has major evolutionary consequences. Structurally, GC-biased gene conversion explains the spatial distribution of GC-content in mammalian genomes, the so-called isochore structure. Functionally, GC-biased gene conversion promotes the segregation and fixation of deleterious AT→GC mutations, thus increasing our genomic mutation load. Here we review the recent evidence for a GC-biased gene conversion process in mammals, and its consequences for genomic landscapes, molecular evolution, and human functional genomics.
Affiliation(s)
- Laurent Duret
- Université de Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Evolutive, F-69622, Villeurbanne, France.
|
40
|
Adelson DL, Raison JM, Edgar RC. Characterization and distribution of retrotransposons and simple sequence repeats in the bovine genome. Proc Natl Acad Sci U S A 2009; 106:12855-60. [PMID: 19625614 PMCID: PMC2722308 DOI: 10.1073/pnas.0901282106] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.9] [Received: 02/05/2009] [Indexed: 12/11/2022]
Abstract
Interspersed repeat composition and distribution in mammals have been best characterized in the human and mouse genomes. The bovine genome contains typical eutherian mammal repeats, but also has a significant number of long interspersed nuclear element RTE (BovB) elements proposed to have been horizontally transferred from Squamata. Our analysis of the BovB repeats has indicated that only a few of them are currently likely to retrotranspose in cattle. However, bovine L1 repeats (L1 BT) have many likely active copies. Comparison of substitution rates for BovB and L1 BT indicates that L1 BT is a younger repeat family than BovB. In contrast to mouse and human, L1 occurrence is not negatively correlated with G+C content. However, BovB, Bov A2, ART2A, and Bov-tA are negatively correlated with G+C, although Bov-tA's correlation is weaker. Also, by performing genome-wide correlation analysis of interspersed and simple sequence repeats, we have identified genome territories by repeat content that appear to define ancestral vs. ruminant-specific genomic regions. These ancestral regions, enriched with L2 and MIR repeats, are largely conserved between bovine and human.
Affiliation(s)
- David L Adelson
- School of Molecular and Biomedical Science, University of Adelaide, North Terrace, Adelaide, South Australia, 5005, Australia.
|
41
|
Zhang Z, Townsend JP. Maximum-likelihood model averaging to profile clustering of site types across discrete linear sequences. PLoS Comput Biol 2009; 5:e1000421. [PMID: 19557160 PMCID: PMC2695770 DOI: 10.1371/journal.pcbi.1000421] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 01/12/2009] [Accepted: 05/21/2009] [Indexed: 11/19/2022]
Abstract
A major analytical challenge in computational biology is the detection and description of clusters of specified site types, such as polymorphic or substituted sites within DNA or protein sequences. Progress has been stymied by a lack of suitable methods to detect clusters and to estimate the extent of clustering in discrete linear sequences, particularly when there is no a priori specification of cluster size or cluster count. Here we derive and demonstrate a maximum likelihood method of hierarchical clustering. Our method incorporates a tripartite divide-and-conquer strategy that models sequence heterogeneity, delineates clusters, and yields a profile of the level of clustering associated with each site. The clustering model may be evaluated via model selection using the Akaike Information Criterion, the corrected Akaike Information Criterion, and the Bayesian Information Criterion. Furthermore, model averaging using weighted model likelihoods may be applied to incorporate model uncertainty into the profile of heterogeneity across sites. We evaluated our method by examining its performance on a number of simulated datasets as well as on empirical polymorphism data from diverse natural alleles of the Drosophila alcohol dehydrogenase gene. Our method yielded greater power for the detection of clustered sites across a breadth of parameter ranges, and achieved better accuracy and precision of estimation of clusters, than did the existing empirical cumulative distribution function statistics.
Affiliation(s)
- Zhang Zhang
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
- Jeffrey P. Townsend
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
|
42
|
Imamura H, Karro JE, Chuang JH. Weak preservation of local neutral substitution rates across mammalian genomes. BMC Evol Biol 2009; 9:89. [PMID: 19416516 PMCID: PMC2689173 DOI: 10.1186/1471-2148-9-89] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 11/20/2008] [Accepted: 05/05/2009] [Indexed: 01/06/2023]
Abstract
Background: The rate at which neutral (non-functional) bases undergo substitution is highly dependent on their location within a genome. However, it is not clear how fast these location-dependent rates change, or to what extent the substitution rate patterns are conserved between lineages. To address this question, which is critical not only for understanding the substitution process but also for evaluating phylogenetic footprinting algorithms, we examine ancestral repeats: a predominantly neutral dataset with a significantly higher genomic density than other datasets commonly used to study substitution rate variation. Using this repeat data, we measure the extent to which orthologous ancestral repeat sequences exhibit similar substitution patterns in separate mammalian lineages, allowing us to ascertain how well local substitution rates have been preserved across species.
Results: We calculated substitution rates for each ancestral repeat in each of three independent mammalian lineages (primate, from human/macaque alignments; rodent, from mouse/rat alignments; and laurasiatheria, from dog/cow alignments). We then measured the correlation of local substitution rates among these lineages. Overall we found the correlations between lineages to be statistically significant, but too weak to have much predictive power (r² < 5%). These correlations were found to be primarily driven by regional effects at the scale of several hundred kb or larger. A few repeat classes (e.g. 7SK, Charlie8, and MER121) also exhibited stronger conservation of rate patterns, likely due to the effect of repeat-specific purifying selection. These classes should be excluded when estimating local neutral substitution rates.
Conclusion: Although local neutral substitution rates have some correlations among mammalian species, these correlations have little predictive power on the scale of individual repeats. This indicates that local substitution rates have changed significantly among the lineages we have studied, and are likely to have changed even more for more diverged lineages. The correlations that do persist are too weak to be responsible for many of the highly conserved elements found by phylogenetic footprinting algorithms, leading us to conclude that such elements must be conserved due to selective forces.
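To make the "statistically significant but weakly predictive" result concrete, the sketch below computes the squared Pearson correlation between per-repeat substitution rates from two lineages. It is a minimal illustration assuming the rates are already available as two equal-length lists (one value per orthologous repeat); it is not the authors' pipeline, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def r_squared(x, y):
    """Fraction of variance in one lineage's rates explained linearly by the other."""
    return pearson_r(x, y) ** 2
```

An r² below 0.05 means that knowing a repeat's rate in one lineage explains under 5% of the variance of its rate in the other, even if the correlation is significant thanks to a very large number of repeats.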
Affiliation(s)
- Hideo Imamura
- Boston College, Department of Biology, Chestnut Hill, MA 02467, USA.
|
43
|
Abstract
Why are some genomic positions more mutable than others? The identification of cryptic mutation hotspots in the human genome indicates that the determinants of mutation rates are more complex than anticipated.
Affiliation(s)
- Laurent Duret
- Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, France.
|
44
|
Singh ND, Arndt PF, Clark AG, Aquadro CF. Strong evidence for lineage and sequence specificity of substitution rates and patterns in Drosophila. Mol Biol Evol 2009; 26:1591-605. [PMID: 19351792 DOI: 10.1093/molbev/msp071] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Indexed: 12/29/2022]
Abstract
Rates of single nucleotide substitution in Drosophila are highly variable within the genome, and several examples illustrate that evolutionary rates differ among Drosophila species as well. Here, we use a maximum likelihood method to quantify lineage-specific substitutional patterns and apply this method to 4-fold degenerate synonymous sites and introns from more than 8,000 genes aligned in the Drosophila melanogaster group. We find that within species, different classes of sequence evolve at different rates, with long introns evolving most slowly and short introns evolving most rapidly. Relative rates of individual single nucleotide substitutions vary approximately 3-fold among lineages, yielding patterns of substitution that are comparatively less GC-biased in the melanogaster species complex relative to Drosophila yakuba and Drosophila erecta. These results are consistent with a model coupling a mutational shift toward reduced GC content, or a shift in mutation-selection balance, in the D. melanogaster species complex, with variation in selective constraint among different classes of DNA sequence. Finally, base composition of coding and intronic sequences is not at equilibrium with respect to substitutional patterns, which primarily reflects the slow rate of the substitutional process. These results thus support the view that mutational and/or selective processes are labile on an evolutionary timescale and that if the process is indeed selection driven, then the distribution of selective constraint is variable across the genome.
Affiliation(s)
- Nadia D Singh
- Department of Molecular Biology and Genetics, Cornell University.
|
45
|
Jeffreys AJ, Neumann R. The rise and fall of a human recombination hot spot. Nat Genet 2009; 41:625-9. [PMID: 19349985 PMCID: PMC2678279 DOI: 10.1038/ng.346] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Received: 11/04/2008] [Accepted: 01/16/2009] [Indexed: 12/26/2022]
Abstract
Human meiotic crossovers mainly cluster into narrow hot spots that profoundly influence patterns of haplotype diversity and that may also affect genome instability and sequence evolution. Hot spots also seem to be ephemeral, but processes of hot-spot activation and their subsequent evolutionary dynamics remain unknown. We now analyze the life cycle of a recombination hot spot. Sperm typing revealed a polymorphic hot spot that was activated in cis by a single base change, providing evidence for a primary sequence determinant necessary, though not sufficient, to activate recombination. This activating mutation occurred roughly 70,000 years ago and has persisted to the present, most likely fortuitously through genetic drift despite its systematic elimination by biased gene conversion. Nonetheless, this self-destructive conversion will eventually lead to hot-spot extinction. These findings define a subclass of highly transient hot spots and highlight the importance of understanding hot-spot turnover and how it influences haplotype diversity.
|
46
|
Phillips N, Salomon M, Custer A, Ostrow D, Baer CF. Spontaneous mutational and standing genetic (co)variation at dinucleotide microsatellites in Caenorhabditis briggsae and Caenorhabditis elegans. Mol Biol Evol 2008; 26:659-69. [PMID: 19109257 DOI: 10.1093/molbev/msn287] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 01/04/2023]
Abstract
Understanding the evolutionary processes responsible for shaping genetic variation within and between species requires separating the effects of mutation and selection. Differences between the patterns of genetic variation observed in nature and when mutations are allowed to accumulate in the relative absence of selection can reveal biases imposed by selection. We characterize the genetic variation at dinucleotide microsatellite repeats in four sets of 250-generation mutation accumulation (MA) lines, two in the species Caenorhabditis briggsae and two in Caenorhabditis elegans, and compare the mutational variation with the standing variation in those species. We also compare the mutational properties of microsatellites with the cumulative effects of mutations on fitness in the same lines. Integrated over the whole genome, we infer that the mutation rate of C. briggsae is about twice that of C. elegans, consistent with the cumulative mutational effects on fitness. The mutational spectrum (ratio of insertions to deletions) differs between repeat types and, in some cases, between species. The per-locus mutation rate is significantly positively correlated with the standing genetic variation at the same locus in both species, providing justification for the common practice of using the standing genetic variance as a surrogate for the mutation rate.
|
47
|
Kim J, Sanderson MJ. Penalized likelihood phylogenetic inference: bridging the parsimony-likelihood gap. Syst Biol 2008; 57:665-74. [PMID: 18853355 DOI: 10.1080/10635150802422274] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Indexed: 10/21/2022]
Abstract
The increasing diversity and heterogeneity of molecular data for phylogeny estimation has led to the development of complex models and model-based estimators. Here, we propose a penalized likelihood (PL) framework in which the levels of complexity in the underlying model can be smoothly controlled. We demonstrate the PL framework for a four-taxon tree case and investigate its properties. The PL framework yields an estimator in which the majority of currently employed estimators, such as the maximum-parsimony estimator, homogeneous likelihood estimator, and gamma mixture likelihood estimator, become special cases of a single family of PL estimators. Furthermore, using the appropriate penalty function, the complexity of the underlying models can be partitioned into separately controlled classes, allowing flexible control of model complexity.
Affiliation(s)
- Junhyong Kim
- Department of Biology and Penn Genome Frontiers Institute, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
|
48
|
Peifer M, Karro JE, von Grünberg HH. Is there an acceleration of the CpG transition rate during the mammalian radiation? Bioinformatics 2008; 24:2157-64. [PMID: 18662928 PMCID: PMC2553435 DOI: 10.1093/bioinformatics/btn391] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 04/11/2008] [Revised: 07/27/2008] [Accepted: 07/27/2008] [Indexed: 11/13/2022]
Abstract
Motivation: In this article we build a model of the CpG dinucleotide substitution rate and use it to challenge the claim that this rate underwent a sudden mammalian-specific increase approximately 90 million years ago. The evidence supporting this hypothesis comes from the application of a model of neutral substitution rates able to account for elevated CpG dinucleotide substitution rates. With the initial goal of improving that model's accuracy, we introduced a modification that accounts for boundary effects arising from the truncation of the Markov field, and we improved the optimization procedure required for estimating the substitution rates.
Results: When we used this modified method to reproduce the supporting analysis, the evidence of the rate shift vanished. Our analysis suggests that the CpG-specific rate has been constant over the relevant time period and that the asserted acceleration of the CpG rate is likely an artifact of the original model.
Affiliation(s)
- M Peifer
- Institute of Chemistry, Karl-Franzens University Graz, Graz, Austria.
|
49
|
|
50
|
Analysis of transposon interruptions suggests selection for L1 elements on the X chromosome. PLoS Genet 2008; 4:e1000172. [PMID: 18769724 PMCID: PMC2517846 DOI: 10.1371/journal.pgen.1000172] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 02/26/2008] [Accepted: 07/17/2008] [Indexed: 01/02/2023]
Abstract
It has been hypothesised that the massive accumulation of L1 transposable elements on the X chromosome is due to their function in X inactivation, and that the accumulation of Alu elements near genes is adaptive. We tested the possible selective advantage of these two transposable element (TE) families with a novel method, interruption analysis. In mammalian genomes, a large number of TEs interrupt other TEs due to the high overall abundance and age of repeats, and these interruptions can be used to test whether TEs are selectively neutral. Interruptions of TEs that are beneficial for the host are expected to be deleterious, and therefore underrepresented compared with interruptions of neutral TEs. We found that L1 elements in the regions of the X chromosome that contain the majority of the inactivated genes are significantly less frequently interrupted than on the autosomes, while L1s near genes that escape inactivation are interrupted with higher frequency, supporting the hypothesis that L1s on the X chromosome play a role in its inactivation. In addition, we show that TEs are less frequently interrupted in introns than in intergenic regions, probably due to selection against the expansion of introns, but the insertion pattern of Alus is comparable to other repeats.
Recent experimental findings (for example the ENCODE project) show that many functional non-coding regions of genomes are not conserved across species, making the in-silico discovery of such regions challenging. Transposable elements (TEs), which represent 45 percent of the human genome and typically show no sequence conservation, are particularly intriguing from this point of view, because the highly nonrandom genomic distribution of many TE families has led to hypotheses that their presence is adaptive and that they have an epigenetic (regulatory) function. We use a novel approach that does not rely on sequence conservation, based on the analysis of interrupted TEs, to investigate whether repeats are under selection. L1 elements, the most active transposable elements of the human genome, are highly overrepresented on the X chromosome and were suggested to enhance its inactivation in mammals. We find that the interruption pattern of L1 repeats indicates a function for L1 elements in the inactivation of the mammalian X chromosome. Additionally, we show that a considerable fraction of TEs in introns are under selection for integrity, possibly due to selection on intron size or on TEs themselves.
|