1
Gene tree species tree reconciliation with gene conversion. J Math Biol 2019; 78:1981-2014. [PMID: 30767052 DOI: 10.1007/s00285-019-01331-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/04/2017] [Revised: 10/03/2018] [Indexed: 01/19/2023]
Abstract
Gene tree/species tree reconciliation is a recent and decisive advance in phylogenetic methods, accounting for possible differences between gene histories and species histories. Reconciliation consists of explaining these differences by gene-scale events such as duplication, loss, and transfer, which translates mathematically into a mapping between gene tree nodes and species tree nodes or branches. Gene conversion is a frequent and important evolutionary event, which results in the replacement of a gene by a copy of another gene from the same species and in the same gene tree. Including this event in reconciliation models has never been attempted because it introduces a dependency between lineages, and standard algorithms based on dynamic programming become ineffective. We propose here a novel mathematical framework including gene conversion as an evolutionary event in gene tree/species tree reconciliation. We describe a randomized algorithm that finds, in polynomial running time, a reconciliation minimizing the number of duplications, losses and conversions in the case when their weights are equal. We show that the space of optimal reconciliations includes an analog of the last common ancestor reconciliation, but is not limited to it. Our algorithm outputs any optimal reconciliation with nonzero probability. We argue that this study opens a research avenue on including gene conversion in reconciliation, and discuss its possible importance in biology.
2
Ramanauskas K, Igić B. The evolutionary history of plant T2/S-type ribonucleases. PeerJ 2017; 5:e3790. [PMID: 28924504 PMCID: PMC5598434 DOI: 10.7717/peerj.3790] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 05/22/2017] [Accepted: 08/18/2017] [Indexed: 12/22/2022]
Abstract
A growing number of T2/S-RNases are being discovered in plant genomes. Members of this protein family have a variety of known functions, but the vast majority are still uncharacterized. We present data and analyses of phylogenetic relationships among T2/S-RNases, and pay special attention to the group that contains the female component of the most widespread system of self-incompatibility in flowering plants. The returned emphasis on the initially identified component of this mechanism yields important conjectures about its evolutionary context. First, we find that the clade involved in self-rejection (class III) is found exclusively in core eudicots, while the remaining clades contain members from other vascular plants. Second, certain features, such as intron patterns, isoelectric point, and conserved amino acid regions, help differentiate S-RNases, which are necessary for expression of self-incompatibility, from other T2/S-RNase family members. Third, we devise and present a set of approaches to clarify new S-RNase candidates from existing genome assemblies. We use genomic features to identify putative functional and relictual S-loci in genomes of plants with unknown mechanisms of self-incompatibility. The widespread occurrence of possible relicts suggests that the loss of functional self-incompatibility may leave traces long after the fact, and that this manner of molecular fossil-like data could be an important source of information about the history and distribution of both RNase-based and other mechanisms of self-incompatibility. Finally, we release a public resource intended to aid the search for S-locus RNases, and help provide increasingly detailed information about their taxonomic distribution.
Affiliation(s)
- Karolis Ramanauskas
- Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL, United States of America
- Boris Igić
- Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL, United States of America
3
Gjini E, Haydon DT, David Barry J, Cobbold CA. Revisiting the diffusion approximation to estimate evolutionary rates of gene family diversification. J Theor Biol 2014; 341:111-22. [PMID: 24120993 DOI: 10.1016/j.jtbi.2013.10.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/18/2012] [Revised: 06/21/2013] [Accepted: 10/02/2013] [Indexed: 11/18/2022]
Abstract
Genetic diversity in multigene families is shaped by multiple processes, including gene conversion and point mutation. Because multi-gene families are involved in crucial traits of organisms, quantifying the rates of their genetic diversification is important. With increasing availability of genomic data, there is a growing need for quantitative approaches that integrate the molecular evolution of gene families with their higher-scale function. In this study, we integrate a stochastic simulation framework with population genetics theory, namely the diffusion approximation, to investigate the dynamics of genetic diversification in a gene family. Duplicated genes can diverge and encode new functions as a result of point mutation, and become more similar through gene conversion. To model the evolution of pairwise identity in a multigene family, we first consider all conversion and mutation events in a discrete manner, keeping track of their details and times of occurrence; second we consider only the infinitesimal effect of these processes on pairwise identity accounting for random sampling of genes and positions. The purely stochastic approach is closer to biological reality and is based on many explicit parameters, such as conversion tract length and family size, but is more challenging analytically. The population genetics approach is an approximation accounting implicitly for point mutation and gene conversion, only in terms of per-site average probabilities. Comparison of these two approaches across a range of parameter combinations reveals that they are not entirely equivalent, but that for certain relevant regimes they do match. As an application of this modelling framework, we consider the distribution of nucleotide identity among VSG genes of African trypanosomes, representing the most prominent example of a multi-gene family mediating parasite antigenic variation and within-host immune evasion.
Affiliation(s)
- Erida Gjini
- Instituto Gulbenkian de Ciência, Oeiras, Portugal
- Daniel T Haydon
- Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom; The Boyd Orr Centre for Population and Ecosystem Health, University of Glasgow, Glasgow, United Kingdom; Wellcome Trust Centre for Molecular Parasitology, Institute of Infection, Immunity and Inflammation, University of Glasgow, Glasgow, United Kingdom
- J David Barry
- Wellcome Trust Centre for Molecular Parasitology, Institute of Infection, Immunity and Inflammation, University of Glasgow, Glasgow, United Kingdom
- Christina A Cobbold
- School of Mathematics and Statistics, College of Science and Engineering, University of Glasgow, Glasgow, United Kingdom; The Boyd Orr Centre for Population and Ecosystem Health, University of Glasgow, Glasgow, United Kingdom
4
Petronella N, Drouin G. Purifying selection against gene conversions in the folate receptor genes of primates. Genomics 2013; 103:40-7. [PMID: 24184359 DOI: 10.1016/j.ygeno.2013.10.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/19/2013] [Revised: 09/20/2013] [Accepted: 10/22/2013] [Indexed: 01/07/2023]
Abstract
We characterized the gene conversions between the human folate receptor (FOLR) genes and those of five other primate species. We found 26 gene conversions having an average length of 534 nucleotides. The length of these conversions is correlated with sequence similarity, converted regions have a higher GC-content and the average size of converted regions from a functional donor to another functional donor is significantly smaller than the average size from a functional donor to a pseudogene. Furthermore, the few conversions observed in the FOLR1 and FOLR2 genes did not change any amino acids in their coding regions and did not affect their promoter regions. In contrast, the promoter and coding regions of the FOLR3 gene are frequently converted and these conversions changed many amino acids in marmoset. These results suggest that purifying selection is limiting the functional impact that frequent gene conversions have on functional folate receptor genes.
Affiliation(s)
- Nicholas Petronella
- Département de biologie et Centre de recherche avancée en génomique environnementale, Université d'Ottawa, Ottawa, Ontario K1N 6N5, Canada
- Guy Drouin
- Département de biologie et Centre de recherche avancée en génomique environnementale, Université d'Ottawa, Ottawa, Ontario K1N 6N5, Canada.
5
Strong purifying selection against gene conversions in the trypsin genes of primates. Hum Genet 2012; 131:1739-49. [PMID: 22752798 DOI: 10.1007/s00439-012-1196-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/11/2012] [Accepted: 06/20/2012] [Indexed: 01/27/2023]
Abstract
The trypsin gene families of primate species are composed of members that share a remarkable level of sequence similarity. Here, we investigated the gene conversions occurring within the trypsin gene family in five primate species. A total of 36 conversion events, with an average length (± standard deviation) of 1,526 ± 1,124 nucleotides, were detected using two methods. Such extensive gene conversions are likely both the cause and the consequence of the high sequence similarity between primate trypsin genes. In the trypsins encoded by these genes, both the overall amino acid sequences and critical amino acid residues are conserved. Therefore, the numerous long gene conversions we detected between trypsin genes did not alter any of their functionally important amino acid sites. This suggests that, in the trypsin genes of the five primate species studied here, strong purifying selection against gene conversions is occurring in regions containing functionally important residues.
6
Song G, Riemer C, Dickins B, Kim HL, Zhang L, Zhang Y, Hsu CH, Hardison RC, NISC Comparative Sequencing Program, Green ED, Miller W. Revealing mammalian evolutionary relationships by comparative analysis of gene clusters. Genome Biol Evol 2012; 4:586-601. [PMID: 22454131 PMCID: PMC3342878 DOI: 10.1093/gbe/evs032] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Accepted: 03/19/2012] [Indexed: 12/13/2022]
Abstract
Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters due to their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology. Orthologs derive from a common ancestor by speciation, in contrast to paralogs, which derive from duplication. Discriminating orthologs from paralogs is a necessary step in most multispecies sequence analyses, but doing so accurately is impeded by the occurrence of gene conversion events. We propose a refined method of orthology assignment based on two paradigms for interpreting its definition: by genomic context or by sequence content. X-orthology (based on context) traces orthology resulting from speciation and duplication only, while N-orthology (based on content) includes the influence of conversion events. We developed a computational method for automatically mapping both types of orthology on a per-nucleotide basis in gene cluster regions studied by comparative sequencing, and we make this mapping accessible by visualizing the output. All of these steps are incorporated into our newly extended CHAP 2 package. We evaluate our method using both simulated data and real gene clusters (including the well-characterized α-globin and β-globin clusters). We also illustrate use of CHAP 2 by analyzing four more loci: CCL (chemokine ligand), IFN (interferon), CYP2abf (part of cytochrome P450 family 2), and KIR (killer cell immunoglobulin-like receptors). These new methods facilitate and extend our understanding of evolution at these and other loci by adding automated accurate evolutionary inference to the biologist's toolkit. The CHAP 2 package is freely available from http://www.bx.psu.edu/miller_lab.
Affiliation(s)
- Giltae Song
- Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, PA, USA.
7
Song G, Hsu CH, Riemer C, Zhang Y, Kim HL, Hoffmann F, Zhang L, Hardison RC, Green ED, Miller W. Conversion events in gene clusters. BMC Evol Biol 2011; 11:226. [PMID: 21798034 PMCID: PMC3161012 DOI: 10.1186/1471-2148-11-226] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 03/09/2011] [Accepted: 07/28/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Gene clusters containing multiple similar genomic regions in close proximity are of great interest for biomedical studies because of their associations with inherited diseases. However, such regions are difficult to analyze due to their structural complexity and their complicated evolutionary histories, reflecting a variety of large-scale mutational events. In particular, conversion events can mislead inferences about the relationships among these regions, as traced by traditional methods such as construction of phylogenetic trees or multi-species alignments. RESULTS: To correct the distorted information generated by such methods, we have developed an automated pipeline called CHAP (Cluster History Analysis Package) for detecting conversion events. We used this pipeline to analyze the conversion events that affected two well-studied gene clusters (α-globin and β-globin) and three gene clusters for which comparative sequence data were generated from seven primate species: CCL (chemokine ligand), IFN (interferon), and CYP2abf (part of cytochrome P450 family 2). CHAP is freely available at http://www.bx.psu.edu/miller_lab. CONCLUSIONS: These studies reveal the value of characterizing conversion events in the context of studying gene clusters in complex genomes.
Affiliation(s)
- Giltae Song
- Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, University Park, PA 16802 USA.