1
Bohnenkämper L, Stoye J, Doerr D. Reconstructing rearrangement phylogenies of natural genomes. Algorithms Mol Biol 2025; 20:10. [PMID: 40483529 PMCID: PMC12144824 DOI: 10.1186/s13015-025-00279-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2025] [Accepted: 05/07/2025] [Indexed: 06/11/2025]
Abstract
BACKGROUND We study the classical problem of inferring ancestral genomes from a set of extant genomes under a given phylogeny, known as the Small Parsimony Problem (SPP). Genomes are represented as sequences of oriented markers, organized in one or more linear or circular chromosomes. Any marker may appear in several copies, without restriction on orientation or genomic location, a setting known as the natural genomes model. Evolutionary events along the branches of the phylogeny encompass large-scale rearrangements, including segmental inversions, translocations, gain and loss (DCJ-indel model). Even under simpler rearrangement models, such as the classical breakpoint model without duplicates, the SPP is computationally intractable. Nevertheless, the SPP for natural genomes under the DCJ-indel model has been studied recently, with limited success.
METHODS Building on prior work, we present a highly optimized integer linear program (ILP) that is able to solve the SPP for sufficiently small phylogenies and gene families. A notable improvement over the previous result is an optimized way of handling both circular and linear chromosomes. This is especially relevant to the SPP, since the chromosomal structure of ancestral genomes is unknown and the solution space for this chromosomal structure is typically large.
RESULTS We benchmark our method on simulated and real data. On simulated phylogenies we observe a considerable performance improvement on problems that include linear chromosomes. Even when the ground truth contains only one circular chromosome per genome, our method outperforms its predecessor due to its optimized handling of the solution space. The practical advantage also becomes visible in an analysis of seven Anopheles taxa.
Affiliation(s)
- Leonard Bohnenkämper
  - Faculty of Technology, Bielefeld University, Universitätsstraße 25, 33615, Bielefeld, NRW, Germany
  - Center for Biotechnology (CeBiTec), Bielefeld University, Universitätsstraße 25, 33615, Bielefeld, NRW, Germany
- Jens Stoye
  - Faculty of Technology, Bielefeld University, Universitätsstraße 25, 33615, Bielefeld, NRW, Germany
  - Center for Biotechnology (CeBiTec), Bielefeld University, Universitätsstraße 25, 33615, Bielefeld, NRW, Germany
- Daniel Doerr
  - Department for Endocrinology and Diabetology, Medical Faculty, Heinrich Heine University Düsseldorf, University Hospital Düsseldorf, Moorenstr. 5, 40225, Düsseldorf, NRW, Germany.
  - German Diabetes Center (DDZ), Leibniz Institute for Diabetes Research Germany, Auf'm Hennekamp 65, 40225, Düsseldorf, NRW, Germany.
  - Center for Digital Medicine, Heinrich Heine University Düsseldorf, Moorenstr. 5, 40225, Düsseldorf, NRW, Germany.
2
Kloub L, Gosselin S, Graf J, Gogarten JP, Bansal MS. Investigating Additive and Replacing Horizontal Gene Transfers Using Phylogenies and Whole Genomes. Genome Biol Evol 2024; 16:evae180. [PMID: 39163267 PMCID: PMC11375855 DOI: 10.1093/gbe/evae180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2024] [Revised: 07/29/2024] [Accepted: 08/12/2024] [Indexed: 08/22/2024]
Abstract
Horizontal gene transfer (HGT) is fundamental to microbial evolution and adaptation. When a gene is horizontally transferred, it may either add itself as a new gene to the recipient genome (possibly displacing nonhomologous genes) or replace an existing homologous gene. Currently, studies do not usually distinguish between "additive" and "replacing" HGTs, and their relative frequencies, integration mechanisms, and specific roles in microbial evolution are poorly understood. In this work, we develop a novel computational framework for large-scale classification of HGTs as either additive or replacing. Our framework leverages recently developed phylogenetic approaches for HGT detection and classifies HGTs inferred between terminal edges based on gene orderings along genomes and phylogenetic relationships between the microbial species under consideration. The resulting method, called DART, is highly customizable and scalable and can classify a large fraction of inferred HGTs with high confidence and statistical support. Our application of DART to a large dataset of thousands of gene families from 103 Aeromonas genomes provides insights into the relative frequencies, functional biases, and integration mechanisms of additive and replacing HGTs. Among other results, we find that (i) the relative frequency of additive HGT increases with increasing phylogenetic distance, (ii) replacing HGT dominates at shorter phylogenetic distances, (iii) additive and replacing HGTs have strikingly different functional profiles, (iv) homologous recombination in flanking regions of a novel gene may be a frequent integration mechanism for additive HGT, and (v) phages and mobile genetic elements likely play an important role in facilitating additive HGT.
Affiliation(s)
- Lina Kloub
  - School of Computing, University of Connecticut, 371 Fairfield Way, Unit 4155, Storrs, CT 06269-4155, USA
- Sophia Gosselin
  - Department of Molecular and Cell Biology, University of Connecticut, 91 North Eagleville Road, Unit 3125, Storrs, CT 06269-3125, USA
- Joerg Graf
  - Department of Molecular and Cell Biology, University of Connecticut, 91 North Eagleville Road, Unit 3125, Storrs, CT 06269-3125, USA
  - Pacific Biosciences Research Center, University of Hawaii, Honolulu, HI 96822, USA
- Johann Peter Gogarten
  - Department of Molecular and Cell Biology, University of Connecticut, 91 North Eagleville Road, Unit 3125, Storrs, CT 06269-3125, USA
  - The Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- Mukul S Bansal
  - School of Computing, University of Connecticut, 371 Fairfield Way, Unit 4155, Storrs, CT 06269-4155, USA
  - The Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
3
Goli RC, Chishi KG, Ganguly I, Singh S, Dixit S, Rathi P, Diwakar V, Sree C C, Limbalkar OM, Sukhija N, Kanaka K. Global and Local Ancestry and its Importance: A Review. Curr Genomics 2024; 25:237-260. [PMID: 39156729 PMCID: PMC11327809 DOI: 10.2174/0113892029298909240426094055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2024] [Revised: 03/02/2024] [Accepted: 03/11/2024] [Indexed: 08/20/2024]
Abstract
Admixture is an evolutionary mechanism and the fastest way to significantly change the composition of a population. In the history of animal breeding, genetic admixture has provided short-term advantages by exploiting complementarity and heterosis in several traits, and long-term advantages by increasing genetic diversity. The traditional method of admixture analysis by pedigree records has now largely been replaced by genome-wide marker data, which enable more precise estimates. Among these markers, SNPs have been the popular choice since they are cost-effective, less laborious, and easy to genotype automatically. Ancestry-Informative Markers (AIMs) are markers that can suggest the likely population of origin of a DNA sample when the source individual is unknown or unwilling to disclose their lineage. Admixture resolved at the level of individual loci is termed local ancestry; it can be exploited to identify signs of recent selective response and can account for genetic drift. Considering the importance of genetic admixture and local ancestry, this mini-review illustrates both concepts, covering their basics, estimation/identification methods, tools/software, and applications.
Affiliation(s)
- Kiyevi G. Chishi
  - ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
- Indrajit Ganguly
  - ICAR-National Bureau of Animal Genetic Resources, Karnal, 132001, Haryana, India
- Sanjeev Singh
  - ICAR-National Bureau of Animal Genetic Resources, Karnal, 132001, Haryana, India
- S.P. Dixit
  - ICAR-National Bureau of Animal Genetic Resources, Karnal, 132001, Haryana, India
- Pallavi Rathi
  - ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
- Vikas Diwakar
  - ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
- Chandana Sree C
  - ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
- Nidhi Sukhija
  - ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
  - Central Tasar Research and Training Institute, Ranchi, 835303, Jharkhand, India
- K.K Kanaka
  - ICAR- Indian Institute of Agricultural Biotechnology, Ranchi, 834010, Jharkhand, India
4
Altenhoff AM, Warwick Vesztrocy A, Bernard C, Train CM, Nicheperovich A, Prieto Baños S, Julca I, Moi D, Nevers Y, Majidian S, Dessimoz C, Glover NM. OMA orthology in 2024: improved prokaryote coverage, ancestral and extant GO enrichment, a revamped synteny viewer and more in the OMA Ecosystem. Nucleic Acids Res 2024; 52:D513-D521. [PMID: 37962356 PMCID: PMC10767875 DOI: 10.1093/nar/gkad1020] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 09/15/2023] [Revised: 10/17/2023] [Accepted: 10/23/2023] [Indexed: 11/15/2023]
Abstract
In this update paper, we present the latest developments in the OMA browser knowledgebase, which aims to provide high-quality orthology inferences and facilitate the study of gene families, genomes and their evolution. First, we discuss the addition of new species in the database, particularly an expanded representation of prokaryotic species. The OMA browser now offers Ancestral Genome pages and an Ancestral Gene Order viewer, allowing users to explore the evolutionary history and gene content of ancestral genomes. We also introduce a revamped Local Synteny Viewer to compare genomic neighborhoods across both extant and ancestral genomes. Hierarchical Orthologous Groups (HOGs) are now annotated with Gene Ontology annotations, and users can easily perform extant or ancestral GO enrichments. Finally, we recap new tools in the OMA Ecosystem, including OMAmer for proteome mapping, OMArk for proteome quality assessment, OMAMO for model organism selection and Read2Tree for phylogenetic species tree construction from reads. These new features provide exciting opportunities for orthology analysis and comparative genomics. OMA is accessible at https://omabrowser.org.
Affiliation(s)
- Adrian M Altenhoff
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - ETH Zurich, Computer Science, Universitätstr. 6, 8092 Zurich, Switzerland
- Alex Warwick Vesztrocy
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Charles Bernard
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Clement-Marie Train
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Alina Nicheperovich
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Silvia Prieto Baños
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Irene Julca
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- David Moi
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Yannis Nevers
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Sina Majidian
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Christophe Dessimoz
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Natasha M Glover
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
5
Cribbie EP, Doerr D, Chauve C. AGO, a Framework for the Reconstruction of Ancestral Syntenies and Gene Orders. Methods Mol Biol 2024; 2802:247-265. [PMID: 38819563 DOI: 10.1007/978-1-0716-3838-5_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Reconstructing ancestral gene orders from the genome data of extant species is an important problem in comparative and evolutionary genomics. In a phylogenomics setting that accounts for gene family evolution through gene duplication and gene loss, the reconstruction of ancestral gene orders involves several steps, including multiple sequence alignment, the inference of reconciled gene trees, and the inference of ancestral syntenies and gene adjacencies. For each of these steps, several methods are available, implemented in a growing corpus of tools that are often parameterized; in practice, combining such tools into an ancestral gene order reconstruction pipeline is far from trivial. This chapter introduces AGO, a Python-based framework for building ancestral gene order reconstruction pipelines that interface and parameterize different bioinformatics tools. The authors illustrate the features of AGO by reconstructing ancestral gene orders for the X chromosome of three ancestral Anopheles species using three different pipelines. AGO is freely available at https://github.com/cchauve/AGO-pipeline.
Affiliation(s)
- Evan P Cribbie
  - Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
- Daniel Doerr
  - Department for Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, German Diabetes Center (DDZ), Leibniz Institute for Diabetes Research, and Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada.
6
Affiliation(s)
- Hugo Menet
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
- Vincent Daubin
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
  - * E-mail: (VD); (ET)
- Eric Tannier
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
  - Inria, centre de recherche de Lyon, Villeurbanne, France
  - * E-mail: (VD); (ET)
7
Khandai K, Navarro-Martinez C, Smith B, Buonopane R, Byun SA, Patterson M. Determining Significant Correlation Between Pairs of Extant Characters in a Small Parsimony Framework. J Comput Biol 2022; 29:1132-1154. [PMID: 35723627 DOI: 10.1089/cmb.2022.0141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
When studying the evolutionary relationships among a set of species, the principle of parsimony states that a relationship involving the fewest number of evolutionary events is likely the correct one. Due to its simplicity, this principle was formalized in the context of computational evolutionary biology decades ago by, for example, Fitch and Sankoff. Because the parsimony framework does not require a model of evolution, unlike maximum likelihood or Bayesian approaches, it is often a good starting point when no reasonable estimate of such a model is available. In this work, we devise a method for determining if pairs of discrete characters are significantly correlated across all most parsimonious reconstructions, given a set of species on these characters, and an evolutionary tree. The first step of this method is to use Sankoff's algorithm to compute all most parsimonious assignments of ancestral states (of each character) to the internal nodes of the phylogeny. Correlation between a pair of evolutionary events (e.g., absent to present) for a pair of characters is then determined by the (co-) occurrence patterns between the sets of their respective ancestral assignments. The probability of obtaining a correlation this extreme (or more) under a null hypothesis where the events happen randomly on the evolutionary tree is then used to assess the significance of this correlation. We implement this method: parcours (PARsimonious CO-occURrenceS) and use it to identify significantly correlated evolution among vocalizations and morphological characters in the Felidae family.
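The first step described in this abstract, Sankoff's algorithm for small parsimony on a single discrete character, can be sketched as follows. This is a minimal illustration and not the parcours implementation: the function name `sankoff`, the dictionary-based tree encoding, and the unit cost function are assumptions made for the example, and the sketch reports only the optimal cost and the optimal root states rather than enumerating all most parsimonious assignments.

```python
from math import inf


def sankoff(children, leaf_state, states, cost, root):
    """Small parsimony for one discrete character (hypothetical helper).

    children   -- dict: internal node -> list of child nodes
    leaf_state -- dict: leaf -> observed state
    states     -- list of all possible states
    cost       -- function (parent_state, child_state) -> event cost
    Returns (minimum total cost, list of optimal root states).
    """
    table = {}  # table[node][s] = min cost of the subtree if node has state s

    def visit(node):
        if node in leaf_state:
            # A leaf is fixed to its observed state.
            table[node] = {s: 0 if s == leaf_state[node] else inf for s in states}
            return
        for child in children[node]:
            visit(child)
        # For each candidate state, every child contributes its cheapest option.
        table[node] = {
            s: sum(min(cost(s, t) + table[child][t] for t in states)
                   for child in children[node])
            for s in states
        }

    visit(root)
    best = min(table[root].values())
    return best, [s for s in states if table[root][s] == best]


# Toy tree ((A,B),(C,D)) with a presence/absence character and unit costs.
children = {"root": ["x", "y"], "x": ["A", "B"], "y": ["C", "D"]}
leaf_state = {"A": "present", "B": "present", "C": "absent", "D": "absent"}
unit = lambda s, t: 0 if s == t else 1
print(sankoff(children, leaf_state, ["absent", "present"], unit, "root"))
# -> (1, ['absent', 'present'])
```

In the toy example both root states are optimal, which is the kind of ambiguity that motivates considering all most parsimonious reconstructions rather than a single one.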
Affiliation(s)
- Kaustubh Khandai
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
- Brendan Smith
  - Department of Biology, Fairfield University, Fairfield, Connecticut, USA
- Rebecca Buonopane
  - Department of Biology, Fairfield University, Fairfield, Connecticut, USA
- Soyong Ashley Byun
  - Department of Biology, Fairfield University, Fairfield, Connecticut, USA
- Murray Patterson
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
8
Abstract
The Small Parsimony Problem (SPP) aims at finding the gene orders at internal nodes of a given phylogenetic tree such that the overall genome rearrangement distance along the tree branches is minimized. This problem is intractable in most genome rearrangement models, especially when gene duplication and loss are considered. In this work, we describe an Integer Linear Program (ILP) to solve the SPP for natural genomes, i.e. genomes that contain conserved, unique, and duplicated markers. The evolutionary model that we consider is the DCJ-indel model, which includes the Double-Cut and Join rearrangement operation and the insertion and deletion of genome segments. We evaluate our algorithm on simulated data and show that it reconstructs ancestral gene orders efficiently and accurately under a comprehensive evolutionary model.
Affiliation(s)
- Daniel Doerr
  - Faculty of Medicine, Heinrich Heine University, Düsseldorf, Germany
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, Canada
9
Abstract
Syntenies are genomic segments of consecutive genes identified by a certain conservation in gene content and order. The notion of conservation may vary from one definition to another: the more constrained definitions require identical gene contents and gene orders, while more relaxed ones require only a certain similarity in gene content, not necessarily in the same order. Regardless of the way they are identified, the goal is to characterize homologous genomic regions, i.e., regions deriving from a common ancestral region, reflecting a certain gene co-evolution that can shed light on important functional properties. In addition to identifying them, it is also necessary to infer the evolutionary history that has led from the ancestral segment to the extant ones. In this field, most algorithmic studies address the problem of inferring rearrangement scenarios explaining the disruption in gene order between segments with the same gene content, some of them extending the evolutionary model to gene insertion and deletion. However, syntenies also evolve through other events that modify their gene content, such as duplications, losses or horizontal gene transfers, i.e., the movement of genes from one species to another. Although the reconciliation approach between a gene tree and a species tree addresses the problem of inferring such events for single-gene families, little effort has been dedicated to its generalization to segmental events and to syntenies. This paper reviews some of the main algorithmic methods for inferring ancestral syntenies and focuses on those integrating both gene orders and gene trees.
10
Fargeot L, Loot G, Prunier JG, Rey O, Veyssière C, Blanchet S. Patterns of Epigenetic Diversity in Two Sympatric Fish Species: Genetic vs. Environmental Determinants. Genes (Basel) 2021; 12:107. [PMID: 33467145 PMCID: PMC7830833 DOI: 10.3390/genes12010107] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/17/2020] [Revised: 01/05/2021] [Accepted: 01/13/2021] [Indexed: 12/12/2022]
Abstract
Epigenetic components are hypothesized to be sensitive to the environment, which should permit species to adapt to environmental changes. In wild populations, epigenetic variation should therefore be mainly driven by environmental variation. Here, we tested whether epigenetic variation (DNA methylation) observed in wild populations is related to their genetic background, and/or to the local environment. Focusing on two sympatric freshwater fish species (Gobio occitaniae and Phoxinus phoxinus), we tested the relationships between epigenetic differentiation, genetic differentiation (using microsatellite and single nucleotide polymorphism (SNP) markers), and environmental distances between sites. We identified positive relationships between pairwise genetic and epigenetic distances in both species. Moreover, epigenetic marks discriminated populations better than genetic markers, especially in G. occitaniae. In G. occitaniae, both pairwise epigenetic and genetic distances were significantly associated with environmental distances between sites. Nonetheless, when controlling for genetic differentiation, the link between epigenetic differentiation and environmental distances was no longer significant, indicating a noncausal relationship. Our results suggest that fish epigenetic variation is mainly genetically determined and that the environment contributed only weakly to epigenetic variation. We advocate the need to control for the genetic background of populations when inferring causal links between epigenetic variation and environmental heterogeneity in wild populations.
Affiliation(s)
- Laura Fargeot
  - Centre National de la Recherche Scientifique (CNRS), Université Paul Sabatier (UPS), Station d’Ecologie Théorique et Expérimentale, UMR 5321, F-09200 Moulis, France
- Géraldine Loot
  - CNRS, UPS, École Nationale de Formation Agronomique (ENFA), UMR 5174 EDB (Laboratoire Évolution & Diversité Biologique), 118 route de Narbonne, F-31062 Toulouse CEDEX 4, France
  - Université Paul Sabatier (UPS), Institut Universitaire de France (IUF), F-75231 Paris CEDEX 05, France
- Jérôme G. Prunier
  - Centre National de la Recherche Scientifique (CNRS), Université Paul Sabatier (UPS), Station d’Ecologie Théorique et Expérimentale, UMR 5321, F-09200 Moulis, France
- Olivier Rey
  - CNRS, Interaction Hôtes-Parasites-Environnements (IHPE), UMR 5244, F-66860 Perpignan, France
- Charlotte Veyssière
  - CNRS, UPS, École Nationale de Formation Agronomique (ENFA), UMR 5174 EDB (Laboratoire Évolution & Diversité Biologique), 118 route de Narbonne, F-31062 Toulouse CEDEX 4, France
- Simon Blanchet
  - Centre National de la Recherche Scientifique (CNRS), Université Paul Sabatier (UPS), Station d’Ecologie Théorique et Expérimentale, UMR 5321, F-09200 Moulis, France
  - CNRS, UPS, École Nationale de Formation Agronomique (ENFA), UMR 5174 EDB (Laboratoire Évolution & Diversité Biologique), 118 route de Narbonne, F-31062 Toulouse CEDEX 4, France
11
Computational Evolutionary Biology. Adv Bioinformatics 2021. [DOI: 10.1007/978-981-33-6191-1_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
12
Paszek J, Tiuryn J, Górecki P. Minimizing genomic duplication episodes. Comput Biol Chem 2020; 89:107260. [PMID: 33038778 DOI: 10.1016/j.compbiolchem.2020.107260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/20/2020] [Accepted: 04/02/2020] [Indexed: 11/17/2022]
Abstract
BACKGROUND The study of genomic duplications is fundamental to understanding the process of evolution. In evolutionary molecular biology, many approaches focus on discovering the occurrences of gene duplications and multiple gene duplication episodes and their locations in the Tree of Life. To reconstruct such episodes, one can cluster single gene duplications inferred by reconciling a set of gene trees with a species tree.
RESULTS We propose an efficient quadratic-time algorithm to solve the problem of genomic duplication clustering, in which input gene trees are rooted, episode locations are restricted to preserve the minimal number of single gene duplications, clustering rules are described by the minimum episodes method, and the goal is based on a recently introduced approach that minimizes the maximal number of duplication episodes on a single path, called here the MP score. Based on our theoretical results, we show new algorithmic relationships between the MP score and the minimum episodes (ME) score, defined as the minimal number of duplication episodes.
CONCLUSIONS Our evaluation analysis on three empirical datasets demonstrates that, under the model in which the minimal number of duplications is preserved, the duplication clusterings with minimal MP score support the clusterings with the minimal total number of duplication episodes.
AVAILABILITY The software is available at https://bitbucket.org/pgor17/rmp.
Affiliation(s)
- Jarosław Paszek
  - Warsaw University, Faculty of Mathematics, Informatics and Mechanics, Banacha 2, 02-097 Warsaw, Poland.
- Jerzy Tiuryn
  - Warsaw University, Faculty of Mathematics, Informatics and Mechanics, Banacha 2, 02-097 Warsaw, Poland.
- Paweł Górecki
  - Warsaw University, Faculty of Mathematics, Informatics and Mechanics, Banacha 2, 02-097 Warsaw, Poland.
13
Delabre M, El-Mabrouk N, Huber KT, Lafond M, Moulton V, Noutahi E, Castellanos MS. Evolution through segmental duplications and losses: a Super-Reconciliation approach. Algorithms Mol Biol 2020; 15:12. [PMID: 32508979 PMCID: PMC7249433 DOI: 10.1186/s13015-020-00171-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/26/2019] [Accepted: 05/05/2020] [Indexed: 02/02/2023]
Abstract
The classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes an independent evolution for each family. While this assumption is reasonable for genes that are far apart in the genome, it is not appropriate for genes grouped into syntenic blocks, which are more plausibly the result of a concerted evolution. Here, we introduce the Super-Reconciliation problem, which consists of inferring a history of segmental duplication and loss events (involving a set of neighboring genes) leading to a set of present-day syntenies from a single ancestral one. In other words, we extend the traditional Duplication-Loss reconciliation problem of a single gene tree to a set of trees, accounting for segmental duplications and losses. Existence of a Super-Reconciliation depends on individual gene tree consistency. In addition, ignoring rearrangements implies that existence also depends on gene order consistency. We first show that the problem of reconstructing a most parsimonious Super-Reconciliation, if any, is NP-hard and give an exact exponential-time algorithm to solve it. Alternatively, we show that accounting for rearrangements in the evolutionary model, but still only minimizing segmental duplication and loss events, leads to an exact polynomial-time algorithm. We finally assess the time efficiency of the former exponential-time algorithm for the Duplication-Loss model on simulated datasets, and give a proof of concept on the opioid receptor genes.
14
Mane AC, Lafond M, Feijao PC, Chauve C. The distance and median problems in the single-cut-or-join model with single-gene duplications. Algorithms Mol Biol 2020; 15:8. [PMID: 32391071 PMCID: PMC7197181 DOI: 10.1186/s13015-020-00169-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/10/2020] [Accepted: 04/16/2020] [Indexed: 11/10/2022]
Abstract
Background. In the field of genome rearrangement algorithms, models accounting for gene duplication often lead to hard problems. For example, while computing the pairwise distance is tractable in most duplication-free models, the problem is NP-complete for most extensions of these models accounting for duplicated genes. Moreover, problems involving more than two genomes, such as the genome median and the Small Parsimony problem, are intractable for most duplication-free models, with some exceptions, for example the Single-Cut-or-Join (SCJ) model.
Results. We introduce a variant of the SCJ distance that accounts for duplicated genes, in the context of directed evolution from an ancestral genome to a descendant genome where orthology relations between ancestral genes and their descendants are known. Our model includes two duplication mechanisms: single-gene tandem duplication and the creation of single-gene circular chromosomes. We prove that in this model, computing the directed distance and a parsimonious evolutionary scenario in terms of SCJ and single-gene duplication events can be done in linear time. We also show that the directed median problem is tractable for this distance, while the rooted median problem, where we assume that one of the given genomes is ancestral to the median, is NP-complete. We also describe an Integer Linear Program for solving this problem. We evaluate the directed distance and rooted median algorithms on simulated data.
Conclusion. Our results provide a simple genome rearrangement model, extending the SCJ model to account for single-gene duplications, for which we prove a mix of tractability and hardness results. For the NP-complete rooted median problem, we design a simple Integer Linear Program. Our publicly available implementation of these algorithms for the directed distance and median problems allows these problems to be solved efficiently on large instances.
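For orientation, the classical duplication-free SCJ distance that this work extends can be sketched in a few lines: each genome is reduced to its set of adjacencies between gene extremities, and the distance is the size of the symmetric difference of the two sets (one cut per adjacency to remove, one join per adjacency to add). The sketch below covers only that baseline, assuming both genomes have the same gene content with no duplicates; it is not the directed distance with duplications introduced in the paper. The signed-integer encoding and the names `adjacencies` and `scj_distance` are assumptions made for the example.

```python
def adjacencies(chromosomes):
    """Set of adjacencies of a genome (hypothetical helper).

    chromosomes -- list of (genes, circular) pairs, where genes is a list of
                   signed integers (sign = orientation) and circular is a bool.
    Gene g has a tail (g, 't') and a head (g, 'h'); reading +g goes tail to head.
    """
    adj = set()
    for genes, circular in chromosomes:
        pairs = list(zip(genes, genes[1:]))
        if circular and genes:
            pairs.append((genes[-1], genes[0]))  # close the circle
        for a, b in pairs:
            right_end = (abs(a), 'h' if a > 0 else 't')  # where gene a ends
            left_end = (abs(b), 't' if b > 0 else 'h')   # where gene b starts
            adj.add(frozenset((right_end, left_end)))
    return adj


def scj_distance(genome_a, genome_b):
    """Classical SCJ distance: number of adjacencies present in exactly one genome."""
    return len(adjacencies(genome_a) ^ adjacencies(genome_b))


linear = [([1, 2, 3], False)]
inverted = [([1, -2, 3], False)]   # gene 2 inverted
circular = [([1, 2, 3], True)]     # same order, chromosome circularized

print(scj_distance(linear, inverted))  # -> 4 (two cuts, two joins)
print(scj_distance(linear, circular))  # -> 1 (one join)
```

The set-based formulation is what makes median and small parsimony problems tractable in the duplication-free SCJ model, in contrast to most other rearrangement models mentioned in the abstract.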
|
15
|
Rubert DP, Martinez FV, Stoye J, Doerr D. Analysis of local genome rearrangement improves resolution of ancestral genomic maps in plants. BMC Genomics 2020; 21:273. [PMID: 32299356 PMCID: PMC7160886 DOI: 10.1186/s12864-020-6609-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND Computationally inferred ancestral genomes play an important role in many areas of genome research. We present an improved workflow for the reconstruction from highly diverged genomes such as those of plants. RESULTS Our work relies on an established workflow in the reconstruction of ancestral plants, but improves several steps of this process. Instead of using gene annotations for inferring the genome content of the ancestral sequence, we identify genomic markers through a process called genome segmentation. This enables us to reconstruct the ancestral genome from hundreds of thousands of markers rather than the tens of thousands of annotated genes. We also introduce the concept of local genome rearrangement, through which we refine syntenic blocks before they are used in the reconstruction of contiguous ancestral regions. With the enhanced workflow at hand, we reconstruct the ancestral genome of eudicots, a major sub-clade of flowering plants, using whole genome sequences of five modern plants. CONCLUSIONS Our reconstructed genome is highly detailed, yet its layout agrees well with that reported in Badouin et al. (2017). Using local genome rearrangement, not only the marker-based, but also the gene-based reconstruction of the eudicot ancestor exhibited increased genome content, evidencing the power of this novel concept.
Affiliation(s)
- Diego P. Rubert
- Faculdade de Computação – FACOM, Universidade Federal de Mato Grosso do Sul – UFMS, Campo Grande, Brazil
- Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
| | - Fábio V. Martinez
- Faculdade de Computação – FACOM, Universidade Federal de Mato Grosso do Sul – UFMS, Campo Grande, Brazil
- Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
| | - Jens Stoye
- Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
| | - Daniel Doerr
- Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
| |
|
16
|
Waterhouse RM, Aganezov S, Anselmetti Y, Lee J, Ruzzante L, Reijnders MJMF, Feron R, Bérard S, George P, Hahn MW, Howell PI, Kamali M, Koren S, Lawson D, Maslen G, Peery A, Phillippy AM, Sharakhova MV, Tannier E, Unger MF, Zhang SV, Alekseyev MA, Besansky NJ, Chauve C, Emrich SJ, Sharakhov IV. Evolutionary superscaffolding and chromosome anchoring to improve Anopheles genome assemblies. BMC Biol 2020; 18:1. [PMID: 31898513 PMCID: PMC6939337 DOI: 10.1186/s12915-019-0728-3] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 11/13/2019] [Accepted: 11/26/2019] [Indexed: 11/18/2022] Open
Abstract
Background New sequencing technologies have lowered financial barriers to whole genome sequencing, but resulting assemblies are often fragmented and far from ‘finished’. Updating multi-scaffold drafts to chromosome-level status can be achieved through experimental mapping or re-sequencing efforts. Avoiding the costs associated with such approaches, comparative genomic analysis of gene order conservation (synteny) to predict scaffold neighbours (adjacencies) offers a potentially useful complementary method for improving draft assemblies. Results We evaluated and employed 3 gene synteny-based methods applied to 21 Anopheles mosquito assemblies to produce consensus sets of scaffold adjacencies. For subsets of the assemblies, we integrated these with additional supporting data to confirm and complement the synteny-based adjacencies: 6 with physical mapping data that anchor scaffolds to chromosome locations, 13 with paired-end RNA sequencing (RNAseq) data, and 3 with new assemblies based on re-scaffolding or long-read data. Our combined analyses produced 20 new superscaffolded assemblies with improved contiguities: 7 for which assignments of non-anchored scaffolds to chromosome arms span more than 75% of the assemblies, and a further 7 with chromosome anchoring including an 88% anchored Anopheles arabiensis assembly and, respectively, 73% and 84% anchored assemblies with comprehensively updated cytogenetic photomaps for Anopheles funestus and Anopheles stephensi. Conclusions Experimental data from probe mapping, RNAseq, or long-read technologies, where available, all contribute to successful upgrading of draft assemblies. Our evaluations show that gene synteny-based computational methods represent a valuable alternative or complementary approach. Our improved Anopheles reference assemblies highlight the utility of applying comparative genomics approaches to improve community genomic resources.
Affiliation(s)
- Robert M Waterhouse
- Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland.
| | - Sergey Aganezov
- Department of Computer Science, Princeton University, Princeton, NJ, 08450, USA.,Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
| | | | - Jiyoung Lee
- The Interdisciplinary PhD Program in Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
| | - Livio Ruzzante
- Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Maarten J M F Reijnders
- Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Romain Feron
- Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | - Sèverine Bérard
- ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
| | - Phillip George
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
| | - Matthew W Hahn
- Departments of Biology and Computer Science, Indiana University, Bloomington, IN, 47405, USA
| | - Paul I Howell
- Centers for Disease Control and Prevention, Atlanta, GA, 30329, USA
| | - Maryam Kamali
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA.,Department of Medical Entomology and Parasitology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Daniel Lawson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| | - Gareth Maslen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| | - Ashley Peery
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Maria V Sharakhova
- Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA.,Laboratory of Ecology, Genetics and Environmental Protection, Tomsk State University, Tomsk, Russia, 634050
| | - Eric Tannier
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, Unité Mixte de Recherche 5558 Centre National de la Recherche Scientifique, 69622, Villeurbanne, France.,Institut national de recherche en informatique et en automatique, Montbonnot, 38334, Grenoble, Rhône-Alpes, France
| | - Maria F Unger
- Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, Galvin Life Sciences Building, Notre Dame, IN, 46556, USA
| | - Simo V Zhang
- Departments of Biology and Computer Science, Indiana University, Bloomington, IN, 47405, USA
| | - Max A Alekseyev
- Department of Mathematics and Computational Biology Institute, George Washington University, Ashburn, VA, 20147, USA
| | - Nora J Besansky
- Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, Galvin Life Sciences Building, Notre Dame, IN, 46556, USA
| | - Cedric Chauve
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada
| | - Scott J Emrich
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, 37996, USA
| | - Igor V Sharakhov
- The Interdisciplinary PhD Program in Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA.,Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA.,Laboratory of Ecology, Genetics and Environmental Protection, Tomsk State University, Tomsk, Russia, 634050.
| |
|
17
|
Wang J, Cui B, Zhao Y, Guo M. A New Algorithm for Identifying Genome Rearrangements in the Mammalian Evolution. Front Genet 2019; 10:1020. [PMID: 31737036 PMCID: PMC6828935 DOI: 10.3389/fgene.2019.01020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2019] [Accepted: 09/24/2019] [Indexed: 11/13/2022] Open
Abstract
Genome rearrangements are evolutionary events at the level of whole genomes. Analyzing genome rearrangements provides a global view of the evolution of species. We introduce a new method called RGRPT (recovering the genome rearrangements based on phylogenetic tree) to identify genome rearrangements. We test RGRPT on simulated data. The experimental results show that RGRPT has high sensitivity and specificity compared with other tools when predicting rearrangement events. We use RGRPT to predict the rearrangement events of six mammalian genomes (human, chimpanzee, rhesus macaque, mouse, rat, and dog). At 10 kb resolution, RGRPT recognized a total of 1,157 rearrangement events, including 858 reversals, 16 translocations, 249 transpositions, and 34 fusions/fissions. At 50 kb resolution, RGRPT recognized 475 rearrangement events, including 332 reversals, 13 translocations, 94 transpositions, and 36 fusions/fissions. The source code of RGRPT is available from https://github.com/wangjuanimu/data-of-genome-rearrangement.
Affiliation(s)
- Juan Wang
- School of Computer Science, Inner Mongolia University, Hohhot, China
| | - Bo Cui
- School of Computer Science, Inner Mongolia University, Hohhot, China
| | - Yulan Zhao
- School of Computer Science, Inner Mongolia University, Hohhot, China
| | - Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China.,Beijing University of Civil Engineering and Architecture, Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing, China
| |
|
18
|
McGrath C. Highlight: New Solutions and Open Questions in Computational Evolutionary Biology. Genome Biol Evol 2019; 11:3179-3180. [PMID: 31702001 PMCID: PMC6839029 DOI: 10.1093/gbe/evz237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/23/2019] [Indexed: 11/13/2022] Open
|
19
|
Duchemin W, Gence G, Arigon Chifolleau AM, Arvestad L, Bansal MS, Berry V, Boussau B, Chevenet F, Comte N, Davín AA, Dessimoz C, Dylus D, Hasic D, Mallo D, Planel R, Posada D, Scornavacca C, Szöllosi G, Zhang L, Tannier É, Daubin V. RecPhyloXML: a format for reconciled gene trees. Bioinformatics 2019; 34:3646-3652. [PMID: 29762653 PMCID: PMC6198865 DOI: 10.1093/bioinformatics/bty389] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 10/20/2017] [Accepted: 05/09/2018] [Indexed: 12/21/2022] Open
Abstract
Motivation A reconciliation is an annotation of the nodes of a gene tree with evolutionary events—for example, speciation, gene duplication, transfer, loss, etc.—along with a mapping onto a species tree. Many algorithms and software produce or use reconciliations but often using different reconciliation formats, regarding the type of events considered or whether the species tree is dated or not. This complicates the comparison and communication between different programs. Results Here, we gather a consortium of software developers in gene tree species tree reconciliation to propose and endorse a format that aims to promote an integrative—albeit flexible—specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools such as a reconciled tree visualizer and conversion utilities. Availability and implementation http://phylariane.univ-lyon1.fr/recphyloxml/.
Affiliation(s)
- Wandrille Duchemin
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France.,INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France.,MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary.,Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
| | - Guillaume Gence
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
| | - Anne-Muriel Arigon Chifolleau
- LIRMM, Université de Montpellier, CNRS, Montpellier, France.,Institut de Biologie Computationnelle (IBC), Montpellier, France
| | - Lars Arvestad
- Department of Mathematics, Stockholm University, Stockholm, Sweden.,Swedish e-Science Research Centre (SeRC), Stockholm, Sweden
| | - Mukul S Bansal
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA.,Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
| | - Vincent Berry
- LIRMM, Université de Montpellier, CNRS, Montpellier, France.,Institut de Biologie Computationnelle (IBC), Montpellier, France.,ISEM, CNRS, Université de Montpellier, IRD, EPHE, Montpellier, France
| | - Bastien Boussau
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
| | - François Chevenet
- LIRMM, Université de Montpellier, CNRS, Montpellier, France.,MIVEGEC, CNRS 5290, IRD 224, Université de Montpellier, Montpellier, France
| | - Nicolas Comte
- INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
| | - Adrián A Davín
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France.,MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary.,Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.,Department of Genetics, Evolution and Environment, University College London, London, UK.,Department of Computer Science, University College London, London, UK.,Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - David Dylus
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
| | - Damir Hasic
- Department of Mathematics, Faculty of Science, University of Sarajevo, Sarajevo, Bosnia and Herzegovina
| | - Diego Mallo
- Virginia G. Piper Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ, USA
| | - Rémi Planel
- Laboratoire d'Analyse Bio-informatique en Génomique et Métabolisme CNRS-UMR 8030, Commissariat à l'Énergie Atomique (CEA), Institut de Génomique, Genoscope, Evry, France
| | - David Posada
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
| | - Celine Scornavacca
- Institut de Biologie Computationnelle (IBC), Montpellier, France.,ISEM, CNRS, Université de Montpellier, IRD, EPHE, Montpellier, France
| | - Gergely Szöllosi
- MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary.,Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
| | - Louxin Zhang
- Department of Mathematics, National University of Singapore, Singapore, Singapore
| | - Éric Tannier
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France.,INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
| | - Vincent Daubin
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
| |
|
20
|
Feng S, Li H, Song F, Wang Y, Stejskal V, Cai W, Li Z. A novel mitochondrial genome fragmentation pattern in Liposcelis brunnea, the type species of the genus Liposcelis (Psocodea: Liposcelididae). Int J Biol Macromol 2019; 132:1296-1303. [DOI: 10.1016/j.ijbiomac.2019.04.034] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/27/2019] [Revised: 03/22/2019] [Accepted: 04/05/2019] [Indexed: 10/27/2022]
|
21
|
Herbst L, Li H, Steel M. Quantifying the accuracy of ancestral state prediction in a phylogenetic tree under maximum parsimony. J Math Biol 2019; 78:1953-1979. [PMID: 30758663 DOI: 10.1007/s00285-019-01330-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/26/2018] [Revised: 01/21/2019] [Indexed: 11/26/2022]
Abstract
In phylogenetic studies, biologists often wish to estimate the ancestral discrete character state at an interior vertex v of an evolutionary tree T from the states that are observed at the leaves of the tree. A simple and fast estimation method, maximum parsimony, takes the ancestral state at v to be any state that minimises the number of state changes in T required to explain its evolution on T. In this paper, we investigate the reconstruction accuracy of this estimation method further, under a simple symmetric model of state change, and obtain a number of new results, both for 2-state characters, and r-state characters ([Formula: see text]). Our results rely on establishing new identities and inequalities, based on a coupling argument that involves a simpler 'coin toss' approach to ancestral state reconstruction.
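The maximum parsimony estimate described in this abstract can be illustrated, for the root of a rooted binary tree, with the bottom-up pass of Fitch's algorithm. This is a minimal sketch under stated assumptions: the tree is encoded as nested 2-tuples with leaf states as strings, the function name `fitch` is hypothetical, and the code is not taken from the paper, which analyses the accuracy of the estimate rather than how to compute it.

```python
def fitch(tree):
    """Bottom-up Fitch pass on a rooted binary tree.

    `tree` is either a leaf state (a string) or a 2-tuple of subtrees.
    Returns (state_set, changes): at the root, `state_set` holds the
    most parsimonious root states and `changes` is the parsimony score.
    """
    if not isinstance(tree, tuple):
        return {tree}, 0
    left, right = tree
    left_states, left_changes = fitch(left)
    right_states, right_changes = fitch(right)
    common = left_states & right_states
    if common:
        # The children agree on at least one state: no extra change.
        return common, left_changes + right_changes
    # The children disagree: one change is needed on one of the two edges.
    return left_states | right_states, left_changes + right_changes + 1
```

On the tree `(("A", "A"), ("A", "B"))` the root estimate is `{"A"}` with a single state change; when the returned set has more than one element, parsimony alone cannot decide between the candidate root states.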
Affiliation(s)
- Lina Herbst
- Institute of Mathematics and Computer Science, University of Greifswald, Greifswald, Germany
| | - Heyang Li
- School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
| | - Mike Steel
- Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand.
| |
|
22
|
Dondi R, Lafond M, Scornavacca C. Reconciling multiple genes trees via segmental duplications and losses. Algorithms Mol Biol 2019; 14:7. [PMID: 30930955 PMCID: PMC6425616 DOI: 10.1186/s13015-019-0139-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/05/2018] [Accepted: 02/23/2019] [Indexed: 01/18/2023] Open
Abstract
Reconciling gene trees with a species tree is a fundamental problem to understand the evolution of gene families. Many existing approaches reconcile each gene tree independently. However, it is well-known that the evolution of gene families is interconnected. In this paper, we extend a previous approach to reconcile a set of gene trees with a species tree based on segmental macro-evolutionary events, where segmental duplication events and losses are associated with cost δ and λ, respectively. We show that the problem is polynomial-time solvable when δ ≤ λ (via LCA-mapping), while if δ > λ the problem is NP-hard, even when λ = 0 and a single gene tree is given, solving a long standing open problem on the complexity of multi-gene reconciliation. On the positive side, we give a fixed-parameter algorithm for the problem, where the parameters are δ/λ and the number d of segmental duplications, of time complexity O(⌈δ/λ⌉^d · n · δ/λ). Finally, we demonstrate the usefulness of this algorithm on two previously studied real datasets: we first show that our method can be used to confirm or raise doubt on hypothetical segmental duplications on a set of 16 eukaryotes, then show how we can detect whole genome duplications in yeast genomes.
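The LCA-mapping mentioned in this abstract can be sketched for a single gene tree as follows. The code is illustrative only and rests on assumptions not found in the source: the species tree is a child-to-parent dictionary, the gene tree is a nested 2-tuple whose leaves are species names, and the hypothetical helpers `lca` and `reconcile` count single-gene duplication nodes, not the segmental duplications with costs δ and λ that the paper studies.

```python
def lca(parent, a, b):
    """Lowest common ancestor of species a and b in the species tree."""
    ancestors = set()
    while a is not None:
        ancestors.add(a)
        a = parent.get(a)  # the root has no entry, which ends the walk
    while b not in ancestors:
        b = parent[b]
    return b


def reconcile(gene_tree, parent):
    """LCA-map a gene tree and count its duplication nodes.

    Returns (species the node maps to, number of duplication nodes).
    A node is a duplication if it maps to the same species as a child.
    """
    if not isinstance(gene_tree, tuple):
        return gene_tree, 0
    (ls, ld), (rs, rd) = (reconcile(child, parent) for child in gene_tree)
    mapped = lca(parent, ls, rs)
    return mapped, ld + rd + (1 if mapped in (ls, rs) else 0)
```

With the species tree `{"A": "AB", "B": "AB", "AB": "root", "C": "root"}`, the gene tree `(("A", "B"), ("A", "C"))` maps to `"root"` with one duplication, because its right child already maps to the root.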
Affiliation(s)
- Riccardo Dondi
- Dipartimento di Filosofia, Lettere, Comunicazione, Università degli Studi di Bergamo, Bergamo, Italy
| | - Manuel Lafond
- Department of Computer Science, Université de Sherbrooke, Sherbrooke, Canada
| | | |
|
23
|
Gutiérrez-Velázquez MV, Almaraz-Abarca N, Herrera-Arrieta Y, Ávila-Reyes JA, González-Valdez LS, Torres-Ricario R, Uribe-Soto JN, Monreal-García HM. Comparison of the phenolic contents and epigenetic and genetic variability of wild and cultivated watercress (Rorippa nasturtium var. aquaticum L.). ELECTRON J BIOTECHN 2018. [DOI: 10.1016/j.ejbt.2018.04.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022] Open
|
24
|
Anselmetti Y, Duchemin W, Tannier E, Chauve C, Bérard S. Phylogenetic signal from rearrangements in 18 Anopheles species by joint scaffolding extant and ancestral genomes. BMC Genomics 2018; 19:96. [PMID: 29764366 PMCID: PMC5954271 DOI: 10.1186/s12864-018-4466-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/02/2022] Open
Abstract
Background Genome rearrangements carry valuable information for phylogenetic inference or the elucidation of molecular mechanisms of adaptation. However, the detection of genome rearrangements is often hampered by current deficiencies in data and methods: Genomes obtained from short sequence reads generally have very fragmented assemblies, and comparing multiple gene orders generally leads to computationally intractable algorithmic questions. Results We present a computational method, ADseq, which, by combining ancestral gene order reconstruction, comparative scaffolding and de novo scaffolding methods, overcomes these two caveats. ADseq provides simultaneously improved assemblies and ancestral genomes, with statistical supports on all local features. Compared to previous comparative methods, it runs in polynomial time, it samples solutions in a probabilistic space, and it can handle a significantly larger gene complement from the considered extant genomes, with complex histories including gene duplications and losses. We use ADseq to provide improved assemblies and a genome history made of duplications, losses, gene translocations, rearrangements, of 18 complete Anopheles genomes, including several important malaria vectors. We also provide additional support for a differentiated mode of evolution of the sex chromosome and of the autosomes in these mosquito genomes. Conclusions We demonstrate the method’s ability to improve extant assemblies accurately through a procedure simulating realistic assembly fragmentation. We study a debated issue regarding the phylogeny of the Gambiae complex group of Anopheles genomes in the light of the evolution of chromosomal rearrangements, suggesting that the phylogenetic signal they carry can differ from the phylogenetic signal carried by gene sequences, more prone to introgression.
Affiliation(s)
- Yoann Anselmetti
- ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France.,Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex, 69622, France
| | - Wandrille Duchemin
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex, 69622, France.,INRIA Grenoble - Rhône-Alpes, 655 Avenue de l'Europe, Montbonnot-Saint-Martin, 38330, France
| | - Eric Tannier
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex, 69622, France.,INRIA Grenoble - Rhône-Alpes, 655 Avenue de l'Europe, Montbonnot-Saint-Martin, 38330, France
| | - Cedric Chauve
- Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, V5A1S6, BC, Canada
| | - Sèverine Bérard
- ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France.
| |
|
25
|
Abstract
Background One of the fundamental issues in evolutionary molecular biology is to discover genomic duplication events and their correspondence to the species tree. Such events can be reconstructed by clustering single gene duplications inferred by reconciling a set of gene trees with a species tree. Results Here we propose the first solutions to the genomic duplication problem in which every reconciliation with the minimal number of single gene duplications is allowed and the clustering method is minimum episodes, under the assumption that input gene trees are unrooted. Conclusions We show new theoretical properties of unrooted reconciliation for the duplication cost and apply them to design several exact and heuristic algorithms for solving the problem. Our evaluation study on an empirical dataset confirmed several genomic duplication events from the literature and demonstrated that the algorithms can be successfully applied.
Affiliation(s)
- Jarosław Paszek
- Warsaw University, Faculty of Mathematics, Informatics and Mechanics, Banacha 2, Warsaw, 02-097, Poland.
| | - Paweł Górecki
- Warsaw University, Faculty of Mathematics, Informatics and Mechanics, Banacha 2, Warsaw, 02-097, Poland
| |
|
26
|
Anselmetti Y, Luhmann N, Bérard S, Tannier E, Chauve C. Comparative Methods for Reconstructing Ancient Genome Organization. Methods Mol Biol 2018; 1704:343-362. [PMID: 29277873 DOI: 10.1007/978-1-4939-7463-4_13] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/03/2022]
Abstract
Comparative genomics considers the detection of similarities and differences between extant genomes, and, based on more or less formalized hypotheses regarding the involved evolutionary processes, inferring ancestral states explaining the similarities and an evolutionary history explaining the differences. In this chapter, we focus on the reconstruction of the organization of ancient genomes into chromosomes. We review different methodological approaches and software, applied to a wide range of datasets from different kingdoms of life and at different evolutionary depths. We discuss relations with genome assembly, and potential approaches to validate computational predictions on ancient genomes that are almost always only accessible through these predictions.
Affiliation(s)
- Yoann Anselmetti
- Institut des Sciences de l'Évolution, Université Montpellier 2, Montpellier, France
| | - Nina Luhmann
- Faculty of Technology, Bielefeld University, Bielefeld, Germany.,Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany.,International Research Training Group 1906, Bielefeld University, Bielefeld, Germany
| | - Sèverine Bérard
- Institut des Sciences de l'Évolution, Université Montpellier 2, Montpellier, France
| | - Eric Tannier
- UMR CNRS 5558 - LBBE "Biométrie et Biologie Évolutive", Inria Grenoble Rhône-Alpes and University of Lyon, Lyon, France
| | - Cedric Chauve
- Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC, Canada, V5A 1S6.
| |
|