1
Hartmann T, Middendorf M, Bernt M. Genome Rearrangement Analysis: Cut and Join Genome Rearrangements and Gene Cluster Preserving Approaches. Methods Mol Biol 2024; 2802:215-245. [PMID: 38819562 DOI: 10.1007/978-1-0716-3838-5_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Genome rearrangements are mutations that change the gene content of a genome or the arrangement of the genes on a genome. Several years of research on genome rearrangements have established different algorithmic approaches for solving some fundamental problems in comparative genomics based on gene order information. This review summarizes the literature on genome rearrangement analysis along two lines of research. The first line considers rearrangement models that are particularly well suited for a theoretical analysis. These models use rearrangement operations that cut chromosomes into fragments and then join the fragments into new chromosomes. The second line works with rearrangement models that reflect several biologically motivated constraints, e.g., the constraint that gene clusters have to be preserved. In this chapter, the border between algorithmically "easy" and "hard" rearrangement problems is sketched and a brief review is given on the available software tools for genome rearrangement analysis.
Affiliation(s)
- Tom Hartmann
- Swarm Intelligence and Complex Systems Group, Institute of Computer Science, University Leipzig, Leipzig, Germany
- Martin Middendorf
- Swarm Intelligence and Complex Systems Group, Institute of Computer Science, University Leipzig, Leipzig, Germany
2
Stevenson J, Terauds V, Sumner J. Rearrangement Events on Circular Genomes. Bull Math Biol 2023; 85:107. [PMID: 37749280 PMCID: PMC10520144 DOI: 10.1007/s11538-023-01209-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Accepted: 08/31/2023] [Indexed: 09/27/2023]
Abstract
Early literature on genome rearrangement modelling views the problem of computing evolutionary distances as an inherently combinatorial one. In particular, attention is given to estimating distances using the minimum number of events required to transform one genome into another. In hindsight, this approach is analogous to early methods for inferring phylogenetic trees from DNA sequences such as maximum parsimony; both are motivated by the principle that the true distance minimises evolutionary change, and both are effective if this principle is a true reflection of reality. Recent literature considers genome rearrangement under statistical models, continuing this parallel with DNA-based methods, with the goal of using model-based methods (for example maximum likelihood techniques) to compute distance estimates that incorporate the large number of rearrangement paths that can transform one genome into another. Crucially, this approach requires one to decide upon a set of feasible rearrangement events and, in this paper, we focus on characterising well-motivated models for signed, uni-chromosomal circular genomes, where the number of regions remains fixed. Since rearrangements are often mathematically described using permutations, we isolate the sets of permutations representing rearrangements that are biologically reasonable in this context, for example inversions and transpositions. We provide precise mathematical expressions for these rearrangements, and then describe them in terms of the set of cuts made in the genome when they are applied. We directly compare cuts to breakpoints, and use this concept to count the distinct rearrangement actions which apply a given number of cuts. Finally, we provide some examples of rearrangement models, and include a discussion of some questions that arise when defining plausible models.
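Illustrative sketch (not taken from the paper): the relationship between cuts and breakpoints described in this abstract can be tried out on a small signed circular genome. The code below assumes a genome is a Python list of signed integers read around the circle; the function names `invert`, `adjacencies`, and `breakpoints` are hypothetical, and the paper's own formal definitions may differ in detail. An inversion of a contiguous segment makes two cuts, so it can create at most two breakpoints.

```python
def invert(genome, i, j):
    """Reverse and sign-flip regions genome[i..j] (inclusive).

    Assumption: the segment does not wrap around the end of the list.
    """
    segment = [-g for g in reversed(genome[i:j + 1])]
    return genome[:i] + segment + genome[j + 1:]


def adjacencies(genome):
    """Adjacencies of a signed circular genome.

    Each adjacency (a, b) is stored in a canonical form so that the result
    does not depend on where the circle is opened or on the reading strand.
    """
    n = len(genome)
    result = set()
    for k in range(n):
        a, b = genome[k], genome[(k + 1) % n]
        # (a, b) on one strand is the same adjacency as (-b, -a) on the other.
        result.add(min((a, b), (-b, -a)))
    return result


def breakpoints(source, target):
    """Number of adjacencies of `source` that are absent from `target`."""
    return len(adjacencies(source) - adjacencies(target))


identity = [1, 2, 3, 4, 5]
inverted = invert(identity, 1, 2)   # cuts before region 2 and after region 3
print(inverted, breakpoints(inverted, identity))
```

Rotating or reflecting the list leaves the adjacency set unchanged, which is one simple way to respect the symmetries of a circular genome.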
Affiliation(s)
- Venta Terauds
- University of Tasmania, Hobart, Australia
- University of South Australia, Adelaide, Australia
3
Zabelkin A, Avdeyev P, Alexeev N. TruEst: a better estimator of evolutionary distance under the INFER model. J Math Biol 2023; 87:25. [PMID: 37423919 DOI: 10.1007/s00285-023-01955-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2022] [Revised: 06/11/2023] [Accepted: 06/15/2023] [Indexed: 07/11/2023]
Abstract
Genome rearrangements are evolutionary events that shuffle genomic architectures. The number of genome rearrangements that happened between two genomes is often used as the evolutionary distance between these species. This number is often estimated as the minimum number of genome rearrangements required to transform one genome into another, an estimate that is reliable only for closely related genomes. Such estimates often underestimate the evolutionary distance for genomes that have substantially evolved from each other, and advanced statistical methods can be used to improve accuracy. Several statistical estimators have been developed, under various evolutionary models, of which the most complete one, INFER, takes into account different degrees of genome fragility. We present TruEst, an efficient tool that estimates the evolutionary distance between the genomes under the INFER model of genome rearrangements. We apply our method to both simulated and real data. It shows high accuracy on the simulated data. On the real datasets of mammal genomes the method found several pairs of genomes for which the estimated distances are highly consistent with previous ancestral reconstruction studies.
Affiliation(s)
- Alexey Zabelkin
- International Laboratory "Computer Technologies", ITMO University, Saint Petersburg, Russia
- Pavel Avdeyev
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
4
Terauds V, Stevenson J, Sumner J. A symmetry-inclusive algebraic approach to genome rearrangement. J Bioinform Comput Biol 2021; 19:2140015. [PMID: 34806949 DOI: 10.1142/s0219720021400151] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
Of the many modern approaches to calculating evolutionary distance via models of genome rearrangement, most are tied to a particular set of genomic modeling assumptions and to a restricted class of allowed rearrangements. The "position paradigm", in which genomes are represented as permutations signifying the position (and orientation) of each region, enables a refined model-based approach, where one can select biologically plausible rearrangements and assign to them relative probabilities/costs. Here, one must further incorporate any underlying structural symmetry of the genomes into the calculations and ensure that this symmetry is reflected in the model. In our recently-introduced framework of genome algebras, each genome corresponds to an element that simultaneously incorporates all of its inherent physical symmetries. The representation theory of these algebras then provides a natural model of evolution via rearrangement as a Markov chain. Whilst the implementation of this framework to calculate distances for genomes with "practical" numbers of regions is currently computationally infeasible, we consider it to be a significant theoretical advance: one can incorporate different genomic modeling assumptions, calculate various genomic distances, and compare the results under different rearrangement models. The aim of this paper is to demonstrate some of these features.
Affiliation(s)
- Venta Terauds
- Discipline of Mathematics, University of Tasmania, Private Bag 37, Sandy Bay, Tasmania 7001, Australia
- Joshua Stevenson
- Discipline of Mathematics, University of Tasmania, Private Bag 37, Sandy Bay, Tasmania 7001, Australia
- Jeremy Sumner
- Discipline of Mathematics, University of Tasmania, Private Bag 37, Sandy Bay, Tasmania 7001, Australia
5
Chakraborty M, Chang CH, Khost DE, Vedanayagam J, Adrion JR, Liao Y, Montooth KL, Meiklejohn CD, Larracuente AM, Emerson JJ. Evolution of genome structure in the Drosophila simulans species complex. Genome Res 2021; 31:380-396. [PMID: 33563718 PMCID: PMC7919458 DOI: 10.1101/gr.263442.120] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 03/12/2020] [Accepted: 12/28/2020] [Indexed: 12/25/2022]
Abstract
The rapid evolution of repetitive DNA sequences, including satellite DNA, tandem duplications, and transposable elements, underlies phenotypic evolution and contributes to hybrid incompatibilities between species. However, repetitive genomic regions are fragmented and misassembled in most contemporary genome assemblies. We generated highly contiguous de novo reference genomes for the Drosophila simulans species complex (D. simulans, D. mauritiana, and D. sechellia), which speciated ∼250,000 yr ago. Our assemblies are comparable in contiguity and accuracy to the current D. melanogaster genome, allowing us to directly compare repetitive sequences between these four species. We find that at least 15% of the D. simulans complex species genomes fail to align uniquely to D. melanogaster owing to structural divergence, twice the number of single-nucleotide substitutions. We also find rapid turnover of satellite DNA and extensive structural divergence in heterochromatic regions, whereas the euchromatic gene content is mostly conserved. Despite the overall preservation of gene synteny, euchromatin in each species has been shaped by clade- and species-specific inversions, transposable elements, expansions and contractions of satellite and tRNA tandem arrays, and gene duplications. We also find rapid divergence among Y-linked genes, including copy number variation and recent gene duplications from autosomes. Our assemblies provide a valuable resource for studying genome evolution and its consequences for phenotypic evolution in these genetic model species.
Affiliation(s)
- Mahul Chakraborty
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California 92697, USA
- Ching-Ho Chang
- Department of Biology, University of Rochester, Rochester, New York 14627, USA
- Danielle E Khost
- Department of Biology, University of Rochester, Rochester, New York 14627, USA
- FAS Informatics and Scientific Applications, Harvard University, Cambridge, Massachusetts 02138, USA
- Jeffrey Vedanayagam
- Department of Developmental Biology, Memorial Sloan-Kettering Cancer Center, New York, New York 10065, USA
- Jeffrey R Adrion
- Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403, USA
- Yi Liao
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California 92697, USA
- Kristi L Montooth
- School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, Nebraska 68502, USA
- Colin D Meiklejohn
- School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, Nebraska 68502, USA
- J J Emerson
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California 92697, USA
6
7
Avdeyev P, Alexeev N, Rong Y, Alekseyev MA. A unified ILP framework for core ancestral genome reconstruction problems. Bioinformatics 2020; 36:2993-3003. [PMID: 32058559 DOI: 10.1093/bioinformatics/btaa100] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/23/2018] [Revised: 12/06/2019] [Accepted: 02/07/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION One of the key computational problems in comparative genomics is the reconstruction of genomes of ancestral species based on genomes of extant species. Since most dramatic changes in genomic architectures are caused by genome rearrangements, this problem is often posed as minimization of the number of genome rearrangements between extant and ancestral genomes. The basic case of three given genomes is known as the genome median problem. Whole-genome duplications (WGDs) represent yet another type of dramatic evolutionary events and inspire the reconstruction of preduplicated ancestral genomes, referred to as the genome halving problem. Generalization of WGDs to whole-genome multiplication events leads to the genome aliquoting problem. RESULTS In this study, we propose polynomial-size integer linear programming (ILP) formulations for the aforementioned problems. We further obtain such formulations for the restricted and conserved versions of the median and halving problems, which have been recently introduced to improve biological relevance of the solutions. Extensive evaluation of solutions to the different ILP problems demonstrates their good accuracy. Furthermore, since the ILP formulations for the conserved versions have linear size, they provide a novel practical approach to ancestral genome reconstruction, which combines the advantages of homology- and rearrangements-based methods. AVAILABILITY AND IMPLEMENTATION Code and data are available in https://github.com/AvdeevPavel/ILP-WGD-reconstructor. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pavel Avdeyev
- Department of Mathematics, The George Washington University, Washington, DC 20052, USA
- Nikita Alexeev
- Computer Technologies Laboratory, ITMO University, Saint Petersburg, 197101, Russia
- Yongwu Rong
- Department of Mathematics, Queens College, City University of New York, Flushing, NY 11367, USA
- Max A Alekseyev
- Department of Mathematics, The George Washington University, Washington, DC 20052, USA; Department of Biostatistics and Bioinformatics, The George Washington University, Washington, DC 20052, USA
8
Xia R, Lin Y, Zhou J, Geng T, Feng B, Tang J. Phylogenetic Reconstruction for Copy-Number Evolution Problems. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:694-699. [PMID: 29993694 DOI: 10.1109/tcbb.2018.2829698] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
Cancer is known for its heterogeneity and is regarded as an evolutionary process driven by somatic mutations and clonal expansions. This evolutionary process can be modeled by a phylogenetic tree, and phylogenetic analysis of multiple subclones of cancer cells can facilitate the study of tumor variant progression. Copy-number aberrations, in the form of segmental amplifications and deletions, occur frequently in many types of tumors. In this paper, we develop a distance-based method for reconstructing phylogenies from copy-number profiles of cancer cells. We demonstrate the importance of distance correction from the edit (minimum) distance to the estimated actual number of events. Experimental results show that our approaches provide accurate and scalable results in estimating the actual number of evolutionary events between copy number profiles and in reconstructing phylogenies.
9
Maximum Likelihood Estimates of Rearrangement Distance: Implementing a Representation-Theoretic Approach. Bull Math Biol 2018; 81:535-567. [PMID: 30264286 DOI: 10.1007/s11538-018-0511-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 12/20/2017] [Accepted: 09/18/2018] [Indexed: 10/28/2022]
Abstract
The calculation of evolutionary distance via models of genome rearrangement has an inherent combinatorial complexity. Various algorithms and estimators have been used to address this; however, many of these set quite specific conditions for the underlying model. A recently proposed technique, applying representation theory to calculate evolutionary distance between circular genomes as a maximum likelihood estimate, reduces the computational load by converting the combinatorial problem into a numerical one. We show that the technique may be applied to models with any choice of rearrangements and relative probabilities thereof; we then investigate the symmetry of circular genome rearrangement models in general. We discuss the practical implementation of the technique and, without introducing any bona fide numerical approximations, give the results of some initial calculations for genomes with up to 11 regions.
10
Avdeyev P, Jiang S, Alekseyev MA. Implicit Transpositions in DCJ Scenarios. Front Genet 2018; 8:212. [PMID: 29312438 PMCID: PMC5733028 DOI: 10.3389/fgene.2017.00212] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/05/2017] [Accepted: 11/29/2017] [Indexed: 11/13/2022] Open
Abstract
Genome rearrangements are large-scale evolutionary events that shuffle genomic architectures. The minimal number of such events between two genomes is often used in phylogenomic studies to measure the evolutionary distance between the genomes. Double-Cut-and-Join (DCJ) operations represent a convenient model of most common genome rearrangements (reversals, translocations, fissions, and fusions), while other genome rearrangements, such as transpositions, can be modeled by pairs of DCJs. Since the DCJ model does not directly account for transpositions, their impact on DCJ scenarios is unclear. In the present work, we study implicit appearance of transpositions (as pairs of DCJs) in DCJ scenarios. We consider shortest DCJ scenarios satisfying the maximum parsimony assumption, as well as more general DCJ scenarios based on some realistic but less restrictive assumptions. In both cases, we derive a uniform lower bound for the rate of implicit transpositions, which depends only on the genomes but not a particular DCJ scenario between them. Our results imply that implicit appearance of transpositions in DCJ scenarios may be unavoidable or even abundant for some pairs of genomes. We estimate that for mammalian genomes implicit transpositions constitute at least 6% of genome rearrangements.
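Illustrative sketch (not taken from the paper): the statement that a transposition can be modeled by a pair of DCJs can be checked numerically on a toy example. The code assumes both genomes are single circular chromosomes over the same set of genes, written as lists of signed integers, and uses the standard result for that case that the DCJ distance equals the number of genes minus the number of cycles in the adjacency graph. The function names `adjacency_map` and `dcj_distance` are hypothetical.

```python
def adjacency_map(genome):
    """Map every gene extremity to its neighbour on a circular chromosome.

    A gene g has a tail (g, 't') and a head (g, 'h'); a positive gene is
    read tail to head, a negative gene head to tail.
    """
    partner = {}
    n = len(genome)
    for k in range(n):
        a, b = genome[k], genome[(k + 1) % n]
        right_end_of_a = (abs(a), 'h' if a > 0 else 't')
        left_end_of_b = (abs(b), 't' if b > 0 else 'h')
        partner[right_end_of_a] = left_end_of_b
        partner[left_end_of_b] = right_end_of_a
    return partner


def dcj_distance(genome_a, genome_b):
    """DCJ distance for two circular genomes with identical gene content."""
    in_a, in_b = adjacency_map(genome_a), adjacency_map(genome_b)
    seen = set()
    cycles = 0
    for start in in_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:
            seen.add(x)
            y = in_a[x]      # follow an adjacency of genome A
            seen.add(y)
            x = in_b[y]      # then an adjacency of genome B
    return len(genome_a) - cycles


identity = [1, 2, 3, 4, 5]
print(dcj_distance([1, -3, -2, 4, 5], identity))  # an inversion
print(dcj_distance([1, 3, 2, 4, 5], identity))    # a transposition of region 2
```

In this toy case the inversion is one DCJ away from the identity while the transposition is two DCJs away, which matches the "pairs of DCJs" description in the abstract.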
Affiliation(s)
- Pavel Avdeyev
- Department of Mathematics and the Computational Biology Institute, George Washington University, Washington, DC, United States
- Shuai Jiang
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States
- Max A Alekseyev
- Department of Mathematics and the Computational Biology Institute, George Washington University, Washington, DC, United States
11
Genome Rearrangement Analysis: Cut and Join Genome Rearrangements and Gene Cluster Preserving Approaches. Methods Mol Biol 2018; 1704:261-289. [PMID: 29277869 DOI: 10.1007/978-1-4939-7463-4_9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 01/30/2023]
Abstract
Genome rearrangements are mutations that change the gene content of a genome or the arrangement of the genes on a genome. Several years of research on genome rearrangements have established different algorithmic approaches for solving some fundamental problems in comparative genomics based on gene order information. This review summarizes the literature on genome rearrangement analysis along two lines of research. The first line considers rearrangement models that are particularly well suited for a theoretical analysis. These models use rearrangement operations that cut chromosomes into fragments and then join the fragments into new chromosomes. The second line works with rearrangement models that reflect several biologically motivated constraints, e.g., the constraint that gene clusters have to be preserved. In this chapter, the border between algorithmically "easy" and "hard" rearrangement problems is sketched and a brief review is given on the available software tools for genome rearrangement analysis.
12
Abstract
Background The ability to estimate the evolutionary distance between extant genomes plays a crucial role in many phylogenomic studies. Often such estimation is based on the parsimony assumption, implying that the distance between two genomes can be estimated as the rearrangement distance equal to the minimal number of genome rearrangements required to transform one genome into the other. However, in reality the parsimony assumption may not always hold, emphasizing the need for estimation that does not rely on the rearrangement distance. The distance that accounts for the actual (rather than minimal) number of rearrangements between two genomes is often referred to as the true evolutionary distance. While there exists a method for the true evolutionary distance estimation, it assumes that rearrangements are equally likely to break genomes at any position in the course of evolution. This assumption, known as the random breakage model, has recently been refuted in favor of the more rigorous fragile breakage model postulating that only certain “fragile” genomic regions are prone to rearrangements. Results We propose a new method for estimating the true evolutionary distance between two genomes under the fragile breakage model. We evaluate the proposed method on simulated genomes, where it shows high accuracy. We further apply the proposed method for estimation of evolutionary distances within a set of five yeast genomes and a set of two fish genomes. Conclusions The true evolutionary distances between the five yeast genomes estimated with the proposed method reveal that some pairs of yeast genomes violate the parsimony assumption. The proposed method further demonstrates that the rearrangement distance between the two fish genomes underestimates their evolutionary distance by about 20%. These results demonstrate how drastically the two distances can differ and justify the use of true evolutionary distance in phylogenomic studies.
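Illustrative sketch (not the authors' method): the gap between the actual and the minimal number of rearrangements can be seen in a small simulation. The code below applies uniformly random inversions to a signed circular genome, which corresponds to the random breakage assumption rather than the fragile breakage model proposed in the paper, and records how many breakpoints separate the result from the starting genome. The names `breakpoint_count` and `simulate`, the genome size, and the seed are assumptions made for illustration.

```python
import random


def succ(x, n):
    """Successor of region x on the circular identity genome 1..n."""
    return x % n + 1


def breakpoint_count(genome):
    """Adjacencies of a signed circular genome absent from the identity 1..n."""
    n = len(genome)
    count = 0
    for k in range(n):
        a, b = genome[k], genome[(k + 1) % n]
        forward = a > 0 and b > 0 and b == succ(a, n)
        backward = a < 0 and b < 0 and -a == succ(-b, n)
        if not (forward or backward):
            count += 1
    return count


def simulate(n, steps, seed=0):
    """Apply `steps` random inversions; return (inversions applied, breakpoints)."""
    rng = random.Random(seed)
    genome = list(range(1, n + 1))
    history = [(0, breakpoint_count(genome))]
    for k in range(1, steps + 1):
        i, j = sorted(rng.sample(range(n + 1), 2))   # two distinct cut points
        genome = genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]
        history.append((k, breakpoint_count(genome)))
    return history


for applied, observed in simulate(n=50, steps=200, seed=1)[::40]:
    print(applied, observed)
```

Each inversion creates at most two breakpoints, so at least half the observed breakpoint count (rounded up) inversions are needed. Because the breakpoint count can never exceed the number of regions, any estimate based on it alone stops growing once many events have accumulated, which is why a corrected estimator is needed for diverged genomes.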
Affiliation(s)
- Nikita Alexeev
- Computational Biology Institute at the George Washington University, Ashburn, 20147, VA, USA
- Max A Alekseyev
- Computational Biology Institute at the George Washington University, Ashburn, 20147, VA, USA
13
Abstract
The history of particular genes and that of the species that carry them can be different for a variety of reasons. In particular, gene trees and species trees can differ due to well-known evolutionary processes such as gene duplication and loss, lateral gene transfer, or incomplete lineage sorting. Species tree reconstruction methods have been developed to take this incongruence into account; these can be divided grossly into supertree and supermatrix approaches. Here we introduce a new Bayesian hierarchical model that we have recently developed and implemented in the program guenomu. The new model considers multiple sources of gene tree/species tree disagreement. Guenomu takes as input posterior distributions of unrooted gene tree topologies for multiple gene families, in order to estimate the posterior distribution of rooted species tree topologies.
Affiliation(s)
- Leonardo de Oliveira Martins
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
- Department of Materials, Imperial College London, London, UK
- David Posada
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
14
Biller P, Guéguen L, Knibbe C, Tannier E. Breaking Good: Accounting for Fragility of Genomic Regions in Rearrangement Distance Estimation. Genome Biol Evol 2016; 8:1427-39. [PMID: 27190002 PMCID: PMC4898800 DOI: 10.1093/gbe/evw083] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Indexed: 12/15/2022] Open
Abstract
Models of evolution by genome rearrangements are prone to two types of flaws: One is to ignore the diversity of susceptibility to breakage across genomic regions, and the other is to suppose that susceptibility values are given. Without necessarily supposing their precise localization, we call "solid" the regions that are improbably broken by rearrangements and "fragile" the regions outside solid ones. We propose a model of evolution by inversions where breakage probabilities vary across fragile regions and over time. It contains as a particular case the uniform breakage model on the nucleotidic sequence, where breakage probabilities are proportional to fragile region lengths. This is very different from the frequently used pseudouniform model where all fragile regions have the same probability to break. Estimations of rearrangement distances based on the pseudouniform model completely fail on simulations with the truly uniform model. On pairs of amniote genomes, we show that identifying coding genes with solid regions yields incoherent distance estimations, especially with the pseudouniform model, and to a lesser extent with the truly uniform model. This incoherence is solved when we coestimate the number of fragile regions with the rearrangement distance. The estimated number of fragile regions is surprisingly small, suggesting that a minority of regions are recurrently used by rearrangements. Estimations for several pairs of genomes at different divergence times are in agreement with a slowly evolvable colocalization of active genomic regions in the cell.
Affiliation(s)
- Priscila Biller
- INRIA Grenoble Rhône-Alpes, Montbonnot, France; University of Campinas, São Paulo, Brazil
- Carole Knibbe
- INRIA Grenoble Rhône-Alpes, Montbonnot, France; Université Lyon 1, LIRIS, UMR5205, Villeurbanne, France
- Eric Tannier
- INRIA Grenoble Rhône-Alpes, Montbonnot, France; Université Lyon 1, LBBE, UMR5558, Villeurbanne, France
15
Watanabe S, Fučíková K, Lewis LA, Lewis PO. Hiding in plain sight: Koshicola spirodelophila gen. et sp. nov. (Chaetopeltidales, Chlorophyceae), a novel green alga associated with the aquatic angiosperm Spirodela polyrhiza. American Journal of Botany 2016; 103:865-75. [PMID: 27208355 DOI: 10.3732/ajb.1500481] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 11/13/2015] [Accepted: 03/04/2016] [Indexed: 05/25/2023]
Abstract
PREMISE OF THE STUDY Discovery and morphological characterization of a novel epiphytic aquatic green alga increases our understanding of Chaetopeltidales, a poorly known order in Chlorophyceae. Chloroplast genomic data from this taxon reveal an unusual architecture previously unknown in green algae. METHODS Using light and electron microscopy, we characterized the morphology and ultrastructure of a novel taxon of green algae. Bayesian phylogenetic analyses of nuclear and plastid genes were used to test the hypothesized membership of this taxon in order Chaetopeltidales. With next-generation sequence data, we assembled the plastid genome of this novel taxon and compared its gene content and architecture to that of related species to further investigate plastid genome traits. KEY RESULTS The morphology and ultrastructure of this alga are consistent with placement in Chaetopeltidales (Chlorophyceae), but a distinct trait combination supports recognition of this alga as a new genus and species, Koshicola spirodelophila gen. et sp. nov. Its placement in the phylogeny as a descendant of a deep division in the Chaetopeltidales is supported by analysis of molecular data sets. The chloroplast genome is among the largest reported in green algae and the genes are distributed on three large chromosomes (rather than a single one), in contrast to other studied green algae. CONCLUSIONS The discovery of Koshicola spirodelophila gen. et sp. nov. highlights the importance of investigating even commonplace habitats to explore new microalgal diversity. This work expands our understanding of the morphological and chloroplast genomic features of green algae, and in particular those of the poorly studied Chaetopeltidales.
Affiliation(s)
- Shin Watanabe
- Department of Biology, Graduate School of Science and Engineering, University of Toyama, 3190 Gofuku, Toyama 930-8555, Japan
- Karolina Fučíková
- Department of Ecology and Evolutionary Biology, University of Connecticut, 75 North Eagleville Road, Storrs, Connecticut 06269 USA
- Louise A Lewis
- Department of Ecology and Evolutionary Biology, University of Connecticut, 75 North Eagleville Road, Storrs, Connecticut 06269 USA
- Paul O Lewis
- Department of Ecology and Evolutionary Biology, University of Connecticut, 75 North Eagleville Road, Storrs, Connecticut 06269 USA
16
17
Abstract
We study statistical estimators of the number of genomic events separating two genomes under a Double Cut-and-Join (DCJ) rearrangement model, using method-of-moments estimation. We first propose an exact, closed, analytically invertible formula for the expected number of breakpoints after a given number of DCJs. This improves on the previously proposed formula, which is heuristic, recursive, and computationally slower. Then we explore the analogies of genome evolution by DCJ with evolution of binary sequences under substitutions, permutations under transpositions, and random graphs. Each of these is presented in the literature with intuitive justifications, and is used to import results from better known fields. We formalize the relations by proving a correspondence between moments in sequence and genome evolution, provided substitutions appear four by four in the corresponding model. Finally, we prove a bounded error on two estimators of the number of cycles in the breakpoint graph after a given number of rearrangements, by an analogy with cycles in permutations and components in random graphs.
Affiliation(s)
- Priscila Biller
- Institute of Computing, University of Campinas, São Paulo, Brazil
- Institut National de Recherche en Informatique et en Automatique (INRIA) Grenoble Rhône-Alpes, 655 avenue de L'Europe, 38330 Montbonnot, France
- Laurent Guéguen
- Laboratoire de Biométrie et Biologie Évolutive, LBBE, UMR CNRS 5558, University of Lyon 1, 43 boulevard du 11 novembre 1918, 69622, Villeurbanne, France
- Eric Tannier
- Institut National de Recherche en Informatique et en Automatique (INRIA) Grenoble Rhône-Alpes, 655 avenue de L'Europe, 38330 Montbonnot, France
- Laboratoire de Biométrie et Biologie Évolutive, LBBE, UMR CNRS 5558, University of Lyon 1, 43 boulevard du 11 novembre 1918, 69622, Villeurbanne, France
18
Hu F, Lin Y, Tang J. MLGO: phylogeny reconstruction and ancestral inference from gene-order data. BMC Bioinformatics 2014; 15:354. [PMID: 25376663 PMCID: PMC4236499 DOI: 10.1186/s12859-014-0354-6] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.2] [Received: 07/16/2014] [Accepted: 10/16/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The rapid accumulation of whole-genome data has renewed interest in the study of using gene-order data for phylogenetic analyses and ancestral reconstruction. Current software and web servers typically do not support duplication and loss events along with rearrangements. RESULTS MLGO (Maximum Likelihood for Gene-Order Analysis) is a web tool for the reconstruction of phylogeny and/or ancestral genomes from gene-order data. MLGO is based on likelihood computation and shows advantages over existing methods in terms of accuracy, scalability and flexibility. CONCLUSIONS To the best of our knowledge, it is the first web tool for analysis of large-scale genomic changes including not only rearrangements but also gene insertions, deletions and duplications. The web tool is available from http://www.geneorder.org/server.php .
Affiliation(s)
- Fei Hu
  - Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, 300072, China
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, 29208, SC, USA
- Yu Lin
  - Department of Computer Science and Engineering, University of California, San Diego, 92093 La Jolla, CA, USA
- Jijun Tang
  - Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, 300072, China
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, 29208, SC, USA
|
19
|
Dempsey K, Currall B, Hallworth R, Ali H. A New Approach for Sequence Analysis. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Understanding the structure-function relationship of proteins offers the key to biological processes, and can offer knowledge for better investigation of matters with widespread impact, such as pathological disease and drug intervention. This relationship is dictated at the simplest level by the primary protein sequence. Since useful structures and functions are conserved within biology, a sequence with known structure-function relationship can be compared to related sequences to aid in novel structure-function prediction. Sequence analysis provides a means for suggesting evolutionary relationships, and inferring structural or functional similarity. It is crucial to consider these parameters while comparing sequences as they influence both the algorithms used and the implications of the results. For example, proteins that are closely related on an evolutionary time scale may have very similar structure, but entirely different functions. In contrast, proteins which have undergone convergent evolution may have dissimilar primary structure, but perform similar functions. This chapter details how the aspects of evolution, structure, and function can be taken into account when performing sequence analysis, and proposes an expansion on traditional approaches resulting in direct improvement of said analysis. This model is applied to a case study in the prestin protein and shows that the proposed approach provides a better understanding of input and output and can improve the performance of sequence analysis by means of motif detection software.
Affiliation(s)
- Kathryn Dempsey
  - University of Nebraska at Omaha, USA & University of Nebraska Medical Center, USA
- Hesham Ali
  - University of Nebraska at Omaha, USA & University of Nebraska Medical Center, USA
|
20
|
Lin Y, Hu F, Tang J, Moret BM. Maximum likelihood phylogenetic reconstruction from high-resolution whole-genome data and a tree of 68 eukaryotes. Pacific Symposium on Biocomputing 2013:285-96. [PMID: 23424133 PMCID: PMC3712796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
The rapid accumulation of whole-genome data has renewed interest in the study of the evolution of genomic architecture, under such events as rearrangements, duplications, and losses. Comparative genomics, evolutionary biology, and cancer research all require tools to elucidate the mechanisms, history, and consequences of those evolutionary events, while phylogenetics could use whole-genome data to enhance its picture of the Tree of Life. Current approaches in the area of phylogenetic analysis are limited to very small collections of closely related genomes using low-resolution data (typically a few hundred syntenic blocks); moreover, these approaches typically do not include duplication and loss events. We describe a maximum likelihood (ML) approach for phylogenetic analysis that takes into account genome rearrangements as well as duplications, insertions, and losses. Our approach can handle high-resolution genomes (with 40,000 or more markers) and can use, in the same analysis, genomes with very different numbers of markers. Because our approach uses a standard ML reconstruction program (RAxML), it scales up to large trees. We present the results of extensive testing on both simulated and real data showing that our approach returns very accurate results very quickly. In particular, we analyze a dataset of 68 high-resolution eukaryotic genomes, with between 3,000 and 42,000 genes, from the eGOB database; the analysis, including bootstrapping, takes just 3 hours on a desktop system and returns a tree in agreement with all well supported branches, while also suggesting resolutions for some disputed placements.
Affiliation(s)
- Yu Lin
  - Laboratory for Computational Biology and Bioinformatics, EPFL, Lausanne VD, CH-1015, Switzerland
- Fei Hu
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
- Jijun Tang
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
- Bernard M.E. Moret
  - Laboratory for Computational Biology and Bioinformatics, EPFL, Lausanne VD, CH-1015, Switzerland
|
21
|
Luo H, Arndt W, Zhang Y, Shi G, Alekseyev M, Tang J, Hughes AL, Friedman R. Phylogenetic analysis of genome rearrangements among five mammalian orders. Mol Phylogenet Evol 2012; 65:871-82. [PMID: 22929217 PMCID: PMC4425404 DOI: 10.1016/j.ympev.2012.08.008] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 03/30/2012] [Revised: 08/11/2012] [Accepted: 08/13/2012] [Indexed: 01/16/2023]
Abstract
Evolutionary relationships among placental mammalian orders have been controversial. Whole genome sequencing and new computational methods offer opportunities to resolve the relationships among 10 genomes belonging to the mammalian orders Primates, Rodentia, Carnivora, Perissodactyla and Artiodactyla. By application of the double cut and join distance metric, where gene order is the phylogenetic character, we computed genomic distances among the sampled mammalian genomes. With a marsupial outgroup, the gene order tree supported a topology in which Rodentia fell outside the cluster of Primates, Carnivora, Perissodactyla, and Artiodactyla. Results of breakpoint reuse rate and synteny block length analyses were consistent with the predictions of the random breakage model, which provided a diagnostic test to support use of gene order as an appropriate phylogenetic character in this study. We discuss the influence of rate differences among lineages and other factors that may contribute to different resolutions of mammalian ordinal relationships by different methods of phylogenetic reconstruction.
Affiliation(s)
- Haiwei Luo
  - Department of Biological Sciences, University of South Carolina, Columbia 29208, USA
- William Arndt
  - Department of Computer Science and Engineering, University of South Carolina, Columbia 29208, USA
- Yiwei Zhang
  - Department of Computer Science and Engineering, University of South Carolina, Columbia 29208, USA
- Guanqun Shi
  - Department of Computer Science, University of California, Riverside, 92521, USA
- Max Alekseyev
  - Department of Computer Science and Engineering, University of South Carolina, Columbia 29208, USA
- Jijun Tang
  - Department of Computer Science and Engineering, University of South Carolina, Columbia 29208, USA
- Austin L. Hughes
  - Department of Biological Sciences, University of South Carolina, Columbia 29208, USA
- Robert Friedman
  - Department of Biological Sciences, University of South Carolina, Columbia 29208, USA
|
22
|
Lin Y, Rajan V, Moret BME. TIBA: a tool for phylogeny inference from rearrangement data with bootstrap analysis. Bioinformatics 2012; 28:3324-5. [DOI: 10.1093/bioinformatics/bts603] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022]
|
23
|
Hilker R, Sickinger C, Pedersen CNS, Stoye J. UniMoG--a unifying framework for genomic distance calculation and sorting based on DCJ. Bioinformatics 2012; 28:2509-11. [PMID: 22815356 PMCID: PMC3463123 DOI: 10.1093/bioinformatics/bts440] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Indexed: 12/01/2022]
Abstract
Summary: UniMoG is software that combines five genome rearrangement models: double cut and join (DCJ), restricted DCJ, Hannenhalli and Pevzner (HP), inversion and translocation. It can compute the pairwise genomic distances and a corresponding optimal sorting scenario for an arbitrary number of genomes. All five models can be unified through the DCJ model; thus the implementation is based on DCJ and, where reasonable, uses the most efficient existing algorithms for each distance and sorting problem. Both textual and graphical output is possible for visualizing the operations. Availability and implementation: The software is available through the Bielefeld University Bioinformatics Web Server at http://bibiserv.techfak.uni-bielefeld.de/dcj with instructions and example data. Contact: rhilker@cebitec.uni-bielefeld.de
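To make the DCJ distance mentioned above concrete, here is a minimal sketch for the simplest case: two genomes with the same genes, each gene occurring exactly once, and only circular chromosomes. In that case the standard adjacency-graph formula reduces to d = n - c, where n is the number of genes and c the number of cycles in the adjacency graph. This is an illustration under those assumptions, not UniMoG's implementation; the function names and the signed-integer input format are hypothetical.

```python
def _ends(gene):
    """Return (left_end, right_end) of a signed gene as (id, 'h' | 't') pairs."""
    g = abs(gene)
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def partner_map(genome):
    """Map each gene end to the end it is adjacent to (circular chromosomes only)."""
    partner = {}
    for chrom in genome:
        n = len(chrom)
        for i in range(n):
            right = _ends(chrom[i])[1]
            left = _ends(chrom[(i + 1) % n])[0]
            partner[right] = left
            partner[left] = right
    return partner


def dcj_distance_circular(a, b):
    """DCJ distance d = n - c; assumes each gene occurs exactly once per genome."""
    pa, pb = partner_map(a), partner_map(b)
    if pa.keys() != pb.keys():
        raise ValueError("genomes must contain the same set of genes")
    seen, cycles = set(), 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:  # walk one cycle, alternating edges of a and b
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
    return len(pa) // 2 - cycles


if __name__ == "__main__":
    a = [[1, 2, 3, 4, 5]]
    b = [[1, -3, -2, 4, 5]]  # one inversion away from a
    print(dcj_distance_circular(a, b))
```

Linear chromosomes add paths to the adjacency graph and a correction term for odd paths, which this sketch deliberately leaves out.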
Affiliation(s)
- Rolf Hilker
  - Computational Genomics, Bielefeld University, 33615 Bielefeld, Germany
|
24
|
Lin Y, Rajan V, Moret BME. Fast and accurate phylogenetic reconstruction from high-resolution whole-genome data and a novel robustness estimator. J Comput Biol 2012; 18:1131-9. [PMID: 21899420 DOI: 10.1089/cmb.2011.0114] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022]
Abstract
The rapid accumulation of whole-genome data has renewed interest in the study of genomic rearrangements. Comparative genomics, evolutionary biology, and cancer research all require models and algorithms to elucidate the mechanisms, history, and consequences of these rearrangements. However, even simple models lead to NP-hard problems, particularly in the area of phylogenetic analysis. Current approaches are limited to small collections of genomes and low-resolution data (typically a few hundred syntenic blocks). Moreover, whereas phylogenetic analyses from sequence data are deemed incomplete unless bootstrapping scores (a measure of confidence) are given for each tree edge, no equivalent to bootstrapping exists for rearrangement-based phylogenetic analysis. We describe a fast and accurate algorithm for rearrangement analysis that scales up, in both time and accuracy, to modern high-resolution genomic data. We also describe a novel approach to estimate the robustness of results, an equivalent to the bootstrapping analysis used in sequence-based phylogenetic reconstruction. We present the results of extensive testing on both simulated and real data showing that our algorithm returns very accurate results, while scaling linearly with the size of the genomes and cubically with their number. We also present extensive experimental results showing that our approach to robustness testing provides excellent estimates of confidence, which, moreover, can be tuned to trade off thresholds between false positives and false negatives. Together, these two novel approaches enable us to attack heretofore intractable problems, such as phylogenetic inference for high-resolution vertebrate genomes, as we demonstrate on a set of six vertebrate genomes with 8,380 syntenic blocks. A copy of the software is available on demand.
Affiliation(s)
- Y Lin
  - Laboratory for Computational Biology and Bioinformatics, EPFL, Lausanne, Switzerland
|
25
|
Lv J, Havlak P, Putnam NH. Constraints on genes shape long-term conservation of macro-synteny in metazoan genomes. BMC Bioinformatics 2011; 12 Suppl 9:S11. [PMID: 22151646 PMCID: PMC3283319 DOI: 10.1186/1471-2105-12-s9-s11] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022]
Abstract
BACKGROUND Many metazoan genomes conserve chromosome-scale gene linkage relationships ("macro-synteny") from the common ancestor of multicellular animal life [1-4], but the biological explanation for this conservation is still unknown. Double cut and join (DCJ) is a simple, well-studied model of neutral genome evolution amenable to both simulation and mathematical analysis [5], but as we show here, it is not sufficient to explain long-term macro-synteny conservation. RESULTS We examine a family of simple (one-parameter) extensions of DCJ to identify models and choices of parameters consistent with the levels of macro- and micro-synteny conservation observed among animal genomes. Our software implements a flexible strategy for incorporating various types of genomic context into the DCJ model ("DCJ-[C]"), and is available as open source software from http://github.com/putnamlab/dcj-c. CONCLUSIONS A simple model of genome evolution, in which DCJ moves are allowed only if they maintain chromosomal linkage among a set of constrained genes, can simultaneously account for the level of macro-synteny conservation and for correlated conservation among multiple pairs of species. Simulations under this model indicate that a constraint on approximately 7% of metazoan genes is sufficient to constrain genome rearrangement to an average rate of 25 inversions and 1.7 translocations per million years.
Affiliation(s)
- Jie Lv
  - Department of Ecology and Evolutionary Biology, Rice University, Houston, TX 77098, USA
|
26
|
Kang S, Tang J, Schaeffer SW, Bader DA. Rec-DCM-Eigen: reconstructing a less parsimonious but more accurate tree in shorter time. PLoS One 2011; 6:e22483. [PMID: 21887219 PMCID: PMC3160844 DOI: 10.1371/journal.pone.0022483] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 04/29/2011] [Accepted: 06/22/2011] [Indexed: 11/19/2022]
Abstract
Maximum parsimony (MP) methods aim to reconstruct the phylogeny of extant species by finding the most parsimonious evolutionary scenario using the species' genome data. MP methods are considered to be accurate, but they are also computationally expensive, especially for a large number of species. Several disk-covering methods (DCMs), which decompose the input species into multiple overlapping subgroups (or disks), have been proposed to solve the problem in a divide-and-conquer way. We design a new DCM based on the spectral method and also develop the COGNAC (Comparing Orders of Genes using Novel Algorithms and high-performance Computers) software package. COGNAC uses the new DCM to reduce the phylogenetic tree search space and selects an output tree from the reduced search space based on the MP principle. We test the new DCM using gene order data and inversion distance. The new DCM not only reduces the number of candidate tree topologies but also excludes erroneous tree topologies which can be selected by original MP methods. Initial labeling of internal genomes affects the accuracy of MP methods using gene order data, and the new DCM enables more accurate initial labeling as well. COGNAC demonstrates superior accuracy as a consequence. We compare COGNAC with FastME and the combination of the state-of-the-art DCM (Rec-I-DCM3) and GRAPPA. COGNAC clearly outperforms FastME in accuracy. COGNAC, using the new DCM, also reconstructs a much more accurate tree in significantly shorter time than GRAPPA with Rec-I-DCM3.
Affiliation(s)
- Seunghwa Kang
  - School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Jijun Tang
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, United States of America
- Stephen W. Schaeffer
  - Department of Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- David A. Bader
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, United States of America
|
27
|
Bergeron A, Medvedev P, Stoye J. Rearrangement models and single-cut operations. J Comput Biol 2010; 17:1213-25. [PMID: 20874405 DOI: 10.1089/cmb.2010.0091] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
There have been many widely used genome rearrangement models, such as reversals, Hannenhalli-Pevzner (HP), and double-cut and join. Though each one can be precisely defined, the general notion of a model remains undefined. In this paper, we give a formal set-theoretic definition, which allows us to investigate and prove relationships between distances under various existing and new models. Among our results is that sorting in the HP model is equivalent to sorting in the reversal model when the initial and final genomes are linear uni-chromosomal. We also initiate the formal study of single-cut operations by giving a linear time algorithm for the distance problem under a new single-cut and join model.
Affiliation(s)
- Anne Bergeron
  - Département d'informatique, Université du Québec à Montréal, Montreal, QC, Canada
|
28
|
Huang YL, Huang CC, Tang CY, Lu CL. SoRT2: a tool for sorting genomes and reconstructing phylogenetic trees by reversals, generalized transpositions and translocations. Nucleic Acids Res 2010; 38:W221-7. [PMID: 20538651 PMCID: PMC2896082 DOI: 10.1093/nar/gkq520] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 02/10/2010] [Revised: 05/20/2010] [Accepted: 05/24/2010] [Indexed: 12/03/2022]
Abstract
SoRT(2) is a web server that allows the user to perform genome rearrangement analysis involving reversals, generalized transpositions and translocations (including fusions and fissions), and infer phylogenetic trees of genomes being considered based on their pairwise genome rearrangement distances. It takes as input two or more linear/circular multi-chromosomal gene (or synteny block) orders in FASTA-like format. When the input is two genomes, SoRT(2) will quickly calculate their rearrangement distance, as well as a corresponding optimal scenario by highlighting the genes involved in each rearrangement operation. In the case of multiple genomes, SoRT(2) will also construct phylogenetic trees of these genomes based on a matrix of their pairwise rearrangement distances using distance-based approaches, such as neighbor-joining (NJ), unweighted pair group method with arithmetic mean (UPGMA) and Fitch-Margoliash (FM) methods. In addition, if the function of computing jackknife support values is selected, SoRT(2) will further perform the jackknife analysis to evaluate statistical reliability of the constructed NJ, UPGMA and FM trees. SoRT(2) is available online at http://bioalgorithm.life.nctu.edu.tw/SORT2/.
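The distance-based tree building described above can be illustrated with UPGMA, one of the methods the abstract names. The sketch below clusters a symmetric matrix of pairwise rearrangement distances and returns the tree topology as nested tuples. It is not SoRT(2)'s code: the function name, the input layout and the example distances are assumptions made for illustration, and branch lengths are omitted.

```python
def upgma(labels, matrix):
    """Cluster with UPGMA; return the tree topology as nested tuples."""
    n = len(labels)
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (labels[i], 1) for i in range(n)}
    # Pairwise distances keyed by (smaller id, larger id).
    dist = {(i, j): matrix[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of active clusters
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        for k in clusters:
            dik = dist.pop((min(i, k), max(i, k)))
            djk = dist.pop((min(j, k), max(j, k)))
            # Size-weighted mean distance from cluster k to the merged cluster.
            dist[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        del dist[(i, j)]
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    (tree, _), = clusters.values()
    return tree


if __name__ == "__main__":
    # Hypothetical pairwise rearrangement distances among four genomes.
    names = ["A", "B", "C", "D"]
    d = [
        [0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0],
    ]
    print(upgma(names, d))
```

UPGMA assumes roughly clock-like evolution; neighbor-joining relaxes that assumption, which is presumably one reason the server offers several methods side by side.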
Affiliation(s)
- Yen-Lin Huang
  - Department of Computer Science, National Tsing Hua University, Institute of Bioinformatics and Systems Biology and Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C
- Chen-Cheng Huang
  - Department of Computer Science, National Tsing Hua University, Institute of Bioinformatics and Systems Biology and Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C
- Chuan Yi Tang
  - Department of Computer Science, National Tsing Hua University, Institute of Bioinformatics and Systems Biology and Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C
- Chin Lung Lu
  - Department of Computer Science, National Tsing Hua University, Institute of Bioinformatics and Systems Biology and Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C
|
29
|
Abstract
Background The rapidly increasing availability of whole-genome sequences has enabled the study of whole-genome evolution. Evolutionary mechanisms based on genome rearrangements have attracted much attention and given rise to many models; somewhat independently, the mechanisms of gene duplication and loss have seen much work. However, the two are not independent and thus require a unified treatment, which remains missing to date. Moreover, existing rearrangement models do not fit the dichotomy between most prokaryotic genomes (one circular chromosome) and most eukaryotic genomes (multiple linear chromosomes). Results To handle rearrangements, gene duplications and losses, we propose a new evolutionary model and the corresponding method for estimating true evolutionary distance. Our model, inspired by the DCJ model, is simple and the first to respect the prokaryotic/eukaryotic structural dichotomy. Experimental results on a wide variety of genome structures demonstrate the very high accuracy and robustness of our distance estimator. Conclusion We give the first robust, statistically based estimate of genomic pairwise distances based on rearrangements, duplications and losses, under a model that respects the structural dichotomy between prokaryotic and eukaryotic genomes. Accurate and robust estimates of true evolutionary distances should translate into much better phylogenetic reconstructions as well as more accurate genomic alignments, while our new model of genome rearrangements provides another refinement in simplicity and verisimilitude.
|
30
|
Fast and Accurate Phylogenetic Reconstruction from High-Resolution Whole-Genome Data and a Novel Robustness Estimator. 2010. [DOI: 10.1007/978-3-642-16181-0_12] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 04/19/2023]
|
31
|
Alekseyev MA, Pevzner PA. Breakpoint graphs and ancestral genome reconstructions. Genome Res 2009; 19:943-57. [PMID: 19218533 PMCID: PMC2675983 DOI: 10.1101/gr.082784.108] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.7] [Received: 06/30/2008] [Accepted: 01/22/2009] [Indexed: 11/24/2022]
Abstract
Recently completed whole-genome sequencing projects marked the transition from gene-based phylogenetic studies to phylogenomic analysis of entire genomes. We developed an algorithm, MGRA, for reconstructing ancestral genomes and used it to study the rearrangement history of seven mammalian genomes: human, chimpanzee, macaque, mouse, rat, dog, and opossum. MGRA relies on the notion of multiple breakpoint graphs to overcome some limitations of the existing approaches to ancestral genome reconstruction. MGRA also generates the rearrangement-based characters guiding the phylogenetic tree reconstruction when the phylogeny is unknown.
Affiliation(s)
- Max A. Alekseyev
  - Department of Computer Science and Engineering, University of California at San Diego, La Jolla, California 92093-0404, USA
- Pavel A. Pevzner
  - Department of Computer Science and Engineering, University of California at San Diego, La Jolla, California 92093-0404, USA
|