1
|
Hartmann T, Middendorf M, Bernt M. Genome Rearrangement Analysis : Cut and Join Genome Rearrangements and Gene Cluster Preserving Approaches. Methods Mol Biol 2024; 2802:215-245. [PMID: 38819562 DOI: 10.1007/978-1-0716-3838-5_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/01/2024]
Abstract
Genome rearrangements are mutations that change the gene content of a genome or the arrangement of the genes on a genome. Several years of research on genome rearrangements have established different algorithmic approaches for solving some fundamental problems in comparative genomics based on gene order information. This review summarizes the literature on genome rearrangement analysis along two lines of research. The first line considers rearrangement models that are particularly well suited for a theoretical analysis. These models use rearrangement operations that cut chromosomes into fragments and then join the fragments into new chromosomes. The second line works with rearrangement models that reflect several biologically motivated constraints, e.g., the constraint that gene clusters have to be preserved. In this chapter, the border between algorithmically "easy" and "hard" rearrangement problems is sketched and a brief review is given on the available software tools for genome rearrangement analysis.
Collapse
Affiliation(s)
- Tom Hartmann
- Swarm Intelligence and Complex Systems Group, Institute of Computer Science, University Leipzig, Leipzig, Germany
| | - Martin Middendorf
- Swarm Intelligence and Complex Systems Group, Institute of Computer Science, University Leipzig, Leipzig, Germany.
| | | |
Collapse
|
2
|
Sharakhov IV, Bondarenko SM, Artemov GN, Onufriev AV. The Role of Chromosome–Nuclear Envelope Attachments in 3D Genome Organization. BIOCHEMISTRY (MOSCOW) 2018; 83:350-358. [DOI: 10.1134/s0006297918040065] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
3
|
Genome Rearrangement Analysis: Cut and Join Genome Rearrangements and Gene Cluster Preserving Approaches. Methods Mol Biol 2018; 1704:261-289. [PMID: 29277869 DOI: 10.1007/978-1-4939-7463-4_9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/30/2023]
Abstract
Genome rearrangements are mutations that change the gene content of a genome or the arrangement of the genes on a genome. Several years of research on genome rearrangements have established different algorithmic approaches for solving some fundamental problems in comparative genomics based on gene order information. This review summarizes the literature on genome rearrangement analysis along two lines of research. The first line considers rearrangement models that are particularly well suited for a theoretical analysis. These models use rearrangement operations that cut chromosomes into fragments and then join the fragments into new chromosomes. The second line works with rearrangement models that reflect several biologically motivated constraints, e.g., the constraint that gene clusters have to be preserved. In this chapter, the border between algorithmically "easy" and "hard" rearrangement problems is sketched and a brief review is given on the available software tools for genome rearrangement analysis.
Collapse
|
4
|
Abstract
Background The ability to estimate the evolutionary distance between extant genomes plays a crucial role in many phylogenomic studies. Often such estimation is based on the parsimony assumption, implying that the distance between two genomes can be estimated as the rearrangement distance equal the minimal number of genome rearrangements required to transform one genome into the other. However, in reality the parsimony assumption may not always hold, emphasizing the need for estimation that does not rely on the rearrangement distance. The distance that accounts for the actual (rather than minimal) number of rearrangements between two genomes is often referred to as the true evolutionary distance. While there exists a method for the true evolutionary distance estimation, it however assumes that genomes can be broken by rearrangements equally likely at any position in the course of evolution. This assumption, known as the random breakage model, has recently been refuted in favor of the more rigorous fragile breakage model postulating that only certain “fragile” genomic regions are prone to rearrangements. Results We propose a new method for estimating the true evolutionary distance between two genomes under the fragile breakage model. We evaluate the proposed method on simulated genomes, which show its high accuracy. We further apply the proposed method for estimation of evolutionary distances within a set of five yeast genomes and a set of two fish genomes. Conclusions The true evolutionary distances between the five yeast genomes estimated with the proposed method reveals that some pairs of yeast genomes violate the parsimony assumption. The proposed method further demonstrates that the rearrangement distance between the two fish genomes underestimates their evolutionary distance by about 20%. These results demonstrate how drastically the two distances can differ and justify the use of true evolutionary distance in phylogenomic studies.
Collapse
Affiliation(s)
- Nikita Alexeev
- Computational Biology Institute at the George Washington University, Ashburn, 20147, VA, USA.
| | - Max A Alekseyev
- Computational Biology Institute at the George Washington University, Ashburn, 20147, VA, USA
| |
Collapse
|
5
|
Avdeyev P, Jiang S, Aganezov S, Hu F, Alekseyev MA. Reconstruction of Ancestral Genomes in Presence of Gene Gain and Loss. J Comput Biol 2016; 23:150-64. [PMID: 26885568 DOI: 10.1089/cmb.2015.0160] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023] Open
Abstract
Since most dramatic genomic changes are caused by genome rearrangements as well as gene duplications and gain/loss events, it becomes crucial to understand their mechanisms and reconstruct ancestral genomes of the given genomes. This problem was shown to be NP-complete even in the "simplest" case of three genomes, thus calling for heuristic rather than exact algorithmic solutions. At the same time, a larger number of input genomes may actually simplify the problem in practice as it was earlier illustrated with MGRA, a state-of-the-art software tool for reconstruction of ancestral genomes of multiple genomes. One of the key obstacles for MGRA and other similar tools is presence of breakpoint reuses when the same breakpoint region is broken by several different genome rearrangements in the course of evolution. Furthermore, such tools are often limited to genomes composed of the same genes with each gene present in a single copy in every genome. This limitation makes these tools inapplicable for many biological datasets and degrades the resolution of ancestral reconstructions in diverse datasets. We address these deficiencies by extending the MGRA algorithm to genomes with unequal gene contents. The developed next-generation tool MGRA2 can handle gene gain/loss events and shares the ability of MGRA to reconstruct ancestral genomes uniquely in the case of limited breakpoint reuse. Furthermore, MGRA2 employs a number of novel heuristics to cope with higher breakpoint reuse and process datasets inaccessible for MGRA. In practical experiments, MGRA2 shows superior performance for simulated and real genomes as compared to other ancestral genome reconstruction tools.
Collapse
Affiliation(s)
- Pavel Avdeyev
- 1 Computational Biology Institute & Department of Mathematics, The George Washington University , Washington, DC, U.S.A
| | - Shuai Jiang
- 2 Department of Computer Science and Engineering, University of South Carolina , Columbia, SC, U.S.A
| | - Sergey Aganezov
- 1 Computational Biology Institute & Department of Mathematics, The George Washington University , Washington, DC, U.S.A.,3 Department of Higher Mathematics, ITMO University , St. Petersburg, Russia
| | - Fei Hu
- 2 Department of Computer Science and Engineering, University of South Carolina , Columbia, SC, U.S.A
| | - Max A Alekseyev
- 1 Computational Biology Institute & Department of Mathematics, The George Washington University , Washington, DC, U.S.A
| |
Collapse
|
6
|
Alexeev N, Zograf P. Random matrix approach to the distribution of genomic distance. J Comput Biol 2014; 21:622-31. [PMID: 24650202 DOI: 10.1089/cmb.2013.0066] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
The cycle graph introduced by Bafna and Pevzner is an important tool for evaluating the distance between two genomes, that is, the minimal number of rearrangements needed to transform one genome into another. We interpret this distance in topological terms and relate it to the random matrix theory. Namely, the number of genomes at a given 2-break distance from a fixed one (the Hultman number) is represented by a coefficient in the genus expansion of a matrix integral over the space of complex matrices with the Gaussian measure. We study generating functions for the Hultman numbers and prove that the two-break distance distribution is asymptotically normal.
Collapse
Affiliation(s)
- Nikita Alexeev
- 1 Chebyshev Laboratory, St. Petersburg State University , St. Petersburg, Russia
| | | |
Collapse
|
7
|
Abstract
In comparative genomics, the rearrangement distance between two genomes (equal the minimal number of genome rearrangements required to transform them into a single genome) is often used for measuring their evolutionary remoteness. Generalization of this measure to three genomes is known as the median score (while a resulting genome is called median genome). In contrast to the rearrangement distance between two genomes which can be computed in linear time, computing the median score for three genomes is NP-hard. This inspires a quest for simpler and faster approximations for the median score, the most natural of which appears to be the halved sum of pairwise distances which in fact represents a lower bound for the median score. In this work, we study relationship and interplay of pairwise distances between three genomes and their median score under the model of Double-Cut-and-Join (DCJ) rearrangements. Most remarkably we show that while a rearrangement may change the sum of pairwise distances by at most 2 (and thus change the lower bound by at most 1), even the most "powerful" rearrangements in this respect that increase the lower bound by 1 (by moving one genome farther away from each of the other two genomes), which we call strong, do not necessarily affect the median score. This observation implies that the two measures are not as well-correlated as one's intuition may suggest. We further prove that the median score attains the lower bound exactly on the triples of genomes that can be obtained from a single genome with strong rearrangements. While the sum of pairwise distances with the factor 2/3 represents an upper bound for the median score, its tightness remains unclear. Nonetheless, we show that the difference of the median score and its lower bound is not bounded by a constant.
Collapse
|
8
|
Luo H, Arndt W, Zhang Y, Shi G, Alekseyev M, Tang J, Hughes AL, Friedman R. Phylogenetic analysis of genome rearrangements among five mammalian orders. Mol Phylogenet Evol 2012; 65:871-82. [PMID: 22929217 PMCID: PMC4425404 DOI: 10.1016/j.ympev.2012.08.008] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2012] [Revised: 08/11/2012] [Accepted: 08/13/2012] [Indexed: 01/16/2023]
Abstract
Evolutionary relationships among placental mammalian orders have been controversial. Whole genome sequencing and new computational methods offer opportunities to resolve the relationships among 10 genomes belonging to the mammalian orders Primates, Rodentia, Carnivora, Perissodactyla and Artiodactyla. By application of the double cut and join distance metric, where gene order is the phylogenetic character, we computed genomic distances among the sampled mammalian genomes. With a marsupial outgroup, the gene order tree supported a topology in which Rodentia fell outside the cluster of Primates, Carnivora, Perissodactyla, and Artiodactyla. Results of breakpoint reuse rate and synteny block length analyses were consistent with the prediction of random breakage model, which provided a diagnostic test to support use of gene order as an appropriate phylogenetic character in this study. We discussed the influence of rate differences among lineages and other factors that may contribute to different resolutions of mammalian ordinal relationships by different methods of phylogenetic reconstruction.
Collapse
Affiliation(s)
- Haiwei Luo
- Department of Biological Sciences, University of South Carolina, Columbia 29208, USA
| | - William Arndt
- Department of Computer Science and Engineering, University of South Carolina, Columbia 29208, USA
| | - Yiwei Zhang
- Department of Computer Science and Engineering, University of South Carolina, Columbia 29208, USA
| | - Guanqun Shi
- Department of Computer Science, University of California, Riverside, 92521, USA
| | - Max Alekseyev
- Department of Computer Science and Engineering, University of South Carolina, Columbia 29208, USA
| | - Jijun Tang
- Department of Computer Science and Engineering, University of South Carolina, Columbia 29208, USA
| | - Austin L. Hughes
- Department of Biological Sciences, University of South Carolina, Columbia 29208, USA
| | - Robert Friedman
- Department of Biological Sciences, University of South Carolina, Columbia 29208, USA
| |
Collapse
|
9
|
Anchoring the dog to its relatives reveals new evolutionary breakpoints across 11 species of the Canidae and provides new clues for the role of B chromosomes. Chromosome Res 2011; 19:685-708. [DOI: 10.1007/s10577-011-9233-4] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2011] [Revised: 08/15/2011] [Accepted: 08/16/2011] [Indexed: 12/27/2022]
|
10
|
Alekseyev MA, Pevzner PA. Comparative genomics reveals birth and death of fragile regions in mammalian evolution. Genome Biol 2010; 11:R117. [PMID: 21118492 PMCID: PMC3156956 DOI: 10.1186/gb-2010-11-11-r117] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2010] [Revised: 10/05/2010] [Accepted: 11/30/2010] [Indexed: 12/15/2022] Open
Abstract
Background An important question in genome evolution is whether there exist fragile regions (rearrangement hotspots) where chromosomal rearrangements are happening over and over again. Although nearly all recent studies supported the existence of fragile regions in mammalian genomes, the most comprehensive phylogenomic study of mammals raised some doubts about their existence. Results Here we demonstrate that fragile regions are subject to a birth and death process, implying that fragility has a limited evolutionary lifespan. Conclusions This finding implies that fragile regions migrate to different locations in different mammals, explaining why there exist only a few chromosomal breakpoints shared between different lineages. The birth and death of fragile regions as a phenomenon reinforces the hypothesis that rearrangements are promoted by matching segmental duplications and suggests putative locations of the currently active fragile regions in the human genome.
Collapse
Affiliation(s)
- Max A Alekseyev
- Department of Computer Science & Engineering, University of South Carolina, 301 Main St, Columbia, SC 29208, USA.
| | | |
Collapse
|
11
|
Huang YL, Lu CL. Sorting by reversals, generalized transpositions, and translocations using permutation groups. J Comput Biol 2010; 17:685-705. [PMID: 20500022 DOI: 10.1089/cmb.2009.0025] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
In this article, we consider the problem of sorting a linear/circular, multi-chromosomal genome by reversals, block-interchanges (i.e., generalized transpositions), and translocations (including fusions and fissions) where the used operations can be weighted differently, which aims to find a sequence of reversal, block-interchange, and translocation operations such that the sum of these operation weights in the sequence is minimum. It is known that this sorting problem can be solved in polynomial time on the basis of breakpoint graphs, when block-interchanges are weighted 2 (or >or=3) and the others are weighted 1. In this study, we design a novel and easily implemented algorithm for this problem by utilizing the permutation group theory in algebra.
Collapse
Affiliation(s)
- Yen-Lin Huang
- Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
| | | |
Collapse
|
12
|
progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 2010; 5:e11147. [PMID: 20593022 PMCID: PMC2892488 DOI: 10.1371/journal.pone.0011147] [Citation(s) in RCA: 2818] [Impact Index Per Article: 201.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2010] [Accepted: 05/24/2010] [Indexed: 11/21/2022] Open
Abstract
Background Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms. Methodology/Principal Findings We describe a new method to align two or more genomes that have undergone rearrangements due to recombination and substantial amounts of segmental gain and loss (flux). We demonstrate that the new method can accurately align regions conserved in some, but not all, of the genomes, an important case not handled by our previous work. The method uses a novel alignment objective score called a sum-of-pairs breakpoint score, which facilitates accurate detection of rearrangement breakpoints when genomes have unequal gene content. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The new genome alignment algorithm demonstrates high accuracy in situations where genomes have undergone biologically feasible amounts of genome rearrangement, segmental gain and loss. We apply the new algorithm to a set of 23 genomes from the genera Escherichia, Shigella, and Salmonella. Analysis of whole-genome multiple alignments allows us to extend the previously defined concepts of core- and pan-genomes to include not only annotated genes, but also non-coding regions with potential regulatory roles. The 23 enterobacteria have an estimated core-genome of 2.46Mbp conserved among all taxa and a pan-genome of 15.2Mbp. We document substantial population-level variability among these organisms driven by segmental gain and loss. Interestingly, much variability lies in intergenic regions, suggesting that the Enterobacteriacae may exhibit regulatory divergence. Conclusions The multiple genome alignments generated by our software provide a platform for comparative genomic and population genomic studies. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve.
Collapse
|
13
|
Alekseyev MA, Pevzner PA. Breakpoint graphs and ancestral genome reconstructions. Genes Dev 2009; 19:943-57. [PMID: 19218533 PMCID: PMC2675983 DOI: 10.1101/gr.082784.108] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2008] [Accepted: 01/22/2009] [Indexed: 11/24/2022]
Abstract
Recently completed whole-genome sequencing projects marked the transition from gene-based phylogenetic studies to phylogenomics analysis of entire genomes. We developed an algorithm MGRA for reconstructing ancestral genomes and used it to study the rearrangement history of seven mammalian genomes: human, chimpanzee, macaque, mouse, rat, dog, and opossum. MGRA relies on the notion of the multiple breakpoint graphs to overcome some limitations of the existing approaches to ancestral genome reconstructions. MGRA also generates the rearrangement-based characters guiding the phylogenetic tree reconstruction when the phylogeny is unknown.
Collapse
Affiliation(s)
- Max A. Alekseyev
- Department of Computer Science and Engineering, University of California at San Diego, La Jolla, California 92093-0404, USA
| | - Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California at San Diego, La Jolla, California 92093-0404, USA
| |
Collapse
|