Reference Citation Analysis

For: Yancopoulos S, Attie O, Friedberg R. Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics 2005;21:3340-6. [PMID: 15951307 DOI: 10.1093/bioinformatics/bti535] [Citation(s) in RCA: 216] [Impact Index Per Article: 10.8] [Indexed: 11/13/2022] Open



Cited by Other Article(s)

Bohnenkämper L, Stoye J, Doerr D. Reconstructing rearrangement phylogenies of natural genomes. Algorithms Mol Biol 2025;20:10. [PMID: 40483529 PMCID: PMC12144824 DOI: 10.1186/s13015-025-00279-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2025] [Accepted: 05/07/2025] [Indexed: 06/11/2025] Open

Banse P, Luiselli J, Parsons DP, Grohens T, Foley M, Trujillo L, Rouzaud-Cornabas J, Knibbe C, Beslon G. Forward-in-time simulation of chromosomal rearrangements: The invisible backbone that sustains long-term adaptation. Mol Ecol 2024;33:e17234. [PMID: 38078552 PMCID: PMC11628651 DOI: 10.1111/mec.17234] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/13/2023] [Revised: 11/20/2023] [Accepted: 11/24/2023] [Indexed: 12/11/2024]

Frolova D, Lima L, Roberts LW, Bohnenkämper L, Wittler R, Stoye J, Iqbal Z. Applying rearrangement distances to enable plasmid epidemiology with pling. Microb Genom 2024;10:001300. [PMID: 39401066 PMCID: PMC11472880 DOI: 10.1099/mgen.0.001300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Accepted: 09/05/2024] [Indexed: 10/15/2024] Open

Zanetti JPP, Oliveira LP, Meidanis J, Chindelevitch L. Counting Sorting Scenarios and Intermediate Genomes for the Rank Distance. IEEE/ACM Trans Comput Biol Bioinform 2024;21:316-327. [PMID: 37200133 DOI: 10.1109/tcbb.2023.3277733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/20/2023]

Bohnenkämper L. The Floor Is Lava: Halving Natural Genomes with Viaducts, Piers, and Pontoons. J Comput Biol 2024;31:294-311. [PMID: 38621180 PMCID: PMC11057688 DOI: 10.1089/cmb.2023.0330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2024] Open

Bohnenkämper L. Recombinations, chains and caps: resolving problems with the DCJ-indel model. Algorithms Mol Biol 2024;19:8. [PMID: 38414060 PMCID: PMC10900646 DOI: 10.1186/s13015-024-00253-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 01/21/2024] [Indexed: 02/29/2024] Open

Abstract

One of the most fundamental problems in genome rearrangement studies is the (genomic) distance problem. It is typically formulated as finding the minimum number of rearrangements under a model that are needed to transform one genome into the other. A powerful multi-chromosomal model is the Double Cut and Join (DCJ) model. While the DCJ model is not able to deal with some situations that occur in practice, like duplicated or lost regions, it was extended over time to handle these cases. First, it was extended to the DCJ-indel model, solving the issue of lost markers. Later, ILP solutions for so-called natural genomes, in which each genomic region may occur an arbitrary number of times, were developed, making it possible in theory to solve the distance problem for any pair of genomes. However, some theoretical and practical issues remained unsolved. On the theoretical side, there exist two disparate views of the DCJ-indel model, motivated in the same way, but with different conceptualizations that could not be reconciled so far. On the practical side, while ILP solutions for natural genomes typically perform well on telomere-to-telomere resolved genomes, they have been shown in recent years to quickly lose performance on genomes with a large number of contigs or linear chromosomes. This has been linked to a particular technique, namely capping. Simply put, capping circularizes linear chromosomes by concatenating them during solving time, increasing the solution space of the ILP superexponentially. Recently, we introduced a new conceptualization of the DCJ-indel model within the context of another rearrangement problem. In this manuscript, we apply this new conceptualization to the distance problem. In doing so, we uncover the relation between the disparate conceptualizations of the DCJ-indel model. We are also able to derive an ILP solution to the distance problem that does not rely on capping. This solution significantly improves upon the performance of previous solutions on genomes with high numbers of contigs while still solving the problem exactly and being competitive in performance otherwise. We demonstrate the performance advantage on simulated genomes and show its practical usefulness in an analysis of 11 Drosophila genomes.


Braga MDV, Brockmann LR, Klerx K, Stoye J. Investigating the complexity of the double distance problems. Algorithms Mol Biol 2024;19:1. [PMID: 38178195 PMCID: PMC10765962 DOI: 10.1186/s13015-023-00246-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/31/2023] [Accepted: 11/10/2023] [Indexed: 01/06/2024] Open

Abstract

BACKGROUND

Two genomes [Formula: see text] and [Formula: see text] over the same set of gene families form a canonical pair when each of them has exactly one gene from each family. Denote by [Formula: see text] the number of common families of [Formula: see text] and [Formula: see text]. Different distances of canonical genomes can be derived from a structure called breakpoint graph, which represents the relation between the two given genomes as a collection of cycles of even length and paths. Let [Formula: see text] and [Formula: see text] be respectively the numbers of cycles of length i and of paths of length j in the breakpoint graph of genomes [Formula: see text] and [Formula: see text]. Then, the breakpoint distance of [Formula: see text] and [Formula: see text] is equal to [Formula: see text]. Similarly, when the considered rearrangements are those modeled by the double-cut-and-join (DCJ) operation, the rearrangement distance of [Formula: see text] and [Formula: see text] is [Formula: see text], where c is the total number of cycles and [Formula: see text] is the total number of paths of even length.

MOTIVATION

The distance formulation is a basic unit for several other combinatorial problems related to genome evolution and ancestral reconstruction, such as median or double distance. Interestingly, both median and double distance problems can be solved in polynomial time for the breakpoint distance, while they are NP-hard for the rearrangement distance. One way of exploring the complexity space between these two extremes is to consider a [Formula: see text] distance, defined to be [Formula: see text], and increasingly investigate the complexities of median and double distance for the [Formula: see text] distance, then the [Formula: see text] distance, and so on.

RESULTS

For the median, considerable effort by our group and by other research groups has yielded no progress, even for the [Formula: see text] distance. For the double distance under the [Formula: see text] and [Formula: see text] distances, however, we devised linear-time algorithms, which we present here.
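
As a rough illustration of the DCJ distance described in the BACKGROUND paragraph of this abstract, the following Python sketch counts cycles and even-length paths for two canonical genomes and evaluates n - (c + p_e/2). The formulas themselves are elided in this listing, so the closed form used here is the standard one for genomes over the same gene set and should be read as an assumption with respect to this abstract. The genome encoding (lists of signed integers plus a circularity flag) and the function names `adjacencies` and `dcj_distance` are hypothetical, chosen only for illustration.

```python
def adjacencies(genome):
    """Map every gene extremity to its neighbour (None at a telomere).

    genome: list of (genes, is_circular) pairs, where genes is a non-empty
    list of signed integers. This encoding is an illustrative assumption.
    """
    nbr = {}
    for genes, is_circular in genome:
        ends = []
        for g in genes:
            tail, head = (abs(g), "t"), (abs(g), "h")
            # A forward gene is read tail-to-head, a reversed gene head-to-tail.
            ends += [tail, head] if g > 0 else [head, tail]
        # Consecutive genes share an adjacency between their facing extremities.
        for i in range(1, len(ends) - 1, 2):
            a, b = ends[i], ends[i + 1]
            nbr[a], nbr[b] = b, a
        if is_circular:
            nbr[ends[0]], nbr[ends[-1]] = ends[-1], ends[0]
        else:
            nbr[ends[0]] = nbr[ends[-1]] = None
    return nbr


def dcj_distance(genome_a, genome_b):
    """DCJ distance n - (c + p_e / 2) for two genomes over the same genes."""
    na, nb = adjacencies(genome_a), adjacencies(genome_b)
    if na.keys() != nb.keys():
        raise ValueError("genomes must contain the same genes exactly once")
    n = len(na) // 2
    seen = set()
    cycles = even_paths = 0
    for start in na:
        if start in seen:
            continue
        # Explore one connected component of the graph whose vertices are
        # extremities and whose edges are the adjacencies of both genomes.
        seen.add(start)
        stack, size, telomere_slots = [start], 0, 0
        while stack:
            x = stack.pop()
            size += 1
            for nbr in (na, nb):
                y = nbr[x]
                if y is None:
                    telomere_slots += 1
                elif y not in seen:
                    seen.add(y)
                    stack.append(y)
        if telomere_slots == 0:
            cycles += 1
        elif size % 2 == 1:
            # A path with an odd number of vertices has an even number of edges.
            even_paths += 1
    return n - cycles - even_paths // 2


if __name__ == "__main__":
    target = [([1, 2, 3, 4], False)]
    inverted = [([1, -3, -2, 4], False)]
    print(dcj_distance(inverted, target))
```

In this sketch an inversion, a translocation, and the linearization of a circular chromosome each count as a single operation, which is the behaviour expected of the DCJ model.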


Braga MDV, Doerr D, Rubert DP, Stoye J. Family-Free Genome Comparison. Methods Mol Biol 2024;2802:57-72. [PMID: 38819556 DOI: 10.1007/978-1-0716-3838-5_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]

Hartmann T, Middendorf M, Bernt M. Genome Rearrangement Analysis: Cut and Join Genome Rearrangements and Gene Cluster Preserving Approaches. Methods Mol Biol 2024;2802:215-245. [PMID: 38819562 DOI: 10.1007/978-1-0716-3838-5_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]

Katriel G, Mahanaymi U, Brezner S, Kezel N, Koutschan C, Zeilberger D, Steel M, Snir S. Gene Transfer-Based Phylogenetics: Analytical Expressions and Additivity via Birth-Death Theory. Syst Biol 2023;72:1403-1417. [PMID: 37862116 DOI: 10.1093/sysbio/syad060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Revised: 09/01/2023] [Accepted: 10/05/2023] [Indexed: 10/22/2023] Open

Abstract

The genomic era has opened up vast opportunities in molecular systematics, one of which is deciphering the evolutionary history in fine detail. Under this mass of data, analyzing the point mutations of standard markers is often too crude and slow for fine-scale phylogenetics. Nevertheless, genome dynamics (GD) events provide alternative, often richer information. The synteny index (SI) between a pair of genomes combines gene order and gene content information, allowing the comparison of genomes of unequal gene content, together with order considerations of their common genes. Recently, genome dynamics has been modeled as a continuous-time Markov process, and gene distance in the genome as a birth-death-immigration process. Nevertheless, due to complexities arising in this setting, no precise and provably consistent estimators could be derived, resulting in heuristic solutions. Here, we extend this modeling approach by using techniques from birth-death theory to derive explicit expressions of the system's probabilistic dynamics in the form of rational functions of the model parameters. This, in turn, allows us to infer analytically accurate distances between organisms based on their SI. Subsequently, we establish additivity of this estimated evolutionary distance (a desirable property yielding phylogenetic consistency). Applying the new measure in simulation studies shows that it provides accurate results in realistic settings and even under model extensions such as gene gain/loss or over a tree structure. In the real-data realm, we applied the new formulation to a unique data structure that we constructed, the ordered orthology DB, based on a new version of the EggNOG database, to construct a tree with more than 4.5K taxa. To the best of our knowledge, this is the largest gene-order-based tree constructed and it overcomes shortcomings found in previous approaches. Constructing a GD-based tree allows us to confirm and contrast findings based on other phylogenetic approaches, as we show.


Ozeri E, Zehavi M, Ziv-Ukelson M. New algorithms for structure informed genome rearrangement. Algorithms Mol Biol 2023;18:17. [PMID: 38037088 PMCID: PMC10691145 DOI: 10.1186/s13015-023-00239-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 08/17/2023] [Indexed: 12/02/2023] Open

Abstract

We define two new computational problems in the domain of perfect genome rearrangements, and propose three algorithms to solve them. The rearrangement scenarios modeled by the problems consider Reversal and Block Interchange operations, and a PQ-tree is utilized to guide the allowed operations and to compute their weights. In the first problem, [Formula: see text] ([Formula: see text]), we define the basic structure-informed rearrangement measure. Here, we assume that the gene order members of the gene cluster from which the PQ-tree is constructed are permutations. The PQ-tree representing the gene cluster is ordered such that the series of gene IDs spelled by its leaves is equivalent to that of the reference gene order. Then, a structure-informed genome rearrangement distance is computed between the ordered PQ-tree and the target gene order. The second problem, [Formula: see text] ([Formula: see text]), generalizes [Formula: see text], where the gene order members are not necessarily permutations and the structure informed rearrangement measure is extended to also consider up to [Formula: see text] and [Formula: see text] gene insertion and deletion operations, respectively, when modelling the PQ-tree informed divergence process from the reference gene order to the target gene order. The first algorithm solves [Formula: see text] in [Formula: see text] time and [Formula: see text] space, where [Formula: see text] is the maximum number of children of a node, n is the length of the string and the number of leaves in the tree, and [Formula: see text] and [Formula: see text] are the number of P-nodes and Q-nodes in the tree, respectively. If one of the penalties of [Formula: see text] is 0, then the algorithm runs in [Formula: see text] time and [Formula: see text] space. 
The second algorithm solves [Formula: see text] in [Formula: see text] time and [Formula: see text] space, where [Formula: see text] is the maximum number of children of a node, n is the length of the string, m is the number of leaves in the tree, [Formula: see text] and [Formula: see text] are the number of P-nodes and Q-nodes in the tree, respectively, and allowing up to [Formula: see text] deletions from the tree and up to [Formula: see text] deletions from the string. The third algorithm is intended to reduce the space complexity of the second algorithm. It solves a variant of the problem (where one of the penalties of [Formula: see text] is 0) in [Formula: see text] time and [Formula: see text] space. The algorithm is implemented as a software tool, denoted MEM-Rearrange, and applied to the comparative and evolutionary analysis of 59 chromosomal gene clusters extracted from a dataset of 1487 prokaryotic genomes.


Bury-Moné S, Thibessard A, Lioy VS, Leblond P. Dynamics of the Streptomyces chromosome: chance and necessity. Trends Genet 2023;39:873-887. [PMID: 37679290 DOI: 10.1016/j.tig.2023.07.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/21/2023] [Revised: 07/26/2023] [Accepted: 07/28/2023] [Indexed: 09/09/2023]

Bonnet K, Marschall T, Doerr D. Constructing founder sets under allelic and non-allelic homologous recombination. Algorithms Mol Biol 2023;18:15. [PMID: 37775806 PMCID: PMC10543304 DOI: 10.1186/s13015-023-00241-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/31/2023] [Accepted: 08/23/2023] [Indexed: 10/01/2023] Open

Rubert DP, Braga MDV. Efficient gene orthology inference via large-scale rearrangements. Algorithms Mol Biol 2023;18:14. [PMID: 37770945 PMCID: PMC10540461 DOI: 10.1186/s13015-023-00238-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Accepted: 08/17/2023] [Indexed: 09/30/2023] Open

Abstract

BACKGROUND

Recently we developed a gene orthology inference tool based on genome rearrangements (Journal of Bioinformatics and Computational Biology 19:6, 2021). Given a set of genomes our method first computes all pairwise gene similarities. Then it runs pairwise ILP comparisons to compute optimal gene matchings, which minimize, by taking the similarities into account, the weighted rearrangement distance between the analyzed genomes (a problem that is NP-hard). The gene matchings are then integrated into gene families in the final step. The mentioned ILP includes an optimal capping that connects each end of a linear segment of one genome to an end of a linear segment in the other genome, producing an exponential increase of the search space.

RESULTS

In this work, we design and implement a heuristic capping algorithm that replaces the optimal capping by clustering (based on their gene content intersections) the linear segments into [Formula: see text] subsets, whose ends are capped independently. Furthermore, in each subset, instead of allowing all possible connections, we let only the ends of content-related segments be connected. Although there is no guarantee that m is much bigger than one, and with the possible side effect of resulting in sub-optimal instead of optimal gene matchings, the heuristic works very well in practice, in terms of both speed and the quality of the computed solutions. Our experiments on primate and fruit fly genomes show two positive results. First, for complete assemblies of five primates the version with heuristic capping reports orthologies that are very similar to the orthologies computed by the version of our tool with optimal capping. Second, we were able to efficiently analyze fruit fly genomes with incomplete assemblies distributed in hundreds or even thousands of contigs, obtaining gene families that are very similar to [Formula: see text] families. Indeed, our tool inferred a higher number of complete cliques, with a higher intersection with [Formula: see text], when compared to gene families computed by other inference tools. We added a post-processing step that refines, with the aid of the [Formula: see text] algorithm, our ambiguous families (those with more than one gene per genome), further improving the accuracy of our results. Our approach is implemented as a pipeline incorporating the pre-computation of gene similarities and the post-processing refinement of ambiguous families with [Formula: see text]. Both the original version with optimal capping and the new modified version with heuristic capping can be downloaded, together with their detailed documentation, at https://gitlab.ub.uni-bielefeld.de/gi/FFGC or as a Conda package at https://anaconda.org/bioconda/ffgc .


Alexandrino AO, Oliveira AR, Jean G, Fertin G, Dias U, Dias Z. Reversal and Transposition Distance on Unbalanced Genomes Using Intergenic Information. J Comput Biol 2023;30:861-876. [PMID: 37222724 DOI: 10.1089/cmb.2023.0087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023] Open

Zabelkin A, Avdeyev P, Alexeev N. TruEst: a better estimator of evolutionary distance under the INFER model. J Math Biol 2023;87:25. [PMID: 37423919 DOI: 10.1007/s00285-023-01955-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2022] [Revised: 06/11/2023] [Accepted: 06/15/2023] [Indexed: 07/11/2023]

Reconstruction of hundreds of reference ancestral genomes across the eukaryotic kingdom. Nat Ecol Evol 2023;7:355-366. [PMID: 36646945 PMCID: PMC9998269 DOI: 10.1038/s41559-022-01956-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 02/17/2022] [Accepted: 11/22/2022] [Indexed: 01/18/2023]

Miardan MM, Jamshidpey A, Sankoff D. Escape from Parsimony of a Double-Cut-and-Join Genome Evolution Process. J Comput Biol 2023;30:118-130. [PMID: 36595359 DOI: 10.1089/cmb.2021.0468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2023] Open

Complete Genome and Molecular Characterization of a New Cyprinid Herpesvirus 2 (CyHV-2) SH-01 Strain Isolated from Cultured Crucian Carp. Viruses 2022;14:v14092068. [PMID: 36146873 PMCID: PMC9503944 DOI: 10.3390/v14092068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Revised: 09/09/2022] [Accepted: 09/13/2022] [Indexed: 11/17/2022] Open

Ma J, Jiang H, Zhu D, Yang R. Algorithms and Hardness for Scaffold Filling to Maximize Increased Duo-Preservations. IEEE/ACM Trans Comput Biol Bioinform 2022;19:2071-2079. [PMID: 34038366 DOI: 10.1109/tcbb.2021.3083896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]

Shi T, Huneau C, Zhang Y, Li Y, Chen J, Salse J, Wang Q. The slow-evolving Acorus tatarinowii genome sheds light on ancestral monocot evolution. Nat Plants 2022;8:764-777. [PMID: 35835857 PMCID: PMC9300462 DOI: 10.1038/s41477-022-01187-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 10/27/2021] [Accepted: 05/30/2022] [Indexed: 05/03/2023]

Doerr D, Chauve C. Small parsimony for natural genomes in the DCJ-indel model. J Bioinform Comput Biol 2021;19:2140009. [PMID: 34806948 DOI: 10.1142/s0219720021400096] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/22/2022]

Rubert DP, Doerr D, Braga MDV. The potential of family-free rearrangements towards gene orthology inference. J Bioinform Comput Biol 2021;19:2140014. [PMID: 34775922 DOI: 10.1142/s021972002140014x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022]

Chua M, Tan A, Tremblay-Savard O. BOPAL 2.0 and a study of tRNA and rRNA gene evolution in Clostridium. J Bioinform Comput Biol 2021;19:2140007. [PMID: 34775921 DOI: 10.1142/s0219720021400072] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]

Alexandrino AO, Oliveira AR, Dias U, Dias Z. Labeled Cycle Graph for Transposition and Indel Distance. J Comput Biol 2021;29:243-256. [PMID: 34724796 DOI: 10.1089/cmb.2021.0279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open

Hartmann T, Bannach M, Middendorf M. Sorting Signed Permutations by Inverse Tandem Duplication Random Losses. IEEE/ACM Trans Comput Biol Bioinform 2021;18:2177-2188. [PMID: 31095495 DOI: 10.1109/tcbb.2019.2917198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]

Oliveira AR, Jean G, Fertin G, Brito KL, Dias U, Dias Z. Sorting Permutations by Intergenic Operations. IEEE/ACM Trans Comput Biol Bioinform 2021;18:2080-2093. [PMID: 33945484 DOI: 10.1109/tcbb.2021.3077418] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]

Willing E, Stoye J, Braga MDV. Computing the Inversion-Indel Distance. IEEE/ACM Trans Comput Biol Bioinform 2021;18:2314-2326. [PMID: 32324562 DOI: 10.1109/tcbb.2020.2988950] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]

Habib S, Dong S, Liu Y, Liao W, Zhang S. The complete mitochondrial genome of Cycas debaoensis revealed unexpected static evolution in gymnosperm species. PLoS One 2021;16:e0255091. [PMID: 34293066 PMCID: PMC8297867 DOI: 10.1371/journal.pone.0255091] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/10/2021] [Accepted: 07/11/2021] [Indexed: 11/18/2022] Open

Abstract

Mitochondrial genomes of vascular plants are well known for their lability in architecture evolution. However, the evolutionary features of mitogenomes at the intra-generic level are seldom studied in vascular plants, especially among gymnosperms. Here we present the complete mitogenome of Cycas debaoensis, a cycad species endemic to the Guangxi region in southern China. In addition to assembling a draft mitochondrial genome, we test the conservation of gene content and mitogenomic stability by comparing it to the previously published mitogenome of Cycas taitungensis. Furthermore, we explored factors such as structural rearrangements and nuclear surveillance of double-strand break repair (DSBR) proteins in Cycas in comparison to other vascular plant groups. The C. debaoensis mitogenome is 413,715 bp in size and encodes 69 unique genes, including 40 protein coding genes, 26 tRNAs, and 3 rRNA genes, similar to that of C. taitungensis. Cycas mitogenomes maintained the ancestral intron content of seed plants (26 introns), which is reduced in other lineages of gymnosperms, such as Ginkgo biloba, Taxus cuspidata and Welwitschia mirabilis, due to selective pressure or retroprocessing events. The C. debaoensis mitogenome holds 1,569 repeated sequences (> 50 bp), which partially account for the fairly large intron size (1200 bp on average) of the Cycas mitogenome. The comparison of RNA-editing sites revealed 267 shared non-silent editing sites among predicted vs. empirically observed editing events. Another 33 silent editing sites from empirical data increase the total number of editing sites in Cycas debaoensis mitochondrial protein coding genes to 300. Our study revealed unexpected conserved evolution between the two Cycas species. Furthermore, we found strict collinearity of the gene order along with the identical set of genomic content in Cycas mt genomes. The stability of Cycas mt genomes is surprising despite the existence of a large number of repeats. This structural stability may be related to the relative expansion of three DSBR protein families (i.e., RecA, OSB, and RecG) in the Cycas nuclear genome, which inhibit homologous recombination by monitoring the accuracy of mitochondrial chromosome repair.


Brito KL, Alexandrino AO, Oliveira AR, Dias U, Dias Z. Reversals and transpositions distance with proportion restriction. J Bioinform Comput Biol 2021;19:2150013. [PMID: 34162319 DOI: 10.1142/s021972002150013x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]

Zhao T, Zwaenepoel A, Xue JY, Kao SM, Li Z, Schranz ME, Van de Peer Y. Whole-genome microsynteny-based phylogeny of angiosperms. Nat Commun 2021;12:3498. [PMID: 34108452 PMCID: PMC8190143 DOI: 10.1038/s41467-021-23665-0] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Received: 07/30/2020] [Accepted: 05/10/2021] [Indexed: 02/05/2023] Open

Predicting the Evolution of Syntenies—An Algorithmic Review. ALGORITHMS 2021. [DOI: 10.3390/a14050152] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]

Rubert DP, Martinez FV, Braga MDV. Natural family-free genomic distance. Algorithms Mol Biol 2021;16:4. [PMID: 33971908 PMCID: PMC8111734 DOI: 10.1186/s13015-021-00183-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2021] [Accepted: 04/13/2021] [Indexed: 11/13/2022] Open

Abstract

BACKGROUND

A classical problem in comparative genomics is to compute the rearrangement distance, that is the minimum number of large-scale rearrangements required to transform a given genome into another given genome. The traditional approaches in this area are family-based, i.e., require the classification of DNA fragments of both genomes into families. Furthermore, the most elementary family-based models, which are able to compute distances in polynomial time, restrict the families to occur at most once in each genome. In contrast, the distance computation in models that allow multifamilies (i.e., families with multiple occurrences) is NP-hard. Very recently, Bohnenkämper et al. (J Comput Biol 28:410-431, 2021) proposed an ILP formulation for computing the genomic distance of genomes with multifamilies, allowing structural rearrangements, represented by the generic double cut and join (DCJ) operation, and content-modifying insertions and deletions of DNA segments. This ILP is very efficient, but must maximize a matching of the genes in each multifamily, in order to prevent the free lunch artifact that would otherwise let empty or almost empty matchings give smaller distances.

RESULTS

In this paper, we adopt the alternative family-free setting that, instead of family classification, simply uses the pairwise similarities between DNA fragments of both genomes to compute their rearrangement distance. We adapted the ILP mentioned above and developed a model in which pairwise similarities are used to assign weights to both matched and unmatched genes, so that an optimal solution does not necessarily maximize the matching. Our model then results in a natural family-free genomic distance, that takes into consideration all given genes, without prior classification into families, and has a search space composed of matchings of any size. In spite of its bigger search space, our ILP seems to be boosted by a reduction of the number of co-optimal solutions due to the weights. Indeed, it converged faster than the original one by Bohnenkämper et al. for instances with the same number of multiple connections. We can handle not only bacterial genomes, but also fungi and insects, or sets of chromosomes of mammals and plants. In a comparison study of six fruit fly genomes, we obtained accurate results.

Bohnenkämper L, Braga MD, Doerr D, Stoye J. Computing the Rearrangement Distance of Natural Genomes. J Comput Biol 2021;28:410-431. [PMID: 33393848 PMCID: PMC8082732 DOI: 10.1089/cmb.2020.0434] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open

Abstract

The computation of genomic distances has been a very active field of computational comparative genomics over the past 25 years. Substantial results include the polynomial-time computability of the inversion distance by Hannenhalli and Pevzner in 1995 and the introduction of the double cut and join distance by Yancopoulos et al. in 2005. Both results, however, rely on the assumption that the genomes under comparison contain the same set of unique markers (syntenic genomic regions, sometimes also referred to as genes). In 2015, Shao et al. relax this condition by allowing for duplicate markers in the analysis. This generalized version of the genomic distance problem is NP-hard, and they give an integer linear programming (ILP) solution that is efficient enough to be applied to real-world datasets. A restriction of their approach is that it can be applied only to balanced genomes that have equal numbers of duplicates of any marker. Therefore, it still needs a delicate preprocessing of the input data in which excessive copies of unbalanced markers have to be removed. In this article, we present an algorithm solving the genomic distance problem for natural genomes, in which any marker may occur an arbitrary number of times. Our method is based on a new graph data structure, the multi-relational diagram, that allows an elegant extension of the ILP by Shao et al. to count runs of markers that are under- or over-represented in one genome with respect to the other and need to be inserted or deleted, respectively. With this extension, previous restrictions on the genome configurations are lifted, for the first time enabling an uncompromising rearrangement analysis. Any marker sequence can directly be used for the distance calculation. The evaluation of our approach shows that it can be used to analyze genomes with up to a few 10,000 markers, which we demonstrate on simulated and real data.
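For readers unfamiliar with the double cut and join distance mentioned in this abstract, the following is a minimal sketch of the restricted case it describes, where both genomes contain the same set of unique markers. It is not the ILP of the article. It uses the well-known adjacency-graph formula d = N - (C + I/2), usually credited to Bergeron, Mixtacki and Stoye, where N is the number of markers, C the number of cycles and I the number of odd-length paths. The genome encoding (a list of `(signed_markers, is_circular)` pairs) and all function names are assumptions made for illustration.

```python
def extremities(marker):
    """Return (left, right) extremities of a signed marker as read in the genome."""
    g = abs(marker)
    return ((g, "t"), (g, "h")) if marker > 0 else ((g, "h"), (g, "t"))


def partner_map(genome):
    """Map each extremity to its adjacency partner, or None if it is a telomere.

    Assumes every chromosome is non-empty and every marker occurs exactly once.
    """
    partner = {}
    for markers, circular in genome:
        ends = [extremities(m) for m in markers]
        for (_, right), (left, _) in zip(ends, ends[1:]):
            partner[right] = left
            partner[left] = right
        first_left, last_right = ends[0][0], ends[-1][1]
        if circular:
            partner[last_right] = first_left
            partner[first_left] = last_right
        else:
            partner[first_left] = None
            partner[last_right] = None
    return partner


def dcj_distance(a, b):
    """DCJ distance N - (C + I/2) for two genomes over the same unique markers."""
    pa, pb = partner_map(a), partner_map(b)
    if pa.keys() != pb.keys():
        raise ValueError("genomes must contain the same set of unique markers")
    maps = (pa, pb)
    visited = set()
    odd_paths = 0
    # Paths: start at each telomere and alternate between the two genomes.
    for side in (0, 1):
        for start, partner in maps[side].items():
            if partner is not None or start in visited:
                continue
            cur, other, edges = start, 1 - side, 0
            while cur is not None:
                visited.add(cur)
                edges += 1
                cur = maps[other][cur]
                other = 1 - other
            odd_paths += edges % 2
    # Cycles: every extremity not on a path lies on exactly one cycle.
    cycles = 0
    for start in pa:
        if start in visited:
            continue
        cycles += 1
        cur = start
        while True:
            visited.add(cur)
            cur = pa[cur]
            visited.add(cur)
            cur = pb[cur]
            if cur == start:
                break
    n_markers = len(pa) // 2
    return n_markers - (cycles + odd_paths // 2)


if __name__ == "__main__":
    linear = [([1, 2, 3], False)]
    inverted = [([1, -2, 3], False)]
    print(dcj_distance(linear, inverted))
```

The sketch breaks down as soon as a marker is duplicated or missing in one genome, which is exactly the setting the article above addresses with its ILP.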

Biological computation and computational biology: survey, challenges, and discussion. Artif Intell Rev 2021. [DOI: 10.1007/s10462-020-09951-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]

Bhatia S, Egri-Nagy A, Serdoz S, Praeger CE, Gebhardt V, Francis A. A Path-Deformation Framework for Determining Weighted Genome Rearrangement Distance. Front Genet 2020;11:1035. [PMID: 33193592 PMCID: PMC7542183 DOI: 10.3389/fgene.2020.01035] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2020] [Accepted: 08/11/2020] [Indexed: 11/16/2022] Open

Zhang Z, Wang W, Xia R, Pan G, Wang J, Tang J. Achieving large and distant ancestral genome inference by using an improved discrete quantum-behaved particle swarm optimization algorithm. BMC Bioinformatics 2020;21:516. [PMID: 33176688 PMCID: PMC7656761 DOI: 10.1186/s12859-020-03833-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2020] [Accepted: 10/23/2020] [Indexed: 11/16/2022] Open

Abstract

Background

Reconstructing ancestral genomes is one of the central problems presented in genome rearrangement analysis since finding the most likely true ancestor is of significant importance in phylogenetic reconstruction. Large scale genome rearrangements can provide essential insights into evolutionary processes. However, when the genomes are large and distant, classical median solvers have failed to adequately address these challenges due to the exponential increase of the search space. Consequently, solving ancestral genome inference problems constitutes a task of paramount importance that continues to challenge the current methods used in this area, whose difficulty is further increased by the ongoing rapid accumulation of whole-genome data.

Results

In response to these challenges, we provide two contributions for ancestral genome inference. First, an improved discrete quantum-behaved particle swarm optimization algorithm (IDQPSO) by averaging two of the fitness values is proposed to address the discrete search space. Second, we incorporate DCJ sorting into the IDQPSO (IDQPSO-Median). In comparison with the other methods, when the genomes are large and distant, IDQPSO-Median has the lowest median score, the highest adjacency accuracy, and the closest distance to the true ancestor. In addition, we have integrated our IDQPSO-Median approach with the GRAPPA framework. Our experiments show that this new phylogenetic method is very accurate and effective by using IDQPSO-Median.

Conclusions

Our experimental results demonstrate the advantages of IDQPSO-Median approach over the other methods when the genomes are large and distant. When our experimental results are evaluated in a comprehensive manner, it is clear that the IDQPSO-Median approach we propose achieves better scalability compared to existing algorithms. Moreover, our experimental results by using simulated and real datasets confirm that the IDQPSO-Median, when integrated with the GRAPPA framework, outperforms other heuristics in terms of accuracy, while also continuing to infer phylogenies that were equivalent or close to the true trees within 5 days of computation, which is far beyond the difficulty level that can be handled by GRAPPA.

Avdeyev P, Alexeev N, Rong Y, Alekseyev MA. A unified ILP framework for core ancestral genome reconstruction problems. Bioinformatics 2020;36:2993-3003. [PMID: 32058559 DOI: 10.1093/bioinformatics/btaa100] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2018] [Revised: 12/06/2019] [Accepted: 02/07/2020] [Indexed: 11/14/2022] Open

Alexandrino AO, Oliveira AR, Dias U, Dias Z. Genome Rearrangement Distance with Reversals, Transpositions, and Indels. J Comput Biol 2020;28:235-247. [PMID: 33085536 DOI: 10.1089/cmb.2020.0121] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Greenman CD, Penso-Dolfin L, Wu T. The complexity of genome rearrangement combinatorics under the infinite sites model. J Theor Biol 2020;501:110335. [DOI: 10.1016/j.jtbi.2020.110335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2019] [Revised: 04/16/2020] [Accepted: 05/14/2020] [Indexed: 11/30/2022]

Perumal S, Koh CS, Jin L, Buchwaldt M, Higgins EE, Zheng C, Sankoff D, Robinson SJ, Kagale S, Navabi ZK, Tang L, Horner KN, He Z, Bancroft I, Chalhoub B, Sharpe AG, Parkin IAP. A high-contiguity Brassica nigra genome localizes active centromeres and defines the ancestral Brassica genome. NATURE PLANTS 2020;6:929-941. [PMID: 32782408 PMCID: PMC7419231 DOI: 10.1038/s41477-020-0735-y] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/03/2020] [Accepted: 06/28/2020] [Indexed: 05/19/2023]

Rubert DP, Martinez FV, Stoye J, Doerr D. Analysis of local genome rearrangement improves resolution of ancestral genomic maps in plants. BMC Genomics 2020;21:273. [PMID: 32299356 PMCID: PMC7160886 DOI: 10.1186/s12864-020-6609-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023] Open

A mean first passage time genome rearrangement distance. J Math Biol 2020;80:1971-1992. [PMID: 32253463 DOI: 10.1007/s00285-020-01487-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2019] [Revised: 03/11/2020] [Indexed: 10/24/2022]

Brito KL, Jean G, Fertin G, Oliveira AR, Dias U, Dias Z. Sorting by Genome Rearrangements on Both Gene Order and Intergenic Sizes. J Comput Biol 2020;27:156-174. [PMID: 31891533 DOI: 10.1089/cmb.2019.0293] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023] Open

Abstract

During the evolutionary process, genomes are affected by various genome rearrangements, that is, events that modify large stretches of the genetic material. In the literature, a large number of models have been proposed to estimate the number of events that occurred during evolution; most of them represent a genome as an ordered sequence of genes, and, in particular, disregard the genetic material between consecutive genes. However, recent studies showed that taking into account the genetic material between consecutive genes can enhance evolutionary distance estimations. Reversal and transposition are genome rearrangements that have been widely studied in the literature. A reversal inverts a (contiguous) segment of the genome, while a transposition swaps the positions of two consecutive segments. Genomes also undergo nonconservative events (events that alter the amount of genetic material) such as insertions and deletions, in which genetic material from intergenic regions of the genome is inserted or deleted, respectively. In this article, we study a genome rearrangement model that considers both gene order and sizes of intergenic regions. We investigate the reversal distance, and also the reversal and transposition distance between two genomes in two scenarios: with and without nonconservative events. We show that these problems are NP-hard and we present constant ratio approximation algorithms for all of them. More precisely, we provide a 4-approximation algorithm for the reversal distance, both in the conservative and nonconservative versions. For the reversal and transposition distance, we provide a 4.5-approximation algorithm, both in the conservative and nonconservative versions. We also perform experimental tests to verify the behavior of our algorithms, as well as to compare the practical and theoretical results. We finally extend our study to scenarios in which events have different costs, and we present constant ratio approximation algorithms for each scenario.
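The two operations defined in this abstract can be illustrated with a short sketch. It covers gene order only and leaves out the intergenic region sizes that the article's model also tracks. Representing a genome as a Python list of signed integers, flipping signs on reversal, and the half-open index convention are assumptions made here for illustration, not details taken from the article.

```python
def reversal(genome, i, j):
    """Invert the contiguous segment genome[i:j].

    Genes are signed integers (an assumption); inverting a segment reverses
    its order and flips the orientation of each gene in it.
    """
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]


def transposition(genome, i, j, k):
    """Swap the two consecutive segments genome[i:j] and genome[j:k]."""
    if not i <= j <= k:
        raise ValueError("indices must satisfy i <= j <= k")
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


if __name__ == "__main__":
    genome = [1, 2, 3, 4, 5]
    print(reversal(genome, 1, 4))
    print(transposition(genome, 1, 3, 5))
```

Both functions return a new list and leave the input untouched, which keeps it easy to enumerate candidate moves from the same starting genome.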

Brito KL, Oliveira AR, Dias U, Dias Z. Heuristics for the Reversal and Transposition Distance Problem. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020;17:2-13. [PMID: 31603793 DOI: 10.1109/tcbb.2019.2945759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]

Parameterized Algorithms in Bioinformatics: An Overview. ALGORITHMS 2019. [DOI: 10.3390/a12120256] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]

Wang J, Cui B, Zhao Y, Guo M. A New Algorithm for Identifying Genome Rearrangements in the Mammalian Evolution. Front Genet 2019;10:1020. [PMID: 31737036 PMCID: PMC6828935 DOI: 10.3389/fgene.2019.01020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2019] [Accepted: 09/24/2019] [Indexed: 11/13/2022] Open

Oliveira AR, Jean G, Fertin G, Dias U, Dias Z. Super short operations on both gene order and intergenic sizes. Algorithms Mol Biol 2019;14:21. [PMID: 31709002 PMCID: PMC6833170 DOI: 10.1186/s13015-019-0156-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2019] [Accepted: 10/14/2019] [Indexed: 12/03/2022] Open

Oliveira AR, Brito KL, Dias U, Dias Z. On the Complexity of Sorting by Reversals and Transpositions Problems. J Comput Biol 2019;26:1223-1229. [DOI: 10.1089/cmb.2019.0078] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Jiang H, Qingge L, Zhu D, Zhu B. A 2-approximation algorithm for the contig-based genomic scaffold filling problem. J Bioinform Comput Biol 2019;16:1850022. [PMID: 30616473 DOI: 10.1142/s0219720018500221] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Abstract

The genomic scaffold filling problem has attracted a lot of attention recently. The problem is to fill an incomplete sequence (scaffold) I into I', with respect to a complete reference genome G, such that the number of common/shared adjacencies between G and I' is maximized. The problem is NP-complete, and admits a constant-factor approximation. However, the sequence input I is not quite practical and does not fit most of the real datasets (where a scaffold is more often given as a list of contigs). In this paper, we revisit the genomic scaffold filling problem by considering this important case when a scaffold S is given, the missing genes can only be inserted in between the contigs, and the objective is to maximize the number of common adjacencies between G and the filled genome S'. For this problem, we present a simple NP-completeness proof, and we then present a factor-2 approximation algorithm.
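The objective in this abstract, the number of common adjacencies between two gene sequences, can be computed with a few lines of code. The sketch below assumes a convention that is common in the scaffold filling literature but not spelled out in the abstract: an adjacency is an unordered pair of neighbouring genes, and common adjacencies are counted as a multiset intersection so that repeated genes are handled. The function names are hypothetical, and the sketch only scores a filled sequence; it does not implement the approximation algorithm.

```python
from collections import Counter


def adjacencies(sequence):
    """Multiset of unordered pairs of neighbouring genes in a sequence."""
    return Counter(tuple(sorted(pair)) for pair in zip(sequence, sequence[1:]))


def common_adjacencies(reference, filled):
    """Size of the multiset intersection of the two adjacency multisets."""
    shared = adjacencies(reference) & adjacencies(filled)
    return sum(shared.values())


if __name__ == "__main__":
    reference = list("abcab")
    filled = list("abab")
    print(common_adjacencies(reference, filled))
```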
