1
|
Martín-Vide C, Vega-Rodríguez MA, Wheeler T. A 3.5-Approximation Algorithm for Sorting by Intergenic Transpositions. ALGORITHMS FOR COMPUTATIONAL BIOLOGY 2020. [PMCID: PMC7197096 DOI: 10.1007/978-3-030-42266-0_2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/02/2022]
Abstract
Genome Rearrangements affect large stretches of genomes during evolution. One of the most studied genome rearrangement is the transposition, which occurs when a sequence of genes is moved to another position inside the genome. Mathematical models have been used to estimate the evolutionary distance between two different genomes based on genome rearrangements. However, many of these models have focused only on the (order of the) genes of a genome, disregarding other important elements in it. Recently, researchers have shown that considering existing regions between each pair of genes, called intergenic regions, can enhance the distance estimation in realistic data. In this work, we study the transposition distance between two genomes, but we also consider intergenic regions, a problem we name Sorting Permutations by Intergenic Transpositions (SbIT). We show that this problem is NP-hard and propose a 3.5-approximation algorithm for it.
Collapse
|
2
|
Abstract
BACKGROUND Gene order changes, under rearrangements, insertions, deletions and duplications, have been used as a new type of data source for phylogenetic reconstruction. Because these changes are rare compared to sequence mutations, they allow the inference of phylogeny further back in evolutionary time. There exist many computational methods for the reconstruction of gene-order phylogenies, including widely used maximum parsimonious methods and maximum likelihood methods. However, both methods face challenges in handling large genomes with many duplicated genes, especially in the presence of whole genome duplication. METHODS In this paper, we present three simple yet powerful methods based on maximum-likelihood (ML) approaches that encode multiplicities of both gene adjacency and gene content information for phylogenetic reconstruction. RESULTS Extensive experiments on simulated data sets show that our new method achieves the most accurate phylogenies compared to existing approaches. We also evaluate our method on real whole-genome data from eleven mammals. The package is publicly accessible at http://www.geneorder.org . CONCLUSIONS Our new encoding schemes successfully incorporate the multiplicity information of gene adjacencies and gene content into an ML framework, and show promising results in reconstruct phylogenies for whole-genome data in the presence of massive duplications.
Collapse
Affiliation(s)
- Lingxi Zhou
- Department of Computer Science and Engineering, University of South Carolina, Columbia, 29208 South Carolina USA
| | - Yu Lin
- Research School of Computer Science, Australian National University, Canberra, 2601 ACT Australia
| | - Bing Feng
- Department of Computer Science and Engineering, University of South Carolina, Columbia, 29208 South Carolina USA
| | - Jieyi Zhao
- University of Texas School of Biomedical Informatics at Houston, Houston, 77030 Texas USA
| | - Jijun Tang
- School of Computer Science and Engineering, Tianjin University, Tianjin, 300072 China
- Department of Computer Science and Engineering, University of South Carolina, Columbia, 29208 South Carolina USA
| |
Collapse
|
3
|
Avdeyev P, Jiang S, Aganezov S, Hu F, Alekseyev MA. Reconstruction of Ancestral Genomes in Presence of Gene Gain and Loss. J Comput Biol 2016; 23:150-64. [PMID: 26885568 DOI: 10.1089/cmb.2015.0160] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023] Open
Abstract
Since most dramatic genomic changes are caused by genome rearrangements as well as gene duplications and gain/loss events, it becomes crucial to understand their mechanisms and reconstruct ancestral genomes of the given genomes. This problem was shown to be NP-complete even in the "simplest" case of three genomes, thus calling for heuristic rather than exact algorithmic solutions. At the same time, a larger number of input genomes may actually simplify the problem in practice as it was earlier illustrated with MGRA, a state-of-the-art software tool for reconstruction of ancestral genomes of multiple genomes. One of the key obstacles for MGRA and other similar tools is presence of breakpoint reuses when the same breakpoint region is broken by several different genome rearrangements in the course of evolution. Furthermore, such tools are often limited to genomes composed of the same genes with each gene present in a single copy in every genome. This limitation makes these tools inapplicable for many biological datasets and degrades the resolution of ancestral reconstructions in diverse datasets. We address these deficiencies by extending the MGRA algorithm to genomes with unequal gene contents. The developed next-generation tool MGRA2 can handle gene gain/loss events and shares the ability of MGRA to reconstruct ancestral genomes uniquely in the case of limited breakpoint reuse. Furthermore, MGRA2 employs a number of novel heuristics to cope with higher breakpoint reuse and process datasets inaccessible for MGRA. In practical experiments, MGRA2 shows superior performance for simulated and real genomes as compared to other ancestral genome reconstruction tools.
Collapse
Affiliation(s)
- Pavel Avdeyev
- 1 Computational Biology Institute & Department of Mathematics, The George Washington University , Washington, DC, U.S.A
| | - Shuai Jiang
- 2 Department of Computer Science and Engineering, University of South Carolina , Columbia, SC, U.S.A
| | - Sergey Aganezov
- 1 Computational Biology Institute & Department of Mathematics, The George Washington University , Washington, DC, U.S.A.,3 Department of Higher Mathematics, ITMO University , St. Petersburg, Russia
| | - Fei Hu
- 2 Department of Computer Science and Engineering, University of South Carolina , Columbia, SC, U.S.A
| | - Max A Alekseyev
- 1 Computational Biology Institute & Department of Mathematics, The George Washington University , Washington, DC, U.S.A
| |
Collapse
|
4
|
Ashkenazy H, Cohen O, Pupko T, Huchon D. Indel reliability in indel-based phylogenetic inference. Genome Biol Evol 2014; 6:3199-209. [PMID: 25409663 PMCID: PMC4986452 DOI: 10.1093/gbe/evu252] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
It is often assumed that it is unlikely that the same insertion or deletion (indel) event occurred at the same position in two independent evolutionary lineages, and thus, indel-based inference of phylogeny should be less subject to homoplasy compared with standard inference which is based on substitution events. Indeed, indels were successfully used to solve debated evolutionary relationships among various taxonomical groups. However, indels are never directly observed but rather inferred from the alignment and thus indel-based inference may be sensitive to alignment errors. It is hypothesized that phylogenetic reconstruction would be more accurate if it relied only on a subset of reliable indels instead of the entire indel data. Here, we developed a method to quantify the reliability of indel characters by measuring how often they appear in a set of alternative multiple sequence alignments. Our approach is based on the assumption that indels that are consistently present in most alternative alignments are more reliable compared with indels that appear only in a small subset of these alignments. Using simulated and empirical data, we studied the impact of filtering and weighting indels by their reliability scores on the accuracy of indel-based phylogenetic reconstruction. The new method is available as a web-server at http://guidance.tau.ac.il/RELINDEL/.
Collapse
Affiliation(s)
- Haim Ashkenazy
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, Israel
| | - Ofir Cohen
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, Israel Present address: Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
| | - Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, Israel
| | - Dorothée Huchon
- Department of Zoology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, Israel
| |
Collapse
|
5
|
Hu F, Lin Y, Tang J. MLGO: phylogeny reconstruction and ancestral inference from gene-order data. BMC Bioinformatics 2014; 15:354. [PMID: 25376663 PMCID: PMC4236499 DOI: 10.1186/s12859-014-0354-6] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2014] [Accepted: 10/16/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The rapid accumulation of whole-genome data has renewed interest in the study of using gene-order data for phylogenetic analyses and ancestral reconstruction. Current software and web servers typically do not support duplication and loss events along with rearrangements. RESULTS MLGO (Maximum Likelihood for Gene-Order Analysis) is a web tool for the reconstruction of phylogeny and/or ancestral genomes from gene-order data. MLGO is based on likelihood computation and shows advantages over existing methods in terms of accuracy, scalability and flexibility. CONCLUSIONS To the best of our knowledge, it is the first web tool for analysis of large-scale genomic changes including not only rearrangements but also gene insertions, deletions and duplications. The web tool is available from http://www.geneorder.org/server.php .
Collapse
Affiliation(s)
- Fei Hu
- Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, 300072, China. .,Department of Computer Science and Engineering, University of South Carolina, Columbia, 29208, SC, USA.
| | - Yu Lin
- Department of Computer Science and Engineering, University of California, San Diego, 92093 La Jolla, CA, USA.
| | - Jijun Tang
- Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, 300072, China. .,Department of Computer Science and Engineering, University of South Carolina, Columbia, 29208, SC, USA.
| |
Collapse
|
6
|
Santos-Garcia D, Latorre A, Moya A, Gibbs G, Hartung V, Dettner K, Kuechler SM, Silva FJ. Small but powerful, the primary endosymbiont of moss bugs, Candidatus Evansia muelleri, holds a reduced genome with large biosynthetic capabilities. Genome Biol Evol 2014; 6:1875-1893. [PMID: 25115011 PMCID: PMC4122945 DOI: 10.1093/gbe/evu149] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/05/2014] [Indexed: 02/07/2023] Open
Abstract
Moss bugs (Coleorrhyncha: Peloridiidae) are members of the order Hemiptera, and like many hemipterans, they have symbiotic associations with intracellular bacteria to fulfill nutritional requirements resulting from their unbalanced diet. The primary endosymbiont of the moss bugs, Candidatus Evansia muelleri, is phylogenetically related to Candidatus Carsonella ruddii and Candidatus Portiera aleyrodidarum, primary endosymbionts of psyllids and whiteflies, respectively. In this work, we report the genome of Candidatus Evansia muelleri Xc1 from Xenophyes cascus, which is the only obligate endosymbiont present in the association. This endosymbiont possesses an extremely reduced genome similar to Carsonella and Portiera. It has crossed the borderline to be considered as an autonomous cell, requiring the support of the insect host for some housekeeping cell functions. Interestingly, in spite of its small genome size, Evansia maintains enriched amino acid (complete or partial pathways for ten essential and six nonessential amino acids) and sulfur metabolisms, probably related to the poor diet of the insect, based on bryophytes, which contains very low levels of nitrogenous and sulfur compounds. Several facts, including the congruence of host (moss bugs, whiteflies, and psyllids) and endosymbiont phylogenies and the retention of the same ribosomal RNA operon during genome reduction in Evansia, Portiera, and Carsonella, suggest the existence of an ancient endosymbiotic Halomonadaceae clade associated with Hemiptera. Three possible scenarios for the origin of these three primary endosymbiont genera are proposed and discussed.
Collapse
Affiliation(s)
- Diego Santos-Garcia
- Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, Spain
| | - Amparo Latorre
- Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, Spain
- Unidad Mixta de Investigación en Genómica y Salud (FISABIO-Salud Pública and Universitat de València), Spain
| | - Andrés Moya
- Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, Spain
- Unidad Mixta de Investigación en Genómica y Salud (FISABIO-Salud Pública and Universitat de València), Spain
| | - George Gibbs
- School of Biological Science, Victoria University, Wellington, New Zealand
| | - Viktor Hartung
- Museum für Naturkunde, Leibniz-Institute for Research on Evolution and Biodiversity, Berlin, Germany
| | - Konrad Dettner
- Department of Animal Ecology II, University of Bayreuth, Germany
| | | | - Francisco J. Silva
- Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, Spain
- Unidad Mixta de Investigación en Genómica y Salud (FISABIO-Salud Pública and Universitat de València), Spain
| |
Collapse
|
7
|
Kawashima Y, Nishihara H, Akasaki T, Nikaido M, Tsuchiya K, Segawa S, Okada N. The complete mitochondrial genomes of deep-sea squid (Bathyteuthis abyssicola), bob-tail squid (Semirossia patagonica) and four giant cuttlefish (Sepia apama, S. latimanus, S. lycidas and S. pharaonis), and their application to the phylogenetic analysis of Decapodiformes. Mol Phylogenet Evol 2013; 69:980-93. [PMID: 23811434 DOI: 10.1016/j.ympev.2013.06.007] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2011] [Revised: 06/01/2013] [Accepted: 06/13/2013] [Indexed: 11/18/2022]
Abstract
We determined the complete mitochondrial (mt) genomes of the deep-sea squid (Bathyteuthis abyssicola; supperfamily Bathyteuthoidea), the bob-tail squid (Semirossia patagonica; order Sepiolida) and four giant cuttlefish (Sepia apama, S. latimanus, S. lycidas and S. pharaonis; order Sepiida). The unique structures of the mt genomes of Bathyteuthis and Semirossia provide new information about the evolution of decapodiform mt genomes. We show that the mt genome of B. abyssicola, like those of other oegopsids studied so far, has two long duplicated regions that include seven genes (COX1-3, ATP6 and ATP8, tRNA(Asn), and either ND2 or ND3) and that one of the duplicated COX3 genes has lost its function. The mt genome of S. patagonica is unlike any other decapodiforms and, like Nautilus, its ATP6 and ATP8 genes are not adjacent to each other. The four giant cuttlefish have identical mt gene order to other cuttlefish determined to date. Molecular phylogenetic analyses using maximum likelihood and Bayesian methods suggest that traditional order Sepioidea (Sepiolida+Sepiida) is paraphyletic and Sepia (cuttlefish) has the sister-relationship with all other decapodiforms. Taking both the phylogenetic analyses and the mt gene order analyses into account, it is likely that the octopus-type mt genome is an ancestral state and that it had maintained from at least the Cephalopoda ancestor to the common ancestor of Oegopsida, Myopsida and Sepiolida.
Collapse
Affiliation(s)
- Yuumi Kawashima
- Central Customs Laboratory, 6-3-5, Kashiwanoha, Kashiwa-shi, Chiba 277-0082, Japan
| | | | | | | | | | | | | |
Collapse
|
8
|
Moret BME, Lin Y, Tang J. Rearrangements in Phylogenetic Inference: Compare, Model, or Encode? MODELS AND ALGORITHMS FOR GENOME EVOLUTION 2013. [DOI: 10.1007/978-1-4471-5298-9_7] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]
|