1
Representing and extending ensembles of parsimonious evolutionary histories with a directed acyclic graph. J Math Biol 2023; 87:75. [PMID: 37878119 PMCID: PMC10600060 DOI: 10.1007/s00285-023-02006-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 09/12/2023] [Accepted: 09/26/2023] [Indexed: 10/26/2023]
Abstract
In many situations, it would be useful to know not just the best phylogenetic tree for a given data set, but the collection of high-quality trees. This goal is typically addressed using Bayesian techniques; however, current Bayesian methods do not scale to large data sets. Furthermore, for large data sets with relatively low signal, one cannot even store every good tree individually, especially when the trees are required to be bifurcating. In this paper, we develop a novel object called the "history subpartition directed acyclic graph" (or "history sDAG" for short) that compactly represents an ensemble of trees with labels (e.g. ancestral sequences) mapped onto the internal nodes. The history sDAG can be built efficiently and can also be efficiently trimmed to only represent maximally parsimonious trees. We show that the history sDAG allows us to find many additional equally parsimonious trees, extending combinatorially beyond the ensemble used to construct it. We argue that this object could be useful as the "skeleton" of a more complete uncertainty quantification.
2
All quiet on the western front? The evolutionary history of monogeneans (Dactylogyridae: Cichlidogyrus, Onchobdella) infecting a West and Central African tribe of cichlid fishes (Chromidotilapiini). Parasite 2023; 30:25. [PMID: 37404116 DOI: 10.1051/parasite/2023023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Accepted: 05/21/2023] [Indexed: 07/06/2023] Open
Abstract
Owing to the largely unexplored diversity of metazoan parasites, their speciation mechanisms and the circumstances under which such speciation occurs - in allopatry or sympatry - remain vastly understudied. Cichlids and their monogenean flatworm parasites have previously served as a study system for macroevolutionary processes, e.g., for the role of East African host radiations on parasite communities. Here, we investigate the diversity and evolution of the poorly explored monogeneans infecting a West and Central African lineage of cichlid fishes: Chromidotilapiini, which is the most species-rich tribe of cichlids in this region. We screened gills of 149 host specimens (27 species) from natural history collections and measured systematically informative characters of the sclerotised attachment and reproductive organs of the parasites. Ten monogenean species (Dactylogyridae: Cichlidogyrus and Onchobdella) were found, eight of which are newly described and one redescribed herein. The phylogenetic positions of chromidotilapiines-infecting species of Cichlidogyrus were inferred through a parsimony analysis of the morphological characters. Furthermore, we employed machine learning algorithms to detect morphological features associated with the main lineages of Cichlidogyrus. Although the results of these experimental algorithms remain inconclusive, the parsimony analysis indicates that West and Central African lineages of Cichlidogyrus and Onchobdella are monophyletic, unlike the paraphyletic host lineages. Several instances of host sharing suggest occurrences of intra-host speciation (sympatry) and host switching (allopatry). Some morphological variation was recorded that may also indicate the presence of species complexes. We conclude that collection material can provide important insights on parasite evolution despite the lack of well-preserved DNA material.
3
Bipartite molecular approach for species delimitation and resolving cryptic speciation of Exobasidium vexans within the Exobasidium genus. Comput Biol Chem 2021; 92:107496. [PMID: 33930740 DOI: 10.1016/j.compbiolchem.2021.107496] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/24/2021] [Accepted: 04/21/2021] [Indexed: 02/07/2023]
Abstract
Exobasidium vexans, a basidiomycete pathogen, is the causal organism of blister blight disease in tea. The molecular identification of the pathogen remains a challenge due to the limited availability of genomic data in sequence repositories and cryptic speciation within its genus Exobasidium. In this study, the nuclear internal transcribed spacer rDNA region (ITS) based DNA barcode was developed for E. vexans, to address the problem of molecular identification within the background of cryptic speciation. The isolation of E. vexans strain was confirmed through morphological studies followed by molecular identification utilizing the developed ITS barcode. Phylogenetic analysis based on Maximum Parsimony (MP), Maximum Likelihood (ML) and Bayesian Inference (BI) confirmed the molecular identification of the pathogen as E. vexans strain. Further, BI analysis using BEAST mediated the estimation of the divergence time and evolutionary relationship of E. vexans within genus Exobasidium. The speciation process followed the Yule diversification model wherein the genus Exobasidium is approximated to have diverged in the Paleozoic era. The study thus sheds light on the molecular barcode-based species delimitation and evolutionary relationship of E. vexans within its genus Exobasidium.
4
Assessing topological congruence among concatenation-based phylogenomic approaches in empirical datasets. Mol Phylogenet Evol 2021; 161:107086. [PMID: 33609710 DOI: 10.1016/j.ympev.2021.107086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/17/2020] [Revised: 09/25/2020] [Accepted: 01/22/2021] [Indexed: 10/22/2022]
Abstract
Assessing the effect of methodological decisions on the resulting hypotheses is critical in phylogenetics. Recent studies have focused on evaluating how model selection, orthology definition and confounding factors affect phylogenomic results. Here, we compare the results of three concatenated phylogenetic methods (Maximum Likelihood, ML; Bayesian Inference, BI; Maximum Parsimony, MP) in 157 empirical phylogenomic datasets. The resulting trees were very similar, with 96.7% of all nodes shared between BI and ML (90.6% for ML-MP and 89.1% for BI-MP). Differing nodes were predominantly those of lower support. The main conclusions of most of the studies agreed for the three phylogenetic methods and the discordance involved nodes considered as recalcitrant problems in systematics. The differences between methods were proportionally larger in datasets that analyze the relationships at higher taxonomic levels (particularly phyla and kingdoms), and independent of the number of characters included in the datasets. Note: a Spanish version of this article is available in the Supplementary material online.
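One way to make the "nodes shared" comparison concrete is to count the clades two trees have in common. The sketch below is only illustrative: it assumes rooted trees written as nested tuples of leaf names, and the function names `clades` and `shared_fraction` are hypothetical, not taken from the article, whose own comparison procedure is not described in the abstract.

```python
def clades(tree):
    """Collect the leaf set below every internal node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):          # a leaf is just its name
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)                  # record the clade at this internal node
        return leaves

    walk(tree)
    return found


def shared_fraction(tree_a, tree_b):
    """Fraction of the clades of tree_a that also occur in tree_b."""
    a, b = clades(tree_a), clades(tree_b)
    return len(a & b) / len(a)


# Two toy five-taxon trees that disagree on one clade.
t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
print(shared_fraction(t1, t2))  # 0.75
```

For unrooted trees the same idea applies with bipartitions in place of clades.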
5
On the uniqueness of the maximum parsimony tree for data with up to two substitutions: An extension of the classic Buneman theorem in phylogenetics. Mol Phylogenet Evol 2019; 137:127-137. [PMID: 30928353 DOI: 10.1016/j.ympev.2019.03.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2018] [Revised: 03/20/2019] [Accepted: 03/20/2019] [Indexed: 11/26/2022]
Abstract
One of the main aims of phylogenetics is the reconstruction of the correct evolutionary tree when data concerning the underlying species set are given. These data typically come in the form of DNA, RNA or protein alignments, which consist of various characters (also often referred to as sites). Often, however, tree reconstruction methods based on criteria like maximum parsimony may fail to provide a unique tree for a given dataset, or, even worse, reconstruct the 'wrong' tree (i.e. a tree that differs from the one that generated the data). On the other hand it has long been known that if the alignment consists of all the characters that correspond to edges of a particular tree, i.e. they all require exactly k=1 substitution to be realized on that tree, then this tree will be recovered by maximum parsimony methods. This is based on Buneman's theorem in mathematical phylogenetics. It is the goal of the present manuscript to extend this classic result as follows: We prove that if an alignment consists of all characters that require exactly k=2 substitutions on a particular tree, this tree will always be the unique maximum parsimony tree (and we also show that this can be generalized to characters which require at most k=2 substitutions). In particular, this also proves a conjecture based on a recently published observation by Goloboff et al. affirmatively for the special case of k=2.
6
Quantifying the accuracy of ancestral state prediction in a phylogenetic tree under maximum parsimony. J Math Biol 2019; 78:1953-1979. [PMID: 30758663 DOI: 10.1007/s00285-019-01330-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/26/2018] [Revised: 01/21/2019] [Indexed: 11/26/2022]
Abstract
In phylogenetic studies, biologists often wish to estimate the ancestral discrete character state at an interior vertex v of an evolutionary tree T from the states that are observed at the leaves of the tree. A simple and fast estimation method, maximum parsimony, takes the ancestral state at v to be any state that minimises the number of state changes required to explain the character's evolution on T. In this paper, we investigate the reconstruction accuracy of this estimation method further, under a simple symmetric model of state change, and obtain a number of new results, both for 2-state characters, and r-state characters ([Formula: see text]). Our results rely on establishing new identities and inequalities, based on a coupling argument that involves a simpler 'coin toss' approach to ancestral state reconstruction.
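The parsimony estimate described above can be sketched with Fitch's bottom-up pass for a single character on a rooted binary tree. This is a minimal illustration, not the article's own code: the nested-tuple tree encoding and the name `fitch` are assumptions made here, and the returned set is the set of most parsimonious states at the root only (other interior vertices would need a second, top-down pass).

```python
def fitch(tree, leaf_states):
    """Return (candidate_states, changes) for one character.

    tree        -- a leaf name (str) or a pair (left, right) of subtrees
    leaf_states -- dict mapping each leaf name to its observed state
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, leaf_states)
    right_set, right_cost = fitch(right, leaf_states)
    common = left_set & right_set
    if common:
        # The children can agree, so no extra change is needed here.
        return common, left_cost + right_cost
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


tree = (("a", "b"), ("c", "d"))
states = {"a": 0, "b": 0, "c": 1, "d": 0}
print(fitch(tree, states))  # ({0}, 1)
```

When the root set holds more than one state, each of them is an equally parsimonious estimate.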
7
Statistical Inconsistency of Maximum Parsimony for k-Tuple-Site Data. Bull Math Biol 2019; 81:1173-1200. [PMID: 30607881 DOI: 10.1007/s11538-018-00552-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2017] [Accepted: 12/05/2018] [Indexed: 10/27/2022]
Abstract
One of the main aims of phylogenetics is to reconstruct the "Tree of Life." In this respect, different methods and criteria are used to analyze DNA sequences of different species and to compare them in order to derive the evolutionary relationships of these species. Maximum parsimony is one such criterion for tree reconstruction, and it is the one which we will use in this paper. However, it is well known that tree reconstruction methods can lead to wrong relationship estimates. One typical problem of maximum parsimony is long branch attraction, which can lead to statistical inconsistency. In this work, we will consider a blockwise approach to alignment analysis, namely the so-called k-tuple analyses. For four taxa, it has already been shown that k-tuple-based analyses are statistically inconsistent if and only if the standard character-based (site-based) analyses are statistically inconsistent. So, in the four-taxon case, going from individual sites to k-tuples does not lead to any improvement. However, real biological analyses often consider more than only four taxa. Therefore, we analyze the case of five taxa for 2- and 3-tuple-site data and consider alphabets with two and four elements. We show that the equivalence of single-site data and k-tuple-site data then no longer holds. Even so, we can show that maximum parsimony is statistically inconsistent for k-tuple-site data and five taxa.
8
Finding a most parsimonious or likely tree in a network with respect to an alignment. J Math Biol 2018; 78:527-547. [PMID: 30121824 PMCID: PMC6437133 DOI: 10.1007/s00285-018-1282-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/13/2017] [Revised: 05/16/2018] [Indexed: 11/22/2022]
Abstract
Phylogenetic networks are often constructed by merging multiple conflicting phylogenetic signals into a directed acyclic graph. It is interesting to explore whether a network constructed in this way induces biologically-relevant phylogenetic signals that were not present in the input. Here we show that, given a multiple alignment A for a set of taxa X and a rooted phylogenetic network N whose leaves are labelled by X, it is NP-hard to locate a most parsimonious phylogenetic tree displayed by N (with respect to A) even when the level of N—the maximum number of reticulation nodes within a biconnected component—is 1 and A contains only 2 distinct states. (If, additionally, gaps are allowed the problem becomes APX-hard.) We also show that under the same conditions, and assuming a simple binary symmetric model of character evolution, finding a most likely tree displayed by the network is NP-hard. These negative results contrast with earlier work on parsimony in which it is shown that if A consists of a single column the problem is fixed parameter tractable in the level. We conclude with a discussion of why, despite the NP-hardness, both the parsimony and likelihood problem can likely be well-solved in practice.
9
Varestrongylus (Nematoda: Protostrongylidae), lungworms of ungulates: a phylogenetic framework based on comparative morphology. Parasitol Res 2018; 117:2075-2083. [PMID: 29721655 DOI: 10.1007/s00436-018-5893-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/01/2018] [Accepted: 04/24/2018] [Indexed: 11/27/2022]
Abstract
Varestrongylus Bhalerao, 1932 comprises ten valid lungworm species infecting wild and domestic ungulates from Eurasia and North America. Here, we present a phylogenetic hypothesis for the genus based on morphological characters in a broader context for the family Protostrongylidae and discuss species relationships and aspects of character evolution. Phylogenetic analysis of 25 structural attributes, including binary and multistate characters, among the 10 species of Varestrongylus resulted in one fully resolved most parsimonious tree (61 steps; consistency index = 0.672, retention index = 0.722, and consistency index excluding uninformative characters = 0.667). Varestrongylus forms a monophyletic clade and is the sister of Pneumostrongylus, supporting recognition of the subfamily Varestrongylinae. Monophyly for Varestrongylus is diagnosed by six unequivocal synapomorphies, all associated with structural characters of the copulatory system of males. Varestrongylus pneumonicus is basal, and sister to all other species within the genus, which form two subclades. The subclade I contains V. sagittatus + V. tuvae and V. qinghaiensis + V. longispiculatus. Subclade II contains V. alpenae, V. capricola, V. capreoli, and V. eleguneniensis + V. alces. Both subclades are diagnosed by two unambiguous synapomorphies. Highlighted is the continuing importance of phylogenetic assessments based on comparative morphology as a foundation to explore the structure of the biosphere across space and time.
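For readers unfamiliar with the consistency index quoted above, the usual ensemble definition is the minimum conceivable number of steps divided by the observed tree length, where a character with r states needs at least r - 1 steps. The sketch below uses that textbook definition with a made-up three-character matrix; the function name and the toy numbers are hypothetical and are not the study's data.

```python
def consistency_index(states_per_character, tree_length):
    """Ensemble consistency index: minimum possible steps / observed steps.

    states_per_character -- number of distinct states for each character
    tree_length          -- total steps the characters require on the tree
    """
    minimum_steps = sum(r - 1 for r in states_per_character)
    return minimum_steps / tree_length


# Toy matrix: two binary characters and one three-state character
# (minimum 1 + 1 + 2 = 4 steps) on a tree of length 5.
print(consistency_index([2, 2, 3], 5))  # 0.8
```

Under this definition, a consistency index of 0.672 on a 61-step tree would correspond to roughly 41 minimum steps, since 0.672 × 61 ≈ 41.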
10
On the Accuracy of Ancestral Sequence Reconstruction for Ultrametric Trees with Parsimony. Bull Math Biol 2018; 80:864-879. [PMID: 29476399 DOI: 10.1007/s11538-018-0407-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/24/2017] [Accepted: 02/16/2018] [Indexed: 10/18/2022]
Abstract
We examine a mathematical question concerning the reconstruction accuracy of the Fitch algorithm for reconstructing the ancestral sequence of the most recent common ancestor given a phylogenetic tree and sequence data for all taxa under consideration. In particular, for the symmetric four-state substitution model which is also known as Jukes-Cantor model, we answer affirmatively a conjecture of Li, Steel and Zhang which states that for any ultrametric phylogenetic tree and a symmetric model, the Fitch parsimony method using all terminal taxa is more accurate, or at least as accurate, for ancestral state reconstruction than using any particular terminal taxon or any particular pair of taxa. This conjecture had so far only been answered for two-state data by Fischer and Thatte. Here, we focus on answering the biologically more relevant case with four states, which corresponds to ancestral sequence reconstruction from DNA or RNA data.
11
MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation. BMC Evol Biol 2018; 18:11. [PMID: 29390973 PMCID: PMC5796505 DOI: 10.1186/s12862-018-1131-3] [Citation(s) in RCA: 91] [Impact Index Per Article: 15.2] [Received: 08/28/2017] [Accepted: 01/25/2018] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND The nonparametric bootstrap is widely used to measure the branch support of phylogenetic trees. However, bootstrapping is computationally expensive and remains a bottleneck in phylogenetic analyses. Recently, an ultrafast bootstrap approximation (UFBoot) approach was proposed for maximum likelihood analyses. However, such an approach is still missing for maximum parsimony. RESULTS To close this gap we present MPBoot, an adaptation and extension of UFBoot to compute branch supports under the maximum parsimony principle. MPBoot works for both uniform and non-uniform cost matrices. Our analyses on biological DNA and protein showed that under uniform cost matrices, MPBoot runs on average 4.7 (DNA) to 7 times (protein data) (range: 1.2-20.7) faster than the standard parsimony bootstrap implemented in PAUP*; but 1.6 (DNA) to 4.1 times (protein data) slower than the standard bootstrap with a fast search routine in TNT (fast-TNT). However, for non-uniform cost matrices MPBoot is 5 (DNA) to 13 times (protein data) (range: 0.3-63.9) faster than fast-TNT. We note that MPBoot achieves better scores more frequently than PAUP* and fast-TNT. However, this effect is less pronounced if an intensive but slower search in TNT is invoked. Moreover, experiments on large-scale simulated data show that while both PAUP* and TNT bootstrap estimates are too conservative, MPBoot bootstrap estimates appear more unbiased. CONCLUSIONS MPBoot provides an efficient alternative to the standard maximum parsimony bootstrap procedure. It shows favorable performance in terms of run time, the capability of finding a maximum parsimony tree, and high bootstrap accuracy on simulated as well as empirical data sets. MPBoot is easy-to-use, open-source and available at http://www.cibiv.at/software/mpboot.
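As background, the standard nonparametric bootstrap that MPBoot approximates resamples alignment columns with replacement, infers a tree from each replicate, and reports how often a clade recurs. The sketch below shows only the resampling and the support tally; it is not MPBoot's algorithm, the tree search is left out, and the function names and the dictionary-based alignment format are assumptions made for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample the columns of an alignment with replacement.

    alignment -- dict mapping taxon name to an aligned sequence string
    rng       -- a random.Random instance, passed in for reproducibility
    """
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def branch_support(replicate_clades, clade):
    """Fraction of bootstrap replicates whose tree contains the given clade."""
    hits = sum(clade in clades for clades in replicate_clades)
    return hits / len(replicate_clades)


alignment = {"t1": "ACGT", "t2": "AGGT", "t3": "TCGA"}
replicate = bootstrap_alignment(alignment, random.Random(0))
print(replicate)
```

Each replicate would then be passed to a parsimony search, and the clades of the resulting trees collected into `replicate_clades`.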
12
Complexity and algorithms for copy-number evolution problems. Algorithms Mol Biol 2017; 12:13. [PMID: 28515774 PMCID: PMC5433102 DOI: 10.1186/s13015-017-0103-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 12/23/2016] [Accepted: 04/11/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Cancer is an evolutionary process characterized by the accumulation of somatic mutations in a population of cells that form a tumor. One frequent type of mutations is copy number aberrations, which alter the number of copies of genomic regions. The number of copies of each position along a chromosome constitutes the chromosome's copy-number profile. Understanding how such profiles evolve in cancer can assist in both diagnosis and prognosis. RESULTS We model the evolution of a tumor by segmental deletions and amplifications, and gauge distance from profile [Formula: see text] to [Formula: see text] by the minimum number of events needed to transform [Formula: see text] into [Formula: see text]. Given two profiles, our first problem aims to find a parental profile that minimizes the sum of distances to its children. Given k profiles, the second, more general problem, seeks a phylogenetic tree, whose k leaves are labeled by the k given profiles and whose internal vertices are labeled by ancestral profiles such that the sum of edge distances is minimum. CONCLUSIONS For the former problem we give a pseudo-polynomial dynamic programming algorithm that is linear in the profile length, and an integer linear program formulation. For the latter problem we show it is NP-hard and give an integer linear program formulation that scales to practical problem instance sizes. We assess the efficiency and quality of our algorithms on simulated instances. AVAILABILITY https://github.com/raphael-group/CNT-ILP.
13
Cases in which ancestral maximum likelihood will be confusingly misleading. J Theor Biol 2017; 420:318-323. [PMID: 28263816 DOI: 10.1016/j.jtbi.2017.03.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2016] [Revised: 02/25/2017] [Accepted: 03/01/2017] [Indexed: 11/27/2022]
Abstract
Ancestral maximum likelihood (AML) is a phylogenetic tree reconstruction criterion that "lies between" maximum parsimony (MP) and maximum likelihood (ML). ML has long been known to be statistically consistent. On the other hand, Felsenstein (1978) showed that MP is statistically inconsistent, and even positively misleading: There are cases where the parsimony criterion, applied to data generated according to one tree topology, will be optimized on a different tree topology. The question of whether AML is statistically consistent or not has been open for a long time. Mossel et al. (2009) have shown that AML can "shrink" short tree edges, resulting in a star tree with no internal resolution, which yields a better AML score than the original (resolved) model. This result implies that AML is statistically inconsistent, but not that it is positively misleading, because the star tree is compatible with any other topology. We show that AML is confusingly misleading: For some simple four-taxon (resolved) tree, the ancestral likelihood optimization criterion is maximized on an incorrect (resolved) tree topology, as well as on a star tree (both with specific edge lengths), while the tree with the original, correct topology has strictly lower ancestral likelihood. Interestingly, the two short edges in the incorrect, resolved tree topology are of length zero, and are not adjacent, so this resolved tree is in fact a simple path. While for MP, the underlying phenomenon can be described as long edge attraction, it turns out that here we have long edge repulsion.
14
Live phylogeny with polytomies: Finding the most compact parsimonious trees. Comput Biol Chem 2017; 69:171-177. [PMID: 28391977 DOI: 10.1016/j.compbiolchem.2017.03.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2017] [Accepted: 03/27/2017] [Indexed: 11/24/2022]
Abstract
Construction of phylogenetic trees has traditionally focused on binary trees where all species appear on leaves, a problem for which numerous efficient solutions have been developed. Certain application domains though, such as viral evolution and transmission, paleontology, linguistics, and phylogenetic stemmatics, often require phylogeny inference that involves placing input species on ancestral tree nodes (live phylogeny), and polytomies. These requirements, despite their prevalence, lead to computationally harder algorithmic solutions and have been sparsely examined in the literature to date. In this article we prove some unique properties of most parsimonious live phylogenetic trees with polytomies, and their mapping to traditional binary phylogenetic trees. We show that our problem reduces to finding the most compact parsimonious tree for n species, and describe a novel efficient algorithm to find such trees without resorting to exhaustive enumeration of all possible tree topologies.
15
Evaluating multi-locus phylogenies for species boundaries determination in the genus Diaporthe. PeerJ 2017; 5:e3120. [PMID: 28367371 PMCID: PMC5372842 DOI: 10.7717/peerj.3120] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 12/23/2016] [Accepted: 02/24/2017] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Species identification is essential for controlling disease, understanding epidemiology, and guiding the implementation of phytosanitary measures against fungi from the genus Diaporthe. Accurate Diaporthe species separation requires using multi-locus phylogenies. However, defining the optimal set of loci that can be used for species identification is still an open problem. METHODS Here we addressed that problem by identifying five loci that have been sequenced in 142 Diaporthe isolates representing 96 species: TEF1, TUB, CAL, HIS and ITS. We then used every possible combination of those loci to build, analyse, and compare phylogenetic trees. RESULTS As expected, species separation is better when all five loci are simultaneously used to build the phylogeny of the isolates. However, removing the ITS locus has little effect on reconstructed phylogenies, identifying the TEF1-TUB-CAL-HIS 4-loci tree as almost equivalent to the 5-loci tree. We further identify the best 3-loci, 2-loci, and 1-locus trees that should be used for species separation in the genus. DISCUSSION Our results question the current use of the ITS locus for DNA barcoding in the genus Diaporthe and suggest that TEF1 might be a better choice if single-locus barcoding is required.
16
A combined chloroplast atpB-rbcL and trnL-F phylogeny unveils the ancestry of balsams (Impatiens spp.) in the Western Ghats of India. 3 Biotech 2016; 6:258. [PMID: 28330330 PMCID: PMC5135705 DOI: 10.1007/s13205-016-0574-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 10/27/2016] [Accepted: 11/23/2016] [Indexed: 11/29/2022] Open
Abstract
Only a few Impatiens spp. from South India (one of the five centers of diversity for Impatiens species) were included in the published molecular phylogeny data for the family Balsaminaceae. The present investigation is a novel attempt to reveal the phylogenetic association of Impatiens species of South India, by placing them in the global phylogeny of Impatiens based on a combined analysis of two chloroplast genes. Thirty species of genus Impatiens were collected from different locations of South India. Total genomic DNA was extracted from fresh leaves, and polymerase chain reaction was carried out using atpB-rbcL and trnL-F intergenic spacer-specific forward and reverse primers. Thirteen sequences of Impatiens species from three centers of diversity were obtained from GenBank for reconstructing the evolutionary relationships within the genus Impatiens. Bayesian inference analysis was carried out in MrBayes v.3.2.2. This analysis supported Southeast Asia as the ancestral place of origin of extant Impatiens species. Molecular phylogeny of South Indian Impatiens spp. based on combined chloroplast sequences showed the same association as that of morphological taxonomy. Sections Scapigerae, Tomentosae, Sub-Umbellatae, and Racemosae showed Southeast Asian relationship, while sections Annuae and Microsepalae showed African affinity.
17
Convex recoloring as an evolutionary marker. Mol Phylogenet Evol 2016; 107:209-220. [PMID: 27818264 DOI: 10.1016/j.ympev.2016.10.018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/21/2016] [Revised: 10/16/2016] [Accepted: 10/25/2016] [Indexed: 11/27/2022]
Abstract
With the availability of enormous quantities of genetic data it has become common to construct very accurate trees describing the evolutionary history of the species under study, as well as every single gene of these species. These trees allow us to examine the evolutionary compliance of given markers (characters). A marker compliant with the history of the species investigated has undergone mutations along the species tree branches, such that every subtree of that tree exhibits a different state. Convex recoloring (CR) uses combinatorial representation to measure the adequacy of a taxonomic classifier to a given tree. Despite its biological origins, research on CR has been almost exclusively dedicated to mathematical properties of the problem, or variants of it with little, if any, relationship to taxonomy. In this work we return to the origins of CR. We put CR in a statistical framework and introduce and study the notion of the statistical significance of a character. We apply this measure to two data sets, Passerine birds and prokaryotes, and four examples. These examples demonstrate various applications of CR, from evolutionary relatedness, through lateral evolution, to supertree construction. The above study was done with new software that we provide, containing algorithmic improvements and a graphical output of an (optimally) recolored tree. AVAILABILITY Code implementing the features and a README are available at http://research.haifa.ac.il/ssagi/software/convexrecoloring.zip.
18
Analysis of gene copy number changes in tumor phylogenetics. Algorithms Mol Biol 2016; 11:26. [PMID: 27688796 PMCID: PMC5034472 DOI: 10.1186/s13015-016-0088-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/22/2016] [Accepted: 09/08/2016] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND Evolution of cancer cells is characterized by large-scale and rapid changes in the chromosomal landscape. The fluorescence in situ hybridization (FISH) technique provides a way to measure the copy numbers of preselected genes in a group of cells and has been found to be a reliable source of data to model the evolution of tumor cells. Chowdhury et al. (Bioinformatics 29(13):189-98, 23; PLoS Comput Biol 10(7):1003740, 24) recently developed a computational model for tumor progression driven by gains and losses in cell count patterns obtained by FISH probes. Their model aims to find the rectilinear Steiner minimum tree (RSMT) (Chowdhury et al. in Bioinformatics 29(13):189-98, 23) and the duplication Steiner minimum tree (DSMT) (Chowdhury et al. in PLoS Comput Biol 10(7):1003740, 24) that describe the progression of FISH cell count patterns over their branches in a parsimonious manner. Both the RSMT and DSMT problems are NP-hard and heuristics are required to solve the problems efficiently. METHODS In this paper we propose two approaches to solve the RSMT problem, one inspired by iterative methods to address the "small phylogeny" problem (Sankoff et al. in J Mol Evol 7(2):133-49, 27; Blanchette et al. in Genome Inform 8:25-34, 28), and the other based on maximum parsimony phylogeny inference. We further show how to extend these heuristics to obtain solutions to the DSMT problem, which models large-scale duplication events. RESULTS Experimental results from both simulated and real tumor data show that our methods outperform previous heuristics (Chowdhury et al. in Bioinformatics 29(13):189-98, 23; Chowdhury et al. in PLoS Comput Biol 10(7):1003740, 24) in obtaining solutions to both RSMT and DSMT problems. CONCLUSION The methods introduced here provide more parsimonious phylogenies than earlier ones, and such phylogenies are considered better choices.
|
19
|
Characterizing Local Optima for Maximum Parsimony. Bull Math Biol 2016; 78:1058-75. [PMID: 27234257 DOI: 10.1007/s11538-016-0174-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/30/2014] [Accepted: 05/04/2016] [Indexed: 10/21/2022]
Abstract
Finding the best phylogenetic tree under the maximum parsimony optimality criterion is computationally difficult. We quantify the occurrence of local optima for well-behaved sets of data. When nearest neighbor interchange operations are used, multiple local optima can occur even for "perfect" sequence data, which results in hill-climbing searches that never reach a global optimum. In contrast, we show that when neighbors are defined via the subtree prune and regraft metric, there is a single local optimum for perfect sequence data, and thus, every such search finds a global optimum quickly. We further characterize conditions for which sequences simulated under the Cavender-Farris-Neyman and Jukes-Cantor models of evolution yield well-behaved search spaces.
|
20
|
Inferring Phylogenetic Relationships of Indian Citron (Citrus medica L.) based on rbcL and matK Sequences of Chloroplast DNA. Biochem Genet 2016; 54:249-269. [PMID: 26956119 DOI: 10.1007/s10528-016-9716-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 12/29/2015] [Accepted: 01/28/2016] [Indexed: 10/22/2022]
Abstract
Phylogenetic relationships of Indian Citron (Citrus medica L.) with other important Citrus species have been inferred through sequence analyses of the rbcL and matK gene regions of chloroplast DNA. The study was based on 23 accessions of Citrus genotypes representing 15 taxa of Indian Citrus, collected from wild, semi-wild, and domesticated stocks. The phylogeny was inferred using the maximum parsimony (MP) and neighbor-joining (NJ) methods. Both MP and NJ trees separated all 23 accessions of Citrus into five distinct clusters. The chloroplast DNA (cpDNA) analysis based on rbcL and matK sequence data carried out in Indian taxa of Citrus was useful in differentiating all the true species and species/varieties of probable hybrid origin into distinct clusters or groups. Sequence analysis based on the rbcL and matK genes provided unambiguous identification and disposition of true species like C. maxima, C. medica, C. reticulata, and related hybrids/cultivars. The separation of C. maxima, C. medica, and C. reticulata into distinct clusters or sub-clusters supports their distinctiveness as the basic species of edible Citrus. However, the cpDNA sequence analysis of the rbcL and matK genes could not find any clear-cut differentiation between subgenera Citrus and Papeda as proposed in Swingle's system of classification.
|
21
|
Genotype, antifungal susceptibility, and biofilm formation of Trichosporon asahii isolated from the urine of hospitalized patients. Rev Argent Microbiol 2016; 48:62-6. [PMID: 26916812 DOI: 10.1016/j.ram.2015.11.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/02/2015] [Revised: 11/05/2015] [Accepted: 11/11/2015] [Indexed: 10/22/2022] Open
Abstract
In this study, culture analysis of urine samples from patients hospitalized in the Central-West region of Brazil was performed, and the isolated microorganisms were phylogenetically identified as Trichosporon asahii. Maximum parsimony analysis of the IGS1 sequences revealed three novel genotypes that have not previously been described. The minimum inhibitory concentrations of the nine isolates identified were in the range of 0.06-1 μg/ml for amphotericin B, 0.25-4 μg/ml for fluconazole, and 0.03-0.06 μg/ml for itraconazole. Approximately 6/9 of the T. asahii isolates could form biofilms on the surface of polystyrene microplates. This study reports that the microorganisms isolated here as T. asahii are agents of nosocomial urinary tract infections. Furthermore, the IGS1 region can be considered a new epidemiological tool for genotyping T. asahii isolates. The least common genotypes reported in this study can be related to local epidemiological trends.
|
22
|
Genetic diversity and antimicrobial activity of endophytic Myrothecium spp. isolated from Calophyllum apetalum and Garcinia morella. Mol Biol Rep 2015; 42:1533-43. [PMID: 26409457 DOI: 10.1007/s11033-015-3884-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/04/2015] [Accepted: 06/06/2015] [Indexed: 02/02/2023]
Abstract
Calophyllum apetalum and Garcinia morella are medicinal plants endemic to the Western Ghats, Karnataka, India. Sixteen Myrothecium isolates were obtained from the tissues of bark and twigs of these plants. The purpose of this study was to explore the antimicrobial activity and genetic variability of the endophytic Myrothecium isolates. The antimicrobial activity as well as the genetic diversity of endophytic Myrothecium species was investigated through RAPD, ISSR and ITS sequence analysis. Myrothecium isolates were genotypically compared by RAPD and ISSR techniques; 510 and 189 reproducible polymorphic bands were obtained using 20 RAPD and ten ISSR primers, respectively. The isolates grouped into four main clades and subgroups using unweighted pair group method with arithmetic mean cluster analysis. rDNA ITS sequence analysis presented better resolution for characterising the isolates of Myrothecium spp. The clustering patterns of the isolates were almost similar when compared with RAPD and ISSR dendrograms. The results signify that RAPD, ISSR and ITS analysis can be employed to distinguish the genetic diversity of the Myrothecium species. The endophytic and pathogenic strains were compared by maximum parsimony, maximum likelihood and neighbour joining methods. One isolate (JX862206) amongst the 16 Myrothecium isolates exhibited potent antibacterial as well as anti-Candida activity.
|
23
|
The most parsimonious tree for random data. Mol Phylogenet Evol 2014; 80:165-8. [PMID: 25079136 DOI: 10.1016/j.ympev.2014.07.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/02/2014] [Revised: 07/18/2014] [Accepted: 07/19/2014] [Indexed: 11/28/2022]
Abstract
Applying a method to reconstruct a phylogenetic tree from random data provides a way to detect whether that method has an inherent bias towards certain tree 'shapes'. For maximum parsimony, applied to a sequence of random 2-state data, each possible binary phylogenetic tree has exactly the same distribution for its parsimony score. Despite this pleasing and slightly surprising symmetry, some binary phylogenetic trees are more likely than others to be a most parsimonious (MP) tree for a sequence of k such characters, as we show. For k=2, and unrooted binary trees on six taxa, any tree with a caterpillar shape has a higher chance of being an MP tree than any tree with a symmetric shape. On the other hand, if we take any two binary trees, on any number of taxa, we prove that this bias between the two trees vanishes as the number of characters k grows. However, again there is a twist: MP trees on six taxa for k=2 random binary characters are more likely to have certain shapes than a uniform distribution on binary phylogenetic trees predicts. Moreover, this shape bias appears, from simulations, to be more pronounced for larger values of k.
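The symmetry claim for single characters can be checked by brute force on six taxa. The sketch below scores every 2-state character on a caterpillar tree and on a symmetric tree and compares the two score distributions. It assumes Fitch's algorithm for scoring, which the abstract does not name, and writes each unrooted shape as one arbitrary rooted nested tuple (the Fitch score does not depend on where the root is placed); all names here are illustrative.

```python
from collections import Counter
from itertools import product

def fitch_score(tree, states):
    """Parsimony score of one character on a rooted binary tree (Fitch's algorithm).

    tree: nested 2-tuples with leaf names (strings) at the tips.
    states: dict mapping each leaf name to its character state.
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            # Children agree on at least one state: no extra change needed.
            return common, left_cost + right_cost
        # Children disagree: take the union and count one change.
        return left_set | right_set, left_cost + right_cost + 1
    return visit(tree)[1]

taxa = "abcdef"
caterpillar = ((((("a", "b"), "c"), "d"), "e"), "f")
symmetric = (("a", "b"), (("c", "d"), ("e", "f")))

def score_distribution(tree):
    """Count how many of the 2**6 binary characters give each parsimony score."""
    counts = Counter()
    for values in product((0, 1), repeat=len(taxa)):
        counts[fitch_score(tree, dict(zip(taxa, values)))] += 1
    return counts

print(score_distribution(caterpillar) == score_distribution(symmetric))
```

Equal distributions for one character do not contradict the shape bias described above: that bias concerns which tree minimizes the summed score over k characters, which depends on how scores co-vary across trees rather than on each tree's marginal distribution.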
|
24
|
Genetic diversity and phylogenetic analysis of Citrus (L) from north-east India as revealed by meiosis, and molecular analysis of internal transcribed spacer region of rDNA. Meta Gene 2014; 2:237-51. [PMID: 25606407 PMCID: PMC4287869 DOI: 10.1016/j.mgene.2014.01.008] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 11/06/2013] [Revised: 01/25/2014] [Accepted: 01/28/2014] [Indexed: 11/25/2022] Open
Abstract
The north-eastern region of India is reported to be the center of origin and rich in diversity of Citrus (L.) species, where some wild and endangered species, namely Citrus indica, Citrus macroptera, Citrus latipes, Citrus ichagensis and Citrus assamensis, exist in their natural and undisturbed habitat. In order to have comprehensive information about the extent of genetic variability and the occurrence of cryptic genomic hybridity between and within various Citrus species, a combined approach involving morphological, cytogenetical and molecular approaches was adopted in the present study. Cytogenetic approaches are known to resolve taxonomic riddles in a more efficient manner, by clearly delineating taxa at species and sub-species levels. Male meiotic studies revealed a gametic chromosome number of n = 9, without any evidence of numerical variations. Bivalents outnumbered all other types of associations in pollen mother cells (PMCs) analyzed at diplotene, diakinesis and metaphase I. Univalents were frequently encountered in nine species presently studied, though their presence appropriately did not influence the distributional pattern of the chromosomes at anaphases I and II. The molecular approaches for phylogenetic analysis based on sequence data related to ITS 1, ITS 2 and ITS 1 + 5.8S + ITS 2 of rDNA using the maximum parsimony method and Bayesian inference have thrown light on species inter-relationship and evolution of Citrus species, confirming our cytogenetical interpretations. The three true basic species, i.e. Citrus medica, Citrus maxima and Citrus reticulata, with their unique status have been resolved into distinct clades with molecular approaches as well. C. indica, which occupies a unique position in the phylogenetic ladder of the genus Citrus, has been resolved as a distinct clade and almost behaves as an out-group. The presence of quadrivalents in C. indica also echoes and supports its unique position. From our study it is amply clear that C. reticulata also has a close relation to C. ichagensis, as these species have clustered together, denoting their close genetic relationship. On the other hand, our studies did not demonstrate a clear differentiation between subgenera Citrus and Papeda at the rDNA level. The combined approach of cytogenetical and molecular analysis did complement our early karyological findings and helped in resolving many taxonomic riddles.
|
25
|
[A bird's eye view of the algorithms and software packages for reconstructing phylogenetic trees]. DONG WU XUE YAN JIU = ZOOLOGICAL RESEARCH 2013; 34:640-650. [PMID: 24415699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
The prototype phylogenetic tree, i.e., evolutionary "tree" or "tree of life", was first conceived by Charles Darwin in his seminal book "The Origin of Species", and its reconstruction has been approached by generations of biologists ever since. In this article, we briefly review the major algorithms and software packages for reconstructing phylogenetic trees. Specifically, we discuss four categories of phylogeny algorithms, namely distance-matrix, maximum parsimony, maximum likelihood, and Bayesian methods, as well as software packages (PHYLIP, MEGA, MrBayes) based on them.
|
26
|
Hide and seek: placing and finding an optimal tree for thousands of homoplasy-rich sequences. Mol Phylogenet Evol 2013; 69:1186-9. [PMID: 23939134 DOI: 10.1016/j.ympev.2013.08.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 06/08/2013] [Revised: 07/30/2013] [Accepted: 08/01/2013] [Indexed: 10/26/2022]
Abstract
Finding optimal evolutionary trees from sequence data is typically an intractable problem, and there is usually no way of knowing how close to optimal the best tree from some search truly is. The problem would seem to be particularly acute when we have many taxa and when the data have high levels of homoplasy, in which the individual characters require many changes to fit on the best tree. However, a recent mathematical result has provided a precise tool to generate a small number of high-homoplasy characters for any given tree, so that this tree is provably the optimal tree under the maximum parsimony criterion. This provides, for the first time, a rigorous way to test tree search algorithms on homoplasy-rich data, where we know in advance what the 'best' tree is. In this short note we consider just one search program (TNT) but show that it is able to locate the globally optimal tree correctly for 32,768 taxa, even though the characters in the dataset require, on average, 1148 state-changes each to fit on this tree, and the number of characters is only 57.
|
27
|
PTree: pattern-based, stochastic search for maximum parsimony phylogenies. PeerJ 2013; 1:e89. [PMID: 23825794 PMCID: PMC3698465 DOI: 10.7717/peerj.89] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/30/2012] [Accepted: 05/28/2013] [Indexed: 11/24/2022] Open
Abstract
Phylogenetic reconstruction is vital to analyzing the evolutionary relationship of genes within and across populations of different species. Nowadays, with next generation sequencing technologies producing sets comprising thousands of sequences, robust identification of the tree topology, which is optimal according to standard criteria such as maximum parsimony, maximum likelihood or posterior probability, with phylogenetic inference methods is a computationally very demanding task. Here, we describe a stochastic search method for a maximum parsimony tree, implemented in a software package we named PTree. Our method is based on a new pattern-based technique that enables us to infer intermediate sequences efficiently where the incorporation of these sequences in the current tree topology yields a phylogenetic tree with a lower cost. Evaluation across multiple datasets showed that our method is comparable to the algorithms implemented in PAUP* or TNT, which are widely used by the bioinformatics community, in terms of topological accuracy and runtime. We show that our method can process large-scale datasets of 1,000-8,000 sequences. We believe that our novel pattern-based method enriches the current set of tools and methods for phylogenetic tree inference. The software is available under: http://algbio.cs.uni-duesseldorf.de/webapps/wa-download/.
|
28
|
Secondary structure and phylogenetic utility of the ribosomal large subunit (28S) in monogeneans of the genus Thaparocleidus and Bifurcohaptor (Monogenea: Dactylogyridae). J Parasit Dis 2013; 37:74-83. [PMID: 24431545 PMCID: PMC3590372 DOI: 10.1007/s12639-012-0134-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2011] [Accepted: 06/09/2012] [Indexed: 11/30/2022] Open
Abstract
The present communication deals with the secondary structure of 28S rDNA of two already known species of monogeneans, viz. Bifurcohaptor indicus and Thaparocleidus parvulus, parasitizing gill filaments of a freshwater fish, Mystus vittatus, for phylogenetic inference. Secondary structure data are best used as accessory taxonomic characters as their phylogenetic resolving power and confidence in validity. Secondary structure of the 28S rDNA transcript could provide information for identifying homologous nucleotide characters, useful for cladistic inference of relationships. Such structure data could be used as a taxonomic character. The study supports that species-level sequence variability renders the 28S sequence a unique window for examining the behavior of fast-evolving, non-coding DNA sequences. Apart from this, it also confirms that molecular similarity present in various species could be host-induced.
|