2
Estimating trees from filtered data: identifiability of models for morphological phylogenetics. J Theor Biol 2009; 263:108-19. [PMID: 20004210 DOI: 10.1016/j.jtbi.2009.12.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 05/28/2009] [Revised: 12/01/2009] [Accepted: 12/01/2009] [Indexed: 11/23/2022]
Abstract
As an alternative to parsimony analyses, stochastic models have been proposed (Lewis, 2001; Nylander et al., 2004) for morphological characters, so that maximum likelihood or Bayesian analyses may be used for phylogenetic inference. A key feature of these models is that they account for ascertainment bias, in that only varying, or parsimony-informative characters are observed. However, statistical consistency of such model-based inference requires that the model parameters be identifiable from the joint distribution they entail, and this issue has not been addressed. Here we prove that parameters for several such models, with finite state spaces of arbitrary size, are identifiable, provided the tree has at least eight leaves. If the tree topology is already known, then seven leaves suffice for identifiability of the numerical parameters. The method of proof involves first inferring a full distribution of both parsimony-informative and non-informative pattern joint probabilities from the parsimony-informative ones, using phylogenetic invariants. The failure of identifiability of the tree parameter for four-taxon trees is also investigated.
3
Attwood SW, Johnston DA. Nucleotide sequence differences reveal genetic variation in Neotricula aperta (Gastropoda: Pomatiopsidae), the snail host of schistosomiasis in the lower Mekong Basin. Biol J Linn Soc Lond 2008. [DOI: 10.1111/j.1095-8312.2001.tb01344.x] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
4
Abstract
There are many examples of groups (such as birds, bees, mammals, multicellular animals, and flowering plants) that have undergone a rapid radiation. In such cases, where there is a combination of short internal and long external branches, correctly estimating and rooting phylogenetic trees is known to be a difficult problem. In this simulation study, we tested the performances of different phylogenetic methods at estimating a tree that models a rapid radiation. We found that maximum likelihood, corrected and uncorrected neighbor-joining, and corrected and uncorrected parsimony, all suffer from biases toward specific tree topologies. In addition, we found that using a single-taxon outgroup to root a tree frequently disrupts an otherwise correct ingroup phylogeny. Moreover, for uncorrected parsimony, we found cases where several individual trees (in which the outgroup was placed incorrectly) were selected more frequently than the correct tree. Even for parameter settings where the correct tree was selected most frequently when using extremely long sequences, for sequences of up to 60,000 nucleotides the incorrectly rooted trees were each selected more frequently than the correct tree. For all the cases tested here, tree estimation using a two-taxon outgroup was more accurate than when using a single-taxon outgroup. However, the ingroup was most accurately recovered when no outgroup was used.
Affiliation(s)
- Liat Shavit
- The Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand.
5
Grant T, Kluge AG. Data exploration in phylogenetic inference: scientific, heuristic, or neither. Cladistics 2005; 19:379-418. [DOI: 10.1111/j.1096-0031.2003.tb00311.x] [Citation(s) in RCA: 121] [Impact Index Per Article: 6.4] [Indexed: 11/30/2022] Open
7
Faivovich J, Haddad CF, Garcia PC, Frost DR, Campbell JA, Wheeler WC. Systematic review of the frog family Hylidae, with special reference to Hylinae: phylogenetic analysis and taxonomic revision. Bulletin of the American Museum of Natural History 2005. [DOI: 10.1206/0003-0090(2005)294[0001:srotff]2.0.co;2] [Citation(s) in RCA: 466] [Impact Index Per Article: 24.5] [Indexed: 12/01/2022]
8
Green MD, Veller MG, Brooks DR. Assessing Modes of Speciation: Range Asymmetry and Biogeographical Congruence. Cladistics 2002. [DOI: 10.1111/j.1096-0031.2002.tb00143.x] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.9] [Indexed: 11/30/2022] Open
9
Breitling R, Laubner D, Adamski J. Structure-based phylogenetic analysis of short-chain alcohol dehydrogenases and reclassification of the 17beta-hydroxysteroid dehydrogenase family. Mol Biol Evol 2001; 18:2154-61. [PMID: 11719564 DOI: 10.1093/oxfordjournals.molbev.a003761] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022] Open
Abstract
Short-chain alcohol dehydrogenases (SCAD) constitute a large and diverse family of ancient origin. Several of its members play an important role in human physiology and disease, especially in the metabolism of steroid substrates (e.g., prostaglandins, estrogens, androgens, and corticosteroids). Their involvement in common human disorders such as endocrine-related cancer, osteoporosis, and Alzheimer disease makes them an important candidate for drug targets. Recent phylogenetic analysis of SCAD is incomplete and does not allow any conclusions on very ancient divergences or on a functional characterization of novel proteins within this complex family. We have developed a 3D structure-based approach to establish the deep-branching pattern within the SCAD family. In this approach, pairwise superpositions of X-ray structures were used to calculate similarity scores as an input for a tree-building algorithm. The resulting phylogeny was validated by comparison with the results of sequence-based algorithms and biochemical data. It was possible to use the 3D data as a template for the reliable determination of the phylogenetic position of novel proteins as a first step toward functional predictions. We were able to discern new patterns in the phylogenetic relationships of the SCAD family, including a basal dichotomy of the 17beta-hydroxysteroid dehydrogenases (17beta-HSDs). These data provide an important contribution toward the development of type-specific inhibitors for 17beta-HSDs for the treatment and prevention of disease. Our structure-based phylogenetic approach can also be applied to increase the reliability of evolutionary reconstructions in other large protein families.
Affiliation(s)
- R Breitling
- Institute of Experimental Genetics, Genome Analysis Center, GSF-National Research Center for Environment and Health, Neuherberg, Germany
10
Sun L, Gurnon JR, Adams BJ, Graves MV, Van Etten JL. Characterization of a beta-1,3-glucanase encoded by chlorella virus PBCV-1. Virology 2000; 276:27-36. [PMID: 11021991 DOI: 10.1006/viro.2000.0500] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.1] [Indexed: 11/22/2022]
Abstract
Sequence analysis of the 330-kb chlorella virus PBCV-1 genome revealed an open-reading frame, A94L, that encodes a protein with significant amino acid identity to Glycoside Hydrolase Family 16 beta-1,3-glucanases. The a94l gene was cloned and the protein was expressed as a GST-A94L fusion protein in Escherichia coli. The recombinant A94L protein hydrolyzed the beta-1,3-glucose polymer laminarin and had slightly less hydrolytic activity on the beta-1,3-1,4-glucose polymers lichenan and barley beta-glucan. The recombinant enzyme had the highest activity at 65 °C and pH 8. We predicted that the a94l-encoded beta-1,3-glucanase is involved in degrading the host cell wall during virus release and/or is packaged in the virion particle and involved in virus entry. Therefore, we expected a94l to be expressed late in virus infection. However, contrary to expectations, both the a94l mRNA and the A94L protein appeared 15 min after PBCV-1 infection and disappeared 60 and 120 min postinfection, respectively, indicating that a94l is an early gene. Twenty-seven of 42 chlorella viruses contained the a94l gene. To our knowledge, this is the first report of a virus-encoded beta-1,3-glucanase.
Affiliation(s)
- L Sun
- Department of Plant Pathology, University of Nebraska, Lincoln, Nebraska, 68583-0722, USA
11
Abstract
Methods such as maximum parsimony (MP) are frequently criticized as being statistically unsound and not being based on any "model." On the other hand, advocates of MP claim that maximum likelihood (ML) has some fundamental problems. Here, we explore the connection between the different versions of MP and ML methods, particularly in light of recent theoretical results. We describe links between the two methods; for example, we describe how MP can be regarded as an ML method when there is no common mechanism between sites (such as might occur with morphological data and certain forms of molecular data). In the process, we clarify certain historical points of disagreement between proponents of the two methodologies, including a discussion of several forms of the ML optimality criterion. We also describe some additional results that shed light on how much needs to be assumed about underlying models of sequence evolution in order to successfully reconstruct evolutionary trees.
Affiliation(s)
- M Steel
- Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand.
12
Marshall HD, Baker AJ. Colonization history of Atlantic island common chaffinches (Fringilla coelebs) revealed by mitochondrial DNA. Mol Phylogenet Evol 1999; 11:201-12. [PMID: 10191065 DOI: 10.1006/mpev.1998.0552] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.0] [Indexed: 11/22/2022]
Abstract
Common chaffinches (Fringilla coelebs) are thought to have colonized the Atlantic island archipelagoes (the Azores, Madeira, and the Canaries) from neighboring continental populations (Iberia and north Africa) within the last million years. However, colonization may have occurred separately from north Africa to the Canaries and from Iberia to the Azores (as would be predicted geographically) or in one wave from Iberia to the Azores and then to Madeira and the Canaries. These alternatives have different implications for the evolution of morphometric and plumage differentiation in island chaffinches. To determine the most likely colonization route, we estimated the phylogenetic relationships among island and continental subspecies of common chaffinch using sequences from four mtDNA genes (cytochrome b, ATPase 6, NADH 5, and the control region). The most strongly supported mtDNA phylogeny places the continental subspecies together as the sister group to a monophyletic clade containing the island subspecies. This is consistent with a single wave of colonization, and suggests that patterns of similarity among Atlantic island common chaffinches, such as blue pigmentation, short wings, and long tarsi, are due to common colonization history rather than to convergent evolution in a common island environment. However, spectral analysis of phylogenetic splits showed that although monophyly of island haplotypes is favored, there is also substantial support for their polyphyletic origin. We attribute the latter to the confounding effect of homoplasy at multistate sites and to the relatively rapid sequence of colonization events which provided insufficient time for the accumulation of strong phylogenetic signal. These problems are likely to be significant impediments in attempts to test hypotheses of phylogenetic histories of recently evolved populations and taxa.
Affiliation(s)
- H D Marshall
- Department of Zoology, Royal Ontario Museum, 100 Queen's Park, Toronto, M5S 2C6, Canada
13
Waddell PJ, Penny D, Moore T. Hadamard conjugations and modeling sequence evolution with unequal rates across sites. Mol Phylogenet Evol 1997; 8:33-50. [PMID: 9242594 DOI: 10.1006/mpev.1997.0405] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.1] [Indexed: 02/04/2023]
Abstract
This paper considers the many different distributions that may approximate the distribution of site rates in DNA sequences and shows how the Hadamard conjugation may be modified to take these into account. This is done for both 2-state and 4-state data. Distributions which give simple closed forms include the gamma (Γ) distribution, the inverse Gaussian distribution (which is similar to the lognormal), and a mixture of either of these with a proportion of sites which cannot change (invariant sites). It is seen that the tail of a distribution can have major effects upon the coefficient of variation of site rates. Because the Hadamard conjugation can be used to either correct data or predict the data given the model (i.e., the likelihood of site patterns), light is shed on properties of maximum likelihood tree selection with unequal site rates. Analysis of rRNA shows how unequal rates across sites can change the optimal tree. Maximum likelihood analysis also shows that distinct distributions fit each data set, with the gamma often not being the best. Analyzing both these data and a long stretch of primate mtDNA reveals evidence of many "hidden" multiple substitutions, while signals not corresponding to the preferred biological tree generally decrease when unequal rates are allowed for. Last, we discuss the expected behavior of sequences evolving by models where stabilizing selection alone explains unequal site rates. Such models do not explain "synapomorphies" or informative changes in ancient molecules, because while stabilizing selection can vastly decrease change at a site, it will also vastly accelerate back-substitution (leaving only a covarion model to explain old synapomorphies). When and why models allowing a continuous distribution of site rates (e.g., gamma) will approximate covarion evolution requires further study.
Affiliation(s)
- P J Waddell
- Department of Plant Biology and Biotechnology, School of Biological Sciences, Massey University, Palmerston North, New Zealand
14
Abstract
Cladistic analysis is an approach to phylogeny reconstruction that groups taxa in such a way that those with historically more-recent ancestors form groups nested within groups of taxa with more-distant ancestors. This nested set of taxa can be represented as a branching diagram or tree (a cladogram), which is an hypothesis of the evolutionary history of the taxa. The analysis is performed by searching for nested groups of shared derived character states. These shared derived character states define monophyletic groups of taxa (clades), which include all of the descendants of the most recent common ancestor. If all of the characters for a set of taxa are congruent, then reconstructing the phylogenetic tree is unproblematic. However, most real data sets contain incongruent characters, and consequently a wide range of tree-building methods has been developed. These methods differ in a variety of characteristics, and they may produce topologically distinct trees for a single data set. None of the currently-available methods are simultaneously efficient, powerful, consistent and robust, and thus there is no single ideal method. However, many of them appear to perform well under a wide range of conditions, with the exception of the UPGMA method and the Invariants method.
Affiliation(s)
- D A Morrison
- Molecular Parasitology Unit, University of Technology Sydney, Gore Hill, NSW, Australia.
15
Takezaki N, Nei M. Inconsistency of the maximum parsimony method when the rate of nucleotide substitution is constant. J Mol Evol 1994; 39:210-8. [PMID: 7932784 DOI: 10.1007/bf00163810] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.4] [Indexed: 01/27/2023]
Abstract
The inconsistency of the maximum parsimony method is known to occur even when the rate of nucleotide substitution is constant. To understand why this inconsistency occurs, a mathematical study was conducted for the cases of five, six, and seven sequences. The results obtained indicate that this inconsistency occurs because the probability of occurrence of nucleotide configurations generated by one substitution on a short interior branch is often lower than that of configurations generated by more substitutions on other longer branches. The chance of occurrence of this event (that is, the inconsistency of the maximum parsimony method) apparently increases as the number of sequences increases. The inconsistency may occur even when the extent of sequence divergence is relatively small.
Affiliation(s)
- N Takezaki
- Institute of Molecular Evolutionary Genetics, Pennsylvania State University, University Park 16802
16
Charleston MA, Hendy MD, Penny D. The effects of sequence length, tree topology, and number of taxa on the performance of phylogenetic methods. J Comput Biol 1994; 1:133-51. [PMID: 8790460 DOI: 10.1089/cmb.1994.1.133] [Citation(s) in RCA: 26] [Impact Index Per Article: 0.9] [Indexed: 02/02/2023] Open
Abstract
Simulations were used to study the performance of several character-based and distance-based phylogenetic methods in obtaining the correct tree from pseudo-randomly generated input data. The study included all the topologies of unrooted binary trees with from 4 to 10 pendant vertices (taxa) inclusive. The length of the character sequences used ranged from 10 to 10^5 characters, increasing exponentially. The methods studied include Closest Tree, Compatibility, Li's method, Maximum Parsimony, Neighbor-joining, Neighborliness, and UPGMA. We also provide a modification to Li's method (SimpLi) which is consistent with additive data. We give estimations of the sequence lengths required for given confidence in the output of these methods under the assumptions of molecular evolution used in this study. A notation for characterizing all tree topologies is described. We show that when the number of taxa, the maximum path length, and the minimum edge length are held constant, there is little but significant dependence of the performance of the methods on the tree topology. We show that those methods that are consistent with the model used perform similarly, whereas the inconsistent methods, UPGMA and Li's method, perform very poorly.
Affiliation(s)
- M A Charleston
- Department of Mathematics, Massey University, Palmerston North, New Zealand