1. Czech L, Stamatakis A, Dunthorn M, Barbera P. Metagenomic Analysis Using Phylogenetic Placement-A Review of the First Decade. Front Bioinform 2022; 2:871393. [PMID: 36304302 PMCID: PMC9580882 DOI: 10.3389/fbinf.2022.871393] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/08/2022] [Accepted: 04/11/2022] [Indexed: 12/20/2022] [Open access]
Abstract
Phylogenetic placement refers to a family of tools and methods to analyze, visualize, and interpret the tsunami of metagenomic sequencing data generated by high-throughput sequencing. Compared to alternative (e.g., similarity-based) methods, it puts metabarcoding sequences into a phylogenetic context using a set of known reference sequences and taking evolutionary history into account. Thereby, one can increase the accuracy of metagenomic surveys and eliminate the requirement for having exact or close matches with existing sequence databases. Phylogenetic placement constitutes a valuable analysis tool per se, but also entails a plethora of downstream tools to interpret its results. A common use case is to analyze species communities obtained from metagenomic sequencing, for example via taxonomic assignment, diversity quantification, sample comparison, and identification of correlations with environmental variables. In this review, we provide an overview of the methods developed during the first 10 years. In particular, the goals of this review are 1) to motivate the usage of phylogenetic placement and illustrate some of its use cases, 2) to outline the full workflow, from raw sequences to publishable figures, including best practices, 3) to introduce the most common tools and methods and their capabilities, 4) to point out common placement pitfalls and misconceptions, and 5) to showcase typical placement-based analyses and how they can help to analyze, visualize, and interpret phylogenetic placement data.
Affiliation(s)
- Lucas Czech
  - Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, United States
- Alexandros Stamatakis
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Micah Dunthorn
  - Natural History Museum, University of Oslo, Oslo, Norway
2. Barbera P, Czech L, Lutteropp S, Stamatakis A. SCRAPP: A tool to assess the diversity of microbial samples from phylogenetic placements. Mol Ecol Resour 2020; 21:340-349. [PMID: 32996237 PMCID: PMC7756409 DOI: 10.1111/1755-0998.13255] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/05/2020] [Revised: 07/24/2020] [Accepted: 08/25/2020] [Indexed: 12/04/2022]
Abstract
Microbial ecology research is currently driven by the continuously decreasing cost of DNA sequencing and the improving accuracy of data analysis methods. One such analysis method is phylogenetic placement, which establishes the phylogenetic identity of the anonymous environmental sequences in a sample by means of a given phylogenetic reference tree. However, assessing the diversity of a sample remains challenging, as traditional methods do not scale well with the increasing data volumes and/or do not leverage the phylogenetic placement information. Here, we present scrapp, a highly parallel and scalable tool that uses a molecular species delimitation algorithm to quantify the diversity distribution over the reference phylogeny for a given phylogenetic placement of the sample. scrapp employs a novel approach to cluster phylogenetic placements, called placement space clustering, to efficiently perform dimensionality reduction, so as to scale to large data volumes. Furthermore, it uses the phylogeny-aware molecular species delimitation method mPTP to quantify diversity. We evaluated scrapp using both simulated and empirical data sets. We use simulated data to verify our approach. Tests on an empirical data set show that scrapp-derived metrics can classify samples by their diversity-correlated features equally well or better than existing, commonly used approaches. scrapp is available at https://github.com/pbdas/scrapp.
Affiliation(s)
- Pierre Barbera
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Lucas Czech
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Sarah Lutteropp
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Alexandros Stamatakis
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
3. Louca S, Mazel F, Doebeli M, Parfrey LW. A census-based estimate of Earth's bacterial and archaeal diversity. PLoS Biol 2019; 17:e3000106. [PMID: 30716065 PMCID: PMC6361415 DOI: 10.1371/journal.pbio.3000106] [Citation(s) in RCA: 93] [Impact Index Per Article: 18.6] [Received: 10/11/2018] [Accepted: 12/21/2018] [Indexed: 12/17/2022] [Open access]
Abstract
The global diversity of Bacteria and Archaea, the most ancient and most widespread forms of life on Earth, is a subject of intense controversy. This controversy stems largely from the fact that existing estimates are entirely based on theoretical models or extrapolations from small and biased data sets. Here, in an attempt to census the bulk of Earth's bacterial and archaeal ("prokaryotic") clades and to estimate their overall global richness, we analyzed over 1.7 billion 16S ribosomal RNA amplicon sequences in the V4 hypervariable region obtained from 492 studies worldwide, covering a multitude of environments and using multiple alternative primers. From this data set, we recovered 739,880 prokaryotic operational taxonomic units (OTUs, 16S-V4 gene clusters at 97% similarity), a commonly used measure of microbial richness. Using several statistical approaches, we estimate that there exist globally about 0.8–1.6 million prokaryotic OTUs, of which we recovered somewhere between 47%–96%, representing >99.98% of prokaryotic cells. Consistent with this conclusion, our data set independently "recaptured" 91%–93% of 16S sequences from multiple previous global surveys, including PCR-independent metagenomic surveys. The distribution of relative OTU abundances is consistent with a log-normal model commonly observed in larger organisms; the total number of OTUs predicted by this model is also consistent with our global richness estimates. By combining our estimates with the ratio of full-length versus partial-length (V4) sequence diversity in the SILVA sequence database, we further estimate that there exist about 2.2–4.3 million full-length OTUs worldwide. When restricting our analysis to the Americas, while controlling for the number of studies, we obtain similar richness estimates as for the global data set, suggesting that most OTUs are globally distributed. Qualitatively similar results are also obtained for other 16S similarity thresholds (90%, 95%, and 99%). Our estimates constrain the extent of a poorly quantified rare microbial biosphere and refute recent predictions that there exist trillions of prokaryotic OTUs.
A massive survey of Earth's Bacteria and Archaea reveals that their diversity is orders of magnitude lower than previously thought. The study also indicates that extinctions played an important role in prokaryotic evolution.
The global diversity of Bacteria and Archaea ("prokaryotes"), the most ancient and most widespread forms of life on Earth, is subject to high uncertainty. Here, to estimate the global diversity of prokaryotes, we analyzed a large number of 16S ribosomal RNA gene sequences, found in all prokaryotes and commonly used to catalogue prokaryotic diversity. Sequences were obtained from a multitude of environments across thousands of geographic locations worldwide. From this data set, we recovered 739,880 prokaryotic operational taxonomic units (OTUs), i.e., 16S gene clusters sharing 97% similarity, roughly corresponding to prokaryotic species. Using several statistical approaches and through comparison with existing databases and previous independent surveys, we estimate that there exist globally between 0.8 and 1.6 million prokaryotic OTUs. When restricting our analysis to the Americas, while controlling for the number of studies, we obtain similar estimates as for the global data set, suggesting that most OTUs are not restricted to a single continent but are instead globally distributed. Our estimates constrain the extent of a commonly hypothesized but poorly quantified rare prokaryotic biosphere and refute recent predictions that there exist trillions of prokaryotic OTUs. Our findings also indicate that, contrary to common speculation, extinctions may strongly influence global prokaryotic diversity.
Affiliation(s)
- Stilianos Louca
  - Department of Biology, University of Oregon, Eugene, Oregon, United States of America
  - Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, United States of America
  - Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
  - Department of Zoology, University of British Columbia, Vancouver, Canada
- Florent Mazel
  - Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
  - Department of Botany, University of British Columbia, Vancouver, Canada
- Michael Doebeli
  - Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
  - Department of Zoology, University of British Columbia, Vancouver, Canada
  - Department of Mathematics, University of British Columbia, Vancouver, Canada
- Laura Wegener Parfrey
  - Biodiversity Research Centre, University of British Columbia, Vancouver, Canada
  - Department of Zoology, University of British Columbia, Vancouver, Canada
  - Department of Botany, University of British Columbia, Vancouver, Canada
4. Santos L, Alves A, Alves R. Evaluating multi-locus phylogenies for species boundaries determination in the genus Diaporthe. PeerJ 2017; 5:e3120. [PMID: 28367371 PMCID: PMC5372842 DOI: 10.7717/peerj.3120] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 12/23/2016] [Accepted: 02/24/2017] [Indexed: 11/30/2022] [Open access]
Abstract
BACKGROUND: Species identification is essential for controlling disease, understanding epidemiology, and guiding the implementation of phytosanitary measures against fungi from the genus Diaporthe. Accurate Diaporthe species separation requires multi-locus phylogenies. However, defining the optimal set of loci for species identification is still an open problem.
METHODS: Here we addressed that problem by identifying five loci that have been sequenced in 142 Diaporthe isolates representing 96 species: TEF1, TUB, CAL, HIS and ITS. We then used every possible combination of those loci to build, analyse, and compare phylogenetic trees.
RESULTS: As expected, species separation is better when all five loci are simultaneously used to build the phylogeny of the isolates. However, removing the ITS locus has little effect on reconstructed phylogenies, identifying the TEF1-TUB-CAL-HIS 4-loci tree as almost equivalent to the 5-loci tree. We further identify the best 3-loci, 2-loci, and 1-locus trees that should be used for species separation in the genus.
DISCUSSION: Our results question the current use of the ITS locus for DNA barcoding in the genus Diaporthe and suggest that TEF1 might be a better choice if single-locus barcoding needs to be done.
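Using "every possible combination" of five loci means 2^5 - 1 = 31 non-empty subsets. The following is a minimal sketch of that enumeration. Only the locus names come from the abstract; the helper name `locus_combinations` and the grouping by subset size are illustrative assumptions, and the tree building performed for each subset is not shown.

```python
from itertools import combinations

# Locus names as listed in the abstract.
LOCI = ["TEF1", "TUB", "CAL", "HIS", "ITS"]

def locus_combinations(loci):
    """Return every non-empty subset of loci, keyed by subset size.

    Hypothetical helper for illustration; not taken from the paper.
    """
    return {k: list(combinations(loci, k)) for k in range(1, len(loci) + 1)}

combos = locus_combinations(LOCI)
total = sum(len(subsets) for subsets in combos.values())

# Report how many candidate trees each subset size contributes.
for k, subsets in combos.items():
    print(f"{k}-locus sets: {len(subsets)}")
print("total combinations:", total)
```

Each tuple in `combos[k]` would correspond to one concatenated alignment and one tree, so the 4-locus set without ITS appears as `("TEF1", "TUB", "CAL", "HIS")`.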
Affiliation(s)
- Liliana Santos
  - Departamento de Biologia, CESAM, Universidade de Aveiro, Aveiro, Portugal
- Artur Alves
  - Departamento de Biologia, CESAM, Universidade de Aveiro, Aveiro, Portugal
- Rui Alves
  - Departament de Ciències Mèdiques Bàsiques, Universitat de Lleida and IRBLleida, Lleida, Spain
5. Su Z, Wang Z, López-Giráldez F, Townsend JP. The impact of incorporating molecular evolutionary model into predictions of phylogenetic signal and noise. Front Ecol Evol 2014. [DOI: 10.3389/fevo.2014.00011] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022] [Open access]
6. Zhang J, Mamlouk AM, Martinetz T, Chang S, Wang J, Hilgenfeld R. PhyloMap: an algorithm for visualizing relationships of large sequence data sets and its application to the influenza A virus genome. BMC Bioinformatics 2011; 12:248. [PMID: 21689434 PMCID: PMC3142226 DOI: 10.1186/1471-2105-12-248] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 01/19/2011] [Accepted: 06/20/2011] [Indexed: 11/10/2022] [Open access]
Abstract
Background: Results of phylogenetic analysis are often visualized as phylogenetic trees. Such a tree can typically only include up to a few hundred sequences. When more than a few thousand sequences are to be included, analyzing the phylogenetic relationships among them becomes a challenging task. The recent frequent outbreaks of influenza A viruses have resulted in the rapid accumulation of corresponding genome sequences. Currently, there are more than 7500 influenza A virus genomes in the database. There are no efficient ways of representing this huge data set as a whole, thus preventing a further understanding of the diversity of the influenza A virus genome.
Results: Here we present a new algorithm, "PhyloMap", which combines ordination, vector quantization, and phylogenetic tree construction to give an elegant representation of a large sequence data set. The use of PhyloMap on influenza A virus genome sequences reveals the phylogenetic relationships of the internal genes that cannot be seen when only a subset of sequences are analyzed.
Conclusions: The application of PhyloMap to influenza A virus genome data shows that it is a robust algorithm for analyzing large sequence data sets. It utilizes the entire data set, minimizes bias, and provides intuitive visualization. PhyloMap is implemented in JAVA, and the source code is freely available at http://www.biochem.uni-luebeck.de/public/software/phylomap.html
Affiliation(s)
- Jiajie Zhang
  - Institute of Biochemistry, Center for Structural and Cell Biology in Medicine, University of Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany
7. Berger SA, Krompass D, Stamatakis A. Performance, accuracy, and Web server for evolutionary placement of short sequence reads under maximum likelihood. Syst Biol 2011; 60:291-302. [PMID: 21436105 PMCID: PMC3078422 DOI: 10.1093/sysbio/syr010] [Citation(s) in RCA: 335] [Impact Index Per Article: 25.8] [Received: 03/07/2010] [Revised: 06/08/2010] [Accepted: 01/24/2011] [Indexed: 11/23/2022] [Open access]
Abstract
We present an evolutionary placement algorithm (EPA) and a Web server for the rapid assignment of sequence fragments (short reads) to edges of a given phylogenetic tree under the maximum-likelihood model. The accuracy of the algorithm is evaluated on several real-world data sets and compared with placement by pair-wise sequence comparison, using edit distances and BLAST. We introduce a slow and accurate as well as a fast and less accurate placement algorithm. For the slow algorithm, we develop additional heuristic techniques that yield almost the same run times as the fast version with only a small loss of accuracy. When those additional heuristics are employed, the run time of the more accurate algorithm is comparable with that of a simple BLAST search for data sets with a high number of short query sequences. Moreover, the accuracy of the EPA is significantly higher, in particular when the sample of taxa in the reference topology is sparse or inadequate. Our algorithm, which has been integrated into RAxML, therefore provides an equally fast but more accurate alternative to BLAST for tree-based inference of the evolutionary origin and composition of short sequence reads. We are also actively developing a Web server that offers a freely available service for computing read placements on trees using the EPA.
Affiliation(s)
- Simon A. Berger
  - The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, D-69118 Heidelberg, Germany
- Denis Krompass
  - The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, D-69118 Heidelberg, Germany
- Alexandros Stamatakis
  - The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, D-69118 Heidelberg, Germany
8. Stamatakis A, Göker M, Grimm GW. Maximum Likelihood Analyses of 3,490 rbcL Sequences: Scalability of Comprehensive Inference versus Group-Specific Taxon Sampling. Evol Bioinform Online 2010; 6:73-90. [PMID: 20535232 PMCID: PMC2880847 DOI: 10.4137/ebo.s4528] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/05/2022] [Open access]
Abstract
The constant accumulation of sequence data poses new computational and methodological challenges for phylogenetic inference, since multiple sequence alignments grow in both the horizontal (number of base pairs, phylogenomic alignments) and the vertical (number of taxa) dimension. Setting aside the ongoing controversy about appropriate models, partitioning schemes, and assembly methods for phylogenomic alignments, coupled with the high computational cost of inferring trees from them, for many organismic groups a sufficient number of taxa is often available from only one or a few genes (e.g., rbcL, matK, rDNA). In this paper we address the scalability of Maximum-Likelihood-based phylogeny reconstruction with respect to the number of taxa, using as examples several large nested single-gene rbcL alignments comprising 400 to 3,491 taxa. To test the effect of taxon sampling, we employ an appropriately adapted taxon jackknifing approach. In contrast to standard jackknifing, this taxon subsampling procedure is not conducted entirely at random, but is based on drawing subsamples from empirical taxon groups, which can either be user-defined or determined using taxonomic information from databases. Our results indicate that, despite an unfavorable ratio of number of sequences to number of base pairs, i.e., many relatively short sequences, Maximum Likelihood tree searches and bootstrap analyses scale well on single-gene rbcL alignments with dense taxon sampling of up to several thousand sequences. Moreover, the newly implemented taxon subsampling procedure can be beneficial for inferring higher-level relationships and interpreting bootstrap support from comprehensive analyses.
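The abstract does not spell out how subsamples are drawn from the taxon groups, so the following is only one possible reading, sketched under stated assumptions: a fixed fraction of taxa is kept from every group, with at least one taxon per group. The function name `group_jackknife`, the `keep_fraction` parameter, and the toy group labels are all hypothetical.

```python
import random

def group_jackknife(taxa_by_group, keep_fraction=0.5, seed=None):
    """Draw one taxon subsample group by group (hypothetical sketch).

    taxa_by_group maps a group label to a list of taxon names. A fixed
    fraction of taxa is drawn at random from each group, keeping at least
    one taxon so that no group vanishes from the replicate. This rule is
    an assumption, not the procedure described in the paper.
    """
    rng = random.Random(seed)
    subsample = []
    for group in sorted(taxa_by_group):
        taxa = taxa_by_group[group]
        n_keep = max(1, round(len(taxa) * keep_fraction))
        subsample.extend(rng.sample(taxa, n_keep))
    return subsample

# Toy input; real groups could be user-defined or taken from a taxonomy.
groups = {
    "groupA": ["a1", "a2", "a3", "a4"],
    "groupB": ["b1", "b2"],
    "groupC": ["c1"],
}
replicate = group_jackknife(groups, keep_fraction=0.5, seed=1)
print(replicate)
```

Compared with drawing taxa uniformly from the whole alignment, sampling within groups keeps small groups represented in every replicate, which is the property the abstract contrasts with standard jackknifing.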
Affiliation(s)
- Alexandros Stamatakis
  - The Exelixis Lab, Dept. of Computer Science, Technische Universität München, Germany
- Markus Göker
  - German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
- Guido W. Grimm
  - Department of Palaeobotany, Swedish Museum of Natural History, Stockholm, Sweden
9. Price MN, Dehal PS, Arkin AP. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol Biol Evol 2009; 26:1641-50. [PMID: 19377059 PMCID: PMC2693737 DOI: 10.1093/molbev/msp077] [Citation(s) in RCA: 3137] [Impact Index Per Article: 209.1] [Indexed: 02/01/2023] [Open access]
Abstract
Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement Neighbor-Joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters (a being the alphabet size), a distance matrix requires O(N²) space and O(N²L) time, but FastTree requires just O(NLa + N√N) memory and O(N√N log(N) La) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 h and 2.4 GB of memory. Just computing pairwise Jukes–Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 h and 50 GB of memory. In simulations, FastTree was slightly more accurate than Neighbor-Joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.
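The quadratic memory cost of a distance matrix can be checked against the 50 GB figure with a short calculation. This is a rough sketch under two assumptions that the abstract does not state: only one value per unordered pair of sequences is stored, and each value is a 4-byte float. The function name `distance_matrix_bytes` is illustrative.

```python
def distance_matrix_bytes(n_sequences, bytes_per_value=4):
    """Bytes needed to store one distance per unordered pair of sequences.

    Assumes an upper-triangle layout and 4-byte floats; both are
    illustrative assumptions, not details given in the abstract.
    """
    n_pairs = n_sequences * (n_sequences - 1) // 2
    return n_pairs * bytes_per_value

n = 158_022  # number of distinct 16S rRNAs quoted in the abstract
gb = distance_matrix_bytes(n) / 1e9
print(f"pairwise distances for {n} sequences: {gb:.1f} GB")
```

Under these assumptions the result is just under 50 GB, in line with the abstract, and doubling N would roughly quadruple it.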
Affiliation(s)
- Morgan N Price
  - Physical Biosciences Division, Lawrence Berkeley National Laboratory, CA, USA.
10. Yoon HS, Grant J, Tekle YI, Wu M, Chaon BC, Cole JC, Logsdon JM, Patterson DJ, Bhattacharya D, Katz LA. Broadly sampled multigene trees of eukaryotes. BMC Evol Biol 2008; 8:14. [PMID: 18205932 PMCID: PMC2249577 DOI: 10.1186/1471-2148-8-14] [Citation(s) in RCA: 114] [Impact Index Per Article: 7.1] [Received: 04/14/2007] [Accepted: 01/18/2008] [Indexed: 11/17/2022] [Open access]
Abstract
Background: Our understanding of the eukaryotic tree of life and the tremendous diversity of microbial eukaryotes is in flux as additional genes and diverse taxa are sampled for molecular analyses. Despite instability in many analyses, there is an increasing trend to classify eukaryotic diversity into six major supergroups: the 'Amoebozoa', 'Chromalveolata', 'Excavata', 'Opisthokonta', 'Plantae', and 'Rhizaria'. Previous molecular analyses have often suffered from either a broad taxon sampling using only single-gene data or have used multigene data with a limited sample of taxa. This study has two major aims: (1) to place taxa represented by 72 sequences, 61 of which have not been characterized previously, onto a well-sampled multigene genealogy, and (2) to evaluate the support for the six putative supergroups using two taxon-rich data sets and a variety of phylogenetic approaches.
Results: The inferred trees reveal strong support for many clades that also have defining ultrastructural or molecular characters. In contrast, we find limited to no support for most of the putative supergroups as only the 'Opisthokonta' receive strong support in our analyses. The supergroup 'Amoebozoa' has only moderate support, whereas the 'Chromalveolata', 'Excavata', 'Plantae', and 'Rhizaria' receive very limited or no support.
Conclusion: Our analytical approach substantiates the power of increased taxon sampling in placing diverse eukaryotic lineages within well-supported clades. At the same time, this study indicates that the six supergroup hypothesis of higher-level eukaryotic classification is likely premature. The use of a taxon-rich data set with 105 lineages, which still includes only a small fraction of the diversity of microbial eukaryotes, fails to resolve deeper phylogenetic relationships and reveals no support for four of the six proposed supergroups. Our analyses provide a point of departure for future taxon- and gene-rich analyses of the eukaryotic tree of life, which will be critical for resolving their phylogenetic interrelationships.
Affiliation(s)
- Hwan Su Yoon
  - Department of Biological Sciences, Smith College, Northampton, MA 01063, USA.
11. Pepke SL, Butt D, Nadeau I, Roger AJ, Blouin C. Using confidence set heuristics during topology search improves the robustness of phylogenetic inference. J Mol Evol 2006; 64:80-9. [PMID: 17160642 DOI: 10.1007/s00239-006-0072-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2006] [Accepted: 10/03/2006] [Indexed: 10/23/2022]
Abstract
We examine the impact of likelihood surface characteristics on phylogenetic inference. Amino acid data sets simulated from topologies with branch length features chosen to represent varying degrees of difficulty for likelihood maximization are analyzed. We present situations where the tree found to achieve the global maximum in likelihood is often not equal to the true tree. We use the program covSEARCH to demonstrate how the use of adaptively sized pools of candidate trees that are updated using confidence tests results in solution sets that are highly likely to contain the true tree. This approach requires more computation than traditional maximum likelihood methods, hence covSEARCH is best suited to small to medium-sized alignments or large alignments with some constrained nodes. The majority rule consensus tree computed from the confidence sets also proves to be different from the generating topology. Although low phylogenetic signal in the input alignment can result in large confidence sets of trees, some biological information can still be obtained based on nodes that exhibit high support within the confidence set. Two real data examples are analyzed: mammal mitochondrial proteins and a small tubulin alignment. We conclude that the technique of confidence set optimization can significantly improve the robustness of phylogenetic inference at a reasonable computational cost. Additionally, when either very short internal branches or very long terminal branches are present, confident resolution of specific bipartitions or subtrees, rather than whole-tree phylogenies, may be the most realistic goal for phylogenetic methods.
Affiliation(s)
- Shirley L Pepke
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada B3H 1X5
12. Short-wavelength sensitive opsin (SWS1) as a new marker for vertebrate phylogenetics. BMC Evol Biol 2006; 6:97. [PMID: 17107620 PMCID: PMC1664589 DOI: 10.1186/1471-2148-6-97] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 07/26/2006] [Accepted: 11/15/2006] [Indexed: 11/23/2022] [Open access]
Abstract
Background: Vertebrate SWS1 visual pigments mediate visual transduction in response to light at short wavelengths. Due to their importance in vision, SWS1 genes have been isolated from a surprisingly wide range of vertebrates, including lampreys, teleosts, amphibians, reptiles, birds, and mammals. The SWS1 genes exhibit many of the characteristics of genes typically targeted for phylogenetic analyses. This study investigates both the utility of SWS1 as a marker for inferring vertebrate phylogenetic relationships, and the characteristics of the gene that contribute to its phylogenetic utility.
Results: Phylogenetic analyses of vertebrate SWS1 genes produced topologies that were remarkably congruent with generally accepted hypotheses of vertebrate evolution at both higher and lower taxonomic levels. The few exceptions were generally associated with areas of poor taxonomic sampling, or relationships that have been difficult to resolve using other molecular markers. The SWS1 data set was characterized by a substantial amount of among-site rate variation, and a relatively unskewed substitution rate matrix, even when the data were partitioned into different codon sites and individual taxonomic groups. Although there were nucleotide biases in some groups at third positions, these biases were not convergent across different taxonomic groups.
Conclusion: Our results suggest that SWS1 may be a good marker for vertebrate phylogenetics due to the variable yet consistent patterns of sequence evolution exhibited across fairly wide taxonomic groups. This may result from constraints imposed by the functional role of SWS1 pigments in visual transduction.
13. Stamatakis A, Ott M, Ludwig T. RAxML-OMP: An Efficient Program for Phylogenetic Inference on SMPs. Lecture Notes in Computer Science 2005. [DOI: 10.1007/11535294_25] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.3] [Indexed: 01/09/2023]
14. Bininda-Emonds ORP, Gittleman JL, Steel MA. The (Super)Tree of Life: Procedures, Problems, and Prospects. Annu Rev Ecol Syst 2002. [DOI: 10.1146/annurev.ecolsys.33.010802.150511] [Citation(s) in RCA: 193] [Impact Index Per Article: 8.8] [Indexed: 11/09/2022]
Affiliation(s)
- Olaf R. P. Bininda-Emonds
  - Current address: Lehrstuhl für Tierzucht, Technical University of Munich, D-85354 Freising-Weihenstephan, Germany
  - Institute of Evolutionary and Ecological Sciences, Leiden University, Kaiserstraat 63, 2300 RA Leiden, The Netherlands
  - Department of Biology, Gilmer Hall, University of Virginia, Charlottesville, Virginia 22904-4328
  - Biomathematics Research Center, Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
- John L. Gittleman
  - Current address: Lehrstuhl für Tierzucht, Technical University of Munich, D-85354 Freising-Weihenstephan, Germany
  - Institute of Evolutionary and Ecological Sciences, Leiden University, Kaiserstraat 63, 2300 RA Leiden, The Netherlands
  - Department of Biology, Gilmer Hall, University of Virginia, Charlottesville, Virginia 22904-4328
  - Biomathematics Research Center, Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
- Mike A. Steel
  - Current address: Lehrstuhl für Tierzucht, Technical University of Munich, D-85354 Freising-Weihenstephan, Germany
  - Institute of Evolutionary and Ecological Sciences, Leiden University, Kaiserstraat 63, 2300 RA Leiden, The Netherlands
  - Department of Biology, Gilmer Hall, University of Virginia, Charlottesville, Virginia 22904-4328
  - Biomathematics Research Center, Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand