1
Lesica P, Lavin M. Will molecular phylogenetics help decrease nomenclatural instability? Am J Bot 2023; 110:e16219. [PMID: 37561649 DOI: 10.1002/ajb2.16219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 06/26/2023] [Accepted: 06/27/2023] [Indexed: 08/12/2023]
Affiliation(s)
- Peter Lesica
- Division of Biological Sciences, University of Montana, Missoula, 59812, Montana, USA
- Matt Lavin
- Plant Sciences and Plant Pathology Department, Montana State University, Bozeman, 59717, Montana, USA
2
Briand S, Dessimoz C, El-Mabrouk N, Nevers Y. A Linear Time Solution to the Labeled Robinson-Foulds Distance Problem. Syst Biol 2022; 71:1391-1403. [PMID: 35426933 PMCID: PMC9557742 DOI: 10.1093/sysbio/syac028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/19/2021] [Revised: 04/02/2022] [Accepted: 04/07/2022] [Indexed: 11/13/2022]
Abstract
MOTIVATION A large variety of pairwise measures of similarity or dissimilarity have been developed for comparing phylogenetic trees, e.g. species trees or gene trees. Due to its intuitive definition in terms of tree clades and bipartitions and its computational efficiency, the Robinson-Foulds (RF) distance is the most widely used for trees with unweighted edges and labels restricted to leaves (representing the genetic elements being compared). However, in the case of gene trees, an important piece of information revealing the nature of the homologous relation between gene pairs (orthologs, paralogs, xenologs) is the type of event associated with each internal node of the tree, typically speciations or duplications, but other types of events may also be considered, such as horizontal gene transfers. This labeling of internal nodes is usually inferred from a gene tree/species tree reconciliation method. Here, we address the problem of comparing such event-labeled trees. The problem differs from the classical problem of comparing uniformly labeled trees (all labels belonging to the same alphabet), which may be done using the Tree Edit Distance (TED), mainly because, in our case, two different alphabets are considered for the leaves and internal nodes of the tree, and leaves are not affected by edit operations. RESULTS We propose an extension of the RF distance to event-labeled trees, based on edit operations comparable to those considered for TED: node insertion, node deletion and label substitution. We show that this new Labeled Robinson-Foulds (LRF) distance can be computed in linear time, in addition to maintaining other desirable properties: being a metric, reducing to RF for trees with no labels on internal nodes and maintaining an intuitive interpretation. The algorithm for computing the LRF distance enables novel analyses on event-labeled trees such as reconciled gene trees. Here, we use it to study the impact of taxon sampling on labeled gene tree inference, and conclude that denser taxon sampling yields trees with better topology but worse labeling.
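For readers unfamiliar with the classical RF distance that this abstract builds on, the following is a minimal sketch of the unlabeled RF distance only, not the LRF algorithm of the paper. It assumes trees are given as nested Python tuples with string leaf names; the representation and the function names (`leaf_set`, `bipartitions`, `rf_distance`) are hypothetical choices for illustration, and the set-based approach shown here is not linear time.

```python
from typing import FrozenSet, Set, Tuple, Union

# Assumed representation: a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree: Tree) -> Set[FrozenSet[FrozenSet[str]]]:
    """Collect the non-trivial leaf bipartitions induced by internal edges."""
    all_leaves = leaf_set(tree)
    splits: Set[FrozenSet[FrozenSet[str]]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        # Skip trivial splits (a single leaf, or everything, on one side).
        if len(below) > 1 and len(other) > 1:
            splits.add(frozenset([below, other]))
        return below

    visit(tree)
    return splits


def rf_distance(a: Tree, b: Tree) -> int:
    """Classical RF distance: number of bipartitions found in exactly one tree."""
    if leaf_set(a) != leaf_set(b):
        raise ValueError("trees must have the same leaf set")
    return len(bipartitions(a) ^ bipartitions(b))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2))  # 2: the splits AB|CDE and AC|BDE are each unique to one tree
```

The two example trees share the split ABC|DE and differ in one split each, so the distance is 2. Internal node labels (speciation, duplication) are ignored here, which is exactly the information the LRF distance is designed to take into account.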
Affiliation(s)
- Samuel Briand
- Département d'informatique et de recherche opérationnelle (DIRO), Université de Montréal, Canada
- Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Switzerland; Centre for Life's Origins and Evolution, Genetics Evolution and Environment, University College London, UK; Department of Computer Science, University College London, UK; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Nadia El-Mabrouk
- Département d'informatique et de recherche opérationnelle (DIRO), Université de Montréal, Canada
- Yannis Nevers
- Department of Computational Biology, University of Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
3
Tea YK, Xu X, DiBattista JD, Lo N, Cowman PF, Ho SYW. Phylogenomic Analysis of Concatenated Ultraconserved Elements Reveals the Recent Evolutionary Radiation of the Fairy Wrasses (Teleostei: Labridae: Cirrhilabrus). Syst Biol 2021; 71:1-12. [PMID: 33620490 DOI: 10.1093/sysbio/syab012] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/29/2020] [Revised: 02/11/2021] [Accepted: 02/17/2021] [Indexed: 01/22/2023]
Abstract
The fairy wrasses (genus Cirrhilabrus) are among the most successful of the extant wrasse lineages (Teleostei: Labridae), with their 61 species accounting for nearly 10% of the family. Although species complexes within the genus have been diagnosed on the basis of coloration patterns and synapomorphies, attempts to resolve evolutionary relationships among these groups using molecular and morphological data have largely been unsuccessful. Here we use a phylogenomic approach with a data set comprising 991 ultraconserved elements (UCEs) and mitochondrial COI to uncover the evolutionary history and patterns of temporal and spatial diversification of the fairy wrasses. Our analyses of phylogenetic signal suggest that most gene-tree incongruence is caused by estimation error, leading to poor resolution in a summary-coalescent analysis of the data. In contrast, analyses of concatenated sequences are able to resolve the major relationships of Cirrhilabrus. We determine the placements of species that were previously regarded as incertae sedis and find evidence for the nesting of Conniella, an unusual, monotypic genus, within Cirrhilabrus. Our relaxed-clock dating analysis indicates that the major divergences within the genus occurred around the Miocene-Pliocene boundary, followed by extensive cladogenesis of species complexes in the Pliocene-Pleistocene. Biogeographic reconstruction suggests that the fairy wrasses emerged within the Coral Triangle, with episodic fluctuations of sea levels during glacial cycles coinciding with shallow divergence events but providing few opportunities for more widespread dispersal. Our study demonstrates both the resolving power and limitations of UCEs across shallow timescales where there is substantial estimation error in individual gene trees.
Affiliation(s)
- Yi-Kai Tea
- School of Life and Environmental Sciences, University of Sydney, New South Wales 2006, Australia; Australian Museum Research Institute, Australian Museum, 1 William St, Sydney, New South Wales 2010, Australia
- Xin Xu
- School of Life and Environmental Sciences, University of Sydney, New South Wales 2006, Australia; College of Life Sciences, Hunan Normal University, Changsha, Hunan 410081, China
- Joseph D DiBattista
- Australian Museum Research Institute, Australian Museum, 1 William St, Sydney, New South Wales 2010, Australia
- Nathan Lo
- School of Life and Environmental Sciences, University of Sydney, New South Wales 2006, Australia
- Peter F Cowman
- ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, Queensland, Australia; Biodiversity and Geosciences Program, Museum of Tropical Queensland, Queensland Museum, Townsville, Queensland 4810, Australia
- Simon Y W Ho
- School of Life and Environmental Sciences, University of Sydney, New South Wales 2006, Australia
4
Man J, Gallagher JP, Bartlett M. Structural evolution drives diversification of the large LRR-RLK gene family. New Phytol 2020; 226:1492-1505. [PMID: 31990988 PMCID: PMC7318236 DOI: 10.1111/nph.16455] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/16/2019] [Accepted: 01/19/2020] [Indexed: 05/11/2023]
Abstract
● Cells are continuously exposed to chemical signals that they must discriminate between and respond to appropriately. In embryophytes, the leucine-rich repeat receptor-like kinases (LRR-RLKs) are signal receptors critical in development and defense. LRR-RLKs have diversified to hundreds of genes in many plant genomes. Although intensively studied, a well-resolved LRR-RLK gene tree has remained elusive.
● To resolve the LRR-RLK gene tree, we developed an improved gene discovery method based on iterative hidden Markov model searching and phylogenetic inference. We used this method to infer complete gene trees for each of the LRR-RLK subclades and reconstructed the deepest nodes of the full gene family.
● We discovered that the LRR-RLK gene family is even larger than previously thought, and that protein domain gains and losses are prevalent. These structural modifications, some of which likely predate embryophyte diversification, led to misclassification of some LRR-RLK variants as members of other gene families. Our work corrects this misclassification.
● Our results reveal ongoing structural evolution generating novel LRR-RLK genes. These new genes are raw material for the diversification of signaling in development and defense. Our methods also enable phylogenetic reconstruction in any large gene family.
Affiliation(s)
- Jarrett Man
- Biology Department, University of Massachusetts Amherst, 611 North Pleasant Street, 221 Morrill 3, Amherst, MA 01003, USA
- Joseph P. Gallagher
- Biology Department, University of Massachusetts Amherst, 611 North Pleasant Street, 221 Morrill 3, Amherst, MA 01003, USA
- Madelaine Bartlett
- Biology Department, University of Massachusetts Amherst, 611 North Pleasant Street, 221 Morrill 3, Amherst, MA 01003, USA
5
Mao Y, Hou S, Shi J, Economo EP. TREEasy: An automated workflow to infer gene trees, species trees, and phylogenetic networks from multilocus data. Mol Ecol Resour 2020; 20. [PMID: 32073732 DOI: 10.1111/1755-0998.13149] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/10/2019] [Revised: 01/27/2020] [Accepted: 02/10/2020] [Indexed: 11/30/2022]
Abstract
Multilocus genomic data sets can be used to infer a rich set of information about the evolutionary history of a lineage, including gene trees, species trees, and phylogenetic networks. However, user-friendly tools to run such integrated analyses are lacking, and workflows often require tedious reformatting and handling time to shepherd data through a series of individual programs. Here, we present a tool written in Python, TREEasy, that performs automated sequence alignment (with MAFFT), gene tree inference (with IQ-Tree), species tree inference from concatenated data (with IQ-Tree and RAxML-NG), species tree inference from gene trees (with ASTRAL, MP-EST, and STELLS2), and phylogenetic network inference (with SNaQ and PhyloNet). The tool only requires FASTA files and nine parameters as inputs. The tool can be run from the command line or through a Graphical User Interface (GUI). As examples, we reproduced a recent analysis of staghorn coral evolution, and performed a new analysis on the evolution of the "WGD clade" of yeast. The latter revealed novel patterns that were not identified by previous analyses. TREEasy represents a reliable and simple tool to accelerate research in systematic biology (https://github.com/MaoYafei/TREEasy).
Affiliation(s)
- Yafei Mao
- Biodiversity and Biocomplexity Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Japan
- Siqing Hou
- Cognitive Neurorobotics Research Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Japan
- Junfeng Shi
- Shanghai Key Laboratory of Stomatology, Shanghai Research Institute of Stomatology, Shanghai Ninth People's Hospital, College of Stomatology, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Evan P Economo
- Biodiversity and Biocomplexity Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Japan
6
Blanco-Pastor JL, Bertrand YJK, Liberal IM, Wei Y, Brummer EC, Pfeil BE. Evolutionary networks from RADseq loci point to hybrid origins of Medicago carstiensis and Medicago cretacea. Am J Bot 2019; 106:1219-1228. [PMID: 31535720 DOI: 10.1002/ajb2.1352] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/21/2019] [Accepted: 07/12/2019] [Indexed: 06/10/2023]
Abstract
PREMISE Although hybridization has played an important role in the evolution of many plant species, phylogenetic reconstructions that include hybridizing lineages have been historically constrained by the available models and data. Restriction-site-associated DNA sequencing (RADseq) has been a popular sequencing technique for the reconstruction of hybridization in the next-generation sequencing era. However, the utility of RADseq for the reconstruction of complex evolutionary networks has not been thoroughly investigated. Conflicting phylogenetic relationships in the genus Medicago have been mainly attributed to hybridization, but the specific hybrid origins of taxa have not yet been clarified. METHODS We obtained new molecular data from diploid species of Medicago section Medicago using single-digest RADseq to reconstruct evolutionary networks from gene trees, an approach that is computationally tractable with data sets that include several species and complex hybridization patterns. RESULTS Our analyses revealed that assembly filters to exclusively select a small set of loci with high phylogenetic information led to the most-divergent network topologies. Conversely, alternative clustering thresholds or filters on the number of samples per locus had a lower impact on networks. A strong hybridization signal was detected for M. carstiensis and M. cretacea, while signals were less clear for M. rugosa, M. rhodopea, M. suffruticosa, M. marina, M. scutellata, and M. sativa. CONCLUSIONS Complex network reconstructions from RADseq gene trees were not robust under variations of the assembly parameters and filters. But when the most-divergent networks were discarded, all remaining analyses consistently supported a hybrid origin for M. carstiensis and M. cretacea.
Affiliation(s)
- José Luis Blanco-Pastor
- Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 40530, Göteborg, Sweden
- INRA, Centre Nouvelle-Aquitaine-Poitiers, UR4 (URP3F), 86600, Lusignan, France
- Yann J K Bertrand
- Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 40530, Göteborg, Sweden
- Institute of Botany, Czech Academy of Sciences, Zámek 1, 25243, Průhonice, Czech Republic
- Yanling Wei
- Plant Breeding Center, Department of Plant Sciences, University of California, Davis, Davis, CA, USA
- E Charles Brummer
- Plant Breeding Center, Department of Plant Sciences, University of California, Davis, Davis, CA, USA
- Bernard E Pfeil
- Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 40530, Göteborg, Sweden
7
Cusimano N, Renner SS. Sequential horizontal gene transfers from different hosts in a widespread Eurasian parasitic plant, Cynomorium coccineum. Am J Bot 2019; 106:679-689. [PMID: 31081928 DOI: 10.1002/ajb2.1286] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 11/08/2018] [Accepted: 03/19/2019] [Indexed: 06/09/2023]
Abstract
PREMISE Parasitic plants with large geographic ranges, and different hosts in parts of their range, may acquire horizontally transferred genes (HGTs), which might sometimes leave a footprint of gradual host and range expansion. Cynomorium coccineum, the only member of the Saxifragales family Cynomoriaceae, is a root holoparasite that occurs in water-stressed habitats from western China to the Canary Islands. It parasitizes at least 10 angiosperm families from different orders, some of them only in parts of its range. This parasite therefore offers an opportunity to trace HGTs as long as parasite-host pairs can be obtained and sequenced. METHODS By sequencing mitochondrial, plastid, and nuclear loci from parasite-host pairs from throughout the parasite's range and with prior information from completely assembled mitochondrial and plastid genomes, we detected 10 HGTs of five mitochondrial genes. RESULTS The 10 HGTs appear to have occurred sequentially as C. coccineum expanded from East to West. Molecular-clock models yield Cynomorium stem ages between 66 and 156 Myr, with relaxed clocks converging on 66-67 Myr. Chinese Sapindales, probably Nitraria, were the first source of transferred genes, followed by Iranian and Mediterranean Caryophyllales. The most recently acquired gene appears to come from a Tamarix host in the Iberian Peninsula. CONCLUSIONS Data on HGTs that have accumulated over the past 15 years, along with this discovery of multiple HGTs within a single widespread species, underline the need for more whole-genome data from parasite-host pairs to investigate whether and how transferred copies coexist with, or replace, native functional genes.
Affiliation(s)
- Natalie Cusimano
- Systematic Botany and Mycology, Faculty of Biology, University of Munich (LMU), Munich, Germany
- Susanne S Renner
- Systematic Botany and Mycology, Faculty of Biology, University of Munich (LMU), Munich, Germany
8
Kang Q, Schardl CL, Moore N, Yoshida R. CURatio: Genome-wide phylogenomic analysis method using ratios of total branch lengths. IEEE/ACM Trans Comput Biol Bioinform 2018; 17:10.1109/TCBB.2018.2878564. [PMID: 30387738 PMCID: PMC7372714 DOI: 10.1109/tcbb.2018.2878564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Evolutionary hypotheses provide important underpinnings of biological and medical sciences, and a comprehensive, genome-wide understanding of evolutionary relationships among organisms is needed to test and refine such hypotheses. Theory and empirical evidence clearly indicate that phylogenies (trees) of different genes (loci) should not display precisely matching topologies. The main reason for such phylogenetic incongruence is the reticulated evolutionary history of most species due to meiotic sexual recombination in eukaryotes, or horizontal transfers of genetic material in prokaryotes. Nevertheless, many genes should display topologically related phylogenies, and should group into one or more (for genetic hybrids) clusters in poly-dimensional "tree space". Unusual evolutionary histories or effects of selection may result in "outlier" genes with phylogenies that fall outside the main distribution(s) of trees in tree space. We present a new phylogenomic method, CURatio, which uses ratios of total branch lengths in gene trees to help identify phylogenetic outliers in a given set of ortholog groups from multiple genomes. An advantage of CURatio over other methods is that genes absent from and/or duplicated in some genomes can be included in the analysis. We conducted a simulation study under the coalescent model, and showed that, given sufficient species depth and topological difference, these ratios are significantly higher for the "outlier" gene phylogenies. Also, we applied CURatio to a set of annotated genomes of the fungal family Clavicipitaceae, and identified alkaloid biosynthesis genes as outliers, probably due to a history of duplication and loss. The source code is available at https://github.com/QiwenKang/CURatio, and the empirical data set on Clavicipitaceae and simulated data set are available at Mendeley https://data.mendeley.com/datasets/mrxts7wjrr/1.
9
Abstract
Species tree reconstruction from genome-wide data is increasingly being attempted, in most cases using a two-step approach of first estimating individual gene trees and then summarizing them to obtain a species tree. The accuracy of this approach, which promises to account for gene tree discordance, depends on the quality of the inferred gene trees. At the same time, phylogenomic and phylotranscriptomic analyses typically use involved bioinformatics pipelines for data preparation. Errors and shortcomings resulting from these preprocessing steps may impact the species tree analyses at the other end of the pipeline. In this article, we first show that the presence of fragmentary data for some species in a gene alignment, as often seen on real data, can result in substantial deterioration of gene trees, and as a result, the species tree. We then investigate a simple filtering strategy where individual fragmentary sequences are removed from individual genes but the rest of the gene is retained. Both in simulations and by reanalyzing a large insect phylotranscriptomic data set, we show the effectiveness of this simple filtering strategy.
Affiliation(s)
- Erfan Sayyari
- Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA
- Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA
10
Abstract
Phylogenetic relationships of the Abyssinian pea (Pisum sativum ssp. abyssinicum) to other subspecies and species in the genus were investigated to test between different hypotheses regarding its origin and domestication. An extensive sample of the Pisum sativum ssp. sativum germplasm was investigated, including groups a-1, a-2, b, c, and d as identified by Kwon et al. (2012). A broad sample of P. fulvum but relatively few P. s. ssp. elatius accessions were analyzed. Partial sequences of 18 genes were compared and these results combined with comparisons of additional genes done by others and available in the literature. In total, 54 genes or gene fragment sequences were involved in the study. The observed affinities between alleles in P. s. ssp. sativum, P. s. ssp. abyssinicum, P. s. ssp. elatius, and P. fulvum clearly demonstrated a close relationship among the three P. sativum subspecies and rejected the hypothesis that the Abyssinian pea was formed by hybridization between one of the P. sativum subspecies and P. fulvum. If hybridization were involved in the generation of the Abyssinian pea, it must have been between P. s. ssp. sativum and P. s. ssp. elatius, although the Abyssinian pea possesses a considerable number of highly unique alleles, implying that the actual P. s. ssp. elatius germplasm involved in such a hybridization has yet to be tested or that the hybridization occurred much longer ago than the postulated 4000 years BP. Analysis of the P. s. ssp. abyssinicum alleles in genomic regions thought to contain genes critical for domestication indicated that the indehiscent pod trait was independently developed in the Abyssinian pea, whereas the loss of seed dormancy was either derived from P. s. ssp. sativum or at least partially developed before the P. s. ssp. abyssinicum lineage diverged from that leading to P. s. ssp. sativum.
Affiliation(s)
- Norman F. Weeden
- Department of Plant Sciences and Plant Pathology, Montana State University, Bozeman, MT, United States
11
Puttick MN, Morris JL, Williams TA, Cox CJ, Edwards D, Kenrick P, Pressel S, Wellman CH, Schneider H, Pisani D, Donoghue PCJ. The Interrelationships of Land Plants and the Nature of the Ancestral Embryophyte. Curr Biol 2018; 28:733-745.e2. [PMID: 29456145 DOI: 10.1016/j.cub.2018.01.063] [Citation(s) in RCA: 240] [Impact Index Per Article: 40.0] [Received: 11/30/2017] [Revised: 01/15/2018] [Accepted: 01/22/2018] [Indexed: 11/28/2022]
Abstract
The evolutionary emergence of land plant body plans transformed the planet. However, our understanding of this formative episode is mired in the uncertainty associated with the phylogenetic relationships among bryophytes (hornworts, liverworts, and mosses) and tracheophytes (vascular plants). Here we attempt to clarify this problem by analyzing a large transcriptomic dataset with models that allow for compositional heterogeneity between sites. Zygnematophyceae is resolved as sister to land plants, but we obtain several distinct relationships between bryophytes and tracheophytes. Concatenated sequence analyses that can explicitly accommodate site-specific compositional heterogeneity give more support for a mosses-liverworts clade, "Setaphyta," as the sister to all other land plants, and weak support for hornworts as the sister to all other land plants. Bryophyte monophyly is supported by gene concatenation analyses using models explicitly accommodating lineage-specific compositional heterogeneity and analyses of gene trees. Both maximum-likelihood analyses that compare the fit of each gene tree to proposed species trees and Bayesian supertree estimation based on gene trees support bryophyte monophyly. Of the 15 distinct rooted relationships for embryophytes, we reject all but three hypotheses, which differ only in the position of hornworts. Our results imply that the ancestral embryophyte was more complex than has been envisaged based on topologies recognizing liverworts as the sister lineage to all other embryophytes. This requires many phenotypic character losses and transformations in the liverwort lineage, diminishes inconsistency between phylogeny and the fossil record, and prompts re-evaluation of the phylogenetic affinity of early land plant fossils, the majority of which are considered stem tracheophytes.
Affiliation(s)
- Mark N Puttick
- School of Earth Sciences, University of Bristol, Life Sciences Building, Tyndall Avenue, Bristol BS8 1TQ, UK; School of Biological Sciences, University of Bristol, Life Sciences Building, Tyndall Avenue, Bristol BS8 1TQ, UK; Department of Life Sciences, The Natural History Museum, Cromwell Road, London SW7 5BD, UK
- Jennifer L Morris
- School of Earth Sciences, University of Bristol, Life Sciences Building, Tyndall Avenue, Bristol BS8 1TQ, UK; School of Earth and Ocean Sciences, Cardiff University, Main Building, Park Place, Cardiff CF10 3AT, UK
- Tom A Williams
- School of Biological Sciences, University of Bristol, Life Sciences Building, Tyndall Avenue, Bristol BS8 1TQ, UK
- Cymon J Cox
- Centro de Ciências do Mar, Universidade do Algarve, Gambelas, 8005-319 Faro, Portugal
- Dianne Edwards
- School of Earth and Ocean Sciences, Cardiff University, Main Building, Park Place, Cardiff CF10 3AT, UK
- Paul Kenrick
- Department of Earth Sciences, The Natural History Museum, Cromwell Road, London SW7 5BD, UK
- Silvia Pressel
- Department of Life Sciences, The Natural History Museum, Cromwell Road, London SW7 5BD, UK
- Charles H Wellman
- Department of Animal and Plant Sciences, University of Sheffield, Alfred Denny Building, Western Bank, Sheffield S10 2TN, UK
- Harald Schneider
- Department of Life Sciences, The Natural History Museum, Cromwell Road, London SW7 5BD, UK; Center of Integrative Conservation, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, Yunnan, China
- Davide Pisani
- School of Earth Sciences, University of Bristol, Life Sciences Building, Tyndall Avenue, Bristol BS8 1TQ, UK; Department of Life Sciences, The Natural History Museum, Cromwell Road, London SW7 5BD, UK
- Philip C J Donoghue
- School of Earth Sciences, University of Bristol, Life Sciences Building, Tyndall Avenue, Bristol BS8 1TQ, UK
12
Sayyari E, Whitfield JB, Mirarab S. Fragmentary Gene Sequences Negatively Impact Gene Tree and Species Tree Reconstruction. Mol Biol Evol 2017. [PMID: 29029241 DOI: 10.1093/molbev/msx261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Abstract
Species tree reconstruction from genome-wide data is increasingly being attempted, in most cases using a two-step approach of first estimating individual gene trees and then summarizing them to obtain a species tree. The accuracy of this approach, which promises to account for gene tree discordance, depends on the quality of the inferred gene trees. At the same time, phylogenomic and phylotranscriptomic analyses typically use involved bioinformatics pipelines for data preparation. Errors and shortcomings resulting from these preprocessing steps may impact the species tree analyses at the other end of the pipeline. In this article, we first show that the presence of fragmentary data for some species in a gene alignment, as often seen on real data, can result in substantial deterioration of gene trees, and as a result, the species tree. We then investigate a simple filtering strategy where individual fragmentary sequences are removed from individual genes but the rest of the gene is retained. Both in simulations and by reanalyzing a large insect phylotranscriptomic data set, we show the effectiveness of this simple filtering strategy.
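The filtering strategy described in this abstract (dropping individual fragmentary sequences from a gene alignment while keeping the rest of the gene) can be illustrated with a minimal sketch. It assumes an alignment held as a dictionary from taxon name to aligned sequence, and it assumes "fragmentary" means that the fraction of non-gap characters falls below a cutoff; the function name `filter_fragments`, the default `min_fraction` of 0.5, and the gap characters are hypothetical choices for illustration, not values taken from the paper.

```python
from typing import Dict


def filter_fragments(
    alignment: Dict[str, str],
    min_fraction: float = 0.5,   # assumed cutoff, not a value from the paper
    gap_chars: str = "-?",       # assumed gap / missing-data symbols
) -> Dict[str, str]:
    """Keep only sequences whose non-gap fraction is at least min_fraction.

    The gene itself is retained; only individual fragmentary rows are removed.
    """
    kept: Dict[str, str] = {}
    for name, seq in alignment.items():
        if not seq:
            continue  # an empty row carries no data, so drop it
        non_gap = sum(1 for ch in seq if ch not in gap_chars)
        if non_gap / len(seq) >= min_fraction:
            kept[name] = seq
    return kept


gene = {
    "sp1": "ACGTACGT",  # 8/8 sites present
    "sp2": "AC------",  # 2/8 sites present, treated as fragmentary
    "sp3": "ACGT--GT",  # 6/8 sites present
}
print(sorted(filter_fragments(gene)))  # ['sp1', 'sp3']
```

Applied to every gene alignment before gene tree estimation, a filter of this kind leaves each gene with a possibly smaller taxon set, which is why the downstream species tree method must tolerate missing taxa in individual gene trees.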
Affiliation(s)
- Erfan Sayyari
- Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA
- Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA
13
Abstract
Given a gene tree and a species tree, ancestral configurations represent the combinatorially distinct sets of gene lineages that can reach a given node of the species tree. They have been introduced as a data structure for use in the recursive computation of the conditional probability under the multispecies coalescent model of a gene tree topology given a species tree, the cost of this computation being affected by the number of ancestral configurations of the gene tree in the species tree. For matching gene trees and species trees, we obtain enumerative results on ancestral configurations. We study ancestral configurations in balanced and unbalanced families of trees determined by a given seed tree, showing that for seed trees with more than one taxon, the number of ancestral configurations increases for both families exponentially in the number of taxa n. For fixed n, the maximal number of ancestral configurations tabulated at the species tree root node and the largest number of labeled histories possible for a labeled topology occur for trees with precisely the same unlabeled shape. For ancestral configurations at the root, the maximum increases with [Formula: see text], where [Formula: see text] is a quadratic recurrence constant. Under a uniform distribution over the set of labeled trees of given size, the mean number of root ancestral configurations grows with [Formula: see text] and the variance with ∼[Formula: see text]. The results provide a contribution to the combinatorial study of gene trees and species trees.
Affiliation(s)
- Filippo Disanto
- Department of Biology, Stanford University, Stanford, California
- Noah A Rosenberg
- Department of Biology, Stanford University, Stanford, California
|
14
|
Xu B, Yang Z. Challenges in Species Tree Estimation Under the Multispecies Coalescent Model. Genetics 2016; 204:1353-68. [PMID: 27927902 DOI: 10.1534/genetics.116.190173] [Citation(s) in RCA: 101] [Impact Index Per Article: 14.4] [Received: 04/06/2016] [Accepted: 09/25/2016] [Indexed: 11/18/2022] Open
Abstract
The multispecies coalescent (MSC) model has emerged as a powerful framework for inferring species phylogenies while accounting for ancestral polymorphism and gene tree-species tree conflict. A number of methods have been developed in the past few years to estimate the species tree under the MSC. The full likelihood methods (including maximum likelihood and Bayesian inference) average over the unknown gene trees and accommodate their uncertainties properly but involve intensive computation. The approximate or summary coalescent methods are computationally fast and are applicable to genomic datasets with thousands of loci, but do not make efficient use of information in the multilocus data. Most of them take the two-step approach of reconstructing the gene trees for multiple loci by phylogenetic methods and then treating the estimated gene trees as observed data, without accounting for their uncertainties appropriately. In this article we review the statistical nature of the species tree estimation problem under the MSC, and explore the conceptual issues and challenges of species tree estimation by focusing mainly on simple cases of three or four closely related species. We use mathematical analysis and computer simulation to demonstrate that large differences in statistical performance may exist between the two classes of methods. We illustrate that several counterintuitive behaviors may occur with the summary methods but they are due to inefficient use of information in the data by summary methods and vanish when the data are analyzed using full-likelihood methods. These include (i) unidentifiability of parameters in the model, (ii) inconsistency in the so-called anomaly zone, (iii) singularity on the likelihood surface, and (iv) deterioration of performance upon addition of more data.
We discuss the challenges and strategies of species tree inference for distantly related species when the molecular clock is violated, and highlight the need for improving the computational efficiency and model realism of the likelihood methods as well as the statistical efficiency of the summary methods.
|
15
|
Abstract
We introduce a gene tree simulator that is designed for use in conjunction with approximate Bayesian computation approaches. We show that it can be used to determine the relative importance of hybrid speciation and introgression compared with incomplete lineage sorting (ILS) in producing patterns of incongruence across gene trees. Important features of the new simulator are (1) a choice of models to capture the decreasing probability of successful hybrid species formation or introgression as a function of genetic distance between potential parent species; (2) the ability for hybrid speciation to result in asymmetrical contributions of genetic material from each parent species; (3) the ability to vary the rates of hybrid speciation, introgression, and divergence speciation in different epochs; and (4) incorporation of the coalescent, so that patterns of incongruence due to ILS can be compared with those due to hybrid evolution. Given a set of gene trees generated by the simulator, we calculate a set of statistics, each measuring in a different way the discordance between the gene trees. We show that these statistics can be used to differentiate whether the gene tree discordance was largely due to hybridization, or only due to lineage sorting.
Affiliation(s)
- Michael D Woodhams
- Discipline of Mathematics, School of Physical Sciences, University of Tasmania, Hobart, Australia
- Peter J Lockhart
- Institute of Fundamental Sciences, Massey University, Palmerston North, New Zealand
- Barbara R Holland
- Discipline of Mathematics, School of Physical Sciences, University of Tasmania, Hobart, Australia
|
16
|
Abstract
We present, implement, and evaluate an approach to calculate the internode certainty (IC) and tree certainty (TC) on a given reference tree from a collection of partial gene trees. Previously, the calculation of these values was only possible from a collection of gene trees with exactly the same taxon set as the reference tree. An application to sets of partial gene trees requires mathematical corrections in the IC and TC calculations. We implement our methods in RAxML and test them on empirical datasets. These tests imply that the inclusion of partial trees does matter. However, in order to provide meaningful measurements, any dataset should also include trees containing the full species set.
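For orientation, the uncorrected internode certainty that this abstract builds on can be sketched as follows. This is the standard two-bipartition form, IC = 1 + p1·log2(p1) + p2·log2(p2), computed from the support counts of the reference bipartition and its most frequent conflicting bipartition. It assumes all gene trees share the reference taxon set and does not include the corrections for partial gene trees that the abstract describes; the function name is hypothetical and this is not the RAxML implementation.

```python
import math


def internode_certainty(f1, f2):
    """Uncorrected IC for one internode (full-taxon gene trees assumed).

    f1 : number of gene trees supporting the reference bipartition
    f2 : number of gene trees supporting the most frequent conflicting one
    Returns 1.0 when there is no conflict and 0.0 when both are equally common.
    """
    total = f1 + f2
    if total <= 0:
        raise ValueError("at least one supporting gene tree is required")
    ic = 1.0
    for f in (f1, f2):
        p = f / total
        if p > 0:  # the term p * log2(p) tends to 0 as p -> 0
            ic += p * math.log2(p)
    return ic


no_conflict = internode_certainty(80, 0)
equal_conflict = internode_certainty(50, 50)
```

Tree certainty (TC) is then the sum of the IC values over all internodes of the reference tree.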
Affiliation(s)
- Kassian Kobert
- Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Leonidas Salichos
- Department of Molecular Biophysics and Biochemistry, Yale University; Department of Biological Sciences, Vanderbilt University
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University; Department of Biomedical Informatics, Vanderbilt University Medical Center
- Alexandros Stamatakis
- Heidelberg Institute for Theoretical Studies, Heidelberg, Germany; Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Postfach 6980, Karlsruhe, 76128, Germany
|
17
|
Abstract
BACKGROUND Evolutionary studies are complicated by discordance between gene trees and the species tree in which they evolved. Dealing with discordant trees often relies on comparison costs between gene and species trees, including the well-established Robinson-Foulds, gene duplication, and deep coalescence costs. While these costs have provided credible results for binary rooted gene trees, corresponding cost definitions for non-binary unrooted gene trees, which occur frequently in practice, are challenged by biological realism. RESULT We propose a natural extension of the well-established costs for comparing unrooted and non-binary gene trees with rooted binary species trees using a binary refinement model. For the duplication cost we describe an efficient algorithm that is based on a linear time reduction and also computes an optimal rooted binary refinement of the given gene tree. Finally, we show that similar reductions lead to solutions for computing the deep coalescence and the Robinson-Foulds costs. CONCLUSION Our binary refinement of Robinson-Foulds, gene duplication, and deep coalescence costs for unrooted and non-binary gene trees, together with the linear time reductions provided here for computing these costs, significantly extends the range of trees that can be incorporated into approaches dealing with discordance.
Affiliation(s)
- Pawel Górecki
- Institute of Informatics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland
- Oliver Eulenstein
- Department of Computer Science, Iowa State University, Atanasoff Hall 212, 50011 Ames, USA
|
18
|
Sveinsson S, McDill J, Wong GKS, Li J, Li X, Deyholos MK, Cronk QCB. Phylogenetic pinpointing of a paleopolyploidy event within the flax genus (Linum) using transcriptomics. Ann Bot 2014; 113:753-61. [PMID: 24380843 PMCID: PMC3962240 DOI: 10.1093/aob/mct306] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 10/08/2013] [Accepted: 12/02/2013] [Indexed: 05/10/2023]
Abstract
BACKGROUND AND AIMS Cultivated flax (Linum usitatissimum) is known to have undergone a whole-genome duplication around 5-9 million years ago. The aim of this study was to investigate whether other whole-genome duplication events have occurred in the evolutionary history of cultivated flax. Knowledge of such whole-genome duplications will be important in understanding the biology and genomics of cultivated flax. METHODS Transcriptomes of 11 Linum species were sequenced using the Illumina platform. The short reads were assembled de novo and the DupPipe pipeline was used to look for signatures of polyploidy events from the age distribution of paralogues. In addition, phylogenies of all paralogues were assembled within an estimated age window of interest. These phylogenies were assessed for evidence of a paleopolyploidy event within the genus Linum. KEY RESULTS A previously unknown paleopolyploidy event that occurred 20-40 million years ago was discovered and shown to be specific to a clade within Linum containing cultivated flax (L. usitatissimum) and other mainly blue-flowered species. The finding was supported by two lines of evidence. First, a significant change of slope (peak) was shown in the age distribution of paralogues that was phylogenetically restricted to, and ubiquitous in, this clade. Second, a large number of paralogue phylogenies were retrieved that are consistent with a polyploidy event occurring within that clade. CONCLUSIONS The results show the utility of multi-species transcriptomics for detecting whole-genome duplication events and demonstrate that multiple rounds of polyploidy have been important in shaping the evolutionary history of flax. Understanding and characterizing these whole-genome duplication events will be important for future Linum research.
Affiliation(s)
- Saemundur Sveinsson
- Department of Botany and Biodiversity Research Centre, University of British Columbia, 6270 University Boulevard, Vancouver, BC V6T 1Z4, Canada
- Joshua McDill
- University of Alberta, CW405 Biological Sciences, Edmonton, AB T6G 2E9, Canada
- Gane K. S. Wong
- University of Alberta, CW405 Biological Sciences, Edmonton, AB T6G 2E9, Canada
- Juanjuan Li
- BGI-Shenzhen, Bei Shan Industrial Zone, Yantian District, Shenzhen, China
- Xia Li
- BGI-Shenzhen, Bei Shan Industrial Zone, Yantian District, Shenzhen, China
- Michael K. Deyholos
- University of Alberta, CW405 Biological Sciences, Edmonton, AB T6G 2E9, Canada
- Quentin C. B. Cronk
- Department of Botany and Biodiversity Research Centre, University of British Columbia, 6270 University Boulevard, Vancouver, BC V6T 1Z4, Canada
|
19
|
Dávalos LM, Cirranello AL, Geisler JH, Simmons NB. Understanding phylogenetic incongruence: lessons from phyllostomid bats. Biol Rev Camb Philos Soc 2012; 87:991-1024. [PMID: 22891620 PMCID: PMC3573643 DOI: 10.1111/j.1469-185x.2012.00240.x] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.0] [Received: 06/22/2011] [Revised: 07/04/2012] [Accepted: 07/18/2012] [Indexed: 12/25/2022]
Abstract
All characters and trait systems in an organism share a common evolutionary history that can be estimated using phylogenetic methods. However, differential rates of change and the evolutionary mechanisms driving those rates result in pervasive phylogenetic conflict. These drivers need to be uncovered because mismatches between evolutionary processes and phylogenetic models can lead to high confidence in incorrect hypotheses. Incongruence between phylogenies derived from morphological versus molecular analyses, and between trees based on different subsets of molecular sequences has become pervasive as datasets have expanded rapidly in both characters and species. For more than a decade, evolutionary relationships among members of the New World bat family Phyllostomidae inferred from morphological and molecular data have been in conflict. Here, we develop and apply methods to minimize systematic biases, uncover the biological mechanisms underlying phylogenetic conflict, and outline data requirements for future phylogenomic and morphological data collection. We introduce new morphological data for phyllostomids and outgroups and expand previous molecular analyses to eliminate methodological sources of phylogenetic conflict such as taxonomic sampling, sparse character sampling, or use of different algorithms to estimate the phylogeny. We also evaluate the impact of biological sources of conflict: saturation in morphological changes and molecular substitutions, and other processes that result in incongruent trees, including convergent morphological and molecular evolution. Methodological sources of incongruence play some role in generating phylogenetic conflict, and are relatively easy to eliminate by matching taxa, collecting more characters, and applying the same algorithms to optimize phylogeny. 
The evolutionary patterns uncovered are consistent with multiple biological sources of conflict, including saturation in morphological and molecular changes, adaptive morphological convergence among nectar-feeding lineages, and incongruent gene trees. Applying methods to account for nucleotide sequence saturation reduces, but does not completely eliminate, phylogenetic conflict. We ruled out paralogy, lateral gene transfer, and poor taxon sampling and outgroup choices among the processes leading to incongruent gene trees in phyllostomid bats. Uncovering and countering the possible effects of introgression and lineage sorting of ancestral polymorphism on gene trees will require great leaps in genomic and allelic sequencing in this species-rich mammalian family. We also found evidence for adaptive molecular evolution leading to convergence in mitochondrial proteins among nectar-feeding lineages. In conclusion, the biological processes that generate phylogenetic conflict are ubiquitous, and overcoming incongruence requires better models and more data than have been collected even in well-studied organisms such as phyllostomid bats.
Affiliation(s)
- Liliana M Dávalos
- Department of Ecology and Evolution, and Consortium for Inter-Disciplinary Environmental Research, State University of New York at Stony Brook, Stony Brook, NY 11794, USA
- Andrea L Cirranello
- Division of Vertebrate Zoology (Mammalogy), American Museum of Natural History, New York, NY 10024, USA
- Department of Anatomical Sciences, State University of New York at Stony Brook, Stony Brook, NY 11794, USA
- Jonathan H Geisler
- Department of Anatomy, New York College of Osteopathic Medicine, Old Westbury, NY 11568, USA
- Nancy B Simmons
- Division of Vertebrate Zoology (Mammalogy), American Museum of Natural History, New York, NY 10024, USA
|