1
Secaira-Morocho H, Jiang X, Zhu Q. Augmenting microbial phylogenomic signal with tailored marker gene sets. bioRxiv [Preprint] 2025:2025.03.13.643052. [PMID: 40161675 PMCID: PMC11952537 DOI: 10.1101/2025.03.13.643052]
Abstract
Phylogenetic marker genes are traditionally selected from a fixed collection of whole genomes evenly distributed across major microbial phyla, covering only a small fraction of gene families. Yet most microbial diversity is found in metagenome-assembled genomes that are unevenly distributed and harbor gene families that do not fit the criteria of universal orthologous genes. To address these limitations, we systematically evaluate the phylogenetic signal of gene families annotated from the KEGG and EggNOG functional databases for deep microbial phylogenomics. We show that markers selected from an expanded pool of gene families and tailored to the input genomes improve the accuracy of phylogenetic trees across simulated and real-world datasets of whole genomes and metagenome-assembled genomes. The accuracy gain over previous markers persists even when metagenome-assembled genomes lack a fraction of their open reading frames. The selected markers have functional annotations related to metabolism, cellular processes, and environmental information processing, in addition to replication, translation, and transcription. We introduce TMarSel, a software tool for automated, systematic, and tailored marker selection that is free from expert opinion, provides flexibility in the number of markers and annotation databases, and remains robust against uneven taxon sampling and incomplete genomic data.
Affiliation(s)
- Henry Secaira-Morocho
  - Center for Fundamental and Applied Microbiomics and School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
  - National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Xiaofang Jiang
  - National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Qiyun Zhu
  - Center for Fundamental and Applied Microbiomics and School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
2
Balaban M, Jiang Y, Zhu Q, McDonald D, Knight R, Mirarab S. Generation of accurate, expandable phylogenomic trees with uDance. Nat Biotechnol 2024; 42:768-777. [PMID: 37500914 PMCID: PMC10818028 DOI: 10.1038/s41587-023-01868-8] [Received: 12/19/2022] [Accepted: 06/20/2023]
Abstract
Phylogenetic trees provide a framework for organizing evolutionary histories across the tree of life and aid downstream comparative analyses such as metagenomic identification. Methods that rely on single marker genes such as 16S rRNA have produced trees of limited accuracy with hundreds of thousands of organisms, whereas methods that use genome-wide data do not scale to large numbers of genomes. We introduce updating trees using divide-and-conquer (uDance), a method that enables updatable genome-wide inference using a divide-and-conquer strategy that refines different parts of the tree independently and can build on existing trees, with high accuracy and scalability. With uDance, we infer a species tree of roughly 200,000 genomes using 387 marker genes, totaling 42.5 billion amino acid residues.
Affiliation(s)
- Metin Balaban
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, USA
- Yueyu Jiang
  - Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
- Qiyun Zhu
  - Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Daniel McDonald
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Rob Knight
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
  - Department of Computer Science and Engineering, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
  - Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
  - Department of Computer Science and Engineering, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
  - Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
3
Naranjo AA, Edwards CE, Gitzendanner MA, Soltis DE, Soltis PS. Abundant incongruence in a clade endemic to a biodiversity hotspot: Phylogenetics of the scrub mint clade (Lamiaceae). Mol Phylogenet Evol 2024; 192:108014. [PMID: 38199595 DOI: 10.1016/j.ympev.2024.108014] [Received: 10/07/2023] [Revised: 12/26/2023] [Accepted: 01/06/2024]
Abstract
The scrub mint clade (Lamiaceae) provides a unique system for investigating the evolutionary processes driving diversification in the North American Coastal Plain in both a systematic and a biogeographic context. The clade comprises Dicerandra, Conradina, Piloblephis, Stachydeoma, and four species of the broadly defined genus Clinopodium (Mentheae; Lamiaceae), almost all of which are endemic to the North American Eastern Coastal Plain. Most species of this clade are threatened or endangered and restricted to sandhill or a mosaic of scrub habitats. We analyzed relationships in this clade to understand the evolution of the group and identify evolutionary mechanisms acting on it, with important implications for conservation. We used a target-capture method to sequence and analyze 238 nuclear loci across all species of scrub mints, reconstructed the phylogeny, and calculated gene tree concordance, gene tree estimation error, and reticulation indices for every node in the tree using maximum likelihood (ML) methods. Phylogenetic networks were used to identify reticulation events. Our nuclear phylogenetic estimates were consistent with previous results while greatly increasing the robustness of taxon sampling. The phylogeny resolved the full relationships between Dicerandra and Conradina and the less-studied members of the clade (Piloblephis, Stachydeoma, Clinopodium spp.). We found hotspots of gene tree discordance and reticulation throughout the tree, especially in perennial Dicerandra. Several reticulation events were uncovered between annual and perennial Dicerandra and within the Conradina + allies clade. Incomplete lineage sorting also likely contributed to phylogenetic discordance. These results clarify phylogenetic relationships in the clade and provide insight into important evolutionary drivers, such as hybridization.
General relationships in the group were confirmed, while the large amount of gene tree discordance is likely due to reticulation across the phylogeny.
Affiliation(s)
- Andre A Naranjo
  - Institute of Environment, Department of Biological Sciences, Florida International University, 11200 SW 8th St, Miami, FL 33199, USA
  - Florida Museum of Natural History, University of Florida, 1659 Museum Road, PO Box 117800, Gainesville, FL 32611-7800, USA
- Matthew A Gitzendanner
  - Department of Biology, University of Florida, PO Box 118526, Gainesville, FL 32611-8526, USA
- Douglas E Soltis
  - Florida Museum of Natural History, University of Florida, 1659 Museum Road, PO Box 117800, Gainesville, FL 32611-7800, USA
  - Department of Biology, University of Florida, PO Box 118526, Gainesville, FL 32611-8526, USA
- Pamela S Soltis
  - Florida Museum of Natural History, University of Florida, 1659 Museum Road, PO Box 117800, Gainesville, FL 32611-7800, USA
4
Patané JSL, Martins J, Setubal JC. A Guide to Phylogenomic Inference. Methods Mol Biol 2024; 2802:267-345. [PMID: 38819564 DOI: 10.1007/978-1-0716-3838-5_11]
Abstract
Phylogenomics aims to reconstruct the evolutionary histories of organisms using whole genomes or large fractions of genomes. It has significant applications in fields such as evolutionary biology, systematics, comparative genomics, and conservation genetics, providing valuable insights into the origins and relationships of species and contributing to our understanding of biological diversity and evolution. This chapter surveys phylogenetic concepts and methods for both gene tree and species tree reconstruction, addresses common pitfalls, and provides references to relevant computer programs. A practical phylogenomic analysis example including bacterial genomes is presented at the end of the chapter.
Affiliation(s)
- José S L Patané
  - Laboratório de Genética e Cardiologia Molecular, Instituto do Coração/Heart Institute, Hospital das Clínicas, Faculdade de Medicina da Universidade de São Paulo, São Paulo, SP, Brazil
- Joaquim Martins
  - Integrative Omics group, Biorenewables National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, SP, Brazil
- João Carlos Setubal
  - Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil
5
Zaharias P, Warnow T. Recent progress on methods for estimating and updating large phylogenies. Philos Trans R Soc Lond B Biol Sci 2022; 377:20210244. [PMID: 35989607 PMCID: PMC9393559 DOI: 10.1098/rstb.2021.0244] [Received: 10/12/2021] [Accepted: 01/07/2022]
Abstract
With the increased availability of sequence data and even of fully sequenced and assembled genomes, phylogeny estimation of very large trees (even of hundreds of thousands of sequences) is now a goal for some biologists. Yet constructing these phylogenies requires a complex pipeline that presents analytical and computational challenges, especially when the number of sequences is very large. In the past few years, new methods have been developed that aim to enable highly accurate phylogeny estimation on these large datasets, including divide-and-conquer techniques for multiple sequence alignment and/or tree estimation, methods that can estimate species trees from multi-locus datasets while addressing heterogeneity due to biological processes (e.g. incomplete lineage sorting and gene duplication and loss), and methods to add sequences into large gene trees or species trees. Here we present some of these recent advances and discuss opportunities for future improvements. This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.
Affiliation(s)
- Paul Zaharias
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Tandy Warnow
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
6
Zhang C, Mirarab S. Weighting by Gene Tree Uncertainty Improves Accuracy of Quartet-based Species Trees. Mol Biol Evol 2022; 39:msac215. [PMID: 36201617 PMCID: PMC9750496 DOI: 10.1093/molbev/msac215] [Received: 02/08/2022] [Revised: 09/20/2022] [Accepted: 10/03/2022]
Abstract
Phylogenomic analyses routinely estimate species trees using methods that account for gene tree discordance. However, the most scalable species tree inference methods, which summarize independently inferred gene trees to obtain a species tree, are sensitive to hard-to-avoid errors introduced in the gene tree estimation step. This dilemma has fueled much debate on the merits of concatenation versus summary methods and has created practical obstacles to using summary methods more widely and to the exclusion of concatenation. The most successful attempt at making summary methods resilient to noisy gene trees has been contracting low-support branches in the gene trees. Unfortunately, this approach requires arbitrary thresholds and poses new challenges. Here, we introduce threshold-free weighting schemes for quartet-based species tree inference, the metric used in the popular method ASTRAL. By reducing the impact of quartets with low support or long terminal branches (or both), weighting provides stronger theoretical guarantees and better empirical performance than unweighted ASTRAL. Our simulations show that weighting improves accuracy across many conditions and reduces the gap with concatenation in conditions with low gene tree discordance and high noise. On empirical data, weighting improves congruence with concatenation and increases support. Together, our results show that weighting, enabled by a new optimization algorithm we introduce, improves the utility of summary methods and can reduce the incongruence often observed across analytical pipelines.
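To make the idea of down-weighting noisy quartets concrete, here is a toy Python sketch. It is not weighted ASTRAL. It assumes each gene tree has already been summarized as a list of (quartet topology, support) pairs, and it simply tallies the three possible resolutions of a four-taxon set with and without support weights. The helper names (`quartet`, `tally`, `best_topology`) are hypothetical, and the sketch leaves out both the terminal-branch-length part of the weighting and ASTRAL's species tree search.

```python
from collections import defaultdict


def quartet(a, b, c, d):
    """Encode the unrooted quartet topology ab|cd.

    A frozenset of two frozensets, so ab|cd, ba|dc and cd|ab compare equal.
    """
    return frozenset({frozenset({a, b}), frozenset({c, d})})


def tally(gene_trees, weighted=True):
    """Sum a score for each quartet topology over all gene trees.

    Each gene tree is assumed (for illustration only) to be pre-summarized
    as a list of (quartet_topology, support) pairs with support in [0, 1].
    Unweighted: every occurrence counts 1. Weighted: it counts its support,
    so poorly supported quartets have less influence.
    """
    scores = defaultdict(float)
    for tree in gene_trees:
        for topo, support in tree:
            scores[topo] += support if weighted else 1.0
    return dict(scores)


def best_topology(scores, taxa):
    """Return the highest-scoring of the three resolutions of four taxa."""
    a, b, c, d = taxa
    options = [quartet(a, b, c, d), quartet(a, c, b, d), quartet(a, d, b, c)]
    return max(options, key=lambda t: scores.get(t, 0.0))
```

In a made-up example where two low-support gene trees (support 0.3 each) favor AB|CD and one well-supported gene tree (support 0.9) favors AC|BD, the unweighted tally picks AB|CD (2 vs 1), while the support-weighted tally picks AC|BD (0.6 vs 0.9).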
Affiliation(s)
- Chao Zhang
  - Bioinformatics and Systems Biology, UC San Diego, La Jolla, CA, USA
7
Thureborn O, Razafimandimbison SG, Wikström N, Rydin C. Target capture data resolve recalcitrant relationships in the coffee family (Rubioideae, Rubiaceae). Front Plant Sci 2022; 13:967456. [PMID: 36160958 PMCID: PMC9493367 DOI: 10.3389/fpls.2022.967456] [Received: 06/12/2022] [Accepted: 08/03/2022]
Abstract
Subfamily Rubioideae is the largest of the main lineages in the coffee family (Rubiaceae), with over 8,000 species and 29 tribes. Phylogenetic relationships among tribes and other major clades within this group of plants are still only partly resolved despite considerable efforts. While previous studies have mainly used data from the organellar genomes and nuclear ribosomal DNA, we here use a large number of low-copy nuclear genes obtained via a target capture approach to infer phylogenetic relationships within Rubioideae. We included 101 Rubioideae species representing all but two (the monogeneric tribes Foonchewieae and Aitchisonieae) of the currently recognized tribes, and all but one non-monogeneric tribe were represented by more than one genus. Using data from the 353 genes targeted with the universal Angiosperms353 probe set, we investigated the impact of data type, analytical approach, and potential paralogs on phylogenetic reconstruction. We inferred a robust phylogenetic hypothesis of Rubioideae in which the vast majority (or all) of nodes were highly supported across all analyses and datasets, with few incongruences between the inferred topologies. The results were similar to those of previous studies, but novel relationships were also identified. We found that supercontigs [coding sequence (CDS) + non-coding sequence] clearly outperformed CDS data in levels of support and gene tree congruence. The full datasets (353 genes) outperformed the datasets with potentially paralogous genes removed (186 genes) in levels of support but slightly increased gene tree incongruence. The pattern of gene tree conflict at short internal branches was often consistent with high levels of incomplete lineage sorting (ILS) due to rapid speciation in the group. While concatenation- and coalescence-based trees mainly agreed, the observed phylogenetic discordance between the two approaches may be best explained by their differences in accounting for ILS.
The use of target capture data greatly improved our confidence and understanding of the Rubioideae phylogeny, highlighted by the increased support for previously uncertain relationships and the increased possibility to explore sources of underlying phylogenetic discordance.
Affiliation(s)
- Olle Thureborn
  - Department of Ecology, Environment and Plant Sciences, Stockholm University, Stockholm, Sweden
- Niklas Wikström
  - Department of Ecology, Environment and Plant Sciences, Stockholm University, Stockholm, Sweden
  - Bergius Foundation, Royal Swedish Academy of Sciences, Stockholm, Sweden
- Catarina Rydin
  - Department of Ecology, Environment and Plant Sciences, Stockholm University, Stockholm, Sweden
  - Bergius Foundation, Royal Swedish Academy of Sciences, Stockholm, Sweden
8
Liu B, Chen Y, Zhu H, Liu G. Phylotranscriptomic and Evolutionary Analyses of the Green Algal Order Chaetophorales (Chlorophyceae, Chlorophyta). Genes (Basel) 2022; 13:1389. [PMID: 36011300 PMCID: PMC9407426 DOI: 10.3390/genes13081389] [Received: 07/14/2022] [Revised: 08/01/2022] [Accepted: 08/03/2022]
Abstract
Given the phylogenetic discrepancies in the taxonomic framework of the Chaetophorales obtained with nuclear molecular markers versus chloroplast genes, this study was the first to use phylotranscriptomic analyses, comparing the transcriptomes of 12 Chaetophorales algal species. In total, 240,133 gene families and 143 single-copy orthogroups were identified. Based on the single-copy orthogroups, supergene analysis and a coalescent-based approach were used to perform phylotranscriptomic analysis of the Chaetophorales. The phylogenetic relationships of most species were consistent with phylogenetic analyses based on chloroplast genome data rather than with those based on nuclear molecular markers. The Schizomeriaceae and the Aphanochaetaceae clustered into a well-resolved basal clade in the Chaetophorales under either strategy. Evolutionary analyses of divergence time and substitution rate also revealed that the Schizomeriaceae and Aphanochaetaceae were the most closely related. All species in the Chaetophorales exhibited a large number of expanded and contracted gene families, particularly the common ancestor of the Schizomeriaceae and Aphanochaetaceae. The only terrestrial alga, Fritschiella tuberosa, had the greatest number of expanded gene families, which were associated with increased fatty acid biosynthesis. Phylotranscriptomic and evolutionary analyses all robustly identified the unique taxonomic relationships of the Chaetophorales, consistent with chloroplast genome data, demonstrating the advantages of high-throughput data in phylogeny.
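Supermatrix ("supergene") inference starts by concatenating the per-orthogroup alignments. The Python sketch below illustrates only that step, under stated assumptions: alignments are already computed and held in memory as plain dictionaries with one sequence per taxon (as expected for single-copy orthogroups), and taxa absent from an orthogroup are padded with gap characters. The function name `concatenate` and the 1-based partition coordinates are illustrative choices, not the pipeline used in the study.

```python
def concatenate(alignments):
    """Build a supermatrix from per-orthogroup alignments (illustrative sketch).

    alignments: dict mapping orthogroup name -> dict(taxon -> aligned sequence).
    Taxa missing from an orthogroup are padded with '-' so that every row of
    the supermatrix has the same length.

    Returns (matrix, partitions): matrix maps taxon -> concatenated sequence;
    partitions lists (orthogroup, start, end) with 1-based inclusive
    coordinates, a common convention in partition files.
    """
    taxa = sorted({t for aln in alignments.values() for t in aln})
    rows = {t: [] for t in taxa}
    partitions = []
    start = 1
    for name in sorted(alignments):
        aln = alignments[name]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences are not aligned to equal length")
        length = lengths.pop()
        for t in taxa:
            # Gap-fill taxa that have no sequence in this orthogroup.
            rows[t].append(aln.get(t, "-" * length))
        partitions.append((name, start, start + length - 1))
        start += length
    return {t: "".join(parts) for t, parts in rows.items()}, partitions
```

For example, concatenating OG1 = {sp1: "MKV", sp2: "MRV"} with OG2 = {sp1: "AC", sp3: "AG"} yields rows "MKVAC", "MRV--", and "---AG", with OG1 occupying columns 1 to 3 and OG2 columns 4 to 5. The partition list keeps gene boundaries available for partitioned models, while the coalescent-based approach would instead start from one tree per orthogroup.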
Affiliation(s)
- Benwen Liu
  - Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Yangliang Chen
  - Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
  - University of Chinese Academy of Sciences, Beijing 100039, China
- Huan Zhu
  - Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Guoxiang Liu
  - Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
  - Correspondence: ; Tel.: +86-027-6878-0576
9
Leducq JB, Sneddon D, Santos M, Condrain-Morel D, Bourret G, Cecilia Martinez-Gomez N, Lee JA, Foster JA, Stolyar S, Jesse Shapiro B, Kembel SW, Sullivan JM, Marx CJ. Comprehensive phylogenomics of Methylobacterium reveals four evolutionary distinct groups and underappreciated phyllosphere diversity. Genome Biol Evol 2022; 14:evac123. [PMID: 35906926 PMCID: PMC9364378 DOI: 10.1093/gbe/evac123] [Accepted: 07/25/2022]
Abstract
Methylobacterium is a group of methylotrophic microbes associated with soil, fresh water, and particularly the phyllosphere, the aerial part of plants. The group has been well studied in terms of physiology, but its evolutionary history and taxonomy are unclear. Recent work has suggested that Methylobacterium is much more diverse than previously thought, questioning its status as an ecologically and phylogenetically coherent taxonomic genus. However, taxonomic and evolutionary studies of Methylobacterium have mostly been restricted to model species, often isolated from habitats other than the phyllosphere, and have yet to use comprehensive phylogenomic methods to examine gene trees, gene content, or synteny. By analyzing 189 Methylobacterium genomes from a wide range of habitats, including the phyllosphere, we inferred a robust phylogenetic tree while explicitly accounting for the impact of horizontal gene transfer (HGT). We showed that Methylobacterium contains four evolutionarily distinct groups of bacteria (A, B, C, D), characterized by differences in genome size, GC content, gene content, and genome architecture, revealing the dynamic nature of Methylobacterium genomes. In addition to recovering 59 described species, we identified 45 candidate species, mostly phyllosphere-associated, stressing the significance of plants as a reservoir of Methylobacterium diversity. We inferred an ancient transition from a free-living lifestyle to association with plant roots in the Methylobacteriaceae ancestor, followed by phyllosphere association of three of the major groups (A, B, D), whose early branching in Methylobacterium history has been heavily obscured by HGT. Together, our work lays the foundations for a thorough redefinition of Methylobacterium taxonomy, beginning with the abandonment of Methylorubrum.
Affiliation(s)
- Jean-Baptiste Leducq
  - Université Laval, Quebec City, QC, Canada
  - University of Idaho, Moscow, ID, USA
- B Jesse Shapiro
  - Université de Montréal, Montreal, QC, Canada
  - McGill University, Montreal, QC, Canada
10
Astudillo-Clavijo V, Stiassny MLJ, Ilves KL, Musilova Z, Salzburger W, López-Fernández H. Exon-based phylogenomics and the relationships of African cichlid fishes: tackling the challenges of reconstructing phylogenies with repeated rapid radiations. Syst Biol 2022; 72:134-149. [PMID: 35880863 DOI: 10.1093/sysbio/syac051] [Received: 08/12/2021] [Revised: 07/06/2022] [Accepted: 07/19/2022]
Abstract
African cichlids (subfamily: Pseudocrenilabrinae) are among the most diverse vertebrates, and their propensity for repeated rapid radiation has made them a celebrated model system in evolutionary research. Nonetheless, despite numerous studies, phylogenetic uncertainty persists, and riverine lineages remain comparatively underrepresented in higher-level phylogenetic studies. Heterogeneous gene histories resulting from incomplete lineage sorting (ILS) and hybridization are likely sources of uncertainty, especially during episodes of rapid speciation. We investigate relationships of Pseudocrenilabrinae and its close relatives while accounting for multiple sources of genetic discordance, using species tree and hybrid network analyses with hundreds of single-copy exons. Using a hybrid reference-guided/de novo assembly approach, we improve sequence recovery for distant relatives, thereby extending the taxonomic reach of our probes. Our analyses provide robust hypotheses for most higher-level relationships and reveal widespread gene heterogeneity, including in riverine taxa. ILS and past hybridization are identified as sources of genetic discordance in different lineages. Sampling of various Blenniiformes (formerly Ovalentaria) adds strong phylogenomic support for convict blennies (Pholidichthyidae) as sister to Cichlidae and points to other potentially useful protein-coding markers across the order. A reliable phylogeny with representatives from diverse environments will support ongoing taxonomic and comparative evolutionary research in the cichlid model system.
Affiliation(s)
- Viviana Astudillo-Clavijo
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, M5S 3B2, Canada
  - Department of Natural History, Royal Ontario Museum, Toronto, M5S 2C6, Canada
  - Department of Ecology and Evolutionary Biology and Museum of Zoology, University of Michigan, Ann Arbor, 48109, USA
- Melanie L J Stiassny
  - Department of Ichthyology, American Museum of Natural History, New York, 10024-5102, USA
- Katriina L Ilves
  - Research & Collections, Zoology, Canadian Museum of Nature, Ottawa, K1P 6P4, Canada
- Zuzana Musilova
  - Department of Zoology, Charles University in Prague, Vinicna 7, Prague, CZ-128 44, Czech Republic
- Walter Salzburger
  - Zoological Institute, University of Basel, Vesalgasse 1, CH-4051, Basel, Switzerland
- Hernán López-Fernández
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, M5S 3B2, Canada
  - Department of Natural History, Royal Ontario Museum, Toronto, M5S 2C6, Canada
  - Department of Ecology and Evolutionary Biology and Museum of Zoology, University of Michigan, Ann Arbor, 48109, USA
11
Kneubehl AR, Krishnavajhala A, Leal SM, Replogle AJ, Kingry LC, Bermúdez SE, Labruna MB, Lopez JE. Comparative genomics of the Western Hemisphere soft tick-borne relapsing fever borreliae highlights extensive plasmid diversity. BMC Genomics 2022; 23:410. [PMID: 35641918 PMCID: PMC9158201 DOI: 10.1186/s12864-022-08523-7] [Received: 08/13/2021] [Accepted: 03/30/2022]
Abstract
BACKGROUND Tick-borne relapsing fever (TBRF) is a globally prevalent yet understudied vector-borne disease transmitted by soft- and hard-bodied ticks. While soft TBRF (sTBRF) spirochetes have been described for over a century, the molecular mechanisms facilitating vector and host adaptation remain poorly understood. This is due to the complexity of their small (~1.5 Mb) but fragmented genomes, which typically consist of a linear chromosome and both linear and circular plasmids. For most sTBRF spirochete genomes, plasmid sequences are either missing or deposited as unassembled sequences. Consequently, our goal was to generate complete, plasmid-resolved genomes for a comparative analysis of sTBRF species of the Western Hemisphere. RESULTS Using a Borrelia-specific pipeline, genomes of sTBRF spirochetes from the Western Hemisphere were sequenced and assembled using a combination of short- and long-read sequencing technologies. Included in the analysis were two recently isolated species from Central and South America, Borrelia puertoricensis n. sp. and Borrelia venezuelensis, respectively. Plasmid analyses identified diverse sequences that clustered plasmids into 30 families; however, only three families were conserved and syntenic across all species. We also compared two species, B. venezuelensis and Borrelia turicatae, which were isolated ~6,800 km apart and from different tick vector species but were previously reported to be genetically similar. CONCLUSIONS To truly understand the biological differences observed between species of TBRF spirochetes, complete chromosome and plasmid sequences are needed. This comparative genomic analysis highlights high chromosomal synteny across the species yet diverse plasmid composition. This was particularly true for B. turicatae and B. venezuelensis, which had high average nucleotide identity yet extensive plasmid diversity.
These findings are foundational for future endeavors to evaluate the role of plasmids in vector and host adaptation.
Affiliation(s)
- Alexander R Kneubehl
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular Virology and Microbiology, National School of Tropical Medicine, Baylor College of Medicine, Houston, TX, USA
- Sebastián Muñoz Leal
  - Departamento de Ciencia Animal, Facultad de Ciencias Veterinarias, Universidad de Concepción, Concepción, Chile
- Adam J Replogle
  - Division of Vector-Borne Diseases, Centers for Disease Control and Prevention, Fort Collins, CO, USA
- Luke C Kingry
  - Division of Vector-Borne Diseases, Centers for Disease Control and Prevention, Fort Collins, CO, USA
- Sergio E Bermúdez
  - Medical Entomology Department, Gorgas Memorial Institute for Health Research, Panamá City, Panamá
- Marcelo B Labruna
  - Departamento de Medicina Veterinária Preventiva e Saúde Animal, Faculdade de Medicina Veterinária e Zootecnia, Universidade de São Paulo, São Paulo, Brazil
- Job E Lopez
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular Virology and Microbiology, National School of Tropical Medicine, Baylor College of Medicine, Houston, TX, USA
12
Phylotranscriptomic and Evolutionary Analyses of Oedogoniales (Chlorophyceae, Chlorophyta). Diversity 2022; 14:157. [DOI: 10.3390/d14030157]
Abstract
This study determined the transcriptomes of eight Oedogoniales species, six from Oedogonium and two from Oedocladium, to conduct phylotranscriptomic and evolutionary analyses. A total of 155,952 gene families and 192 single-copy orthogroups were detected. Phylotranscriptomic analyses based on single-copy orthogroups were conducted using supermatrix and coalescent-based approaches. The results revealed that Oedogonium is polyphyletic and that Oedocladium clustered with Oedogonium. Together with the transcriptomes of the OCC clade in the public database, the phylogenetic relationships of the three orders (Oedogoniales, Chaetophorales, Chaetopeltidales) are discussed. Ratios of nonsynonymous (dN) to synonymous (dS) substitution rates for the single-copy orthogroups of the terrestrial Oedogoniales species were estimated using a branch model in PAML (Phylogenetic Analysis by Maximum Likelihood), identifying 92 single-copy orthogroups as putative rapidly evolving genes. Gene Ontology enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses revealed that some of the rapidly evolving genes were associated with photosynthesis, implying that terrestrial Oedogoniales species underwent rapid evolution to adapt to terrestrial habitats. The phylogenetic results combined with evolutionary analyses suggest that terrestrialization in Oedogoniales may have occurred more than once.
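Branch-model tests of this kind typically compare a one-ratio model (a single dN/dS ratio, ω, for all branches) with a two-ratio model that gives the foreground branches their own ω, using a likelihood-ratio test (LRT). The Python sketch below covers only the post-processing of such fits. The abstract does not state the exact criteria used, so the χ² threshold (1 degree of freedom, α = 0.05), the hypothetical `BranchModelFit` record, and the rule "significant LRT and foreground ω above background ω" are assumptions for illustration. The log-likelihoods themselves would come from a codon-model program such as PAML's codeml.

```python
from dataclasses import dataclass

# 95th percentile of the chi-square distribution with 1 degree of freedom
# (the two-ratio model adds one free omega parameter to the one-ratio model).
CHI2_1DF_95 = 3.841


@dataclass
class BranchModelFit:
    """Hypothetical per-orthogroup summary of two codon-model fits."""
    orthogroup: str
    lnl_one_ratio: float      # log-likelihood with a single omega for all branches
    lnl_two_ratio: float      # log-likelihood with a separate foreground omega
    omega_foreground: float   # dN/dS on foreground (e.g. terrestrial) branches
    omega_background: float   # dN/dS on all other branches


def lrt_statistic(fit):
    """Likelihood-ratio statistic 2 * (lnL_alternative - lnL_null)."""
    return 2.0 * (fit.lnl_two_ratio - fit.lnl_one_ratio)


def rapidly_evolving(fits, critical=CHI2_1DF_95):
    """Orthogroups whose foreground omega is higher than background
    and whose two-ratio model fits significantly better."""
    return [
        f.orthogroup
        for f in fits
        if lrt_statistic(f) > critical and f.omega_foreground > f.omega_background
    ]
```

With invented numbers, an orthogroup whose log-likelihood improves from -1000.0 to -995.0 gives a statistic of 10.0, which exceeds 3.841, and it is flagged only if its foreground ω is also the larger of the two. Requiring both conditions avoids flagging genes whose better fit comes from a *lower* foreground ω.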
13
Liu B, Warnow T. Scalable Species Tree Inference with External Constraints. J Comput Biol 2022; 29:664-678. [PMID: 35196115 DOI: 10.1089/cmb.2021.0543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Species tree inference is a basic step in biological discovery, but discordance between gene trees creates analytical challenges and large data sets create computational challenges. Although some information about the species tree is generally available that could be used to speed up estimation, only one species tree estimation method that addresses gene tree discordance, ASTRAL-J (a recent development in the ASTRAL family of methods), is able to use this information. Here we describe two new methods, NJst-J and FASTRAL-J, that estimate the species tree given partial knowledge of it in the form of a nonbinary unrooted constraint tree. We show that both NJst-J and FASTRAL-J are much faster than ASTRAL-J, and we prove that all three methods are statistically consistent under the multispecies coalescent model subject to this constraint. Our extensive simulation study shows that both FASTRAL-J and NJst-J provide advantages over ASTRAL-J: both are faster (NJst-J is particularly fast), and FASTRAL-J is generally at least as accurate as ASTRAL-J. An analysis of the Avian Phylogenomics Project data set with 48 species and 14,446 genes provides additional evidence of the value of FASTRAL-J over ASTRAL-J (and of both over ASTRAL), with dramatic reductions in running time (20 hours for default ASTRAL, versus minutes for ASTRAL-J and seconds for FASTRAL-J).
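One natural reading of such a constraint is that the output species tree must refine the constraint tree, i.e. contain every one of its bipartitions. A minimal sketch of that compatibility check, assuming both trees are given as lists of bipartitions (one side of each split) over the same taxon set; the encoding and function names are hypothetical and this is not code from ASTRAL-J, NJst-J, or FASTRAL-J.

```python
from typing import FrozenSet, Iterable, Set

Split = FrozenSet[str]


def canonical(side: Iterable[str], taxa: FrozenSet[str], ref: str) -> Split:
    """Encode an unrooted bipartition by the side that excludes a fixed reference taxon."""
    side = frozenset(side)
    return taxa - side if ref in side else side


def refines(candidate: Iterable[Iterable[str]],
            constraint: Iterable[Iterable[str]],
            taxa: Set[str]) -> bool:
    """True if every constraint bipartition also appears in the candidate tree."""
    taxa = frozenset(taxa)
    ref = min(taxa)  # any fixed taxon works as the reference
    cand = {canonical(s, taxa, ref) for s in candidate}
    return all(canonical(s, taxa, ref) in cand for s in constraint)


taxa = {"a", "b", "c", "d", "e"}
constraint = [{"a", "b"}]             # nonbinary constraint tree: ab | cde
tree_ok = [{"a", "b"}, {"d", "e"}]    # ((a,b),c,(d,e))
tree_bad = [{"a", "c"}, {"d", "e"}]   # ((a,c),b,(d,e))
print(refines(tree_ok, constraint, taxa))   # True
print(refines(tree_bad, constraint, taxa))  # False
```

Canonicalizing each split by the side without a reference taxon makes `{a, b}` and `{c, d, e}` compare equal, as they should for an unrooted tree.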
Affiliation(s)
- Baqiao Liu
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
14
Yan Z, Smith ML, Du P, Hahn MW, Nakhleh L. Species Tree Inference Methods Intended to Deal with Incomplete Lineage Sorting Are Robust to the Presence of Paralogs. Syst Biol 2022; 71:367-381. [PMID: 34245291 PMCID: PMC8978208 DOI: 10.1093/sysbio/syab056] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 03/24/2020] [Revised: 06/23/2021] [Accepted: 06/30/2021] [Indexed: 11/24/2022]
Abstract
Many recent phylogenetic methods have focused on accurately inferring species trees when there is gene tree discordance due to incomplete lineage sorting (ILS). For almost all of these methods, and for phylogenetic methods in general, the data for each locus are assumed to consist of orthologous, single-copy sequences. Loci that are present in more than a single copy in any of the studied genomes are excluded from the data. This exclusion greatly reduces the number of loci available for analysis. The question we seek to answer in this study is: what happens if one runs such species tree inference methods on data where paralogy is present, with or without ILS? Through simulation studies and analyses of two large biological data sets, we show that running such methods on data with paralogs can still provide accurate results. We use multiple different methods, some of which are based directly on the multispecies coalescent model, and some of which have been proven to be statistically consistent under it. We also treat the paralogous loci in multiple ways: from explicitly denoting them as paralogs, to randomly selecting one copy per species. In all cases, the inferred species trees are as accurate as equivalent analyses using single-copy orthologs. Our results have significant implications for the use of ILS-aware phylogenomic analyses, demonstrating that they do not have to be restricted to single-copy loci. This will greatly increase the amount of data that can be used for phylogenetic inference. [Keywords: gene duplication and loss; incomplete lineage sorting; multispecies coalescent; orthology; paralogy.]
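Two ways of handling a multi-copy locus can be sketched for illustration: keeping every copy and mapping it to its species (so that a multi-copy-aware summary method sees several leaves per species), and randomly keeping one copy per species. The input format, a list of (species, sequence ID) pairs, and both function names are assumptions; the study's own pipeline may represent and process paralogs differently.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Copy = Tuple[str, str]  # (species, sequence_id)


def label_copies_by_species(copies: List[Copy]) -> Dict[str, str]:
    """Map each sequence ID to its species, keeping all paralogous copies."""
    return {seq_id: species for species, seq_id in copies}


def one_copy_per_species(copies: List[Copy], seed: Optional[int] = None) -> Dict[str, str]:
    """Randomly keep a single copy for each species present at the locus."""
    rng = random.Random(seed)
    by_species: Dict[str, List[str]] = defaultdict(list)
    for species, seq_id in copies:
        by_species[species].append(seq_id)
    # Sort species and copies so the result depends only on the seed, not on input order.
    return {sp: rng.choice(sorted(ids)) for sp, ids in sorted(by_species.items())}


locus = [("human", "h1"), ("human", "h2"), ("mouse", "m1"), ("chicken", "c1"), ("chicken", "c2")]
print(label_copies_by_species(locus)["h2"])  # human
print(one_copy_per_species(locus, seed=1))
```

Seeding a private `random.Random` instance keeps the subsampling reproducible without touching the global random state.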
Affiliation(s)
- Zhi Yan
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Megan L Smith
- Department of Biology and Department of Computer Science, Indiana University, 1001 East Third Street, Bloomington, IN 47405, USA
- Peng Du
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Matthew W Hahn
- Department of Biology and Department of Computer Science, Indiana University, 1001 East Third Street, Bloomington, IN 47405, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA
- Department of BioSciences, Rice University, 6100 Main Street, Houston, TX 77005, USA
15
Zhu Q, Mirarab S. Assembling a Reference Phylogenomic Tree of Bacteria and Archaea by Summarizing Many Gene Phylogenies. Methods Mol Biol 2022; 2569:137-165. [PMID: 36083447 DOI: 10.1007/978-1-0716-2691-7_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/24/2023]
Abstract
Phylogenomics is the inference of phylogenetic trees based on multiple marker genes sampled in the genomes of interest. An important challenge in phylogenomics is the potential incongruence among the evolutionary histories of individual genes, which can be widespread in microorganisms due to the prevalence of horizontal gene transfer. This protocol introduces the procedures for building a phylogenetic tree of a large number of microbial genomes using a broad sampling of marker genes that are representative of whole-genome evolution. The protocol highlights the use of a gene tree summary method, which can effectively reconstruct the species tree while accounting for the topological conflicts among individual gene trees. The pipeline described in this protocol is scalable to tens of thousands of genomes while retaining high accuracy. We discuss multiple software tools, libraries, and scripts to enable convenient adoption of the protocol. The protocol is suitable for microbiology and microbiome studies based on public genomes and metagenomic data.
Affiliation(s)
- Qiyun Zhu
- Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA.
- School of Life Sciences, Arizona State University, Tempe, AZ, USA.
- Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA, USA
16
Mirarab S, Nakhleh L, Warnow T. Multispecies Coalescent: Theory and Applications in Phylogenetics. ANNUAL REVIEW OF ECOLOGY, EVOLUTION, AND SYSTEMATICS 2021. [DOI: 10.1146/annurev-ecolsys-012121-095340] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/09/2022]
Abstract
Species tree estimation is a basic part of many biological research projects, ranging from answering basic evolutionary questions (e.g., how did a group of species adapt to their environments?) to addressing questions in functional biology. Yet, species tree estimation is very challenging, due to processes such as incomplete lineage sorting, gene duplication and loss, horizontal gene transfer, and hybridization, which can make gene trees differ from each other and from the overall evolutionary history of the species. Over the last 10–20 years, there has been tremendous growth in methods and mathematical theory for estimating species trees and phylogenetic networks, and some of these methods are now in wide use. In this survey, we provide an overview of the current state of the art, identify the limitations of existing methods and theory, and propose additional research problems and directions.
Affiliation(s)
- Siavash Mirarab
- Electrical and Computer Engineering Department, University of California, San Diego, La Jolla, California 92093, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
17
Ortiz D, Pekár S, Dianat M. Phylogenomics and loci dropout patterns of deeply diverged Zodarion ant-eating spiders suggest a high potential of RAD-seq for genus-level spider phylogenetics. Cladistics 2021; 38:320-334. [PMID: 34699083 DOI: 10.1111/cla.12493] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 10/02/2021] [Indexed: 11/28/2022]
Abstract
RAD sequencing yields large amounts of genome-wide data at a relatively low cost and without requiring prior taxon-specific information, making it ideal for evolutionary studies of highly diversified and neglected organisms. However, concerns about information decay with phylogenetic distance have discouraged its use for assessing supraspecific relationships. Here, using double-digest restriction-site associated DNA (ddRAD) data, we present the first deep-level phylogenetic analysis of Zodarion, a highly diversified spider genus. We explore the impact of locus and taxon filtering across concatenation and multispecies coalescent reconstruction methods, and investigate the patterns of information dropout in relation to both divergence time and mitochondrial divergence between taxa. We found that relaxed locus-filtering and nested taxon-filtering strategies maximized the amount of molecular information and improved phylogenetic inference. As expected, there was a clear pattern of allele dropout towards deeper time and mitochondrial divergences, but the phylogenetic signal remained strong throughout the phylogeny. We therefore inferred topologies that were almost fully resolved, highly supported, and noticeably congruent across setups and inference methods, and these topologies highlight an overall inconsistency in the current taxonomy of Zodarion. Because Zodarion appears to be among the oldest and most mitochondrially diversified spider genera, our results suggest that ddRAD data show high potential for inferring intra-generic relationships across spiders and probably also in other taxonomic groups.
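Locus filtering of the kind explored here typically keeps only loci recovered in at least some minimum number of samples, with lower thresholds being more relaxed, and pairwise counts of shared loci give one simple view of dropout. A minimal sketch, assuming loci are given as a mapping from locus ID to the set of samples in which each was recovered; the thresholds, names, and toy data are hypothetical and do not reproduce the study's filtering settings.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

LocusTable = Dict[str, Set[str]]  # locus ID -> samples in which the locus was recovered


def filter_loci(loci: LocusTable, min_samples: int) -> LocusTable:
    """Keep loci present in at least `min_samples` samples (smaller = more relaxed)."""
    return {locus: samples for locus, samples in loci.items() if len(samples) >= min_samples}


def shared_loci(loci: LocusTable) -> Dict[Tuple[str, str], int]:
    """Count the loci shared by each pair of samples, a simple view of allele dropout."""
    samples = sorted(set().union(*loci.values()))
    return {
        (a, b): sum(1 for present in loci.values() if a in present and b in present)
        for a, b in combinations(samples, 2)
    }


loci = {
    "L1": {"sp1", "sp2", "sp3", "sp4"},
    "L2": {"sp1", "sp2"},
    "L3": {"sp1", "sp2", "sp3"},
    "L4": {"sp3", "sp4"},
}
print(sorted(filter_loci(loci, 3)))       # ['L1', 'L3']
print(shared_loci(loci)[("sp1", "sp2")])  # 3
```

In practice one would plot the shared-locus counts against divergence time or mitochondrial distance to see how dropout scales with phylogenetic depth.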
Affiliation(s)
- David Ortiz
- Department of Botany and Zoology, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czechia
- Stano Pekár
- Department of Botany and Zoology, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czechia
- Malahat Dianat
- Department of Botany and Zoology, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czechia
18
Hooft van Huysduynen A, Janssens S, Merckx V, Vos R, Valente L, Zizka A, Larter M, Karabayir B, Maaskant D, Witmer Y, Fernández‐Palacios JM, de Nascimento L, Jaén‐Molina R, Caujapé Castells J, Marrero‐Rodríguez Á, del Arco M, Lens F. Temporal and palaeoclimatic context of the evolution of insular woodiness in the Canary Islands. Ecol Evol 2021; 11:12220-12231. [PMID: 34522372 PMCID: PMC8427628 DOI: 10.1002/ece3.7986] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/28/2021] [Revised: 07/13/2021] [Accepted: 07/20/2021] [Indexed: 11/14/2022]
Abstract
Insular woodiness (IW), the evolutionary transition from herbaceousness toward woodiness on islands, has arisen more than 30 times on the Canary Islands (Atlantic Ocean). One IW hypothesis suggests that drought has been a major driver of wood formation, but the palaeoclimatic conditions in which the insular woody lineages originated are unknown. We therefore provide an updated review of the presence of IW on the Canaries, review the palaeoclimate, and estimate the timing of origin of woodiness in 24 insular woody lineages that represent a large majority of the insular woody species diversity on the Canaries. Our single, broad-scale dating analysis shows that woodiness in 60%-65% of the insular woody lineages studied originated within the last 3.2 Myr, during which Mediterranean seasonality (yearly summer droughts) became established on the Canaries. Consequently, our results are consistent with palaeoclimatic aridification as a potential driver of woodiness in a considerable proportion of the insular woody Canary Island lineages. However, the observed association between insular woodiness and palaeodrought during the last couple of million years could also have emerged as a result of the typically young age of the native insular flora, which is characterized by high turnover.
Affiliation(s)
- Steven Janssens
- Meise Botanic Garden, Meise, Belgium
- Department of Biology, KU Leuven, Leuven, Belgium
- Vincent Merckx
- Naturalis Biodiversity Center, Leiden, The Netherlands
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
- Rutger Vos
- Naturalis Biodiversity Center, Leiden, The Netherlands
- Luis Valente
- Naturalis Biodiversity Center, Leiden, The Netherlands
- Groningen Institute for Evolutionary Life Sciences, University of Groningen, Groningen, The Netherlands
- Alexander Zizka
- Naturalis Biodiversity Center, Leiden, The Netherlands
- German Center for Integrative Biodiversity Research (iDiv), Leipzig, Germany
- Youri Witmer
- Naturalis Biodiversity Center, Leiden, The Netherlands
- José María Fernández‐Palacios
- Island Ecology and Biogeography Research Group, Instituto Universitario de Enfermedades Tropicales y Salud Pública de Canarias, Universidad de La Laguna (ULL), La Laguna, Spain
- Lea de Nascimento
- Island Ecology and Biogeography Research Group, Instituto Universitario de Enfermedades Tropicales y Salud Pública de Canarias, Universidad de La Laguna (ULL), La Laguna, Spain
- Ruth Jaén‐Molina
- Jardín Botánico Canario “Viera y Clavijo”‐Unidad Asociada al CSIC (Cabildo de Gran Canaria), Las Palmas de Gran Canaria, Spain
- Juli Caujapé Castells
- Jardín Botánico Canario “Viera y Clavijo”‐Unidad Asociada al CSIC (Cabildo de Gran Canaria), Las Palmas de Gran Canaria, Spain
- Águedo Marrero‐Rodríguez
- Jardín Botánico Canario “Viera y Clavijo”‐Unidad Asociada al CSIC (Cabildo de Gran Canaria), Las Palmas de Gran Canaria, Spain
- Marcelino del Arco
- Departamento de Botánica, Ecología y Fisiología Vegetal, Universidad de La Laguna (ULL), La Laguna, Spain
- Frederic Lens
- Naturalis Biodiversity Center, Leiden, The Netherlands
- Institute of Biology Leiden, Plant Sciences, Leiden University, Leiden, The Netherlands
19
Thomas SK, Liu X, Du Z, Dong Y, Cummings A, Pokorny L, Xiang Q(J), Leebens‐Mack JH. Comprehending Cornales: phylogenetic reconstruction of the order using the Angiosperms353 probe set. AMERICAN JOURNAL OF BOTANY 2021; 108:1112-1121. [PMID: 34263456 PMCID: PMC8361741 DOI: 10.1002/ajb2.1696] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 10/19/2020] [Accepted: 05/12/2021] [Indexed: 05/08/2023]
Abstract
PREMISE Cornales is an order of flowering plants containing ecologically and horticulturally important families, including Cornaceae (dogwoods) and Hydrangeaceae (hydrangeas), among others. While many relationships in Cornales are strongly supported by previous studies, some uncertainty remains with regard to the placement of Hydrostachyaceae and to relationships among families in Cornales and within Cornaceae. Here we analyzed hundreds of nuclear loci to test published phylogenetic hypotheses and estimated a robust species tree for Cornales. METHODS Using the Angiosperms353 probe set and existing data sets, we generated phylogenomic data for 158 samples, representing all families in the Cornales, with intensive sampling in the Cornaceae. RESULTS We curated an average of 312 genes per sample, constructed maximum likelihood gene trees, and inferred a species tree using the summary approach implemented in ASTRAL-III, a method that is statistically consistent under the multispecies coalescent model. CONCLUSIONS The species tree we constructed generally shows high support values and a high degree of concordance among individual nuclear gene trees. Relationships among families are largely congruent with previous molecular studies, except for the placement of the nyssoids and the Grubbiaceae-Curtisiaceae clades. Furthermore, we were able to place Hydrostachyaceae within Cornales, and the monophyly of known morphogroups within Cornaceae was well supported. However, patterns of gene tree discordance suggest potential ancient reticulation, gene flow, and/or incomplete lineage sorting (ILS) in the Hydrostachyaceae lineage and the early diversification of Cornus. Our findings reveal new insights into the diversification process across Cornales and demonstrate the utility of the Angiosperms353 probe set.
Affiliation(s)
- Shawn K. Thomas
- Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
- Division of Biological Sciences, University of Missouri, Columbia, MO 65203, USA
- Xiang Liu
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC 27695, USA
- Syngenta, Research Triangle Park, NC 27709, USA
- Zhi‐Yuan Du
- Wuhan Botanical Garden, The Chinese Academy of Sciences, Wuhan, Hubei 430074, China
- Yibo Dong
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC 27695, USA
- Global Health Infectious Disease Research, College of Public Health, University of South Florida, Tampa, FL 33612, USA
- Amanda Cummings
- Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
- Lisa Pokorny
- Royal Botanic Gardens, Kew, Richmond, London TW9 3AE, UK
- Computational/Systems Biology and Genomics Program, Centre for Plant Biotechnology and Genomics, UPM‐INIA‐CSIC, Pozuelo de Alarcón (Madrid) 28223, Spain
- Qui‐Yun (Jenny) Xiang
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC 27695, USA
20
Baker WJ, Dodsworth S, Forest F, Graham SW, Johnson MG, McDonnell A, Pokorny L, Tate JA, Wicke S, Wickett NJ. Exploring Angiosperms353: An open, community toolkit for collaborative phylogenomic research on flowering plants. AMERICAN JOURNAL OF BOTANY 2021; 108:1059-1065. [PMID: 34293179 DOI: 10.1002/ajb2.1703] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 04/12/2021] [Accepted: 05/14/2021] [Indexed: 06/13/2023]
Affiliation(s)
- Steven Dodsworth
- School of Life Sciences, University of Bedfordshire, University Square, Luton, LU1 3JU, UK
- Félix Forest
- Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE, UK
- Sean W Graham
- Department of Botany, University of British Columbia, 6270 University Boulevard, Vancouver, British Columbia, V6T 1Z4, Canada
- Matthew G Johnson
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, 79409, USA
- Angela McDonnell
- Plant Science and Conservation, Chicago Botanic Garden, 1000 Lake Cook Road, Glencoe, IL, 60022, USA
- Lisa Pokorny
- Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE, UK
- Jennifer A Tate
- School of Fundamental Sciences, Massey University, Palmerston North, 4442, New Zealand
- Susann Wicke
- Plant Evolutionary Biology, Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Plant Systematics and Biodiversity, Institute for Biology, Humboldt-Universität zu Berlin, Berlin, Germany
- Norman J Wickett
- Plant Science and Conservation, Chicago Botanic Garden, 1000 Lake Cook Road, Glencoe, IL, 60022, USA
21
Mahbub M, Wahab Z, Reaz R, Rahman MS, Bayzid MS. wQFM: Highly Accurate Genome-scale Species Tree Estimation from Weighted Quartets. Bioinformatics 2021; 37:3734-3743. [PMID: 34086858 DOI: 10.1093/bioinformatics/btab428] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/30/2020] [Revised: 05/24/2021] [Accepted: 06/03/2021] [Indexed: 02/01/2023]
Abstract
MOTIVATION Species tree estimation from genes sampled throughout the whole genome is complicated by gene tree-species tree discordance. Incomplete lineage sorting (ILS) is one of the most frequent causes of this discordance, where alleles can coexist in populations for periods that may span several speciation events. Quartet-based summary methods for estimating species trees from a collection of gene trees are becoming popular due to their high accuracy and statistical guarantees under ILS. Generating quartets with appropriate weights, where weights correspond to the relative importance of quartets, and subsequently amalgamating the weighted quartets to infer a single coherent species tree can provide a statistically consistent way of estimating species trees. However, handling weighted quartets is challenging. RESULTS We propose wQFM, a highly accurate method for species tree estimation from multi-locus data, by extending the quartet FM (QFM) algorithm to a weighted setting. wQFM was assessed on a collection of simulated and real biological datasets, including the avian phylogenomic dataset, which is one of the largest phylogenomic datasets to date. We compared wQFM with wQMC, which is the best alternative method for weighted quartet amalgamation, and with ASTRAL, which is one of the most accurate and widely used coalescent-based species tree estimation methods. Our results suggest that wQFM matches or improves upon the accuracy of wQMC and ASTRAL. AVAILABILITY wQFM is available in open source form at https://github.com/Mahim1997/wQFM-2020. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
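A common way to weight quartets is to count how often each of the three possible topologies on four taxa appears across the input gene trees. The sketch below computes only such frequency weights; it assumes each gene tree is given as its leaf set plus its nontrivial bipartitions (one side of each split), and it implements neither wQFM's FM-based amalgamation step nor necessarily the weighting scheme wQFM itself uses.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Tuple

Topology = Tuple[Tuple[str, str], Tuple[str, str]]
GeneTree = Tuple[FrozenSet[str], List[FrozenSet[str]]]  # (leaf set, bipartitions as one side)


def induced_quartet(tree: GeneTree, quartet: Tuple[str, ...]) -> Optional[Topology]:
    """Return the topology ab|cd the tree induces on four taxa, or None if absent or unresolved."""
    leaves, splits = tree
    q = frozenset(quartet)
    if not q <= leaves:
        return None  # the gene tree is missing at least one of the four taxa
    for side in splits:
        inside = q & side
        if len(inside) == 2:  # this bipartition separates the quartet 2 | 2
            pair = tuple(sorted(inside))
            rest = tuple(sorted(q - inside))
            return tuple(sorted((pair, rest)))
    return None  # no bipartition splits the quartet: a polytomy in this gene tree


def quartet_weights(gene_trees: List[GeneTree], taxa) -> Dict[Tuple[str, ...], Counter]:
    """For every four-taxon subset, count how many gene trees support each topology."""
    weights = {}
    for quartet in combinations(sorted(taxa), 4):
        counts: Counter = Counter()
        for tree in gene_trees:
            topo = induced_quartet(tree, quartet)
            if topo is not None:
                counts[topo] += 1
        weights[quartet] = counts
    return weights


abcd = frozenset("abcd")
gene_trees = [
    (abcd, [frozenset("ab")]),  # ((a,b),(c,d))
    (abcd, [frozenset("ab")]),  # ((a,b),(c,d))
    (abcd, [frozenset("ac")]),  # ((a,c),(b,d))
]
w = quartet_weights(gene_trees, "abcd")
print(w[("a", "b", "c", "d")].most_common(1))  # [((('a', 'b'), ('c', 'd')), 2)]
```

Enumerating all four-taxon subsets is O(n^4) in the number of taxa, which is one reason practical tools avoid materializing every quartet explicitly.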
Affiliation(s)
- Mahim Mahbub
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- Zahin Wahab
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- Rezwana Reaz
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- M Saifur Rahman
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
22
Farah IT, Islam MM, Zinat KT, Rahman AH, Bayzid MS. Species tree estimation from gene trees by minimizing deep coalescence and maximizing quartet consistency: a comparative study and the presence of pseudo species tree terraces. Syst Biol 2021; 70:1213-1231. [PMID: 33844023 DOI: 10.1093/sysbio/syab026] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/31/2020] [Revised: 03/25/2021] [Accepted: 03/29/2021] [Indexed: 11/14/2022]
Abstract
Species tree estimation from multi-locus datasets is extremely challenging, especially in the presence of gene tree heterogeneity across the genome due to incomplete lineage sorting (ILS). Summary methods estimate gene trees and then combine them into a species tree by optimizing one of various optimality criteria. In this study, we have extended and adapted the concept of phylogenetic terraces to species tree estimation by "summarizing" a set of gene trees, where multiple species trees with distinct topologies may have exactly the same optimality score (e.g., quartet score or extra lineage score). In particular, we investigated the presence and impact of equally optimal trees in species tree estimation from multi-locus data using summary methods that take ILS into account. We analyzed two of the most popular ILS-aware optimization criteria: maximize quartet consistency (MQC) and minimize deep coalescence (MDC). Methods based on MQC are provably statistically consistent, whereas MDC is not a consistent criterion for species tree estimation. We present a comprehensive comparative study of these two optimality criteria. Our experiments on a collection of datasets simulated under ILS indicate that MDC may yield quartet consistency scores competitive with, or identical to, those of MQC, yet produce significantly less accurate trees, demonstrating the presence and impact of equally optimal species trees. This is the first known study that provides the conditions for datasets to have equally optimal trees in the context of phylogenomic inference using summary methods.
Affiliation(s)
- Ishrat Tanzila Farah
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- Md Muktadirul Islam
- Applied Statistics and Data Science (ASDS), Department of Statistics, Jahangirnagar University, Dhaka-1342, Bangladesh
- Kazi Tasnim Zinat
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- Department of Computer Science, University of Maryland, College Park, Maryland, USA
- Atif Hasan Rahman
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
23
Freitas FV, Branstetter MG, Griswold T, Almeida EAB. Partitioned Gene-Tree Analyses and Gene-Based Topology Testing Help Resolve Incongruence in a Phylogenomic Study of Host-Specialist Bees (Apidae: Eucerinae). Mol Biol Evol 2021; 38:1090-1100. [PMID: 33179746 PMCID: PMC7947843 DOI: 10.1093/molbev/msaa277] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 02/05/2023]
Abstract
Incongruence among phylogenetic results has become a common occurrence in analyses of genome-scale data sets. Incongruence originates from uncertainty in underlying evolutionary processes (e.g., incomplete lineage sorting) and from difficulties in determining the best analytical approaches for each situation. To overcome these difficulties, more studies are needed that identify incongruences and demonstrate practical ways to confidently resolve them. Here, we present results of a phylogenomic study based on the analysis of 197 taxa and 2,526 ultraconserved element (UCE) loci. We investigate evolutionary relationships of Eucerinae, a diverse subfamily of apid bees (relatives of honey bees and bumble bees) with >1,200 species. We sampled representatives of all tribes within the group and >80% of genera, including two mysterious South American genera, Chilimalopsis and Teratognatha. Initial analysis of the UCE data revealed two conflicting hypotheses for relationships among tribes. To resolve the incongruence, we tested concatenation and species tree approaches and used a variety of additional strategies, including locus filtering, partitioned gene-tree searches, and gene-based topological tests. We show that within-locus partitioning improves gene tree and subsequent species tree estimation, and that this approach confidently resolves the incongruence observed in our data set. After exploring our proposed analytical strategy on eucerine bees, we validated its efficacy in resolving hard phylogenetic problems by applying it to a published UCE data set of Adephaga (Insecta: Coleoptera). Our results provide a robust phylogenetic hypothesis for Eucerinae and demonstrate a practical strategy for resolving incongruence in other phylogenomic data sets.
Affiliation(s)
- Felipe V Freitas
- Laboratório de Biologia Comparada e Abelhas (LBCA), Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- U.S. Department of Agriculture, Agricultural Research Service (USDA-ARS), Pollinating Insects Research Unit, Utah State University, Logan, UT
- Michael G Branstetter
- U.S. Department of Agriculture, Agricultural Research Service (USDA-ARS), Pollinating Insects Research Unit, Utah State University, Logan, UT
- Terry Griswold
- U.S. Department of Agriculture, Agricultural Research Service (USDA-ARS), Pollinating Insects Research Unit, Utah State University, Logan, UT
- Eduardo A B Almeida
- Laboratório de Biologia Comparada e Abelhas (LBCA), Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
24
|
Shen XX, Steenwyk JL, Rokas A. Dissecting incongruence between concatenation- and quartet-based approaches in phylogenomic data. Syst Biol 2021; 70:997-1014. [PMID: 33616672 DOI: 10.1093/sysbio/syab011] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 01/08/2020] [Revised: 02/10/2021] [Accepted: 02/17/2021] [Indexed: 12/12/2022]
Abstract
Topological conflict or incongruence is widespread in phylogenomic data. Concatenation- and coalescent-based approaches often result in incongruent topologies, but the causes of this conflict can be difficult to characterize. We examined incongruence stemming from conflict between likelihood-based signal (quantified by the difference in gene-wise log likelihood score or ΔGLS) and quartet-based topological signal (quantified by the difference in gene-wise quartet score or ΔGQS) for every gene in three phylogenomic studies in animals, fungi, and plants, which were chosen because their concatenation-based IQ-TREE (T1) and quartet-based ASTRAL (T2) phylogenies are known to produce eight conflicting internal branches (bipartitions). By comparing the types of phylogenetic signal for all genes in these three data matrices, we found that 30% - 36% of genes in each data matrix are inconsistent, that is, each of these genes has higher log likelihood score for T1 versus T2 (i.e., ΔGLS >0) whereas its T1 topology has lower quartet score than its T2 topology (i.e., ΔGQS <0) or vice versa. Comparison of inconsistent and consistent genes using a variety of metrics (e.g., evolutionary rate, gene tree topology, distribution of branch lengths, hidden paralogy, and gene tree discordance) showed that inconsistent genes are more likely to recover neither T1 nor T2 and have higher levels of gene tree discordance than consistent genes. Simulation analyses demonstrate that removal of inconsistent genes from datasets with low levels of incomplete lineage sorting (ILS) and low and medium levels of gene tree estimation error (GTEE) reduced incongruence and increased accuracy. In contrast, removal of inconsistent genes from datasets with medium and high ILS levels and high GTEE levels eliminated or extensively reduced incongruence, but the resulting congruent species phylogenies were not always topologically identical to the true species trees.
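The consistency test described in this abstract reduces to comparing the signs of two per-gene differences. A minimal sketch, assuming the gene-wise log-likelihoods and quartet scores under T1 and T2 have already been computed; the record layout and the treatment of zero differences (labelled "undecided" here) are assumptions not specified in the abstract.

```python
from typing import List, NamedTuple


class GeneSignal(NamedTuple):
    gene: str
    lnl_t1: float  # gene-wise log-likelihood under T1 (concatenation tree)
    lnl_t2: float  # gene-wise log-likelihood under T2 (quartet-based tree)
    gqs_t1: float  # gene-wise quartet score of T1
    gqs_t2: float  # gene-wise quartet score of T2


def classify(g: GeneSignal) -> str:
    d_gls = g.lnl_t1 - g.lnl_t2  # ΔGLS > 0: likelihood favors T1
    d_gqs = g.gqs_t1 - g.gqs_t2  # ΔGQS > 0: quartet score favors T1
    if d_gls == 0 or d_gqs == 0:
        return "undecided"       # assumption: ties are not assigned to either class
    return "consistent" if (d_gls > 0) == (d_gqs > 0) else "inconsistent"


genes: List[GeneSignal] = [
    GeneSignal("g1", -1000.0, -1004.0, 120, 110),  # both signals favor T1
    GeneSignal("g2", -2000.0, -1998.0, 300, 320),  # both signals favor T2
    GeneSignal("g3", -1500.0, -1503.0, 80, 95),    # likelihood favors T1, quartets favor T2
]
print({g.gene: classify(g) for g in genes})
# {'g1': 'consistent', 'g2': 'consistent', 'g3': 'inconsistent'}
```

Removing the "inconsistent" genes from a data matrix would then be a simple filter on this label, mirroring the removal experiment the abstract describes.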
Affiliation(s)
- Xing-Xing Shen
- State Key Laboratory of Rice Biology and Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, Hangzhou, China
- Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Jacob L Steenwyk
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
25
|
Meng KK, Chen SF, Xu KW, Zhou RC, Li MW, Dhamala MK, Liao WB, Fan Q. Phylogenomic analyses based on genome-skimming data reveal cyto-nuclear discordance in the evolutionary history of Cotoneaster (Rosaceae). Mol Phylogenet Evol 2021; 158:107083. [PMID: 33516804 DOI: 10.1016/j.ympev.2021.107083] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 07/06/2020] [Revised: 12/16/2020] [Accepted: 01/12/2021] [Indexed: 11/19/2022]
Abstract
As a consequence of hybridization, polyploidization, and apomixis, the genus Cotoneaster (Rosaceae), with ca. 370 species classified into two subgenera and several sections, represents one of the most complicated and controversial lineages in Rosaceae and is notorious for its taxonomic difficulty. The infrageneric relationships and taxonomy of Cotoneaster have remained poorly understood. Previous studies focused mainly on natural hybridization involving only a few species, and on phylogenies based on very limited markers. In the present study, complete chloroplast genome sequences and 204 low-copy nuclear genes from 72 accessions, representing 69 ingroup species, were used to conduct the most comprehensive phylogenetic analysis of Cotoneaster to date. Our analyses of the complete chloroplast genomes and of the nuclear genes yielded two robust phylogenetic trees. Both the chloroplast genome and nuclear data confidently resolved the genus into two major clades, which largely supported the current classification based on morphological evidence. However, conflicts between the chloroplast genome and low-copy nuclear phylogenies were observed at both the species and clade levels. This cyto-nuclear discordance could be caused by frequent hybridization events and incomplete lineage sorting (ILS). In addition, our divergence-time analysis revealed an evolutionary radiation of the genus from the late Miocene to the present.
Affiliation(s)
- Kai-Kai Meng
- State Key Laboratory of Biocontrol and Guangdong Provincial Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
- Su-Fang Chen
- State Key Laboratory of Biocontrol and Guangdong Provincial Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
- Ke-Wang Xu
- Co-Innovation Center for Sustainable Forestry in Southern China, College of Biology and the Environment, Nanjing Forestry University, Nanjing 210037, China
- Ren-Chao Zhou
- State Key Laboratory of Biocontrol and Guangdong Provincial Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
- Ming-Wan Li
- College of Forestry, Henan Agricultural University, Zhengzhou 450002, China
- Man Kumar Dhamala
- Central Department of Environmental Science, Tribhuvan University, Kirtipur, Kathmandu, Nepal
- Wen-Bo Liao
- State Key Laboratory of Biocontrol and Guangdong Provincial Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
- Qiang Fan
- State Key Laboratory of Biocontrol and Guangdong Provincial Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China

26

Legried B, Molloy EK, Warnow T, Roch S. Polynomial-Time Statistical Estimation of Species Trees Under Gene Duplication and Loss. J Comput Biol 2020; 28:452-468. [DOI: 10.1089/cmb.2020.0424] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 02/06/2023]
Affiliation(s)
- Brandon Legried
- Department of Mathematics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Erin K. Molloy
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
- Sébastien Roch
- Department of Mathematics, University of Wisconsin-Madison, Madison, Wisconsin, USA

27

Chan KO, Hutter CR, Wood PL, Grismer LL, Das I, Brown RM. Gene flow creates a mirage of cryptic species in a Southeast Asian spotted stream frog complex. Mol Ecol 2020; 29:3970-3987. [PMID: 32808335 DOI: 10.1111/mec.15603] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 11/06/2019] [Revised: 07/29/2020] [Accepted: 08/13/2020] [Indexed: 02/06/2023]
Abstract
Most new cryptic species are described using conventional tree- and distance-based species delimitation methods (SDMs), which rely on phylogenetic arrangements and measures of genetic divergence. However, although numerous factors such as population structure and gene flow are known to confound phylogenetic inference and species delimitation, the influence of these processes is not frequently evaluated. Using large numbers of exons, introns, and ultraconserved elements obtained using the FrogCap sequence-capture protocol, we compared conventional SDMs with more robust genomic analyses that assess population structure and gene flow to characterize species boundaries in a Southeast Asian frog complex (Pulchrana picturata). Our results showed that gene flow and introgression can produce phylogenetic patterns and levels of divergence that resemble distinct species (up to 10% divergence in mitochondrial DNA). Hybrid populations were inferred as independent (singleton) clades that were highly divergent from adjacent populations (7%-10%) and unusually similar (<3%) to allopatric populations. Such anomalous patterns are not uncommon in Southeast Asian amphibians, which brings into question whether the high levels of cryptic diversity observed in other amphibian groups reflect distinct cryptic species or, instead, highly admixed and structured metapopulation lineages. Our results also provide an alternative explanation to the conundrum of divergent (sometimes nonsister) sympatric lineages, a pattern that has been celebrated as indicative of true cryptic speciation. Based on these findings, we recommend that species delimitation of continuously distributed "cryptic" groups should not rely solely on conventional SDMs, but should necessarily examine population structure and gene flow to avoid taxonomic inflation.
Affiliation(s)
- Kin O Chan
- Lee Kong Chian Natural History Museum, Faculty of Science, National University of Singapore, Singapore
- Carl R Hutter
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA; Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- Perry L Wood
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA; Department of Biological Sciences & Museum of Natural History, Auburn University, Auburn, AL, USA
- L L Grismer
- Herpetology Laboratory, Department of Biology, La Sierra University, Riverside, CA, USA
- Indraneil Das
- Institute of Biodiversity and Environmental Conservation, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
- Rafe M Brown
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA

28

Chan KO, Hutter CR, Wood PL, Grismer LL, Brown RM. Larger, unfiltered datasets are more effective at resolving phylogenetic conflict: Introns, exons, and UCEs resolve ambiguities in Golden-backed frogs (Anura: Ranidae; genus Hylarana). Mol Phylogenet Evol 2020; 151:106899. [PMID: 32590046 DOI: 10.1016/j.ympev.2020.106899] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 03/03/2020] [Revised: 05/18/2020] [Accepted: 06/17/2020] [Indexed: 01/01/2023]
Abstract
Using FrogCap, a recently developed sequence-capture protocol, we obtained >12,000 highly informative exons, introns, and ultraconserved elements (UCEs), which we used to illustrate variation in the evolutionary histories of these classes of markers and to resolve long-standing systematic problems in Southeast Asian Golden-backed frogs of the genus-complex Hylarana. We also performed a comprehensive suite of analyses to assess the relative performance of different genetic markers, data filtering strategies, tree inference methods, and measures of branch support. To reduce gene tree estimation error, we filtered the data using different thresholds of taxon completeness (missing data) and parsimony-informative sites (PIS). We then estimated species trees using concatenated datasets and maximum likelihood (IQ-TREE) in addition to summary (ASTRAL-III), distance-based (ASTRID), and site-based (SVDQuartets) multispecies coalescent methods. Topological congruence and branch support were examined using traditional bootstrap, local posterior probabilities, gene concordance factors, quartet frequencies, and quartet scores. Our results did not yield a single concordant topology. Instead, introns, exons, and UCEs clearly possessed different phylogenetic signals, resulting in conflicting, yet strongly supported, phylogenetic estimates. However, a combined analysis comprising the most informative introns, exons, and UCEs converged on a similar topology across all analyses except SVDQuartets. Bootstrap values were consistently high despite high levels of incongruence and high proportions of gene trees supporting conflicting topologies. Although low bootstrap values did indicate low heuristic support, high bootstrap support did not necessarily reflect congruence or support for the correct topology.
This study reiterates findings of some previous studies, which demonstrated that traditional bootstrap values can produce positively misleading measures of support in large phylogenomic datasets. We also showed a remarkably strong positive relationship between branch length and topological congruence across all datasets, implying that very short internodes remain a challenge to resolve, even with orders of magnitude more data than ever before. Overall, our results demonstrate that more data from unfiltered or combined datasets produced superior results. Although data filtering reduced gene tree incongruence, decreased amounts of data also biased phylogenetic estimation. A point of diminishing returns was evident, at which higher congruence (from more stringent filtering) at the expense of amount of data led to topological error as assessed by comparison to more complete datasets across different genomic markers. Additionally, we showed that applying a parameter-rich model to a partitioned analysis of concatenated data produces better results compared to unpartitioned, or even partitioned analysis using model selection. Despite some lingering uncertainties, a combined analysis of our genomic data and sequences supplemented from GenBank (on the basis of a few gene regions) revealed highly supported novel systematic arrangements. Based on these new findings, we transfer Amnirana nicobariensis into the genus Indosylvirana; and I. milleti and Hylarana celebensis to the genus Papurana. We also provisionally place H. attigua in the genus Papurana pending verification from positively identified (voucher substantiated) samples.
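The two filtering criteria described in this abstract, taxon completeness and the number of parsimony-informative sites (PIS), can both be computed directly from a locus alignment. The Python sketch below is illustrative only: the function names, the set of missing-data characters, and the default thresholds (`min_completeness`, `min_pis`) are assumptions, not the values or code used in the study. It counts a site as parsimony-informative when at least two character states each occur in at least two taxa, and it assumes the sequences are already aligned to equal length.

```python
from collections import Counter

# Assumption: gaps, unknowns, and ambiguous nucleotides count as missing data.
MISSING = set("-?N")


def parsimony_informative_sites(alignment: dict[str, str]) -> int:
    """Count columns where at least two states each occur in at least two taxa."""
    seqs = [s.upper() for s in alignment.values()]
    pis = 0
    for column in zip(*seqs):  # assumes equal-length aligned sequences
        states = Counter(c for c in column if c not in MISSING)
        if sum(1 for n in states.values() if n >= 2) >= 2:
            pis += 1
    return pis


def taxon_completeness(alignment: dict[str, str], all_taxa: list[str]) -> float:
    """Fraction of all taxa with at least one non-missing character in this locus."""
    present = sum(
        1 for t in all_taxa
        if any(c not in MISSING for c in alignment.get(t, "").upper())
    )
    return present / len(all_taxa)


def passes_filter(alignment: dict[str, str], all_taxa: list[str],
                  min_completeness: float = 0.75, min_pis: int = 10) -> bool:
    """Hypothetical filter: keep a locus only if it meets both thresholds."""
    return (taxon_completeness(alignment, all_taxa) >= min_completeness
            and parsimony_informative_sites(alignment) >= min_pis)


locus = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCG-"}
taxa = ["A", "B", "C", "D", "E"]
print(parsimony_informative_sites(locus))
print(taxon_completeness(locus, taxa))
print(passes_filter(locus, taxa, min_completeness=0.8, min_pis=1))
```

In this toy locus only the first column is parsimony-informative (A and T each appear twice), and taxon E is absent, so completeness is 4 of 5 taxa. Raising either threshold trades data volume for stricter filtering, which is the trade-off the abstract reports as a point of diminishing returns.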
Affiliation(s)
- Kin Onn Chan
- Lee Kong Chian Natural History Museum, Faculty of Science, National University of Singapore, 2 Conservatory Drive, 117377, Singapore
- Carl R Hutter
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA; Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Perry L Wood
- Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA; Department of Biological Sciences & Museum of Natural History, Auburn University, Auburn, AL 36849, USA
- L Lee Grismer
- Herpetology Laboratory, Department of Biology, La Sierra University, 4500 Riverwalk Parkway, Riverside, CA 92505, USA
- Rafe M Brown
- Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA

29

Wong GKS, Soltis DE, Leebens-Mack J, Wickett NJ, Barker MS, Van de Peer Y, Graham SW, Melkonian M. Sequencing and Analyzing the Transcriptomes of a Thousand Species Across the Tree of Life for Green Plants. ANNUAL REVIEW OF PLANT BIOLOGY 2020; 71:741-765. [PMID: 31851546 DOI: 10.1146/annurev-arplant-042916-041040] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Indexed: 05/23/2023]
Abstract
The 1,000 Plants (1KP) initiative was the first large-scale effort to collect next-generation sequencing (NGS) data across a phylogenetically representative sampling of species for a major clade of life, in this case the Viridiplantae, or green plants. As an international multidisciplinary consortium, we focused on plant evolution and its practical implications. Among the major outcomes were the inference of a reference species tree for green plants by phylotranscriptomic analysis of low-copy genes, a survey of paleopolyploidy (whole-genome duplications) across the Viridiplantae, the inferred evolutionary histories for many gene families and biological processes, the discovery of novel light-sensitive proteins for optogenetic studies in mammalian neuroscience, and elucidation of the genetic network for a complex trait (C4 photosynthesis). Altogether, 1KP demonstrated how value can be extracted from a phylodiverse sequencing data set, providing a template for future projects that aim to generate even more data, including complete de novo genomes, across the tree of life.
Affiliation(s)
- Gane Ka-Shu Wong
- Department of Biological Sciences and Department of Medicine, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
- BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Douglas E Soltis
- Florida Museum of Natural History, Gainesville, Florida 32611, USA
- Department of Biology, University of Florida, Gainesville, Florida 32611, USA
- Jim Leebens-Mack
- Department of Plant Biology, University of Georgia, Athens, Georgia 30602, USA
- Norman J Wickett
- Negaunee Institute for Plant Conservation Science and Action, Chicago Botanic Garden, Glencoe, Illinois 60022, USA
- Michael S Barker
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721, USA
- Yves Van de Peer
- Department of Plant Biotechnology and Bioinformatics, VIB Center for Plant Systems Biology, Ghent University, 9052 Ghent, Belgium
- Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria 0028, South Africa
- Sean W Graham
- Department of Botany, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Michael Melkonian
- Faculty of Biology, University of Duisburg-Essen, D-45141 Essen, Germany

30

Hu Y, Xing W, Hu Z, Liu G. Phylogenetic Analysis and Substitution Rate Estimation of Colonial Volvocine Algae Based on Mitochondrial Genomes. Genes (Basel) 2020; 11:genes11010115. [PMID: 31968709 PMCID: PMC7016891 DOI: 10.3390/genes11010115] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/11/2019] [Revised: 01/13/2020] [Accepted: 01/15/2020] [Indexed: 01/30/2023]
Abstract
We sequenced the mitochondrial genomes of six colonial volvocine algae: Pandorina morum, Pandorina colemaniae, Volvulina compacta, Colemanosphaera angeleri, Colemanosphaera charkowiensis, and Yamagishiella unicocca. Previous studies have typically reconstructed the phylogenetic relationships among colonial volvocine algae based on chloroplast or nuclear genes. Here, we explore the validity of phylogenetic analysis based on mitochondrial protein-coding genes. We found phylogenetic incongruence in the genera Yamagishiella and Colemanosphaera. In Yamagishiella, stochastic error and the linkage group formed by the mitochondrial protein-coding genes prevent phylogenetic analyses from reflecting the true relationship. In Colemanosphaera, a different reconstruction approach revealed a different phylogenetic relationship. This incongruence may be due to biological factors such as incomplete lineage sorting or horizontal gene transfer. We also compared substitution rates in the mitochondrial and chloroplast genomes of colonial volvocine algae. All volvocine species showed significantly higher substitution rates in the mitochondrial genome than in the chloroplast genome. The ratio of nonsynonymous (dN) to synonymous (dS) substitutions is similar in the genomes of both organelles in most volvocine species, suggesting that the two are under similar selection pressure. We also identified a few chloroplast protein-coding genes with high dN/dS ratios in some species, resulting in a significant difference in dN/dS ratio between the mitochondrial and chloroplast genomes.
Affiliation(s)
- Yuxin Hu
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- School of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Weiyue Xing
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- School of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Zhengyu Hu
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Guoxiang Liu
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China
- Correspondence: Tel.: +86-027-6878-0576

31

Christensen S, Molloy EK, Vachaspati P, Yammanuru A, Warnow T. Non-parametric correction of estimated gene trees using TRACTION. Algorithms Mol Biol 2020; 15:1. [PMID: 31911812 PMCID: PMC6942343 DOI: 10.1186/s13015-019-0161-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/09/2019] [Accepted: 12/18/2019] [Indexed: 11/16/2022]
Abstract
Motivation: Estimated gene trees are often inaccurate, due to insufficient phylogenetic signal in the single-gene alignment, among other causes. Gene tree correction aims to improve the accuracy of an estimated gene tree by using computational techniques along with auxiliary information, such as a reference species tree or sequencing data. However, gene trees and species trees can differ as a result of gene duplication and loss (GDL), incomplete lineage sorting (ILS), and other biological processes. Thus, gene tree correction methods need to take estimation error as well as gene tree heterogeneity into account. Many prior gene tree correction methods have been developed for the case where GDL is present.
Results: Here, we study the problem of gene tree correction where gene tree heterogeneity is instead due to ILS and/or horizontal gene transfer (HGT). We introduce TRACTION, a simple polynomial-time method that provably finds an optimal solution to the RF-optimal tree refinement and completion (RF-OTRC) problem, which seeks a refinement and completion of a singly-labeled gene tree with respect to a given singly-labeled species tree so as to minimize the Robinson-Foulds (RF) distance. Our extensive simulation study on 68,000 estimated gene trees shows that TRACTION matches or improves on the accuracy of well-established methods from the GDL literature when HGT and ILS are both present, and ties for best under the ILS-only conditions. Furthermore, TRACTION ties for fastest on these datasets. We also show that a naive generalization of the RF-OTRC problem to multi-labeled trees is possible, but can produce misleading results where gene tree heterogeneity is due to GDL.
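The objective that TRACTION minimizes is the Robinson-Foulds (RF) distance, i.e., the number of bipartitions present in one tree but not the other. The Python sketch below computes only that distance for two unrooted, singly-labeled trees on the same leaf set; it is not TRACTION's refinement-and-completion algorithm. Trees are written as nested tuples of leaf names (a representation assumed here for illustration), the function names are hypothetical, and each bipartition is stored as the side that excludes a fixed reference leaf so that identical splits compare equal regardless of where the tree is rooted.

```python
from typing import Union

# A tree is either a leaf name or a tuple of subtrees (assumed nested-tuple format).
Tree = Union[str, tuple]


def leaf_set(tree: Tree) -> frozenset:
    """All leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree: Tree) -> set:
    """Non-trivial splits of an unrooted tree, each stored as the side without a reference leaf."""
    leaves = leaf_set(tree)
    ref = min(leaves)
    splits = set()

    def walk(node: Tree) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = clade if ref not in clade else leaves - clade
        # Skip trivial splits (one leaf on a side) and the empty split at the root.
        if 2 <= len(side) <= len(leaves) - 2:
            splits.add(side)
        return clade

    walk(tree)
    return splits


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Number of bipartitions found in exactly one of the two trees."""
    if leaf_set(t1) != leaf_set(t2):
        raise ValueError("trees must be labeled with the same leaf set")
    return len(bipartitions(t1) ^ bipartitions(t2))


species = (("A", "B"), "C", ("D", "E"))
gene = (("A", "C"), "B", ("D", "E"))
print(rf_distance(species, gene))
```

The two example trees share the split DE|ABC but differ in AB|CDE versus AC|BDE, giving an RF distance of 2. An unresolved gene tree such as `("A", "B", "C", ("D", "E"))` simply contributes fewer bipartitions, which is why refining a polytomy toward the species tree can lower the distance.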

32

Polynomial-Time Statistical Estimation of Species Trees Under Gene Duplication and Loss. LECTURE NOTES IN COMPUTER SCIENCE 2020. [DOI: 10.1007/978-3-030-45257-5_8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]

33

Barylski J, Enault F, Dutilh BE, Schuller MBP, Edwards RA, Gillis A, Klumpp J, Knezevic P, Krupovic M, Kuhn JH, Lavigne R, Oksanen HM, Sullivan MB, Jang HB, Simmonds P, Aiewsakun P, Wittmann J, Tolstoy I, Brister JR, Kropinski AM, Adriaenssens EM. Analysis of Spounaviruses as a Case Study for the Overdue Reclassification of Tailed Phages. Syst Biol 2020; 69:110-123. [PMID: 31127947 PMCID: PMC7409376 DOI: 10.1093/sysbio/syz036] [Citation(s) in RCA: 83] [Impact Index Per Article: 16.6] [Received: 02/07/2018] [Accepted: 05/17/2019] [Indexed: 01/01/2023]
Abstract
Tailed bacteriophages are the most abundant and diverse viruses in the world, with genome sizes ranging from 10 kbp to over 500 kbp. Yet, for historical reasons, all this diversity is confined to a single virus order, Caudovirales, composed of just four families: Myoviridae, Siphoviridae, Podoviridae, and the newly created Ackermannviridae. In recent years, this morphology-based classification scheme has started to crumble under the constant flood of phage sequences, revealing that tailed phages are even more genetically diverse than once thought. This prompted us, the Bacterial and Archaeal Viruses Subcommittee of the International Committee on Taxonomy of Viruses (ICTV), to consider an overall reorganization of phage taxonomy. In this study, we used a wide range of complementary methods, including comparative genomics, core genome analysis, and marker gene phylogenetics, to show that the group of Bacillus phage SPO1-related viruses previously classified into the subfamily Spounavirinae is clearly distinct from other members of the family Myoviridae and that its diversity deserves the rank of an autonomous family. Thus, we removed this group from the family Myoviridae and created the family Herelleviridae, a new taxon of the same rank. In the process of evaluating the taxon, we explored the feasibility of different demarcation criteria and critically evaluated the usefulness of our methods for phage classification. The convergence of results, which draws a consistent and comprehensive picture of a new family with associated subfamilies regardless of method, demonstrates that the tools applied here are particularly useful in phage taxonomy. We are convinced that the creation of this novel family is a crucial milestone toward the much-needed reclassification of the order Caudovirales.
Affiliation(s)
- Jakub Barylski
- Department of Molecular Virology, Institute of Experimental Biology, Faculty of Biology, Adam Mickiewicz University in Poznań, Collegium Biologicum - Umultowska 89, 61-614 Poznań, Poland
- François Enault
- Université Clermont Auvergne, CNRS, LMGE, F-63000 Clermont-Ferrand, France
- Bas E Dutilh
- Theoretical Biology and Bioinformatics, Department of Biology, Science for Life, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands
- Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands
- Margo BP Schuller
- Theoretical Biology and Bioinformatics, Department of Biology, Science for Life, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands
- Robert A Edwards
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA
- Department of Computer Science, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182, USA
- Annika Gillis
- Laboratory of Food and Environmental Microbiology, Université Catholique de Louvain, Croix du Sud 2-L7.05.12, 1348 Louvain-la-Neuve, Belgium
- Jochen Klumpp
- Institute of Food, Nutrition and Health, ETH Zurich, Schmelzbergstrasse 7, 8092 Zurich, Switzerland
- Petar Knezevic
- Department of Biology and Ecology, Faculty of Sciences, University of Novi Sad, Novi Sad, Serbia
- Mart Krupovic
- Unité Biologie Moléculaire du Gène chez les Extrêmophiles, Institut Pasteur, 25 rue du Dr. Roux, 75015 Paris, France
- Jens H Kuhn
- Integrated Research Facility at Fort Detrick, Division of Clinical Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, B-8200 Research Plaza, Fort Detrick, Frederick, MD 21702, USA
- Rob Lavigne
- Laboratory of Gene Technology, Department of Biosystems, KU Leuven, Kasteelpark Arenberg 21 - box 2462, 3001 Leuven, Belgium
- Hanna M Oksanen
- Molecular and Integrative Biosciences Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, P.O. Box 56 (Viikinkaari 9B), 00014 Helsinki, Finland
- Matthew B Sullivan
- Department of Microbiology, The Ohio State University, 496 W 12th Avenue, Columbus, OH 43210, USA
- Department of Civil, Environmental, and Geodetic Engineering, The Ohio State University, 496 W 12th Avenue, Columbus, OH 43210, USA
- Ho Bin Jang
- Department of Microbiology, The Ohio State University, 496 W 12th Avenue, Columbus, OH 43210, USA
- Department of Civil, Environmental, and Geodetic Engineering, The Ohio State University, 496 W 12th Avenue, Columbus, OH 43210, USA
- Peter Simmonds
- Nuffield Department of Medicine, University of Oxford, Peter Medawar Building, South Parks Road, Oxford OX1 3SY, UK
- Pakorn Aiewsakun
- Nuffield Department of Medicine, University of Oxford, Peter Medawar Building, South Parks Road, Oxford OX1 3SY, UK
- Department of Microbiology, Faculty of Science, Mahidol University, Bangkok 10400, Thailand
- Johannes Wittmann
- Leibniz-Institut DSMZ-German Collection of Microorganisms and Cell Cultures, Inhoffenstr. 7B, 38124 Braunschweig, Germany
- Igor Tolstoy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
- J Rodney Brister
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Andrew M Kropinski
- Department of Food Science, University of Guelph, Guelph, Ontario, Canada
- Department of Pathobiology, University of Guelph, 50 Stone Road E, Guelph, Ontario N1G 2W1, Canada
- Evelien M Adriaenssens
- Department of Functional & Comparative Genomics, Institute of Integrative Biology, University of Liverpool, Biosciences Building, Crown Street, Liverpool L69 7ZB, UK
- Gut Microbes & Health Institute Strategic Programme, Quadram Institute Bioscience, Norwich Research Park, James Watson Road, Norwich NR4 7UQ, UK

34

Zhu Q, Mai U, Pfeiffer W, Janssen S, Asnicar F, Sanders JG, Belda-Ferre P, Al-Ghalith GA, Kopylova E, McDonald D, Kosciolek T, Yin JB, Huang S, Salam N, Jiao JY, Wu Z, Xu ZZ, Cantrell K, Yang Y, Sayyari E, Rabiee M, Morton JT, Podell S, Knights D, Li WJ, Huttenhower C, Segata N, Smarr L, Mirarab S, Knight R. Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea. Nat Commun 2019; 10:5477. [PMID: 31792218 PMCID: PMC6889312 DOI: 10.1038/s41467-019-13443-4] [Citation(s) in RCA: 192] [Impact Index Per Article: 32.0] [Received: 06/08/2019] [Accepted: 11/06/2019] [Indexed: 11/10/2022]
Abstract
Rapid growth of genome data provides opportunities for updating microbial evolutionary relationships, but this is challenged by the discordant evolution of individual genes. Here we build a reference phylogeny of 10,575 evenly-sampled bacterial and archaeal genomes, based on a comprehensive set of 381 markers, using multiple strategies. Our trees indicate remarkably closer evolutionary proximity between Archaea and Bacteria than previous estimates that were limited to fewer "core" genes, such as the ribosomal proteins. The robustness of the results was tested with respect to several variables, including taxon and site sampling, amino acid substitution heterogeneity and saturation, non-vertical evolution, and the impact of exclusion of candidate phyla radiation (CPR) taxa. Our results provide an updated view of domain-level relationships.
Collapse
Affiliation(s)
- Qiyun Zhu
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Uyen Mai
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| | - Wayne Pfeiffer
- San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, USA
| | - Stefan Janssen
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Algorithmic Bioinformatics, Department of Biology and Chemistry, Justus Liebig University Gießen, Giessen, Germany
| | | | - Jon G Sanders
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Pedro Belda-Ferre
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Gabriel A Al-Ghalith
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
| | - Evguenia Kopylova
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Daniel McDonald
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Tomasz Kosciolek
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
| | - John B Yin
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
- Department of Mathematics, University of California San Diego, La Jolla, CA, USA
| | - Shi Huang
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Single-Cell Center, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
| | - Nimaichand Salam
- State Key Laboratory of Biocontrol and Guangdong Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Jian-Yu Jiao
- State Key Laboratory of Biocontrol and Guangdong Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Zijun Wu
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Division of Biological Sciences, University of California San Diego, La Jolla, CA, USA
- Zhenjiang Z Xu
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Kalen Cantrell
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Yimeng Yang
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Erfan Sayyari
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
- Maryam Rabiee
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- James T Morton
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Sheila Podell
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Dan Knights
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Wen-Jun Li
- State Key Laboratory of Biocontrol and Guangdong Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Curtis Huttenhower
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Nicola Segata
- Department CIBIO, University of Trento, Trento, Italy
- Larry Smarr
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
- Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
- California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, CA, USA
- Siavash Mirarab
- Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
- Rob Knight
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA.
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
- Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA.
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA.
35
Patané JSL, Martins J, Rangel LT, Belasque J, Digiampietri LA, Facincani AP, Ferreira RM, Jaciani FJ, Zhang Y, Varani AM, Almeida NF, Wang N, Ferro JA, Moreira LM, Setubal JC. Origin and diversification of Xanthomonas citri subsp. citri pathotypes revealed by inclusive phylogenomic, dating, and biogeographic analyses. BMC Genomics 2019; 20:700. [PMID: 31500575 PMCID: PMC6734499 DOI: 10.1186/s12864-019-6007-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 01/11/2019] [Accepted: 07/30/2019] [Indexed: 12/20/2022]
Abstract
BACKGROUND Xanthomonas citri subsp. citri pathotypes cause bacterial citrus canker and are responsible for severe agricultural losses worldwide. The A pathotype has a broad host spectrum, while A* and Aw are more restricted in both host range and geography. Two previous phylogenomic studies recovered contrasting well-supported clades for the sequenced genomes of these pathotypes. No extensive biogeographic or divergence-dating analyses have so far been applied to the available genomes. RESULTS Based on a larger sampling of genomes than in previous studies (95 genomes in total, including six new genomes sequenced by our group), phylogenomic analyses yielded different resolutions, though overall indicating that A + Aw is the most likely true clade. Our results suggest that a high degree of recombination on some branches and the fast diversification of lineages are probable causes of this phylogenetic blurring effect. One of the genomes analyzed, X. campestris pv. durantae, was shown to be an A* strain; this strain has been reported to infect a plant of the family Verbenaceae, although there are no reports of any X. citri subsp. citri pathotype infecting a plant outside the genus Citrus. Host reconstruction indicated that the pathotype ancestor likely had plant hosts in the family Fabaceae, implying an ancient jump to the current Rutaceae hosts. Extensive dating analyses indicated that X. citri subsp. citri originated more recently than the main phylogenetic splits of Citrus plants, suggesting dispersal rather than host-directed vicariance as the main driver of geographic expansion. An analysis of 120 pathogenicity-related genes revealed pathotype-associated patterns of presence/absence. CONCLUSIONS Our results provide novel insights into the evolutionary history of X. citri subsp. citri as well as a sound phylogenetic foundation for future evolutionary and genomic studies of its pathotypes.
Affiliation(s)
- José S L Patané
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil
- Laboratório Especial de Ciclo Celular, Instituto Butantan, São Paulo, SP, Brazil
- Joaquim Martins
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil
- Luiz Thiberio Rangel
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil
- José Belasque
- Departamento de Fitopatologia e Nematologia, Escola Superior de Agricultura "Luiz de Queiroz", Universidade de São Paulo, Piracicaba, SP, Brazil
- Luciano A Digiampietri
- Escola de Artes, Ciências e Humanidades, Universidade de São Paulo, São Paulo, SP, Brazil
- Agda Paula Facincani
- Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista (UNESP), Jaboticabal, SP, Brazil
- Rafael Marini Ferreira
- Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista (UNESP), Jaboticabal, SP, Brazil
- Fabrício José Jaciani
- Departamento de Pesquisa e Desenvolvimento, Fundo de Defesa da Citricultura (Fundecitrus), Araraquara, SP, Brazil
- Yunzeng Zhang
- Citrus Research and Education Center, Department of Microbiology and Cell Science, University of Florida, Lake Alfred, FL, USA
- Alessandro M Varani
- Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista (UNESP), Jaboticabal, SP, Brazil
- Nalvo F Almeida
- Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Campo Grande, MS, Brazil
- Nian Wang
- Citrus Research and Education Center, Department of Microbiology and Cell Science, University of Florida, Lake Alfred, FL, USA
- Jesus A Ferro
- Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista (UNESP), Jaboticabal, SP, Brazil
- Leandro M Moreira
- Núcleo de Pesquisas em Ciências Biológicas, Universidade Federal de Ouro Preto, Ouro Preto, MG, Brazil
- João C Setubal
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil.
- Biocomplexity Institute of Virginia Tech, Blacksburg, VA, USA.
36
Cornetti L, Fields PD, Van Damme K, Ebert D. A fossil-calibrated phylogenomic analysis of Daphnia and the Daphniidae. Mol Phylogenet Evol 2019; 137:250-262. [DOI: 10.1016/j.ympev.2019.05.018] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 09/21/2018] [Revised: 05/03/2019] [Accepted: 05/20/2019] [Indexed: 11/16/2022]
37
Liu L, Anderson C, Pearl D, Edwards SV. Modern Phylogenomics: Building Phylogenetic Trees Using the Multispecies Coalescent Model. Methods Mol Biol 2019; 1910:211-239. [PMID: 31278666 DOI: 10.1007/978-1-4939-9074-0_7] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 06/09/2023]
Abstract
The multispecies coalescent (MSC) model provides a compelling framework for building phylogenetic trees from multilocus DNA sequence data. The pure MSC is best thought of as a special case of so-called "multispecies network coalescent" models, in which gene flow is allowed among branches of the tree, whereas MSC methods assume there is no gene flow between diverging species. Early implementations of the MSC, such as "parsimony" or "democratic vote" approaches to combining information from multiple gene trees, as well as concatenation, in which DNA sequences from multiple gene trees are combined into a single "supergene," were quickly shown to be inconsistent in some regions of tree space, insofar as they converged on the incorrect species tree as more gene trees and sequence data were accumulated. The anomaly zone, a region of tree space in which the most frequent gene tree is different from the species tree, is one such region where many so-called "coalescent" methods are inconsistent. Second-generation implementations of the MSC employed Bayesian or likelihood models; these are consistent in all regions of gene tree space, but Bayesian methods in particular are incapable of handling the large phylogenomic data sets currently available. Two-step methods, such as MP-EST and ASTRAL, in which gene trees are first estimated and then combined to estimate an overarching species tree, are currently popular in part because they can handle large phylogenomic data sets. These methods are consistent in the anomaly zone but can sometimes provide inappropriate measures of tree support or apportion error and signal in the data inappropriately. MP-EST in particular employs a likelihood model that can be conveniently manipulated to perform statistical tests of competing species trees, incorporating the likelihood of the collected gene trees on each species tree in a likelihood ratio test.
Such tests provide a useful alternative to the multilocus bootstrap, which only indirectly tests the appropriateness of competing species trees. We illustrate these tests and implementations of the MSC with examples and suggest that MSC methods are a useful class of models effectively using information from multiple loci to build phylogenetic trees.
Affiliation(s)
- Liang Liu
- Department of Statistics, University of Georgia, Athens, GA, USA
- Dennis Pearl
- Department of Statistics, Pennsylvania State University, University Park, PA, USA
- Scott V Edwards
- Department of Organismic and Evolutionary Biology & Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA.
38
Avni E, Snir S. A New Quartet-Based Statistical Method for Comparing Sets of Gene Trees Is Developed Using a Generalized Hoeffding Inequality. J Comput Biol 2018; 26:27-37. [PMID: 30422680 DOI: 10.1089/cmb.2018.0129] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
Abstract
Extracting the strength of the tree signal encompassed by a collection of gene trees is an exceptionally challenging problem in phylogenomics. Often, the problem involves not only constructing individual phylogenies from different genes, which may be difficult in its own right, but also handling the many factors that create conflicts between the evolutionary histories of different gene families, such as gene duplications or losses, hybridization events, incomplete lineage sorting, and horizontal gene transfer, the latter two of which play central roles in the evolution of eukaryotes and prokaryotes, respectively. In this work, we tackle this problem by focusing on quartet trees, which are the most basic unit of information in the context of unrooted phylogenies. In the first part, we show how a theorem of Janson that generalizes the classical Hoeffding inequality can be used to develop a statistical test involving quartets. In the second part, we apply this theoretical advance to real and simulated data, demonstrating how the significance of the differences between sets of quartets can be assessed. Our results are particularly intriguing because they require the nonstandard analysis of dependent random variables.
Affiliation(s)
- Eliran Avni
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
- Sagi Snir
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
39
Zhang C, Rabiee M, Sayyari E, Mirarab S. ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics 2018; 19:153. [PMID: 29745866 PMCID: PMC5998893 DOI: 10.1186/s12859-018-2129-y] [Citation(s) in RCA: 1196] [Impact Index Per Article: 170.9] [Indexed: 12/16/2022]
Abstract
Background Evolutionary histories can be discordant across the genome, and such discordances need to be considered in reconstructing the species phylogeny. ASTRAL is one of the leading methods for inferring species trees from gene trees while accounting for gene tree discordance. ASTRAL uses dynamic programming to search for the tree that shares the maximum number of quartet topologies with input gene trees, restricting itself to a predefined set of bipartitions. Results We introduce ASTRAL-III, which substantially improves the running time of ASTRAL-II and guarantees polynomial running time as a function of both the number of species (n) and the number of genes (k). ASTRAL-III limits the bipartition constraint set (X) to grow at most linearly with n and k. Moreover, it handles polytomies more efficiently than ASTRAL-II, exploits similarities between gene trees better, and uses several techniques to avoid searching parts of the search space that are mathematically guaranteed not to include the optimal tree. The asymptotic running time of ASTRAL-III in the presence of polytomies is $O\left((nk)^{1.726} D\right)$, where $D = O(nk)$ is the sum of degrees of all unique nodes in input trees. The running time improvements enable us to test whether contracting low support branches in gene trees improves the accuracy by reducing noise. In extensive simulations, we show that removing branches with very low support (e.g., below 10%) improves accuracy while overly aggressive filtering is harmful. We observe on a biological avian phylogenomic dataset of 14K genes that contracting low support branches greatly improves results. Conclusions ASTRAL-III is a faster version of the ASTRAL method for phylogenetic reconstruction and can scale up to 10,000 species. With ASTRAL-III, low support branches can be removed, resulting in improved accuracy.
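To make the quartet objective and the effect of contracting low-support branches concrete, here is a small Python sketch. It scores a candidate species tree by counting the quartet topologies it shares with a set of gene trees, after optionally contracting gene-tree edges whose support falls below a threshold. This is a brute-force illustration under stated assumptions, not ASTRAL's dynamic-programming search. Trees are encoded as sets of bipartitions, each given by one side as a frozenset of taxon names, and support values are assumed to lie in [0, 1]. The helper names (`quartet_topology`, `contract`, `quartet_score`) and the five-taxon example are hypothetical. Enumerating every quartet costs O(n^4) per gene tree, so the sketch only suits toy inputs.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Optional, Set

Side = FrozenSet[str]                  # one side of a bipartition (a tree edge)
Quartet = FrozenSet[FrozenSet[str]]    # e.g. {{A, B}, {C, D}} encodes AB|CD


def quartet_topology(splits: Iterable[Side], taxa4) -> Optional[Quartet]:
    """Return the topology a tree induces on four taxa, or None if unresolved."""
    a, b, c, d = taxa4
    splits = list(splits)
    for p1, p2 in (({a, b}, {c, d}), ({a, c}, {b, d}), ({a, d}, {b, c})):
        for side in splits:
            # An edge displays p1|p2 if one side holds one pair and none of the other.
            if (p1 <= side and not p2 & side) or (p2 <= side and not p1 & side):
                return frozenset({frozenset(p1), frozenset(p2)})
    return None


def contract(gene_tree: Dict[Side, float], threshold: float) -> Set[Side]:
    """Hypothetical helper: keep only edges whose support is at least `threshold`."""
    return {side for side, support in gene_tree.items() if support >= threshold}


def quartet_score(species: Set[Side], gene_trees: List[Dict[Side, float]],
                  taxa: Iterable[str], threshold: float = 0.0) -> int:
    """Count quartets shared between the species tree and each contracted gene tree."""
    quartets = list(combinations(sorted(taxa), 4))
    score = 0
    for gene in gene_trees:
        kept = contract(gene, threshold)
        for q in quartets:
            g = quartet_topology(kept, q)
            # Quartets left unresolved by contraction contribute nothing.
            if g is not None and g == quartet_topology(species, q):
                score += 1
    return score


S = frozenset
taxa = "ABCDE"
true_tree = {S("AB"), S("DE")}   # ((A,B),C,(D,E))
alt_tree = {S("AC"), S("DE")}    # ((A,C),B,(D,E))
genes = [
    {S("AB"): 0.95, S("DE"): 0.90},  # agrees with true_tree
    {S("AC"): 0.05, S("DE"): 0.80},  # weakly supported A+C grouping
]
for t in (0.0, 0.1):
    print(t, quartet_score(true_tree, genes, taxa, t),
          quartet_score(alt_tree, genes, taxa, t))
```

In this toy input the two candidate trees tie at 8 shared quartets without contraction. Contracting the 0.05-support edge breaks the tie in favor of `true_tree` (8 vs 6). This mirrors the kind of effect the abstract describes but does not reproduce its results.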
Affiliation(s)
- Chao Zhang
- Bioinformatics and Systems Biology, University of California at San Diego, 9500 Gilman Drive, La Jolla, 92093-0021, CA, USA
- Maryam Rabiee
- Department of Computer Science and Engineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, 92093-0021, CA, USA
- Erfan Sayyari
- Department of Electrical and Computer Engineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, 92093-0021, CA, USA
- Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, 92093-0021, CA, USA.
40
Davidson R, Lawhorn M, Rusinko J, Weber N. Efficient Quartet Representations of Trees and Applications to Supertree and Summary Methods. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:1010-1015. [PMID: 28113327 DOI: 10.1109/tcbb.2016.2638911] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/06/2023]
Abstract
Quartet trees displayed by larger phylogenetic trees have long been used as inputs for species tree and supertree reconstruction. Computational constraints prevent the use of all displayed quartets in many practical problems with large numbers of taxa. We introduce the notion of an Efficient Quartet System (EQS) to represent a phylogenetic tree with a subset of the quartets displayed by the tree. We show mathematically that the set of quartets obtained from a tree via an EQS contains all of the combinatorial information of the tree itself. Using performance tests on simulated datasets, we also demonstrate that using an EQS to reduce the number of quartets, both in summary-method pipelines for species tree inference and in methods for supertree inference, results in only small reductions in accuracy.
41
Roberts WR, Roalson EH. Phylogenomic analyses reveal extensive gene flow within the magic flowers (Achimenes). Am J Bot 2018; 105:726-740. [PMID: 29702729 DOI: 10.1002/ajb2.1058] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/20/2017] [Accepted: 02/02/2018] [Indexed: 06/08/2023]
Abstract
PREMISE OF THE STUDY The Neotropical Gesneriaceae is a lineage known for its colorful and diverse flowers, as well as an extensive history of intra- and intergeneric hybridization, particularly among Achimenes (the magic flowers) and other members of subtribe Gloxiniinae. Despite numerous studies seeking to elucidate the evolutionary relationships of these lineages, relatively few have sought to infer specific patterns of gene flow despite evidence of widespread hybridization. METHODS To explore the utility of phylogenomic data for reassessing phylogenetic relationships and inferring patterns of gene flow among species of Achimenes, we sequenced 12 transcriptomes. We used a variety of methods to infer the species tree, examine gene tree discordance, and infer patterns of gene flow. KEY RESULTS Phylogenomic analyses resolve clade relationships at the crown of the lineage with strong support. In contrast to previous analyses, we recovered strong support for several new relationships despite a significant amount of gene tree discordance. We present evidence for at least two introgression events between two species pairs that share pollinators, and suggest that the species status of Achimenes admirabilis be reexamined. CONCLUSIONS Our study demonstrates the utility of transcriptome data for phylogenomic analyses, and inferring patterns of gene flow despite gene tree discordance. Moreover, these data provide another example of prevalent interspecific gene flow among Neotropical plants that share pollinators.
Affiliation(s)
- Wade R Roberts
- Molecular Plant Sciences Graduate Program, Washington State University, Pullman, Washington, 99164-1030, USA
- School of Biological Sciences, Washington State University, Pullman, Washington, 99164-4236, USA
- Eric H Roalson
- Molecular Plant Sciences Graduate Program, Washington State University, Pullman, Washington, 99164-1030, USA
- School of Biological Sciences, Washington State University, Pullman, Washington, 99164-4236, USA
42
Abstract
Phylogenomics aims at reconstructing the evolutionary histories of organisms taking into account whole genomes or large fractions of genomes. The abundance of genomic data for an enormous variety of organisms has enabled phylogenomic inference of many groups, and this has motivated the development of many computer programs implementing the associated methods. This chapter surveys phylogenetic concepts and methods aimed at both gene tree and species tree reconstruction while also addressing common pitfalls, providing references to relevant computer programs. A practical phylogenomic analysis example including bacterial genomes is presented at the end of the chapter.
Affiliation(s)
- José S L Patané
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil
- Joaquim Martins
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil
- João C Setubal
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil.
43
Mallo D, Posada D. Multilocus inference of species trees and DNA barcoding. Philos Trans R Soc Lond B Biol Sci 2017; 371:rstb.2015.0335. [PMID: 27481787 PMCID: PMC4971187 DOI: 10.1098/rstb.2015.0335] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Accepted: 04/10/2016] [Indexed: 11/30/2022]
Abstract
The unprecedented amount of data resulting from next-generation sequencing has opened a new era in phylogenetic estimation. Although large datasets should, in theory, increase phylogenetic resolution, massive multilocus datasets have uncovered a great deal of phylogenetic incongruence among different genomic regions, due both to stochastic error and to the action of different evolutionary processes such as incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer. This incongruence violates one of the fundamental assumptions of the DNA barcoding approach, which assumes that gene history and species history are identical. In this review, we explain some of the most important challenges we will face in reconstructing the history of species, and the advantages and disadvantages of different strategies for the phylogenetic analysis of multilocus data. In particular, we describe the evolutionary events that can generate discordance between species trees and gene trees, compare the most popular methods for species tree reconstruction, highlight the challenges we need to face when using them and discuss their potential utility in barcoding. Current barcoding methods sacrifice a great amount of statistical power by only considering one locus, and a transition to multilocus barcodes would not only improve current barcoding methods, but also facilitate an eventual transition to species-tree-based barcoding strategies, which could better accommodate scenarios where the barcode gap is too small or nonexistent. This article is part of the themed issue ‘From DNA barcodes to biomes’.
Affiliation(s)
- Diego Mallo
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo 36310, Spain
- David Posada
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo 36310, Spain
44
Song J, Zheng S, Nguyen N, Wang Y, Zhou Y, Lin K. Integrated pipeline for inferring the evolutionary history of a gene family embedded in the species tree: a case study on the STIMATE gene family. BMC Bioinformatics 2017; 18:439. [PMID: 28974198 PMCID: PMC5627428 DOI: 10.1186/s12859-017-1850-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/17/2017] [Accepted: 09/26/2017] [Indexed: 11/28/2022]
Abstract
Background Because phylogenetic inference is an important basis for answering many evolutionary problems, a large number of algorithms have been developed. Some of these algorithms have been improved by integrating gene evolution models with the expectation of accommodating the hierarchy of evolutionary processes. To the best of our knowledge, however, there is still no single unifying model or algorithm that can take all evolutionary processes into account through a stepwise or simultaneous method. Results On the basis of three existing phylogenetic inference algorithms, we built an integrated pipeline for inferring the evolutionary history of a given gene family; this pipeline can model gene sequence evolution, gene duplication-loss, gene transfer and multispecies coalescent processes. As a case study, we applied this pipeline to the STIMATE (TMEM110) gene family, which has recently been reported to play an important role in store-operated Ca²⁺ entry (SOCE) mediated by ORAI and STIM proteins. We inferred their phylogenetic trees in 69 sequenced chordate genomes. Conclusions By integrating three tree reconstruction algorithms with diverse evolutionary models, a pipeline for inferring the evolutionary history of a gene family was developed, and its application was demonstrated.
Affiliation(s)
- Jia Song
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Sisi Zheng
- Beijing Key Laboratory of Gene Resources and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Nhung Nguyen
- Center for Translational Cancer Research, Institute of Biosciences and Technology, Department of Medical Physiology, College of Medicine, Texas A&M University, Houston, TX, 77030, USA
- Youjun Wang
- Beijing Key Laboratory of Gene Resources and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Yubin Zhou
- Center for Translational Cancer Research, Institute of Biosciences and Technology, Department of Medical Physiology, College of Medicine, Texas A&M University, Houston, TX, 77030, USA
- Kui Lin
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China.
45
Molloy EK, Warnow T. To Include or Not to Include: The Impact of Gene Filtering on Species Tree Estimation Methods. Syst Biol 2017; 67:285-303. [DOI: 10.1093/sysbio/syx077] [Citation(s) in RCA: 138] [Impact Index Per Article: 17.3] [Received: 12/08/2016] [Accepted: 09/13/2017] [Indexed: 01/27/2023]
Affiliation(s)
- Erin K Molloy
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
46
Avni E, Snir S. Toxic genes present a unique phylogenetic signature. Mol Phylogenet Evol 2017; 116:141-148. [PMID: 28842276 DOI: 10.1016/j.ympev.2017.08.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/27/2017] [Revised: 08/17/2017] [Accepted: 08/17/2017] [Indexed: 10/19/2022]
Abstract
Horizontal gene transfer (HGT) is a major part of the evolution of Archaea and Bacteria, to the extent that the validity of the Tree of Life concept for prokaryotes has been seriously questioned. The patterns and routes of HGT remain a subject of intense study and debate. It was discovered that while several genes exhibit rampant HGT across the whole prokaryotic tree of life, others are lethal to certain organisms and therefore cannot be successfully transferred to them. We distinguish between these two classes of genes and show analytically that genes found to be toxic to a specific species (E. coli) also resist HGT in general. Several tools we employ provide evidence supporting that claim. One of those tools is the quartet plurality distribution (QPD), a mathematical tool that measures the tendency toward HGT over a large set of genes and species. When aggregated over a collection of genes, it can reveal important properties of this collection. We conclude that the new quartet plurality distribution tool reveals evidence that certain genes are toxic to a wide variety of prokaryotes.
Affiliation(s)
- Eliran Avni
- Dept. of Evolutionary Biology, University of Haifa, Haifa 31905, Israel.
- Sagi Snir
- Dept. of Evolutionary Biology, University of Haifa, Haifa 31905, Israel.
47
ASTRAL-III: Increased Scalability and Impacts of Contracting Low Support Branches. Comparative Genomics 2017. [DOI: 10.1007/978-3-319-67979-2_4] [Citation(s) in RCA: 97] [Impact Index Per Article: 12.1] [Indexed: 12/05/2022]
48
Zhao L, Li X, Zhang N, Zhang SD, Yi TS, Ma H, Guo ZH, Li DZ. Phylogenomic analyses of large-scale nuclear genes provide new insights into the evolutionary relationships within the rosids. Mol Phylogenet Evol 2016; 105:166-176. [DOI: 10.1016/j.ympev.2016.06.007] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 11/28/2015] [Revised: 06/06/2016] [Accepted: 06/27/2016] [Indexed: 12/28/2022]
49
Abstract
BACKGROUND Phylogenetic networks model reticulate evolutionary histories. The last two decades have seen an increased interest in establishing mathematical results and developing computational methods for inferring and analyzing these networks. A salient concept underlying a great majority of these developments has been the notion that a network displays a set of trees and those trees can be used to infer, analyze, and study the network. RESULTS In this paper, we show that in the presence of coalescence effects, the set of displayed trees is not sufficient to capture the network. We formally define the set of parental trees of a network and make three contributions based on this definition. First, we extend the notion of anomaly zone to phylogenetic networks and report on anomaly results for different networks. Second, we demonstrate how coalescence events could negatively affect the ability to infer a species tree that could be augmented into the correct network. Third, we demonstrate how a phylogenetic network can be viewed as a mixture model that lends itself to a novel inference approach via gene tree clustering. CONCLUSIONS Our results demonstrate the limitations of focusing on the set of trees displayed by a network when analyzing and inferring the network. Our findings can form the basis for achieving higher accuracy when inferring phylogenetic networks and open up new avenues for research in this area, including new problem formulations based on the notion of a network's parental trees.
Affiliation(s)
- Jiafan Zhu
- Department of Computer Science, Rice University, Houston, 77005 Texas USA
- Yun Yu
- Department of Computer Science, Rice University, Houston, 77005 Texas USA
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, 77005 Texas USA
- Department of BioSciences, Rice University, Houston, 77005 Texas USA

50
Malukiewicz J, Hepp CM, Guschanski K, Stone AC. Phylogeny of the jacchus group of Callithrix marmosets based on complete mitochondrial genomes. AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 2016; 162:157-169. [PMID: 27762445 DOI: 10.1002/ajpa.23105] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 04/26/2016] [Revised: 09/07/2016] [Accepted: 09/13/2016] [Indexed: 01/26/2023]
Abstract
OBJECTIVES Two subgroups make up the marmoset genus Callithrix. The "aurita" group is composed of two species, whereas evolutionary relationships among the four species of the "jacchus" group remain unclear. To uncover these relationships, we first sequenced mitochondrial genomes for C. kuhlii and C. penicillata to complement data available for congeners. We then constructed a phylogenetic tree based on mtDNA heavy-chain protein-coding genes from several primates to untangle species relationships and estimate divergence times of the jacchus group. MATERIALS AND METHODS MtDNA genomes of C. kuhlii and C. penicillata were Sanger sequenced. These Callithrix mitogenomes were combined with other publicly available primate mtDNA genomes. Phylogenies were produced using maximum likelihood and Bayesian inference. Finally, divergence times within the jacchus group of marmosets were estimated with Bayesian inference. RESULTS In our phylogenetic tree, C. geoffroyi was sister to all other jacchus group species, followed by C. kuhlii, while C. jacchus and C. penicillata diverged most recently. Bayesian inference showed that C. jacchus and C. penicillata diverged approximately 0.70 MYA and that the jacchus group radiated approximately 1.30 MYA. DISCUSSION Callithrix nuclear and mtDNA phylogenies frequently result in polytomies and paraphyly. Here, we present a well-supported phylogenetic tree based on mitochondrial genome sequences, which facilitates the understanding of the divergence of the jacchus marmosets. Our results demonstrate how mitochondrial genomes can enrich Callithrix phylogenetic studies by alleviating some of the difficulties faced by previous mtDNA studies, and they allow the formulation of hypotheses to be tested further in larger genomic-scale analyses.
Affiliation(s)
- Joanna Malukiewicz
- School of Life Sciences, Arizona State University, Tempe, Arizona, 85287, USA
- Crystal M Hepp
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
- Katerina Guschanski
- Department of Animal Ecology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Anne C Stone
- School of Human Evolution and Social Change, Arizona State University, Tempe, AZ 85287, USA
- Institute of Human Origins, Arizona State University, Tempe, AZ 85287, USA