1
Hadfield R, Mulford T, Fisher ML, Borgmeier A, Ardon DA, Suchomel AD, Fomekong-Lontchi J, Sutherland L, Huie M, Lupiyaningdyah P, Nichols S, Fei Lin Y, Anantaprayoon N, Leavitt SD. Imperiled wanderlust lichens in steppe habitats of western North America comprise geographically structured mycobiont lineages and a reversal to sexual reproduction within this asexual clade. Mol Phylogenet Evol 2024; 201:108212. [PMID: 39384122 DOI: 10.1016/j.ympev.2024.108212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Revised: 09/20/2024] [Accepted: 10/01/2024] [Indexed: 10/11/2024]
Abstract
The northern North American Cordillera is a globally significant center of endemism. In western North America, imperiled arid steppe habitats support a number of unique species, including several endemic lichens. However, processes driving diversification and endemism in this region remain unclear. In this study, we investigate diversity and phylogeography of the threatened wanderlust lichens (mycobiont = Rhizoplaca species) which occur unattached on calcareous soils in steppe habitats. Wanderlust lichens comprise three species of lichen-forming fungi (LFF) - Rhizoplaca arbuscula, R. haydenii, and R. idahoensis (endangered, IUCN Red List) - which occur in fragmented populations in Idaho and Wyoming, with more limited populations in southern Montana and northern Utah. These lichens reproduce almost exclusively via large, asexual vegetative propagules. Here, our aims were to (i) assess the evolutionary origin of this group and identify phylogeographic structure, (ii) infer ancestral geographic distributions for lineages within this clade, and (iii) use species distribution modeling to better understand the distribution of contemporary populations. Using a genome-skimming approach, we generated a 19.1Mb alignment, spanning ca. half of the complete LFF genome, from specimens collected throughout the entire range of wanderlust lichens. Based on this phylogeny, we investigated phylogeographic patterns using RASP. Finally, we used MaxEnt to estimate species distribution models for R. arbuscula and R. haydenii. We inferred a highly structured topology, with clades corresponding to distinct geographic regions and morphologies collected throughout the group's distribution. We found that R. robusta, a sexually reproducing taxon, is clearly nested within the vagrant Rhizoplaca clade. 
Phylogeographic analyses suggest that both dispersal and vicariance played significant roles throughout the evolutionary history of the vagrant Rhizoplaca clade, with most of the dispersal events originating from the Salmon Basin in eastern Idaho - the center of diversity for this group. Despite the fact that wanderlust lichens are dispersal limited due to large, unspecialized vegetative propagules, we inferred multiple dispersal events crossing the Continental Divide. Comparing herbarium records with species distribution models suggests that wanderlust lichens don't fully occupy the areas of highest distribution probability. In fact, documented records often occur in areas predicted to be only marginally suitable. These data suggest a potential mismatch between contemporary habitats outside of the center of diversity in eastern Idaho with the most suitable habitat, adding to the vulnerability of this imperiled complex of endemic lichens.
Affiliation(s)
- Robert Hadfield
  - Department of Biology, Brigham Young University, Provo, UT 84602, USA; Monte L. Bean Museum, Brigham Young University, Provo, UT 84602, USA
- Teagan Mulford
  - Department of Biology, Brigham Young University, Provo, UT 84602, USA; Monte L. Bean Museum, Brigham Young University, Provo, UT 84602, USA
- Makani L Fisher
  - Department of Entomology, Purdue University, West Lafayette, IN 47907, USA
- Abigail Borgmeier
  - Department of Biology, Brigham Young University, Provo, UT 84602, USA
- Diego A Ardon
  - Department of Biology, Brigham Young University, Provo, UT 84602, USA
- Andrew D Suchomel
  - Department of Biology, Brigham Young University, Provo, UT 84602, USA
- Judicaël Fomekong-Lontchi
  - Department of Biology, Brigham Young University, Provo, UT 84602, USA; Monte L. Bean Museum, Brigham Young University, Provo, UT 84602, USA
- Laura Sutherland
  - Department of Biology, Brigham Young University, Provo, UT 84602, USA; Monte L. Bean Museum, Brigham Young University, Provo, UT 84602, USA
- Madison Huie
  - Department of Plant and Wildlife Science, Brigham Young University, Provo, UT 84602, USA
- Pungki Lupiyaningdyah
  - Department of Biology, Brigham Young University, Provo, UT 84602, USA; Monte L. Bean Museum, Brigham Young University, Provo, UT 84602, USA
- Sierra Nichols
  - Department of Biology, Brigham Young University, Provo, UT 84602, USA
- Ying Fei Lin
  - Monte L. Bean Museum, Brigham Young University, Provo, UT 84602, USA; Department of Plant and Wildlife Science, Brigham Young University, Provo, UT 84602, USA
- Steven D Leavitt
  - Department of Biology, Brigham Young University, Provo, UT 84602, USA; Monte L. Bean Museum, Brigham Young University, Provo, UT 84602, USA
2
Augustijnen H, Bätscher L, Cesanek M, Chkhartishvili T, Dincă V, Iankoshvili G, Ogawa K, Vila R, Klopfstein S, de Vos JM, Lucek K. A macroevolutionary role for chromosomal fusion and fission in Erebia butterflies. SCIENCE ADVANCES 2024; 10:eadl0989. [PMID: 38630820 PMCID: PMC11023530 DOI: 10.1126/sciadv.adl0989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Accepted: 03/14/2024] [Indexed: 04/19/2024]
Abstract
The impact of large-scale chromosomal rearrangements, such as fusions and fissions, on speciation is a long-standing conundrum. We assessed whether bursts of change in chromosome numbers resulting from chromosomal fusion or fission are related to increased speciation rates in Erebia, one of the most species-rich and karyotypically variable butterfly groups. We established a genome-based phylogeny and used state-dependent birth-death models to infer trajectories of karyotype evolution. We demonstrated that rates of anagenetic chromosomal changes (i.e., along phylogenetic branches) exceed cladogenetic changes (i.e., at speciation events), but, when cladogenetic changes occur, they are mostly associated with chromosomal fissions rather than fusions. We found that the relative importance of fusion and fission differs among Erebia clades of different ages and that especially in younger, more karyotypically diverse clades, speciation is more frequently associated with cladogenetic chromosomal changes. Overall, our results imply that chromosomal fusions and fissions have contrasting macroevolutionary roles and that large-scale chromosomal rearrangements are associated with bursts of species diversification.
Affiliation(s)
- Hannah Augustijnen
  - Department of Environmental Science, University of Basel, 4056 Basel, Switzerland
- Livio Bätscher
  - Department of Environmental Science, University of Basel, 4056 Basel, Switzerland
- Martin Cesanek
  - Slovak Entomological Society, Slovak Academy of Sciences, Bratislava 1, Slovakia
- Vlad Dincă
  - Ecology and Genetics Research Unit, University of Oulu, 90570 Oulu, Finland
- Kota Ogawa
  - Faculty of Social and Cultural Studies, Kyushu University, Fukuoka 819-0395, Japan
  - Insect Sciences and Creative Entomology Center, Kyushu University, Fukuoka 819-0395, Japan
- Roger Vila
  - Institut de Biologia Evolutiva (CSIC-Univ. Pompeu Fabra), 08003 Barcelona, Spain
- Seraina Klopfstein
  - Institute of Ecology and Evolution, University of Bern, 3012 Bern, Switzerland
  - Life Sciences, Natural History Museum Basel, 4051 Basel, Switzerland
- Jurriaan M. de Vos
  - Department of Environmental Science, University of Basel, 4056 Basel, Switzerland
- Kay Lucek
  - Department of Environmental Science, University of Basel, 4056 Basel, Switzerland
  - Institute of Biology, University of Neuchâtel, 2000 Neuchâtel, Switzerland
3
Parey E, Berthelot C, Roest Crollius H, Guiguen Y. Solving an enigma in the tree of life, at the origins of teleost fishes. C R Biol 2024; 347:1-8. [PMID: 38441104 DOI: 10.5802/crbiol.150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 02/05/2024] [Accepted: 02/09/2024] [Indexed: 03/07/2024]
Abstract
Tracing the phylogenetic relationships between species is one of the fundamental objectives of evolutionary biology. Since Charles Darwin's seminal work in the 19th century, considerable progress has been made towards establishing a tree of life that summarises the evolutionary history of species. Nevertheless, substantial uncertainties still remain. Specifically, the relationships at the origins of teleost fishes have been the subject of extensive debate over the last 50 years. This question has major implications for various research fields: there are almost 30,000 species in the teleost group, which includes invaluable model organisms for biomedical, evolutionary and ecological studies. Here, we present the work in which we solved this enigma. We demonstrated that eels are more closely related to bony-tongued fishes than to the rest of teleost fishes. We achieved this by taking advantage of new genomic data and leveraging innovative phylogenetic markers. Notably, in addition to traditional molecular phylogeny methods based on the evolution of gene sequences, we also considered the evolution of gene order along the DNA molecule. We discuss the challenges and opportunities that these new markers represent for the field of molecular phylogeny, and in particular the possibilities they offer for re-examining other controversial branches in the tree of life.
4
Jiang Z, Zang W, Ericson PGP, Song G, Wu S, Feng S, Drovetski SV, Liu G, Zhang D, Saitoh T, Alström P, Edwards SV, Lei F, Qu Y. Gene flow and an anomaly zone complicate phylogenomic inference in a rapidly radiated avian family (Prunellidae). BMC Biol 2024; 22:49. [PMID: 38413944 PMCID: PMC10900574 DOI: 10.1186/s12915-024-01848-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Accepted: 02/15/2024] [Indexed: 02/29/2024] Open
Abstract
BACKGROUND: Resolving the phylogeny of rapidly radiating lineages presents a challenge when building the Tree of Life. An Old World avian family Prunellidae (Accentors) comprises twelve species that rapidly diversified at the Pliocene-Pleistocene boundary. RESULTS: Here we investigate the phylogenetic relationships of all species of Prunellidae using a chromosome-level de novo assembly of Prunella strophiata and 36 high-coverage resequenced genomes. We use homologous alignments of thousands of exonic and intronic loci to build the coalescent and concatenated phylogenies and recover four different species trees. Topology tests show a large degree of gene tree-species tree discordance but only 40-54% of intronic gene trees and 36-75% of exonic gene trees can be explained by incomplete lineage sorting and gene tree estimation errors. Estimated branch lengths for three successive internal branches in the inferred species trees suggest the existence of an empirical anomaly zone. The most common topology recovered for species in this anomaly zone was not similar to any coalescent or concatenated inference phylogenies, suggesting presence of anomalous gene trees. However, this interpretation is complicated by the presence of gene flow because extensive introgression was detected among these species. When exploring tree topology distributions, introgression, and regional variation in recombination rate, we find that many autosomal regions contain signatures of introgression and thus may mislead phylogenetic inference. Conversely, the phylogenetic signal is concentrated in regions with low recombination rate, such as the Z chromosome, which are also more resistant to interspecific introgression. CONCLUSIONS: Collectively, our results suggest that phylogenomic inference should consider the underlying genomic architecture to maximize the consistency of phylogenomic signal.
Affiliation(s)
- Zhiyong Jiang
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Wenqing Zang
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Per G P Ericson
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, PO Box 50007, Stockholm, SE-104 05, Sweden
- Gang Song
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Shaoyuan Wu
  - Jiangsu International Joint Center of Genomics, Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, School of Life Sciences, Jiangsu Normal University, Xuzhou, 221116, Jiangsu, China
- Shaohong Feng
  - Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou, 310058, China
  - Liangzhu Laboratory, Zhejiang University, 1369 West Wenyi Road, Hangzhou, 311121, China
  - Innovation Center of Yangtze River Delta, Zhejiang University, Jiashan, 314102, China
- Sergei V Drovetski
  - National Museum of Natural History, Smithsonian Institution, Washington, DC, 20004, USA
  - Present address: U.S. Geological Survey, Eastern Ecological Science Center at Patuxent Research Refuge, Laurel, MD, 20708, USA
- Gang Liu
  - Chinese Academy of Forestry, Institute of Ecological Conservation and Restoration, Beijing, 100091, China
- Dezhi Zhang
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Takema Saitoh
  - Yamashina Institute for Ornithology, Abiko, Chiba, Japan
- Per Alström
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - Animal Ecology, Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18 D, 752 36, Uppsala, Sweden
- Scott V Edwards
  - Museum of Comparative Zoology and Department of Organismic & Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA
- Fumin Lei
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
- Yanhua Qu
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, PO Box 50007, Stockholm, SE-104 05, Sweden
5
Rodríguez-Machado S, Elías DJ, McMahan CD, Gruszkiewicz-Tolli A, Piller KR, Chakrabarty P. Disentangling historical relationships within Poeciliidae (Teleostei: Cyprinodontiformes) using ultraconserved elements. Mol Phylogenet Evol 2024; 190:107965. [PMID: 37977500 DOI: 10.1016/j.ympev.2023.107965] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Revised: 10/18/2023] [Accepted: 11/12/2023] [Indexed: 11/19/2023]
Abstract
Poeciliids (Cyprinodontiformes: Poeciliidae), commonly known as livebearers, are popular fishes in the aquarium trade (e.g., guppies, mollies, swordtails) that are widely distributed in the Americas, with 274 valid species in 27 genera. This group has undergone various taxonomic changes recently, spurred by investigations using traditional genetic markers. Here we used over 1,000 ultraconserved loci to infer the relationships within Poeciliidae in the first attempt at understanding their diversification based on genome-scale data. We explore gene tree discordance and investigate potential incongruence between concatenation and coalescent inference methods. Our aim is to examine the influence of incomplete lineage sorting and reticulate evolution on the poeciliids' evolutionary history and how these factors contribute to the observed gene tree discordance. Our concatenated and coalescent phylogenomic inferences recovered four major clades within Poeciliidae. Most supra-generic level relationships we inferred were congruent with previous molecular studies, but we found some disagreements; the Middle American taxa Phallichthys and Poecilia (Mollienesia) were recovered as non-monophyletic, and unlike other recent molecular studies, we recovered Brachyrhaphis as monophyletic. Our study is the first to provide signatures of reticulate evolution in Poeciliidae at the family level; however, continued finer-scale investigations are needed to understand the complex evolutionary history of the family along with a much-needed taxonomic re-evaluation.
Affiliation(s)
- Sheila Rodríguez-Machado
  - Museum of Natural Science, Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, United States
- Diego J Elías
  - Museum of Natural Science, Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, United States; Field Museum of Natural History, Chicago, IL 60605, United States
- Caleb D McMahan
  - Field Museum of Natural History, Chicago, IL 60605, United States
- Anna Gruszkiewicz-Tolli
  - Department of Biological Sciences, Southeastern Louisiana University, Hammond, LA 70402, United States
- Kyle R Piller
  - Department of Biological Sciences, Southeastern Louisiana University, Hammond, LA 70402, United States
- Prosanta Chakrabarty
  - Museum of Natural Science, Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, United States
6
Lu B. Evolutionary Insights into the Relationship of Frogs, Salamanders, and Caecilians and Their Adaptive Traits, with an Emphasis on Salamander Regeneration and Longevity. Animals (Basel) 2023; 13:3449. [PMID: 38003067 PMCID: PMC10668855 DOI: 10.3390/ani13223449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Revised: 11/01/2023] [Accepted: 11/06/2023] [Indexed: 11/26/2023] Open
Abstract
The extant amphibians have developed uncanny abilities to adapt to their environment. I compared the genes of amphibians to those of other vertebrates to investigate the genetic changes underlying their unique traits, especially salamanders' regeneration and longevity. Using the well-supported Batrachia tree, I found that salamander genomes have undergone accelerated adaptive evolution, especially for development-related genes. The group-based comparison showed that several genes are under positive selection, rapid evolution, and unexpected parallel evolution with traits shared by distantly related species, such as the tail-regenerative lizard and the longer-lived naked mole rat. The genes, such as EEF1E1, PAFAH1B1, and OGFR, may be involved in salamander regeneration, as they are involved in the apoptotic process, blastema formation, and cell proliferation, respectively. The genes PCNA and SIRT1 may be involved in extending lifespan, as they are involved in DNA repair and histone modification, respectively. Some genes, such as PCNA and OGFR, have dual roles in regeneration and aging, which suggests that these two processes are interconnected. My experiment validated the time course differential expression pattern of SERPINI1 and OGFR, two genes that have evolved in parallel in salamanders and lizards during the regeneration process of salamander limbs. In addition, I found several candidate genes responsible for frogs' frequent vocalization and caecilians' degenerative vision. This study provides much-needed insights into the processes of regeneration and aging, and the discovery of the critical genes paves the way for further functional analysis, which could open up new avenues for exploiting the genetic potential of humans and improving human well-being.
Affiliation(s)
- Bin Lu
  - Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
7
Simmons MP, Goloboff PA, Stöver BC, Springer MS, Gatesy J. Quantification of congruence among gene trees with polytomies using overall success of resolution for phylogenomic coalescent analyses. Cladistics 2023; 39:418-436. [PMID: 37096985 DOI: 10.1111/cla.12540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Revised: 02/22/2023] [Accepted: 03/24/2023] [Indexed: 04/26/2023] Open
Abstract
Gene-tree-inference error can cause species-tree-inference artefacts in summary phylogenomic coalescent analyses. Here we integrate two ways of accommodating these inference errors: collapsing arbitrarily or dubiously resolved gene-tree branches, and subsampling gene trees based on their pairwise congruence. We tested the effect of collapsing gene-tree branches with 0% approximate-likelihood-ratio-test (SH-like aLRT) support in likelihood analyses and strict consensus trees for parsimony, and then subsampled those partially resolved trees based on congruence measures that do not penalize polytomies. For this purpose we developed a new TNT script for congruence sorting (congsort), and used it to calculate topological incongruence for eight phylogenomic datasets using three distance measures: standard Robinson-Foulds (RF) distances; overall success of resolution (OSR), which is based on counting both matching and contradicting clades; and RF contradictions, which only counts contradictory clades. As expected, we found that gene-tree incongruence was often concentrated in clades that are arbitrarily or dubiously resolved and that there was greater congruence between the partially collapsed gene trees and the coalescent and concatenation topologies inferred from those genes. Coalescent branch lengths typically increased as the most incongruent gene trees were excluded, although branch supports typically did not. We investigated two successful and complementary approaches to prioritizing genes for investigation of alignment or homology errors. Coalescent-tree clades that contradicted concatenation-tree clades were generally less robust to gene-tree subsampling than congruent clades. Our preferred approach to collapsing likelihood gene-tree clades (0% SH-like aLRT support) and subsampling those trees (OSR) generally outperformed competing approaches for a large fungal dataset with respect to branch lengths, support and congruence. 
We recommend widespread application of this approach (and strict consensus trees for parsimony-based analyses) for improving quantification of gene-tree congruence/conflict, estimating coalescent branch lengths, testing robustness of coalescent analyses to gene-tree-estimation error, and improving topological robustness of summary coalescent analyses. This approach is quick and easy to implement, even for huge datasets.
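As a rough illustration of two of the distance measures named in this abstract, the sketch below computes a standard Robinson-Foulds distance and a count of contradictory clades for small rooted trees. It is not the authors' TNT `congsort` script: the nested-tuple tree encoding, the function names, and the exact way contradictions are counted are assumptions made for illustration, and OSR is deliberately left out because the abstract does not give its formula.

```python
from itertools import chain


def clades(tree):
    """Collect non-trivial clades (frozensets of leaf names) from a rooted nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):          # a leaf is a plain string
            return frozenset([node])
        members = frozenset(chain.from_iterable(leaves(child) for child in node))
        found.add(members)                 # every internal node defines a clade
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)              # the root clade is shared by all trees
    return found


def rf_distance(a, b):
    """Standard RF: clades present in exactly one of the two trees."""
    return len(clades(a) ^ clades(b))


def conflicts(x, y):
    """Two clades conflict when they overlap but neither contains the other."""
    return bool(x & y) and not (x <= y or y <= x)


def rf_contradictions(a, b):
    """Count clades in either tree that are contradicted by some clade in the other (assumed definition)."""
    ca, cb = clades(a), clades(b)
    return (sum(any(conflicts(x, y) for y in cb) for x in ca)
            + sum(any(conflicts(y, x) for x in ca) for y in cb))


resolved = ((("A", "B"), "C"), ("D", "E"))
polytomy = (("A", "B", "C"), ("D", "E"))      # A, B, C collapsed into a polytomy
conflicting = ((("A", "C"), "B"), ("D", "E"))

print(rf_distance(resolved, polytomy), rf_contradictions(resolved, polytomy))        # 1 0
print(rf_distance(resolved, conflicting), rf_contradictions(resolved, conflicting))  # 2 2
```

In this toy case the collapsed tree is penalized by standard RF but not by the contradiction count, which is the property the abstract highlights for measures that do not penalize polytomies.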
Affiliation(s)
- Mark P Simmons
  - Department of Biology, Colorado State University, Fort Collins, CO, 80523, USA
- Pablo A Goloboff
  - CONICET, INSUE, Fundación Miguel Lillo, Miguel Lillo 251, 4000, S.M. de Tucumán, Argentina
- Ben C Stöver
  - Institute for Evolution and Biodiversity, WMU Münster, 48149, Münster, Germany
- Mark S Springer
  - Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA, 92521, USA
- John Gatesy
  - Division of Vertebrate Zoology, American Museum of Natural History, New York, NY, 10024, USA
8
Vecchi M, Tsvetkova A, Stec D, Ferrari C, Calhim S, Tumanov D. Expanding Acutuncus: Phylogenetics and morphological analyses reveal a considerably wider distribution for this tardigrade genus. Mol Phylogenet Evol 2023; 180:107707. [PMID: 36681365 DOI: 10.1016/j.ympev.2023.107707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Revised: 01/11/2023] [Accepted: 01/12/2023] [Indexed: 01/20/2023]
Abstract
The tardigrade genus Acutuncus has long been thought to be an Antarctic endemic, well adapted to this harsh environment. The Antarctic endemicity of Acutuncus was recently dispelled with the description of Acutuncus mariae Zawierucha, 2020 found in the Svalbard archipelago. The integrated analyses of two newly found Acutuncus populations from the UK and Italy, and a population of Acutuncus antarcticus found close to its type locality, allowed us to expand the climatic and geographic range of the genus Acutuncus. These findings also allowed us to re-evaluate the morphological diagnoses of Acutuncus and accommodate it in the newly proposed monotypic family Acutuncidae fam. nov. Two new Acutuncus species morphogroups are instituted based on egg morphology: one (Acutuncus antarcticus morphogroup) including the Antarctic Acutuncus taxa characterized by eggs with long pillars within the chorion and eggs laid freely to the environment, the other (Acutuncus mariae morphogroup) including the European species, characterized by eggs with short pillars within the chorion and eggs laid in the exuvium. Finally, we describe two new Acutuncus species from Europe: Acutuncus mecnuffi sp. nov. and Acutuncus giovanniniae sp. nov.
Affiliation(s)
- Matteo Vecchi
  - Department of Biological and Environmental Science, University of Jyvaskyla, PO Box 35, FI-40014 Jyvaskyla, Finland
- Alexandra Tsvetkova
  - Department of Invertebrate Zoology, Faculty of Biology, Saint Petersburg State University, 199034, Universitetskaya nab. 7/9, Saint Petersburg, Russia
- Daniel Stec
  - Institute of Systematics and Evolution of Animals, Polish Academy of Sciences, Sławkowska 17, 31-016 Kraków, Poland
- Claudio Ferrari
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parco Area delle Scienze 33/A, 43124 Parma, Italy
- Sara Calhim
  - Department of Biological and Environmental Science, University of Jyvaskyla, PO Box 35, FI-40014 Jyvaskyla, Finland
- Denis Tumanov
  - Department of Invertebrate Zoology, Faculty of Biology, Saint Petersburg State University, 199034, Universitetskaya nab. 7/9, Saint Petersburg, Russia; Zoological Institute of the Russian Academy of Sciences, 199034, Universitetskaja nab. 1, Saint Petersburg, Russia
9
Dornburg A, Mallik R, Wang Z, Bernal MA, Thompson B, Bruford EA, Nebert DW, Vasiliou V, Yohe LR, Yoder JA, Townsend JP. Placing human gene families into their evolutionary context. Hum Genomics 2022; 16:56. [PMID: 36369063 PMCID: PMC9652883 DOI: 10.1186/s40246-022-00429-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/28/2022] [Accepted: 10/12/2022] [Indexed: 11/13/2022] Open
Abstract
Following the draft sequence of the first human genome over 20 years ago, we have achieved unprecedented insights into the rules governing its evolution, often with direct translational relevance to specific diseases. However, staggering sequence complexity has also challenged the development of a more comprehensive understanding of human genome biology. In this context, interspecific genomic studies between humans and other animals have played a critical role in our efforts to decode human gene families. In this review, we focus on how the rapid surge of genome sequencing of both model and non-model organisms now provides a broader comparative framework poised to empower novel discoveries. We begin with a general overview of how comparative approaches are essential for understanding gene family evolution in the human genome, followed by a discussion of analyses of gene expression. We show how homology can provide insights into the genes and gene families associated with immune response, cancer biology, vision, chemosensation, and metabolism, by revealing similarity in processes among distant species. We then explain methodological tools that provide critical advances and show the limitations of common approaches. We conclude with a discussion of how these investigations position us to gain fundamental insights into the evolution of gene families among living organisms in general. We hope that our review catalyzes additional excitement and research on the emerging field of comparative genomics, while aiding the placement of the human genome into its existentially evolutionary context.
Affiliation(s)
- Alex Dornburg
  - Department of Bioinformatics and Genomics, UNC-Charlotte, Charlotte, NC, USA
- Rittika Mallik
  - Department of Bioinformatics and Genomics, UNC-Charlotte, Charlotte, NC, USA
- Zheng Wang
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Moisés A Bernal
  - Department of Biological Sciences, College of Science and Mathematics, Auburn University, Auburn, AL, USA
- Brian Thompson
  - Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, USA
- Elspeth A Bruford
  - Department of Haematology, University of Cambridge School of Clinical Medicine, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Daniel W Nebert
  - Department of Environmental Health, Center for Environmental Genetics, University of Cincinnati Medical Center, P.O. Box 670056, Cincinnati, OH, 45267, USA
  - Department of Pediatrics and Molecular Developmental Biology, Division of Human Genetics, Cincinnati Children's Hospital, Cincinnati, OH, 45229, USA
- Vasilis Vasiliou
  - Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, USA
- Laurel R Yohe
  - Department of Bioinformatics and Genomics, UNC-Charlotte, Charlotte, NC, USA
- Jeffrey A Yoder
  - Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, NC, USA
- Jeffrey P Townsend
  - Department of Bioinformatics and Genomics, UNC-Charlotte, Charlotte, NC, USA
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
10
Lozano-Fernandez J. A Practical Guide to Design and Assess a Phylogenomic Study. Genome Biol Evol 2022; 14:evac129. [PMID: 35946263 PMCID: PMC9452790 DOI: 10.1093/gbe/evac129] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Accepted: 08/03/2022] [Indexed: 11/13/2022] Open
Abstract
Over the last decade, molecular systematics has undergone a change of paradigm as high-throughput sequencing now makes it possible to reconstruct evolutionary relationships using genome-scale datasets. The advent of "big data" molecular phylogenetics provided a battery of new tools for biologists but simultaneously brought new methodological challenges. The increase in analytical complexity comes at the price of highly specific training in computational biology and molecular phylogenetics, resulting very often in a polarized accumulation of knowledge (technical on one side and biological on the other). Interpreting the robustness of genome-scale phylogenetic studies is not straightforward, particularly as new methodological developments have consistently shown that the general belief of "more genes, more robustness" often does not apply, and because there is a range of systematic errors that plague phylogenomic investigations. This is particularly problematic because phylogenomic studies are highly heterogeneous in their methodology, and best practices are often not clearly defined. The main aim of this article is to present what I consider as the ten most important points to take into consideration when planning a well-thought-out phylogenomic study and while evaluating the quality of published papers. The goal is to provide a practical step-by-step guide that can be easily followed by nonexperts and phylogenomic novices in order to assess the technical robustness of phylogenomic studies or improve the experimental design of a project.
Affiliation(s)
- Jesus Lozano-Fernandez
- Department of Genetics, Microbiology and Statistics, Biodiversity Research Institute (IRBio), University of Barcelona, Avd. Diagonal 643, 08028 Barcelona, Spain
- Institute of Evolutionary Biology (CSIC – Universitat Pompeu Fabra), Passeig marítim de la Barcelona 37-49, 08003 Barcelona, Spain
|
11
|
Interpreting phylogenetic conflict: Hybridization in the most speciose genus of lichen-forming fungi. Mol Phylogenet Evol 2022; 174:107543. [PMID: 35690378 DOI: 10.1016/j.ympev.2022.107543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2021] [Revised: 02/06/2022] [Accepted: 05/13/2022] [Indexed: 11/24/2022]
Abstract
While advances in sequencing technologies have been invaluable for understanding evolutionary relationships, increasingly large genomic data sets may result in conflicting evolutionary signals that are often caused by biological processes, including hybridization. Hybridization has been detected in a variety of organisms, influencing evolutionary processes such as generating reproductive barriers and mixing standing genetic variation. Here, we investigate the potential role of hybridization in the diversification of the most speciose genus of lichen-forming fungi, Xanthoparmelia. As Xanthoparmelia is projected to have gone through recent, rapid diversification, this genus is particularly suitable for investigating and interpreting the origins of phylogenomic conflict. Focusing on a clade of Xanthoparmelia largely restricted to the Holarctic region, we used a genome skimming approach to generate 962 single-copy gene regions representing over 2 Mbp of the mycobiont genome. From this genome-scale dataset, we inferred evolutionary relationships using both concatenation and coalescent-based species tree approaches. We also used three independent tests for hybridization. Although different species tree reconstruction methods recovered largely consistent and well-supported trees, there was widespread incongruence among individual gene trees. Despite challenges in differentiating hybridization from ILS in situations of recent rapid radiations, our genome-wide analyses detected multiple potential hybridization events in the Holarctic clade, suggesting one possible source of trait variability in this hyperdiverse genus. This study highlights the value in using a pluralistic approach for characterizing genome-scale conflict, even in groups with well-resolved phylogenies, while highlighting current challenges in detecting the specific impacts of hybridization.
|
12
|
Steenwyk JL, Buida TJ III, Gonçalves C, Goltz DC, Morales G, Mead ME, LaBella AL, Chavez CM, Schmitz JE, Hadjifrangiskou M, Li Y, Rokas A. BioKIT: a versatile toolkit for processing and analyzing diverse types of sequence data. Genetics 2022; 221:6583183. [PMID: 35536198 PMCID: PMC9252278 DOI: 10.1093/genetics/iyac079] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/21/2022] [Accepted: 05/03/2022] [Indexed: 11/14/2022] Open
Abstract
Bioinformatic analysis (such as genome assembly quality assessment, alignment summary statistics, relative synonymous codon usage, file format conversion, and processing and analysis) is integrated into diverse disciplines in the biological sciences. Several command-line pieces of software have been developed to conduct some of these individual analyses, but unified toolkits that conduct all these analyses are lacking. To address this gap, we introduce BioKIT, a versatile command line toolkit that has, upon publication, 42 functions, several of which were community-sourced, that conduct routine and novel processing and analysis of genome assemblies, multiple sequence alignments, coding sequences, sequencing data, and more. To demonstrate the utility of BioKIT, we conducted a comprehensive examination of relative synonymous codon usage across 171 fungal genomes that use alternative genetic codes, showed that the novel metric of gene-wise relative synonymous codon usage can accurately estimate gene-wise codon optimization, evaluated the quality and characteristics of 901 eukaryotic genome assemblies, and calculated alignment summary statistics for 10 phylogenomic data matrices. BioKIT will be helpful in facilitating and streamlining sequence analysis workflows. BioKIT is freely available under the MIT license from GitHub (https://github.com/JLSteenwyk/BioKIT), PyPI (https://pypi.org/project/jlsteenwyk-biokit/), and the Anaconda Cloud (https://anaconda.org/jlsteenwyk/jlsteenwyk-biokit). Documentation, user tutorials, and instructions for requesting new features are available online (https://jlsteenwyk.com/BioKIT).
Affiliation(s)
- Jacob L Steenwyk
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA; Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
- Carla Gonçalves
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA; Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA; Associate Laboratory i4HB-Institute for Health and Bioeconomy, NOVA School of Science and Technology, NOVA University Lisbon, 2819-516 Caparica, Portugal; UCIBIO-Applied Molecular Biosciences Unit, Department of Life Sciences, NOVA School of Science and Technology, NOVA University Lisbon, 2819-516 Caparica, Portugal
- Grace Morales
- Department of Pathology, Microbiology & Immunology, Center for Personalized Microbiology, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Matthew E Mead
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA; Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
- Abigail L LaBella
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA; Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
- Christina M Chavez
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA; Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
- Jonathan E Schmitz
- Department of Pathology, Microbiology & Immunology, Center for Personalized Microbiology, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Maria Hadjifrangiskou
- Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA; Department of Pathology, Microbiology & Immunology, Center for Personalized Microbiology, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Yuanning Li
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN 37235, USA; Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
|
13
|
Tumanov DV. End of a mystery: Integrative approach reveals the phylogenetic position of an enigmatic Antarctic tardigrade genus Ramajendas (Tardigrada, Eutardigrada). ZOOL SCR 2021. [DOI: 10.1111/zsc.12521] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Affiliation(s)
- Denis V. Tumanov
- Department of Invertebrate Zoology, Faculty of Biology, Saint Petersburg State University, Saint Petersburg, Russia
- Marine Research Laboratory, Zoological Institute of the Russian Academy of Sciences, Saint Petersburg, Russia
|
14
|
How challenging RADseq data turned out to favor coalescent-based species tree inference. A case study in Aichryson (Crassulaceae). Mol Phylogenet Evol 2021; 167:107342. [PMID: 34785384 DOI: 10.1016/j.ympev.2021.107342] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/03/2020] [Revised: 07/05/2021] [Accepted: 10/29/2021] [Indexed: 12/24/2022]
Abstract
Analysing multiple genomic regions while incorporating detection and qualification of discordance among regions has become standard for understanding phylogenetic relationships. In plants, which usually have comparatively large genomes, this is made feasible by combining reduced-representation library (RRL) methods with high-throughput sequencing, enabling the cost-effective acquisition of genomic data for thousands of loci from hundreds of samples. One popular RRL method is RADseq. A major disadvantage of established RADseq approaches is the rather short fragment and sequencing range, leading to loci of little individual phylogenetic information. This issue hampers the application of coalescent-based species tree inference. The modified RADseq protocol presented here, which targets ca. 5,000 loci of 300-600 nt length sequenced with the latest short-read-sequencing (SRS) technology, has the potential to overcome this drawback. To illustrate the advantages of this approach we use the study group Aichryson Webb & Berthelot (Crassulaceae), a plant genus that diversified on the Canary Islands. The data analysis approach used here aims at a careful quality control of the long loci dataset. It involves an informed selection of thresholds for accurate clustering; a thorough exploration of locus properties, such as locus length, coverage and variability, to identify potentially biased data; and a comparative phylogenetic inference of filtered datasets, accompanied by an evaluation of resulting BS support and gene and site concordance factor values, to improve overall resolution of the resulting phylogenetic trees. The final dataset contains variable loci with an average length of 373 nt and facilitates species tree estimation using a coalescent-based summary approach. Additional improvements brought by the approach are critically discussed.
|
15
|
Mauricio B, Mailho-Fontana PL, Sato LA, Barbosa FF, Astray RM, Kupfer A, Brodie ED, Jared C, Antoniazzi MM. Morphology of the Cutaneous Poison and Mucous Glands in Amphibians with Particular Emphasis on Caecilians (Siphonops annulatus). Toxins (Basel) 2021; 13:toxins13110779. [PMID: 34822563 PMCID: PMC8617868 DOI: 10.3390/toxins13110779] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/02/2021] [Revised: 10/30/2021] [Accepted: 11/02/2021] [Indexed: 01/18/2023] Open
Abstract
Caecilians (order Gymnophiona) are apodan, snake-like amphibians, usually with fossorial habits, constituting one of the most unknown groups of terrestrial vertebrates. As in orders Anura (frogs, tree frogs and toads) and Caudata (salamanders and newts), the caecilian skin is rich in mucous glands, responsible for body lubrication, and poison glands, producing varied toxins used in defence against predators and microorganisms. Whereas in anurans and caudatans skin gland morphology has been well studied, caecilian poison glands remain poorly elucidated. Here we characterised the skin gland morphology of the caecilian Siphonops annulatus, emphasising the poison glands in comparison to those of anurans and salamanders. We showed that S. annulatus glands are similar to those of salamanders, consisting of several syncytial compartments full of granules composed of protein material but showing some differentiated apical compartments containing mucus. An unusual structure resembling a mucous gland is frequently observed in lateral/apical position, apparently connected to the main duct. We conclude that the morphology of skin poison glands in caecilians is more similar to salamander glands when compared to anuran glands that show a much-simplified structure.
Affiliation(s)
- Beatriz Mauricio
- Laboratory of Structural Biology, Instituto Butantan, São Paulo 05509-000, Brazil
- Pedro Luiz Mailho-Fontana
- Laboratory of Structural Biology, Instituto Butantan, São Paulo 05509-000, Brazil
- Luciana Almeida Sato
- Laboratory of Structural Biology, Instituto Butantan, São Paulo 05509-000, Brazil
- Flavia Ferreira Barbosa
- Multipurpose Laboratory, Instituto Butantan, São Paulo 05503-000, Brazil
- Renato Mancini Astray
- Multipurpose Laboratory, Instituto Butantan, São Paulo 05503-000, Brazil
- Alexander Kupfer
- Department of Zoology, State Museum of Natural History, 70191 Stuttgart, Germany
- Edmund D. Brodie
- Department of Biology, Utah State University, Logan, UT 84322, USA
- Carlos Jared
- Laboratory of Structural Biology, Instituto Butantan, São Paulo 05509-000, Brazil
- Marta Maria Antoniazzi
- Laboratory of Structural Biology, Instituto Butantan, São Paulo 05509-000, Brazil
|
16
|
Dornburg A, Near TJ. The Emerging Phylogenetic Perspective on the Evolution of Actinopterygian Fishes. ANNUAL REVIEW OF ECOLOGY, EVOLUTION, AND SYSTEMATICS 2021. [DOI: 10.1146/annurev-ecolsys-122120-122554] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 01/18/2023]
Abstract
The emergence of a new phylogeny of ray-finned fishes at the turn of the twenty-first century marked a paradigm shift in understanding the evolutionary history of half of living vertebrates. We review how the new ray-finned fish phylogeny radically departs from classical expectations based on morphology. We focus on evolutionary relationships that span the backbone of ray-finned fish phylogeny, from the earliest divergences among teleosts and nonteleosts to the resolution of major lineages of Percomorpha. Throughout, we feature advances gained by the new phylogeny toward a broader understanding of ray-finned fish evolutionary history and the implications for topics that span from the genetics of human health to reconsidering the concept of living fossils. Additionally, we discuss conceptual challenges that involve reconciling taxonomic classification with phylogenetic relationships and propose an alternate higher-level classification for Percomorpha. Our review highlights remaining areas of phylogenetic uncertainty and opportunities for comparative investigations empowered by this new phylogenetic perspective on ray-finned fishes.
Affiliation(s)
- Alex Dornburg
- Department of Bioinformatics and Genomics, University of North Carolina, Charlotte, North Carolina 28223, USA
- Thomas J. Near
- Department of Ecology and Evolutionary Biology and Peabody Museum of Natural History, Yale University, New Haven, Connecticut 06511, USA
|
17
|
Forthman M, Braun EL, Kimball RT. Gene tree quality affects empirical coalescent branch length estimation. ZOOL SCR 2021. [DOI: 10.1111/zsc.12512] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/28/2022]
Affiliation(s)
- Michael Forthman
- Department of Entomology & Nematology, University of Florida, Gainesville, FL, USA
- California State Collection of Arthropods, Plant Pest Diagnostics Branch, California Department of Food & Agriculture, Sacramento, CA, USA
- Edward L. Braun
- Department of Biology, University of Florida, Gainesville, FL, USA
|
18
|
Mongiardino Koch N. Phylogenomic Subsampling and the Search for Phylogenetically Reliable Loci. Mol Biol Evol 2021; 38:4025-4038. [PMID: 33983409 DOI: 10.1101/2021.02.13.431075] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 05/21/2023] Open
Abstract
Phylogenomic subsampling is a procedure by which small sets of loci are selected from large genome-scale data sets and used for phylogenetic inference. This step is often motivated by either computational limitations associated with the use of complex inference methods or as a means of testing the robustness of phylogenetic results by discarding loci that are deemed potentially misleading. Although many alternative methods of phylogenomic subsampling have been proposed, little effort has gone into comparing their behavior across different data sets. Here, I calculate multiple gene properties for a range of phylogenomic data sets spanning animal, fungal, and plant clades, uncovering a remarkable predictability in their patterns of covariance. I also show how these patterns provide a means for ordering loci by both their rate of evolution and their relative phylogenetic usefulness. This method of retrieving phylogenetically useful loci is found to be among the top performing when compared with alternative subsampling protocols. Relatively common approaches such as minimizing potential sources of systematic bias or increasing the clock-likeness of the data are found to fare worse than selecting loci at random. Likewise, the general utility of rate-based subsampling is found to be limited: loci evolving at both low and high rates are among the least effective, and even those evolving at optimal rates can still widely differ in usefulness. This study shows that many common subsampling approaches introduce unintended effects in off-target gene properties and proposes an alternative multivariate method that simultaneously optimizes phylogenetic signal while controlling for known sources of bias.
|
20
|
Alda F, Ludt WB, Elías DJ, McMahan CD, Chakrabarty P. Comparing Ultraconserved Elements and Exons for Phylogenomic Analyses of Middle American Cichlids: When Data Agree to Disagree. Genome Biol Evol 2021; 13:evab161. [PMID: 34272856 PMCID: PMC8369075 DOI: 10.1093/gbe/evab161] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Accepted: 07/05/2021] [Indexed: 12/20/2022] Open
Abstract
Choosing among types of genomic markers to be used in a phylogenomic study can have a major influence on the cost, design, and results of a study. Yet few attempts have been made to compare categories of next-generation sequence markers, limiting our ability to compare the suitability of these different genomic fragment types. Here, we explore properties of different genomic markers to find if they vary in the accuracy of component phylogenetic trees and to clarify the causes of conflict obtained from different data sets or inference methods. As a test case, we explore the causes of discordance between phylogenetic hypotheses obtained using a novel data set of ultraconserved elements (UCEs) and a recently published exon data set of the cichlid tribe Heroini. Resolving relationships among heroine cichlids has historically been difficult, and the processes of colonization and diversification in Middle America and the Greater Antilles are not yet well understood. Despite differences in informativeness and levels of gene tree discordance between UCEs and exons, the resulting phylogenomic hypotheses generally agree on most relationships. The independent data sets disagreed in areas with low phylogenetic signal that were overwhelmed by incomplete lineage sorting and nonphylogenetic signals. For UCEs, high levels of incomplete lineage sorting were found to be the major cause of gene tree discordance, whereas, for exons, nonphylogenetic signal is most likely caused by a reduced number of highly informative loci. This paucity of informative loci in exons might be due to heterogeneous substitution rates that are problematic to model (i.e., computationally restrictive) resulting in systematic errors that UCEs (being less informative individually but more uniform) are less prone to. These results generally demonstrate the robustness of phylogenomic methods to accommodate genomic markers with different biological and phylogenetic properties. However, we identify common and unique pitfalls of different categories of genomic fragments when inferring enigmatic phylogenetic relationships.
Affiliation(s)
- Fernando Alda
- Department of Biology, Geology and Environmental Science, University of Tennessee at Chattanooga, Tennessee, USA
- William B Ludt
- Department of Ichthyology, Natural History Museum of Los Angeles County, Los Angeles, California, USA
- Diego J Elías
- Museum of Natural Science, Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, USA
- Prosanta Chakrabarty
- Museum of Natural Science, Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, USA
|
21
|
Vankan M, Ho SYW, Duchêne DA. Evolutionary Rate Variation Among Lineages in Gene Trees has a Negative Impact on Species-Tree Inference. Syst Biol 2021; 71:490-500. [PMID: 34255084 PMCID: PMC8830059 DOI: 10.1093/sysbio/syab051] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/03/2021] [Revised: 06/18/2021] [Indexed: 11/12/2022] Open
Abstract
Phylogenetic analyses of genomic data provide a powerful means of reconstructing the evolutionary relationships among organisms, yet such analyses are often hindered by conflicting phylogenetic signals among loci. Identifying the signals that are most influential to species-tree estimation can help to inform the choice of data for phylogenomic analysis. We investigated this in an analysis of 30 phylogenomic data sets. For each data set, we examined the association between several branch-length characteristics of gene trees and the distance between these gene trees and the corresponding species trees. We found that the distance of each gene tree to the species tree inferred from the full data set was positively associated with variation in root-to-tip distances and negatively associated with mean branch support. However, no such associations were found for gene-tree length, a measure of the overall substitution rate at each locus. We further explored the usefulness of the best-performing branch-based characteristics for selecting loci for phylogenomic analyses. We found that loci that yield gene trees with high variation in root-to-tip distances have a disproportionately distant signal of tree topology compared with the complete data sets. These results suggest that rate variation across lineages should be taken into consideration when exploring and even selecting loci for phylogenomic analysis.[Branch support; data filtering; nucleotide substitution model; phylogenomics; substitution rate; summary coalescent methods.]
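As a rough illustration of the "variation in root-to-tip distances" that the abstract above describes, the following Python sketch computes root-to-tip distances for a gene tree and summarizes their spread. The abstract does not say how the variation was quantified, so the coefficient of variation used here, the `(branch_length, children)` tree encoding, and both example trees are assumptions made purely for illustration.

```python
from statistics import mean, pstdev

# Hypothetical tree encoding: each node is (branch_length, children),
# and a tip is a node whose children list is empty.

def root_to_tip_distances(node, dist_from_root=0.0):
    """Return the summed branch length from the root to every tip."""
    branch_length, children = node
    dist = dist_from_root + branch_length
    if not children:
        return [dist]
    distances = []
    for child in children:
        distances.extend(root_to_tip_distances(child, dist))
    return distances

def root_to_tip_cv(tree):
    """Coefficient of variation of root-to-tip distances (an assumed summary)."""
    distances = root_to_tip_distances(tree)
    return pstdev(distances) / mean(distances)

# Two made-up gene trees with the same topology: one clock-like,
# one with a single long-branched tip.
clocklike = (0.0, [(0.1, [(0.1, []), (0.1, [])]), (0.2, [])])
uneven = (0.0, [(0.1, [(0.5, []), (0.1, [])]), (0.2, [])])

print(root_to_tip_cv(clocklike))
print(root_to_tip_cv(uneven))
```

Under this assumed summary, a locus whose gene tree scores high would be a candidate for closer inspection, in the spirit of the abstract's suggestion that rate variation across lineages be considered when selecting loci.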
Affiliation(s)
- Mezzalina Vankan
- School of Life and Environmental Sciences, University of Sydney, NSW 2006, Australia; Research School of Biology, Australian National University, ACT 2601, Australia
- Simon Y W Ho
- School of Life and Environmental Sciences, University of Sydney, NSW 2006, Australia
- David A Duchêne
- Research School of Biology, Australian National University, ACT 2601, Australia; Centre for Evolutionary Hologenomics, University of Copenhagen, Copenhagen 1352, Denmark
|
22
|
Doyle JJ. Defining coalescent genes: Theory meets practice in organelle phylogenomics. Syst Biol 2021; 71:476-489. [PMID: 34191012 DOI: 10.1093/sysbio/syab053] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 03/01/2021] [Revised: 06/24/2021] [Accepted: 06/28/2021] [Indexed: 11/13/2022] Open
Abstract
The species tree paradigm that dominates current molecular systematic practice infers species trees from collections of sequences under assumptions of the multispecies coalescent (MSC), i.e., that there is free recombination between the sequences and no (or very low) recombination within them. These coalescent genes (c-genes) are thus defined in an historical rather than molecular sense, and can in theory be as large as an entire genome or as small as a single nucleotide. A debate about how to define c-genes centers on the contention that nuclear gene sequences used in many coalescent analyses undergo too much recombination, such that their introns comprise multiple c-genes, violating a key assumption of the MSC. Recently a similar argument has been made for the genes of plastid (e.g., chloroplast) and mitochondrial genomes, which for the last 30 or more years have been considered to represent a single c-gene for the purposes of phylogeny reconstruction because they are non-recombining in a historical sense. Consequently, it has been suggested that these genomes should be analyzed using coalescent methods that treat their genes - over 70 protein-coding genes in the case of most plastid genomes (plastomes) - as independent estimates of species phylogeny, in contrast to the usual practice of concatenation, which is appropriate for generating gene trees. However, although recombination certainly occurs in the plastome, as has been recognized since the 1970s, it is unlikely to be phylogenetically relevant. This is because such historically effective recombination can only occur when plastomes with incongruent histories are brought together in the same plastid. However, plastids sort rapidly into different cell lineages and rarely fuse. Thus, because of plastid biology, the plastome is a more canonical c-gene than is the average multi-intron mammalian nuclear gene. The plastome should thus continue to be treated as a single estimate of the underlying species phylogeny, as should the mitochondrial genome. The implications of this long-held insight of molecular systematics for studies in the phylogenomic era are explored.
Affiliation(s)
- Jeff J Doyle
- Plant Biology Section, Plant Breeding & Genetics Section, and L. H. Bailey Hortorium, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853 USA
|
23
|
Dymek AM, Piprek RP, Boroń A, Kirschbaum F, Pecio A. Ovary structure and oogenesis in internally and externally fertilizing Osteoglossiformes (Teleostei:Osteoglossomorpha). ACTA ZOOL-STOCKHOLM 2021. [DOI: 10.1111/azo.12378] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Affiliation(s)
- Anna M. Dymek
- Department of Comparative Anatomy, Institute of Zoology and Biomedical Research, Faculty of Biology, Jagiellonian University, Cracow, Poland
- Rafal P. Piprek
- Department of Comparative Anatomy, Institute of Zoology and Biomedical Research, Faculty of Biology, Jagiellonian University, Cracow, Poland
- Alicja Boroń
- Department of Zoology, Faculty of Biology and Biotechnology, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
- Frank Kirschbaum
- Albrecht Daniel Thaer Institute of Agricultural and Horticultural Sciences, Faculty of Life Sciences, Humboldt University of Berlin, Berlin, Germany
- Anna Pecio
- Department of Comparative Anatomy, Institute of Zoology and Biomedical Research, Faculty of Biology, Jagiellonian University, Cracow, Poland
|
24
|
Takezaki N. Resolving the Early Divergence Pattern of Teleost Fish Using Genome-Scale Data. Genome Biol Evol 2021; 13:6178791. [PMID: 33739405 PMCID: PMC8103497 DOI: 10.1093/gbe/evab052] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Accepted: 03/10/2021] [Indexed: 12/13/2022] Open
Abstract
Regarding the phylogenetic relationship of the three primary groups of teleost fishes, Osteoglossomorpha (bonytongues and others), Elopomorpha (eels and relatives), Clupeocephala (the remaining teleost fish), early morphological studies hypothesized the first divergence of Osteoglossomorpha, whereas the recent prevailing view is the first divergence of Elopomorpha. Molecular studies supported all the possible relationships of the three primary groups. This study analyzed genome-scale data from four previous studies: 1) 412 genes from 12 species, 2) 772 genes from 15 species, 3) 1,062 genes from 30 species, and 4) 491 UCE loci from 27 species. The effects of the species, loci, and models used on the constructed tree topologies were investigated. In the analyses of the data sets (1)–(3), although the first divergence of Clupeocephala that left the other two groups in a sister relationship was supported by concatenated sequences and gene trees of all the species and genes, the first divergence of Elopomorpha among the three groups was supported using species and/or genes with low divergence of sequence and amino-acid frequencies. This result corresponded to that of the UCE data set (4), whose sequence divergence was low, which supported the first divergence of Elopomorpha with high statistical significance. The increase in accuracy of the phylogenetic construction by using species and genes with low sequence divergence was predicted by a phylogenetic informativeness approach and confirmed by computer simulation. These results supported that Elopomorpha was the first basal group of teleost fish to have diverged, consistent with the prevailing view of recent morphological studies.
Affiliation(s)
- Naoko Takezaki
  - Life Science Research Center, Kagawa University, Mikicho, Kitagun, Kagawa, Japan
25
Freitas FV, Branstetter MG, Griswold T, Almeida EAB. Partitioned Gene-Tree Analyses and Gene-Based Topology Testing Help Resolve Incongruence in a Phylogenomic Study of Host-Specialist Bees (Apidae: Eucerinae). Mol Biol Evol 2021; 38:1090-1100. [PMID: 33179746 PMCID: PMC7947843 DOI: 10.1093/molbev/msaa277] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 02/05/2023] Open
Abstract
Incongruence among phylogenetic results has become a common occurrence in analyses of genome-scale data sets. Incongruence originates from uncertainty in underlying evolutionary processes (e.g., incomplete lineage sorting) and from difficulties in determining the best analytical approaches for each situation. To overcome these difficulties, more studies are needed that identify incongruences and demonstrate practical ways to confidently resolve them. Here, we present results of a phylogenomic study based on the analysis of 197 taxa and 2,526 ultraconserved element (UCE) loci. We investigate evolutionary relationships of Eucerinae, a diverse subfamily of apid bees (relatives of honey bees and bumble bees) with >1,200 species. We sampled representatives of all tribes within the group and >80% of genera, including two mysterious South American genera, Chilimalopsis and Teratognatha. Initial analysis of the UCE data revealed two conflicting hypotheses for relationships among tribes. To resolve the incongruence, we tested concatenation and species tree approaches and used a variety of additional strategies including locus filtering, partitioned gene-tree searches, and gene-based topological tests. We show that within-locus partitioning improves gene tree and subsequent species-tree estimation, and that this approach confidently resolves the incongruence observed in our data set. After exploring our proposed analytical strategy on eucerine bees, we validated its efficacy to resolve hard phylogenetic problems by implementing it on a published UCE data set of Adephaga (Insecta: Coleoptera). Our results provide a robust phylogenetic hypothesis for Eucerinae and demonstrate a practical strategy for resolving incongruence in other phylogenomic data sets.
Affiliation(s)
- Felipe V Freitas
  - Laboratório de Biologia Comparada e Abelhas (LBCA), Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
  - U.S. Department of Agriculture, Agricultural Research Service (USDA-ARS), Pollinating Insects Research Unit, Utah State University, Logan, UT
- Michael G Branstetter
  - U.S. Department of Agriculture, Agricultural Research Service (USDA-ARS), Pollinating Insects Research Unit, Utah State University, Logan, UT
- Terry Griswold
  - U.S. Department of Agriculture, Agricultural Research Service (USDA-ARS), Pollinating Insects Research Unit, Utah State University, Logan, UT
- Eduardo A B Almeida
  - Laboratório de Biologia Comparada e Abelhas (LBCA), Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
26
Shen XX, Steenwyk JL, Rokas A. Dissecting incongruence between concatenation- and quartet-based approaches in phylogenomic data. Syst Biol 2021; 70:997-1014. [PMID: 33616672 DOI: 10.1093/sysbio/syab011] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 01/08/2020] [Revised: 02/10/2021] [Accepted: 02/17/2021] [Indexed: 12/12/2022] Open
Abstract
Topological conflict or incongruence is widespread in phylogenomic data. Concatenation- and coalescent-based approaches often result in incongruent topologies, but the causes of this conflict can be difficult to characterize. We examined incongruence stemming from conflict between likelihood-based signal (quantified by the difference in gene-wise log likelihood score or ΔGLS) and quartet-based topological signal (quantified by the difference in gene-wise quartet score or ΔGQS) for every gene in three phylogenomic studies in animals, fungi, and plants, which were chosen because their concatenation-based IQ-TREE (T1) and quartet-based ASTRAL (T2) phylogenies are known to produce eight conflicting internal branches (bipartitions). By comparing the types of phylogenetic signal for all genes in these three data matrices, we found that 30% - 36% of genes in each data matrix are inconsistent, that is, each of these genes has higher log likelihood score for T1 versus T2 (i.e., ΔGLS >0) whereas its T1 topology has lower quartet score than its T2 topology (i.e., ΔGQS <0) or vice versa. Comparison of inconsistent and consistent genes using a variety of metrics (e.g., evolutionary rate, gene tree topology, distribution of branch lengths, hidden paralogy, and gene tree discordance) showed that inconsistent genes are more likely to recover neither T1 nor T2 and have higher levels of gene tree discordance than consistent genes. Simulation analyses demonstrate that removal of inconsistent genes from datasets with low levels of incomplete lineage sorting (ILS) and low and medium levels of gene tree estimation error (GTEE) reduced incongruence and increased accuracy. In contrast, removal of inconsistent genes from datasets with medium and high ILS levels and high GTEE levels eliminated or extensively reduced incongruence, but the resulting congruent species phylogenies were not always topologically identical to the true species trees.
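The consistent/inconsistent distinction described in this abstract can be illustrated with a minimal Python sketch. It assumes only what the abstract states: a gene is inconsistent when ΔGLS and ΔGQS have opposite signs. The function names, the input format (pairs of per-gene scores), and the handling of zero-valued scores as "uninformative" are assumptions made for illustration and are not taken from the study.

```python
from typing import Iterable, Tuple


def classify_gene(delta_gls: float, delta_gqs: float) -> str:
    """Label one gene from its two signals.

    delta_gls > 0 means the gene's likelihood favors T1 over T2;
    delta_gqs > 0 means its quartet score favors T1 over T2.
    Opposite signs mark the gene as inconsistent.
    """
    if delta_gls == 0 or delta_gqs == 0:
        # Assumption: a zero score favors neither topology.
        return "uninformative"
    same_direction = (delta_gls > 0) == (delta_gqs > 0)
    return "consistent" if same_direction else "inconsistent"


def inconsistent_fraction(scores: Iterable[Tuple[float, float]]) -> float:
    """Fraction of genes whose two signals point to different topologies."""
    labels = [classify_gene(gls, gqs) for gls, gqs in scores]
    return labels.count("inconsistent") / len(labels)


# Hypothetical per-gene (ΔGLS, ΔGQS) values, for illustration only.
example_scores = [(1.0, -1.0), (1.0, 1.0), (-2.0, -1.0), (-1.0, 2.0)]
print(inconsistent_fraction(example_scores))
```

In a real analysis the per-gene scores would come from likelihood and quartet-score calculations on the two competing trees; here they are supplied by hand.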
Affiliation(s)
- Xing-Xing Shen
  - State Key Laboratory of Rice Biology and Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, Hangzhou, China
  - Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Jacob L Steenwyk
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Antonis Rokas
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
27
Hime PM, Lemmon AR, Lemmon ECM, Prendini E, Brown JM, Thomson RC, Kratovil JD, Noonan BP, Pyron RA, Peloso PLV, Kortyna ML, Keogh JS, Donnellan SC, Mueller RL, Raxworthy CJ, Kunte K, Ron SR, Das S, Gaitonde N, Green DM, Labisko J, Che J, Weisrock DW. Phylogenomics Reveals Ancient Gene Tree Discordance in the Amphibian Tree of Life. Syst Biol 2021; 70:49-66. [PMID: 32359157 PMCID: PMC7823230 DOI: 10.1093/sysbio/syaa034] [Citation(s) in RCA: 101] [Impact Index Per Article: 25.3] [Received: 04/26/2019] [Revised: 04/14/2020] [Accepted: 04/14/2020] [Indexed: 11/30/2022] Open
Abstract
Molecular phylogenies have yielded strong support for many parts of the amphibian Tree of Life, but poor support for the resolution of deeper nodes, including relationships among families and orders. To clarify these relationships, we provide a phylogenomic perspective on amphibian relationships by developing a taxon-specific Anchored Hybrid Enrichment protocol targeting hundreds of conserved exons which are effective across the class. After obtaining data from 220 loci for 286 species (representing 94% of the families and 44% of the genera), we estimate a phylogeny for extant amphibians and identify gene tree-species tree conflict across the deepest branches of the amphibian phylogeny. We perform locus-by-locus genealogical interrogation of alternative topological hypotheses for amphibian monophyly, focusing on interordinal relationships. We find that phylogenetic signal deep in the amphibian phylogeny varies greatly across loci in a manner that is consistent with incomplete lineage sorting in the ancestral lineage of extant amphibians. Our results overwhelmingly support amphibian monophyly and a sister relationship between frogs and salamanders, consistent with the Batrachia hypothesis. Species tree analyses converge on a small set of topological hypotheses for the relationships among extant amphibian families. These results clarify several contentious portions of the amphibian Tree of Life, which in conjunction with a set of vetted fossil calibrations, support a surprisingly younger timescale for crown and ordinal amphibian diversification than previously reported. More broadly, our study provides insight into the sources, magnitudes, and heterogeneity of support across loci in phylogenomic data sets.[AIC; Amphibia; Batrachia; Phylogeny; gene tree-species tree discordance; genomics; information theory.].
Affiliation(s)
- Paul M Hime
  - Biodiversity Institute, University of Kansas, Lawrence, KS 66045, USA
  - Department of Biology, University of Kentucky, Lexington, KY 40506, USA
- Alan R Lemmon
  - Department of Scientific Computing, Florida State University, Tallahassee, FL 32306, USA
- Elizabeth Prendini
  - Division of Vertebrate Zoology: Herpetology, American Museum of Natural History, New York, NY 10024, USA
- Jeremy M Brown
  - Department of Biological Sciences and Museum of Natural Science, Louisiana State University, Baton Rouge, LA 70803, USA
- Robert C Thomson
  - School of Life Sciences, University of Hawai’i, Honolulu, HI 96822, USA
- Justin D Kratovil
  - Department of Biology, University of Kentucky, Lexington, KY 40506, USA
  - Department of Entomology, University of Kentucky, Lexington, KY 40546, USA
- Brice P Noonan
  - Department of Biology, University of Mississippi, Oxford, MS 38677, USA
- R Alexander Pyron
  - Department of Biological Sciences, The George Washington University, Washington, DC 20052, USA
- Pedro L V Peloso
  - Division of Vertebrate Zoology: Herpetology, American Museum of Natural History, New York, NY 10024, USA
  - Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém, 66075-750, Brazil
- Michelle L Kortyna
  - Department of Biological Science, Florida State University, Tallahassee, FL 32306, USA
- J Scott Keogh
  - Division of Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, 2601, Australia
- Stephen C Donnellan
  - South Australian Museum, North Terrace, Adelaide 5000, Australia
  - School of Biological Sciences, University of Adelaide, Adelaide 5005, Australia
- Christopher J Raxworthy
  - Division of Vertebrate Zoology: Herpetology, American Museum of Natural History, New York, NY 10024, USA
- Krushnamegh Kunte
  - National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bengaluru 560065, India
- Santiago R Ron
  - Museo de Zoología, Escuela de Biología, Pontificia Universidad Católica del Ecuador, Quito, Ecuador
- Sandeep Das
  - Forest Ecology and Biodiversity Conservation Division, Kerala Forest Research Institute, Peechi, Kerala 680653, India
- Nikhil Gaitonde
  - National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bengaluru 560065, India
- David M Green
  - Redpath Museum, McGill University, Montreal, Quebec H3A 0C4, Canada
- Jim Labisko
  - The Durrell Institute of Conservation and Ecology, School of Anthropology and Conservation, The University of Kent, Canterbury, Kent, CT2 7NR, UK
  - Island Biodiversity and Conservation Centre, University of Seychelles, PO Box 1348, Anse Royale, Mahé, Seychelles
- Jing Che
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Kunming 650223, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
- David W Weisrock
  - Department of Biology, University of Kentucky, Lexington, KY 40506, USA
28
Phylogenomics of manakins (Aves: Pipridae) using alternative locus filtering strategies based on informativeness. Mol Phylogenet Evol 2020; 155:107013. [PMID: 33217578 DOI: 10.1016/j.ympev.2020.107013] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 08/13/2019] [Revised: 11/07/2020] [Accepted: 11/11/2020] [Indexed: 01/11/2023]
Abstract
Target capture sequencing effectively generates molecular marker arrays useful for molecular systematics. These extensive data sets are advantageous where previous studies using a few loci have failed to resolve relationships confidently. Moreover, target capture is well-suited to fragmented source DNA, allowing data collection from species that lack fresh tissues. Herein we use target capture to generate data for a phylogeny of the avian family Pipridae (manakins), a group that has been the subject of many behavioral and ecological studies. Most manakin species feature lek mating systems, where males exhibit complex behavioral displays including mechanical and vocal sounds, coordinated movements of multiple males, and high speed movements. We analyzed thousands of ultraconserved element (UCE) loci along with a smaller number of coding exons and their flanking regions from all but one species of Pipridae. We examined three different methods of phylogenetic estimation (concatenation and two multispecies coalescent methods). Phylogenetic inferences using UCE data yielded strongly supported estimates of phylogeny regardless of analytical method. Exon probes had limited capability to capture sequence data and resulted in phylogeny estimates with reduced support and modest topological differences relative to the UCE trees, although these conflicts had limited support. Two genera were paraphyletic among all analyses and data sets, with Antilophia nested within Chiroxiphia and Tyranneutes nested within Neopelma. The Chiroxiphia-Antilophia clade was an exception to the generally high support we observed; the topology of this clade differed among analyses, even those based on UCE data. To further explore relationships within this group, we employed two filtering strategies to remove low-information loci. Those analyses resulted in distinct topologies, suggesting that the relationships we identified within Chiroxiphia-Antilophia should be interpreted with caution. 
Despite the existence of a few continuing uncertainties, our analyses resulted in a robust phylogenetic hypothesis of the family Pipridae that provides a comparative framework for future ecomorphological and behavioral studies.
29
Pardo JD, Lennie K, Anderson JS. Can We Reliably Calibrate Deep Nodes in the Tetrapod Tree? Case Studies in Deep Tetrapod Divergences. Front Genet 2020; 11:506749. [PMID: 33193596 PMCID: PMC7596322 DOI: 10.3389/fgene.2020.506749] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/22/2019] [Accepted: 09/03/2020] [Indexed: 12/12/2022] Open
Abstract
Recent efforts have led to the development of extremely sophisticated methods for incorporating tree-wide data and accommodating uncertainty when estimating the temporal patterns of phylogenetic trees, but assignment of prior constraints on node age remains the most important factor. This depends largely on understanding substantive disagreements between specialists (paleontologists, geologists, and comparative anatomists), which are often opaque to phylogeneticists and molecular biologists who rely on these data as downstream users. This often leads to misunderstandings of how the uncertainty associated with node age minima arises, leading to inappropriate treatments of that uncertainty by phylogeneticists. In order to promote dialogue on this subject, we here review factors (phylogeny, preservational megabiases, spatial and temporal patterns in the tetrapod fossil record) that complicate assignment of prior node age constraints for deep divergences in the tetrapod tree, focusing on the origin of crown-group Amniota, crown-group Amphibia, and crown-group Tetrapoda. We find that node priors for amphibians and tetrapods show high phylogenetic lability and different phylogenetic treatments identifying disparate taxa as the earliest representatives of these crown groups. This corresponds partially to the well-known problem of lissamphibian origins but increasingly reflects deeper instabilities in early tetrapod phylogeny. Conversely, differences in phylogenetic treatment do not affect our ability to recognize the earliest crown-group amniotes but do affect how diverse we understand the earliest amniote faunas to be. 
Preservational megabiases and spatiotemporal heterogeneity of the early tetrapod fossil record present unrecognized challenges in reliably estimating the ages of tetrapod nodes; the tetrapod record throughout the relevant interval is spatially restricted and disrupted by several major intervals of minimal sampling coincident with the emergence of all three crown groups. Going forward, researchers attempting to calibrate the ages for these nodes, and other similar deep nodes in the metazoan fossil record, should consciously consider major phylogenetic uncertainty, preservational megabias, and spatiotemporal heterogeneity, preferably examining the impact of working hypotheses from multiple research groups. We emphasize a need for major tetrapod collection effort outside of classic European and North American sections, particularly from the southern hemisphere, and suggest that such sampling may dramatically change our timelines of tetrapod evolution.
Affiliation(s)
- Jason D. Pardo
  - Department of Comparative and Experimental Biology, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB, Canada
  - McCaig Institute for Bone and Joint Health, University of Calgary, Calgary, AB, Canada
- Kendra Lennie
  - McCaig Institute for Bone and Joint Health, University of Calgary, Calgary, AB, Canada
  - Department of Biological Sciences, University of Calgary, Calgary, AB, Canada
- Jason S. Anderson
  - Department of Comparative and Experimental Biology, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB, Canada
  - McCaig Institute for Bone and Joint Health, University of Calgary, Calgary, AB, Canada
30
Le Kim T, Le Sy V. mPartition: A Model-Based Method for Partitioning Alignments. J Mol Evol 2020; 88:641-652. [PMID: 32864711 DOI: 10.1007/s00239-020-09963-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2019] [Accepted: 08/08/2020] [Indexed: 10/23/2022]
Abstract
Maximum likelihood (ML) analysis of nucleotide or amino-acid alignments is widely used to infer evolutionary relationships among species. Computing the likelihood of a phylogenetic tree from such alignments is a complicated task because the evolutionary processes typically vary across sites. A number of studies have shown that partitioning alignments into sub-alignments of sites, where each sub-alignment is analyzed using a different model of evolution (e.g., GTR + I + G), is a sensible strategy. Current partitioning methods group sites into subsets based on the inferred rates of evolution at the sites. However, these do not provide sufficient information to adequately reflect the substitution processes of characters at the sites. Moreover, the site rate-based methods group all invariant sites into one subset, potentially resulting in wrong phylogenetic trees. In this study, we propose a partitioning method, called mPartition, that combines not only the evolutionary rates but also substitution models at sites to partition alignments. Analyses of different partitioning methods on both real and simulated datasets showed that mPartition was better than the other partitioning methods tested. Notably, mPartition overcame the pitfall of grouping all invariant sites into one subset. Using mPartition may lead to increased accuracy of ML-based phylogenetic inference, especially for multiple loci or whole genome datasets.
Affiliation(s)
- Thu Le Kim
  - University of Engineering and Technology, Vietnam National University Hanoi, 144 Xuan Thuy, Cau Giay, Hanoi, 10000, Vietnam
  - Hanoi University of Science and Technology, 1st Dai Co Viet, Hai Ba Trung, Hanoi, 10000, Vietnam
- Vinh Le Sy
  - University of Engineering and Technology, Vietnam National University Hanoi, 144 Xuan Thuy, Cau Giay, Hanoi, 10000, Vietnam
31
Singhal S, Colston TJ, Grundler MR, Smith SA, Costa GC, Colli GR, Moritz C, Pyron RA, Rabosky DL. Congruence and Conflict in the Higher-Level Phylogenetics of Squamate Reptiles: An Expanded Phylogenomic Perspective. Syst Biol 2020; 70:542-557. [PMID: 32681800 DOI: 10.1093/sysbio/syaa054] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 07/16/2019] [Revised: 05/05/2020] [Accepted: 07/05/2020] [Indexed: 12/16/2022] Open
Abstract
Genome-scale data have the potential to clarify phylogenetic relationships across the tree of life but have also revealed extensive gene tree conflict. This seeming paradox, whereby larger data sets both increase statistical confidence and uncover significant discordance, suggests that understanding sources of conflict is important for accurate reconstruction of evolutionary history. We explore this paradox in squamate reptiles, the vertebrate clade comprising lizards, snakes, and amphisbaenians. We collected an average of 5103 loci for 91 species of squamates that span higher-level diversity within the clade, which we augmented with publicly available sequences for an additional 17 taxa. Using a locus-by-locus approach, we evaluated support for alternative topologies at 17 contentious nodes in the phylogeny. We identified shared properties of conflicting loci, finding that rate and compositional heterogeneity drives discordance between gene trees and species tree and that conflicting loci rarely overlap across contentious nodes. Finally, by comparing our tests of nodal conflict to previous phylogenomic studies, we confidently resolve 9 of the 17 problematic nodes. We suggest this locus-by-locus and node-by-node approach can build consensus on which topological resolutions remain uncertain in phylogenomic studies of other contentious groups. [Anchored hybrid enrichment (AHE); gene tree conflict; molecular evolution; phylogenomic concordance; target capture; ultraconserved elements (UCE).].
Affiliation(s)
- Sonal Singhal
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
  - Museum of Zoology, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Biology, CSU Dominguez Hills, Carson, CA 90747, USA
- Timothy J Colston
  - Department of Biological Sciences, The George Washington University, Washington D.C. 20052, USA
  - Department of Biological Science, Florida State University, Tallahassee, FL 32306, USA
- Maggie R Grundler
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
  - Museum of Zoology, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Environmental Science, Policy, & Management, University of California Berkeley, Berkeley, CA 94720, USA
- Stephen A Smith
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
- Gabriel C Costa
  - Department of Biology and Environmental Sciences, Auburn University at Montgomery, Montgomery, AL, USA
- Guarino R Colli
  - Departamento de Zoologia, Universidade de Brasília, Brasília, DF, Brazil
- Craig Moritz
  - Division of Ecology and Evolution, Research School of Biology, and Centre for Biodiversity Analysis, The Australian National University, 46 Sullivans Creek Road, Acton, ACT 2601, Australia
- R Alexander Pyron
  - Department of Biological Sciences, The George Washington University, Washington D.C. 20052, USA
- Daniel L Rabosky
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
  - Museum of Zoology, University of Michigan, Ann Arbor, MI 48109, USA
32
Chan KO, Hutter CR, Wood PL, Grismer LL, Brown RM. Larger, unfiltered datasets are more effective at resolving phylogenetic conflict: Introns, exons, and UCEs resolve ambiguities in Golden-backed frogs (Anura: Ranidae; genus Hylarana). Mol Phylogenet Evol 2020; 151:106899. [PMID: 32590046 DOI: 10.1016/j.ympev.2020.106899] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 03/03/2020] [Revised: 05/18/2020] [Accepted: 06/17/2020] [Indexed: 01/01/2023]
Abstract
Using FrogCap, a recently-developed sequence-capture protocol, we obtained >12,000 highly informative exons, introns, and ultraconserved elements (UCEs), which we used to illustrate variation in evolutionary histories of these classes of markers, and to resolve long-standing systematic problems in Southeast Asian Golden-backed frogs of the genus-complex Hylarana. We also performed a comprehensive suite of analyses to assess the relative performance of different genetic markers, data filtering strategies, tree inference methods, and different measures of branch support. To reduce gene tree estimation error, we filtered the data using different thresholds of taxon completeness (missing data) and parsimony informative sites (PIS). We then estimated species trees using concatenated datasets and Maximum Likelihood (IQ-TREE) in addition to summary (ASTRAL-III), distance-based (ASTRID), and site-based (SVDQuartets) multispecies coalescent methods. Topological congruence and branch support were examined using traditional bootstrap, local posterior probabilities, gene concordance factors, quartet frequencies, and quartet scores. Our results did not yield a single concordant topology. Instead, introns, exons, and UCEs clearly possessed different phylogenetic signals, resulting in conflicting, yet strongly-supported phylogenetic estimates. However, a combined analysis comprising the most informative introns, exons, and UCEs converged on a similar topology across all analyses, with the exception of SVDQuartets. Bootstrap values were consistently high despite high levels of incongruence and high proportions of gene trees supporting conflicting topologies. Although low bootstrap values did indicate low heuristic support, high bootstrap support did not necessarily reflect congruence or support for the correct topology. 
This study reiterates findings of some previous studies, which demonstrated that traditional bootstrap values can produce positively misleading measures of support in large phylogenomic datasets. We also showed a remarkably strong positive relationship between branch length and topological congruence across all datasets, implying that very short internodes remain a challenge to resolve, even with orders of magnitude more data than ever before. Overall, our results demonstrate that more data from unfiltered or combined datasets produced superior results. Although data filtering reduced gene tree incongruence, decreased amounts of data also biased phylogenetic estimation. A point of diminishing returns was evident, at which higher congruence (from more stringent filtering) at the expense of amount of data led to topological error as assessed by comparison to more complete datasets across different genomic markers. Additionally, we showed that applying a parameter-rich model to a partitioned analysis of concatenated data produces better results compared to unpartitioned, or even partitioned analysis using model selection. Despite some lingering uncertainties, a combined analysis of our genomic data and sequences supplemented from GenBank (on the basis of a few gene regions) revealed highly supported novel systematic arrangements. Based on these new findings, we transfer Amnirana nicobariensis into the genus Indosylvirana; and I. milleti and Hylarana celebensis to the genus Papurana. We also provisionally place H. attigua in the genus Papurana pending verification from positively identified (voucher substantiated) samples.
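The locus-filtering step mentioned in this abstract (thresholds on taxon completeness and parsimony informative sites) can be sketched in Python as follows. This is an illustrative sketch, not the authors' pipeline: the function names, the input format (a list of aligned sequences per locus), and the threshold values are assumptions. It uses the standard definition of a parsimony-informative site, namely a column with at least two character states that each occur in at least two sequences, and it ignores gaps and ambiguity codes.

```python
from collections import Counter
from typing import List


def count_pis(alignment: List[str]) -> int:
    """Count parsimony-informative sites in a list of aligned sequences."""
    pis = 0
    for column in zip(*alignment):
        # Tally unambiguous nucleotides only; gaps and Ns are skipped.
        counts = Counter(base.upper() for base in column if base.upper() in "ACGT")
        states_seen_twice = sum(1 for n in counts.values() if n >= 2)
        if states_seen_twice >= 2:
            pis += 1
    return pis


def keep_locus(alignment: List[str], total_taxa: int,
               min_completeness: float, min_pis: int) -> bool:
    """Keep a locus if enough taxa are present and it has enough PIS.

    Assumption: taxon completeness is the share of sampled taxa that
    have a sequence in this locus alignment.
    """
    completeness = len(alignment) / total_taxa
    return completeness >= min_completeness and count_pis(alignment) >= min_pis


# Hypothetical four-taxon locus, for illustration only.
locus = ["ACGT", "ACGA", "AGGA", "AGTT"]
print(count_pis(locus), keep_locus(locus, total_taxa=4, min_completeness=0.75, min_pis=2))
```

As the abstract cautions, stricter thresholds reduce gene tree incongruence but also discard data, so any threshold chosen here would need to be evaluated against less heavily filtered data sets.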
Affiliation(s)
- Kin Onn Chan
  - Lee Kong Chian Natural History Museum, Faculty of Science, National University of Singapore, 2 Conservatory Drive, 117377, Singapore
- Carl R Hutter
  - Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA
  - Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Perry L Wood
  - Museum of Natural Sciences and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
  - Department of Biological Sciences & Museum of Natural History, Auburn University, Auburn, AL 36849, USA
- L Lee Grismer
  - Herpetology Laboratory, Department of Biology, La Sierra University, 4500 Riverwalk Parkway, Riverside, CA 92505, USA
- Rafe M Brown
  - Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA
33
Karin BR, Gamble T, Jackman TR. Optimizing Phylogenomics with Rapidly Evolving Long Exons: Comparison with Anchored Hybrid Enrichment and Ultraconserved Elements. Mol Biol Evol 2020; 37:904-922. [PMID: 31710677 PMCID: PMC7038749 DOI: 10.1093/molbev/msz263] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 12/17/2022] Open
Abstract
Marker selection has emerged as an important component of phylogenomic study design due to rising concerns of the effects of gene tree estimation error, model misspecification, and data-type differences. Researchers must balance various trade-offs associated with locus length and evolutionary rate among other factors. The most commonly used reduced representation data sets for phylogenomics are ultraconserved elements (UCEs) and Anchored Hybrid Enrichment (AHE). Here, we introduce Rapidly Evolving Long Exon Capture (RELEC), a new set of loci that targets single exons that are both rapidly evolving (evolutionary rate faster than RAG1) and relatively long in length (>1,500 bp), while at the same time avoiding paralogy issues across amniotes. We compare the RELEC data set to UCEs and AHE in squamate reptiles by aligning and analyzing orthologous sequences from 17 squamate genomes, composed of 10 snakes and 7 lizards. The RELEC data set (179 loci) outperforms AHE and UCEs by maximizing per-locus genetic variation while maintaining presence and orthology across a range of evolutionary scales. RELEC markers show higher phylogenetic informativeness than UCE and AHE loci, and RELEC gene trees show greater similarity to the species tree than AHE or UCE gene trees. Furthermore, with fewer loci, RELEC remains computationally tractable for full Bayesian coalescent species tree analyses. We contrast RELEC to and discuss important aspects of comparable methods, and demonstrate how RELEC may be the most effective set of loci for resolving difficult nodes and rapid radiations. We provide several resources for capturing or extracting RELEC loci from other amniote groups.
Affiliation(s)
- Benjamin R Karin
  - Department of Biology, Villanova University, Villanova, PA
  - Museum of Vertebrate Zoology and Department of Integrative Biology, University of California, Berkeley, CA
- Tony Gamble
  - Department of Biological Sciences, Marquette University, Milwaukee, WI
  - Milwaukee Public Museum, Milwaukee, WI
  - Bell Museum of Natural History, University of Minnesota, St. Paul, MN
- Todd R Jackman
  - Department of Biology, Villanova University, Villanova, PA
34
Prasanna AN, Gerber D, Kijpornyongpan T, Aime MC, Doyle VP, Nagy LG. Model Choice, Missing Data, and Taxon Sampling Impact Phylogenomic Inference of Deep Basidiomycota Relationships. Syst Biol 2020; 69:17-37. [PMID: 31062852 DOI: 10.1093/sysbio/syz029] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 10/19/2017] [Revised: 04/21/2019] [Accepted: 04/26/2019] [Indexed: 11/12/2022] Open
Abstract
Resolving deep divergences in the tree of life is challenging even for analyses of genome-scale phylogenetic data sets. Relationships between Basidiomycota subphyla, the rusts and allies (Pucciniomycotina), smuts and allies (Ustilaginomycotina), and mushroom-forming fungi and allies (Agaricomycotina) were found particularly recalcitrant both to traditional multigene and genome-scale phylogenetics. Here, we address basal Basidiomycota relationships using concatenated and gene tree-based analyses of various phylogenomic data sets to examine the contribution of several potential sources of bias. We evaluate the contribution of biological causes (hard polytomy, incomplete lineage sorting) versus unmodeled evolutionary processes and factors that exacerbate their effects (e.g., fast-evolving sites and long-branch taxa) to inferences of basal Basidiomycota relationships. Bayesian Markov Chain Monte Carlo and likelihood mapping analyses reject the hard polytomy with confidence. In concatenated analyses, fast-evolving sites and oversimplified models of amino acid substitution favored the grouping of smuts with mushroom-forming fungi, often leading to maximal bootstrap support in both concatenation and coalescent analyses. On the contrary, the most conserved data subsets grouped rusts and allies with mushroom-forming fungi, although this relationship proved labile, sensitive to model choice, to different data subsets and to missing data. Excluding putative long-branch taxa, genes with high proportions of missing data and/or with strong signal failed to reveal a consistent trend toward one or the other topology, suggesting that additional sources of conflict are at play. 
While concatenated analyses yielded strong but conflicting support, individual gene trees mostly provided poor support for any resolution of rusts, smuts, and mushroom-forming fungi, suggesting that the true Basidiomycota tree might be in a part of tree space that is difficult to access using both concatenation and gene tree-based approaches. Inference-based assessments of absolute model fit strongly reject best-fit models for the vast majority of genes, indicating a poor fit of even the most commonly used models. While this is consistent with previous assessments of site-homogenous models of amino acid evolution, this does not appear to be the sole source of confounding signal. Our analyses suggest that topologies uniting smuts with mushroom-forming fungi can arise as a result of inappropriate modeling of amino acid sites that might be prone to systematic bias. We speculate that improved models of sequence evolution could shed more light on basal splits in the Basidiomycota, which, for now, remain unresolved despite the use of whole genome data.
Affiliation(s)
- Arun N Prasanna
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, BRC-HAS, Szeged 6726, Hungary
- Daniel Gerber
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, BRC-HAS, Szeged 6726, Hungary
  - Institute of Archaeology, Research Centre for the Humanities, Hungarian Academy of Sciences, Budapest 1097, Hungary
- M Catherine Aime
  - Department of Botany and Plant Pathology, Purdue University, West Lafayette, IN 47907, USA
- Vinson P Doyle
  - Department of Plant Pathology and Crop Physiology, Louisiana State University AgCenter, Baton Rouge, LA 70803, USA
- Laszlo G Nagy
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, BRC-HAS, Szeged 6726, Hungary
|
35
|
Wang HC, Susko E, Roger AJ. The Relative Importance of Modeling Site Pattern Heterogeneity Versus Partition-Wise Heterotachy in Phylogenomic Inference. Syst Biol 2020; 68:1003-1019. [PMID: 31140564 DOI: 10.1093/sysbio/syz021] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 07/16/2018] [Revised: 02/04/2019] [Accepted: 04/09/2019] [Indexed: 12/18/2022] Open
Abstract
Large taxa-rich genome-scale data sets are often necessary for resolving ancient phylogenetic relationships. But accurate phylogenetic inference requires that they are analyzed with realistic models that account for the heterogeneity in substitution patterns amongst the sites, genes and lineages. Two kinds of adjustments are frequently used: models that account for heterogeneity in amino acid frequencies at sites in proteins, and partitioned models that accommodate the heterogeneity in rates (branch lengths) among different proteins in different lineages (protein-wise heterotachy). Although partitioned and site-heterogeneous models are both widely used in isolation, their relative importance to the inference of correct phylogenies has not been carefully evaluated. We conducted several empirical analyses and a large set of simulations to compare the relative performances of partitioned models, site-heterogeneous models, and combined partitioned site heterogeneous models. In general, site-homogeneous models (partitioned or not) performed worse than site heterogeneous, except in simulations with extreme protein-wise heterotachy. Furthermore, simulations using empirically-derived realistic parameter settings showed a marked long-branch attraction (LBA) problem for analyses employing protein-wise partitioning even when the generating model included partitioning. This LBA problem results from a small sample bias compounded over many single protein alignments. In some cases, this problem was ameliorated by clustering similarly-evolving proteins together into larger partitions using the PartitionFinder method. Similar results were obtained under simulations with larger numbers of taxa or heterogeneity in simulating topologies over genes. 
For an empirical Microsporidia test data set, all but one of the tested site-heterogeneous models (with or without partitioning) obtained the correct Microsporidia+Fungi grouping, whereas site-homogeneous models (with or without partitioning) did not. The single exception was the fully partitioned site-heterogeneous analysis, which succumbed to the compounded small sample LBA bias. In general, unless protein-wise heterotachy effects are extreme, it is more important to model site-heterogeneity than protein-wise heterotachy in phylogenomic analyses. Complete protein-wise partitioning should be avoided as it can lead to a serious LBA bias. In cases of extreme protein-wise heterotachy, approaches that cluster similarly-evolving proteins together, coupled with site-heterogeneous models, work well for phylogenetic estimation.
Affiliation(s)
- Huai-Chun Wang
  - Department of Mathematics and Statistics, Dalhousie University, 6316 Coburg Road, Halifax, Nova Scotia B3H 4R2, Canada
  - Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, 5850 College Street, Halifax, Nova Scotia B3H 4R2, Canada
- Edward Susko
  - Department of Mathematics and Statistics, Dalhousie University, 6316 Coburg Road, Halifax, Nova Scotia B3H 4R2, Canada
  - Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, 5850 College Street, Halifax, Nova Scotia B3H 4R2, Canada
- Andrew J Roger
  - Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, 5850 College Street, Halifax, Nova Scotia B3H 4R2, Canada
  - Department of Biochemistry and Molecular Biology, Dalhousie University, 5850 College Street, Halifax, Nova Scotia B3H 4R2, Canada
|
36
|
Smith SA, Walker-Hale N, Walker JF, Brown JW. Phylogenetic Conflicts, Combinability, and Deep Phylogenomics in Plants. Syst Biol 2019; 69:579-592. [DOI: 10.1093/sysbio/syz078] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 11/12/2018] [Revised: 10/16/2019] [Accepted: 11/18/2019] [Indexed: 11/13/2022] Open
Abstract
Studies have demonstrated that pervasive gene tree conflict underlies several important phylogenetic relationships where different species tree methods produce conflicting results. Here, we present a means of dissecting the phylogenetic signal for alternative resolutions within a data set in order to resolve recalcitrant relationships and, importantly, identify what the data set is unable to resolve. These procedures extend upon methods for isolating conflict and concordance involving specific candidate relationships and can be used to identify systematic error and disambiguate sources of conflict among species tree inference methods. We demonstrate these on a large phylogenomic plant data set. Our results support the placement of Amborella as sister to the remaining extant angiosperms, Gnetales as sister to pines, and the monophyly of extant gymnosperms. Several other contentious relationships, including the resolution of relationships within the bryophytes and the eudicots, remain uncertain given the low number of supporting gene trees. To address whether concatenation of filtered genes amplified phylogenetic signal for relationships, we implemented a combinatorial heuristic to test combinability of genes. We found that nested conflicts limited the ability of data filtering methods to fully ameliorate conflicting signal amongst gene trees. These analyses confirmed that the underlying conflicting signal does not support broad concatenation of genes. Our approach provides a means of dissecting a specific data set to address deep phylogenetic relationships while also identifying the inferential boundaries of the data set. [Angiosperms; coalescent; gene-tree conflict; genomics; phylogenetics; phylogenomics.]
Affiliation(s)
- Stephen A Smith
  - Department of Ecology and Evolutionary Biology, University of Michigan, 1105 North University Ave, Biological Sciences Building, Ann Arbor, MI 48109-1085, USA
- Nathanael Walker-Hale
  - Department of Ecology and Evolutionary Biology, University of Michigan, 1105 North University Ave, Biological Sciences Building, Ann Arbor, MI 48109-1085, USA
  - Department of Plant Sciences, University of Cambridge, Downing Street, Cambridge CB2 3EA, Cambridge, UK
- Joseph F Walker
  - Department of Ecology and Evolutionary Biology, University of Michigan, 1105 North University Ave, Biological Sciences Building, Ann Arbor, MI 48109-1085, USA
  - Sainsbury Laboratory (SLCU), University of Cambridge, Bateman St, Cambridge CB2 1LR, Cambridge, UK
- Joseph W Brown
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, Sheffield, UK
|
37
|
Du Y, Wu S, Edwards SV, Liu L. The effect of alignment uncertainty, substitution models and priors in building and dating the mammal tree of life. BMC Evol Biol 2019; 19:203. [PMID: 31694538 PMCID: PMC6833305 DOI: 10.1186/s12862-019-1534-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/07/2019] [Accepted: 10/21/2019] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND: The flood of genomic data to help build and date the tree of life requires automation at several critical junctures, most importantly during sequence assembly and alignment. It is widely appreciated that automated alignment protocols can yield inaccuracies, but the relative impact of various sources of error on phylogenomic analysis is not yet known. This study employs an updated mammal data set of 5162 coding loci sampled from 90 species to evaluate the effects of alignment uncertainty, substitution models, and fossil priors on gene tree, species tree, and divergence time estimation. Additionally, a novel coalescent likelihood ratio test is introduced for comparing competing species trees against a given set of gene trees.
RESULTS: The aligned DNA sequences of 5162 loci from 90 species were trimmed and filtered using trimAl and two filtering protocols. The final dataset contains 4 sets of alignments - before trimming, after trimming, filtered by a recently proposed pipeline, and further filtered by comparing ML gene trees for each locus with the concatenation tree. Our analyses suggest that the average discordance among the coalescent trees is significantly smaller than that among the concatenation trees estimated from the 4 sets of alignments or with different substitution models. There is no significant difference among the divergence times estimated with different substitution models. However, the divergence dates estimated from the alignments after trimming are more recent than those estimated from the alignments before trimming.
CONCLUSIONS: Our results highlight that alignment uncertainty of the updated mammal data set and the choice of substitution models have little impact on tree topologies yielded by coalescent methods for species tree estimation, whereas they are more influential on the trees made by concatenation.
Given the choice of calibration scheme and clock models, divergence time estimates are robust to the choice of substitution models, but removing alignments deemed problematic by trimming algorithms can lead to more recent dates. Although the fossil prior is important in divergence time estimation, Bayesian estimates of divergence times in this data set are driven primarily by the sequence data.
Affiliation(s)
- Yan Du
  - Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30606 USA
- Shaoyuan Wu
  - Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, School of Life Sciences, Jiangsu Normal University, Xuzhou, Jiangsu 221116 People’s Republic of China
- Scott V. Edwards
  - Department of Organismic & Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138 USA
- Liang Liu
  - Department of Statistics and Institute of Bioinformatics, University of Georgia, 310 Herty Drive, Athens, GA 30606 USA
|
38
|
Braun EL. An evolutionary model motivated by physicochemical properties of amino acids reveals variation among proteins. Bioinformatics 2019; 34:i350-i356. [PMID: 29950007 PMCID: PMC6022633 DOI: 10.1093/bioinformatics/bty261] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022] Open
Abstract
Motivation: The relative rates of amino acid interchanges over evolutionary time are likely to vary among proteins. Variation in those rates has the potential to reveal information about constraints on proteins. However, the most straightforward model that could be used to estimate relative rates of amino acid substitution is parameter-rich and it is therefore impractical to use for this purpose.
Results: A six-parameter model of amino acid substitution that incorporates information about the physicochemical properties of amino acids was developed. It showed that amino acid side chain volume, polarity and aromaticity have major impacts on protein evolution. It also revealed variation among proteins in the relative importance of those properties. The same general approach can be used to improve the fit of empirical models such as the commonly used PAM and LG models.
Availability and implementation: Perl code and test data are available from https://github.com/ebraun68/sixparam.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Edward L Braun
  - Department of Biology and Genetics Institute, University of Florida, Gainesville, FL, USA
|
39
|
Burbrink FT, Grazziotin FG, Pyron RA, Cundall D, Donnellan S, Irish F, Keogh JS, Kraus F, Murphy RW, Noonan B, Raxworthy CJ, Ruane S, Lemmon AR, Lemmon EM, Zaher H. Interrogating Genomic-Scale Data for Squamata (Lizards, Snakes, and Amphisbaenians) Shows no Support for Key Traditional Morphological Relationships. Syst Biol 2019; 69:502-520. [DOI: 10.1093/sysbio/syz062] [Citation(s) in RCA: 119] [Impact Index Per Article: 19.8] [Received: 01/15/2019] [Revised: 09/05/2019] [Accepted: 09/10/2019] [Indexed: 12/15/2022] Open
Abstract
Genomics is narrowing uncertainty in the phylogenetic structure for many amniote groups. For one of the most diverse and species-rich groups, the squamate reptiles (lizards, snakes, and amphisbaenians), an inverse correlation between the number of taxa and loci sampled still persists across all publications using DNA sequence data and reaching a consensus on the relationships among them has been highly problematic. In this study, we use high-throughput sequence data from 289 samples covering 75 families of squamates to address phylogenetic affinities, estimate divergence times, and characterize residual topological uncertainty in the presence of genome-scale data. Importantly, we address genomic support for the traditional taxonomic groupings Scleroglossa and Macrostomata using novel machine-learning techniques. We interrogate genes using various metrics inherent to these loci, including parsimony-informative sites (PIS), phylogenetic informativeness, length, gaps, number of substitutions, and site concordance to understand why certain loci fail to find previously well-supported molecular clades and how they fail to support species-tree estimates. We show that both incomplete lineage sorting and poor gene-tree estimation (due to a few undesirable gene properties, such as an insufficient number of PIS), may account for most gene and species-tree discordance. We find overwhelming signal for Toxicofera, and also show that none of the loci included in this study supports Scleroglossa or Macrostomata. We comment on the origins and diversification of Squamata throughout the Mesozoic and underscore remaining uncertainties that persist in both deeper parts of the tree (e.g., relationships between Dibamia, Gekkota, and remaining squamates; among the three toxicoferan clades Iguania, Serpentes, and Anguiformes) and within specific clades (e.g., affinities among gekkotan, pleurodont iguanians, and colubroid families).
Affiliation(s)
- Frank T Burbrink
  - Department of Herpetology, The American Museum of Natural History, 79th Street at Central Park West, New York, NY 10024, USA
- Felipe G Grazziotin
  - Laboratório de Coleções Zoológicas, Instituto Butantan, Av. Vital Brasil, 1500—Butantã, São Paulo—SP 05503-900, Brazil
- R Alexander Pyron
  - Department of Biological Sciences, The George Washington University, Washington, DC 20052, USA
- David Cundall
  - Department of Biological Sciences, 1 W. Packer Avenue, Lehigh University, Bethlehem, PA 18015, USA
- Steve Donnellan
  - South Australian Museum, North Terrace, Adelaide SA 5000, Australia
  - School of Biological Sciences, University of Adelaide, SA 5005 Australia
- Frances Irish
  - Department of Biological Sciences, Moravian College, 1200 Main St, Bethlehem, PA 18018, US
- J Scott Keogh
  - Division of Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT 2601, Australia
- Fred Kraus
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
- Robert W Murphy
  - Department of Natural History, Royal Ontario Museum, 100 Queens Park, Toronto, ON M5S 2C6, Canada
- Brice Noonan
  - Department of Biology, University of Mississippi, Oxford, MS 38677, USA
- Christopher J Raxworthy
  - Department of Herpetology, The American Museum of Natural History, 79th Street at Central Park West, New York, NY 10024, USA
- Sara Ruane
  - Department of Biological Sciences, 206 Boyden Hall, Rutgers University, 195 University Avenue, Newark, NJ 07102, USA
- Alan R Lemmon
  - Department of Scientific Computing, Florida State University, Dirac Science Library, Tallahassee, FL 32306-4102, USA
- Emily Moriarty Lemmon
  - Department of Biological Science, Florida State University, 319 Stadium Drive, Tallahassee, FL 32306-4295, USA
- Hussam Zaher
  - Museu de Zoologia da Universidade de São Paulo, São Paulo, Brazil CEP 04263-000, Brazil
  - Centre de Recherche sur la Paléobiodiversité et les Paléoenvironnements (CR2P), UMR 7207 CNRS/MNHN/Sorbonne Université, Muséum national d’Histoire naturelle, 8 rue Buffon, CP 38, 75005 Paris, France
|
40
|
Quartet-Based Computations of Internode Certainty Provide Robust Measures of Phylogenetic Incongruence. Syst Biol 2019; 69:308-324. [DOI: 10.1093/sysbio/syz058] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 08/03/2017] [Accepted: 08/26/2019] [Indexed: 11/14/2022] Open
Abstract
Incongruence, or topological conflict, is prevalent in genome-scale data sets. Internode certainty (IC) and related measures were recently introduced to explicitly quantify the level of incongruence of a given internal branch among a set of phylogenetic trees and complement regular branch support measures (e.g., bootstrap, posterior probability) that instead assess the statistical confidence of inference. Since most phylogenomic studies contain data partitions (e.g., genes) with missing taxa and IC scores stem from the frequencies of bipartitions (or splits) on a set of trees, IC score calculation typically requires adjusting the frequencies of bipartitions from these partial gene trees. However, when the proportion of missing taxa is high, the scores yielded by current approaches that adjust bipartition frequencies in partial gene trees differ substantially from each other and tend to be overestimates. To overcome these issues, we developed three new IC measures based on the frequencies of quartets, which naturally apply to both complete and partial trees. Comparison of our new quartet-based measures to previous bipartition-based measures on simulated data shows that: (1) on complete data sets, both quartet-based and bipartition-based measures yield very similar IC scores; (2) IC scores of quartet-based measures on a given data set with and without missing taxa are more similar than the scores of bipartition-based measures; and (3) quartet-based measures are more robust to the absence of phylogenetic signal and errors in phylogenetic inference than bipartition-based measures. Additionally, the analysis of an empirical mammalian phylogenomic data set using our quartet-based measures reveals the presence of substantial levels of incongruence for numerous internal branches. An efficient open-source implementation of these quartet-based measures is freely available in the program QuartetScores (https://github.com/lutteropp/QuartetScores).
|
41
|
Steenwyk JL, Rokas A. Treehouse: a user-friendly application to obtain subtrees from large phylogenies. BMC Res Notes 2019; 12:541. [PMID: 31455362 PMCID: PMC6712805 DOI: 10.1186/s13104-019-4577-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/23/2019] [Accepted: 08/21/2019] [Indexed: 01/13/2023] Open
Abstract
Objective: Phylogenetic trees that contain hundreds to thousands of taxa are now routinely generated. Retrieving the relationships among a subset of taxa in these large phylogenies can be a challenging or time-consuming task. Addressing this challenge requires the development of tools that facilitate the easy retrieval of subtrees from any user-specified set of taxa in a given phylogeny.
Results: We developed treehouse, an open source tool that enables the retrieval of any subtree from a given large phylogeny. With a three-step workflow, treehouse successfully allows a user to obtain a subtree from any phylogeny. Treehouse can help researchers to explore the relationships among any set of taxa from across the tree of life. Treehouse is implemented as a shiny application in the R programming language. Treehouse software and usage instructions are publicly available at https://github.com/JLSteenwyk/treehouse.
Affiliation(s)
- Jacob L Steenwyk
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN, 37235, USA
- Antonis Rokas
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN, 37235, USA
|
42
|
Siu-Ting K, Torres-Sánchez M, San Mauro D, Wilcockson D, Wilkinson M, Pisani D, O'Connell MJ, Creevey CJ. Inadvertent Paralog Inclusion Drives Artifactual Topologies and Timetree Estimates in Phylogenomics. Mol Biol Evol 2019; 36:1344-1356. [PMID: 30903171 PMCID: PMC6526904 DOI: 10.1093/molbev/msz067] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Indexed: 11/13/2022] Open
Abstract
Increasingly, large phylogenomic data sets include transcriptomic data from nonmodel organisms. This has allowed controversial and unexplored evolutionary relationships in the tree of life to be addressed, but it also increases the risk of inadvertent inclusion of paralogs in the analysis. Although this may be expected to result in decreased phylogenetic support, it is not clear if it could also drive highly supported artifactual relationships. Many groups, including the hyperdiverse Lissamphibia, are especially susceptible to these issues due to ancient gene duplication events and small numbers of sequenced genomes and because transcriptomes are increasingly applied to resolve historically conflicting taxonomic hypotheses. We tested the potential impact of paralog inclusion on the topologies and timetree estimates of the Lissamphibia using published and de novo sequencing data including 18 amphibian species, from which 2,656 single-copy gene families were identified. A novel paralog filtering approach resulted in four differently curated data sets, which were used for phylogenetic reconstructions using Bayesian inference, maximum likelihood, and quartet-based supertrees. We found that paralogs drive strongly supported conflicting hypotheses within the Lissamphibia (Batrachia and Procera) and older divergence time estimates even within groups where no variation in topology was observed. All investigated methods, except Bayesian inference with the CAT-GTR model, were found to be sensitive to paralogs, but with filtering, convergence to the same answer (Batrachia) was observed. This is the first large-scale study to address the impact of orthology selection using transcriptomic data and emphasizes the importance of quality over quantity, particularly for understanding relationships of poorly sampled taxa.
Affiliation(s)
- Karen Siu-Ting
  - Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast, United Kingdom
  - School of Biotechnology, Dublin City University, Glasnevin, Dublin, Ireland
  - Dpto. de Herpetología, Museo de Historia Natural, Universidad Nacional Mayor de San Marcos, Lima, Perú
- María Torres-Sánchez
  - Department of Biodiversity, Ecology, and Evolution, Complutense University of Madrid, Madrid, Spain
  - Department of Neuroscience, Spinal Cord and Brain Injury Research Center and Ambystoma Genetic Stock Center, University of Kentucky, Lexington, KY
- Diego San Mauro
  - Department of Biodiversity, Ecology, and Evolution, Complutense University of Madrid, Madrid, Spain
- David Wilcockson
  - Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, United Kingdom
- Mark Wilkinson
  - Department of Life Sciences, Natural History Museum, London, United Kingdom
- Davide Pisani
  - Life Sciences Building, University of Bristol, Bristol, United Kingdom
- Mary J O'Connell
  - School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds, United Kingdom
  - School of Life Sciences, University of Nottingham, University Park, United Kingdom
- Christopher J Creevey
  - Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast, United Kingdom
|
43
|
Comparative Phylogenomics, a Stepping Stone for Bird Biodiversity Studies. DIVERSITY-BASEL 2019. [DOI: 10.3390/d11070115] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 01/02/2023]
Abstract
Birds are a group with immense availability of genomic resources, and hundreds of forthcoming genomes at the doorstep. We review recent developments in whole genome sequencing, phylogenomics, and comparative genomics of birds. Short read based genome assemblies are common, largely due to efforts of the Bird 10K genome project (B10K). Chromosome-level assemblies are expected to increase due to improved long-read sequencing. The available genomic data has enabled the reconstruction of the bird tree of life with increasing confidence and resolution, but challenges remain in the early splits of Neoaves due to their explosive diversification after the Cretaceous-Paleogene (K-Pg) event. Continued genomic sampling of the bird tree of life will not just better reflect their evolutionary history but also shine new light onto the organization of phylogenetic signal and conflict across the genome. The comparatively simple architecture of avian genomes makes them a powerful system to study the molecular foundation of bird specific traits. Birds are on the verge of becoming an extremely resourceful system to study biodiversity from the nucleotide up.
|
44
|
Schrago CG, Seuánez HN. Large ancestral effective population size explains the difficult phylogenetic placement of owl monkeys. Am J Primatol 2019; 81:e22955. [PMID: 30779198 DOI: 10.1002/ajp.22955] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 11/03/2018] [Revised: 12/05/2018] [Accepted: 12/15/2018] [Indexed: 11/07/2022]
Abstract
The phylogenetic position of owl monkeys, grouped in the genus Aotus, has been a controversial issue for understanding Neotropical primate evolution. Explanations of the difficult phylogenetic assignment of owl monkeys have been elusive, frequently relying on insufficient data (stochastic error) or scenarios of rapid speciation (adaptive radiation) events. Using a coalescent-based approach, we explored the population-level mechanisms likely explaining these topological discrepancies. We examined the topological variance of 2,192 orthologous genes shared between representatives of the three major Cebidae lineages and the outgroup. By employing a methodological framework that allows for reticulated tree topologies, our analysis explicitly tested for non-dichotomous evolutionary processes impacting the finding of the position of owl monkeys in the cebid phylogeny. Our findings indicated that Aotus is a sister lineage of the callitrichines. Most gene trees (>50%) failed to recover the species tree topology, although the distribution of gene trees mismatching the true species topology followed the standard expectation of the multispecies coalescent without reticulation. We showed that the large effective population size of the common ancestor of Aotus and callitrichines was the most likely factor responsible for generating phylogenetic uncertainty. On the other hand, fast speciation scenarios or introgression played minor roles. We propose that the difficult phylogenetic placement of Aotus is explained by population-level processes associated with the large ancestral effective size. These results shed light on the biogeography of the early cebid diversification in the Miocene, highlighting the relevance of evaluating phylogenetic relationships employing population-aware approaches.
Affiliation(s)
- Carlos G Schrago
- Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Hector N Seuánez
- Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil; Division of Genetics, National Cancer Institute, Rio de Janeiro, Brazil
45
Zhou X, Shen XX, Hittinger CT, Rokas A. Evaluating Fast Maximum Likelihood-Based Phylogenetic Programs Using Empirical Phylogenomic Data Sets. Mol Biol Evol 2019; 35:486-503. [PMID: 29177474 PMCID: PMC5850867 DOI: 10.1093/molbev/msx302] [Citation(s) in RCA: 91] [Impact Index Per Article: 15.2] [Indexed: 12/21/2022]
Abstract
The sizes of the data matrices assembled to resolve branches of the tree of life have increased dramatically, motivating the development of programs for fast, yet accurate, inference. For example, several different fast programs have been developed in the very popular maximum likelihood framework, including RAxML/ExaML, PhyML, IQ-TREE, and FastTree. Although these programs are widely used, a systematic evaluation and comparison of their performance using empirical genome-scale data matrices has so far been lacking. To address this question, we evaluated these four programs on 19 empirical phylogenomic data sets with hundreds to thousands of genes and up to 200 taxa with respect to likelihood maximization, tree topology, and computational speed. For single-gene tree inference, we found that the more exhaustive and slower strategies (ten searches per alignment) outperformed faster strategies (one tree search per alignment) using RAxML, PhyML, or IQ-TREE. Interestingly, single-gene trees inferred by the three programs yielded comparable coalescent-based species tree estimations. For concatenation-based species tree inference, IQ-TREE consistently achieved the best-observed likelihoods for all data sets, and RAxML/ExaML was a close second. In contrast, PhyML often failed to complete concatenation-based analyses, whereas FastTree was the fastest but generated lower likelihood values and more dissimilar tree topologies in both types of analyses. Finally, data matrix properties, such as the number of taxa and the strength of phylogenetic signal, sometimes substantially influenced the programs’ relative performance. Our results provide real-world gene and species tree phylogenetic inference benchmarks to inform the design and execution of large-scale phylogenomic data analyses.
Affiliation(s)
- Xiaofan Zhou
- Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou, P.R. China; Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Department of Plant Pathology, South China Agricultural University, Guangzhou, P.R. China
- Xing-Xing Shen
- Department of Biological Sciences, Vanderbilt University, Nashville, TN
- Chris Todd Hittinger
- Laboratory of Genetics, Genome Center of Wisconsin, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN
46
Bravo GA, Antonelli A, Bacon CD, Bartoszek K, Blom MPK, Huynh S, Jones G, Knowles LL, Lamichhaney S, Marcussen T, Morlon H, Nakhleh LK, Oxelman B, Pfeil B, Schliep A, Wahlberg N, Werneck FP, Wiedenhoeft J, Willows-Munro S, Edwards SV. Embracing heterogeneity: coalescing the Tree of Life and the future of phylogenomics. PeerJ 2019; 7:e6399. [PMID: 30783571 PMCID: PMC6378093 DOI: 10.7717/peerj.6399] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 02/05/2018] [Accepted: 01/07/2019] [Indexed: 12/23/2022]
Abstract
Building the Tree of Life (ToL) is a major challenge of modern biology, requiring advances in cyberinfrastructure, data collection, theory, and more. Here, we argue that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by high-throughput sequencing (HTS). Such signals include those most commonly encountered in phylogenomic datasets, such as incomplete lineage sorting, but also those reticulate processes emerging with greater frequency, such as recombination and introgression. Here we focus specifically on how phylogenetic methods can accommodate the heterogeneity incurred by such population genetic processes; we do not discuss phylogenetic methods that ignore such processes, such as concatenation or supermatrix approaches or supertrees. We suggest that methods of data acquisition and the types of markers used in phylogenomics will remain restricted until a posteriori methods of marker choice are made possible with routine whole-genome sequencing of taxa of interest. We discuss limitations and potential extensions of a model supporting innovation in phylogenomics today, the multispecies coalescent model (MSC). Macroevolutionary models that use phylogenies, such as character mapping, often ignore the heterogeneity on which building phylogenies increasingly rely and suggest that assimilating such heterogeneity is an important goal moving forward. Finally, we argue that an integrative cyberinfrastructure linking all steps of the process of building the ToL, from specimen acquisition in the field to publication and tracking of phylogenomic data, as well as a culture that values contributors at each step, are essential for progress.
Affiliation(s)
- Gustavo A. Bravo
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Alexandre Antonelli
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Gothenburg Global Biodiversity Centre, Göteborg, Sweden
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Gothenburg Botanical Garden, Göteborg, Sweden
- Christine D. Bacon
- Gothenburg Global Biodiversity Centre, Göteborg, Sweden
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Krzysztof Bartoszek
- Department of Computer and Information Science, Linköping University, Linköping, Sweden
- Mozes P. K. Blom
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Stella Huynh
- Institut de Biologie, Université de Neuchâtel, Neuchâtel, Switzerland
- Graham Jones
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- L. Lacey Knowles
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Sangeet Lamichhaney
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Thomas Marcussen
- Centre for Ecological and Evolutionary Synthesis, University of Oslo, Oslo, Norway
- Hélène Morlon
- Institut de Biologie, Ecole Normale Supérieure de Paris, Paris, France
- Luay K. Nakhleh
- Department of Computer Science, Rice University, Houston, TX, USA
- Bengt Oxelman
- Gothenburg Global Biodiversity Centre, Göteborg, Sweden
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Bernard Pfeil
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Alexander Schliep
- Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
- Fernanda P. Werneck
- Coordenação de Biodiversidade, Programa de Coleções Científicas Biológicas, Instituto Nacional de Pesquisa da Amazônia, Manaus, AM, Brazil
- John Wiedenhoeft
- Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
- Department of Computer Science, Rutgers University, Piscataway, NJ, USA
- Sandi Willows-Munro
- School of Life Sciences, University of Kwazulu-Natal, Pietermaritzburg, South Africa
- Scott V. Edwards
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Gothenburg Centre for Advanced Studies in Science and Technology, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
47
Luo D, Li Y, Zhao Q, Zhao L, Ludwig A, Peng Z. Highly Resolved Phylogenetic Relationships within Order Acipenseriformes According to Novel Nuclear Markers. Genes (Basel) 2019; 10:E38. [PMID: 30634684 PMCID: PMC6356338 DOI: 10.3390/genes10010038] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 11/26/2018] [Revised: 12/28/2018] [Accepted: 01/02/2019] [Indexed: 11/16/2022]
Abstract
Order Acipenseriformes contains 27 extant species distributed across the northern hemisphere, including so-called "living fossil" species of garfish and sturgeons. Previous studies have focused on their mitochondrial genetics and have rarely used nuclear genetic data, leaving questions as to their phylogenetic relationships. This study aimed to utilize a bioinformatics approach to screen for candidate single-copy nuclear genes, using transcriptomic data from sturgeon species and genomic data from the spotted gar, Lepisosteus oculatus. We utilized nested polymerase chain reaction (PCR) and degenerate primers to identify nuclear protein-coding (NPC) gene markers to determine phylogenetic relationships among the Acipenseriformes. We identified 193 nuclear single-copy genes, selected from 1850 candidate genes with at least one exon larger than 700 bp. Forty-three of these genes were used for primer design and development of 30 NPC markers, which were sequenced for at least 14 Acipenseriformes species. Twenty-seven NPC markers were found completely in 16 species. Gene trees according to Bayesian inference (BI) and maximum likelihood (ML) were calculated based on the 30 NPC markers (20,946 bp total). Both gene and species trees produced very similar topologies. A molecular clock model estimated the divergence time between sturgeon and paddlefish at 204.1 Mya, approximately 10% older than previous estimates based on cytochrome b data (184.4 Mya). The successful development and application of NPC markers provides a new perspective and insight for the phylogenetic relationships of Acipenseriformes. Furthermore, the newly developed nuclear markers may be useful in further studies on the conservation, evolution, and genomic biology of this group.
Affiliation(s)
- Dehuai Luo
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University School of Life Sciences, Chongqing 400715, China.
- Yanping Li
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University School of Life Sciences, Chongqing 400715, China.
- Shenzhen Key Lab of Marine Genomics, Guangdong Provincial Key Lab of Molecular Breeding in Marine Economic Animals, BGI Academy of Marine Sciences, BGI Marine, BGI, Shenzhen 518083, China.
- Qingyuan Zhao
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University School of Life Sciences, Chongqing 400715, China.
- Lianpeng Zhao
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University School of Life Sciences, Chongqing 400715, China.
- Arne Ludwig
- Department of Evolutionary Genetics, Leibniz-Institute for Zoo and Wildlife Research, 10315 Berlin, Germany.
- Zuogang Peng
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Southwest University School of Life Sciences, Chongqing 400715, China.
48
Abstract
This study investigated long-term substitution rate differences using three calibration points, divergences between lobe-finned vertebrates and ray-finned fish, between mammals and sauropsids, and between holosteans (gar and bowfin) and teleost fish with amino acid sequence data of 625 genes for 25 bony vertebrates. The result showed that the substitution rate was two to three times higher in the stem branches of lobe-finned vertebrates before the mammal-sauropsid divergence than in amniotes. The rate in the stem branch of ray-finned fish before the holostean-teleost fish divergence was also a few times higher than the holostean rate, whereas it was similar to or somewhat slower than the teleost fish rate. The phylogenetic relationship of coelacanth and lungfish with tetrapod was difficult to determine because of the short interval of the divergences. Considering the high rate in the stem branches, the divergences of coelacanth and lungfish from the stem branch were estimated as 408–427 Ma and 399–414 Ma, respectively, with the interval of 9–13 Myr. With the external calibration of the mammal-sauropsid split, the estimated times for ordinal divergences within eutherian mammals tend to be smaller than those in previous studies that used the calibration points within the lineage, with deeper divergences before the Cretaceous–Paleogene boundary and shallower ones after the boundary. In contrast the estimated times within birds were larger than those of previous studies, with the divergence between Galliformes and Anseriformes ∼80 Ma and that between Galloanserae and Neoaves 110 Ma.
Affiliation(s)
- Naoko Takezaki
- Life Science Research Center, Kagawa University, Kitagun, Kagawa, Japan
49
Hilton EJ, Lavoué S. A review of the systematic biology of fossil and living bony-tongue fishes, Osteoglossomorpha (Actinopterygii: Teleostei). Neotropical Ichthyology 2018. [DOI: 10.1590/1982-0224-20180031] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 01/10/2023]
Abstract
The bony-tongue fishes, Osteoglossomorpha, have been the focus of a great deal of morphological, systematic, and evolutionary study, due in part to their basal position among extant teleostean fishes. This group includes the mooneyes (Hiodontidae), knifefishes (Notopteridae), the abu (Gymnarchidae), elephantfishes (Mormyridae), arawanas and pirarucu (Osteoglossidae), and the African butterfly fish (Pantodontidae). This morphologically heterogeneous group also has a long and diverse fossil record, including taxa from all continents and both freshwater and marine deposits. The phylogenetic relationships among most extant osteoglossomorph families are widely agreed upon. However, there is still much to discover about the systematic biology of these fishes, particularly with regard to the phylogenetic affinities of several fossil taxa, within Mormyridae, and the position of Pantodon. In this paper we review the state of knowledge for osteoglossomorph fishes. We first provide an overview of the diversity of Osteoglossomorpha, and then discuss studies of the phylogeny of Osteoglossomorpha from both morphological and molecular perspectives, as well as biogeographic analyses of the group. Finally, we offer our perspectives on future needs for research on the systematic biology of Osteoglossomorpha.
Affiliation(s)
- Sébastien Lavoué
- National Taiwan University, Taiwan; Universiti Sains Malaysia, Malaysia
50
Wang X, Lim BK, Ting N, Hu J, Liang Y, Roos C, Yu L. Reconstructing the phylogeny of new world monkeys (platyrrhini): evidence from multiple non-coding loci. Curr Zool 2018; 65:579-588. [PMID: 31616489 PMCID: PMC6784508 DOI: 10.1093/cz/zoy072] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 03/14/2018] [Accepted: 09/12/2018] [Indexed: 11/27/2022]
Abstract
Among mammalian phylogenies, those characterized by rapid radiations are particularly problematic. The New World monkeys (NWMs, Platyrrhini) comprise 3 families and 7 subfamilies, which radiated within a relatively short time period. Accordingly, their phylogenetic relationships are still largely disputed. In the present study, 56 nuclear non-coding loci, including 33 introns (INs) and 23 intergenic regions (IGs), from 20 NWM individuals representing 18 species were used to investigate phylogenetic relationships among families and subfamilies. Of the 56 loci, 43 have not been used in previous NWM phylogenetics. We applied concatenation and coalescence tree-inference methods, and a recently proposed question-specific approach to address NWM phylogeny. Our results indicate incongruence between concatenation and coalescence methods for the IN and IG datasets. However, a consensus was reached with a single tree topology from all analyses of combined INs and IGs as well as all analyses of question-specific loci using both concatenation and coalescence methods, albeit with varying degrees of statistical support. In detail, our results indicated the sister-group relationships between the families Atelidae and Pitheciidae, and between the subfamilies Aotinae and Callithrichinae among Cebidae. Our study provides insights into the disputed phylogenetic relationships among NWM families and subfamilies from the perspective of multiple non-coding loci and various tree-inference approaches. However, the present phylogenetic framework needs further evaluation by adding more independent sequence data and a deeper taxonomic sampling. Overall, our work has important implications for phylogenetic studies dealing with rapid radiations.
Affiliation(s)
- Xiaoping Wang
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China; School of Life Sciences, Yunnan University, Kunming, China
- Burton K Lim
- Department of Natural History, Royal Ontario Museum, Toronto, ON, Canada
- Nelson Ting
- Department of Anthropology and Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, USA
- Jingyang Hu
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China; State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China; Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, China
- Yunpeng Liang
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China
- Christian Roos
- Gene Bank of Primates and Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research, Kellnerweg, Göttingen, Germany
- Li Yu
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming, China