1. Lozano-Fernandez J. A Practical Guide to Design and Assess a Phylogenomic Study. Genome Biol Evol 2022; 14:evac129. [PMID: 35946263 PMCID: PMC9452790 DOI: 10.1093/gbe/evac129] [Accepted: 08/03/2022]
Abstract
Over the last decade, molecular systematics has undergone a change of paradigm as high-throughput sequencing now makes it possible to reconstruct evolutionary relationships using genome-scale datasets. The advent of "big data" molecular phylogenetics provided a battery of new tools for biologists but simultaneously brought new methodological challenges. The increase in analytical complexity comes at the price of highly specific training in computational biology and molecular phylogenetics, resulting very often in a polarized accumulation of knowledge (technical on one side and biological on the other). Interpreting the robustness of genome-scale phylogenetic studies is not straightforward, particularly as new methodological developments have consistently shown that the general belief of "more genes, more robustness" often does not apply, and because there is a range of systematic errors that plague phylogenomic investigations. This is particularly problematic because phylogenomic studies are highly heterogeneous in their methodology, and best practices are often not clearly defined. The main aim of this article is to present what I consider as the ten most important points to take into consideration when planning a well-thought-out phylogenomic study and while evaluating the quality of published papers. The goal is to provide a practical step-by-step guide that can be easily followed by nonexperts and phylogenomic novices in order to assess the technical robustness of phylogenomic studies or improve the experimental design of a project.
Affiliation(s)
- Jesus Lozano-Fernandez
- Department of Genetics, Microbiology and Statistics, Biodiversity Research Institute (IRBio), University of Barcelona, Avd. Diagonal 643, 08028 Barcelona, Spain
- Institute of Evolutionary Biology (CSIC – Universitat Pompeu Fabra), Passeig marítim de la Barcelona 37-49, 08003 Barcelona, Spain
2. Literman R, Schwartz R. Genome-Scale Profiling Reveals Noncoding Loci Carry Higher Proportions of Concordant Data. Mol Biol Evol 2021; 38:2306-2318. [PMID: 33528497 PMCID: PMC8136493 DOI: 10.1093/molbev/msab026]
Abstract
Many evolutionary relationships remain controversial despite whole-genome sequencing data. These controversies arise, in part, from challenges associated with accurately modeling the complex phylogenetic signal coming from genomic regions experiencing distinct evolutionary forces. Here, we examine how different regions of the genome support or contradict well-established relationships among three mammal groups using millions of orthologous parsimony-informative biallelic sites (PIBS) distributed across primate, rodent, and Pecora genomes. We compared PIBS concordance percentages among locus types (e.g. coding sequences (CDS), introns, intergenic regions), and contrasted PIBS utility over evolutionary timescales. Sites derived from noncoding sequences provided more data and proportionally more concordant sites than those from CDS in all clades. CDS PIBS were also predominant drivers of tree incongruence in two cases of topological conflict. PIBS derived from most locus types provided surprisingly consistent support for splitting events spread across the timescales we examined, although we find evidence that CDS and intronic PIBS may, respectively and to a limited degree, inform disproportionately about older and younger splits. In this era of accessible whole-genome sequence data, these results: 1) suggest benefits to more intentionally focusing on noncoding loci as robust data for tree inference and 2) reinforce the importance of accurate modeling, especially when using CDS data.
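To make the PIBS idea concrete, the sketch below flags an alignment column as a parsimony-informative biallelic site when it carries exactly two nucleotides, each shared by at least two taxa, and scores it as concordant when the taxon bipartition it implies matches a branch of a reference tree. This is a minimal illustration under stated assumptions, not the authors' pipeline: columns containing gaps or ambiguity codes are simply skipped, the reference tree is supplied as a list of splits, and all function names are hypothetical.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")


def canonical_split(side, taxa):
    # Represent a bipartition by the side that excludes the first taxon,
    # so that {A,B}|{C,D} and {C,D}|{A,B} compare as equal.
    side = frozenset(side)
    return side if taxa[0] not in side else frozenset(taxa) - side


def pibs_split(column, taxa):
    """Return the bipartition implied by a PIBS column, or None if it is not one.

    Assumption for this sketch: a column with any non-ACGT character
    (gap, N, ambiguity code) is treated as ineligible.
    """
    bases = [b.upper() for b in column]
    if any(b not in NUCLEOTIDES for b in bases):
        return None
    counts = Counter(bases)
    # Biallelic, and each allele carried by at least two taxa.
    if len(counts) != 2 or min(counts.values()) < 2:
        return None
    first_allele = bases[0]
    side = [t for t, b in zip(taxa, bases) if b == first_allele]
    return canonical_split(side, taxa)


def pibs_concordance(alignment, taxa, reference_splits):
    """Return (fraction of PIBS matching a reference split, number of PIBS).

    alignment: equal-length sequences, one per taxon, in the same order as `taxa`.
    reference_splits: taxon sets, one per internal branch of the reference tree.
    """
    reference = {canonical_split(s, taxa) for s in reference_splits}
    splits = [pibs_split(col, taxa) for col in zip(*alignment)]
    splits = [s for s in splits if s is not None]
    if not splits:
        return 0.0, 0
    concordant = sum(s in reference for s in splits)
    return concordant / len(splits), len(splits)
```

For example, with taxa human, chimp, mouse and rat (sequences AACGT, AACGA, GGCTT, GGCTA) and the reference split {human, chimp}, three of the four PIBS support the reference, so the function returns (0.75, 4).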
Affiliation(s)
- Robert Literman
- Department of Biological Sciences, University of Rhode Island, South Kingstown, RI, USA; Center for Food Safety and Applied Nutrition, Office of Regulatory Science, U.S. Food and Drug Administration, College Park, MD, USA
- Rachel Schwartz
- Department of Biological Sciences, University of Rhode Island, South Kingstown, RI, USA
3. Optimal markers for the identification of Colletotrichum species. Mol Phylogenet Evol 2019; 143:106694. [PMID: 31786239 DOI: 10.1016/j.ympev.2019.106694] [Received: 03/12/2019] [Revised: 10/15/2019] [Accepted: 11/25/2019]
Abstract
Colletotrichum is among the most important genera of fungal plant pathogens. Molecular phylogenetic studies over the last decade have resulted in a much better understanding of the evolutionary relationships and species boundaries within the genus. There are now approximately 200 species accepted, most of which are distributed among 13 species complexes. Given their prominence on agricultural crops around the world, rapid identification of a large collection of Colletotrichum isolates is routinely needed by plant pathologists, regulatory officials, and fungal biologists. However, there is no agreement on the best molecular markers to discriminate species in each species complex. Here we calculate the barcode gap distance and intra/inter-specific distance overlap to evaluate each of the most commonly applied molecular markers for their utility as a barcode for species identification. Glyceraldehyde-3-phosphate dehydrogenase (GAPDH), histone-3 (HIS3), DNA lyase (APN2), intergenic spacer between DNA lyase and the mating-type locus MAT1-2-1 (APN2/MAT-IGS), and intergenic spacer between GAPDH and a hypothetical protein (GAP2-IGS) have the properties of good barcodes, whereas sequences of actin (ACT), chitin synthase (CHS-1) and nuclear rDNA internal transcribed spacers (nrITS) are not able to distinguish most species. Finally, we assessed the utility of these markers for phylogenetic studies using phylogenetic informativeness profiling, the genealogical sorting index (GSI), and Bayesian concordance analyses (BCA). Although GAPDH, HIS3 and β-tubulin (TUB2) were frequently among the best markers, there was not a single set of markers that were best for all species complexes. Eliminating markers with low phylogenetic signal tends to decrease uncertainty in the topology, regardless of species complex, and leads to a larger proportion of markers that support each lineage in the Bayesian concordance analyses. 
Finally, we reconstruct the phylogeny of each species complex using a minimal set of phylogenetic markers with the strongest phylogenetic signal and find the majority of species are strongly supported as monophyletic.
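The barcode-gap criterion can be sketched as follows: for each species, subtract its largest intraspecific distance from its smallest distance to any other species; a positive value means the marker shows no intra/interspecific overlap for that species. This assumes uncorrected p-distances on pre-aligned sequences, with gaps and Ns skipped, which may differ from the distance model and exact overlap statistic used in the study; the function names are illustrative.

```python
from itertools import combinations


def p_distance(a, b, missing="-N?"):
    """Uncorrected p-distance, skipping positions where either sequence is gapped or missing."""
    compared = differences = 0
    for x, y in zip(a, b):
        if x in missing or y in missing:
            continue
        compared += 1
        differences += x != y
    return differences / compared if compared else float("nan")


def barcode_gaps(seqs, species):
    """Per-species barcode gap: smallest interspecific minus largest intraspecific distance.

    seqs: dict isolate -> aligned sequence; species: dict isolate -> species name.
    A positive gap means the marker separates that species from all others in the sample.
    """
    intra, inter = {}, {}
    for i, j in combinations(seqs, 2):
        d = p_distance(seqs[i], seqs[j])
        si, sj = species[i], species[j]
        if si == sj:
            intra.setdefault(si, []).append(d)
        else:
            inter.setdefault(si, []).append(d)
            inter.setdefault(sj, []).append(d)
    gaps = {}
    for sp in set(species.values()):
        max_intra = max(intra.get(sp, [0.0]))  # singleton species: no intraspecific pairs
        min_inter = min(inter.get(sp, [float("inf")]))
        gaps[sp] = min_inter - max_intra
    return gaps
```

Comparing the resulting gaps across markers (GAPDH, HIS3, ACT, and so on) is one simple way to rank candidate barcodes for a given species complex.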
4. Fu CN, Mo ZQ, Yang JB, Ge XJ, Li DZ, Xiang QY(J), Gao LM. Plastid phylogenomics and biogeographic analysis support a trans-Tethyan origin and rapid early radiation of Cornales in the Mid-Cretaceous. Mol Phylogenet Evol 2019; 140:106601. [DOI: 10.1016/j.ympev.2019.106601] [Received: 01/15/2019] [Revised: 08/17/2019] [Accepted: 08/20/2019]
5. Effects of missing data and data type on phylotranscriptomic analysis of stony corals (Cnidaria: Anthozoa: Scleractinia). Mol Phylogenet Evol 2019; 134:12-23. [DOI: 10.1016/j.ympev.2019.01.012] [Received: 06/29/2018] [Revised: 01/11/2019] [Accepted: 01/17/2019]
6. Liu Y, Liu S, Yeh CF, Zhang N, Chen G, Que P, Dong L, Li SH. The first set of universal nuclear protein-coding loci markers for avian phylogenetic and population genetic studies. Sci Rep 2018; 8:15723. [PMID: 30356056 PMCID: PMC6200822 DOI: 10.1038/s41598-018-33646-x] [Received: 03/02/2018] [Accepted: 09/21/2018]
Abstract
Multiple nuclear markers provide genetic polymorphism data for molecular systematics and population genetic studies. They are especially required for coalescent-based analyses, which can be used to accurately estimate species trees and infer population demographic histories. However, in avian evolutionary studies, these powerful coalescent-based methods are hindered by the lack of a sufficient number of markers. In this study, we designed PCR primers to amplify 136 nuclear protein-coding loci (NPCLs) by scanning the published Red Junglefowl (Gallus gallus) and Zebra Finch (Taeniopygia guttata) genomes. To test their utility, we amplified these loci in 41 bird species representing 23 avian orders. The 63 best-performing NPCLs, selected on the basis of high PCR success rates, had various mutation rates and were evenly distributed across 17 avian autosomal chromosomes and the Z chromosome. To test the phylogenetic resolving power of these markers, we conducted a Neoavian phylogenetic analysis using the 63 concatenated NPCL markers derived from 48 whole bird genomes. The resulting topology is, to a large extent, congruent with results from previous whole-genome data. To test the level of intraspecific polymorphism in these markers, we examined the genetic diversity in four populations of the Kentish Plover (Charadrius alexandrinus) at 17 randomly chosen NPCL markers. Our results showed that these NPCL markers exhibited a level of polymorphism comparable with that of mitochondrial loci. Therefore, this set of pan-avian nuclear protein-coding loci has great potential to facilitate studies in avian phylogenetics and population genetics.
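Comparing polymorphism between nuclear and mitochondrial loci requires a diversity statistic. The abstract does not name the one used, so purely as an illustration the sketch below computes nucleotide diversity (π) as the average proportion of differing sites over all pairs of sequences, assuming gap-free sequences of equal length. The function name is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites (pi) over aligned, gap-free sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    # Total number of differences summed over every pair of sequences.
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / (len(pairs) * length)
```

For three sequences AAAA, AAAT and AATT, the three pairs differ at 1, 2 and 1 sites, giving π = 4 / (3 × 4) = 1/3.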
Affiliation(s)
- Yang Liu
- State Key Laboratory of Biocontrol, Department of Ecology/School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, Guangdong, China
- Simin Liu
- State Key Laboratory of Biocontrol, Department of Ecology/School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, Guangdong, China
- Chia-Fen Yeh
- Department of Life Sciences, National Taiwan Normal University, Taipei, 116, Taiwan, China
- Nan Zhang
- State Key Laboratory of Biocontrol, Department of Ecology/School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, Guangdong, China
- Guoling Chen
- State Key Laboratory of Biocontrol, Department of Ecology/School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, Guangdong, China
- Pinjia Que
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Lu Dong
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Shou-Hsien Li
- Department of Life Sciences, National Taiwan Normal University, Taipei, 116, Taiwan, China
7. Kuang T, Tornabene L, Li J, Jiang J, Chakrabarty P, Sparks JS, Naylor GJP, Li C. Phylogenomic analysis on the exceptionally diverse fish clade Gobioidei (Actinopterygii: Gobiiformes) and data-filtering based on molecular clocklikeness. Mol Phylogenet Evol 2018; 128:192-202. [PMID: 30036699 DOI: 10.1016/j.ympev.2018.07.018] [Received: 08/23/2017] [Revised: 07/11/2018] [Accepted: 07/17/2018]
Abstract
The use of genome-scale data to infer phylogenetic relationships has gained popularity in recent years owing to progress in target-gene capture and sequencing techniques. Data filtering, the approach of excluding data inconsistent with the model from analyses, could presumably alleviate problems caused by systematic errors in phylogenetic inference. Various data-filtering criteria, including those based on evolutionary rate and molecular clocklikeness, have been proposed for selecting useful phylogenetic markers, yet few studies have tested these criteria using phylogenomic data. We developed a novel set of single-copy nuclear coding markers to capture thousands of target genes in gobioid fishes, a species-rich lineage of vertebrates, and tested the effects of data-filtering methods based on substitution rate and molecular clocklikeness while attempting to control for the confounding effects of missing data and variation in locus length. We found that molecular clocklikeness was a better predictor than overall substitution rate of the phylogenetic usefulness of molecular markers in our study. In addition, when the 100 best-ranked loci for our predictors were concatenated and analyzed using maximum likelihood, or combined in a coalescent-based species-tree analysis, the resulting trees showed a well-resolved topology of Gobioidei that mostly agrees with previous studies. However, trees generated from the 100 least clocklike loci frequently recovered conflicting, and in some cases clearly erroneous, topologies with strong support, indicating strong systematic biases in those datasets. Collectively, these results suggest that data filtering has the potential to improve the performance of phylogenetic inference both with a concatenation approach and with methods that rely on input from individual gene trees (i.e. coalescent species-tree approaches), which may be preferred in scenarios where incomplete lineage sorting is likely to be an issue.
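Ranking loci by clocklikeness and keeping the best (or worst) 100 is straightforward once each locus has a score. The abstract does not say how clocklikeness was measured, so the sketch below assumes a stand-in score, the coefficient of variation of root-to-tip distances in each rooted gene tree (lower means more clocklike); the input format and function names are hypothetical.

```python
from statistics import mean, pstdev


def root_to_tip_cv(distances):
    """Coefficient of variation of root-to-tip distances in one rooted gene tree.

    0 means all tips are equidistant from the root (perfectly clocklike);
    larger values indicate stronger departures from a clock.
    """
    return pstdev(distances) / mean(distances)


def split_by_clocklikeness(loci, n):
    """Return (n most clocklike loci, n least clocklike loci).

    loci: dict locus name -> list of root-to-tip distances from its rooted gene tree.
    """
    if not 1 <= n <= len(loci):
        raise ValueError("n must be between 1 and the number of loci")
    ranked = sorted(loci, key=lambda name: root_to_tip_cv(loci[name]))
    return ranked[:n], ranked[-n:]
```

With n = 100 the two returned lists correspond to the "best" and "least clocklike" subsets that were analysed separately; the same ranking step works for any other per-locus score, such as overall substitution rate.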
Affiliation(s)
- Ting Kuang
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai, China; Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), China
- Luke Tornabene
- School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98105, USA
- Jingyan Li
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai, China; Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), China
- Jiamei Jiang
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai, China; Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), China
- Prosanta Chakrabarty
- Louisiana State University, Museum of Natural Science, Department of Biological Sciences, Baton Rouge, LA 70803, USA
- John S Sparks
- American Museum of Natural History, Central Park West at 79th Street, NY, NY 10024, USA
- Chenhong Li
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai, China; Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), China
8. Che LH, Zhang SQ, Li Y, Liang D, Pang H, Ślipiński A, Zhang P. Genome-wide survey of nuclear protein-coding markers for beetle phylogenetics and their application in resolving both deep and shallow-level divergences. Mol Ecol Resour 2017; 17:1342-1358. [DOI: 10.1111/1755-0998.12664] [Received: 10/21/2016] [Revised: 01/09/2017] [Accepted: 02/14/2017]
Affiliation(s)
- Li-Heng Che
- State Key Laboratory of Biocontrol; College of Ecology and Evolution; School of Life Sciences; Sun Yat-Sen University; Guangzhou 510006; Guangdong Province China
- Shao-Qian Zhang
- State Key Laboratory of Biocontrol; College of Ecology and Evolution; School of Life Sciences; Sun Yat-Sen University; Guangzhou 510006; Guangdong Province China
- Yun Li
- State Key Laboratory of Biocontrol; College of Ecology and Evolution; School of Life Sciences; Sun Yat-Sen University; Guangzhou 510006; Guangdong Province China
- Dan Liang
- State Key Laboratory of Biocontrol; College of Ecology and Evolution; School of Life Sciences; Sun Yat-Sen University; Guangzhou 510006; Guangdong Province China
- Hong Pang
- State Key Laboratory of Biocontrol; College of Ecology and Evolution; School of Life Sciences; Sun Yat-Sen University; Guangzhou 510006; Guangdong Province China
- Adam Ślipiński
- Australian National Insect Collection; CSIRO; GPO Box 1700 Canberra ACT 2601 Australia
- Peng Zhang
- State Key Laboratory of Biocontrol; College of Ecology and Evolution; School of Life Sciences; Sun Yat-Sen University; Guangzhou 510006; Guangdong Province China
9. Irisarri I, Meyer A. The Identification of the Closest Living Relative(s) of Tetrapods: Phylogenomic Lessons for Resolving Short Ancient Internodes. Syst Biol 2016; 65:1057-1075. [PMID: 27425642 DOI: 10.1093/sysbio/syw057] [Received: 11/23/2015] [Accepted: 06/08/2016]
Abstract
Identifying the closest living relative(s) of tetrapods is an important, yet still contested, question in vertebrate phylogenetics. Three hypotheses are possible, and ruling out alternatives has proven difficult even with large molecular data sets because of weak phylogenetic signal coupled with nonphylogenetic noise resulting from relatively rapid speciation events that occurred a long time ago (~400 Ma). Here, we revisit the identity of the closest living relative of land vertebrates from a phylogenomic perspective and include new genomic data for all extant lungfish genera. RNA-seq proves to be a great alternative to genomic sequencing, which is currently not technically feasible in lungfishes because of their huge (50-130 Gb) and repetitive genomes. We examined the most important sources of systematic error, namely long-branch attraction (LBA), compositional heterogeneity, and the distribution of missing data, and applied different correction techniques. A multispecies coalescent approach is used to account for deep coalescence that might result from the short, deep internodes separating early sarcopterygian splits. Concatenation methods favored lungfishes as the closest living relatives of tetrapods with strong statistical support. Amino acid profile mixture models can unambiguously resolve this difficult internode thanks to their ability to avoid systematic error. We assessed the performance of different site-heterogeneous models and data partitioning and compared the ability of different strategies designed to overcome LBA, including taxon manipulation, reduction of among-lineage rate heterogeneity, and removal of fast-evolving or compositionally heterogeneous positions. The identification of lungfish as the sister group of tetrapods is robust to the effects of nonstationary composition and the distribution of missing data.
The multispecies coalescent method reconstructed strongly supported topologies that were congruent with concatenation, despite pervasive gene tree heterogeneity. We reject alternative topologies for early sarcopterygian relationships by increasing the signal-to-noise ratio in our alignments. The analytical pipeline outlined here combines probabilistic phylogenomic inference with methods for evaluating data quality, model adequacy, and assessing systematic error, and thus is likely to help resolve similarly difficult internodes in the tree of life. [Coalescence; coelacanth; compositional heterogeneity; gene tree; long-branch attraction; lungfish; missing data; model misspecification; phylogenomic; species tree; systematic error.].
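One of the LBA countermeasures listed above, removal of fast-evolving positions, can be sketched as ranking alignment columns by a rate estimate and discarding the fastest fraction. The sketch below substitutes a deliberately crude proxy (number of distinct observed states minus one) for the model-based per-site rate estimates typically used in practice, so it illustrates only the filtering step; the function names are hypothetical.

```python
def site_variability(column, missing="-X?"):
    """Crude per-site rate proxy: number of distinct observed states minus one."""
    states = {c for c in column if c not in missing}
    return max(len(states) - 1, 0)


def remove_fastest_sites(alignment, fraction):
    """Drop roughly `fraction` of the most variable columns, keeping column order.

    alignment: list of equal-length sequences (rows = taxa).
    """
    columns = list(zip(*alignment))
    n_drop = int(fraction * len(columns))
    # sorted() is stable, so ties are broken by original column position.
    by_rate = sorted(range(len(columns)), key=lambda i: site_variability(columns[i]))
    keep = sorted(by_rate[: len(columns) - n_drop])
    return ["".join(row[i] for i in keep) for row in alignment]
```

Re-running tree inference on progressively stripped alignments (for example 10%, 20%, 30% removed) and checking whether the contested node changes is the usual way such a filter is applied.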
Affiliation(s)
- Iker Irisarri
- Laboratory for Zoology and Evolutionary Biology, Department of Biology, University of Konstanz, 78464 Konstanz, Germany
- Axel Meyer
- Laboratory for Zoology and Evolutionary Biology, Department of Biology, University of Konstanz, 78464 Konstanz, Germany
10. Balasundaram SV, Engh IB, Skrede I, Kauserud H. How many DNA markers are needed to reveal cryptic fungal species? Fungal Biol 2015; 119:940-945. [DOI: 10.1016/j.funbio.2015.07.006] [Received: 02/02/2015] [Revised: 06/17/2015] [Accepted: 07/14/2015]
11. Wang B, Zhang Y, Wei P, Sun M, Ma X, Zhu X. Identification of nuclear low-copy genes and their phylogenetic utility in rosids. Genome 2015; 57:547-54. [PMID: 25761707 DOI: 10.1139/gen-2014-0138]
Abstract
So far, the interordinal relationships in rosids remain poorly resolved. Previous studies based on chloroplast, mitochondrial, and nuclear DNA have produced conflicting phylogenetic resolutions, which has become a widely discussed problem in recent phylogenetic studies. Here, a total of 96 single-copy nuclear gene loci were identified from the KOG (eukaryotic orthologous groups) database, most of which were used for the first time in phylogenetic analysis of angiosperms. Orthologous sequence datasets from completely sequenced rosid genomes were assembled to resolve the position of the COM (Celastrales-Oxalidales-Malpighiales) clade in rosids. Our analysis revealed strong and consistent support for the CM topology (the COM clade as sister to the malvids). Our results will contribute to further exploring the underlying cause of conflict between chloroplast, mitochondrial, and nuclear data. In addition, our study identified a few novel nuclear molecular markers with potential to investigate the deep phylogenetic relationships of plants or other eukaryotic taxonomic groups.
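Selecting single-copy loci from an orthology database reduces to keeping gene families that occur exactly once in every genome sampled. A minimal sketch, assuming the genes of each genome have already been assigned to KOG families and are supplied as (genome, gene, family) tuples; the study may apply further filters (for example on sequence length or alignment quality) that are not modelled here, and the function name is hypothetical.

```python
from collections import Counter


def single_copy_families(assignments, genomes):
    """Return families with exactly one member in every genome listed in `genomes`.

    assignments: iterable of (genome, gene_id, family_id) tuples.
    Genomes not listed in `genomes` are ignored.
    """
    required = set(genomes)
    counts = {}
    for genome, _gene_id, family in assignments:
        if genome in required:
            counts.setdefault(family, Counter())[genome] += 1
    return sorted(
        family
        for family, per_genome in counts.items()
        # Present in every genome, and never duplicated within one.
        if set(per_genome) == required and all(n == 1 for n in per_genome.values())
    )
```

A family that is duplicated in any genome, or missing from any genome, is excluded, which is the usual way to avoid mixing paralogs into ortholog alignments.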
Affiliation(s)
- Baohua Wang
- School of Life Sciences, Nantong University, Nantong 226019, China
12. Hilu KW, Black CM, Oza D. Impact of gene molecular evolution on phylogenetic reconstruction: a case study in the rosids (Superorder Rosanae, Angiosperms). PLoS One 2014; 9:e99725. [PMID: 24932884 PMCID: PMC4059714 DOI: 10.1371/journal.pone.0099725] [Received: 02/10/2014] [Accepted: 05/18/2014]
Abstract
The substitution rate of genomic regions is among the most debated intrinsic features that affect phylogenetic informativeness. However, this variable is also coupled with rates of nonsynonymous substitution, which reflect the nature and degree of selection on the genes in question. To address these variables empirically, we constructed four completely overlapping data sets of the plastid matK, atpB, and rbcL genes and the mitochondrial matR gene, and used the rosid lineage (angiosperms) as a working platform. The genes differ in their combinations of overall rates of nucleotide and amino acid substitution. Tree robustness, homoplasy, accuracy relative to a reference tree, and phylogenetic informativeness were evaluated. The rapidly evolving/unconstrained matK fared best, whereas the remaining genes varied in the degree of their contribution to rosid phylogenetics across the lineage's 108-million-year evolutionary history. Phylogenetic accuracy was low with the slowly evolving/unconstrained matR despite its having the least homoplasy. Third codon positions contributed the highest number of parsimony-informative sites and the most resolution and informativeness, but the magnitude varied with each gene's mode of evolution. These findings clearly contrast with the view that rapidly evolving regions and third codon positions inevitably have a negative impact on phylogenetic reconstruction at deep historical levels because of the accumulation of multiple hits and the resulting elevation in homoplasy and saturation. Relaxed evolutionary constraint in rapidly evolving genes distributes substitutions across codon positions, an evolutionary mode expected to reduce the frequency of multiple hits. These findings should be tested over deeper evolutionary histories.
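Counting parsimony-informative sites separately for each codon position, as in the comparison of third positions above, only needs the standard definition: a site is informative when at least two states each occur in at least two sequences. The sketch assumes an in-frame nucleotide alignment that starts at a first codon position, with non-ACGT characters ignored; the function names are illustrative.

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two nucleotide states each occur in at least two sequences."""
    counts = Counter(c for c in column if c in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def informative_sites_by_codon_position(alignment):
    """Count parsimony-informative sites at codon positions 1, 2 and 3.

    alignment: equal-length, in-frame coding sequences starting at a first codon position.
    """
    tally = {1: 0, 2: 0, 3: 0}
    for i, column in enumerate(zip(*alignment)):
        if is_parsimony_informative(column):
            tally[i % 3 + 1] += 1
    return tally
```

For the four sequences ATGAAA, ATGAAG, CTAAAG and CTAAAA, one informative site falls at a first position and two at third positions, giving {1: 1, 2: 0, 3: 2}.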
Affiliation(s)
- Khidir W. Hilu
- Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
- Chelsea M. Black
- Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
- Dipan Oza
- Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
13. Su Z, Wang Z, López-Giráldez F, Townsend JP. The impact of incorporating molecular evolutionary model into predictions of phylogenetic signal and noise. Front Ecol Evol 2014. [DOI: 10.3389/fevo.2014.00011]
14. Bayly MJ, Rigault P, Spokevicius A, Ladiges PY, Ades PK, Anderson C, Bossinger G, Merchant A, Udovicic F, Woodrow IE, Tibbits J. Chloroplast genome analysis of Australian eucalypts – Eucalyptus, Corymbia, Angophora, Allosyncarpia and Stockwellia (Myrtaceae). Mol Phylogenet Evol 2013; 69:704-16. [DOI: 10.1016/j.ympev.2013.07.006] [Received: 04/29/2013] [Revised: 06/28/2013] [Accepted: 07/08/2013]
15. Cruaud A, Underhill JG, Huguin M, Genson G, Jabbour-Zahab R, Tolley KA, Rasplus JY, van Noort S. A multilocus phylogeny of the world Sycoecinae fig wasps (Chalcidoidea: Pteromalidae). PLoS One 2013; 8:e79291. [PMID: 24223925 PMCID: PMC3818460 DOI: 10.1371/journal.pone.0079291] [Received: 12/10/2012] [Accepted: 09/22/2013]
Abstract
The Sycoecinae is one of five chalcid subfamilies of fig wasps that are mostly dependent on Ficus inflorescences for reproduction. Here, we analysed two mitochondrial (COI, Cytb) and four nuclear genes (ITS2, EF-1α, RpL27a, mago nashi) from a worldwide sample of 56 sycoecine species. Various alignment and partitioning strategies were used to test the stability of major clades. All topologies estimated using maximum likelihood and Bayesian methods were similar and well resolved but did not support the existing classification. A high degree of morphological convergence was highlighted, and several species appeared best described as species complexes. We therefore proposed a new classification for the subfamily. Our analyses revealed several cases of probable speciation on the same host trees (up to 8 closely related species on a single tree of F. sumatrana), which raises the question of how resources are partitioned to avoid competitive exclusion. Comparisons of our results with fig phylogenies showed that, although sycoecines are internally ovipositing wasps, host switches are common in their evolutionary history. Finally, by studying the evolutionary properties of the markers we used and profiling their phylogenetic informativeness, we predicted their utility for resolving phylogenetic relationships of Chalcidoidea at various taxonomic levels.
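Phylogenetic informativeness profiling is commonly based on Townsend's (2007) per-site function, in which a site evolving at rate λ contributes 16λ²T·exp(-4λT) at time depth T, peaking at T = 1/(4λ); a marker's profile is the sum over its sites. The sketch below evaluates that function, assuming per-site rates have already been estimated in units consistent with T; it is not any published tool's implementation, and the names are hypothetical.

```python
import math


def site_informativeness(rate, t):
    """Informativeness of one site with substitution rate `rate` at time depth t."""
    return 16.0 * rate**2 * t * math.exp(-4.0 * rate * t)


def informativeness_profile(rates, times):
    """Summed informativeness of a marker (list of per-site rates) at each time depth."""
    return [sum(site_informativeness(r, t) for r in rates) for t in times]


def peak_time(rate):
    """Time depth at which a single site of the given rate is most informative."""
    # d/dt [t * exp(-4 * rate * t)] = 0  gives  t = 1 / (4 * rate)
    return 1.0 / (4.0 * rate)
```

Because the peak moves toward the present as λ grows, fast-evolving markers are predicted to be most useful for shallow splits and slow ones for deep splits, which is the basis for predicting utility "at various taxonomic levels".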
Affiliation(s)
- Astrid Cruaud
- INRA, UMR1062 CBGP Centre de Biologie pour la Gestion des Populations, Montferrier-sur-Lez, France
- Jenny G. Underhill
- South African National Biodiversity Institute, Kirstenbosch Research Centre, Cape Town, South Africa
- Maïlis Huguin
- INRA, UMR1062 CBGP Centre de Biologie pour la Gestion des Populations, Montferrier-sur-Lez, France
- Gwenaëlle Genson
- INRA, UMR1062 CBGP Centre de Biologie pour la Gestion des Populations, Montferrier-sur-Lez, France
- Roula Jabbour-Zahab
- INRA, UMR1062 CBGP Centre de Biologie pour la Gestion des Populations, Montferrier-sur-Lez, France
- Krystal A. Tolley
- South African National Biodiversity Institute, Kirstenbosch Research Centre, Cape Town, South Africa
- Department of Zoology, University of Cape Town, Rondebosch, South Africa
- Jean-Yves Rasplus
- INRA, UMR1062 CBGP Centre de Biologie pour la Gestion des Populations, Montferrier-sur-Lez, France
- Simon van Noort
- Natural History Division, South African Museum, Iziko Museums of Cape Town, Cape Town, South Africa
- Department of Zoology, University of Cape Town, Rondebosch, South Africa
16. Baeza JA, Fuentes MS. Exploring phylogenetic informativeness and nuclear copies of mitochondrial DNA (numts) in three commonly used mitochondrial genes: mitochondrial phylogeny of peppermint, cleaner, and semi-terrestrial shrimps (Caridea: Lysmata, Exhippolysmata, and Merguia). Zool J Linn Soc 2013. [DOI: 10.1111/zoj.12044]
17. Shen XX, Liang D, Feng YJ, Chen MY, Zhang P. A versatile and highly efficient toolkit including 102 nuclear markers for vertebrate phylogenomics, tested by resolving the higher level relationships of the caudata. Mol Biol Evol 2013; 30:2235-48. [PMID: 23827877 DOI: 10.1093/molbev/mst122]
Abstract
Resolving difficult nodes for any part of the vertebrate tree of life often requires analyzing a large number of loci. Developing molecular markers that are workable for the groups of interest is often a bottleneck in phylogenetic research. Here, on the basis of a nested polymerase chain reaction (PCR) strategy, we present a universal toolkit including 102 nuclear protein-coding locus (NPCL) markers for vertebrate phylogenomics. The 102 NPCL markers have a broad range of evolutionary rates, which makes them useful for a wide range of time depths. The new NPCL toolkit has three important advantages compared with all previously developed NPCL sets: 1) the kit is universally applicable across vertebrates, with a PCR success rate of 94.6% in 16 widely divergent tested vertebrate species; 2) more than 90% of PCR reactions produce strong and single bands of the expected sizes that can be directly sequenced; and 3) all cleanup PCR reactions can be sequenced with only two specific universal primers. To test its actual phylogenetic utility, 30 NPCLs from this toolkit were used to address the higher level relationships of living salamanders. Of the 639 target PCR reactions performed on 19 salamanders and several outgroup species, 632 (98.9%) were successful, and 602 (94.1%) were directly sequenced. Concatenation and species-tree analyses on this 30-locus data set produced a fully resolved phylogeny and showed that Cryptobranchoidea (Cryptobranchidae + Hynobiidae) branches first within the salamander tree, followed by Sirenidae. Our experimental tests and our demonstration for a particular case show that our NPCL toolkit is a highly reliable, fast, and cost-effective approach for vertebrate phylogenomic studies and thus has the potential to accelerate the completion of many parts of the vertebrate tree of life.
Affiliation(s)
- Xing Xing Shen
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
18
Morrison JT, Bantilan NS, Wang VN, Nellett KM, Cruz YP. Expression patterns of Oct4, Cdx2, Tead4, and Yap1 proteins during blastocyst formation in embryos of the marsupial, Monodelphis domestica Wagner. Evol Dev 2013; 15:171-85. [DOI: 10.1111/ede.12031]
Affiliation(s)
- J. T. Morrison
- Department of Biology; Oberlin College; Oberlin, OH 44074; USA
- N. S. Bantilan
- Department of Biology; Oberlin College; Oberlin, OH 44074; USA
- V. N. Wang
- Department of Biology; Oberlin College; Oberlin, OH 44074; USA
- K. M. Nellett
- Department of Biology; Oberlin College; Oberlin, OH 44074; USA
- Y. P. Cruz
- Department of Biology; Oberlin College; Oberlin, OH 44074; USA
19
Deep metazoan phylogeny: When different genes tell different stories. Mol Phylogenet Evol 2013; 67:223-33. [DOI: 10.1016/j.ympev.2013.01.010] [Received: 10/22/2012] [Revised: 01/08/2013] [Accepted: 01/12/2013]
20
Coalescent-based genome analyses resolve the early branches of the euarchontoglires. PLoS One 2013; 8:e60019. [PMID: 23560065 PMCID: PMC3613385 DOI: 10.1371/journal.pone.0060019] [Received: 12/14/2012] [Accepted: 02/20/2013]
Abstract
Despite numerous large-scale phylogenomic studies, certain parts of the mammalian tree are extraordinarily difficult to resolve. We used the coding regions from 19 completely sequenced genomes to study relationships within the superclade Euarchontoglires (Primates, Rodentia, Lagomorpha, Dermoptera and Scandentia), because the placement of Scandentia within this clade is controversial. The difficulty in resolving this issue stems from the short time spans between the early divergences of Euarchontoglires, which may produce incongruent gene trees. The conflict in the data can be depicted by network analyses, and the contentious relationships are best reconstructed by coalescent-based analyses, which are expected to outperform analyses of concatenated data in reconstructing a species tree from numerous gene trees. The total concatenated dataset used to study the relationships in this group comprises 5,875 protein-coding genes (9,799,170 nucleotides) from all orders except Dermoptera (flying lemurs). Reconstruction of the species tree from 1,006 gene trees using coalescent models placed Scandentia as the sister group to Primates, in agreement with maximum likelihood analyses of the concatenated nucleotide data. Additionally, both analytical approaches favoured the tarsier as the sister taxon to Anthropoidea, thus placing it in the Haplorrhine clade. When divergence times are short, as in radiations spanning a few million years, even genome-scale analyses struggle to resolve phylogenetic relationships. On such short branches, processes such as incomplete lineage sorting and possibly hybridization occur, making it preferable to base phylogenomic analyses on coalescent methods.
21
Fong JJ, Brown JM, Fujita MK, Boussau B. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic lissamphibia. PLoS One 2012; 7:e48990. [PMID: 23145043 PMCID: PMC3492174 DOI: 10.1371/journal.pone.0048990] [Received: 04/10/2012] [Accepted: 10/03/2012]
Abstract
In resolving the vertebrate tree of life, two fundamental questions remain: 1) what is the phylogenetic position of turtles within amniotes, and 2) what are the relationships between the three major lissamphibian (extant amphibian) groups? These relationships have historically been difficult to resolve, with five different hypotheses proposed for turtle placement, and four proposed branching patterns within Lissamphibia. We compiled a large cDNA/EST dataset for vertebrates (75 genes for 129 taxa) to address these outstanding questions. Gene-specific phylogenetic analyses revealed a great deal of variation in preferred topology, resulting in topologically ambiguous conclusions from the combined dataset. Due to consistent preferences for the same divergent topologies across genes, we suspected systematic phylogenetic error as a cause of some variation. Accordingly, we developed and tested a novel statistical method that identifies sites that have a high probability of containing biased signal for a specific phylogenetic relationship. After removing putatively biased sites, support emerged for a sister relationship between turtles and either crocodilians or archosaurs, as well as for a caecilian-salamander sister relationship within Lissamphibia, with Lissamphibia potentially paraphyletic.
Affiliation(s)
- Jonathan J Fong
- Museum of Vertebrate Zoology, University of California, Berkeley, CA, USA.
22
Application of the phylogenetic informativeness method to chloroplast markers: a test case of closely related species in tribe Hydrangeeae (Hydrangeaceae). Mol Phylogenet Evol 2012; 66:233-42. [PMID: 23063487 DOI: 10.1016/j.ympev.2012.09.029] [Received: 05/04/2012] [Revised: 09/19/2012] [Accepted: 09/24/2012]
Abstract
In evolutionary biology, appropriate marker selection is fundamental to reconstructing solid phylogenetic hypotheses. One of the most challenging tasks is choosing appropriate genomic regions in studies of closely related species. Robust phylogenetic frameworks are central to studies addressing questions in fields ranging from evolutionary and conservation biology and biogeography to plant breeding. Phylogenetic informativeness profiles provide a quantitative measure of the phylogenetic signal in markers and therefore a method for prioritizing loci. The present work profiles the phylogenetic informativeness of mostly non-coding chloroplast regions in an angiosperm lineage of closely related species: the popular ornamental tribe Hydrangeeae (Hydrangeaceae, Cornales, Asterids). A recent phylogenetic study revealed a contrast in resolution between the two strongly supported clades within tribe Hydrangeeae. We evaluate the phylogenetic signal of 13 highly variable plastid markers for estimating relationships within and among the currently recognized monophyletic groups of this tribe. A combination of loci selected on the basis of their phylogenetic informativeness retrieved more robust phylogenetic hypotheses than simply combining the individual markers that performed best with respect to resolution, nodal support and accuracy, or those with the highest number of parsimony-informative characters. We propose the rpl32-ndhF intergenic spacer (IGS), trnV-ndhC IGS, trnL-rpl32 IGS, psbT-petB region and ndhA intron as the best candidates for future phylogenetic studies in Hydrangeeae and potentially in other Asterids. We also contrasted the phylogenetic informativeness of coded indels with that of substitutions and concluded that, despite their low phylogenetic informativeness, coded indels provide additional phylogenetic signal that is nearly free of noise. Phylogenetic relationships obtained from our total combined analyses showed improved resolution and nodal support relative to recently published results.
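To make the idea of a phylogenetic informativeness profile in the abstract above more concrete, here is a minimal Python sketch. It assumes the per-site expression used in Townsend's (2007) profiling approach, PI(T) = 16 λ² T exp(-4λT) for a site with rate λ and a node at time T before present, and sums it over sites to give a locus profile. The function names, marker names and rates are hypothetical; real analyses estimate site rates from an alignment and a dated tree, which this sketch does not attempt.

```python
import math

def site_informativeness(rate: float, t: float) -> float:
    """Informativeness of one site with substitution rate `rate` for a node at
    time t before present, using 16 * rate**2 * t * exp(-4 * rate * t)."""
    return 16.0 * rate ** 2 * t * math.exp(-4.0 * rate * t)

def locus_profile(site_rates, times):
    """Informativeness profile of a locus: per-site values summed at each time point."""
    return [sum(site_informativeness(r, t) for r in site_rates) for t in times]

def peak_time(rate: float) -> float:
    """Time at which a site of the given rate is most informative.
    Setting d/dt [t * exp(-4 * rate * t)] = 0 gives t = 1 / (4 * rate)."""
    return 1.0 / (4.0 * rate)

def rank_markers(markers, t):
    """Order markers (name -> list of per-site rates) by informativeness at time t."""
    return sorted(markers, key=lambda name: locus_profile(markers[name], [t])[0], reverse=True)

# Hypothetical markers: 300 sites each, rates in substitutions per site per unit time.
markers = {"slow_marker": [0.01] * 300, "fast_marker": [0.05] * 300}
times = [1, 5, 10, 25, 50]
for name, rates in markers.items():
    print(name, [round(v, 2) for v in locus_profile(rates, times)])
print("ranking for a node at t = 25:", rank_markers(markers, 25))
```

In this toy setting the faster marker is the better choice for shallow nodes and the slower marker for deeper ones, which is the kind of time-dependent locus prioritization the profiles are meant to support.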
23
Hedin M, Starrett J, Akhter S, Schönhofer AL, Shultz JW. Phylogenomic resolution of paleozoic divergences in harvestmen (Arachnida, Opiliones) via analysis of next-generation transcriptome data. PLoS One 2012; 7:e42888. [PMID: 22936998 PMCID: PMC3427324 DOI: 10.1371/journal.pone.0042888] [Received: 04/16/2012] [Accepted: 07/12/2012]
Abstract
Next-generation sequencing technologies are rapidly transforming molecular systematic studies of non-model animal taxa. The arachnid order Opiliones (commonly known as "harvestmen") includes more than 6,400 described species placed into four well-supported lineages (suborders). Fossil and molecular clock evidence indicates that these lineages diverged between the late Silurian and the mid-Carboniferous, with some fossil harvestmen representing the earliest known land animals. Perhaps because of this ancient divergence, resolving subordinal relationships within Opiliones has been difficult. We present the first phylogenomic analysis of harvestmen, derived from comparative RNA-Seq data for eight species representing all suborders. Over 30 gigabases of original Illumina short-read data were used in de novo assemblies, resulting in 50,000–80,000 transcripts per taxon. Transcripts were compared to published scorpion and tick genomic data, and a stringent filtering process was used to identify over 350 putatively single-copy, orthologous protein-coding genes shared among taxa. Phylogenetic analyses using various partitioning strategies, data coding schemes, and analytical methods overwhelmingly support the "classical" hypothesis of Opiliones relationships, including the higher-level clades Palpatores and Phalangida. Relaxed molecular clock analyses using multiple alternative fossil calibration strategies corroborate ancient divergences within Opiliones that are possibly deeper than the fossil record indicates. The assembled data matrices, comprising genes that are conserved, highly expressed, and varying in length and phylogenetic informativeness, represent an important resource for future molecular systematic studies of Opiliones and other arachnid groups.
Affiliation(s)
- Marshal Hedin
- Department of Biology, San Diego State University, San Diego, California, United States of America.
24
Shen XX, Liang D, Zhang P. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 2012; 7:e39256. [PMID: 22720083 PMCID: PMC3375249 DOI: 10.1371/journal.pone.0039256] [Received: 02/07/2012] [Accepted: 05/22/2012]
Abstract
BACKGROUND: Universal nuclear protein-coding locus (NPCL) markers that are applicable across diverse taxa and show good phylogenetic discrimination are widely useful in molecular phylogenetic studies. For example, RAG1, a representative NPCL marker, has been used successfully to infer relationships within all major osteichthyan groups. However, markers with such a broad working range and high phylogenetic performance remain scarce, so more universal NPCL markers comparable to RAG1 are needed for osteichthyan phylogenetics.
METHODOLOGY/PRINCIPAL FINDINGS: We developed three long universal NPCL markers (>1.6 kb each) based on single-copy nuclear genes (KIAA1239, SACS and TTN) that possess large exons and exhibit appropriate evolutionary rates. We then compared their phylogenetic utility with that of the reference marker RAG1 in 47 jawed vertebrate species. Each of the three new markers yielded topologies and branch support similar to those from RAG1, all congruent with the currently accepted osteichthyan phylogeny. To compare their phylogenetic performance visually, we also estimated the phylogenetic informativeness (PI) profile of each of the four long universal NPCL markers. The PI curves indicated that SACS performed best over the whole timescale, while RAG1, KIAA1239 and TTN performed similarly. In addition, we compared the success of nested PCR and standard PCR in amplifying NPCL marker fragments: the amplification success rate and efficiency of nested PCR were far higher than those of standard PCR.
CONCLUSIONS/SIGNIFICANCE: Our work demonstrates the superiority of nested PCR over conventional PCR in phylogenetic studies and develops three long universal NPCL markers (KIAA1239, SACS and TTN) using the nested PCR strategy. The three markers show high phylogenetic utility in osteichthyan phylogenetics and can be widely used as pilot genes for phylogenetic questions in osteichthyans at different taxonomic levels.
Affiliation(s)
- Xing-Xing Shen
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, People’s Republic of China
- Dan Liang
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, People’s Republic of China
- Peng Zhang
- Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, People’s Republic of China
25
Ebersberger I, de Matos Simoes R, Kupczok A, Gube M, Kothe E, Voigt K, von Haeseler A. A consistent phylogenetic backbone for the fungi. Mol Biol Evol 2011; 29:1319-34. [PMID: 22114356 PMCID: PMC3339314 DOI: 10.1093/molbev/msr285]
Abstract
The kingdom of fungi provides model organisms for biotechnology, cell biology, genetics, and the life sciences in general. Only when their phylogenetic relationships are stably resolved can individual results from fungal research be integrated into a holistic picture of biology. However, despite recent progress, many deep relationships within the fungi remain unclear. Here, we present the first phylogenomic study of an entire eukaryotic kingdom that uses a consistency criterion to strengthen phylogenetic conclusions. We reason that branches (splits) recovered with independent data and different tree reconstruction methods are likely to reflect true evolutionary relationships. Two complementary phylogenomic data sets, based on 99 fungal genomes and 109 fungal expressed sequence tag (EST) sets and analyzed with four different tree reconstruction methods, shed light on the fungal tree of life from different angles. Eleven additional data sets specifically address the phylogenetic positions of Blastocladiomycota, Ustilaginomycotina, and Dothideomycetes. The combined evidence from the resulting trees supports the deep-level stability of the fungal groups and moves toward a comprehensive natural system of the fungi. In addition, our analysis reveals methodologically interesting aspects. Enrichment for EST-encoded data, a common practice in phylogenomic analyses, introduces a strong bias toward slowly evolving and functionally correlated genes. Consequently, phylogenomic data sets cannot be assumed to be collections of randomly selected genes. A thorough characterization of the data to assess possible influences on tree reconstruction should therefore become standard in phylogenomic analyses.
Affiliation(s)
- Ingo Ebersberger
- Center for Integrative Bioinformatics Vienna, University of Vienna, Medical University of Vienna, University of Veterinary Medicine Vienna, Vienna, Austria.
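The consistency criterion in the Ebersberger et al. abstract (trust branches that are recovered across independent data sets and different tree reconstruction methods) can be sketched as an intersection of split sets. This minimal Python sketch assumes each reconstructed tree has already been reduced to its internal branches, each given as the set of taxa on one side; the function names, input layout, method labels and taxa are hypothetical, and it is not the authors' pipeline, which the abstract does not describe in detail.

```python
def normalize_split(side, taxa, ref):
    """Key a bipartition by the side that excludes the reference taxon,
    so that A|B and B|A map to the same frozenset."""
    side = frozenset(side)
    return frozenset(taxa) - side if ref in side else side

def consistent_splits(trees, taxa):
    """Return the non-trivial splits recovered in every tree.

    `trees` maps a label such as (data set, method) to an iterable of splits,
    each split given as the set of taxa on one side of an internal branch."""
    ref = min(taxa)  # any fixed taxon can serve as the reference
    per_tree = []
    for splits in trees.values():
        keys = {normalize_split(s, taxa, ref) for s in splits}
        # Discard trivial splits: a single taxon on either side.
        keys = {k for k in keys if 1 < len(k) < len(taxa) - 1}
        per_tree.append(keys)
    return set.intersection(*per_tree) if per_tree else set()

# Hypothetical five-taxon example: two data sets analysed with different methods.
taxa = {"A", "B", "C", "D", "E"}
trees = {
    ("genomes", "ML"): [{"A", "B"}, {"D", "E"}],
    ("ESTs", "NJ"): [{"C", "D", "E"}, {"C", "D"}],
}
print(consistent_splits(trees, taxa))
```

Here only the split separating {A, B} from {C, D, E} survives, because it is recovered in both trees even though it is written from opposite sides; the other branches are each supported by a single data set and method and are therefore treated as unconfirmed.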