1
McKibben MTW, Finch G, Barker MS. Species-tree topology impacts the inference of ancient whole-genome duplications across the angiosperm phylogeny. Am J Bot 2024; 111:e16378. [PMID: 39039654 DOI: 10.1002/ajb2.16378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 06/11/2024] [Accepted: 06/12/2024] [Indexed: 07/24/2024]
Abstract
PREMISE The history of angiosperms is marked by repeated rounds of ancient whole-genome duplications (WGDs). Here we used state-of-the-art methods to provide an up-to-date view of the distribution of WGDs in the history of angiosperms that considers both uncertainty introduced by different WGD inference methods and different underlying species-tree hypotheses. METHODS We used the distribution of synonymous divergences (Ks) of paralogs and orthologs from transcriptomic and genomic data to infer and place WGDs across two hypothesized angiosperm phylogenies. We further tested these WGD hypotheses with syntenic inferences and Bayesian models of duplicate gene gain and loss. RESULTS The predicted number of WGDs in the history of angiosperms (~170) based on the current taxon sampling is largely similar across different inference methods, but varies in the precise placement of WGDs on the phylogeny. Ks-based methods often yield alternative hypothesized WGD placements due to variation in substitution rates among lineages. Phylogenetic models of duplicate gene gain and loss are more robust to topological variation. However, errors in species-tree inference can still produce spurious WGD hypotheses, regardless of the method used. CONCLUSIONS Here we showed that different WGD inference methods largely agree on an average of 3.5 WGDs in the history of individual angiosperm species. However, the precise placement of WGDs on the phylogeny is subject to the WGD inference method and tree topology. As researchers continue to test hypotheses regarding the impacts ancient WGDs have on angiosperm evolution, it is important to consider the uncertainty of the phylogeny as well as WGD inference methods.
Affiliation(s)
- Michael T W McKibben
- Department of Ecology & Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Geoffrey Finch
- Department of Ecology & Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Michael S Barker
- Department of Ecology & Evolutionary Biology, University of Arizona, Tucson, AZ, USA
2
Cornuault J, Sanmartín I. A road map for phylogenetic models of species trees. Mol Phylogenet Evol 2022; 173:107483. [DOI: 10.1016/j.ympev.2022.107483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2021] [Revised: 03/09/2022] [Accepted: 04/05/2022] [Indexed: 10/18/2022]
3
Doyle JJ. Defining coalescent genes: Theory meets practice in organelle phylogenomics. Syst Biol 2021; 71:476-489. [PMID: 34191012 DOI: 10.1093/sysbio/syab053] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 03/01/2021] [Revised: 06/24/2021] [Accepted: 06/28/2021] [Indexed: 11/13/2022] Open
Abstract
The species tree paradigm that dominates current molecular systematic practice infers species trees from collections of sequences under assumptions of the multispecies coalescent (MSC), i.e., that there is free recombination between the sequences and no (or very low) recombination within them. These coalescent genes (c-genes) are thus defined in an historical rather than molecular sense, and can in theory be as large as an entire genome or as small as a single nucleotide. A debate about how to define c-genes centers on the contention that nuclear gene sequences used in many coalescent analyses undergo too much recombination, such that their introns comprise multiple c-genes, violating a key assumption of the MSC. Recently a similar argument has been made for the genes of plastid (e.g., chloroplast) and mitochondrial genomes, which for the last 30 or more years have been considered to represent a single c-gene for the purposes of phylogeny reconstruction because they are non-recombining in a historical sense. Consequently, it has been suggested that these genomes should be analyzed using coalescent methods that treat their genes-over 70 protein-coding genes in the case of most plastid genomes (plastomes)-as independent estimates of species phylogeny, in contrast to the usual practice of concatenation, which is appropriate for generating gene trees. However, although recombination certainly occurs in the plastome, as has been recognized since the 1970's, it is unlikely to be phylogenetically relevant. This is because such historically effective recombination can only occur when plastomes with incongruent histories are brought together in the same plastid. However, plastids sort rapidly into different cell lineages and rarely fuse. Thus, because of plastid biology, the plastome is a more canonical c-gene than is the average multi-intron mammalian nuclear gene. 
The plastome should thus continue to be treated as a single estimate of the underlying species phylogeny, as should the mitochondrial genome. The implications of this long-held insight of molecular systematics for studies in the phylogenomic era are explored.
Affiliation(s)
- Jeff J Doyle
- Plant Biology Section, Plant Breeding & Genetics Section, and L. H. Bailey Hortorium, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853 USA
4
Seo TK, Thorne JL. Information Criteria for Comparing Partition Schemes. Syst Biol 2018; 67:616-632. [PMID: 29309694 DOI: 10.1093/sysbio/syx097] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 06/16/2017] [Accepted: 12/17/2017] [Indexed: 01/10/2023] Open
Abstract
When inferring phylogenies, one important decision is whether and how nucleotide substitution parameters should be shared across different subsets or partitions of the data. One sort of partitioning error occurs when heterogeneous subsets are mistakenly lumped together and treated as if they share parameter values. The opposite kind of error is mistakenly treating homogeneous subsets as if they result from distinct sets of parameters. Lumping and splitting errors are not equally bad. Lumping errors can yield parameter estimates that do not accurately reflect any of the subsets that were combined whereas splitting errors yield estimates that did not benefit from sharing information across partitions. Phylogenetic partitioning decisions are often made by applying information criteria such as the Akaike information criterion (AIC). As with other information criteria, the AIC evaluates a model or partition scheme by combining the maximum log-likelihood value with a penalty that depends on the number of parameters being estimated. For the purpose of selecting an optimal partitioning scheme, we derive an adjustment to the AIC that we refer to as the AIC$^{(p)}$ and that is motivated by the idea that splitting errors are less serious than lumping errors. We also introduce a similar adjustment to the Bayesian information criterion (BIC) that we refer to as the BIC$^{(p)}$. Via simulation and empirical data analysis, we contrast AIC and BIC behavior to our suggested adjustments. We discuss these results and also emphasize why we expect the probability of lumping errors with the AIC$^{(p)}$ and the BIC$^{(p)}$ to be relatively robust to model parameterization.
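As a rough illustration of how the standard criteria named in this abstract trade fit against parameter count, the sketch below scores two partition schemes with the textbook AIC and BIC formulas. The scheme names, log-likelihoods, parameter counts, and site count are hypothetical values chosen for illustration; the adjusted AIC$^{(p)}$ and BIC$^{(p)}$ are not implemented because the abstract does not give their form.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


# Hypothetical partition schemes: (maximized log-likelihood, free parameters).
# These numbers are made up for illustration only.
schemes = {
    "lumped": (-10250.0, 10),
    "split_by_codon": (-10190.0, 30),
}
N_SITES = 3000  # assumed alignment length


def best_scheme(candidates, criterion):
    """Return the name of the scheme with the lowest criterion value."""
    return min(candidates, key=lambda name: criterion(*candidates[name]))


best_by_aic = best_scheme(schemes, aic)
best_by_bic = best_scheme(schemes, lambda ll, k: bic(ll, k, N_SITES))
```

With these made-up inputs the two criteria disagree: the BIC's heavier per-parameter penalty favors the lumped scheme, while the AIC favors splitting.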
Affiliation(s)
- Tae-Kun Seo
- Department of Biological Sciences, Korea Polar Research Institute, 26 Songdomirae-ro, Yeonsu-gu, Incheon 406-840, Republic of Korea
- Jeffrey L Thorne
- Bioinformatics Research Center, Box 7566, North Carolina State University, Raleigh NC 27695-7566, USA
5
Pease JB, Brown JW, Walker JF, Hinchliff CE, Smith SA. Quartet Sampling distinguishes lack of support from conflicting support in the green plant tree of life. Am J Bot 2018; 105:385-403. [PMID: 29746719 DOI: 10.1002/ajb2.1016] [Citation(s) in RCA: 133] [Impact Index Per Article: 19.0] [Received: 06/09/2017] [Accepted: 09/05/2017] [Indexed: 05/21/2023]
Abstract
PREMISE OF THE STUDY Phylogenetic support has been difficult to evaluate within the green plant tree of life partly due to a lack of specificity between conflicted versus poorly informed branches. As data sets continue to expand in both breadth and depth, new support measures are needed that are more efficient and informative. METHODS We describe the Quartet Sampling (QS) method, a quartet-based evaluation system that synthesizes several phylogenetic and genomic analytical approaches. QS characterizes discordance in large-sparse and genome-wide data sets, overcoming issues of alignment sparsity and distinguishing strong conflict from weak support. We tested QS with simulations and recent plant phylogenies inferred from variously sized data sets. KEY RESULTS QS scores demonstrated convergence with increasing replicates and were not strongly affected by branch depth. Patterns of QS support from different phylogenies led to a coherent understanding of ancestral branches defining key disagreements, including the relationships of Ginkgo to cycads, magnoliids to monocots and eudicots, and mosses to liverworts. The relationships of ANA-grade angiosperms (Amborella, Nymphaeales, Austrobaileyales), major monocot groups, bryophytes, and fern families are likely highly discordant in their evolutionary histories, rather than poorly informed. QS can also detect discordance due to introgression in phylogenomic data. CONCLUSIONS Quartet Sampling is an efficient synthesis of phylogenetic tests that offers more comprehensive and specific information on branch support than conventional measures. The QS method corroborates growing evidence that phylogenomic investigations that incorporate discordance testing are warranted when reconstructing complex evolutionary histories, in particular those surrounding ANA-grade, monocots, and nonvascular plants.
Affiliation(s)
- James B Pease
- Department of Biology, Wake Forest University, 455 Vine Street, Winston-Salem, North Carolina, 27101, USA
- Joseph W Brown
- Department of Ecology and Evolutionary Biology, University of Michigan, 830 North University, Ann Arbor, Michigan, 48109, USA
- Joseph F Walker
- Department of Ecology and Evolutionary Biology, University of Michigan, 830 North University, Ann Arbor, Michigan, 48109, USA
- Cody E Hinchliff
- Department of Biological Sciences, University of Idaho, 875 Perimeter Drive, MS 3051, Moscow, Idaho, 83844, USA
- Stephen A Smith
- Department of Ecology and Evolutionary Biology, University of Michigan, 830 North University, Ann Arbor, Michigan, 48109, USA
6
Mossel E, Roch S. Distance-based species tree estimation under the coalescent: Information-theoretic trade-off between number of loci and sequence length. Ann Appl Probab 2017. [DOI: 10.1214/16-aap1273] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
7
Genome sequencing of an Indian peste des petits ruminants virus isolate, Izatnagar/94, and its implications for virus diversity, divergence and phylogeography. Arch Virol 2017; 162:1677-1693. [PMID: 28247095 DOI: 10.1007/s00705-017-3288-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 10/28/2016] [Accepted: 01/25/2017] [Indexed: 10/20/2022]
Abstract
Peste des petits ruminants is an important transboundary disease infecting small ruminants. Genome or gene sequence analysis enriches our knowledge about the evolution and transboundary nature of the causative agent of this disease, peste des petits ruminants virus (PPRV). Although analysis using whole genome sequences of pathogens leads to more precise phylogenetic relationships, when compared to individual genes or partial sequences, there is still a need to identify specific genes/genomic regions that can provide evolutionary assessments consistent with those predicted with full-length genome sequences. Here the virulent Izatnagar/94 PPRV isolate was assembled and compared to all available complete genome sequences (currently in the NCBI database) to estimate nucleotide diversity and to deduce evolutionary relationships between genes/genomic regions and the full length genomes. Our aim was to identify the preferred candidate gene for use as a phylogenetic marker, as well as to predict divergence time and explore PPRV phylogeography. Among all the PPRV genes, the H gene was identified to be the most diverse with the highest evolutionary relationship with the full genome sequences. Hence it is considered as the most preferred candidate gene for phylogenetic study with 93% identity set as a nucleotide cutoff. A whole genome nucleotide sequence cutoff value of 94% permitted specific differentiation of PPRV lineages. All the isolates examined in the study were found to have a most recent common ancestor in the late 19th or in the early 20th century with high posterior probability values. The Bayesian skyline plot revealed a decrease in genetic diversity among lineage IV isolates since the start of the vaccination program and the network analysis localized the ancestry of PPRV to Africa.
8
Zallot R, Harrison KJ, Kolaczkowski B, de Crécy-Lagard V. Functional Annotations of Paralogs: A Blessing and a Curse. Life (Basel) 2016; 6:life6030039. [PMID: 27618105 PMCID: PMC5041015 DOI: 10.3390/life6030039] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 07/01/2016] [Revised: 08/29/2016] [Accepted: 09/02/2016] [Indexed: 12/15/2022] Open
Abstract
Gene duplication followed by mutation is a classic mechanism of neofunctionalization, producing gene families with functional diversity. In some cases, a single point mutation is sufficient to change the substrate specificity and/or the chemistry performed by an enzyme, making it difficult to accurately separate enzymes with identical functions from homologs with different functions. Because sequence similarity is often used as a basis for assigning functional annotations to genes, non-isofunctional gene families pose a great challenge for genome annotation pipelines. Here we describe how integrating evolutionary and functional information such as genome context, phylogeny, metabolic reconstruction and signature motifs may be required to correctly annotate multifunctional families. These integrative analyses can also lead to the discovery of novel gene functions, as hints from specific subgroups can guide the functional characterization of other members of the family. We demonstrate how careful manual curation processes using comparative genomics can disambiguate subgroups within large multifunctional families and discover their functions. We present the COG0720 protein family as a case study. We also discuss strategies to automate this process to improve the accuracy of genome functional annotation pipelines.
Affiliation(s)
- Rémi Zallot
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA.
- Katherine J Harrison
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA.
- Bryan Kolaczkowski
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA.
- Valérie de Crécy-Lagard
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA.
9
Grueber CE. Comparative genomics for biodiversity conservation. Comput Struct Biotechnol J 2015; 13:370-5. [PMID: 26106461 PMCID: PMC4475778 DOI: 10.1016/j.csbj.2015.05.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 01/29/2015] [Revised: 05/13/2015] [Accepted: 05/15/2015] [Indexed: 12/31/2022] Open
Abstract
Genomic approaches are gathering momentum in biology and emerging opportunities lie in the creative use of comparative molecular methods for revealing the processes that influence diversity of wildlife. However, few comparative genomic studies are performed with explicit and specific objectives to aid conservation of wild populations. Here I provide a brief overview of comparative genomic approaches that offer specific benefits to biodiversity conservation. Because conservation examples are few, I draw on research from other areas to demonstrate how comparing genomic data across taxa may be used to inform the characterisation of conservation units and studies of hybridisation, as well as studies that provide conservation outcomes from a better understanding of the drivers of divergence. A comparative approach can also provide valuable insight into the threatening processes that impact rare species, such as emerging diseases and their management in conservation. In addition to these opportunities, I note areas where additional research is warranted. Overall, comparing and contrasting the genomic composition of threatened and other species provide several useful tools for helping to preserve the molecular biodiversity of the global ecosystem.
10
Kozak KM, Wahlberg N, Neild AFE, Dasmahapatra KK, Mallet J, Jiggins CD. Multilocus species trees show the recent adaptive radiation of the mimetic Heliconius butterflies. Syst Biol 2015; 64:505-24. [PMID: 25634098 PMCID: PMC4395847 DOI: 10.1093/sysbio/syv007] [Citation(s) in RCA: 139] [Impact Index Per Article: 13.9] [Received: 04/08/2014] [Accepted: 01/23/2015] [Indexed: 11/25/2022] Open
Abstract
Müllerian mimicry among Neotropical Heliconiini butterflies is an excellent example of natural selection, associated with the diversification of a large continental-scale radiation. Some of the processes driving the evolution of mimicry rings are likely to generate incongruent phylogenetic signals across the assemblage, and thus pose a challenge for systematics. We use a data set of 22 mitochondrial and nuclear markers from 92% of species in the tribe, obtained by Sanger sequencing and de novo assembly of short read data, to re-examine the phylogeny of Heliconiini with both supermatrix and multispecies coalescent approaches, characterize the patterns of conflicting signal, and compare the performance of various methodological approaches to reflect the heterogeneity across the data. Despite the large extent of reticulate signal and strong conflict between markers, nearly identical topologies are consistently recovered by most of the analyses, although the supermatrix approach failed to reflect the underlying variation in the history of individual loci. However, the supermatrix represents a useful approximation where multiple rare species represented by short sequences can be incorporated easily. The first comprehensive, time-calibrated phylogeny of this group is used to test the hypotheses of a diversification rate increase driven by the dramatic environmental changes in the Neotropics over the past 23 myr, or changes caused by diversity-dependent effects on the rate of diversification. We find that the rate of diversification has increased on the branch leading to the presently most species-rich genus Heliconius, but the change occurred gradually and cannot be unequivocally attributed to a specific environmental driver. Our study provides comprehensive comparison of philosophically distinct species tree reconstruction methods and provides insights into the diversification of an important insect radiation in the most biodiverse region of the planet.
Affiliation(s)
- Krzysztof M Kozak
- Butterfly Genetics Group, Department of Zoology, University of Cambridge, CB2 3EJ Cambridge, UK; Laboratory of Genetics, Department of Biology, University of Turku, 20014 Turku, Finland; Department of Entomology, The Natural History Museum, London SW7 5BD, UK; Department of Biology, University of York, YO10 5DD Heslington, York, UK; and Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Niklas Wahlberg
- Butterfly Genetics Group, Department of Zoology, University of Cambridge, CB2 3EJ Cambridge, UK; Laboratory of Genetics, Department of Biology, University of Turku, 20014 Turku, Finland; Department of Entomology, The Natural History Museum, London SW7 5BD, UK; Department of Biology, University of York, YO10 5DD Heslington, York, UK; and Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Andrew F E Neild
- Butterfly Genetics Group, Department of Zoology, University of Cambridge, CB2 3EJ Cambridge, UK; Laboratory of Genetics, Department of Biology, University of Turku, 20014 Turku, Finland; Department of Entomology, The Natural History Museum, London SW7 5BD, UK; Department of Biology, University of York, YO10 5DD Heslington, York, UK; and Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Kanchon K Dasmahapatra
- Butterfly Genetics Group, Department of Zoology, University of Cambridge, CB2 3EJ Cambridge, UK; Laboratory of Genetics, Department of Biology, University of Turku, 20014 Turku, Finland; Department of Entomology, The Natural History Museum, London SW7 5BD, UK; Department of Biology, University of York, YO10 5DD Heslington, York, UK; and Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- James Mallet
- Butterfly Genetics Group, Department of Zoology, University of Cambridge, CB2 3EJ Cambridge, UK; Laboratory of Genetics, Department of Biology, University of Turku, 20014 Turku, Finland; Department of Entomology, The Natural History Museum, London SW7 5BD, UK; Department of Biology, University of York, YO10 5DD Heslington, York, UK; and Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Chris D Jiggins
- Butterfly Genetics Group, Department of Zoology, University of Cambridge, CB2 3EJ Cambridge, UK; Laboratory of Genetics, Department of Biology, University of Turku, 20014 Turku, Finland; Department of Entomology, The Natural History Museum, London SW7 5BD, UK; Department of Biology, University of York, YO10 5DD Heslington, York, UK; and Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
11
McMahon MM, Deepak A, Fernández-Baca D, Boss D, Sanderson MJ. STBase: one million species trees for comparative biology. PLoS One 2015; 10:e0117987. [PMID: 25679219 PMCID: PMC4332655 DOI: 10.1371/journal.pone.0117987] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 06/13/2014] [Accepted: 01/05/2015] [Indexed: 11/29/2022] Open
Abstract
Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user's query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. 
It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed trees.
Affiliation(s)
- Michelle M. McMahon
- School of Plant Sciences, University of Arizona, Tucson, AZ, 85721, United States of America
- Akshay Deepak
- Department of Computer Science, Iowa State University, Ames, IA, 50011, United States of America
- David Fernández-Baca
- Department of Computer Science, Iowa State University, Ames, IA, 50011, United States of America
- Darren Boss
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, 85721, United States of America
- Michael J. Sanderson
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, 85721, United States of America
12
Meerow AW, Noblick L, Salas-Leiva DE, Sanchez V, Francisco-Ortega J, Jestrow B, Nakamura K. Phylogeny and historical biogeography of the cocosoid palms (Arecaceae, Arecoideae, Cocoseae) inferred from sequences of six WRKY gene family loci. Cladistics 2014; 31:509-534. [DOI: 10.1111/cla.12100] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Indexed: 11/28/2022] Open
Affiliation(s)
- Alan W. Meerow
- USDA-ARS-SHRS-National Germplasm Repository; 13601 Old Cutler Rd. Miami FL 33158 USA
- Larry Noblick
- Montgomery Botanical Center; 11901 Old Cutler Rd. Coral Gables FL 33156 USA
- Dayana E. Salas-Leiva
- USDA-ARS-SHRS-National Germplasm Repository; 13601 Old Cutler Rd. Miami FL 33158 USA
- Montgomery Botanical Center; 11901 Old Cutler Rd. Coral Gables FL 33156 USA
- Department of Biological Sciences; Florida International University; 11200 SW 8th St. Miami FL 33199 USA
- Vanessa Sanchez
- USDA-ARS-SHRS-National Germplasm Repository; 13601 Old Cutler Rd. Miami FL 33158 USA
- Javier Francisco-Ortega
- Department of Biological Sciences; Florida International University; 11200 SW 8th St. Miami FL 33199 USA
- Kushlan Tropical Science Institute; Fairchild Tropical Botanical Garden; 10901 Old Cutler Rd. Miami FL 33156 USA
- Brett Jestrow
- Kushlan Tropical Science Institute; Fairchild Tropical Botanical Garden; 10901 Old Cutler Rd. Miami FL 33156 USA
- Kyoko Nakamura
- USDA-ARS-SHRS-National Germplasm Repository; 13601 Old Cutler Rd. Miami FL 33158 USA
13
Jockusch EL, Martínez-Solano I, Timpe EK. The Effects of Inference Method, Population Sampling, and Gene Sampling on Species Tree Inferences: An Empirical Study in Slender Salamanders (Plethodontidae: Batrachoseps). Syst Biol 2014; 64:66-83. [DOI: 10.1093/sysbio/syu078] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022] Open
Affiliation(s)
- Elizabeth L. Jockusch
- Department of Ecology and Evolutionary Biology, University of Connecticut, 75 N. Eagleville Road, U-3043, Storrs, CT 06269-3043, USA; and CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrário de Vairão, Universidade do Porto, 4485-661 Vairão, Portugal
- Iñigo Martínez-Solano
- Department of Ecology and Evolutionary Biology, University of Connecticut, 75 N. Eagleville Road, U-3043, Storrs, CT 06269-3043, USA; and CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrário de Vairão, Universidade do Porto, 4485-661 Vairão, Portugal
- Elizabeth K. Timpe
- Department of Ecology and Evolutionary Biology, University of Connecticut, 75 N. Eagleville Road, U-3043, Storrs, CT 06269-3043, USA; and CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Campus Agrário de Vairão, Universidade do Porto, 4485-661 Vairão, Portugal
14
Wang S, Luo X, Wei W, Zheng Y, Dou Y, Cai X. Calculation of evolutionary correlation between individual genes and full-length genome: a method useful for choosing phylogenetic markers for molecular epidemiology. PLoS One 2013; 8:e81106. [PMID: 24312527 PMCID: PMC3849185 DOI: 10.1371/journal.pone.0081106] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 06/15/2013] [Accepted: 10/18/2013] [Indexed: 11/21/2022] Open
Abstract
Individual genes or regions are still commonly used to estimate the phylogenetic relationships among viral isolates. The genomic regions that can faithfully provide assessments consistent with those predicted with full-length genome sequences would be preferable to serve as good candidates of the phylogenetic markers for molecular epidemiological studies of many viruses. Here we employed a statistical method to evaluate the evolutionary relationships between individual viral genes and full-length genomes without tree construction as a way to determine which gene can match the genome well in phylogenetic analyses. This method was performed by calculation of linear correlations between the genetic distance matrices of aligned individual gene sequences and aligned genome sequences. We applied this method to the phylogenetic analyses of porcine circovirus 2 (PCV2), measles virus (MV), hepatitis E virus (HEV) and Japanese encephalitis virus (JEV). Phylogenetic trees were constructed for comparisons and the possible factors affecting the method accuracy were also discussed in the calculations. The results revealed that this method could produce results consistent with those of previous studies about the proper consensus sequences that could be successfully used as phylogenetic markers. And our results also suggested that these evolutionary correlations could provide useful information for identifying genes that could be used effectively to infer the genetic relationships.
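The core idea in this abstract, correlating a gene's pairwise genetic distances with the genome's, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses uncorrected p-distances and a plain Pearson correlation, which may differ from the distance model and statistics the authors used, and the isolate names and sequences are toy data invented for the example.

```python
import math
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def pairwise_distances(alignment):
    """Flatten the upper triangle of the distance matrix in a fixed name order."""
    names = sorted(alignment)
    return [p_distance(alignment[a], alignment[b]) for a, b in combinations(names, 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy alignments (hypothetical isolates A-D); the "gene" is a sub-region of the "genome".
genome = {"A": "ACGTACGTACGT", "B": "ACGTACGTACGA", "C": "ACGAACGTTCGA", "D": "TCGAACGTTCGA"}
gene = {name: seq[6:] for name, seq in genome.items()}

r = pearson(pairwise_distances(gene), pairwise_distances(genome))
```

A gene whose correlation `r` is close to 1 reproduces the genome-wide distance structure well, which is the property the abstract proposes as a criterion for choosing a phylogenetic marker.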
Affiliation(s)
- Shuai Wang, Xuenong Luo, Wei Wei, Yadong Zheng, Yongxi Dou, Xuepeng Cai
- State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, Gansu, China
- Corresponding authors: Yongxi Dou (YD); Xuepeng Cai (XC)
|
15
|
Tian Y, Kubatko LS. Gene tree rooting methods give distributions that mimic the coalescent process. Mol Phylogenet Evol 2013; 70:63-9. [PMID: 24055603 DOI: 10.1016/j.ympev.2013.09.004] [Received: 03/13/2013] [Revised: 08/28/2013] [Accepted: 09/06/2013]
Abstract
Multi-locus phylogenetic inference is commonly carried out via models that incorporate the coalescent process to account for the possibility that incomplete lineage sorting leads to incongruence between gene trees and the species tree. An interesting question that arises in this context is whether data "fit" the coalescent model. Previous work (Rosenfeld et al., 2012) has suggested that rooting of gene trees may account for variation in empirical data that has previously been attributed to the coalescent process. We examine this possibility using simulated data. We show that, in the case of four taxa, the distribution of gene trees obtained by rooting estimated gene trees with either the molecular clock or an outgroup can be closely matched by the distribution predicted by the coalescent model with specific choices of species tree branch lengths. We apply commonly used coalescent-based methods of species tree inference to assess their performance in these situations.
Affiliation(s)
- Yuan Tian
- Departments of Statistics and Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 43210, United States.
|
16
|
Abstract
Large-scale databases are available that contain homologous gene families constructed from hundreds of complete genome sequences from across the three domains of life. Here, we discuss approaches of increasing complexity aimed at extracting information on the pattern and process of gene family evolution from such datasets. In particular, we consider models that invoke processes of gene birth (duplication and transfer) and death (loss) to explain the evolution of gene families. First, we review birth-and-death models of family size evolution and their implications in light of the universal features of the family size distribution observed across different species and the three domains of life. We then turn to recent developments in models that use the information in the sequences of homologous gene families more completely, through probabilistic reconciliation of the phylogenetic histories of individual genes with the phylogenetic history of the genomes in which they have resided. To illustrate the methods and results presented, we use data from the HOGENOM database, demonstrating that the distributions of homologous gene family sizes in the genomes of eukaryotes, archaea, and bacteria have remarkably similar shapes. We show that these distributions are best described by models of gene family size evolution in which, for individual genes, the death (loss) rate is larger than the birth (duplication and transfer) rate, but new families are continually supplied to the genome by a process of origination. Finally, we use probabilistic reconciliation methods to take into account additional information from gene phylogenies, and find that, for prokaryotes, the majority of birth events are the result of transfer.
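The kind of birth-death-origination model mentioned in this abstract can be illustrated with a small calculation. The sketch below assumes the simplest linear version, which is an illustrative choice and not necessarily the model fitted in the paper: each gene is duplicated or transferred at per-gene rate `birth`, lost at per-gene rate `death` (with `death > birth`), and new single-gene families originate at rate `origination`. At stationarity the flow of families from size n to n+1 balances the reverse flow, which gives an expected number of families of size n equal to (origination / death) * (birth / death)^(n-1) / n. The function name and parameters are hypothetical.

```python
def stationary_family_sizes(birth, death, origination, max_size):
    """Expected number of gene families of size 1..max_size at stationarity.

    Assumes a linear birth-death process with origination of new families
    at size 1 (an illustrative model, not taken from the paper). Requires
    death > birth > 0 so that a stationary distribution exists.
    """
    if not 0 < birth < death:
        raise ValueError("need 0 < birth < death")
    ratio = birth / death
    # Balance across the cut between sizes n and n+1:
    #   birth * n * f(n) == death * (n + 1) * f(n + 1)
    # with origination == death * f(1) fixing the first term.
    return [origination / death * ratio ** (n - 1) / n
            for n in range(1, max_size + 1)]
```

With `birth` close to `death` the distribution decays slowly, so a few very large families coexist with many singletons. This is one way to see how a loss rate exceeding the birth rate can still yield a long-tailed family size distribution when origination keeps supplying new families.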
|