1
Kong S, Swofford DL, Kubatko LS. Inference of Phylogenetic Networks From Sequence Data Using Composite Likelihood. Syst Biol 2025; 74:53-69. [PMID: 39387633 DOI: 10.1093/sysbio/syae054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Revised: 09/13/2024] [Accepted: 10/08/2024] [Indexed: 10/12/2024]
Abstract
While phylogenies have been essential in understanding how species evolve, they do not adequately describe some evolutionary processes. For instance, hybridization, a common phenomenon where interbreeding between 2 species leads to formation of a new species, must be depicted by a phylogenetic network, a structure that modifies a phylogenetic tree by allowing 2 branches to merge into 1, resulting in reticulation. However, existing methods for estimating networks become computationally expensive as the dataset size and/or topological complexity increase. The lack of methods for scalable inference hampers phylogenetic networks from being widely used in practice, despite accumulating evidence that hybridization occurs frequently in nature. Here, we propose a novel method, PhyNEST (Phylogenetic Network Estimation using SiTe patterns), that estimates binary, level-1 phylogenetic networks with a fixed, user-specified number of reticulations directly from sequence data. By using the composite likelihood as the basis for inference, PhyNEST is able to use the full genomic data in a computationally tractable manner, eliminating the need to summarize the data as a set of gene trees prior to network estimation. To search network space, PhyNEST implements both hill climbing and simulated annealing algorithms. PhyNEST assumes that the data are composed of coalescent independent sites that evolve according to the Jukes-Cantor substitution model and that the network has a constant effective population size. Simulation studies demonstrate that PhyNEST is often more accurate than 2 existing composite likelihood summary methods (SNaQ and PhyloNet) and that it is robust to at least one form of model misspecification (assuming a less complex nucleotide substitution model than the true generating model). We applied PhyNEST to reconstruct the evolutionary relationships among Heliconius butterflies and Papionini primates, characterized by hybrid speciation and widespread introgression, respectively. PhyNEST is implemented in an open-source Julia package and is publicly available at https://github.com/sungsik-kong/PhyNEST.jl.
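The abstract above names the Jukes-Cantor substitution model as one of the method's assumptions. As a rough illustration of what that model implies for a single site, the following Python sketch evaluates the standard Jukes-Cantor transition probabilities. The function name and its arguments are hypothetical, chosen only for this example; this is not code from PhyNEST, and it does not reproduce the composite likelihood computation.

```python
import math


def jc69_transition_probability(same_base, branch_length):
    """Jukes-Cantor probability of observing a given base after a branch.

    branch_length is measured in expected substitutions per site.
    same_base=True gives the probability that the base is unchanged;
    same_base=False gives the probability of one specific different base.
    (Hypothetical helper for illustration only.)
    """
    decay = math.exp(-4.0 * branch_length / 3.0)
    if same_base:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


# One unchanged outcome plus three changed outcomes must sum to 1.
p_same = jc69_transition_probability(True, 0.1)
p_diff = jc69_transition_probability(False, 0.1)
row_sum = p_same + 3 * p_diff
```

Because the model has a single rate and equal base frequencies, every row of the transition matrix is determined by these two values, which is part of what makes site-pattern probabilities tractable under it.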
Affiliation(s)
- Sungsik Kong
  - Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 43210, USA
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA
- David L Swofford
  - Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
- Laura S Kubatko
  - Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 43210, USA
  - Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
2
Allman ES, Baños H, Mitchell JD, Rhodes JA. TINNiK: inference of the tree of blobs of a species network under the coalescent model. Algorithms Mol Biol 2024; 19:23. [PMID: 39501362 PMCID: PMC11539473 DOI: 10.1186/s13015-024-00266-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2024] [Accepted: 08/22/2024] [Indexed: 11/08/2024]
Abstract
The tree of blobs of a species network shows only the tree-like aspects of relationships of taxa on a network, omitting information on network substructures where hybridization or other types of lateral transfer of genetic information occur. By isolating such regions of a network, inference of the tree of blobs can serve as a starting point for a more detailed investigation, or indicate the limit of what may be inferrable without additional assumptions. Building on our theoretical work on the identifiability of the tree of blobs from gene quartet distributions under the Network Multispecies Coalescent model, we develop an algorithm, TINNiK, for statistically consistent tree of blobs inference. We provide examples of its application to both simulated and empirical datasets, utilizing an implementation in the MSCquartets 2.0 R package.
Affiliation(s)
- Elizabeth S Allman
  - Department of Mathematics and Statistics, University of Alaska, Fairbanks, AK, USA
- Hector Baños
  - Department of Mathematics, California State University San Bernardino, San Bernardino, CA, USA
- Jonathan D Mitchell
  - School of Natural Sciences (Mathematics), University of Tasmania, Hobart, TAS, Australia
  - ARC Centre of Excellence for Plant Success in Nature and Agriculture, University of Tasmania, Hobart, TAS, Australia
- John A Rhodes
  - Department of Mathematics and Statistics, University of Alaska, Fairbanks, AK, USA
3
Ning W, Meudt HM, Tate JA. A roadmap of phylogenomic methods for studying polyploid plant genera. Applications in Plant Sciences 2024; 12:e11580. [PMID: 39184196 PMCID: PMC11342234 DOI: 10.1002/aps3.11580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 12/10/2023] [Accepted: 01/13/2024] [Indexed: 08/27/2024]
Abstract
Phylogenetic inference of polyploid species is the first step towards understanding their patterns of diversification. In this paper, we review the challenges and limitations of inferring species relationships of polyploid plants using traditional phylogenetic sequencing approaches, as well as the mischaracterization of the species tree from single or multiple gene trees. We provide a roadmap to infer interspecific relationships among polyploid lineages by comparing and evaluating the application of current phylogenetic, phylogenomic, transcriptomic, and whole-genome approaches using different sequencing platforms. For polyploid species tree reconstruction, we assess the following criteria: (1) the amount of prior information or tools required to capture the genetic region(s) of interest; (2) the probability of recovering homeologs for polyploid species; and (3) the time efficiency of downstream data analysis. Moreover, we discuss bioinformatic pipelines that can reconstruct networks of polyploid species relationships. In summary, although current phylogenomic approaches have improved our understanding of reticulate species relationships in polyploid-rich genera, the difficulties of recovering reliable orthologous genes and sorting all homeologous copies for allopolyploids remain a challenge. In the future, assembled long-read sequencing data will assist the recovery and identification of multiple gene copies, which can be particularly useful for reconstructing the multiple independent origins of polyploids.
Affiliation(s)
- Weixuan Ning
  - School of Natural Sciences, Massey University, Palmerston North 4442, New Zealand
- Heidi M. Meudt
  - Museum of New Zealand Te Papa Tongarewa, Wellington 6011, New Zealand
- Jennifer A. Tate
  - School of Natural Sciences, Massey University, Palmerston North 4442, New Zealand
4
Allman ES, Baños H, Mitchell JD, Rhodes JA. TINNiK: Inference of the Tree of Blobs of a Species Network Under the Coalescent. bioRxiv 2024:2024.04.20.590418. [PMID: 38712257 PMCID: PMC11071406 DOI: 10.1101/2024.04.20.590418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
The tree of blobs of a species network shows only the tree-like aspects of relationships of taxa on a network, omitting information on network substructures where hybridization or other types of lateral transfer of genetic information occur. By isolating such regions of a network, inference of the tree of blobs can serve as a starting point for a more detailed investigation, or indicate the limit of what may be inferrable without additional assumptions. Building on our theoretical work on the identifiability of the tree of blobs from gene quartet distributions under the Network Multispecies Coalescent model, we develop an algorithm, TINNiK, for statistically consistent tree of blobs inference. We provide examples of its application to both simulated and empirical datasets, utilizing an implementation in the MSCquartets 2.0 R package.
Affiliation(s)
- Elizabeth S. Allman
  - Department of Mathematics and Statistics, University of Alaska, Fairbanks, AK, USA
- Hector Baños
  - Department of Mathematics, California State University San Bernardino, San Bernardino, CA, USA
- Jonathan D. Mitchell
  - School of Natural Sciences (Mathematics), University of Tasmania, Hobart, TAS, Australia
  - ARC Centre of Excellence for Plant Success in Nature and Agriculture, University of Tasmania, Hobart, TAS, Australia
- John A. Rhodes
  - Department of Mathematics and Statistics, University of Alaska, Fairbanks, AK, USA
5
San Jose M, Doorenweerd C, Geib S, Barr N, Dupuis JR, Leblanc L, Kauwe A, Morris KY, Rubinoff D. Interspecific gene flow obscures phylogenetic relationships in an important insect pest species complex. Mol Phylogenet Evol 2023; 188:107892. [PMID: 37524217 DOI: 10.1016/j.ympev.2023.107892] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/02/2023] [Revised: 07/07/2023] [Accepted: 07/28/2023] [Indexed: 08/02/2023]
Abstract
As genomic data proliferates, the prevalence of post-speciation gene flow is making species boundaries and relationships increasingly ambiguous. Although current approaches inferring fully bifurcating phylogenies based on concatenated datasets provide simple and robust answers to many species relationships, they may be inaccurate because the models ignore inter-specific gene flow and incomplete lineage sorting. To examine the potential error resulting from ignoring gene flow, we generated both a RAD-seq and a 500 protein-coding loci highly multiplexed amplicon (HiMAP) dataset for a monophyletic group of 12 species defined as the Bactrocera dorsalis sensu lato clade. With some of the world's worst agricultural pests, the taxonomy of the B. dorsalis s.l. clade is important for trade and quarantines. However, taxonomic confusion confounds resolution due to intra- and interspecific phenotypic variation and convergence, mitochondrial introgression across half of the species, and viable hybrids. We compared the topological convergence of our datasets using concatenated phylogenetic and various multispecies coalescent approaches, some of which account for gene flow. All analyses agreed on species delimitation, but there was incongruence between species relationships. Under concatenation, both datasets suggest identical species relationships with mostly high statistical support. However, multispecies coalescent and multispecies network approaches suggest markedly different hypotheses and detected significant gene flow. We suggest that the network approaches are likely more accurate because gene flow violates the assumptions of the concatenated phylogenetic analyses, but the data-reductive requirements of network approaches resulted in reduced statistical support and could not unambiguously resolve gene flow directions. 
Our study highlights the importance of testing for gene flow, particularly with phylogenomic datasets, even when concatenated approaches receive high statistical support.
Affiliation(s)
- Michael San Jose
  - University of Hawaii, College of Tropical Agriculture and Human Resources, Department of Plant and Environmental Protection Sciences, Entomology Section, 3050 Maile Way, Honolulu, HI, 96822-2231, USA
- Camiel Doorenweerd
  - University of Hawaii, College of Tropical Agriculture and Human Resources, Department of Plant and Environmental Protection Sciences, Entomology Section, 3050 Maile Way, Honolulu, HI, 96822-2231, USA
- Scott Geib
  - Tropical Crop and Commodity Protection Research Unit, Daniel K Inouye U.S. Pacific Basin Agricultural Center, USDA Agricultural Research Services, Hilo, HI, USA
- Norman Barr
  - United States Department of Agriculture, Animal and Plant Health Inspection Service, Plant Protection and Quarantine, Science & Technology, Insect Management and Molecular Diagnostics Laboratory, 22675 N. Moorefield Road, Edinburg, TX 78541, USA
- Julian R Dupuis
  - University of Kentucky, Department of Entomology, S225 Ag Science Center North, 1100 South Limestone, Lexington, KY, 40546-0091, USA
- Luc Leblanc
  - University of Idaho, Department of Entomology, Plant Pathology and Nematology, 875 Perimeter Drive, MS2329, Moscow, ID, 83844-2329, USA
- Angela Kauwe
  - Tropical Crop and Commodity Protection Research Unit, Daniel K Inouye U.S. Pacific Basin Agricultural Center, USDA Agricultural Research Services, Hilo, HI, USA
- Kimberley Y Morris
  - University of Hawaii, College of Tropical Agriculture and Human Resources, Department of Plant and Environmental Protection Sciences, Entomology Section, 3050 Maile Way, Honolulu, HI, 96822-2231, USA
  - Tropical Crop and Commodity Protection Research Unit, Daniel K Inouye U.S. Pacific Basin Agricultural Center, USDA Agricultural Research Services, Hilo, HI, USA
- Daniel Rubinoff
  - University of Hawaii, College of Tropical Agriculture and Human Resources, Department of Plant and Environmental Protection Sciences, Entomology Section, 3050 Maile Way, Honolulu, HI, 96822-2231, USA
6
Felipez W, Villavicencio J, Nizolli VO, Pegoraro C, da Maia L, Costa de Oliveira A. Genome-Wide Identification of Bilberry WRKY Transcription Factors: Go Wild and Duplicate. Plants (Basel, Switzerland) 2023; 12:3176. [PMID: 37765340 PMCID: PMC10535657 DOI: 10.3390/plants12183176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 07/11/2023] [Accepted: 07/20/2023] [Indexed: 09/29/2023]
Abstract
WRKY transcription factor genes compose an important family of transcriptional regulators that are present in several plant species. According to previous studies, these genes can also perform important roles in bilberry (Vaccinium myrtillus L.) metabolism, making it essential to deepen our understanding of fruit ripening regulation and anthocyanin biosynthesis. In this context, the detailed characterization of these proteins will provide a comprehensive view of the functional features of VmWRKY genes in different plant organs and in response to different intensities of light. In this study, the investigation of the complete genome of the bilberry identified 76 VmWRKY genes that were evaluated and distributed in all twelve chromosomes. The proteins encoded by these genes were classified into four groups (I, II, III, and IV) based on their conserved domains and zinc finger domain types. Fifteen pairs of VmWRKY genes in segmental duplication and four pairs in tandem duplication were detected. A cis element analysis showed that all promoters of the VmWRKY genes contain at least one potential cis stress-response element. Differential expression analysis of RNA-seq data revealed that VmWRKY genes from bilberry show preferential or specific expression in samples. These findings provide an overview of the functional characterization of these proteins in bilberry.
Affiliation(s)
- Winder Felipez
  - Instituto de Agroecología y Seguridad Alimentaria, Facultad de Ciências Agrárias, Universidad San Francisco Xavier de Chuquisaca (USFX), Casilla, Correo Central, Sucre 1046, Bolivia
  - Plant Genomics and Breeding Center, Departamento de Fitotecnia, Faculdade de Agronomia Eliseu Maciel, Universidade Federal de Pelotas (UFPel), Pelotas CEP 96010-900, RS, Brazil
- Jennifer Villavicencio
  - Plant Genomics and Breeding Center, Departamento de Fitotecnia, Faculdade de Agronomia Eliseu Maciel, Universidade Federal de Pelotas (UFPel), Pelotas CEP 96010-900, RS, Brazil
  - Carrera de Ingeniería Agroforestal, Facultad de Ciencias Ambientales, Universidad Cientifica del Sur (UCSUR), Antigua Panamericana Sur km 19 Villa el Salvador, Lima CP 150142, Peru
- Valeria Oliveira Nizolli
  - Plant Genomics and Breeding Center, Departamento de Fitotecnia, Faculdade de Agronomia Eliseu Maciel, Universidade Federal de Pelotas (UFPel), Pelotas CEP 96010-900, RS, Brazil
- Camila Pegoraro
  - Plant Genomics and Breeding Center, Departamento de Fitotecnia, Faculdade de Agronomia Eliseu Maciel, Universidade Federal de Pelotas (UFPel), Pelotas CEP 96010-900, RS, Brazil
- Luciano da Maia
  - Plant Genomics and Breeding Center, Departamento de Fitotecnia, Faculdade de Agronomia Eliseu Maciel, Universidade Federal de Pelotas (UFPel), Pelotas CEP 96010-900, RS, Brazil
- Antonio Costa de Oliveira
  - Plant Genomics and Breeding Center, Departamento de Fitotecnia, Faculdade de Agronomia Eliseu Maciel, Universidade Federal de Pelotas (UFPel), Pelotas CEP 96010-900, RS, Brazil
7
Cozzi D, Rossi M, Rubinacci S, Gagie T, Köppl D, Boucher C, Bonizzoni P. μ-PBWT: a lightweight r-indexing of the PBWT for storing and querying UK Biobank data. Bioinformatics 2023; 39:btad552. [PMID: 37688560 PMCID: PMC10502237 DOI: 10.1093/bioinformatics/btad552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 07/07/2023] [Accepted: 09/07/2023] [Indexed: 09/11/2023]
Abstract
MOTIVATION The Positional Burrows-Wheeler Transform (PBWT) is a data structure that indexes haplotype sequences in a manner that enables finding maximal haplotype matches in h sequences containing w variation sites in O(hw) time. This represents a significant improvement over classical quadratic-time approaches. However, the original PBWT data structure does not allow for queries over Biobank panels that consist of several millions of haplotypes, if an index of the haplotypes must be kept entirely in memory. RESULTS In this article, we leverage the notion of r-index proposed for the BWT to present a memory-efficient method for constructing and storing the run-length encoded PBWT, and computing set maximal matches (SMEMs) queries in haplotype sequences. We implement our method, which we refer to as μ-PBWT, and evaluate it on datasets of 1000 Genome Project and UK Biobank data. Our experiments demonstrate that the μ-PBWT reduces the memory usage up to a factor of 20% compared to the best current PBWT-based indexing. In particular, μ-PBWT produces an index that stores high-coverage whole genome sequencing data of chromosome 20 in about a third of the space of its BCF file. μ-PBWT is an adaptation of techniques for the run-length compressed BWT for the PBWT (RLPBWT) and it is based on keeping in memory only a succinct representation of the RLPBWT that still allows the efficient computation of set maximal matches (SMEMs) over the original panel. AVAILABILITY AND IMPLEMENTATION Our implementation is open source and available at https://github.com/dlcgold/muPBWT. The binary is available at https://bioconda.github.io/recipes/mupbwt/README.html.
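To make the data structure in this abstract more concrete, the sketch below builds the positional prefix arrays of a small binary haplotype panel in O(hw) time and counts the runs in each PBWT column, which is the quantity a run-length encoding exploits. It follows the textbook stable-partition construction of the PBWT, not the μ-PBWT implementation itself; the function names and the toy panel are assumptions made for this example.

```python
def pbwt_prefix_arrays(haplotypes):
    """Return the positional prefix arrays a_0 .. a_w for a binary panel.

    haplotypes is a list of h equal-length lists of 0/1 alleles.
    arrays[k] lists haplotype indices sorted by their reversed prefix
    ending just before site k. (Hypothetical helper for illustration.)
    """
    order = list(range(len(haplotypes)))
    n_sites = len(haplotypes[0]) if haplotypes else 0
    arrays = [order]
    for k in range(n_sites):
        # Stable partition: haplotypes carrying 0 at site k come first.
        zeros = [i for i in order if haplotypes[i][k] == 0]
        ones = [i for i in order if haplotypes[i][k] != 0]
        order = zeros + ones
        arrays.append(order)
    return arrays


def pbwt_run_counts(haplotypes):
    """Count runs of equal alleles in each PBWT column."""
    arrays = pbwt_prefix_arrays(haplotypes)
    counts = []
    for k in range(len(arrays) - 1):
        # Column k of the PBWT: alleles at site k, read in the order a_k.
        column = [haplotypes[i][k] for i in arrays[k]]
        runs = 1 if column else 0
        runs += sum(1 for x, y in zip(column, column[1:]) if x != y)
        counts.append(runs)
    return counts


# Toy panel: 4 haplotypes, 3 variation sites (made-up data).
panel = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 1, 1]]
final_order = pbwt_prefix_arrays(panel)[-1]
run_counts = pbwt_run_counts(panel)
```

On real panels, haplotypes that share long matches end up adjacent in each ordering, so the columns tend to have few runs; storing runs instead of individual alleles is the general idea behind a run-length encoded PBWT.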
Affiliation(s)
- Davide Cozzi
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan 20126, Italy
- Massimiliano Rossi
  - Department of Computer & Information Science & Engineering, Herbert-Wertheim College of Engineering, University of Florida, Gainesville, Florida 32611, United States
- Simone Rubinacci
  - Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland
- Travis Gagie
  - Faculty of Computer Science, Dalhousie University, Halifax B3H 4R2, Canada
- Dominik Köppl
  - M&D Data Science Center, Tokyo Medical and Dental University, Tokyo 113-8510, Japan
  - Department of Computer Science, University of Muenster, Muenster 48149, Germany
- Christina Boucher
  - Department of Computer & Information Science & Engineering, Herbert-Wertheim College of Engineering, University of Florida, Gainesville, Florida 32611, United States
- Paola Bonizzoni
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan 20126, Italy
8
The Complete Chloroplast Genome Sequence of Machilus chuanchienensis (Lauraceae): Genome Structure and Phylogenetic Analysis. Genes (Basel) 2022; 13:genes13122402. [PMID: 36553669 PMCID: PMC9778441 DOI: 10.3390/genes13122402] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/20/2022] [Revised: 12/14/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022]
Abstract
Machilus chuanchienensis is an ecologically important tree distributed in southwestern China. Its leaves are valued for making Hawk tea, a traditional ethnic tea-like beverage with a long history in Chinese tea culture. The whole chloroplast (cp) genome is an ideal model for the phylogenetic study of Lauraceae because of its simple structure and highly conserved features. There have been numerous reports of complete cp genome sequences in Lauraceae, but little is known about M. chuanchienensis. Here, next-generation sequencing (NGS) was used to sequence the M. chuanchienensis cp genome. Then, a comprehensive comparative genome analysis was performed. The results revealed that the M. chuanchienensis cp genome measured 152,748 base pairs (bp), had a GC content of 39.15%, and encoded 126 annotated genes, comprising eight ribosomal RNA (rRNA), 36 transfer RNA (tRNA), and 82 protein-coding genes. In addition, the cp genome presented a typical quadripartite structure comprising a large single-copy (LSC; 93,811 bp) region, a small single-copy (SSC; 18,803 bp) region, and two inverted repeat (IR; 20,067 bp each) regions, and contained 92 simple sequence repeat (SSR) loci in total. Phylogenetic relationships of 37 species indicated that M. chuanchienensis was sister to M. balansae, M. melanophylla, and M. minutiflora. Further research on this crucial species may benefit significantly from these findings.
9
Allman ES, Baños H, Mitchell JD, Rhodes JA. The tree of blobs of a species network: identifiability under the coalescent. J Math Biol 2022; 86:10. [PMID: 36472708 PMCID: PMC10062380 DOI: 10.1007/s00285-022-01838-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/06/2022] [Revised: 08/31/2022] [Accepted: 11/17/2022] [Indexed: 12/12/2022]
Abstract
Inference of species networks from genomic data under the Network Multispecies Coalescent Model is currently severely limited by heavy computational demands. It also remains unclear how complicated networks can be for consistent inference to be possible. As a step toward inferring a general species network, this work considers its tree of blobs, in which non-cut edges are contracted to nodes, so only tree-like relationships between the taxa are shown. An identifiability theorem, that most features of the unrooted tree of blobs can be determined from the distribution of gene quartet topologies, is established. This depends upon an analysis of gene quartet concordance factors under the model, together with a new combinatorial inference rule. The arguments for this theoretical result suggest a practical algorithm for tree of blobs inference, to be fully developed in a subsequent work.
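The definition in this abstract (contract every non-cut edge to a node) can be illustrated on a known graph. The Python sketch below finds the cut edges of an undirected graph with a depth-first search and merges the endpoints of all other edges. It is purely a graph-contraction illustration under several assumptions: the network is already known, edge directions are ignored, degree-2 nodes are not suppressed, and the function name is hypothetical. It is not the statistical inference procedure the paper studies.

```python
def tree_of_blobs(edges):
    """Contract every non-cut edge of an undirected graph.

    edges is a list of (u, v) pairs. Returns (blob_of, tree_edges), where
    blob_of maps each node to a representative of its blob and tree_edges
    lists the cut edges as pairs of blob representatives.
    (Hypothetical helper for illustration; recursive, so small graphs only.)
    """
    adj = {}
    for idx, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, idx))
        adj.setdefault(v, []).append((u, idx))

    disc, low, cut_edges = {}, {}, set()
    counter = 0

    def dfs(u, parent_edge):
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        for v, idx in adj[u]:
            if idx == parent_edge:
                continue
            if v in disc:
                low[u] = min(low[u], disc[v])
            else:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    # No back edge bypasses (u, v): it is a cut edge.
                    cut_edges.add(idx)

    for node in adj:
        if node not in disc:
            dfs(node, None)

    # Union-find: merge the endpoints of every non-cut edge into one blob.
    parent = {u: u for u in adj}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for idx, (u, v) in enumerate(edges):
        if idx not in cut_edges:
            parent[find(u)] = find(v)

    blob_of = {u: find(u) for u in adj}
    tree_edges = [(blob_of[u], blob_of[v])
                  for idx, (u, v) in enumerate(edges) if idx in cut_edges]
    return blob_of, tree_edges


# Toy network: a 4-cycle p-q-r-s with one leaf hanging off each cycle node.
network = [("a", "p"), ("b", "q"), ("c", "r"), ("d", "s"),
           ("p", "q"), ("q", "r"), ("r", "s"), ("s", "p")]
blob_of, tree_edges = tree_of_blobs(network)
```

In the toy network the cycle collapses to a single blob node, leaving a star tree on the four leaves; the order of reticulation inside the cycle is exactly the information the tree of blobs omits.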
Affiliation(s)
- Elizabeth S Allman
  - Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775, USA
- Hector Baños
  - Department of Biochemistry and Molecular Biology, Faculty of Medicine, Dalhousie University, Halifax, NS, Canada
  - Department of Mathematics and Statistics, Faculty of Science, Dalhousie University, Halifax, NS, Canada
- Jonathan D Mitchell
  - Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775, USA
  - School of Natural Sciences (Mathematics), University of Tasmania, Hobart, TAS, 7001, Australia
  - ARC Centre of Excellence for Plant Success in Nature and Agriculture, University of Tasmania, Hobart, TAS, 7001, Australia
- John A Rhodes
  - Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775, USA
10
Scornavacca C, Weller M. Treewidth-based algorithms for the small parsimony problem on networks. Algorithms Mol Biol 2022; 17:15. [PMID: 35987645 PMCID: PMC9392953 DOI: 10.1186/s13015-022-00216-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/24/2022] [Accepted: 03/17/2022] [Indexed: 12/04/2022]
Abstract
Background Phylogenetic reconstruction is one of the paramount challenges of contemporary bioinformatics. A subtask of existing tree reconstruction algorithms is modeled by the Small Parsimony problem: given a tree T and an assignment of character-states to its leaves, assign states to the internal nodes of T so as to minimize the parsimony score, that is, the number of edges of T connecting nodes with different states. While this problem is polynomial-time solvable on trees, the matter is more complicated if T contains reticulate events such as hybridizations or recombinations, i.e. when T is a network. Indeed, three different versions of the parsimony score on networks have been proposed and each of them is NP-hard to decide. Existing parameterized algorithms focus on combining the number c of possible character-states with the number of reticulate events (per biconnected component). Results We consider the parameter treewidth t of the underlying undirected graph of the input network, presenting dynamic programming algorithms for (slight generalizations of) all three versions of the parsimony problem on size-n networks running in times $c^t n^{O(1)}$, $(3c)^t n^{O(1)}$, and $6^{tc} n^{O(1)}$, respectively. Our algorithms use a formulation of the treewidth that may facilitate formalizing treewidth-based dynamic programming algorithms on phylogenetic networks for other problems. Conclusions Our algorithms allow the computation of the three popular parsimony scores, modeling the evolutionary development of a (multistate) character on a given phylogenetic network of low treewidth. Our results subsume and improve previously known algorithms for all three variants. While our results rely on being given a “good” tree-decomposition of the input, encouraging theoretical results as well as practical implementations producing them are publicly available. We present a reformulation of tree decompositions in terms of “agreeing trees” on the same set of nodes. As this formulation may come more naturally to researchers and engineers developing algorithms for phylogenetic networks, we hope to render exploiting the input network’s treewidth as parameter more accessible to this audience.
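For the tree case that the abstract describes as polynomial-time solvable, the parsimony score can be computed with a simple bottom-up dynamic program. The Python sketch below does this for a rooted tree with unit cost per state change. It covers only trees, not the network variants or the treewidth-based algorithms of the paper, and the function name, input encoding, and toy data are assumptions made for this example.

```python
from math import inf


def small_parsimony(children, leaf_state, root, states):
    """Return the minimum parsimony score of a rooted tree.

    children:   dict mapping each internal node to a list of its children
    leaf_state: dict mapping each leaf to its observed character-state
    root:       the root node
    states:     iterable of all possible character-states
    (Hypothetical helper for illustration; recursive, so small trees only.)
    """
    states = list(states)

    def cost(v):
        # cost(v)[s] = fewest state changes below v if v is assigned s.
        if v in leaf_state:
            return {s: (0 if s == leaf_state[v] else inf) for s in states}
        table = {s: 0 for s in states}
        for child in children[v]:
            child_cost = cost(child)
            best_any = min(child_cost.values())
            for s in states:
                # Keep state s along the edge for free, or pay 1 to switch.
                table[s] += min(child_cost[s], best_any + 1)
        return table

    return min(cost(root).values())


# Toy tree ((a,b),(c,d)) with made-up leaf states.
tree = {"root": ["x", "y"], "x": ["a", "b"], "y": ["c", "d"]}
leaves = {"a": "A", "b": "C", "c": "A", "d": "A"}
score = small_parsimony(tree, leaves, "root", "ACGT")
```

Each node stores one value per character-state, so the work grows with the number of nodes times the square of the number of states at most; on a network this per-node table is no longer enough because a node can have two parents, which is where the harder variants in the abstract arise.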
11
The role of neural artificial intelligence for diagnosis and treatment planning in endodontics: A qualitative review. Saudi Dent J 2022; 34:270-281. [PMID: 35692236 PMCID: PMC9177869 DOI: 10.1016/j.sdentj.2022.04.004] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 02/20/2022] [Revised: 04/12/2022] [Accepted: 04/13/2022] [Indexed: 11/23/2022]
Abstract
Introduction The role of artificial intelligence (AI) in diagnosing diseases and planning treatment in endodontics is currently increasing. However, findings from individual research studies have not been systematically reviewed and compiled. Hence, this study aimed to systematically review, appraise, and evaluate the neural AI algorithms employed and their efficacy compared with conventional methods in endodontic diagnosis and treatment planning. Methods The research question focused the literature search on different AI algorithms and models for AI-assisted endodontic diagnosis and treatment planning. The databases searched included Google Scholar, PubMed, and Science Direct, with inclusion criteria of primary research papers published in English that analyzed data on AI and its role in the field of endodontics. Results The initial search returned 785 articles; exclusion based on abstract relevance, animal studies, grey literature, and letters to the editor narrowed the selection to 11 articles accepted for review. The review data supported the finding that AI can play a crucial role in endodontics, such as identification of apical lesions, classifying and numbering teeth, detecting dental caries, periodontitis and periapical disease, diagnosing different dental problems, helping dentists make referrals, and helping them plan treatment of dental disorders in a timely and effective manner with greater accuracy. Conclusion AI with different models or frameworks and algorithms can help dentists to diagnose and manage endodontic problems with greater accuracy. However, the endodontic community needs to place more emphasis on the utilization of AI, the provision of evidence-based guidelines, and the implementation of AI models.
12
Comparative Chloroplast Genome Analysis of Wax Gourd (Benincasa hispida) with Three Benincaseae Species, Revealing Evolutionary Dynamic Patterns and Phylogenetic Implications. Genes (Basel) 2022; 13:genes13030461. [PMID: 35328015 PMCID: PMC8954987 DOI: 10.3390/genes13030461] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/21/2022] [Accepted: 03/01/2022] [Indexed: 12/13/2022]
Abstract
Benincasa hispida (wax gourd) is an important Cucurbitaceae crop, with enormous economic and medicinal importance. Here, we report the de novo assembly and annotation of the complete chloroplast genome of wax gourd with 156,758 bp in total. The quadripartite structure of the chloroplast genome comprises a large single-copy (LSC) region with 86,538 bp and a small single-copy (SSC) region with 18,060 bp, separated by a pair of inverted repeats (IRa and IRb) with 26,080 bp each. Comparison analyses among B. hispida and three other species from Benincaseae presented a significant conversion regarding nucleotide content, genome structure, codon usage, synonymous and non-synonymous substitutions, putative RNA editing sites, microsatellites, and oligonucleotide repeats. The LSC and SSC regions were found to be much more varied than the IR regions through a divergence analysis of the species within Benincaseae. Notable IR contractions and expansions were observed, suggesting a difference in genome size, gene duplication and deletion, and the presence of pseudogenes. Intronic gene sequences, such as trnR-UCU–atpA and atpH–atpI, were observed as highly divergent regions. Two types of phylogenetic analysis based on the complete cp genome and 72 genes suggested sister relationships between B. hispida and Citrullus, Lagenaria, and Cucumis. Variations and consistency with previous studies regarding phylogenetic relationships are discussed. The cp genome of B. hispida provides valuable genetic information for the detection of molecular markers, research on taxonomic discrepancies, and the inference of the phylogenetic relationships of Cucurbitaceae.
|
13
|
Sanderson MJ, Búrquez A, Copetti D, McMahon MM, Zeng Y, Wojciechowski MF. Origin and diversification of the saguaro cactus (Carnegiea gigantea): a within-species phylogenomic analysis. Syst Biol 2022; 71:1178-1194. [PMID: 35244183 DOI: 10.1093/sysbio/syac017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2020] [Revised: 02/18/2022] [Accepted: 02/25/2022] [Indexed: 11/14/2022] Open
Abstract
Reconstructing accurate historical relationships within a species poses numerous challenges, not least in many plant groups in which gene flow is high enough to extend well beyond species boundaries. Nonetheless, the extent of tree-like history within a species is an empirical question on which it is now possible to bring large amounts of genome sequence to bear. We assess phylogenetic structure across the geographic range of the saguaro cactus, an emblematic member of Cactaceae, a clade known for extensive hybridization and porous species boundaries. Using 200 Gb of whole genome resequencing data from 20 individuals sampled from 10 localities, we assembled two data sets comprising 150,000 biallelic single nucleotide polymorphisms (SNPs) from protein coding sequences. From these we inferred within-species trees and evaluated their significance and robustness using five qualitatively different inference methods. Despite the low sequence diversity, large census population sizes, and presence of wide-ranging pollen and seed dispersal agents, phylogenetic trees were well resolved and highly consistent across both data sets and all methods. We inferred that the most likely root, based on marginal likelihood comparisons, is to the east and south of the region of highest genetic diversity, which lies along the coast of the Gulf of California in Sonora, Mexico. Together with striking decreases in marginal likelihood found to the north, this supports hypotheses that saguaro's current range reflects post-glacial expansion from the refugia in the south of its range. We conclude with observations about practical and theoretical issues raised by phylogenomic data sets within species, in which SNP-based methods must be used rather than gene tree methods that are widely used when sequence divergence is higher. These include computational scalability, inference of gene flow, and proper assessment of statistical support in the presence of linkage effects.
Affiliation(s)
- Michael J Sanderson
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
| | - Alberto Búrquez
- Instituto de Ecología, Unidad Hermosillo, Universidad Nacional Autónoma de México, Hermosillo, Sonora, Mexico
| | - Dario Copetti
- Arizona Genomics Institute, School of Plant Sciences, University of Arizona, Tucson, AZ, 85721 USA
| | | | - Yichao Zeng
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
| | | |
|
14
|
Hibbins MS, Hahn MW. Phylogenomic approaches to detecting and characterizing introgression. Genetics 2022; 220:iyab173. [PMID: 34788444 PMCID: PMC9208645 DOI: 10.1093/genetics/iyab173] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2021] [Accepted: 10/02/2021] [Indexed: 12/26/2022] Open
Abstract
Phylogenomics has revealed the remarkable frequency with which introgression occurs across the tree of life. These discoveries have been enabled by the rapid growth of methods designed to detect and characterize introgression from whole-genome sequencing data. A large class of phylogenomic methods makes use of data across species to infer and characterize introgression based on expectations from the multispecies coalescent. These methods range from simple tests, such as the D-statistic, to model-based approaches for inferring phylogenetic networks. Here, we provide a detailed overview of the various signals that different modes of introgression are expected to leave in the genome, and how current methods are designed to detect them. We discuss the strengths and pitfalls of these approaches and identify areas for future development, highlighting the different signals of introgression, and the power of each method to detect them. We conclude with a discussion of current challenges in inferring introgression and how they could potentially be addressed.
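As an illustration of the simplest test mentioned in this abstract, the D-statistic in its common ABBA-BABA form can be sketched as follows. This is a minimal sketch, not code from the review: it assumes one haploid sequence per taxon, a fixed topology (((P1,P2),P3),O), and that only biallelic sites with unambiguous bases are counted. The function names `count_abba_baba` and `d_statistic` are hypothetical.

```python
import math

VALID_BASES = set("ACGT")


def count_abba_baba(p1: str, p2: str, p3: str, outgroup: str) -> tuple[int, int]:
    """Count ABBA and BABA site patterns in four aligned sequences.

    The outgroup allele is treated as ancestral (A); the other allele is derived (B).
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        column = {a, b, c, o}
        # Keep only biallelic sites made of unambiguous bases.
        if len(column) != 2 or not column <= VALID_BASES:
            continue
        if a == o and b == c:      # P1 ancestral, P2 and P3 derived
            abba += 1
        elif b == o and a == c:    # P2 ancestral, P1 and P3 derived
            baba += 1
    return abba, baba


def d_statistic(p1: str, p2: str, p3: str, outgroup: str) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA); NaN if no informative sites."""
    abba, baba = count_abba_baba(p1, p2, p3, outgroup)
    if abba + baba == 0:
        return math.nan
    return (abba - baba) / (abba + baba)


# Toy alignment: two ABBA sites, one BABA site, two uninformative sites.
d = d_statistic("AAGAA", "GGAAA", "GGGAG", "AAAAA")
print(round(d, 4))  # 0.3333
```

A value near zero is what incomplete lineage sorting alone would be expected to produce under this setup; assessing significance (for example by block resampling) is outside the scope of this sketch.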
Affiliation(s)
- Mark S Hibbins
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
| | - Matthew W Hahn
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
| |
|
15
|
Abstract
Phylogenetic networks represent evolutionary history of species and can record natural reticulate evolutionary processes such as horizontal gene transfer and gene recombination. This makes phylogenetic networks a more comprehensive representation of evolutionary history compared to phylogenetic trees. Stochastic processes for generating random trees or networks are important tools in evolutionary analysis, especially in phylogeny reconstruction where they can be utilized for validation or serve as priors for Bayesian methods. However, as more network generators are developed, there is a lack of discussion or comparison for different generators. To bridge this gap, we compare a set of phylogenetic network generators by profiling topological summary statistics of the generated networks over the number of reticulations and comparing the topological profiles.
Affiliation(s)
- Remie Janssen
- Delft University of Technology, Delft Institute of Applied Mathematics, Mekelweg 4, 2628 CD, Delft, The Netherlands
| | - Pengyu Liu
- Simon Fraser University, Department of Mathematics, 8888 University Drive, Burnaby, BC V5A 1S6, Canada
| |
|
16
|
Mirarab S, Nakhleh L, Warnow T. Multispecies Coalescent: Theory and Applications in Phylogenetics. ANNUAL REVIEW OF ECOLOGY, EVOLUTION, AND SYSTEMATICS 2021. [DOI: 10.1146/annurev-ecolsys-012121-095340] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Species tree estimation is a basic part of many biological research projects, ranging from answering basic evolutionary questions (e.g., how did a group of species adapt to their environments?) to addressing questions in functional biology. Yet, species tree estimation is very challenging, due to processes such as incomplete lineage sorting, gene duplication and loss, horizontal gene transfer, and hybridization, which can make gene trees differ from each other and from the overall evolutionary history of the species. Over the last 10–20 years, there has been tremendous growth in methods and mathematical theory for estimating species trees and phylogenetic networks, and some of these methods are now in wide use. In this survey, we provide an overview of the current state of the art, identify the limitations of existing methods and theory, and propose additional research problems and directions.
Affiliation(s)
- Siavash Mirarab
- Electrical and Computer Engineering Department, University of California, San Diego, La Jolla, California 92093, USA
| | - Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
| | - Tandy Warnow
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
| |
|
17
|
Yan Z, Cao Z, Liu Y, Ogilvie HA, Nakhleh L. Maximum Parsimony Inference of Phylogenetic Networks in the Presence of Polyploid Complexes. Syst Biol 2021; 71:706-720. [PMID: 34605924 PMCID: PMC9017653 DOI: 10.1093/sysbio/syab081] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2020] [Revised: 09/26/2021] [Accepted: 09/29/2021] [Indexed: 12/18/2022] Open
Abstract
Phylogenetic networks provide a powerful framework for modeling and analyzing reticulate evolutionary histories. While polyploidy has been shown to be prevalent not only in plants but also in other groups of eukaryotic species, most work done thus far on phylogenetic network inference assumes diploid hybridization. These inference methods have been applied, with varying degrees of success, to data sets with polyploid species, even though polyploidy violates the mathematical assumptions underlying these methods. Statistical methods were developed recently for handling specific types of polyploids, as were parsimony methods that handle polyploidy more generally but exclude processes such as incomplete lineage sorting. In this article, we introduce a new method for inferring most parsimonious phylogenetic networks on data that include polyploid species. Taking gene tree topologies as input, the method seeks a phylogenetic network that minimizes deep coalescences while accounting for polyploidy. We demonstrate the performance of the method on both simulated and biological data. The inference method as well as a method for evaluating evolutionary hypotheses in the form of phylogenetic networks are implemented and publicly available in the PhyloNet software package. [Incomplete lineage sorting; minimizing deep coalescences; multilabeled trees; multispecies network coalescent; phylogenetic networks; polyploidy.]
Affiliation(s)
- Zhi Yan
- Department of Computer Science, Rice University, Houston, 6100 Main Street, Houston, TX 77005, USA
| | - Zhen Cao
- Department of Computer Science, Rice University, Houston, 6100 Main Street, Houston, TX 77005, USA
| | - Yushu Liu
- Department of Computer Science, Rice University, Houston, 6100 Main Street, Houston, TX 77005, USA
| | - Huw A Ogilvie
- Department of Computer Science, Rice University, Houston, 6100 Main Street, Houston, TX 77005, USA
| | - Luay Nakhleh
- Department of Computer Science, Rice University, Houston, 6100 Main Street, Houston, TX 77005, USA
- Department of Biosciences, Rice University, Houston, 6100 Main Street, Houston, TX 77005, USA
| |
|
18
|
Rabier CE, Berry V, Stoltz M, Santos JD, Wang W, Glaszmann JC, Pardi F, Scornavacca C. On the inference of complex phylogenetic networks by Markov Chain Monte-Carlo. PLoS Comput Biol 2021; 17:e1008380. [PMID: 34478440 PMCID: PMC8445492 DOI: 10.1371/journal.pcbi.1008380] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2020] [Revised: 09/16/2021] [Accepted: 07/13/2021] [Indexed: 11/19/2022] Open
Abstract
For various species, high quality sequences and complete genomes are nowadays available for many individuals. This makes data analysis challenging, as methods need not only to be accurate, but also time efficient given the tremendous amount of data to process. In this article, we introduce an efficient method to infer the evolutionary history of individuals under the multispecies coalescent model in networks (MSNC). Phylogenetic networks are an extension of phylogenetic trees that can contain reticulate nodes, which make it possible to model complex biological events such as horizontal gene transfer, hybridization and introgression. We present a novel way to compute the likelihood of biallelic markers sampled along genomes whose evolution involved such events. This likelihood computation is at the heart of a Bayesian network inference method called SnappNet, as it extends the Snapp method inferring evolutionary trees under the multispecies coalescent model, to networks. SnappNet is available as a package of the well-known BEAST 2 software. Recently, the MCMC_BiMarkers method, implemented in PhyloNet, also extended Snapp to networks. Both methods take biallelic markers as input, rely on the same model of evolution and sample networks in a Bayesian framework, though using different methods for computing priors. However, SnappNet relies on algorithms that are exponentially more time-efficient on non-trivial networks. Using simulations, we compare the performance of SnappNet and MCMC_BiMarkers. We show that both methods enjoy similar abilities to recover simple networks, but SnappNet is more accurate than MCMC_BiMarkers on more complex network scenarios. Also, on complex networks, SnappNet is found to be much faster than MCMC_BiMarkers in terms of time required for the likelihood computation. We finally illustrate the performance of SnappNet on a rice data set. SnappNet infers a scenario that is consistent with previous results and provides additional understanding of rice evolution.
Affiliation(s)
- Charles-Elie Rabier
- Institut des Sciences de l’Evolution (ISEM), Université de Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier, CNRS, Montpellier, France
- Institut Montpelliérain Alexander Grothendieck (IMAG), Université de Montpellier, CNRS, Montpellier, France
| | - Vincent Berry
- Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier, CNRS, Montpellier, France
| | - Marnus Stoltz
- Institut des Sciences de l’Evolution (ISEM), Université de Montpellier, CNRS, EPHE, IRD, Montpellier, France
| | - João D. Santos
- CIRAD, UMR AGAP, Montpellier, France
- Amélioration Génétique et Adaptation des Plantes méditerranéennes et tropicales (AGAP), Université de Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
| | - Wensheng Wang
- Institute of Crop Sciences (ICS), Chinese Academy of Agricultural Sciences, Beijing, China
| | - Jean-Christophe Glaszmann
- CIRAD, UMR AGAP, Montpellier, France
- Amélioration Génétique et Adaptation des Plantes méditerranéennes et tropicales (AGAP), Université de Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
| | - Fabio Pardi
- Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université de Montpellier, CNRS, Montpellier, France
| | - Celine Scornavacca
- Institut des Sciences de l’Evolution (ISEM), Université de Montpellier, CNRS, EPHE, IRD, Montpellier, France
| |
|
19
|
Wang Y, Cao Z, Ogilvie HA, Nakhleh L. Phylogenomic assessment of the role of hybridization and introgression in trait evolution. PLoS Genet 2021; 17:e1009701. [PMID: 34407067 PMCID: PMC8405015 DOI: 10.1371/journal.pgen.1009701] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2021] [Revised: 08/30/2021] [Accepted: 07/07/2021] [Indexed: 11/30/2022] Open
Abstract
Trait evolution among a set of species, a central theme in evolutionary biology, has long been understood and analyzed with respect to a species tree. However, the field of phylogenomics, which has been propelled by advances in sequencing technologies, has ushered in the era of species/gene tree incongruence and, consequently, a more nuanced understanding of trait evolution. For a trait whose states are incongruent with the branching patterns in the species tree, the same state could have arisen independently in different species (homoplasy) or followed the branching patterns of gene trees, incongruent with the species tree (hemiplasy). Another evolutionary process whose extent and significance are better revealed by phylogenomic studies is gene flow between different species. In this work, we present a phylogenomic method for assessing the role of hybridization and introgression in the evolution of polymorphic or monomorphic binary traits. We apply the method to simulated evolutionary scenarios to demonstrate the interplay between the parameters of the evolutionary history and the role of introgression in a binary trait's evolution (which we call xenoplasy). Very importantly, we demonstrate, including on a biological data set, that inferring a species tree and using it for trait evolution analysis in the presence of gene flow could lead to misleading hypotheses about trait evolution.
Affiliation(s)
- Yaxuan Wang
- Department of Computer Science, Rice University, Houston, Texas, United States of America
| | - Zhen Cao
- Department of Computer Science, Rice University, Houston, Texas, United States of America
| | - Huw A. Ogilvie
- Department of Computer Science, Rice University, Houston, Texas, United States of America
| | - Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas, United States of America
- Department of BioSciences, Rice University, Houston, Texas, United States of America
| |
|
20
|
Esquerré D, Keogh JS, Demangel D, Morando M, Avila LJ, Sites JW, Ferri-Yáñez F, Leaché AD. Rapid radiation and rampant reticulation: Phylogenomics of South American Liolaemus lizards. Syst Biol 2021; 71:286-300. [PMID: 34259868 DOI: 10.1093/sysbio/syab058] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2019] [Revised: 06/25/2021] [Accepted: 06/30/2021] [Indexed: 01/09/2023] Open
Abstract
Understanding the factors that cause heterogeneity among gene trees can increase the accuracy of species trees. Discordant signals across the genome are commonly produced by incomplete lineage sorting (ILS) and introgression, which in turn can result in reticulate evolution. Species tree inference using the multispecies coalescent is designed to deal with ILS and is robust to low levels of introgression, but extensive introgression violates the fundamental assumption that relationships are strictly bifurcating. In this study, we explore the phylogenomics of the iconic Liolaemus subgenus of South American lizards, a group of over 100 species mostly distributed in and around the Andes mountains. Using mitochondrial DNA (mtDNA) and genome-wide restriction-site associated DNA sequencing (RADseq; nDNA hereafter), we inferred a time-calibrated mtDNA gene tree, nDNA species trees, and phylogenetic networks. We found high levels of discordance between mtDNA and nDNA, which we attribute in part to extensive ILS resulting from rapid diversification. These data also reveal extensive and deep introgression, which, combined with rapid diversification, explains the high level of phylogenetic discordance. We discuss these findings in the context of Andean orogeny and glacial cycles that fragmented, expanded, and contracted species distributions. Finally, we use the new phylogeny to resolve long-standing taxonomic issues in one of the most studied lizard groups in the New World.
Affiliation(s)
- Damien Esquerré
- Division of Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT, Australia
| | - J Scott Keogh
- Division of Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT, Australia
| | | | - Mariana Morando
- Instituto Patagónico para el Estudio de los Ecosistemas Continentales (IPEEC- CONICET), Puerto Madryn, Chubut, Argentina
| | - Luciano J Avila
- Instituto Patagónico para el Estudio de los Ecosistemas Continentales (IPEEC- CONICET), Puerto Madryn, Chubut, Argentina
| | - Jack W Sites
- Department of Biology and M.L. Bean Life Science Museum, Brigham Young University, Provo, Utah, USA
| | - Francisco Ferri-Yáñez
- Departamento de Biogeografía y Cambio Global, Museo Nacional de Ciencias Naturales, CSIC & Laboratorio Internacional en Cambio Global CSIC-PUC (LINCGlobal), Calle José Gutiérrez Abascal, 2, 28006, Madrid, Spain
| | - Adam D Leaché
- Department of Biology & Burke Museum of Natural History and Culture, University of Washington, Seattle, Washington, USA
| |
|
21
|
Genomic phylogeography of the White-crowned Manakin Pseudopipra pipra (Aves: Pipridae) illuminates a continental-scale radiation out of the Andes. Mol Phylogenet Evol 2021; 164:107205. [PMID: 34015448 DOI: 10.1016/j.ympev.2021.107205] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2020] [Revised: 04/30/2021] [Accepted: 05/06/2021] [Indexed: 11/24/2022]
Abstract
The complex landscape history of the Neotropics has generated opportunities for population isolation and diversification that place this region among the most species-rich in the world. Detailed phylogeographic studies are required to uncover the biogeographic histories of Neotropical taxa, to identify evolutionary correlates of diversity, and to reveal patterns of genetic connectivity, disjunction, and potential differentiation among lineages from different areas of endemism. The White-crowned Manakin (Pseudopipra pipra) is a small suboscine passerine bird that is broadly distributed through the subtropical rainforests of Central America, the lower montane cloud forests of the Andes from Colombia to central Peru, the lowlands of Amazonia and the Guianas, and the Atlantic forest of southeast Brazil. Pseudopipra is currently recognized as a single, polytypic biological species. We studied the effect of the Neotropical landscape on genetic and phenotypic differentiation within this species using genomic data derived from double digest restriction site associated DNA sequencing (ddRAD), and mitochondrial DNA. Most of the genetic breakpoints we identify among populations coincide with physical barriers to gene flow previously associated with avian areas of endemism. The phylogenetic relationships among these populations imply a novel pattern of Andean origination for this group, with subsequent diversification into the Amazonian lowlands. Our analysis of genomic admixture and gene flow reveals a complex history of introgression between some western Amazonian populations. These reticulate processes confound our application of standard concatenated and coalescent phylogenetic methods and raise the question of whether a lineage in the western Napo area of endemism should be considered a hybrid species. 
Lastly, analysis of variation in vocal and plumage phenotypes in the context of our phylogeny supports the hypothesis that Pseudopipra is a species-complex composed of at least 8, and perhaps up to 17 distinct species which have arisen in the last ∼2.5 Ma.
|
22
|
Wang Y, Ogilvie HA, Nakhleh L. Practical Speedup of Bayesian Inference of Species Phylogenies by Restricting the Space of Gene Trees. Mol Biol Evol 2021; 37:1809-1818. [PMID: 32077947 PMCID: PMC7253205 DOI: 10.1093/molbev/msaa045] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022] Open
Abstract
Species tree inference from multilocus data has emerged as a powerful paradigm in the postgenomic era, both in terms of the accuracy of the species tree it produces as well as in terms of elucidating the processes that shaped the evolutionary history. Bayesian methods for species tree inference are desirable in this area as they have been shown not only to yield accurate estimates, but also to naturally provide measures of confidence in those estimates. However, the heavy computational requirements of Bayesian inference have limited the applicability of such methods to very small data sets. In this article, we show that the computational efficiency of Bayesian inference under the multispecies coalescent can be improved in practice by restricting the space of the gene trees explored during the random walk, without sacrificing accuracy as measured by various metrics. The idea is to first infer constraints on the trees of the individual loci in the form of unresolved gene trees, and then to restrict the sampler to consider only resolutions of the constrained trees. We demonstrate the improvements gained by such an approach on both simulated and biological data.
Affiliation(s)
- Yaxuan Wang
- Computer Science Department, Rice University, Houston, TX
| | - Huw A Ogilvie
- Computer Science Department, Rice University, Houston, TX
| | - Luay Nakhleh
- Computer Science Department, Rice University, Houston, TX
| |
|
23
|
Zhu J, Liu X, Ogilvie HA, Nakhleh LK. A divide-and-conquer method for scalable phylogenetic network inference from multilocus data. Bioinformatics 2020; 35:i370-i378. [PMID: 31510688 PMCID: PMC6612858 DOI: 10.1093/bioinformatics/btz359] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Motivation: Reticulate evolutionary histories, such as those arising in the presence of hybridization, are best modeled as phylogenetic networks. Recently developed methods allow for statistical inference of phylogenetic networks while also accounting for other processes, such as incomplete lineage sorting. However, these methods can only handle a small number of loci from a handful of genomes. Results: In this article, we introduce a novel two-step method for scalable inference of phylogenetic networks from the sequence alignments of multiple, unlinked loci. The method infers networks on subproblems and then merges them into a network on the full set of taxa. To reduce the number of trinets to infer, we formulate a Hitting Set version of the problem of finding a small number of subsets, and implement a simple heuristic to solve it. We studied their performance, in terms of both running time and accuracy, on simulated as well as on biological datasets. The two-step method accurately infers phylogenetic networks at a scale that is infeasible with existing methods. The results are a significant and promising step towards accurate, large-scale phylogenetic network inference. Availability and implementation: We implemented the algorithms in the publicly available software package PhyloNet (https://bioinfocs.rice.edu/PhyloNet). Supplementary information: Supplementary data are available at Bioinformatics online.
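The abstract mentions a Hitting Set formulation solved with a simple heuristic but does not spell it out. As a generic illustration only, and not necessarily the heuristic implemented in PhyloNet, the sketch below shows the standard greedy approach to hitting set: repeatedly pick the element that occurs in the most sets not yet hit. The function name `greedy_hitting_set` and the toy input are assumptions made for this example.

```python
from collections import Counter
from typing import Hashable, Iterable


def greedy_hitting_set(sets: Iterable[Iterable[Hashable]]) -> list:
    """Return elements such that every non-empty input set contains at least one.

    Greedy rule: choose the element present in the largest number of sets that
    are still unhit; ties are broken by sorted order so the result is
    deterministic (elements must therefore be mutually comparable).
    """
    remaining = [set(s) for s in sets]
    remaining = [s for s in remaining if s]  # empty sets can never be hit
    chosen = []
    while remaining:
        counts = Counter(e for s in remaining for e in s)
        best = max(sorted(counts), key=lambda e: counts[e])
        chosen.append(best)
        remaining = [s for s in remaining if best not in s]
    return chosen


# Toy example: each inner set lists candidates that would cover one requirement.
requirements = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
picked = greedy_hitting_set(requirements)
print(picked)  # ['b', 'c']
```

The greedy rule gives no optimality guarantee in general, but it is fast and usually yields a small hitting set, which is the kind of trade-off a "simple heuristic" typically makes.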
Affiliation(s)
- Jiafan Zhu
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Xinhao Liu
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Huw A Ogilvie
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Luay K Nakhleh
- Department of Computer Science, Rice University, Houston, TX, USA
- Department of BioSciences, Rice University, Houston, TX, USA
| |
|
24
|
Crowl AA, Manos PS, McVay JD, Lemmon AR, Lemmon EM, Hipp AL. Uncovering the genomic signature of ancient introgression between white oak lineages (Quercus). THE NEW PHYTOLOGIST 2020; 226:1158-1170. [PMID: 30963585 DOI: 10.1111/nph.15842] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/31/2019] [Accepted: 03/29/2019] [Indexed: 05/10/2023]
Abstract
Botanists have long recognised interspecific gene flow as a common occurrence within white oaks (Quercus section Quercus). Historical allele exchange, however, has not been fully characterised and the complex genomic signals resulting from the combination of vertical and horizontal gene transmission may confound phylogenetic inference and obscure our ability to accurately infer the deep evolutionary history of oaks. Using anchored enrichment, we obtained a phylogenomic dataset consisting of hundreds of single-copy nuclear loci. Concatenation, species-tree and network analyses were carried out in an attempt to uncover the genomic signal of ancient introgression and infer the divergent phylogenetic topology for the white oak clade. Locus and site-level likelihood comparisons were then conducted to further explore the introgressed signal within our dataset. Historical, intersectional gene flow is suggested to have occurred between an ancestor of the Eurasian Roburoid lineage and Quercus pontica and North American Dumosae and Prinoideae lineages. Despite extensive time past, our approach proved successful in detecting the genomic signature of ancient introgression. Our results, however, highlight the importance of sampling and the use of a plurality of analytical tools and methods to sufficiently explore genomic datasets, uncover this signal, and accurately infer evolutionary history.
Affiliation(s)
- Andrew A Crowl
- Department of Biology, Duke University, Durham, NC, 27708, USA
| | - Paul S Manos
- Department of Biology, Duke University, Durham, NC, 27708, USA
| | - John D McVay
- Department of Biology, Duke University, Durham, NC, 27708, USA
| | - Alan R Lemmon
- Department of Scientific Computing, Florida State University, Dirac Science Library, Tallahassee, FL, 32317, USA
| | - Emily Moriarty Lemmon
- Department of Biological Science, Florida State University, 89 Chieftan Way, Tallahassee, FL, 32317, USA
| | - Andrew L Hipp
- The Morton Arboretum, 4100 Illinois Route 53, Lisle, IL, 60532, USA
- The Field Museum, 1400 S Lake Shore Drive, Chicago, IL, 60605, USA
| |
|
25
|
Jiang CK, Ma JQ, Liu YF, Chen JD, Ni DJ, Chen L. Identification and distribution of a single nucleotide polymorphism responsible for the catechin content in tea plants. HORTICULTURE RESEARCH 2020; 7:24. [PMID: 32140233 PMCID: PMC7049304 DOI: 10.1038/s41438-020-0247-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/10/2019] [Revised: 12/23/2019] [Accepted: 01/04/2020] [Indexed: 05/15/2023]
Abstract
Catechins are the predominant products in tea plants and have essential functions for both plants and humans. Several genes encoding the enzymes regulating catechin biosynthesis have been identified, and the identification of single nucleotide polymorphisms (SNPs) resulting in nonsynonymous mutations within these genes can be used to establish a functional link to catechin content. Therefore, the transcriptomes of two parents and four filial offspring were sequenced using next-generation sequencing technology and aligned to the reference genome to enable SNP mining. Subsequently, 176 tea plant accessions were genotyped based on candidate SNPs using kompetitive allele-specific polymerase chain reaction (KASP). The catechin contents of these samples were characterized by high-performance liquid chromatography (HPLC), and analysis of variance (ANOVA) was subsequently performed to determine the relationship between genotypes and catechin content. As a result of these efforts, a SNP within the chalcone synthase (CHS) gene was shown to be functionally associated with catechin content. Furthermore, the geographical and interspecific distribution of this SNP was investigated. Collectively, these results will contribute to the early evaluation of tea plants and serve as a rapid tool for accelerating targeted efforts in tea breeding.
Affiliation(s)
- Chen-Kai Jiang
- Key Laboratory of Tea Biology and Resources Utilization, Ministry of Agriculture and Rural Affairs, Tea Research Institute of the Chinese Academy of Agricultural Sciences, 9 South Meiling Road, Hangzhou, Zhejiang 310008 China
- College of Horticulture and Forestry Science, Huazhong Agricultural University, 1 Shizishan Street, Hongshan District, Wuhan, Hubei 430070 China
- Jian-Qiang Ma
- Key Laboratory of Tea Biology and Resources Utilization, Ministry of Agriculture and Rural Affairs, Tea Research Institute of the Chinese Academy of Agricultural Sciences, 9 South Meiling Road, Hangzhou, Zhejiang 310008 China
- Yu-Fei Liu
- Key Laboratory of Tea Biology and Resources Utilization, Ministry of Agriculture and Rural Affairs, Tea Research Institute of the Chinese Academy of Agricultural Sciences, 9 South Meiling Road, Hangzhou, Zhejiang 310008 China
- Tea Research Institute, Yunnan Academy of Agricultural Sciences, Menghai, Yunnan 666201 China
- Jie-Dan Chen
- Key Laboratory of Tea Biology and Resources Utilization, Ministry of Agriculture and Rural Affairs, Tea Research Institute of the Chinese Academy of Agricultural Sciences, 9 South Meiling Road, Hangzhou, Zhejiang 310008 China
- De-Jiang Ni
- College of Horticulture and Forestry Science, Huazhong Agricultural University, 1 Shizishan Street, Hongshan District, Wuhan, Hubei 430070 China
- Liang Chen
- Key Laboratory of Tea Biology and Resources Utilization, Ministry of Agriculture and Rural Affairs, Tea Research Institute of the Chinese Academy of Agricultural Sciences, 9 South Meiling Road, Hangzhou, Zhejiang 310008 China
26
Olave M, Meyer A. Implementing Large Genomic Single Nucleotide Polymorphism Data Sets in Phylogenetic Network Reconstructions: A Case Study of Particularly Rapid Radiations of Cichlid Fish. Syst Biol 2020; 69:848-862. [DOI: 10.1093/sysbio/syaa005] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 07/15/2019] [Revised: 01/09/2020] [Accepted: 01/23/2020] [Indexed: 12/23/2022] Open
Abstract
The Midas cichlids of the Amphilophus citrinellus spp. species complex from Nicaragua (13 species) are an extraordinary example of adaptive and rapid radiation (<24,000 years old). The evolutionary history of this group is very challenging to infer in phylogenetic analyses, due to the apparent prevalence of incomplete lineage sorting (ILS), as well as past and current gene flow. Assuming solely a vertical transfer of genetic material from an ancestral lineage to new lineages is not appropriate in many cases of genes transferred horizontally in nature. Recently developed methods to infer phylogenetic networks under such circumstances might be able to circumvent these problems. These models accommodate not just ILS, but also gene flow, under the multispecies network coalescent (MSNC) model, processes that are at work in young, hybridizing, and/or rapidly diversifying lineages. There are currently only a few programs available that implement MSNC for estimating phylogenetic networks. Here, we present a novel way to incorporate single nucleotide polymorphism (SNP) data into the currently available PhyloNetworks program. Based on simulations, we demonstrate that SNPs can provide enough power to recover the true phylogenetic network. We also show that it can accurately infer the true network more often than other similar SNP-based programs (PhyloNet and HyDe). Moreover, our approach results in a faster algorithm compared to the original pipeline in PhyloNetworks, without losing power. We also applied our new approach to infer the phylogenetic network of the Midas cichlid radiation. We implemented the most comprehensive genomic data set to date (RADseq data set of 679 individuals and >37K SNPs from 19 ingroup lineages) and present estimated phylogenetic networks for this extremely young and fast-evolving radiation of cichlid fish. We demonstrate that the MSNC is more appropriate than the multispecies coalescent alone for the analysis of this rapid radiation. [Genomics; multispecies network coalescent; phylogenetic networks; phylogenomics; RADseq; SNPs.]
Affiliation(s)
- Melisa Olave
- Department of Biology, University of Konstanz, 78457 Konstanz, Germany
- Axel Meyer
- Department of Biology, University of Konstanz, 78457 Konstanz, Germany
27
Granados-Aguilar X, Granados Mendoza C, Cervantes CR, Montes JR, Arias S. Unraveling Reticulate Evolution in Opuntia (Cactaceae) From Southern Mexico. Frontiers in Plant Science 2020; 11:606809. [PMID: 33519858 PMCID: PMC7838128 DOI: 10.3389/fpls.2020.606809] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/15/2020] [Accepted: 12/10/2020] [Indexed: 05/20/2023]
Abstract
The process of hybridization occurs in approximately 40% of vascular plants, and this exchange of genetic material between non-conspecific individuals occurs unequally among plant lineages, being more frequent in certain groups such as Opuntia (Cactaceae). This genus is known for multiple taxonomic controversies due to widespread polyploidy and the probable hybrid origin of several of its species. Southern Mexican species of this genus have been poorly studied despite their great diversity in regions such as the Tehuacán-Cuicatlán Valley, which contains around 12% of Mexico's recognized native Opuntia species. In this work, we focus on testing the hybrid status of two putative hybrids from this region, Opuntia tehuacana and Opuntia pilifera, and estimate whether hybridization occurs among sampled southern opuntias, using two newly identified nuclear intron markers to construct phylogenetic networks and perform invariant analysis under the coalescent model with HyDe and Dsuite. For the test of hybrid origin in O. tehuacana, our results could not recover hybridization as proposed in the literature, but we found introgression into O. tehuacana individuals involving O. decumbens and O. huajuapensis. Regarding O. pilifera, we identified O. decumbens as a probable parental species, supported by our analysis, which sustains the previous hybridization hypothesis between the Nopalea and Basilares clades. Finally, we suggest new hybridization and introgression cases among southern Mexican species involving O. tehuantepecana and O. depressa as parental species of O. velutina and O. decumbens.
Affiliation(s)
- Xochitl Granados-Aguilar
- Posgrado en Ciencias Biológicas, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Jardín Botánico, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- *Correspondence: Xochitl Granados-Aguilar
- Carolina Granados Mendoza
- Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Cristian Rafael Cervantes
- Posgrado en Ciencias Biológicas, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Jardín Botánico, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- José Rubén Montes
- Posgrado en Ciencias Biológicas, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Salvador Arias
- Jardín Botánico, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Salvador Arias,
28
Blanco-Pastor JL, Bertrand YJK, Liberal IM, Wei Y, Brummer EC, Pfeil BE. Evolutionary networks from RADseq loci point to hybrid origins of Medicago carstiensis and Medicago cretacea. American Journal of Botany 2019; 106:1219-1228. [PMID: 31535720 DOI: 10.1002/ajb2.1352] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/21/2019] [Accepted: 07/12/2019] [Indexed: 06/10/2023]
Abstract
PREMISE: Although hybridization has played an important role in the evolution of many plant species, phylogenetic reconstructions that include hybridizing lineages have been historically constrained by the available models and data. Restriction-site-associated DNA sequencing (RADseq) has been a popular sequencing technique for the reconstruction of hybridization in the next-generation sequencing era. However, the utility of RADseq for the reconstruction of complex evolutionary networks has not been thoroughly investigated. Conflicting phylogenetic relationships in the genus Medicago have been mainly attributed to hybridization, but the specific hybrid origins of taxa have not yet been clarified. METHODS: We obtained new molecular data from diploid species of Medicago section Medicago using single-digest RADseq to reconstruct evolutionary networks from gene trees, an approach that is computationally tractable with data sets that include several species and complex hybridization patterns. RESULTS: Our analyses revealed that assembly filters to exclusively select a small set of loci with high phylogenetic information led to the most-divergent network topologies. Conversely, alternative clustering thresholds or filters on the number of samples per locus had a lower impact on networks. A strong hybridization signal was detected for M. carstiensis and M. cretacea, while signals were less clear for M. rugosa, M. rhodopea, M. suffruticosa, M. marina, M. scutellata, and M. sativa. CONCLUSIONS: Complex network reconstructions from RADseq gene trees were not robust under variations of the assembly parameters and filters. But when the most-divergent networks were discarded, all remaining analyses consistently supported a hybrid origin for M. carstiensis and M. cretacea.
Affiliation(s)
- José Luis Blanco-Pastor
- Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 40530, Göteborg, Sweden
- INRA, Centre Nouvelle-Aquitaine-Poitiers, UR4 (URP3F), 86600, Lusignan, France
- Yann J K Bertrand
- Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 40530, Göteborg, Sweden
- Institute of Botany, Czech Academy of Sciences, Zámek 1, 25243, Průhonice, Czech Republic
- Yanling Wei
- Plant Breeding Center, Department of Plant Sciences, University of California, Davis, Davis, CA, USA
- E Charles Brummer
- Plant Breeding Center, Department of Plant Sciences, University of California, Davis, Davis, CA, USA
- Bernard E Pfeil
- Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 40530, Göteborg, Sweden
29
Zhu J, Nakhleh L. Inference of species phylogenies from bi-allelic markers using pseudo-likelihood. Bioinformatics 2019; 34:i376-i385. [PMID: 29950004 PMCID: PMC6022577 DOI: 10.1093/bioinformatics/bty295] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 11/24/2022] Open
Abstract
Motivation: Phylogenetic networks represent reticulate evolutionary histories. Statistical methods for their inference under the multispecies coalescent have recently been developed. A particularly powerful approach uses data that consist of bi-allelic markers (e.g. single nucleotide polymorphism data) and allows for exact likelihood computations of phylogenetic networks while numerically integrating over all possible gene trees per marker. While the approach has good accuracy in terms of estimating the network and its parameters, likelihood computations remain a major computational bottleneck and limit the method's applicability. Results: In this article, we first demonstrate why likelihood computations of networks take orders of magnitude more time when compared to trees. We then propose an approach for inference of phylogenetic networks based on pseudo-likelihood using bi-allelic markers. We demonstrate the scalability and accuracy of phylogenetic network inference via pseudo-likelihood computations on simulated data. Furthermore, we demonstrate aspects of robustness of the method to violations in the underlying assumptions of the employed statistical model. Finally, we demonstrate the application of the method to biological data. The proposed method allows for analyzing larger datasets in terms of the numbers of taxa and reticulation events. While pseudo-likelihood had been proposed before for data consisting of gene trees, the work here uses sequence data directly, offering several advantages as we discuss. Availability and implementation: The methods have been implemented in PhyloNet (http://bioinfocs.rice.edu/phylonet).
Affiliation(s)
- Jiafan Zhu
- Department of Computer Science, Rice University, Houston, TX, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, TX, USA.,Department of BioSciences, Rice University, Houston, TX, USA
30
Zhang C, Ogilvie HA, Drummond AJ, Stadler T. Bayesian Inference of Species Networks from Multilocus Sequence Data. Mol Biol Evol 2019; 35:504-517. [PMID: 29220490 PMCID: PMC5850812 DOI: 10.1093/molbev/msx307] [Citation(s) in RCA: 103] [Impact Index Per Article: 17.2] [Indexed: 01/12/2023] Open
Abstract
Reticulate species evolution, such as hybridization or introgression, is relatively common in nature. In the presence of reticulation, species relationships can be captured by a rooted phylogenetic network, and orthologous gene evolution can be modeled as bifurcating gene trees embedded in the species network. We present a Bayesian approach to jointly infer species networks and gene trees from multilocus sequence data. A novel birth-hybridization process is used as the prior for the species network, and we assume a multispecies network coalescent prior for the embedded gene trees. We verify the ability of our method to correctly sample from the posterior distribution, and thus to infer a species network, through simulations. To quantify the power of our method, we reanalyze two large data sets of genes from spruces and yeasts. For the three closely related spruces, we verify the previously suggested homoploid hybridization event in this clade; for the yeast data, we find extensive hybridization events. Our method is available within the BEAST 2 add-on SpeciesNetwork, and thus provides an extensible framework for Bayesian inference of reticulate evolution.
Affiliation(s)
- Chi Zhang
- Department of Biosystems Science and Engineering, Eidgenössische Technische Hochschule Zürich, Basel, Switzerland.,Swiss Institute of Bioinformatics (SIB), Switzerland.,Key Laboratory of Vertebrate Evolution and Human Origins of Chinese Academy of Sciences, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, China
- Huw A Ogilvie
- Division of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australia.,Centre for Computational Evolution, University of Auckland, Auckland, New Zealand
- Alexei J Drummond
- Centre for Computational Evolution, University of Auckland, Auckland, New Zealand.,Department of Computer Science, University of Auckland, Auckland, New Zealand
- Tanja Stadler
- Department of Biosystems Science and Engineering, Eidgenössische Technische Hochschule Zürich, Basel, Switzerland.,Swiss Institute of Bioinformatics (SIB), Switzerland
31
Advances in Computational Methods for Phylogenetic Networks in the Presence of Hybridization. Bioinformatics and Phylogenetics 2019. [DOI: 10.1007/978-3-030-10837-3_13] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Indexed: 12/12/2022]
32
Abstract
PhyloNet was released in 2008 as a software package for representing and analyzing phylogenetic networks. At the time of its release, the main functionalities in PhyloNet consisted of measures for comparing network topologies and a single heuristic for reconciling gene trees with a species tree. Since then, PhyloNet has grown significantly. The software package now includes a wide array of methods for inferring phylogenetic networks from data sets of unlinked loci while accounting for both reticulation (e.g., hybridization) and incomplete lineage sorting. In particular, PhyloNet now allows for maximum parsimony, maximum likelihood, and Bayesian inference of phylogenetic networks from gene tree estimates. Furthermore, Bayesian inference directly from sequence data (sequence alignments or biallelic markers) is implemented. Maximum parsimony is based on an extension of the "minimizing deep coalescences" criterion to phylogenetic networks, whereas maximum likelihood and Bayesian inference are based on the multispecies network coalescent. All methods allow for multiple individuals per species. As computing the likelihood of a phylogenetic network is computationally hard, PhyloNet allows for evaluation and inference of networks using a pseudolikelihood measure. PhyloNet summarizes the results of the various analyses and generates phylogenetic networks in the extended Newick format that is readily viewable by existing visualization software.
Affiliation(s)
- Luay Nakhleh
- Computer Science.,BioSciences, Rice University, 6100 Main Street, Houston, TX 77005, USA
33
Abstract
Artificial intelligence (AI) is a commonly used term in daily life, and there are now two subconcepts that divide the entire range of meanings currently encompassed by the term. The coexistence of the concepts of strong and weak AI can be seen as a result of the recognition of the limits of mathematical and engineering concepts that have dominated the definition. This presentation reviewed the concept, history, and the current application of AI in daily life. Applications of AI are becoming a reality that is commonplace in all areas of modern human life. Efforts to develop robots controlled by AI have been continuously carried out to maximize human convenience. AI has also been applied in the medical decision-making process, and these AI systems can help nonspecialists to obtain expert-level information. Artificial neural networks are highly interconnected networks of computer processors inspired by biological nervous systems. These systems may help connect dental professionals all over the world. This presentation reviewed the history of artificial neural networks in the medical and dental fields, as well as current application in dentistry. As the use of AI in the entire medical field increases, the role of AI in dentistry will be greatly expanded. Currently, the use of AI is rapidly advancing beyond text-based, image-based dental practice. In addition to diagnosis of visually confirmed dental caries and impacted teeth, studies applying machine learning based on artificial neural networks to dental treatment through analysis of dental magnetic resonance imaging, computed tomography, and cephalometric radiography are actively underway, and some visible results are emerging at a rapid pace for commercialization.
Affiliation(s)
- Wook Joo Park
- Department of Philosophy of Religion, College of Theology, The United Graduate School of Theology in Yonsei University, Seoul, Republic of Korea
- Jun-Beom Park
- Department of Periodontics, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
34
Morales-Briones DF, Liston A, Tank DC. Phylogenomic analyses reveal a deep history of hybridization and polyploidy in the Neotropical genus Lachemilla (Rosaceae). The New Phytologist 2018; 218:1668-1684. [PMID: 29604235 DOI: 10.1111/nph.15099] [Citation(s) in RCA: 93] [Impact Index Per Article: 13.3] [Received: 11/30/2017] [Accepted: 02/09/2018] [Indexed: 05/10/2023]
Abstract
Hybridization, incomplete lineage sorting, and phylogenetic error produce similar incongruence patterns, representing a great challenge for phylogenetic reconstruction. Here, we use sequence capture data and multiple species tree and species network approaches to resolve the backbone phylogeny of the Neotropical genus Lachemilla, while distinguishing among sources of incongruence. We used 396 nuclear loci and nearly complete plastome sequences from 27 species to clarify the relationships among the major groups of Lachemilla, and explored multiple sources of conflict between gene trees and species trees inferred with a plurality of approaches. All phylogenetic methods recovered the four major groups previously proposed for Lachemilla, but species tree methods recovered different topologies for relationships between these four clades. Species network analyses revealed that one major clade, Orbiculate, is likely of ancient hybrid origin, representing one of the main sources of incongruence among the species trees. Additionally, we found evidence for a potential whole genome duplication event shared by Lachemilla and allied genera. Lachemilla shows clear evidence of ancient and recent hybridization throughout the evolutionary history of the group. Also, we show the necessity to use phylogenetic network approaches that can simultaneously accommodate incomplete lineage sorting and gene flow when studying groups that show patterns of reticulation.
Affiliation(s)
- Diego F Morales-Briones
- Department of Biological Sciences, University of Idaho, 875 Perimeter Drive MS 3051, Moscow, ID, 83844-3051, USA
- Institute for Bioinformatics and Evolutionary Studies, University of Idaho, 875 Perimeter Drive MS 3051, Moscow, ID, 83844-3051, USA
- Stillinger Herbarium, University of Idaho, 875 Perimeter Drive MS 3051, Moscow, ID, 83844-3051, USA
- Aaron Liston
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, OR, 97331, USA
- David C Tank
- Department of Biological Sciences, University of Idaho, 875 Perimeter Drive MS 3051, Moscow, ID, 83844-3051, USA
- Institute for Bioinformatics and Evolutionary Studies, University of Idaho, 875 Perimeter Drive MS 3051, Moscow, ID, 83844-3051, USA
- Stillinger Herbarium, University of Idaho, 875 Perimeter Drive MS 3051, Moscow, ID, 83844-3051, USA