Reference Citation Analysis

For: Rasmussen MD, Kellis M. Unified modeling of gene duplication, loss, and coalescence using a locus tree. Genome Res 2012;22:755-65. [PMID: 22271778 DOI: 10.1101/gr.123901.111] [Citation(s) in RCA: 97] [Impact Index Per Article: 7.5] [Indexed: 11/24/2022]



Cited by Other Article(s)

Lambert A. Ages, sizes and (trees within) trees of taxa and of urns, from Yule to today. Philos Trans R Soc Lond B Biol Sci 2025;380:20230305. [PMID: 39976410 PMCID: PMC11867158 DOI: 10.1098/rstb.2023.0305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Revised: 09/19/2024] [Accepted: 09/20/2024] [Indexed: 02/21/2025]

Hakim SA, Ratul MRZ, Bayzid MS. wQFM-DISCO: DISCO-enabled wQFM improves phylogenomic analyses despite the presence of paralogs. Bioinform Adv 2024;4:vbae189. [PMID: 39664861 PMCID: PMC11634537 DOI: 10.1093/bioadv/vbae189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2024] [Revised: 10/18/2024] [Accepted: 11/24/2024] [Indexed: 12/13/2024]

Schrago CG, Mello B. Challenges in Assembling the Dated Tree of Life. Genome Biol Evol 2024;16:evae229. [PMID: 39475308 PMCID: PMC11523137 DOI: 10.1093/gbe/evae229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/15/2024] [Indexed: 11/02/2024]

Bentz PC, Leebens-Mack J. Developing Asparagaceae1726: An Asparagaceae-specific probe set targeting 1726 loci for Hyb-Seq and phylogenomics in the family. Appl Plant Sci 2024;12:e11597. [PMID: 39360194 PMCID: PMC11443443 DOI: 10.1002/aps3.11597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/18/2024] [Accepted: 02/19/2024] [Indexed: 10/04/2024]

Li Q, Chan YB, Galtier N, Scornavacca C. The Effect of Copy Number Hemiplasy on Gene Family Evolution. Syst Biol 2024;73:355-374. [PMID: 38330161 DOI: 10.1093/sysbio/syae007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Revised: 01/24/2024] [Accepted: 02/03/2024] [Indexed: 02/10/2024]

Górecki P, Rutecka N, Mykowiecka A, Paszek J. Unifying duplication episode clustering and gene-species mapping inference. Algorithms Mol Biol 2024;19:7. [PMID: 38355611 PMCID: PMC10865717 DOI: 10.1186/s13015-024-00252-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 01/04/2024] [Indexed: 02/16/2024]

Willson J, Tabatabaee Y, Liu B, Warnow T. DISCO+QR: rooting species trees in the presence of GDL and ILS. Bioinform Adv 2023;3:vbad015. [PMID: 36789293 PMCID: PMC9923442 DOI: 10.1093/bioadv/vbad015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Revised: 01/21/2023] [Accepted: 02/06/2023] [Indexed: 02/10/2023]

Hill M, Legried B, Roch S. Species tree estimation under joint modeling of coalescence and duplication: Sample complexity of quartet methods. Ann Appl Probab 2022. [DOI: 10.1214/22-aap1799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]

Menet H, Daubin V, Tannier E. Phylogenetic reconciliation. PLoS Comput Biol 2022;18:e1010621. [PMID: 36327227 PMCID: PMC9632901 DOI: 10.1371/journal.pcbi.1010621] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/05/2022]

Lozano-Fernandez J. A Practical Guide to Design and Assess a Phylogenomic Study. Genome Biol Evol 2022;14:evac129. [PMID: 35946263 PMCID: PMC9452790 DOI: 10.1093/gbe/evac129] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Accepted: 08/03/2022] [Indexed: 11/13/2022]

Chan YB, Li Q, Scornavacca C. The large-sample asymptotic behaviour of quartet-based summary methods for species tree inference. J Math Biol 2022;85:22. [PMID: 35976512 PMCID: PMC9385842 DOI: 10.1007/s00285-022-01786-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2022] [Revised: 06/08/2022] [Accepted: 07/14/2022] [Indexed: 12/03/2022]

Carson J, Ledda A, Ferretti L, Keeling M, Didelot X. The bounded coalescent model: Conditioning a genealogy on a minimum root date. J Theor Biol 2022;548:111186. [PMID: 35697144 DOI: 10.1016/j.jtbi.2022.111186] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/04/2022] [Revised: 05/05/2022] [Accepted: 06/02/2022] [Indexed: 01/27/2023]

Smith ML, Vanderpool D, Hahn MW. Using all gene families vastly expands data available for phylogenomic inference. Mol Biol Evol 2022;39:msac112. [PMID: 35642314 PMCID: PMC9178227 DOI: 10.1093/molbev/msac112] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]

Wawerka M, Dąbkowski D, Rutecka N, Mykowiecka A, Górecki P. Embedding gene trees into phylogenetic networks by conflict resolution algorithms. Algorithms Mol Biol 2022;17:11. [PMID: 35590416 PMCID: PMC9119282 DOI: 10.1186/s13015-022-00218-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Accepted: 03/22/2022] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Phylogenetic networks are mathematical models of evolutionary processes involving reticulate events such as hybridization, recombination, or horizontal gene transfer. One of the crucial notions in phylogenetic network modelling is that of a displayed tree, which is obtained from a network by removing a set of reticulation edges. Displayed trees may represent the evolutionary history of a gene family when that evolution is shaped by reticulation events.

RESULTS

We address the problem of inferring an optimal tree displayed by a network, given a gene tree G and a tree-child network N, under the deep coalescence and duplication costs. We propose an O(mn)-time dynamic programming algorithm (DP) to compute a lower bound on the optimal displayed tree cost, where m and n are the sizes of G and N, respectively. In addition, our algorithm can verify whether the solution is exact. Moreover, it provides a set of reticulation edges corresponding to the obtained cost. If the cost is exact, the set induces an optimal displayed tree. Otherwise, the set contains pairs of conflicting edges, i.e., edges sharing a reticulation node. Next, we show a conflict resolution algorithm that requires [Formula: see text] invocations of DP in the worst case, where r is the number of reticulations. We propose a similar [Formula: see text]-time algorithm for level-k tree-child networks and a branch-and-bound solution to compute lower and upper bounds of optimal costs. We also extend the algorithms to a broader class of phylogenetic networks. Based on simulated data, the average runtime is [Formula: see text] under the deep coalescence cost and [Formula: see text] under the duplication cost.

CONCLUSIONS

Despite exponential complexity in the worst case, our algorithms perform well on empirical and simulated datasets, owing to the strategy of resolving internal dissimilarities between gene trees and networks. Therefore, the algorithms are efficient alternatives to the enumeration strategies commonly proposed in the literature and enable analyses of complex networks with dozens of reticulations.
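The abstract contrasts the DP with plain enumeration of displayed trees. That enumeration baseline can be sketched in a few lines of Python. This is our illustration, not the authors' algorithm. It assumes a network given as a hypothetical list of (parent, child) edges and keeps exactly one incoming edge at every reticulation, so it yields one candidate edge set per combination (2^r sets for r binary reticulations). It omits the cleanup needed to turn each edge set into a displayed tree proper, such as suppressing nodes left with a single child. It also does not score anything against a gene tree.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

Edge = Tuple[str, str]  # (parent, child); a hypothetical edge-list representation


def reticulation_parents(edges: List[Edge]) -> Dict[str, List[str]]:
    """Map every reticulation node (in-degree > 1) to its parents."""
    parents: Dict[str, List[str]] = {}
    for u, v in edges:
        parents.setdefault(v, []).append(u)
    return {v: ps for v, ps in parents.items() if len(ps) > 1}


def candidate_edge_sets(edges: List[Edge]) -> Iterator[List[Edge]]:
    """Yield each edge set that keeps exactly one incoming edge per reticulation.

    The number of sets is the product of reticulation in-degrees, i.e. 2**r
    for r binary reticulations, which is why plain enumeration scales badly.
    """
    retic = reticulation_parents(edges)
    nodes = sorted(retic)
    for choice in product(*(retic[v] for v in nodes)):
        kept = dict(zip(nodes, choice))  # reticulation -> parent whose edge is kept
        yield [(u, v) for u, v in edges if v not in kept or kept[v] == u]


# Small made-up network with two reticulations, h1 and h2.
network = [
    ("r", "a"), ("r", "b"),
    ("a", "h1"), ("a", "p"),
    ("b", "h1"), ("b", "c"),
    ("p", "h2"), ("p", "s"),
    ("c", "h2"), ("c", "q"),
    ("h1", "x"), ("h2", "y"),
]
for edge_set in candidate_edge_sets(network):
    print(sorted(edge_set))
```

The example prints four edge sets, one per combination of kept parents at h1 and h2. Scoring every such set against a gene tree is the kind of enumeration strategy the abstract presents its algorithms as an efficient alternative to.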


Willson J, Roddur MS, Liu B, Zaharias P, Warnow T. DISCO: Species Tree Inference using Multicopy Gene Family Tree Decomposition. Syst Biol 2022;71:610-629. [PMID: 34450658 PMCID: PMC9016570 DOI: 10.1093/sysbio/syab070] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/24/2021] [Revised: 08/18/2021] [Accepted: 08/23/2021] [Indexed: 11/21/2022]

Yan Z, Smith ML, Du P, Hahn MW, Nakhleh L. Species Tree Inference Methods Intended to Deal with Incomplete Lineage Sorting Are Robust to the Presence of Paralogs. Syst Biol 2022;71:367-381. [PMID: 34245291 PMCID: PMC8978208 DOI: 10.1093/sysbio/syab056] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 03/24/2020] [Revised: 06/23/2021] [Accepted: 06/30/2021] [Indexed: 11/24/2022]

Morel B, Schade P, Lutteropp S, Williams TA, Szöllősi GJ, Stamatakis A. SpeciesRax: A Tool for Maximum Likelihood Species Tree Inference from Gene Family Trees under Duplication, Transfer, and Loss. Mol Biol Evol 2022;39:msab365. [PMID: 35021210 PMCID: PMC8826479 DOI: 10.1093/molbev/msab365] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 11/17/2022]

Mirarab S, Nakhleh L, Warnow T. Multispecies Coalescent: Theory and Applications in Phylogenetics. Annu Rev Ecol Evol Syst 2021. [DOI: 10.1146/annurev-ecolsys-012121-095340] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/09/2022]

Du H, Ong YS, Knittel M, Mawhorter R, Liu N, Gross G, Tojo R, Libeskind-Hadas R, Wu YC. Multiple Optimal Reconciliations Under the Duplication-Loss-Coalescence Model. IEEE/ACM Trans Comput Biol Bioinform 2021;18:2144-2156. [PMID: 31199267 DOI: 10.1109/tcbb.2019.2922337] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/09/2023]

Yan Z, Cao Z, Liu Y, Ogilvie HA, Nakhleh L. Maximum Parsimony Inference of Phylogenetic Networks in the Presence of Polyploid Complexes. Syst Biol 2021;71:706-720. [PMID: 34605924 PMCID: PMC9017653 DOI: 10.1093/sysbio/syab081] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/02/2020] [Revised: 09/26/2021] [Accepted: 09/29/2021] [Indexed: 12/18/2022]

Esquerré D, Keogh JS, Demangel D, Morando M, Avila LJ, Sites JW, Ferri-Yáñez F, Leaché AD. Rapid radiation and rampant reticulation: Phylogenomics of South American Liolaemus lizards. Syst Biol 2021;71:286-300. [PMID: 34259868 DOI: 10.1093/sysbio/syab058] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/23/2019] [Revised: 06/25/2021] [Accepted: 06/30/2021] [Indexed: 01/09/2023]

Paszek J, Markin A, Górecki P, Eulenstein O. Taming the Duplication-Loss-Coalescence Model with Integer Linear Programming. J Comput Biol 2021;28:758-773. [PMID: 34125600 DOI: 10.1089/cmb.2021.0011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]

Qi F, Zhao Y, Zhao N, Wang K, Li Z, Wang Y. Structural variation and evolution of chloroplast tRNAs in green algae. PeerJ 2021;9:e11524. [PMID: 34131524 PMCID: PMC8176911 DOI: 10.7717/peerj.11524] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/27/2021] [Accepted: 05/05/2021] [Indexed: 01/18/2023]

Abstract

As one of the important groups of the core Chlorophyta (green algae), Chlorophyceae plays an important role in the evolution of plants. As a carrier of amino acids, tRNA plays an indispensable role in life activities. However, the structural variation of chloroplast tRNAs and their evolutionary characteristics in Chlorophyta species have not been well studied. In this study, we analyzed the chloroplast genome tRNAs of 14 species in five categories of green algae. We found that the number of chloroplast tRNAs in Chlorophyceae ranges from 28 to 32, and the length of the gene sequences ranges from 71 nt to 91 nt. There are 23-27 anticodon types of tRNAs, and some missing anticodons are compensated for by other anticodon types of the same tRNA. In addition, three tRNAs were found to contain introns in the anticodon loop, but these introns scored poorly in the analysis and are presumed to be nonfunctional. After multiple sequence alignment, the Ψ-loop was the most conserved structural unit in the tRNA secondary structure, mostly containing the conserved sequence U-U-C-x-A-x-U. The number of transitions in tRNAs is higher than the number of transversions. A duplication-loss analysis showed that green algal chloroplast tRNAs may have undergone substantial gene loss during evolution. Based on the constructed phylogenetic tree, mutations were found to accompany the evolution of green algal chloroplast tRNAs. Moreover, chloroplast tRNAs of Chlorophyceae are consistent with those of monocotyledons and gymnosperms in terms of evolutionary patterns, sharing a common multi-phylogenetic pattern and rooted in a rich common ancestor. Sequence alignment and systematic analysis of tRNAs in the chloroplast genomes of Chlorophyceae clarified the characteristics and rules of tRNA changes, which will help elucidate the evolutionary relationships of tRNAs and the origin and evolution of chloroplasts.


Markin A, Eulenstein O. Quartet-Based Inference is Statistically Consistent Under the Unified Duplication-Loss-Coalescence Model. Bioinformatics 2021;37:4064-4074. [PMID: 34048529 PMCID: PMC9113308 DOI: 10.1093/bioinformatics/btab414] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/09/2020] [Revised: 05/19/2021] [Accepted: 05/27/2021] [Indexed: 12/19/2022]

Dismukes W, Heath TA. treeducken: An R package for simulating cophylogenetic systems. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13625] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]

Morel B, Kozlov AM, Stamatakis A, Szöllősi GJ. GeneRax: A Tool for Species-Tree-Aware Maximum Likelihood-Based Gene Family Tree Inference under Gene Duplication, Transfer, and Loss. Mol Biol Evol 2021;37:2763-2774. [PMID: 32502238 PMCID: PMC8312565 DOI: 10.1093/molbev/msaa141] [Citation(s) in RCA: 84] [Impact Index Per Article: 21.0] [Indexed: 12/15/2022]

Abstract

Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species-tree-aware methods also leverage information from a putative species tree. However, only a few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data preprocessing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here, we present GeneRax, the first maximum likelihood species-tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene-level events, such as duplication, transfer, and loss, relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared with competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson–Foulds distance. On empirical data sets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1,099 Cyanobacteria families in 8 min on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax (last accessed June 17, 2020).
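The abstract measures accuracy by relative Robinson–Foulds (RF) distance. Here is a minimal sketch of that metric. It is not GeneRax code. The RF distance between two unrooted binary trees on the same leaf set counts the nontrivial bipartitions found in only one of them. The relative form used here divides that count by its maximum, 2(n − 3), for n leaves. The nested-tuple tree representation and the function names are hypothetical. The paper's exact normalization, for example for non-binary trees, may differ.

```python
from typing import FrozenSet, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # a leaf name or a tuple of subtrees


def leaf_set(tree: Tree) -> FrozenSet[str]:
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Nontrivial bipartitions, each stored as the side without a reference leaf."""
    leaves = leaf_set(tree)
    ref = min(leaves)
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = leaves - clade if ref in clade else clade
        if 1 < len(side) < len(leaves) - 1:  # skip empty and single-leaf (trivial) splits
            found.add(side)
        return clade

    visit(tree)
    return found


def relative_rf(t1: Tree, t2: Tree) -> float:
    """RF distance divided by 2(n - 3), the maximum for unrooted binary trees."""
    if leaf_set(t1) != leaf_set(t2):
        raise ValueError("trees must share the same leaf set")
    n = len(leaf_set(t1))
    if n < 4:
        return 0.0  # no nontrivial bipartitions exist
    return len(bipartitions(t1) ^ bipartitions(t2)) / (2 * (n - 3))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(relative_rf(t1, t2))  # 0.5
```

Storing each split as the side that excludes a fixed reference leaf means the root position does not matter. Rooted input trees are therefore compared as unrooted ones.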


Meleshko O, Martin MD, Korneliussen TS, Schröck C, Lamkowski P, Schmutz J, Healey A, Piatkowski BT, Shaw AJ, Weston DJ, Flatberg KI, Szövényi P, Hassel K, Stenøien HK. Extensive Genome-Wide Phylogenetic Discordance Is Due to Incomplete Lineage Sorting and Not Ongoing Introgression in a Rapidly Radiated Bryophyte Genus. Mol Biol Evol 2021;38:2750-2766. [PMID: 33681996 PMCID: PMC8233498 DOI: 10.1093/molbev/msab063] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Indexed: 12/13/2022]

Shen XX, Steenwyk JL, Rokas A. Dissecting incongruence between concatenation- and quartet-based approaches in phylogenomic data. Syst Biol 2021;70:997-1014. [PMID: 33616672 DOI: 10.1093/sysbio/syab011] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 01/08/2020] [Revised: 02/10/2021] [Accepted: 02/17/2021] [Indexed: 12/12/2022]

Abstract

Topological conflict or incongruence is widespread in phylogenomic data. Concatenation- and coalescent-based approaches often result in incongruent topologies, but the causes of this conflict can be difficult to characterize. We examined incongruence stemming from conflict between likelihood-based signal (quantified by the difference in gene-wise log likelihood score or ΔGLS) and quartet-based topological signal (quantified by the difference in gene-wise quartet score or ΔGQS) for every gene in three phylogenomic studies in animals, fungi, and plants, which were chosen because their concatenation-based IQ-TREE (T1) and quartet-based ASTRAL (T2) phylogenies are known to produce eight conflicting internal branches (bipartitions). By comparing the types of phylogenetic signal for all genes in these three data matrices, we found that 30% to 36% of genes in each data matrix are inconsistent, that is, each of these genes has a higher log likelihood score for T1 than for T2 (i.e., ΔGLS > 0) whereas its T1 topology has a lower quartet score than its T2 topology (i.e., ΔGQS < 0), or vice versa. Comparison of inconsistent and consistent genes using a variety of metrics (e.g., evolutionary rate, gene tree topology, distribution of branch lengths, hidden paralogy, and gene tree discordance) showed that inconsistent genes are more likely to recover neither T1 nor T2 and have higher levels of gene tree discordance than consistent genes. Simulation analyses demonstrate that removal of inconsistent genes from datasets with low levels of incomplete lineage sorting (ILS) and low and medium levels of gene tree estimation error (GTEE) reduced incongruence and increased accuracy. In contrast, removal of inconsistent genes from datasets with medium and high ILS levels and high GTEE levels eliminated or extensively reduced incongruence, but the resulting congruent species phylogenies were not always topologically identical to the true species trees.
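The consistency criterion in this abstract reduces to a sign test on each gene's pair of differences. Here is a minimal Python sketch. It assumes ΔGLS = lnL(T1) − lnL(T2) and ΔGQS = QS(T1) − QS(T2) per gene, which is the sign convention the text implies. It treats a difference of exactly zero as favoring neither tree, which is our own assumption. The gene names and values are made up for illustration.

```python
from typing import Dict, List, Tuple


def is_inconsistent(delta_gls: float, delta_gqs: float) -> bool:
    """True when likelihood and quartet signals favor opposite topologies."""
    # Positive values favor T1, negative values favor T2; zero favors neither.
    return (delta_gls > 0 and delta_gqs < 0) or (delta_gls < 0 and delta_gqs > 0)


def inconsistent_genes(deltas: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the sorted names of genes whose (ΔGLS, ΔGQS) signs disagree."""
    return sorted(g for g, (gls, gqs) in deltas.items() if is_inconsistent(gls, gqs))


# Hypothetical per-gene (ΔGLS, ΔGQS) values, for illustration only.
deltas = {"g1": (2.5, -3.0), "g2": (1.0, 4.0), "g3": (-0.7, 2.0), "g4": (-1.2, -1.0)}
flagged = inconsistent_genes(deltas)
print(flagged, len(flagged) / len(deltas))  # ['g1', 'g3'] 0.5
```

Explicit sign comparisons are used instead of testing `delta_gls * delta_gqs < 0`. The product can underflow to zero for very small magnitudes, which would silently hide an inconsistent gene.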


New Approaches for Inferring Phylogenies in the Presence of Paralogs. Trends Genet 2021;37:174-187. [DOI: 10.1016/j.tig.2020.08.012] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 07/08/2020] [Revised: 08/13/2020] [Accepted: 08/19/2020] [Indexed: 12/18/2022]

Legried B, Molloy EK, Warnow T, Roch S. Polynomial-Time Statistical Estimation of Species Trees Under Gene Duplication and Loss. J Comput Biol 2020;28:452-468. [DOI: 10.1089/cmb.2020.0424] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 02/06/2023]

Li Q, Scornavacca C, Galtier N, Chan YB. The Multilocus Multispecies Coalescent: A Flexible New Model of Gene Family Evolution. Syst Biol 2020;70:822-837. [PMID: 33169795 DOI: 10.1093/sysbio/syaa084] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/12/2020] [Revised: 05/07/2020] [Accepted: 10/19/2020] [Indexed: 02/06/2023]

Zhang C, Scornavacca C, Molloy EK, Mirarab S. ASTRAL-Pro: Quartet-Based Species-Tree Inference despite Paralogy. Mol Biol Evol 2020;37:3292-3307. [PMID: 32886770 PMCID: PMC7751180 DOI: 10.1093/molbev/msaa139] [Citation(s) in RCA: 111] [Impact Index Per Article: 22.2] [Indexed: 12/18/2022]

Molloy EK, Warnow T. FastMulRFS: fast and accurate species tree estimation under generic gene duplication and loss models. Bioinformatics 2020;36:i57-i65. [PMID: 32657396 PMCID: PMC7355287 DOI: 10.1093/bioinformatics/btaa444] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 12/14/2022]

Counting and sampling gene family evolutionary histories in the duplication-loss and duplication-loss-transfer models. J Math Biol 2020;80:1353-1388. [PMID: 32060618 PMCID: PMC7052048 DOI: 10.1007/s00285-019-01465-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/16/2019] [Revised: 11/18/2019] [Indexed: 10/28/2022]

Abstract

Given a set of species whose evolution is represented by a species tree, a gene family is a group of genes having evolved from a single ancestral gene. A gene family evolves along the branches of a species tree through various mechanisms, including, but not limited to, speciation ([Formula: see text]), gene duplication ([Formula: see text]), gene loss ([Formula: see text]), and horizontal gene transfer ([Formula: see text]). The reconstruction of a gene tree representing the evolution of a gene family constrained by a species tree is an important problem in phylogenomics. However, unlike in the multispecies coalescent evolutionary model that considers only speciation and incomplete lineage sorting events, very little is known about the search space for gene family histories accounting for gene duplication, gene loss and horizontal gene transfer (the [Formula: see text]-model). In this work, we introduce the notion of evolutionary histories, defined as binary ordered rooted trees describing the evolution of a gene family, constrained by a species tree in the [Formula: see text]-model. We provide formal grammars describing the set of all evolutionary histories that are compatible with a given species tree, whether it is ranked or unranked. These grammars allow us, using either analytic combinatorics or dynamic programming, to efficiently compute the number of histories of a given size, and also to generate random histories of a given size under the uniform distribution. We apply these tools to obtain exact asymptotics for the number of gene family histories for two species trees, the rooted caterpillar and the complete binary tree, as well as estimates of the range of the exponential growth factor of the number of histories for random species trees of size up to 25. Our results show that including horizontal gene transfers induces a dramatic increase in the number of evolutionary histories. We also show that, within ranked species trees, the number of evolutionary histories in the [Formula: see text]-model is almost independent of the species tree topology. These results establish firm foundations for the development of ensemble methods for the prediction of reconciliations.


Polynomial-Time Statistical Estimation of Species Trees Under Gene Duplication and Loss. Lecture Notes in Computer Science 2020. [DOI: 10.1007/978-3-030-45257-5_8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]

Mawhorter R, Liu N, Libeskind-Hadas R, Wu YC. Inferring Pareto-optimal reconciliations across multiple event costs under the duplication-loss-coalescence model. BMC Bioinformatics 2019;20:639. [PMID: 31842732 PMCID: PMC6916210 DOI: 10.1186/s12859-019-3206-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/18/2022]

Duchemin W, Gence G, Arigon Chifolleau AM, Arvestad L, Bansal MS, Berry V, Boussau B, Chevenet F, Comte N, Davín AA, Dessimoz C, Dylus D, Hasic D, Mallo D, Planel R, Posada D, Scornavacca C, Szöllősi G, Zhang L, Tannier É, Daubin V. RecPhyloXML: a format for reconciled gene trees. Bioinformatics 2019;34:3646-3652. [PMID: 29762653 PMCID: PMC6198865 DOI: 10.1093/bioinformatics/bty389] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 10/20/2017] [Accepted: 05/09/2018] [Indexed: 12/21/2022]

Affiliation(s)

Wandrille Duchemin: Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France; INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France; MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary; Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
Guillaume Gence: Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
Anne-Muriel Arigon Chifolleau: LIRMM, Université de Montpellier, CNRS, Montpellier, France; Institut de Biologie Computationnelle (IBC), Montpellier, France
Lars Arvestad: Department of Mathematics, Stockholm University, Stockholm, Sweden; Swedish e-Science Research Centre (SeRC), Stockholm, Sweden
Mukul S Bansal: Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA; Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
Vincent Berry: LIRMM, Université de Montpellier, CNRS, Montpellier, France; Institut de Biologie Computationnelle (IBC), Montpellier, France; ISEM, CNRS, Université de Montpellier, IRD, EPHE, Montpellier, France
Bastien Boussau: Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
François Chevenet: LIRMM, Université de Montpellier, CNRS, Montpellier, France; MIVEGEC, CNRS 5290, IRD 224, Université de Montpellier, Montpellier, France
Nicolas Comte: INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
Adrián A Davín: Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France; MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary; Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
Christophe Dessimoz: Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Department of Genetics, Evolution and Environment, University College London, London, UK; Department of Computer Science, University College London, London, UK; Swiss Institute of Bioinformatics, Lausanne, Switzerland
David Dylus: Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
Damir Hasic: Department of Mathematics, Faculty of Science, University of Sarajevo, Sarajevo, Bosnia and Herzegovina
Diego Mallo: Virginia G. Piper Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ, USA
Rémi Planel: Laboratoire d'Analyse Bio-informatique en Génomique et Métabolisme CNRS-UMR 8030, Commissariat à l'Énergie Atomique (CEA), Institut de Génomique, Genoscope, Evry, France
David Posada: Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
Celine Scornavacca: Institut de Biologie Computationnelle (IBC), Montpellier, France; ISEM, CNRS, Université de Montpellier, IRD, EPHE, Montpellier, France
Gergely Szöllősi: MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary; Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
Louxin Zhang: Department of Mathematics, National University of Singapore, Singapore, Singapore
Éric Tannier: Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France; INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
Vincent Daubin: Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France

One Thousand Plant Transcriptomes Initiative. One thousand plant transcriptomes and the phylogenomics of green plants. Nature 2019;574:679-685. [PMID: 31645766 PMCID: PMC6872490 DOI: 10.1038/s41586-019-1693-2] [Citation(s) in RCA: 1005] [Impact Index Per Article: 167.5] [Received: 11/17/2017] [Accepted: 09/12/2019] [Indexed: 11/08/2022]

The avocado genome informs deep angiosperm phylogeny, highlights introgressive hybridization, and reveals pathogen-influenced gene space adaptation. Proc Natl Acad Sci U S A 2019;116:17081-17089. [PMID: 31387975 PMCID: PMC6708331 DOI: 10.1073/pnas.1822129116] [Citation(s) in RCA: 85] [Impact Index Per Article: 14.2] [Indexed: 12/22/2022]

Abstract

The avocado is a nutritious, economically important fruit species that occupies an unresolved position near the earliest evolutionary branchings of flowering plants. Our nuclear genome sequences of Mexican and Hass variety avocados inform ancient evolutionary relationships, genome doublings, and the admixed nature of Hass, and they provide a look at how pathogen interactions have shaped the avocado’s more recent genomic evolutionary history.

The avocado, Persea americana, is a fruit crop of immense importance to Mexican agriculture with an increasing demand worldwide. Avocado lies in the anciently diverged magnoliid clade of angiosperms, which has a controversial phylogenetic position relative to eudicots and monocots. We sequenced the nuclear genomes of the Mexican avocado race, P. americana var. drymifolia, and the most commercially popular hybrid cultivar, Hass, and anchored the latter to chromosomes using a genetic map. Resequencing of Guatemalan and West Indian varieties revealed that ∼39% of the Hass genome represents Guatemalan source regions introgressed into a Mexican race background. Some introgressed blocks are extremely large, consistent with the recent origin of the cultivar. The avocado lineage experienced 2 lineage-specific polyploidy events during its evolutionary history. Although gene-tree/species-tree phylogenomic results are inconclusive, syntenic ortholog distances to other species place avocado as sister to the enormous monocot and eudicot lineages combined. Duplicate genes descending from polyploidy augmented the transcription factor diversity of avocado, while tandem duplicates enhanced the secondary metabolism of the species. Phenylpropanoid biosynthesis, known to be elicited by Colletotrichum (anthracnose) pathogen infection in avocado, is one enriched function among tandems. Furthermore, transcriptome data show that tandem duplicates are significantly up- and down-regulated in response to anthracnose infection, whereas polyploid duplicates are not, supporting the general view that collections of tandem duplicates contribute evolutionarily recent “tuning knobs” in the genome adaptive landscapes of given species.

Chan YB, Robin C. Reconciliation of a gene network and species tree. J Theor Biol 2019;472:54-66. [DOI: 10.1016/j.jtbi.2019.04.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/02/2018] [Revised: 03/29/2019] [Accepted: 04/02/2019] [Indexed: 12/26/2022]

Advances in Computational Methods for Phylogenetic Networks in the Presence of Hybridization. Bioinformatics and Phylogenetics 2019. [DOI: 10.1007/978-3-030-10837-3_13] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Indexed: 12/12/2022]

Shekhar S, Roch S, Mirarab S. Species Tree Estimation Using ASTRAL: How Many Genes Are Enough? IEEE/ACM Trans Comput Biol Bioinform 2018;15:1738-1747. [PMID: 28976320 DOI: 10.1109/tcbb.2017.2757930] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 06/07/2023]

Pérez-Losada M, Arenas M, Castro-Nallar E. Microbial sequence typing in the genomic era. Infect Genet Evol 2018;63:346-359. [PMID: 28943406 PMCID: PMC5908768 DOI: 10.1016/j.meegid.2017.09.022] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 05/22/2017] [Revised: 09/18/2017] [Accepted: 09/19/2017] [Indexed: 12/18/2022]

Emms DM, Kelly S. STRIDE: Species Tree Root Inference from Gene Duplication Events. Mol Biol Evol 2018;34:3267-3278. [PMID: 29029342 PMCID: PMC5850722 DOI: 10.1093/molbev/msx259] [Citation(s) in RCA: 176] [Impact Index Per Article: 25.1] [Indexed: 12/02/2022]

Willis A. Confidence Sets for Phylogenetic Trees. J Am Stat Assoc 2018. [DOI: 10.1080/01621459.2017.1395342] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 10/18/2022]

Ciach MA, Muszewska A, Górecki P. Locus-aware decomposition of gene trees with respect to polytomous species trees. Algorithms Mol Biol 2018;13:11. [PMID: 29881445 PMCID: PMC5985597 DOI: 10.1186/s13015-018-0128-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/16/2017] [Accepted: 05/11/2018] [Indexed: 12/29/2022]

Barley AJ, Brown JM, Thomson RC. Impact of Model Violations on the Inference of Species Boundaries Under the Multispecies Coalescent. Syst Biol 2018;67:269-284. [PMID: 28945903 DOI: 10.1093/sysbio/syx073] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Received: 03/15/2017] [Accepted: 08/31/2017] [Indexed: 11/14/2022]

Abstract

The use of genetic data for identifying species-level lineages across the tree of life has received increasing attention in the field of systematics over the past decade. The multispecies coalescent model provides a framework for understanding the process of lineage divergence and has become widely adopted for delimiting species. However, because these studies lack an explicit assessment of model fit, the accuracy of the inferred species boundaries is, in many cases, unknown. This is concerning given the large amount of empirical data and theory that highlight the complexity of the speciation process. Here, we seek to fill this gap by using simulation to characterize the sensitivity of inference under the multispecies coalescent (MSC) to several violations of model assumptions thought to be common in empirical data. We also assess the fit of the MSC model to empirical data in the context of species delimitation. Our results show substantial variation in model fit across data sets. Posterior predictive tests find the poorest model performance in data sets that were hypothesized to be impacted by model violations. We also show that while inferences assuming the MSC are robust to minor model violations, such inferences can be biased under some biologically plausible scenarios. Taken together, these results suggest that researchers can identify individual data sets in which species delimitation under the MSC is likely to be problematic, thereby highlighting the cases where additional lines of evidence to identify species boundaries are particularly important to collect. Our study supports a growing body of work highlighting the importance of model checking in phylogenetics, and the usefulness of tailoring tests of model fit to assess the reliability of particular inferences. [Population structure, gene flow, demographic changes, posterior prediction, simulation, genetics.]

Posada D. Phylogenomics for Systematic Biology. Syst Biol 2018;65:353-6. [PMID: 27129844 DOI: 10.1093/sysbio/syw027] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Indexed: 11/14/2022]

Gregg WCT, Ather SH, Hahn MW. Gene-Tree Reconciliation with MUL-Trees to Resolve Polyploidy Events. Syst Biol 2018;66:1007-1018. [PMID: 28419377 DOI: 10.1093/sysbio/syx044] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Received: 08/18/2016] [Accepted: 03/30/2017] [Indexed: 11/13/2022]

Sousa F, Bertrand YJK, Doyle JJ, Oxelman B, Pfeil BE. Using Genomic Location and Coalescent Simulation to Investigate Gene Tree Discordance in Medicago L. Syst Biol 2018;66:934-949. [PMID: 28177088 DOI: 10.1093/sysbio/syx035] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 06/05/2015] [Accepted: 02/01/2017] [Indexed: 12/28/2022]

Abstract

Several well-documented evolutionary processes are known to cause conflict between species-level phylogenies and gene-level phylogenies. Three of the most challenging processes for species tree inference are incomplete lineage sorting, hybridization and gene duplication, which may result in unwarranted comparisons of paralogous genes. Several existing methods have dealt with these processes, but none has yet been able to untangle all three at once. Here, we propose a stepwise method by which these processes can be discerned using information on genomic location coupled with coalescent simulations. In the first step, highly discordant genes within genomic blocks (putative paralogs) are identified and excluded from the data set; in the second step, blocks of linked genes are grouped according to their hybrid history. Existing multispecies coalescent software can then be applied to recover the principal tree(s) that make up the species tree/network without violating the underlying model. The potential of the approach is evaluated on simulated data derived from a species network of nine species, one of which is of hybrid origin, that also includes a single-gene duplication leading to paralogous comparisons. We apply our method to an empirical set of 12 genes from 7 species sampled in the plant genus Medicago that display phylogenetic discordance. We identify the causes of the discordance and demonstrate that the Medicago orbicularis lineage experienced an episode of ancient hybridization. Our results show promise as a new way to explore phylogenetic sequence data that can significantly improve species tree inference in the presence of hybridization and undetected paralogy or other causes leading to extremely discordant gene trees. [Coalescent simulation; gene tree; genomic location; hybridization; incomplete lineage sorting; paralogy; phylogenetic incongruence; principal tree; species tree.]
