Reference Citation Analysis

For: Islam M, Sarker K, Das T, Reaz R, Bayzid MS. STELAR: a statistically consistent coalescent-based species tree estimation method by maximizing triplet consistency. BMC Genomics 2020;21:136. [PMID: 32039704 PMCID: PMC7011378 DOI: 10.1186/s12864-020-6519-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 09/22/2019] [Accepted: 01/20/2020] [Indexed: 12/14/2022]

Cited by Other Article(s)

Rafi A, Rumi AMS, Hakim SA, Sohaib, Tahmid MT, Momin RJI, Zaman TA, Reaz R, Bayzid MS. wQFM-TREE: highly accurate and scalable quartet-based species tree inference from gene trees. Bioinform Adv 2025;5:vbaf053. [PMID: 40134580 PMCID: PMC11932941 DOI: 10.1093/bioadv/vbaf053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2024] [Revised: 01/27/2025] [Accepted: 03/10/2025] [Indexed: 03/27/2025]

Hakim SA, Ratul MRZ, Bayzid MS. wQFM-DISCO: DISCO-enabled wQFM improves phylogenomic analyses despite the presence of paralogs. Bioinform Adv 2024;4:vbae189. [PMID: 39664861 PMCID: PMC11634537 DOI: 10.1093/bioadv/vbae189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2024] [Revised: 10/18/2024] [Accepted: 11/24/2024] [Indexed: 12/13/2024]

Habib M, Roy K, Hasan S, Rahman AH, Bayzid MS. Terraces in species tree inference from gene trees. BMC Ecol Evol 2024;24:135. [PMID: 39497030 PMCID: PMC11533290 DOI: 10.1186/s12862-024-02309-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2023] [Accepted: 09/16/2024] [Indexed: 11/06/2024]

Dai J, Rubel T, Han Y, Molloy EK. Dollo-CDP: a polynomial-time algorithm for the clade-constrained large Dollo parsimony problem. Algorithms Mol Biol 2024;19:2. [PMID: 38191515 PMCID: PMC10775561 DOI: 10.1186/s13015-023-00249-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 12/10/2023] [Indexed: 01/10/2024]

Abstract

The last decade of phylogenetics has seen the development of many methods that combine constraints with dynamic programming. The goal of this algorithmic technique is to produce a phylogeny that is optimal with respect to some objective function and that lies within a constrained version of tree space. The popular species tree estimation method ASTRAL, for example, returns a tree that (1) maximizes the quartet score computed with respect to the input gene trees and that (2) draws its branches (bipartitions) from the input constraint set. This technique has yet to be used for parsimony problems where the input consists of binary characters, sometimes with missing values. Here, we introduce the clade-constrained character parsimony problem and present an algorithm that solves this problem for the Dollo criterion score in [Formula: see text] time, where n is the number of leaves, k is the number of characters, and [Formula: see text] is the set of clades used as constraints. Dollo parsimony, which requires traits/mutations to be gained at most once but allows them to be lost any number of times, is widely used for tumor phylogenetics as well as species phylogenetics, for example in analyses of low-homoplasy retroelement insertions across the vertebrate tree of life. This motivated us to implement our algorithm in a software package, called Dollo-CDP, and to evaluate its utility for analyzing retroelement insertion presence/absence patterns for bats, birds, and toothed whales, as well as for simulated data. Our results show that Dollo-CDP can improve upon heuristic search from a single starting tree, often recovering a better-scoring tree. Moreover, Dollo-CDP scales to data sets with much larger numbers of taxa than branch-and-bound while still having an optimality guarantee, albeit a more restricted one. Lastly, we show that our algorithm for Dollo parsimony can easily be adapted to Camin-Sokal parsimony but not to Fitch parsimony.
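
To make the Dollo criterion concrete, here is a minimal Python sketch that scores one binary character on one fixed rooted binary tree. It is not the Dollo-CDP algorithm, which searches over clade-constrained trees. It assumes there are no missing values and that the trait is absent at the root; under those assumptions the trait is gained once, on the branch above the smallest clade containing every taxon that carries it, and each maximal all-absent clade inside that clade costs one loss. The nested-tuple tree encoding and all function names are illustrative choices, not taken from the paper.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A rooted binary tree: a leaf is a taxon name, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Return the taxa below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaf_set(tree[0]) | leaf_set(tree[1])


def smallest_clade_containing(tree: Tree, taxa: FrozenSet[str]) -> Tree:
    """Descend to the smallest clade whose leaves include every taxon in `taxa`."""
    while not isinstance(tree, str):
        left, right = tree
        if taxa <= leaf_set(left):
            tree = left
        elif taxa <= leaf_set(right):
            tree = right
        else:
            break
    return tree


def count_losses(tree: Tree, character: Dict[str, int]) -> int:
    """Count maximal clades in which every taxon lacks the trait (one loss each)."""
    if all(character[taxon] == 0 for taxon in leaf_set(tree)):
        return 1
    if isinstance(tree, str):
        return 0
    return count_losses(tree[0], character) + count_losses(tree[1], character)


def dollo_score(tree: Tree, character: Dict[str, int]) -> int:
    """Minimum number of changes (one gain plus any losses) under the Dollo criterion."""
    present = frozenset(t for t, state in character.items() if state == 1)
    if not present:
        return 0  # the trait is never gained, so no changes are needed
    origin = smallest_clade_containing(tree, present)
    return 1 + count_losses(origin, character)


if __name__ == "__main__":
    tree = ((("A", "B"), "C"), ("D", "E"))
    print(dollo_score(tree, {"A": 1, "B": 0, "C": 1, "D": 0, "E": 0}))  # 2
    print(dollo_score(tree, {"A": 1, "B": 0, "C": 0, "D": 1, "E": 0}))  # 4
```

In the second call the trait must be gained at the root (A and D are on opposite sides) and then lost on the branches leading to B, C, and E, giving one gain plus three losses. Summing this score over all characters gives the objective that a Dollo parsimony search minimizes.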

Allio R, Delsuc F, Belkhir K, Douzery EJP, Ranwez V, Scornavacca C. OrthoMaM v12: a database of curated single-copy ortholog alignments and trees to study mammalian evolutionary genomics. Nucleic Acids Res 2024;52:D529-D535. [PMID: 37843103 PMCID: PMC10767847 DOI: 10.1093/nar/gkad834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 09/19/2023] [Accepted: 09/26/2023] [Indexed: 10/17/2023]

Han Y, Molloy EK. Quartets enable statistically consistent estimation of cell lineage trees under an unbiased error and missingness model. Algorithms Mol Biol 2023;18:19. [PMID: 38041123 PMCID: PMC10691101 DOI: 10.1186/s13015-023-00248-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 11/19/2023] [Indexed: 12/03/2023]

Singh NP, Love MI, Patro R. TreeTerminus - creating transcript trees using inferential replicate counts. iScience 2023;26:106961. [PMID: 37378336 PMCID: PMC10291472 DOI: 10.1016/j.isci.2023.106961] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/29/2023] [Revised: 04/18/2023] [Accepted: 05/22/2023] [Indexed: 06/29/2023]

Mahbub S, Sawmya S, Saha A, Reaz R, Rahman MS, Bayzid MS. Quartet Based Gene Tree Imputation Using Deep Learning Improves Phylogenomic Analyses Despite Missing Data. J Comput Biol 2022;29:1156-1172. [PMID: 36048555 DOI: 10.1089/cmb.2022.0212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]

Zhelezov G, Degnan JH. Trying Out a Million Genes to Find the Perfect Pair with RTIST. Bioinformatics 2022;38:3565-3573. [PMID: 35641003 DOI: 10.1093/bioinformatics/btac349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2021] [Revised: 05/07/2022] [Accepted: 05/17/2022] [Indexed: 11/13/2022]

Abstract

MOTIVATION

Consensus methods can be used to reconstruct a species tree from several gene trees that exhibit incompatible topologies due to incomplete lineage sorting. Motivated by the fact that the multispecies coalescent model admits no anomalous rooted gene trees with three taxa and no anomalous unrooted gene trees with four taxa, several contemporary methods form the gene tree consensus by finding the median tree with respect to the triplet or quartet distance; i.e., they estimate the species tree as the tree that minimizes the sum of triplet or quartet distances to the input gene trees. These methods reformulate the consensus problem as a recursively solved dynamic programming problem. We present an iterative, easily parallelizable approach to finding the exact median triplet tree, and implement it as an open-source software package that can also find suboptimal consensus trees within a specified triplet distance to the gene trees. The most time-consuming step for methods of this type is the creation of a weights array for all possible subtree bipartitions. By grouping together the relevant calculations and array update operations for different bipartitions of the same subtree, this implementation finds the exact median tree of many gene trees faster than comparable methods, scales better with the number of gene trees, and has a smaller memory footprint.
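
The objective being minimized can be illustrated with a brute-force Python sketch: it lists the rooted triplets displayed by a rooted binary tree, counts the three-taxon subsets on which two trees disagree (the triplet distance), and sums that distance over a set of gene trees. It assumes every tree is binary and has the same taxon set, and it enumerates all triplets explicitly, so it only shows what a median triplet tree optimizes; it is not RTIST's iterative weights-array method. The tree encoding and function names are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Tuple, Union

# A rooted binary tree: a leaf is a taxon name, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def rooted_triplets(tree: Tree) -> Dict[FrozenSet[str], str]:
    """Map each 3-taxon set {a, b, c} to its outgroup c, i.e. the tree displays ((a, b), c)."""
    triplets: Dict[FrozenSet[str], str] = {}

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        left, right = walk(node[0]), walk(node[1])
        # Triplets whose lowest common ancestor is this node: a pair from one
        # child's clade, with the outgroup taken from the other child's clade.
        for inside, outside in ((left, right), (right, left)):
            for a, b in combinations(sorted(inside), 2):
                for c in outside:
                    triplets[frozenset((a, b, c))] = c
        return left | right

    walk(tree)
    return triplets


def triplet_distance(t1: Tree, t2: Tree) -> int:
    """Number of 3-taxon subsets resolved differently by t1 and t2 (same taxa assumed)."""
    r1, r2 = rooted_triplets(t1), rooted_triplets(t2)
    return sum(1 for key, outgroup in r1.items() if r2[key] != outgroup)


def total_distance(species_tree: Tree, gene_trees: Iterable[Tree]) -> int:
    """Sum of triplet distances to the gene trees; a median triplet tree minimizes this."""
    return sum(triplet_distance(species_tree, g) for g in gene_trees)


if __name__ == "__main__":
    genes = [((("A", "B"), "C"), "D"), ((("A", "B"), "C"), "D"), ((("A", "C"), "B"), "D")]
    candidates = [((("A", "B"), "C"), "D"), ((("A", "C"), "B"), "D")]
    print([total_distance(c, genes) for c in candidates])  # [1, 2]
```

Enumerating every candidate tree this way grows super-exponentially with the number of taxa, which is why exact methods of this type restrict attention to subtree bipartitions and a precomputed weights array.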

RESULTS

RTIST (Rooted Triple Inference of Species Trees) finds the exact median triplet tree of a set of gene trees. Its runtime and memory footprint scale better than those of existing algorithms. RTIST can resolve all non-unique median trees, as well as suboptimal consensus trees within a user-specified triplet distance to the median. Although it is limited to at most 20 taxa, its runtime changes little when the number of gene trees varies by several orders of magnitude.

AVAILABILITY

RTIST is written in C and Python. It is freely available at https://github.com/glebzhelezov/rtist.

Molloy EK, Gatesy J, Springer MS. Theoretical and practical considerations when using retroelement insertions to estimate species trees in the anomaly zone. Syst Biol 2021;71:721-740. [PMID: 34677617 DOI: 10.1093/sysbio/syab086] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/02/2020] [Accepted: 10/11/2021] [Indexed: 11/13/2022]

Abstract

A potential shortcoming of concatenation methods for species tree estimation is their failure to account for incomplete lineage sorting. Coalescent methods address this problem but make various assumptions that, if violated, can result in worse performance than concatenation. Given the challenges of analyzing DNA sequences with both concatenation and coalescent methods, retroelement insertions (RIs) have emerged as powerful phylogenomic markers for species tree estimation. Here, we show that two recently proposed quartet-based methods, SDPquartets and ASTRAL_BP, are statistically consistent estimators of the unrooted species tree topology under the coalescent when RIs follow a neutral infinite-sites model of mutation and the expected number of new RIs per generation is constant across the species tree. The accuracy of these (and other) methods for inferring species trees from RIs has yet to be assessed on simulated data sets, where the true species tree topology is known. Therefore, we evaluated eight methods given RIs simulated from four model species trees, all of which have short branches and at least three of which are in the anomaly zone. In our simulation study, ASTRAL_BP and SDPquartets always recovered the correct species tree topology when given a sufficiently large number of RIs, as predicted. A distance-based method (ASTRID_BP) and Dollo parsimony also performed well in recovering the species tree topology. In contrast, unordered, polymorphism, and Camin-Sokal parsimony typically failed to recover the correct species tree topology in anomaly zone situations with more than four ingroup taxa. Of the methods studied, only ASTRAL_BP automatically estimates internal branch lengths (in coalescent units) and support values (i.e., local posterior probabilities). We examined the accuracy of branch length estimation and found that estimated lengths were accurate for short branches but upwardly biased otherwise.
This led us to derive the maximum likelihood estimate of branch length for the case where RIs, rather than binary gene trees, are given as input; this corrected formula produced accurate branch length estimates in our simulation study, provided that a sufficiently large number of RIs were given as input. Lastly, we evaluated the impact of data quantity on species tree estimation by repeating the above experiments with input sizes varying from 100 to 100 000 parsimony-informative RIs. We found that, when given just 1 000 parsimony-informative RIs as input, ASTRAL_BP successfully reconstructed major clades (i.e., clades separated by branches longer than 0.3 coalescent units) with high support and identified rapid radiations (i.e., shorter connected branches), although not their precise branching order. The local posterior probability was effective for controlling false-positive branches in these scenarios.
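
Because each retroelement insertion splits the taxa into those that carry it and those that do not, it can be read as a bipartition; restricted to any four taxa, it either supports one of the three unrooted quartet topologies or is uninformative. The Python sketch below tallies that support for a single four-taxon set. It assumes a complete presence/absence matrix (no missing data), the toy data are made up, and the helper names are hypothetical; it only illustrates the kind of quartet-level counts that quartet-based methods such as ASTRAL_BP and SDPquartets work from, not either method's implementation or the corrected branch length formula.

```python
from typing import Dict, FrozenSet, List, Tuple

# An unrooted quartet topology wx|yz, stored as a frozenset of its two cherries.
Quartet = FrozenSet[FrozenSet[str]]


def split(a: str, b: str, c: str, d: str) -> Quartet:
    """Build the quartet topology ab|cd (order of cherries and taxa is irrelevant)."""
    return frozenset([frozenset((a, b)), frozenset((c, d))])


def quartet_support(
    taxa: Tuple[str, str, str, str], insertions: List[Dict[str, int]]
) -> Dict[Quartet, int]:
    """Count insertions inducing each unrooted topology on four taxa.

    An insertion supports wx|yz when, among these four taxa, it is present in
    exactly {w, x} or exactly {y, z}; any other pattern is uninformative here.
    """
    w, x, y, z = taxa
    topologies = [split(w, x, y, z), split(w, y, x, z), split(w, z, x, y)]
    counts = {q: 0 for q in topologies}
    for ri in insertions:
        present = frozenset(t for t in taxa if ri[t] == 1)
        for q in topologies:
            if present in q:  # the present set equals one cherry of this quartet
                counts[q] += 1
    return counts


if __name__ == "__main__":
    taxa = ("A", "B", "C", "D")
    insertions = [
        {"A": 1, "B": 1, "C": 0, "D": 0},
        {"A": 0, "B": 0, "C": 1, "D": 1},
        {"A": 1, "B": 0, "C": 1, "D": 0},
        {"A": 1, "B": 1, "C": 1, "D": 0},  # uninformative for this quartet
    ]
    for q, n in quartet_support(taxa, insertions).items():
        print(" | ".join("".join(sorted(side)) for side in sorted(q, key=sorted)), n)
    # prints: AB | CD 2, then AC | BD 1, then AD | BC 0
```

Presence in {A, B} and presence in {C, D} count toward the same unrooted topology because, without a root, the two sides of a bipartition are interchangeable.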

Mahbub M, Wahab Z, Reaz R, Rahman MS, Bayzid MS. wQFM: Highly Accurate Genome-scale Species Tree Estimation from Weighted Quartets. Bioinformatics 2021;37:3734-3743. [PMID: 34086858 DOI: 10.1093/bioinformatics/btab428] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/30/2020] [Revised: 05/24/2021] [Accepted: 06/03/2021] [Indexed: 02/01/2023]

Rineau V, Zaragüeta R, Bardin J. Information content of trees: three-taxon statements, inference rules and dependency. Biol J Linn Soc Lond 2021. [DOI: 10.1093/biolinnean/blab046] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]

Farah IT, Islam MM, Zinat KT, Rahman AH, Bayzid MS. Species tree estimation from gene trees by minimizing deep coalescence and maximizing quartet consistency: a comparative study and the presence of pseudo species tree terraces. Syst Biol 2021;70:1213-1231. [PMID: 33844023 DOI: 10.1093/sysbio/syab026] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/31/2020] [Revised: 03/25/2021] [Accepted: 03/29/2021] [Indexed: 11/14/2022]

Biological computation and computational biology: survey, challenges, and discussion. Artif Intell Rev 2021. [DOI: 10.1007/s10462-020-09951-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022]

Bhattacharjee A, Bayzid MS. Machine learning based imputation techniques for estimating phylogenetic trees from incomplete distance matrices. BMC Genomics 2020;21:497. [PMID: 32689946 PMCID: PMC7370488 DOI: 10.1186/s12864-020-06892-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/27/2020] [Accepted: 07/07/2020] [Indexed: 02/08/2023]