Wu Z, Solís-Lemus C. Ultrafast learning of four-node hybridization cycles in phylogenetic networks using algebraic invariants. Bioinformatics Advances 2024; 4:vbae014. [PMID: 38384862; PMCID: PMC10879748; DOI: 10.1093/bioadv/vbae014]
[Received: 07/25/2023] [Revised: 12/23/2023] [Accepted: 02/06/2024] [Indexed: 02/23/2024]
Abstract
Motivation
The abundance of gene flow in the Tree of Life challenges the notion that evolution can be represented with a fully bifurcating process, which cannot capture important biological realities like hybridization, introgression, or horizontal gene transfer. Coalescent-based network methods are increasingly popular, yet they do not scale to large datasets because they require both a heuristic search in the space of networks and numerical optimization that can be NP-hard. Here, we introduce a novel method to reconstruct phylogenetic networks based on algebraic invariants. While there is a long tradition of using algebraic invariants in phylogenetics, our work is the first to define phylogenetic invariants on concordance factors (frequencies of four-taxon splits in the input gene trees) to identify level-1 phylogenetic networks under the multispecies coalescent model.
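To make the notion of a concordance factor concrete, the following is a minimal Python sketch of how the three split frequencies for one four-taxon set could be tallied from a collection of gene trees. It is not the PhyloDiamond.jl implementation or API: the function names are hypothetical, and it assumes each gene tree is a rooted binary tree written as nested tuples of taxon names.

```python
from collections import Counter


def clades(tree):
    """Return (leaf set of this subtree, leaf sets of every clade inside it)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def quartet_split(tree, four):
    """Return the split of `four` induced by `tree`, or None if undefined.

    Assumption: in a rooted binary tree containing all four taxa, some clade
    contains exactly two of them, and that pair forms one side of the split.
    """
    four = frozenset(four)
    leaves, all_clades = clades(tree)
    if not four <= leaves:
        return None  # the gene tree is missing at least one of the taxa
    for clade in all_clades:
        pair = clade & four
        if len(pair) == 2:
            return frozenset([pair, four - pair])
    return None  # unresolved (non-binary) quartet


def concordance_factors(gene_trees, four):
    """Frequencies of the three possible splits of `four` across gene trees."""
    a, b, c, d = four
    splits = {
        f"{a}{b}|{c}{d}": frozenset([frozenset([a, b]), frozenset([c, d])]),
        f"{a}{c}|{b}{d}": frozenset([frozenset([a, c]), frozenset([b, d])]),
        f"{a}{d}|{b}{c}": frozenset([frozenset([a, d]), frozenset([b, c])]),
    }
    counts = Counter()
    for tree in gene_trees:
        split = quartet_split(tree, four)
        if split is not None:
            counts[split] += 1
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in splits}
    return {label: counts[split] / total for label, split in splits.items()}
```

Calling `concordance_factors(trees, ("A", "B", "C", "D"))` returns a dictionary keyed by `"AB|CD"`, `"AC|BD"`, and `"AD|BC"`. Gene trees that lack one of the four taxa, or that leave the quartet unresolved, are skipped rather than counted.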
Results
Our novel hybrid detection methodology is optimization-free: it only requires the evaluation of polynomial equations and therefore bypasses the traversal of network space, yielding a computational speed at least 10 times faster than the fastest network methods to date. We illustrate our method's performance on simulated data and on real data from the genus Canis.
Availability and implementation
We present PhyloDiamond.jl, an open-source Julia package available at https://github.com/solislemuslab/PhyloDiamond.jl, with broad applicability within the evolutionary community.