1
Fritze H, Pope N, Kelleher J, Ralph P. A forest is more than its trees: haplotypes and ancestral recombination graphs. bioRxiv 2025:2024.11.30.626138. [PMID: 40060605 PMCID: PMC11888177 DOI: 10.1101/2024.11.30.626138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2025]
Abstract
Foreshadowing haplotype-based methods of the genomics era, it is an old observation that the "junction" between two distinct haplotypes produced by recombination is inherited as a Mendelian marker. In a genealogical context, this recombination-mediated information reflects the persistence of ancestral haplotypes across local genealogical trees in which they do not represent coalescences. We show how these non-coalescing haplotypes ("locally-unary nodes") may be inserted into ancestral recombination graphs (ARGs), a compact but information-rich data structure describing the genealogical relationships among recombinant sequences. The resulting ARGs are smaller, faster to compute with, and the additional ancestral information that is inserted is nearly always correct where the initial ARG is correct. We provide efficient algorithms to infer locally-unary nodes within existing ARGs, and explore some consequences for ARGs inferred from real data. To do this, we introduce new metrics of agreement and disagreement between ARGs that, unlike previous methods, consider ARGs as describing relationships between haplotypes rather than just a collection of trees.
Affiliation(s)
- Halley Fritze
  - Department of Mathematics, University of Oregon, Eugene, Oregon
- Nathaniel Pope
  - Institute of Evolution and Ecology and Department of Biology, University of Oregon, Eugene, Oregon
- Jerome Kelleher
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford
- Peter Ralph
  - Institute of Evolution and Ecology and Department of Biology, University of Oregon, Eugene, Oregon
  - Department of Mathematics, University of Oregon, Eugene, Oregon
  - Department of Data Science, University of Oregon, Eugene, Oregon
2
Sukumaran J, Meila M. Piikun: an information theoretic toolkit for analysis and visualization of species delimitation metric space. BMC Bioinformatics 2024; 25:385. [PMID: 39695946 DOI: 10.1186/s12859-024-05997-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2024] [Accepted: 11/21/2024] [Indexed: 12/20/2024] Open
Abstract
BACKGROUND Existing software for comparing species delimitation models does not provide a (true) metric or distance function between species delimitation models, nor a way to compare these models in terms of relative clustering differences along a lattice of partitions. RESULTS Piikun is a Python package for analyzing and visualizing species delimitation models in an information theoretic framework that, in addition to classic measures of information such as the entropy and mutual information [1], provides for the calculation of the Variation of Information (VI) criterion [2], a true metric or distance function for species delimitation models that is aligned with the lattice of partitions. CONCLUSIONS Piikun is available under the MIT license from its public repository ( https://github.com/jeetsukumaran/piikun ), and can be installed locally using the Python package manager 'pip'.
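As an illustration of the Variation of Information criterion named in this abstract, the following is a minimal sketch in plain Python. It does not use or reproduce the Piikun API; the function name `variation_of_information` and the representation of a delimitation model as one cluster label per individual are assumptions made for this example. It uses the standard identity VI = H(A|B) + H(B|A), with natural logarithms.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Variation of Information between two partitions of the same items.

    Each argument is a sequence giving, for every item, the label of the
    cluster (e.g. putative species) it is assigned to. This layout is an
    assumption for illustration, not Piikun's input format.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    n = len(labels_a)
    if n == 0:
        return 0.0

    count_a = Counter(labels_a)               # cluster sizes in partition A
    count_b = Counter(labels_b)               # cluster sizes in partition B
    joint = Counter(zip(labels_a, labels_b))  # sizes of pairwise overlaps

    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        # H(A|B) + H(B|A) = -sum p_ab * [log(p_ab / p_b) + log(p_ab / p_a)]
        vi -= p_ab * (log(n_ab / count_b[b]) + log(n_ab / count_a[a]))
    return vi
```

Two delimitations that group the individuals identically have VI 0 regardless of how the groups are named, and lumping two equal-sized species into one gives VI = log 2 under this sketch.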
Affiliation(s)
- Jeet Sukumaran
  - Biology, San Diego State University, San Diego, CA, USA.
- Marina Meila
  - Statistics, University of Washington, Seattle, 10587, WA, USA
3
Neu AT, Torchin ME, Allen EE, Roy K. Microbiome divergence of marine gastropod species separated by the Isthmus of Panama. Appl Environ Microbiol 2024; 90:e0100324. [PMID: 39480095 PMCID: PMC11614449 DOI: 10.1128/aem.01003-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Accepted: 07/22/2024] [Indexed: 11/02/2024] Open
Abstract
The rise of the Isthmus of Panama separated the populations of many marine organisms, which then diverged into new geminate sister species currently living in the Eastern Pacific Ocean and the Caribbean Sea. However, we know very little about how such evolutionary divergences of host species have shaped the compositions of their microbiomes. Here, we compared the microbiomes of whole-body and shell-surface samples of geminate species of marine gastropods in the genera Cerithium and Cerithideopsis to those of congeneric outgroups. Our results suggest that the effects of ~3 million years of separation and isolation on microbiome composition varied among host genera and between sample types within the same hosts. In the whole-body samples, microbiome compositions of geminate species pairs tended to be similar, likely due to host filtering, although the strength of this relationship varied between the two groups and across similarity metrics. Shell-surface microbiomes show contrasting patterns, with co-divergence between the host taxa and a small number of microbial clades evident in Cerithideopsis but not Cerithium. These results suggest that (i) isolation of host populations after the rise of the Isthmus of Panama affected microbiomes of geminate hosts in a complex and host-specific manner, and (ii) host-associated microbial taxa respond differently to vicariance events than the hosts themselves. IMPORTANCE While considerable work has been done on evolutionary divergences of marine species in response to the rise of the Isthmus of Panama, which separated two previously connected oceans, how this event shaped the microbiomes of these marine hosts remains poorly known. Using whole-body and shell-surface microbiomes of closely related gastropod species from opposite sides of the Isthmus, we show that divergences of microbial taxa after the formation of the Isthmus are often not concordant with those of their gastropod hosts. Our results show that evolutionary responses of marine gastropod-associated microbiomes to major environmental perturbations are complex and are shaped more by local environments than host evolutionary history.
Affiliation(s)
- Alexander T. Neu
  - Department of Ecology, Behavior and Evolution, School of Biological Sciences, University of California San Diego, La Jolla, California, USA
  - Smithsonian Tropical Research Institute, Ancon, Balboa, Panama
- Mark E. Torchin
  - Smithsonian Tropical Research Institute, Ancon, Balboa, Panama
- Eric E. Allen
  - Department of Molecular Biology, School of Biological Sciences, University of California San Diego, La Jolla, California, USA
  - Marine Biology Research Division, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, USA
- Kaustuv Roy
  - Department of Ecology, Behavior and Evolution, School of Biological Sciences, University of California San Diego, La Jolla, California, USA
4
Li W, Koshkarov A, Tahiri N. Comparison of phylogenetic trees defined on different but mutually overlapping sets of taxa: A review. Ecol Evol 2024; 14:e70054. [PMID: 39119174 PMCID: PMC11307105 DOI: 10.1002/ece3.70054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 07/03/2024] [Accepted: 07/10/2024] [Indexed: 08/10/2024] Open
Abstract
Phylogenetic trees represent the evolutionary relationships and ancestry of various species or groups of organisms. Comparing these trees by measuring the distance between them is essential for applications such as tree clustering and the Tree of Life project. Many distance metrics for phylogenetic trees focus on trees defined on the same set of taxa. However, some problems require calculating distances between trees with different but overlapping sets of taxa. This study reviews state-of-the-art distance measures for such trees, covering six major approaches, including the constraint-based Robinson-Foulds (RF) distance RF(-), the completion-based RF(+), the generalized RF (GRF), the dissimilarity measure, the vectorial tree distance, and the geodesic distance in the extended Billera-Holmes-Vogtmann tree space. Among these, three RF-based methods, RF(-), RF(+), and GRF, were examined in detail on generated clusters of phylogenetic trees defined on different but mutually overlapping sets of taxa. Additionally, we reviewed nine related techniques, including leaf imputation methods, the tree edit distance, and visual comparison. A comparison of the related distance measures, highlighting their principal advantages and shortcomings, is provided. This review offers valuable insights into their applicability and performance, guiding the appropriate use of these metrics based on tree type (rooted or unrooted) and information type (topological or branch lengths).
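For orientation, the classic Robinson-Foulds distance that the RF-based variants in this abstract build on can be sketched as follows. This is a minimal illustration for rooted trees on the same taxon set only, counting the clusters (sets of leaves below an internal node) found in one tree but not the other. It is not an implementation of RF(-), RF(+), or GRF from the review, and the nested-tuple tree encoding and the function names are assumptions chosen for this example.

```python
def clusters(tree):
    """Return (all_leaves, nontrivial_clusters) for a rooted tree.

    A tree is a leaf label (str) or a tuple of subtrees; this nested-tuple
    encoding is an assumption made for illustration.
    """
    found = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(leaves_below(child) for child in node))
        found.add(leaves)  # record the cluster under this internal node
        return leaves

    all_leaves = leaves_below(tree)
    found.discard(all_leaves)  # the root cluster is shared by every tree
    return all_leaves, found


def rf_distance(tree1, tree2):
    """Rooted RF distance: size of the symmetric difference of cluster sets."""
    leaves1, clusters1 = clusters(tree1)
    leaves2, clusters2 = clusters(tree2)
    if leaves1 != leaves2:
        # Overlapping-but-different taxon sets need the extensions the
        # review discusses; this sketch deliberately refuses them.
        raise ValueError("trees must be defined on the same set of taxa")
    return len(clusters1 ^ clusters2)
```

For example, `rf_distance((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")))` compares the cluster sets {AB, CD} and {AC, BD}, which share nothing, so all four clusters are counted.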
Affiliation(s)
- Wanlin Li
  - Department of Computer Science, University of Sherbrooke, Sherbrooke, Quebec, Canada
- Aleksandr Koshkarov
  - Department of Computer Science, University of Sherbrooke, Sherbrooke, Quebec, Canada
- Nadia Tahiri
  - Department of Computer Science, University of Sherbrooke, Sherbrooke, Quebec, Canada
5
Vasei H, Foroughmand-Araabi MH, Daneshgar A. Weighted centroid trees: a general approach to summarize phylogenies in single-labeled tumor mutation tree inference. Bioinformatics 2024; 40:btae120. [PMID: 38984735 PMCID: PMC11520232 DOI: 10.1093/bioinformatics/btae120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 02/19/2024] [Accepted: 07/09/2024] [Indexed: 07/11/2024] Open
Abstract
MOTIVATION Tumor trees, which depict the evolutionary process of cancer, provide a backbone for discovering recurring evolutionary processes in cancer. While they are not the primary information extracted from genomic data, they are valuable for this purpose. One such extraction method involves summarizing multiple trees into a single representative tree, such as consensus trees or supertrees. RESULTS We define the "weighted centroid tree problem" to find the centroid tree of a set of single-labeled rooted trees through the following steps: (i) mapping the given trees into the Euclidean space, (ii) computing the weighted centroid matrix of the mapped trees, and (iii) finding the nearest mapped tree (NMTP) to the centroid matrix. We show that this setup encompasses previously studied parent-child and ancestor-descendent metrics as well as the GraPhyC and TuELiP consensus tree algorithms. Moreover, we show that, while the NMTP problem is polynomial-time solvable for the adjacency embedding, it is NP-hard for ancestry and distance mappings. We introduce integer linear programs for NMTP in different setups where we also provide a new algorithm for the case of ancestry embedding called 2-AncL2, that uses a novel weighting scheme for ancestry signals. Our experimental results show that 2-AncL2 has a superior performance compared to available consensus tree algorithms. We also illustrate our setup's application on providing representative trees for a large real breast cancer dataset, deducing that the cluster centroid trees summarize reliable evolutionary information about the original dataset. AVAILABILITY AND IMPLEMENTATION https://github.com/vasei/WAncILP.
Affiliation(s)
- Hamed Vasei
  - Department of Mathematical Sciences, Sharif University of Technology, Tehran 111559415, Iran
- Amir Daneshgar
  - Department of Mathematical Sciences, Sharif University of Technology, Tehran 111559415, Iran
6
Khayatian E, Valiente G, Zhang L. The k-Robinson-Foulds Dissimilarity Measures for Comparison of Labeled Trees. J Comput Biol 2024; 31:328-344. [PMID: 38271573 PMCID: PMC11057537 DOI: 10.1089/cmb.2023.0312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2024] Open
Abstract
Understanding the mutational history of tumor cells is a critical endeavor in unraveling the mechanisms that drive the onset and progression of cancer. Modeling tumor cell evolution with labeled trees motivates researchers to develop different measures to compare labeled trees. Although the Robinson-Foulds (RF) distance is widely used for comparing species trees, its applicability to labeled trees reveals certain limitations. This study introduces the k-RF dissimilarity measures, tailored to address the challenges of labeled tree comparison. The RF distance is succinctly expressed as n-RF in the space of labeled trees with n nodes. Like the RF distance, the k-RF is a pseudometric for multiset-labeled trees and becomes a metric in the space of 1-labeled trees. By setting k to a small value, the k-RF dissimilarity can capture analogous local regions in two labeled trees with different size or different labels.
Affiliation(s)
- Elahe Khayatian
  - Department of Mathematics, National University of Singapore, Singapore, Singapore
- Gabriel Valiente
  - Department of Computer Science, Technical University of Catalonia, Barcelona, Spain
- Louxin Zhang
  - Department of Mathematics, National University of Singapore, Singapore, Singapore
7
Jia X, Zhang X, Ling Y, Zhang X, Tian D, Liao Y, Yi Z, Lu H. Application of nanopore sequencing in diagnosis of secondary infections in patients with severe COVID-19. Zhejiang Da Xue Xue Bao Yi Xue Ban 2021; 50:748-754. [PMID: 35347908 PMCID: PMC8931600 DOI: 10.3724/zdxbyxb-2021-0158] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/05/2021] [Accepted: 10/10/2021] [Indexed: 06/14/2023]
Abstract
To explore the application value of the nanopore sequencing technique in the diagnosis and treatment of secondary infections in patients with severe coronavirus disease 2019 (COVID-19). A total of 77 clinical specimens from 3 patients with severe COVID-19 were collected. After heat inactivation, all samples were subjected to total nucleic acid extraction based on magnetic bead enrichment. The extracted DNA was used for DNA library construction, then nanopore real-time sequencing detection was performed. The sequencing data were subjected to Centrifuge software database species matching and R program differential analysis to obtain potential pathogen identification. Nanopore sequencing results were compared with respiratory pathogen qPCR panel screening and conventional microbiological testing results to verify the effectiveness of nanopore sequencing detection. Nanopore sequencing detected pathogens in 44 specimens (57.1%). The potential pathogens identified by nanopore sequencing included , , and , et al. , , were also detected in clinical microbiological culture-based detection; was detected in respiratory pathogen screening qPCR panel; was only detected by the nanopore sequencing technique. Taking the clinical symptoms into account, the patient was treated with antibiotics against , and the infection was controlled. Nanopore sequencing may assist the diagnosis and treatment of severe COVID-19 patients through rapid identification of potential pathogens.
Affiliation(s)
- Xiaofang Jia
  - 1. Scientific Research Department, Shanghai Public Health Clinical Center, Shanghai 201508, China
- Xiaonan Zhang
  - 1. Scientific Research Department, Shanghai Public Health Clinical Center, Shanghai 201508, China
- Yun Ling
  - 2. Infectious Disease Department, Shanghai Public Health Clinical Center, Shanghai 201508, China
- Xinyu Zhang
  - 1. Scientific Research Department, Shanghai Public Health Clinical Center, Shanghai 201508, China
- Di Tian
  - 1. Scientific Research Department, Shanghai Public Health Clinical Center, Shanghai 201508, China
- Yixin Liao
  - 1. Scientific Research Department, Shanghai Public Health Clinical Center, Shanghai 201508, China
- Zhigang Yi
  - 1. Scientific Research Department, Shanghai Public Health Clinical Center, Shanghai 201508, China
- Hongzhou Lu
  - 3. Department of Infection and Immunology, Shanghai Public Health Clinical Center, Shanghai 201508, China