1
Anthoons B, Veltman MA, Tsiftsis S, Gravendeel B, Drouzas AD, de Boer H, Madesis P. Exploring the potential of Angiosperms353 markers for species identification of Eastern Mediterranean orchids. Mol Phylogenet Evol 2025; 209:108360. [PMID: 40288704 DOI: 10.1016/j.ympev.2025.108360] [Received: 10/02/2024] [Revised: 03/26/2025] [Accepted: 04/22/2025] [Indexed: 04/29/2025]
Abstract
Tuberous orchids are ecologically vulnerable species, threatened by a range of environmental pressures such as overharvesting, grazing and land use change. Conservation efforts require accurate species identification, but are impeded by limited phylogenetic resolution of traditional genetic markers, which is exacerbated by widespread taxonomic conflict regarding the classification of orchids. Target enrichment holds promise to resolve both these challenges by offering a large set of nuclear loci with which to increase phylogenetic resolution and evaluate competing species models. Here, we evaluate the effectiveness of the Angiosperms353 markers for distinguishing over 50 tuberous orchid species native to Greece and we explore the possibility of narrowing these markers to a smaller set that could function as a minimal probe set. Our methodology consists of a three-tiered approach: 1) generating a species-level phylogeny using all Angiosperms353 loci with sufficient target recovery, 2) evaluating competing species models based on "splitter" and "lumper" classifications through Bayes Factor species delimitation, and 3) ranking the potential of Angiosperms353 loci to discriminate representatives of lineages with different divergence times based on their phylogenetic informativeness. While the inferred multi-species coalescent phylogeny had overall high support, Bayes Factor delimitation revealed mixed outcomes, favouring splitting in Serapias, while favouring splitting in basal clades and lumping in more recently diverged clades in Ophrys. A molecular clock analysis of Ophrys confirms rapid and recent radiation in clades marked by phylogenetic uncertainty, suggesting the need for additional loci to fully resolve this genus. Finally, we found 30 loci to be highly phylogenetically informative across four epochs of Orchidinae evolution; we suggest these are promising candidates for future marker development. 
Our findings enhance the Plant Tree of Life (PAFTOL) by contributing additional phylogenomic data for species that were previously underrepresented in trees built with these markers, while shedding light on the ongoing "splitter"-vs-"lumper" debate and offering new directions for species identification of tuberous orchids, a group with distinct taxonomic and conservation challenges.
Affiliation(s)
- Bastien Anthoons
  - Lab. of Systematic Botany and Phytogeography, School of Biology, P.O. Box: 104, Aristotle University of Thessaloniki GR-54124 Thessaloniki, Greece; Institute of Applied Biosciences, CERTH, 6th km Charilaou-Thermis Road, Thermi, GR-57001 Thessaloniki, Greece
- Margaretha A Veltman
  - Natural History Museum, University of Oslo, Postboks 1172, Blindern, 0318 Oslo, Norway; Naturalis Biodiversity Center, Darwinweg 2, 2333 CR Leiden, the Netherlands
- Spyros Tsiftsis
  - Department of Forest and Natural Environment Sciences, Democritus University of Thrace, Drama GR-66132, Greece
- Barbara Gravendeel
  - Naturalis Biodiversity Center, Darwinweg 2, 2333 CR Leiden, the Netherlands; Radboud Institute of Environmental and Biological Sciences, Heyendaalseweg 135, 6500 GL Nijmegen, the Netherlands
- Andreas D Drouzas
  - Lab. of Systematic Botany and Phytogeography, School of Biology, P.O. Box: 104, Aristotle University of Thessaloniki GR-54124 Thessaloniki, Greece
- Hugo de Boer
  - Natural History Museum, University of Oslo, Postboks 1172, Blindern, 0318 Oslo, Norway
- Panagiotis Madesis
  - Institute of Applied Biosciences, CERTH, 6th km Charilaou-Thermis Road, Thermi, GR-57001 Thessaloniki, Greece; Laboratory of Molecular Biology of Plants, School of Agricultural Sciences, University of Thessaly GR-38446 Thessaly, Greece
2
Soares LS, Stehmann JR, Freitas LB. The Genus Petunia (Solanaceae): Evolutionary Synthesis and Taxonomic Review. PLANTS (BASEL, SWITZERLAND) 2025; 14:1478. [PMID: 40431043 PMCID: PMC12115208 DOI: 10.3390/plants14101478] [Received: 03/12/2025] [Revised: 05/09/2025] [Accepted: 05/10/2025] [Indexed: 05/29/2025]
Abstract
Many plant groups exhibit complex evolutionary processes, including hybridization, incomplete lineage sorting, and variable evolutionary rates, which make species delimitation challenging. Molecular data have been essential for studying such groups, including Petunia, where local adaptation, allopatric speciation, pollinator interactions, and hybridization shape diversity and population structure. In this study, we produced the first broadly inclusive phylogenetic tree of Petunia using high-throughput DNA sequence data generated by genome complexity reduction-based sequencing (DArT), and incorporating all currently accepted taxa. Additionally, we reviewed previously published phylogenetic and phylogeographic studies on these species to support the taxonomic revision. Phylogenetic analyses based on SNPs were largely congruent, revealing two well-supported clades divided by corolla tube length, consistent with previous studies. These clades likely originated and diversified during the Pleistocene. The phylogenetic trees provided strong support for taxonomic changes, resolving long-standing uncertainties. We recognize P. axillaris, P. parodii, and P. subandina as independent species, elevate P. integrifolia subsp. depauperata to P. dichotoma Sendtn., and resurrect P. guarapuavensis. Additionally, our results highlighted unsolved questions regarding the evolutionary history of the short corolla tube clade, suggesting the need for further investigation into its diversification and genetic structure.
Affiliation(s)
- Luana S. Soares
  - Department of Genetics, Universidade Federal do Rio Grande do Sul, Porto Alegre 90509-900, Brazil
- João R. Stehmann
  - Department of Botany, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Loreta B. Freitas
  - Department of Genetics, Universidade Federal do Rio Grande do Sul, Porto Alegre 90509-900, Brazil
3
McLamb F, Vazquez A, Olander N, Vasquez M, Feng Z, Malhotra N, Bozinovic L, Najera Ruiz K, O'Connell K, Stagg J, Bozinovic G. Comparative Three-Barcode Phylogenetics and Soil Microbiomes of Planted and Wild Arbutus Strawberry Trees. PLANT DIRECT 2025; 9:e70078. [PMID: 40343328 PMCID: PMC12059276 DOI: 10.1002/pld3.70078] [Received: 03/26/2025] [Revised: 04/26/2025] [Accepted: 04/28/2025] [Indexed: 05/11/2025]
Abstract
Taxonomic identification of closely related plants can be challenging due to convergent evolution, hybridization, and overlapping geographic distribution. To derive taxonomic relationships among planted and wild Arbutus plants across a large geographic range, we complemented three standard plastid barcodes rbcL, matK, and trnH-psbA with soil and fruit chemistry, soil microbiome, and plant morphology analyses. Soil and plant sampling included planted Arbutus from manicured sites in Southern California, USA, wild plants from Southern and Northern California, and wild populations from Mediterranean island of Hvar, Croatia. We hypothesized that phenotypic variation within and between sites correlates with plants' genotype and geographic distribution. Similar fruit chemistry corresponds to geographical proximity and morphological resemblance, while bulk soil bacterial content defines three distinct clusters distinguishing planted versus wild trees and continent of origin. The soil microbiome of wild California Arbutus was characterized by an abundance of Nitrobacter, while the presence of Candidatus Xiphinematobacter was high in wild Hvar samples and most planted samples, but low in all wild California samples. Although all three barcodes resolved four main groups, the position of samples varies across barcodes. The rbcL phylogram is relatively unbalanced, suggesting slower diversification among wild California populations and exhibiting greater resolution than other barcodes among planted individuals. While our data demonstrate an overall agreement among standard plant barcodes relative to geo-distribution and plant morphology, sustained efforts on cost-effective global plant DNA barcode library standardization for closely related and geographically overlapping plants is recommended.
Affiliation(s)
- Flannery McLamb
  - Boz Life Science Research and Teaching Institute, La Jolla, California, USA
  - Division of Extended Studies, University of California San Diego, La Jolla, California, USA
- Armando Vazquez
  - Boz Life Science Research and Teaching Institute, La Jolla, California, USA
  - Division of Extended Studies, University of California San Diego, La Jolla, California, USA
- Natalie Olander
  - Boz Life Science Research and Teaching Institute, La Jolla, California, USA
  - Hope College, Holland, Michigan, USA
- Miguel F. Vasquez
  - Boz Life Science Research and Teaching Institute, La Jolla, California, USA
  - Division of Extended Studies, University of California San Diego, La Jolla, California, USA
- Zuying Feng
  - Boz Life Science Research and Teaching Institute, La Jolla, California, USA
- Niharika Malhotra
  - Boz Life Science Research and Teaching Institute, La Jolla, California, USA
  - Division of Extended Studies, University of California San Diego, La Jolla, California, USA
- Liisa Bozinovic
  - Boz Life Science Research and Teaching Institute, La Jolla, California, USA
  - Oregon Bioscience Association, Portland, Oregon, USA
- Karen Najera Ruiz
  - Boz Life Science Research and Teaching Institute, La Jolla, California, USA
  - Division of Extended Studies, University of California San Diego, La Jolla, California, USA
- Katherine O'Connell
  - Boz Life Science Research and Teaching Institute, La Jolla, California, USA
  - Division of Extended Studies, University of California San Diego, La Jolla, California, USA
  - Bowdoin College, Brunswick, Maine, USA
- Joseph Stagg
  - Boz Life Science Research and Teaching Institute, La Jolla, California, USA
  - Division of Extended Studies, University of California San Diego, La Jolla, California, USA
- Goran Bozinovic
  - Boz Life Science Research and Teaching Institute, La Jolla, California, USA
  - Portland State University, Portland, Oregon, USA
  - Pharos International Institute for Science, Arts and Culture, Stari Grad, Croatia
  - School of Biological Sciences, University of California San Diego, La Jolla, California, USA
4
Lakshman AH, Wright ES. EvoWeaver: large-scale prediction of gene functional associations from coevolutionary signals. Nat Commun 2025; 16:3878. [PMID: 40274827 PMCID: PMC12022180 DOI: 10.1038/s41467-025-59175-6] [Received: 02/05/2025] [Accepted: 04/09/2025] [Indexed: 04/26/2025]
Abstract
The known universe of uncharacterized proteins is expanding far faster than our ability to annotate their functions through laboratory study. Computational annotation approaches rely on similarity to previously studied proteins, thereby ignoring unstudied proteins. Coevolutionary approaches hold promise for injecting new information into our knowledge of the protein universe by linking proteins through 'guilt-by-association'. However, existing coevolutionary algorithms have insufficient accuracy and scalability to connect the entire universe of proteins. We present EvoWeaver, a method that weaves together 12 signals of coevolution to quantify the degree of shared evolution between genes. EvoWeaver accurately identifies proteins involved in protein complexes or separate steps of a biochemical pathway. We show the merits of EvoWeaver by partly reconstructing known biochemical pathways without any prior knowledge other than that available from genomic sequences. Applying EvoWeaver to 1545 gene groups from 8564 genomes reveals missing connections in popular databases and potentially undiscovered links between proteins.
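Illustrative note: the abstract does not list the 12 coevolutionary signals, so the sketch below shows only one classic signal of this kind, phylogenetic profiling, in which genes that are gained and lost together across genomes are scored as likely partners. The function name, the Jaccard scoring, and the toy profiles are assumptions made for illustration and are not taken from EvoWeaver.

```python
def profile_jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles (one entry per genome)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Hypothetical presence (1) / absence (0) of three genes across six genomes.
gene_x = [1, 1, 0, 0, 1, 0]
gene_y = [1, 1, 0, 0, 1, 1]  # co-occurs with gene_x in most genomes
gene_z = [0, 0, 1, 1, 0, 1]  # never co-occurs with gene_x

print(profile_jaccard(gene_x, gene_y))  # 0.75
print(profile_jaccard(gene_x, gene_z))  # 0.0
```

A high score suggests shared evolutionary history but not a specific mechanism, which is presumably one reason the abstract describes combining several signals rather than relying on one.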
Affiliation(s)
- Aidan H Lakshman
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Erik S Wright
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
  - Center for Evolutionary Biology and Medicine, Pittsburgh, PA, USA
5
Tan B, Li S, Wang M, Li SC. CeiTEA: Adaptive Hierarchy of Single Cells with Topological Entropy. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2025:e2503539. [PMID: 40245302 DOI: 10.1002/advs.202503539] [Received: 02/24/2025] [Indexed: 04/19/2025]
Abstract
Advances in single-cell RNA sequencing (scRNA-seq) enable detailed analysis of cellular heterogeneity, but existing clustering methods often fail to capture the complex hierarchical structures of cell types and subtypes. CeiTEA is introduced, a novel algorithm for adaptive hierarchical clustering based on topological entropy (TE), designed to address this challenge. CeiTEA constructs a multi-nary partition tree that optimally represents relationships and diversity among cell types by minimizing TE. This method combines a bottom-up strategy for hierarchy construction with a top-down strategy for local diversification, facilitating the identification of smaller hierarchical structures within subtrees. CeiTEA is evaluated on both simulated and real-world scRNA-seq datasets, demonstrating superior clustering performance compared to state-of-the-art tools like Louvain, Leiden, K-means, and SEAT. In simulated multi-layer datasets, CeiTEA demonstrated superior performance in retrieving hierarchies with a lower average clustering information distance of 0.15, compared to 0.39 from SEAT and 0.67 from traditional hierarchical clustering methods. On real datasets, the CeiTEA hierarchy reflects the developmental potency of various cell populations, validated by gene ontology enrichment, cell-cell interaction, and pseudo-time analysis. These findings highlight CeiTEA's potential as a powerful tool for understanding complex relationships in single-cell data, with applications in tumor heterogeneity and tissue specification.
Affiliation(s)
- Bowen Tan
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Shiying Li
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Mengbo Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Shuai Cheng Li
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
6
Goloboff PA, Morales ME. On the effect of measures for comparing trees on the representation of treespace. Cladistics 2025. [PMID: 40186568 DOI: 10.1111/cla.12614] [Received: 10/25/2024] [Revised: 03/05/2025] [Accepted: 03/19/2025] [Indexed: 04/07/2025]
Abstract
The multidimensional space determined by distances between trees (as measured with various methods) is often reduced and projected with multidimensional scaling to visually represent the differences between trees in the possible "treespace". This paper discusses the influence of 18 alternative measures of distance on mapping the treespace for all possible trees (or a large sample thereof) when trees of different degrees of resolution are included. Measures of distance appropriate for such mapping are expected to produce (hyper)spherical mappings, with resolved trees in the outer layer, less resolved trees in inner layers and the bush situated in the middle of the diagram. Measures of tree comparison that rescale the values by the observed (rather than potential) resolution produce an inversion of such an arrangement, with less resolved trees in the outer layers. Additionally, some measures are shown to be strongly influenced by tree shape, so that trees of certain shapes end up being situated at specific depths of the diagram (which may become so distorted as to not even look like a hypersphere).
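Illustrative note: the abstract does not name the 18 measures, so the sketch below assumes a Robinson-Foulds-style count of unshared clades on rooted trees, with each tree represented as a set of non-trivial clades. It contrasts rescaling by the resolution actually observed in the two trees with rescaling by the potential resolution of a fully resolved tree. All function names and the five-taxon example are hypothetical.

```python
def rf_distance(clades_a, clades_b):
    """Count clades found in exactly one of the two trees."""
    return len(clades_a ^ clades_b)


def rescaled_by_observed(clades_a, clades_b):
    """Divide by the number of clades the two trees actually contain."""
    observed = len(clades_a) + len(clades_b)
    return rf_distance(clades_a, clades_b) / observed if observed else 0.0


def rescaled_by_potential(clades_a, clades_b, n_taxa):
    """Divide by the maximum for two fully resolved rooted trees.

    A fully resolved rooted tree on n taxa has n - 2 non-trivial clades.
    """
    return rf_distance(clades_a, clades_b) / (2 * (n_taxa - 2))


# Five taxa A-E. Non-trivial clades of ((A,B),(C,(D,E))), a partly resolved
# tree, and the completely unresolved bush.
resolved = {frozenset("AB"), frozenset("DE"), frozenset("CDE")}
partial = {frozenset("AB")}
bush = set()

print(rescaled_by_observed(resolved, bush))      # 1.0
print(rescaled_by_observed(partial, bush))       # 1.0
print(rescaled_by_potential(resolved, bush, 5))  # 0.5
```

Under this assumed measure, rescaling by observed resolution places the bush at the maximum distance from every tree that has any clade at all, so it cannot sit at the centre of the map. That is one plausible way to read the inversion the abstract describes, not a reconstruction of the paper's analysis.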
Affiliation(s)
- Pablo A Goloboff
  - Unidad Ejecutora Lillo (UEL, Fundación Miguel Lillo-CONICET), Miguel Lillo 251, 4000, San Miguel de Tucumán, Argentina
- Martín E Morales
  - Unidad Ejecutora Lillo (UEL, Fundación Miguel Lillo-CONICET), Miguel Lillo 251, 4000, San Miguel de Tucumán, Argentina
7
Tagliacollo VA, de Pinna M, Chuctaya J, Datovo A. Accuracy of phylogenetic reconstructions from continuous characters analysed under parsimony and its parametric correlates. Cladistics 2025; 41:212-222. [PMID: 39915925 DOI: 10.1111/cla.12606] [Received: 01/26/2024] [Revised: 12/18/2024] [Accepted: 12/27/2024] [Indexed: 03/11/2025]
Abstract
Quantitative traits are a source of evolutionary information often difficult to handle in cladistics. Tools exist to analyse this kind of data without subjective discretization, avoiding biases in the delimitation of categorical states. Nonetheless, our ability to accurately infer relationships from continuous characters is incompletely understood, particularly under parsimony analysis. This study evaluates the accuracy of phylogenetic reconstructions from simulated matrices of continuous characters evolving under alternative evolutionary processes and analysed by parsimony. We sampled 100 empirical trees to simulate 9000 matrices, each containing between 25 and 50 taxa, and 50 and 150 continuous characters evolving under three evolutionary processes: Brownian motion, Ornstein-Uhlenbeck and early burst with variable parametrizations. Our cladogram comparisons revealed that continuous character matrices, when properly coded and analysed by parsimony in TNT, carry phylogenetic signals from which species relationships can be inferred, regardless of the evolutionary models and parameterization schemes. Interestingly, implementing equal weighting or implied weighting with varying penalization strengths against homoplasies did not affect cladogram reconstructions based on continuous characters. Finally, the accuracy of continuous characters in resolving species relationships is skewed towards apical nodes of the recovered trees. Our findings provide general insights of the utility of quantitative traits in cladistics and demonstrate that their effectiveness in estimating shallower nodes is independent of the underlying evolutionary model, parameters and weighting schemes.
Affiliation(s)
- Victor A Tagliacollo
  - Instituto de Biologia, Universidade Federal de Uberlândia, Rua Ceará - S/N, Umuarama, Minas Gerais, Brazil
- Mario de Pinna
  - Museu de Zoologia da Universidade de São Paulo, Avenida Nazaré 481, Ipiranga, São Paulo, Brazil
- Junior Chuctaya
  - Instituto de Biologia, Universidade Federal de Uberlândia, Rua Ceará - S/N, Umuarama, Minas Gerais, Brazil
- Alessio Datovo
  - Museu de Zoologia da Universidade de São Paulo, Avenida Nazaré 481, Ipiranga, São Paulo, Brazil
8
Derelle R, Madon K, Hellewell J, Rodríguez-Bouza V, Arinaminpathy N, Lalvani A, Croucher NJ, Harris SR, Lees JA, Chindelevitch L. Reference-Free Variant Calling with Local Graph Construction with ska lo (SKA). Mol Biol Evol 2025; 42:msaf077. [PMID: 40171940 PMCID: PMC11986325 DOI: 10.1093/molbev/msaf077] [Received: 10/01/2024] [Revised: 02/20/2025] [Accepted: 03/20/2025] [Indexed: 04/04/2025]
Abstract
The study of genomic variants is increasingly important for public health surveillance of pathogens. Traditional variant-calling methods from whole-genome sequencing data rely on reference-based alignment, which can introduce biases and require significant computational resources. Alignment- and reference-free approaches offer an alternative by leveraging k-mer-based methods, but existing implementations often suffer from sensitivity limitations, particularly in high mutation density genomic regions. Here, we present ska lo, a graph-based algorithm that aims to identify within-strain variants in pathogen whole-genome sequencing data by traversing a colored De Bruijn graph and building variant groups (i.e. sets of variant combinations). Through in silico benchmarking and real-world dataset analyses, we demonstrate that ska lo achieves high sensitivity in single-nucleotide polymorphism (SNP) calls while also enabling the detection of insertions and deletions, as well as SNP positioning on a reference genome for recombination analyses. These findings highlight ska lo as a simple, fast, and effective tool for pathogen genomic epidemiology, extending the range of reference-free variant-calling approaches. ska lo is freely available as part of the SKA program (https://github.com/bacpop/ska.rust).
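Illustrative note: the sketch below is not the ska lo graph algorithm. It shows only the simpler split k-mer idea that k-mer-based, reference-free SNP calling builds on: two flanking sequences are indexed together, and a candidate SNP is reported when two samples share the flanks but differ at the middle base. The function names and toy sequences are hypothetical, and reverse complements are ignored for brevity.

```python
def split_kmers(seq, flank=3):
    """Map (left flank, right flank) -> middle base.

    A value of None marks flanks seen with conflicting middle bases in one sample.
    """
    table = {}
    width = 2 * flank + 1
    for i in range(len(seq) - width + 1):
        key = (seq[i:i + flank], seq[i + flank + 1:i + width])
        middle = seq[i + flank]
        if key in table and table[key] != middle:
            table[key] = None  # ambiguous within this sample
        else:
            table[key] = middle
    return table


def candidate_snps(seq_a, seq_b, flank=3):
    """List (left flank, base in A, base in B, right flank) for shared flanks."""
    a, b = split_kmers(seq_a, flank), split_kmers(seq_b, flank)
    snps = []
    for key in a.keys() & b.keys():
        base_a, base_b = a[key], b[key]
        if base_a is not None and base_b is not None and base_a != base_b:
            snps.append((key[0], base_a, base_b, key[1]))
    return sorted(snps)


sample_a = "ACGTTGCAAGT"
sample_b = "ACGTTACAAGT"  # differs from sample_a at one position (G -> A)
print(candidate_snps(sample_a, sample_b))  # [('GTT', 'G', 'A', 'CAA')]
```

In this simplified scheme a second variant falling inside either flank would change the key and hide both variants, which gives some intuition for the sensitivity limits in high mutation density regions mentioned in the abstract.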
Affiliation(s)
- Romain Derelle
  - NIHR Health Protection Research Unit in Respiratory Infections, National Heart and Lung Institute, Imperial College London, London W2 1PG, UK
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London W12 0BZ, UK
- Kieran Madon
  - NIHR Health Protection Research Unit in Respiratory Infections, National Heart and Lung Institute, Imperial College London, London W2 1PG, UK
- Joel Hellewell
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Víctor Rodríguez-Bouza
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Nimalan Arinaminpathy
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London W12 0BZ, UK
- Ajit Lalvani
  - NIHR Health Protection Research Unit in Respiratory Infections, National Heart and Lung Institute, Imperial College London, London W2 1PG, UK
- Nicholas J Croucher
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London W12 0BZ, UK
- Simon R Harris
  - Bill and Melinda Gates Foundation, 62 Buckingham Gate, Westminster, London SW1E 6AJ, UK
- John A Lees
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Leonid Chindelevitch
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London W12 0BZ, UK
9
Rossi SH, Dombrowe V, Godfrey L, Bucaciuc Mracica T, Pita S, Milne-Clark T, Kyeremeh J, Park G, Smith CG, Lach RP, Babbage A, Warren AY, Mitchell TJ, Stewart GD, Schwarz R, Massie CE. Evidence of DNA methylation heterogeneity and epipolymorphism in kidney cancer tissue samples. Oncogene 2025; 44:1024-1036. [PMID: 39824946 PMCID: PMC11976292 DOI: 10.1038/s41388-024-03270-3] [Received: 06/19/2024] [Revised: 12/04/2024] [Accepted: 12/27/2024] [Indexed: 01/20/2025]
Abstract
Clear cell renal cell carcinoma (ccRCC) is characterised by significant genetic heterogeneity, which has diagnostic and prognostic implications. Very limited evidence is available regarding DNA methylation heterogeneity. We therefore generate sequence level DNA methylation data on 136 multi-region tumour and normal kidney tissue from 18 ccRCC patients, along with matched whole exome sequencing (85 samples) and gene expression (47 samples) data on a subset of samples. We perform a comprehensive systematic analysis of heterogeneity between patients, within a patient and within a sample. We demonstrate that bulk methylation data may be deconvoluted into cell-type-specific latent methylation components (LMCs), and that LMC1, which is likely to represent T cells, is associated with prognostic parameters. Differential epipolymorphism was noted between ccRCC and normal tissue in the promoter region of genes which are known to be associated with kidney cancer. This was externally validated in an independent cohort of 71 ccRCC and normal kidney tissues. Differential epipolymorphism in the gene promoter was a predictor of gene expression, after adjusting for average methylation. This represents the first evaluation of epipolymorphism in ccRCC and suggests that gains and losses in methylation disorder may have a functional relevance, gleaning important information on tumourigenesis.
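Illustrative note: the abstract does not define epipolymorphism. The sketch below assumes the commonly used definition, one minus the sum of squared frequencies of the distinct methylation patterns (epialleles) seen across sequencing reads in a short window of CpG sites. The function name, the four-CpG window, and the reads are hypothetical.

```python
from collections import Counter


def epipolymorphism(patterns):
    """Return 1 - sum(p_i ** 2) over the distinct read patterns in one window."""
    if not patterns:
        raise ValueError("need at least one read pattern")
    counts = Counter(patterns)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical reads over four CpG sites: 1 = methylated, 0 = unmethylated.
# Both windows have an average methylation of 0.5.
two_state_reads = ["1111", "0000"] * 4
disordered_reads = ["1111", "0000", "1010", "0101"] * 2

print(epipolymorphism(two_state_reads))   # 0.5
print(epipolymorphism(disordered_reads))  # 0.75
```

The two windows share the same average methylation but differ in pattern diversity, which illustrates why a disorder measure can carry information beyond average methylation, as the abstract's adjusted analysis implies.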
Affiliation(s)
- Sabrina H Rossi
  - Early Cancer Institute, Cancer Research UK Cambridge Centre, Cambridge Biomedical Campus, Cambridge, UK
  - Department of Surgery, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
  - CRUK Cambridge Centre, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Victoria Dombrowe
  - Institute for Computational Cancer Biology (ICCB), Centre for Integrated Oncology (CIO), Cancer Research Centre Cologne Essen (CCCE), Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
  - BIFOLD-Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Laura Godfrey
  - Institute for Computational Cancer Biology (ICCB), Centre for Integrated Oncology (CIO), Cancer Research Centre Cologne Essen (CCCE), Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Teodora Bucaciuc Mracica
  - Institute for Computational Cancer Biology (ICCB), Centre for Integrated Oncology (CIO), Cancer Research Centre Cologne Essen (CCCE), Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Sara Pita
  - Early Cancer Institute, Cancer Research UK Cambridge Centre, Cambridge Biomedical Campus, Cambridge, UK
  - CRUK Cambridge Centre, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Toby Milne-Clark
  - Early Cancer Institute, Cancer Research UK Cambridge Centre, Cambridge Biomedical Campus, Cambridge, UK
  - CRUK Cambridge Centre, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Justicia Kyeremeh
  - Department of Surgery, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Gahee Park
  - Early Cancer Institute, Cancer Research UK Cambridge Centre, Cambridge Biomedical Campus, Cambridge, UK
  - CRUK Cambridge Centre, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Christopher G Smith
  - Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Radoslaw P Lach
  - Early Cancer Institute, Cancer Research UK Cambridge Centre, Cambridge Biomedical Campus, Cambridge, UK
  - CRUK Cambridge Centre, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Anne Babbage
  - Early Cancer Institute, Cancer Research UK Cambridge Centre, Cambridge Biomedical Campus, Cambridge, UK
  - CRUK Cambridge Centre, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Anne Y Warren
  - CRUK Cambridge Centre, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
  - Department of Histopathology, University of Cambridge, Addenbrooke's Hospital, Cambridge Biomedical Campus, Cambridge, UK
- Thomas J Mitchell
  - Early Cancer Institute, Cancer Research UK Cambridge Centre, Cambridge Biomedical Campus, Cambridge, UK
  - Department of Surgery, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
  - CRUK Cambridge Centre, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Grant D Stewart
  - Department of Surgery, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
  - CRUK Cambridge Centre, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
- Roland Schwarz
  - Institute for Computational Cancer Biology (ICCB), Centre for Integrated Oncology (CIO), Cancer Research Centre Cologne Essen (CCCE), Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
  - BIFOLD-Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Charlie E Massie
  - Early Cancer Institute, Cancer Research UK Cambridge Centre, Cambridge Biomedical Campus, Cambridge, UK
  - CRUK Cambridge Centre, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
10
McHugh MP, Horsfield ST, von Wachsmann J, Toussaint J, Pettigrew KA, Czarniak E, Evans TJ, Leanord A, Tysall L, Gillespie SH, Templeton KE, Holden MTG, Croucher NJ, Lees JA. Integrated population clustering and genomic epidemiology with PopPIPE. Microb Genom 2025; 11:001404. [PMID: 40294103 PMCID: PMC12038005 DOI: 10.1099/mgen.0.001404] [Received: 12/13/2024] [Accepted: 03/22/2025] [Indexed: 04/30/2025]
Abstract
Genetic distances between bacterial DNA sequences can be used to cluster populations into closely related subpopulations and as an additional source of information when detecting possible transmission events. Due to their variable gene content and order, reference-free methods offer more sensitive detection of genetic differences, especially among closely related samples found in outbreaks. However, across longer genetic distances, frequent recombination can make calculation and interpretation of these differences more challenging, requiring significant bioinformatic expertise and manual intervention during the analysis process. Here, we present a Population analysis PIPEline (PopPIPE) which combines rapid reference-free genome analysis methods to analyse bacterial genomes across these two scales, splitting whole populations into subclusters and detecting plausible transmission events within closely related clusters. We use k-mer sketching to split populations into strains, followed by split k-mer analysis and recombination removal to create alignments and subclusters within these strains. We first show that this approach creates high-quality subclusters on a population-wide dataset of Streptococcus pneumoniae. When applied to nosocomial vancomycin-resistant Enterococcus faecium samples, PopPIPE finds transmission clusters that are more epidemiologically plausible than core genome or multilocus sequence typing (MLST) approaches. Our pipeline is rapid and reproducible, creates interactive visualizations and can easily be reconfigured and re-run on new datasets. Therefore, PopPIPE provides a user-friendly pipeline for analyses spanning species-wide clustering to outbreak investigations.
Affiliation(s)
- Martin P. McHugh
  - Medical Microbiology, Department of Laboratory Medicine, Royal Infirmary of Edinburgh, Edinburgh EH16 4SA, UK
  - Division of Infection and Global Health, University of St Andrews, St Andrews KY16 9AJ, UK
- Samuel T. Horsfield
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Hinxton CB10 1SD, UK
- Johanna von Wachsmann
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Hinxton CB10 1SD, UK
- Jacqueline Toussaint
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Hinxton CB10 1SD, UK
- Kerry A. Pettigrew
  - Medical Microbiology, Department of Laboratory Medicine, Royal Infirmary of Edinburgh, Edinburgh EH16 4SA, UK
  - Division of Infection and Global Health, University of St Andrews, St Andrews KY16 9AJ, UK
- Elzbieta Czarniak
  - Medical Microbiology, Department of Laboratory Medicine, Royal Infirmary of Edinburgh, Edinburgh EH16 4SA, UK
- Thomas J. Evans
  - School of Infection and Immunity, University of Glasgow, Glasgow G12 8QQ, UK
- Alistair Leanord
  - School of Infection and Immunity, University of Glasgow, Glasgow G12 8QQ, UK
  - Scottish Microbiology Reference Laboratories, Glasgow Royal Infirmary, Glasgow G4 0SF, UK
- Luke Tysall
  - Medical Microbiology, Department of Laboratory Medicine, Royal Infirmary of Edinburgh, Edinburgh EH16 4SA, UK
- Stephen H. Gillespie
  - Division of Infection and Global Health, University of St Andrews, St Andrews KY16 9AJ, UK
- Kate E. Templeton
  - Medical Microbiology, Department of Laboratory Medicine, Royal Infirmary of Edinburgh, Edinburgh EH16 4SA, UK
- Matthew T. G. Holden
  - Division of Infection and Global Health, University of St Andrews, St Andrews KY16 9AJ, UK
- Nicholas J. Croucher
  - MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, London SW7 2AZ, UK
- John A. Lees
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Hinxton CB10 1SD, UK

11
Lin D, Shao B, Gao Z, Li J, Li Z, Li T, Huang W, Zhong X, Xu C, Chase MW, Jin X. Phylogenomics of angiosperms based on mitochondrial genes: insights into deep node relationships. BMC Biol 2025; 23:45. [PMID: 39948594 PMCID: PMC11827323 DOI: 10.1186/s12915-025-02135-9] [Received: 06/05/2024] [Accepted: 01/17/2025] [Indexed: 02/16/2025]
Abstract
BACKGROUND Angiosperms are the largest plant group and play an essential role in the biosphere. Phylogenetic relationships of many families and orders remain contentious, and, in an attempt to address these, we performed the most extensive sampling of mitochondrial genes to date. RESULTS We reconstructed a seed plant phylogenetic framework based on 41 mitochondrial protein-coding sequences (mtCDSs), representing 335 families and 63 orders with 481 angiosperm species. The results for major clades of angiosperms produced moderate to strong support (>70% bootstrap) for more than 80% of nodes and strong support for most orders. Eight major nodes were supported, including the three paraphyletic ANA orders (Amborellales, Nymphaeales, and Austrobaileyales) and five major core-angiosperm clades. Chloranthales and Ceratophyllales are sister to the eudicots, whereas the monocots are sister to the magnoliids. Although well-supported, relationships within the asterids and rosids were in some cases unresolved or weakly supported, due to the low levels of variability detected in these genes. CONCLUSIONS Our results indicated that mitochondrial genomic data were effective at resolving deep node relationships of angiosperm phylogeny and thus represent an important resource for phylogenetics and evolutionary studies of angiosperms.
Affiliation(s)
- Dongliang Lin
  - State Key Laboratory of Plant Diversity and Speciality Crops, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
  - University of Chinese Academy of Sciences, Beijing, China
  - China National Botanical Garden, Beijing, China
- Bingyi Shao
  - State Key Laboratory of Plant Diversity and Speciality Crops, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
- Zhiyuan Gao
  - State Key Laboratory of Plant Diversity and Speciality Crops, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
- Jianwu Li
  - Center for Integrative Conservation, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Yunnan, 666303, China
- Zhanghai Li
  - State Key Laboratory of Plant Diversity and Speciality Crops, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
- Tingyu Li
  - University of Chinese Academy of Sciences, Beijing, China
  - Key Laboratory of Economic Plants and Biotechnology, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, Yunnan, China
- Weichang Huang
  - Shanghai Chenshan Botanical Garden, Chenhua Road 3888, Songjiang, Shanghai, 201602, China
- Xin Zhong
  - Shanghai Chenshan Botanical Garden, Chenhua Road 3888, Songjiang, Shanghai, 201602, China
- Chao Xu
  - State Key Laboratory of Plant Diversity and Speciality Crops, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
  - China National Botanical Garden, Beijing, China
- Mark W Chase
  - The Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AB, UK
  - Department of Environment and Agriculture, Curtin University, Bentley, WA, 6102, Australia
- Xiaohua Jin
  - State Key Laboratory of Plant Diversity and Speciality Crops, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
  - China National Botanical Garden, Beijing, China

12
Grass Phylogeny Working Group III. A nuclear phylogenomic tree of grasses (Poaceae) recovers current classification despite gene tree incongruence. New Phytol 2025; 245:818-834. [PMID: 39568153 DOI: 10.1111/nph.20263] [Received: 06/03/2024] [Accepted: 10/10/2024] [Indexed: 11/22/2024]
Abstract
Grasses (Poaceae) comprise c. 11 800 species and are central to human livelihoods and terrestrial ecosystems. Knowing their relationships and evolutionary history is key to comparative research and crop breeding. Advances in genome-scale sequencing allow for increased breadth and depth of phylogenomic analyses, making it possible to infer a new reference species tree of the family. We inferred a comprehensive species tree of grasses by combining new and published sequences for 331 nuclear genes from genome, transcriptome, target enrichment and shotgun data. Our 1153-tip tree covers 79% of grass genera (including 21 genera sequenced for the first time) and all but two small tribes. We compared it to a newly inferred 910-tip plastome tree. We recovered most of the tribes and subfamilies previously established, despite pervasive incongruence among nuclear gene trees. The early diversification of the PACMAD clade could represent a hard polytomy. Gene tree-species tree reconciliation suggests that reticulation events occurred repeatedly. Nuclear-plastome incongruence is rare, with very few cases of supported conflict. We provide a robust framework for the grass tree of life to support research on grass evolution, including modes of reticulation, and genetic diversity for sustainable agriculture.

13
DeCandia AL, Lu J, Hamblen EE, Brenner LJ, King JL, Gagorik CN, Schamel JT, Baker SS, Ferrara FJ, Booker M, Bridges A, Carrasco C, vonHoldt BM, Koepfli KP, Maldonado JE. Phylosymbiosis and Elevated Cancer Risk in Genetically Depauperate Channel Island Foxes. Mol Ecol 2025; 34:e17610. [PMID: 39655703 DOI: 10.1111/mec.17610] [Received: 08/13/2024] [Revised: 11/08/2024] [Accepted: 11/26/2024] [Indexed: 01/07/2025]
Abstract
Examination of the host-associated microbiome in wildlife can provide critical insights into the eco-evolutionary factors driving species diversification and response to disease. This is particularly relevant for isolated populations lacking genomic variation, a phenomenon that is increasingly common as human activities create habitat 'islands' for wildlife. Here, we characterised the gut and otic microbial communities of one such species: Channel Island foxes (Urocyon littoralis). The gut microbiome provided evidence of phylosymbiosis by reflecting the host phylogeny, geographic proximity, history of island colonisation and contemporary ecological differences, whereas the otic microbiome primarily reflected geography and disease. Santa Catalina Island foxes are uniquely predisposed to ceruminous gland tumours following infection with Otodectes cynotis ear mites, while San Clemente and San Nicolas Island foxes exhibit ear mite infections without evidence of tumours. Comparative analyses of otic microbiomes revealed that mite-infected Santa Catalina and San Clemente Island foxes exhibited reduced bacterial diversity, skewed abundance towards the opportunistic pathogen Staphylococcus pseudintermedius and disrupted microbial community networks. However, Santa Catalina Island foxes uniquely harboured Fusobacterium and Prevotella bacteria as potential keystone taxa. These bacteria have previously been associated with colorectal cancer and may predispose Santa Catalina Island foxes to an elevated cancer risk. In contrast, mite-infected San Nicolas Island foxes maintained high bacterial diversity and robust microbial community networks, suggesting that they harbour more resilient microbiomes. Considered together, our results highlight the diverse eco-evolutionary factors influencing commensal microbial communities and their hosts and underscore how the microbiome can contribute to disease outcomes.
Affiliation(s)
- Alexandra L DeCandia
  - Department of Biology, Georgetown University, Washington, DC, USA
  - Center for Conservation Genomics, Smithsonian's National Zoo & Conservation Biology Institute, Washington, DC, USA
- Jasmine Lu
  - Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, USA
- Julie L King
  - Catalina Island Conservancy, Avalon, California, USA
  - Santa Clara Valley Habitat Agency, Morgan Hill, California, USA
- Calypso N Gagorik
  - Department of Biology, Northern Arizona University, Flagstaff, Arizona, USA
- Francesca J Ferrara
  - Environmental Division - Environmental Planning and Conservation Branch, Naval Base Ventura County, Point Mugu, California, USA
- Melissa Booker
  - Environmental Division, Naval Base Coronado, San Diego, California, USA
- Andrew Bridges
  - Institute for Wildlife Studies, San Diego, California, USA
- Cesar Carrasco
  - Center for Conservation Genomics, Smithsonian's National Zoo & Conservation Biology Institute, Washington, DC, USA
- Bridgett M vonHoldt
  - Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, USA
- Klaus-Peter Koepfli
  - Smithsonian-Mason School of Conservation, George Mason University, Front Royal, Virginia, USA
- Jesús E Maldonado
  - Center for Conservation Genomics, Smithsonian's National Zoo & Conservation Biology Institute, Washington, DC, USA

14
Sukumaran J, Meila M. Piikun: an information theoretic toolkit for analysis and visualization of species delimitation metric space. BMC Bioinformatics 2024; 25:385. [PMID: 39695946 DOI: 10.1186/s12859-024-05997-y] [Received: 05/01/2024] [Accepted: 11/21/2024] [Indexed: 12/20/2024]
Abstract
BACKGROUND Existing software for comparison of species delimitation models does not provide a (true) metric or distance function between species delimitation models, nor a way to compare these models in terms of relative clustering differences along a lattice of partitions. RESULTS Piikun is a Python package for analyzing and visualizing species delimitation models in an information theoretic framework that, in addition to classic measures of information such as the entropy and mutual information [1], provides for the calculation of the Variation of Information (VI) criterion [2], a true metric or distance function for species delimitation models that is aligned with the lattice of partitions. CONCLUSIONS Piikun is available under the MIT license from its public repository (https://github.com/jeetsukumaran/piikun), and can be installed locally using the Python package manager 'pip'.
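The Variation of Information between two partitions X and Y of the same samples is VI(X, Y) = H(X) + H(Y) - 2 I(X; Y), where H is entropy and I is mutual information. The following is a minimal standalone sketch of that formula, assuming each species delimitation model is given as a sequence of cluster labels, one per sample. The function name `variation_of_information` is hypothetical and this is not Piikun's API.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Variation of Information (in nats) between two partitions.

    Each argument assigns one cluster label per sample, in the same
    sample order. Hypothetical helper for illustration only.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same samples")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    vi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        # Equivalent to H(X) + H(Y) - 2 I(X; Y), summed over joint cells.
        vi -= p_ab * (log(p_ab / p_a) + log(p_ab / p_b))
    return vi


if __name__ == "__main__":
    lumper = ["sp1", "sp1", "sp1", "sp1"]
    splitter = ["sp1", "sp1", "sp2", "sp2"]
    print(variation_of_information(lumper, splitter))  # log(2) in nats
```

In the usage example, the "lumper" model has zero entropy and the "splitter" model has entropy log 2 with no shared information beyond that, so the distance is log 2; identical partitions give a distance of 0.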
Affiliation(s)
- Jeet Sukumaran
  - Biology, San Diego State University, San Diego, CA, USA
- Marina Meila
  - Statistics, University of Washington, Seattle, 10587, WA, USA

15
Neu AT, Torchin ME, Allen EE, Roy K. Microbiome divergence of marine gastropod species separated by the Isthmus of Panama. Appl Environ Microbiol 2024; 90:e0100324. [PMID: 39480095 PMCID: PMC11614449 DOI: 10.1128/aem.01003-24] [Received: 05/28/2024] [Accepted: 07/22/2024] [Indexed: 11/02/2024]
Abstract
The rise of the Isthmus of Panama separated the populations of many marine organisms, which then diverged into new geminate sister species currently living in the Eastern Pacific Ocean and the Caribbean Sea. However, we know very little about how such evolutionary divergences of host species have shaped the compositions of their microbiomes. Here, we compared the microbiomes of whole-body and shell-surface samples of geminate species of marine gastropods in the genera Cerithium and Cerithideopsis to those of congeneric outgroups. Our results suggest that the effects of ~3 million years of separation and isolation on microbiome composition varied among host genera and between sample types within the same hosts. In the whole-body samples, microbiome compositions of geminate species pairs tended to be similar, likely due to host filtering, although the strength of this relationship varied between the two groups and across similarity metrics. Shell-surface microbiomes show contrasting patterns, with co-divergence between the host taxa and a small number of microbial clades evident in Cerithideopsis but not Cerithium. These results suggest that (i) isolation of host populations after the rise of the Isthmus of Panama affected microbiomes of geminate hosts in a complex and host-specific manner, and (ii) host-associated microbial taxa respond differently to vicariance events than the hosts themselves. IMPORTANCE While considerable work has been done on evolutionary divergences of marine species in response to the rise of the Isthmus of Panama, which separated two previously connected oceans, how this event shaped the microbiomes of these marine hosts remains poorly known. Using whole-body and shell-surface microbiomes of closely related gastropod species from opposite sides of the Isthmus, we show that divergences of microbial taxa after the formation of the Isthmus are often not concordant with those of their gastropod hosts. Our results show that evolutionary responses of marine gastropod-associated microbiomes to major environmental perturbations are complex and are shaped more by local environments than host evolutionary history.
Affiliation(s)
- Alexander T. Neu
  - Department of Ecology, Behavior and Evolution, School of Biological Sciences, University of California San Diego, La Jolla, California, USA
  - Smithsonian Tropical Research Institute, Ancon, Balboa, Panama
- Mark E. Torchin
  - Smithsonian Tropical Research Institute, Ancon, Balboa, Panama
- Eric E. Allen
  - Department of Molecular Biology, School of Biological Sciences, University of California San Diego, La Jolla, California, USA
  - Marine Biology Research Division, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, USA
- Kaustuv Roy
  - Department of Ecology, Behavior and Evolution, School of Biological Sciences, University of California San Diego, La Jolla, California, USA

16
Beránková T, Arora J, Romero Arias J, Buček A, Tokuda G, Šobotník J, Hellemans S, Bourguignon T. Termites and subsocial roaches inherited many bacterial-borne carbohydrate-active enzymes (CAZymes) from their common ancestor. Commun Biol 2024; 7:1449. [PMID: 39506101 PMCID: PMC11541852 DOI: 10.1038/s42003-024-07146-w] [Received: 01/11/2024] [Accepted: 10/24/2024] [Indexed: 11/08/2024]
Abstract
Termites digest wood using Carbohydrate-Active Enzymes (CAZymes) produced by gut bacteria with whom they have cospeciated at geological timescales. Whether CAZymes were encoded in the genomes of their ancestor's gut bacteria and transmitted to modern termites or acquired more recently from bacteria not associated with termites is unclear. We used gut metagenomes from 195 termites and one Cryptocercus, the sister group of termites, to investigate the evolution of termite gut bacterial CAZymes. We found 420 termite-specific clusters in 81 bacterial CAZyme gene trees, including 404 clusters showing strong cophylogenetic patterns with termites. Of the 420 clusters, 131 included at least one bacterial CAZyme sequence associated with Cryptocercus or Mastotermes, the sister group of all other termites. Our results suggest many bacterial CAZymes have been encoded in the genomes of termite gut bacteria since termite origin, indicating termites rely upon many bacterial CAZymes endemic to their guts to digest wood.
Affiliation(s)
- Tereza Beránková
  - Okinawa Institute of Science & Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa, 904-0495, Japan
  - Faculty of Tropical AgriSciences, Czech University of Life Sciences, Prague, Czech Republic
- Jigyasa Arora
  - Okinawa Institute of Science & Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa, 904-0495, Japan
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
- Johanna Romero Arias
  - Faculty of Tropical AgriSciences, Czech University of Life Sciences, Prague, Czech Republic
- Aleš Buček
  - Okinawa Institute of Science & Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa, 904-0495, Japan
  - Biology Centre of the Czech Academy of Sciences, Institute of Entomology, České Budějovice, Czech Republic
- Gaku Tokuda
  - Tropical Biosphere Research Center, University of the Ryukyus, Okinawa, Japan
- Jan Šobotník
  - Faculty of Tropical AgriSciences, Czech University of Life Sciences, Prague, Czech Republic
  - Biology Centre of the Czech Academy of Sciences, Institute of Entomology, České Budějovice, Czech Republic
- Simon Hellemans
  - Okinawa Institute of Science & Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa, 904-0495, Japan
- Thomas Bourguignon
  - Okinawa Institute of Science & Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa, 904-0495, Japan
  - Faculty of Tropical AgriSciences, Czech University of Life Sciences, Prague, Czech Republic

17
McArthur RN, Zehmakan AN, Charleston MA, Lin Y, Huttley G. Spectral cluster supertree: fast and statistically robust merging of rooted phylogenetic trees. Front Mol Biosci 2024; 11:1432495. [PMID: 39544404 PMCID: PMC11561713 DOI: 10.3389/fmolb.2024.1432495] [Received: 05/14/2024] [Accepted: 09/24/2024] [Indexed: 11/17/2024]
Abstract
The algorithms for phylogenetic reconstruction are central to computational molecular evolution. The relentless pace of data acquisition has exposed their poor scalability and led to the conclusion that the conventional application of these methods is impractical and not justifiable from an energy usage perspective. Furthermore, the drive to improve the statistical performance of phylogenetic methods produces increasingly parameter-rich models of sequence evolution, which worsens the computational performance. Established theoretical and algorithmic results identify supertree methods as critical to divide-and-conquer strategies for improving scalability of phylogenetic reconstruction. Of particular importance is the ability to explicitly accommodate rooted topologies. These can arise from the more biologically plausible non-stationary models of sequence evolution. We make a contribution to addressing this challenge with Spectral Cluster Supertree, a novel supertree method for merging a set of overlapping rooted phylogenetic trees. It offers significant improvements over Min-Cut supertree and previous state-of-the-art methods in terms of both time complexity and overall topological accuracy, particularly for problems of large size. We perform comparisons against Min-Cut supertree and Bad Clade Deletion. Leveraging two tree topology distance metrics, we demonstrate that while Bad Clade Deletion generates more correct clades in its resulting supertree, Spectral Cluster Supertree's generated tree is generally more topologically close to the true model tree. Over large datasets containing 10,000 taxa and ∼500 source trees, where Bad Clade Deletion usually takes ∼2 h to run, our method generates a supertree in 20 s on average. Spectral Cluster Supertree is released under an open source license and is available on the Python Package Index as sc-supertree.
Affiliation(s)
- Robert N. McArthur
  - Research School of Biology, The Australian National University, Canberra, ACT, Australia
- Ahad N. Zehmakan
  - School of Computing, The Australian National University, Canberra, ACT, Australia
- Yu Lin
  - School of Computing, The Australian National University, Canberra, ACT, Australia
- Gavin Huttley
  - Research School of Biology, The Australian National University, Canberra, ACT, Australia

18
Derelle R, von Wachsmann J, Mäklin T, Hellewell J, Russell T, Lalvani A, Chindelevitch L, Croucher NJ, Harris SR, Lees JA. Seamless, rapid, and accurate analyses of outbreak genomic data using split k-mer analysis. Genome Res 2024; 34:1661-1673. [PMID: 39406504 PMCID: PMC11529842 DOI: 10.1101/gr.279449.124] [Received: 04/08/2024] [Accepted: 09/16/2024] [Indexed: 11/01/2024]
Abstract
Sequence variation observed in populations of pathogens can be used for important public health and evolutionary genomic analyses, especially outbreak analysis and transmission reconstruction. Identifying this variation is typically achieved by aligning sequence reads to a reference genome, but this approach is susceptible to reference biases and requires careful filtering of called genotypes. There is a need for tools that can process this growing volume of bacterial genome data, providing rapid results, but that remain simple so they can be used without highly trained bioinformaticians, expensive data analysis, and long-term storage and processing of large files. Here we describe split k-mer analysis (SKA2), a method that supports both reference-free and reference-based mapping to quickly and accurately genotype populations of bacteria using sequencing reads or genome assemblies. SKA2 is highly accurate for closely related samples, and in outbreak simulations, we show superior variant recall compared with reference-based methods, with no false positives. SKA2 can also accurately map variants to a reference and be used with recombination detection methods to rapidly reconstruct vertical evolutionary history. SKA2 is many times faster than comparable methods and can be used to add new genomes to an existing call set, allowing sequential use without the need to reanalyze entire collections. With an inherent absence of reference bias, high accuracy, and a robust implementation, SKA2 has the potential to become the tool of choice for genotyping bacteria. SKA2 is implemented in Rust and is freely available as open-source software.
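A split k-mer is a pair of k-base flanks separated by a single middle base; two samples that share the same flanks but differ in the middle base show a candidate SNP at that position. The following is a toy sketch of that idea, not SKA2's implementation: the function names `split_kmers` and `snp_distance` are hypothetical, k is unrealistically small, and reverse complements and sequencing-error filtering are ignored.

```python
def split_kmers(seq, k=2):
    """Map each (left flank, right flank) pair to the middle base between them.

    Flank pairs seen with more than one middle base within a single
    sequence are dropped as ambiguous. Hypothetical helper for illustration.
    """
    table = {}
    ambiguous = set()
    span = 2 * k + 1
    for i in range(len(seq) - span + 1):
        window = seq[i:i + span]
        flanks = (window[:k], window[k + 1:])
        middle = window[k]
        if flanks in table and table[flanks] != middle:
            ambiguous.add(flanks)
        table[flanks] = middle
    for flanks in ambiguous:
        del table[flanks]
    return table


def snp_distance(seq_a, seq_b, k=2):
    """Count shared flank pairs whose middle bases differ between two samples."""
    a = split_kmers(seq_a, k)
    b = split_kmers(seq_b, k)
    shared = a.keys() & b.keys()
    return sum(1 for flanks in shared if a[flanks] != b[flanks])


if __name__ == "__main__":
    reference_like = "AACCGGTTACGA"
    one_substitution = "AACCGATTACGA"  # G -> A at index 5
    print(snp_distance(reference_like, one_substitution))  # 1
```

No reference genome is needed for the comparison: the two samples are matched directly on their shared flanks, which is what makes the approach reference-free.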
Affiliation(s)
- Romain Derelle
  - NIHR Health Protection Research Unit in Respiratory Infections, National Heart and Lung Institute, Imperial College London, London W21PG, United Kingdom
- Johanna von Wachsmann
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Tommi Mäklin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
  - Department of Mathematics and Statistics, University of Helsinki, Helsinki 00014, Finland
- Joel Hellewell
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Timothy Russell
  - Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, London WC1E 7HT, United Kingdom
- Ajit Lalvani
  - NIHR Health Protection Research Unit in Respiratory Infections, National Heart and Lung Institute, Imperial College London, London W21PG, United Kingdom
- Leonid Chindelevitch
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London W12 0BZ, United Kingdom
- Nicholas J Croucher
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London W12 0BZ, United Kingdom
- Simon R Harris
  - Bill and Melinda Gates Foundation, Westminster, London SW1E 6AJ, United Kingdom
- John A Lees
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom

19
Holvast EJ, Celik MA, Phillips MJ, Wilson LAB. Do morphometric data improve phylogenetic reconstruction? A systematic review and assessment. BMC Ecol Evol 2024; 24:127. [PMID: 39425066 PMCID: PMC11487705 DOI: 10.1186/s12862-024-02313-3] [Received: 05/28/2024] [Accepted: 10/02/2024] [Indexed: 10/21/2024]
Abstract
BACKGROUND Isolating phylogenetic signal from morphological data is crucial for accurately merging fossils into the tree of life and for calibrating molecular dating. However, subjective character definition is a major limitation which can introduce biases that mislead phylogenetic inferences and divergence time estimation. The use of quantitative data, e.g., geometric morphometric (GMM; shape) data, can allow for more objective integration of morphological data into phylogenetic inference. This systematic review describes the current state of the field in using continuous morphometric data (e.g., GMM data) for phylogenetic reconstruction, assesses the efficacy of these data compared to discrete characters using the PRISMA-EcoEvo v1.0 reporting guideline, and offers some pathways for approaching this task with GMM data. A comprehensive search string yielded 11,123 phylogenetic studies published in English up to Oct 2023 in the Web of Science database. Title and abstract screening removed 10,975 articles, and full-text screening was performed for 132 articles. Of these, a total of twelve articles met final inclusion criteria and were used for downstream analyses. RESULTS Phylogenetic performance was compared between approaches that employed continuous morphometric and discrete morphological data. Overall, the reconstructed phylogenies did not show increased resolution or accuracy (i.e., benchmarked against molecular phylogenies) when continuous data were used alone or combined with discrete morphological datasets. CONCLUSIONS An exhaustive search of the literature for existing empirical continuous data resulted in a total of twelve articles for final inclusion following title/abstract and full-text screening. Our study was performed under a rigorous framework for systematic reviews, which showed that the lack of available comparisons between discrete and continuous data hinders our understanding of the performance of continuous data. Our study demonstrates that the problem surrounding the efficacy of continuous data remains relatively intractable despite an exhaustive search, due in part to the difficulty in obtaining relevant comparisons from the literature. Thus, we implore researchers to address this issue with studies that collect discrete and continuous data sets with directly comparable properties (i.e., describing shape, or size).
Affiliation(s)
- Emma J Holvast
  - School of Archaeology and Anthropology, The Australian National University, Canberra, Australia
- Mélina A Celik
  - School of Biology and Environmental Science, Queensland University of Technology, Brisbane, QLD, Australia
- Matthew J Phillips
  - School of Biology and Environmental Science, Queensland University of Technology, Brisbane, QLD, Australia
- Laura A B Wilson
  - School of Archaeology and Anthropology, The Australian National University, Canberra, Australia
  - School of Biological, Earth and Environmental Sciences, University of New South Wales, Kensington, NSW, 2052, Australia
  - ARC Training Centre for Multiscale 3D Imaging, Modelling and Manufacturing, Research School of Physics, The Australian National University, Acton, ACT, 2601, Australia

20
Smith MR, Long EJ, Dhungana A, Dobson KJ, Yang J, Zhang X. Organ systems of a Cambrian euarthropod larva. Nature 2024; 633:120-126. [PMID: 39085610 PMCID: PMC11374701 DOI: 10.1038/s41586-024-07756-8] [Received: 04/24/2023] [Accepted: 06/26/2024] [Indexed: 08/02/2024]
Abstract
The Cambrian radiation of euarthropods can be attributed to an adaptable body plan. Sophisticated brains and specialized feeding appendages, which are elaborations of serially repeated organ systems and jointed appendages, underpin the dominance of Euarthropoda in a broad suite of ecological settings. The origin of the euarthropod body plan from a grade of vermiform taxa with hydrostatic lobopodous appendages ('lobopodian worms') [1,2] is founded on data from Burgess Shale-type fossils. However, the compaction associated with such preservation obscures internal anatomy [3-6]. Phosphatized microfossils provide a complementary three-dimensional perspective on early crown group euarthropods [7], but few lobopodians [8,9]. Here we describe the internal and external anatomy of a three-dimensionally preserved euarthropod larva with lobopods, midgut glands and a sophisticated head. The architecture of the nervous system informs the early configuration of the euarthropod brain and its associated appendages and sensory organs, clarifying homologies across Panarthropoda. The deep evolutionary position of Youti yuanshi gen. et sp. nov. informs the sequence of character acquisition during arthropod evolution, demonstrating a deep origin of sophisticated haemolymph circulatory systems, and illuminating the internal anatomical changes that propelled the rise and diversification of this enduringly successful group.
Affiliation(s)
- Martin R Smith
  - Department of Earth Sciences, Durham University, Durham, UK
- Emma J Long
  - Department of Earth Sciences, Durham University, Durham, UK
  - Science Group, Natural History Museum, London, UK
  - Centre for Ecology and Conservation, University of Exeter, Cornwall, UK
- Katherine J Dobson
  - Department of Earth Sciences, Durham University, Durham, UK
  - Department of Civil and Environmental Engineering, University of Strathclyde, Glasgow, UK
  - Department of Chemical and Process Engineering, University of Strathclyde, Glasgow, UK
- Jie Yang
  - Institute of Palaeontology, Yunnan University, Chenggong, Kunming, China
- Xiguang Zhang
  - Institute of Palaeontology, Yunnan University, Chenggong, Kunming, China
21
de Vos JM, Streiff SJR, Bachelier JB, Epitawalage N, Maurin O, Forest F, Baker WJ. Phylogenomics of the pantropical Connaraceae: revised infrafamilial classification and the evolution of heterostyly. Plant Syst Evol 2024; 310:29. [PMID: 39105137 PMCID: PMC11297820 DOI: 10.1007/s00606-024-01909-y] [Received: 02/02/2024] [Accepted: 05/28/2024] [Indexed: 08/07/2024]
Abstract
Connaraceae is a pantropical family of about 200 species containing lianas and small trees with remarkably diverse floral polymorphisms, including distyly, tristyly, homostyly, and dioecy. To date, relationships within the family have not been investigated using a targeted molecular phylogenetic treatment, severely limiting systematic understanding and reconstruction of trait evolution. Accordingly, their last infrafamilial classification was based only on morphological data. Here, we used phylogenomic data obtained using the Angiosperms353 nuclear target sequence capture probes, sampling all tribes and almost all genera, entirely from herbarium specimens, to revise infrafamilial classification and investigate the evolution of heterostyly. The backbone of the resulting molecular phylogenetic tree is almost entirely resolved. Connaraceae consists of two clades, one containing only the African genus Manotes (4 or 5 species), which we newly recognize at the subfamily level. Vegetative and reproductive synapomorphies are proposed for Manotoideae. Within Connaroideae, Connareae is expanded to include the former Jollydoreae. The backbone of Cnestideae, which contains more than half of the Connaraceae species, remains incompletely resolved. Reconstructions of reproductive system evolution are presented that tentatively support tristyly as the ancestral state for the family, with multiple parallel losses, in agreement with previous hypotheses, plus possible re-gains. However, the great diversity of stylar polymorphisms and their phylogenetic lability preclude a definitive answer. Overall, this study reinforces the usefulness of herbarium phylogenomics, and unlocks the reproductive diversity of Connaraceae as a model system for the evolution of complex biological phenomena. Supplementary Information The online version contains supplementary material available at 10.1007/s00606-024-01909-y.
Affiliation(s)
- Jurriaan M. de Vos
  - Department of Environmental Sciences - Botany, University of Basel, Schönbeinstrasse 6, 4056 Basel, Switzerland
- Serafin J. R. Streiff
  - Department of Environmental Sciences - Botany, University of Basel, Schönbeinstrasse 6, 4056 Basel, Switzerland
  - UMR DIADE, Université de Montpellier, IRD, CIRAD, 911 Avenue Agropolis, 34090 Montpellier, France
- Julien B. Bachelier
  - Institut für Biologie/Dahlem Centre of Plant Sciences, Freie Universität Berlin, Altensteinstrasse 6, 14195 Berlin, Germany
- Niroshini Epitawalage
  - Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE UK
  - The New York Botanical Garden, 2900 Southern Blvd, Bronx, NY 10458 USA
- Olivier Maurin
  - Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE UK
- Félix Forest
  - Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE UK
- William J. Baker
  - Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AE UK
  - Department of Biology, Aarhus University, Ny Munkegade 116, 8000 Aarhus, Denmark
22
Kharma N, Bédard-Couture R. Robustness and evolvability: Revisited, redefined and applied. Biosystems 2024; 246:105281. [PMID: 39098381 DOI: 10.1016/j.biosystems.2024.105281] [Received: 06/12/2024] [Revised: 07/27/2024] [Accepted: 07/31/2024] [Indexed: 08/06/2024]
Abstract
Building on and extending existing definitions of robustness and evolvability, we propose and utilize new formal definitions, with matching measures, of robustness and evolvability of systems with genotypes and corresponding phenotypes. We explain and show how these measures are more general and more representative of the concepts they stand for, than the commonly used/referenced measures originally proposed by Wagner. Further, a versatile digital modeling approach (BNK) is proposed that is inspired by NK systems. However, unlike NK systems, BNK incorporates a genotype and a phenotype, in addition to fitness. We develop and apply an Evolutionary Algorithm to a BNK-modeled system to find different types of perfect oscillators. We then map the resulting oscillating systems to possible genetic circuit realizations. Continuing with the synthetic biology theme, we also investigate the effect of noise in DNA synthesis on the predicted functionality of a DNA-based biosensor (i.e., its robustness), and we carry out a theoretical assessment of the evolvability of different types of ribozymes, undergoing directed evolution.
Affiliation(s)
- Nawwaf Kharma
  - Electrical and Computer Engineering Department, Concordia University, 1455 Blvd. De Maisonneuve Ouest, Montreal, H3G 1M8, Quebec, Canada
- Rémi Bédard-Couture
  - Département de génie logiciel et des technologies de l'information, École de Technologie Supérieure, 1100 Notre-Dame St W, Montreal, H3C 1K3, Quebec, Canada
23
Berling L, Collienne L, Gavryushkin A. Estimating the mean in the space of ranked phylogenetic trees. Bioinformatics 2024; 40:btae514. [PMID: 39177090 PMCID: PMC11364146 DOI: 10.1093/bioinformatics/btae514] [Received: 12/11/2023] [Revised: 05/16/2024] [Accepted: 08/21/2024] [Indexed: 08/24/2024] Open
Abstract
MOTIVATION: Reconstructing evolutionary histories of biological entities, such as genes, cells, organisms, populations, and species, from phenotypic and molecular sequencing data is central to many biological, palaeontological, and biomedical disciplines. Typically, due to uncertainties and incompleteness in data, the true evolutionary history (phylogeny) is challenging to estimate. Statistical modelling approaches address this problem by introducing and studying probability distributions over all possible evolutionary histories, but can also introduce uncertainties due to misspecification. In practice, computational methods are deployed to learn those distributions, typically by sampling them. This approach, however, is fundamentally challenging as it requires designing and implementing various statistical methods over a space of phylogenetic trees (or treespace). Although the problem of developing statistics over a treespace has received substantial attention in the literature and numerous breakthroughs have been made, it remains largely unsolved. The challenge of solving this problem is 2-fold: a treespace has nontrivial, often counter-intuitive geometry, implying that much of classical Euclidean statistics does not immediately apply; and many parametrizations of treespace with promising statistical properties are computationally hard, so they cannot be used in data analyses. As a result, there is no single conventional method for estimating even the most fundamental statistics over any treespace, such as mean and variance, and various heuristics are used in practice. Despite the existence of numerous tree summary methods to approximate means of probability distributions over a treespace based on its geometry, and the theoretical promise of this idea, none of the attempts resulted in a practical method for summarizing tree samples.
RESULTS: In this paper, we present a tree summary method along with useful properties of our chosen treespace while focusing on its impact on phylogenetic analyses of real datasets. We perform an extensive benchmark study and demonstrate that our method outperforms the currently most popular methods with respect to a number of important 'quality' statistics. Further, we apply our method to three empirical datasets ranging from cancer evolution to linguistics and find novel insights into corresponding evolutionary problems in all of them. We hence conclude that this treespace is a promising candidate to serve as a foundation for developing statistics over phylogenetic trees analytically, as well as new computational tools for evolutionary data analyses.
AVAILABILITY AND IMPLEMENTATION: An implementation is available at https://github.com/bioDS/Centroid-Code.
Affiliation(s)
- Lars Berling
  - Biological Data Science Lab, School of Mathematics and Statistics, University of Canterbury, Christchurch 8041, New Zealand
- Lena Collienne
  - Biological Data Science Lab, School of Mathematics and Statistics, University of Canterbury, Christchurch 8041, New Zealand
- Alex Gavryushkin
  - Biological Data Science Lab, School of Mathematics and Statistics, University of Canterbury, Christchurch 8041, New Zealand
24
Lemos-Costa P, Miller ZR, Allesina S. Phylogeny structures species' interactions in experimental ecological communities. Ecol Lett 2024; 27:e14490. [PMID: 39152685 DOI: 10.1111/ele.14490] [Received: 12/04/2023] [Revised: 06/24/2024] [Accepted: 07/11/2024] [Indexed: 08/19/2024]
Abstract
Species' traits and interactions are products of evolutionary history. Despite the long-standing hypothesis that closely related species possess similar traits, and thus experience stronger competition, measuring the effect of evolutionary history on the ecology of natural communities remains challenging. We propose a novel framework to test whether phylogeny influences patterns of coexistence and abundance of species assemblages. In our approach, phylogenetic trees are used to parameterize species' interactions, which in turn determine the abundance of species in a given assemblage. We use likelihoods to score models parameterized with a given phylogeny, and contrast them with models built using random trees, allowing us to test whether phylogenetic information helps to predict species' abundances. Our statistical framework reveals that interactions are indeed structured by phylogeny in a large set of experimental plant communities. Our results confirm that evolutionary history can help predict, and potentially manage or conserve, the structure and function of complex ecological communities.
Affiliation(s)
- Paula Lemos-Costa
  - Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, USA
- Zachary R Miller
  - Department of Earth and Planetary Sciences, Yale University, New Haven, Connecticut, USA
- Stefano Allesina
  - Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, USA
  - Northwestern Institute on Complex Systems, Northwestern University, Evanston, Illinois, USA
25
Xiang Z, Liu Z, Dinh KN. Inference of chromosome selection parameters and missegregation rate in cancer from DNA-sequencing data. Sci Rep 2024; 14:17699. [PMID: 39085295 PMCID: PMC11291923 DOI: 10.1038/s41598-024-67842-9] [Received: 04/18/2024] [Accepted: 07/16/2024] [Indexed: 08/02/2024] Open
Abstract
Aneuploidy is frequently observed in cancers and has been linked to poor patient outcome. Analysis of aneuploidy in DNA-sequencing (DNA-seq) data necessitates untangling the effects of the Copy Number Aberration (CNA) occurrence rates and the selection coefficients that act upon the resulting karyotypes. We introduce a parameter inference algorithm that takes advantage of both bulk and single-cell DNA-seq cohorts. The method is based on Approximate Bayesian Computation (ABC) and utilizes CINner, our recently introduced simulation algorithm of chromosomal instability in cancer. We examine three groups of statistics to summarize the data in the ABC routine: (A) Copy Number-based measures, (B) phylogeny tip statistics, and (C) phylogeny balance indices. Using these statistics, our method can recover both the CNA probabilities and selection parameters from ground truth data, and performs well even for data cohorts of relatively small sizes. We find that only statistics in groups A and C are well-suited for identifying CNA probabilities, and only group A carries the signals for estimating selection parameters. Moreover, the low number of CNA events at large scale compared to cell counts in single-cell samples means that statistics in group B cannot be estimated accurately using phylogeny reconstruction algorithms at the chromosome level. As data from both bulk and single-cell DNA-sequencing techniques becomes increasingly available, our inference framework promises to facilitate the analysis of distinct cancer types, differentiation between selection and neutral drift, and prediction of cancer clonal dynamics.
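The Approximate Bayesian Computation routine named in the abstract can be illustrated with a minimal rejection sampler. This is a toy sketch only, not CINner and not the authors' algorithm: the simulator (a count of divisions with a missegregation), the uniform prior on the rate, the single summary statistic and the tolerance are all assumptions made for illustration.

```python
import random


def simulate_missegregations(rate, n_divisions, rng):
    """Toy simulator (assumed): number of divisions in which a missegregation occurs."""
    return sum(rng.random() < rate for _ in range(n_divisions))


def abc_rejection(observed, n_divisions, n_draws=5000, tolerance=2, seed=0):
    """Keep prior draws whose simulated summary lies within `tolerance` of the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.0, 0.2)  # assumed uniform prior on the missegregation rate
        simulated = simulate_missegregations(rate, n_divisions, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(rate)
    return accepted


# Hypothetical data: 10 missegregation events observed over 200 divisions.
posterior = abc_rejection(observed=10, n_divisions=200)
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, posterior mean rate ~ {posterior_mean:.3f}")
```

The accepted draws approximate the posterior over the rate. In the study itself, the single count is replaced by the three groups of summary statistics (copy number measures, phylogeny tip statistics, balance indices), and the simulator by CINner.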
Affiliation(s)
- Zijin Xiang
  - Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, NY, USA
- Zhihan Liu
  - Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, NY, USA
- Khanh N Dinh
  - Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, NY, USA
26
Rose JP, Kriebel R, Sytsma KJ, Drew BT. Phylogenomic perspectives on speciation and reproductive isolation in a North American biodiversity hotspot: an example using California sages (Salvia subgenus Audibertia: Lamiaceae). Ann Bot 2024; 134:295-310. [PMID: 38733329 PMCID: PMC11232522 DOI: 10.1093/aob/mcae073] [Received: 02/16/2024] [Accepted: 05/07/2024] [Indexed: 05/13/2024]
Abstract
BACKGROUND AND AIMS: The California Floristic Province (CA-FP) is the most species-rich region of North America north of Mexico. One of several proposed hypotheses explaining the exceptional diversity of the region is that the CA-FP harbours myriad recently diverged lineages with nascent reproductive barriers. Salvia subgenus Audibertia is a conspicuous element of the CA-FP, with multiple sympatric and compatible species.
METHODS: Using 305 nuclear loci and both organellar genomes, we reconstruct species trees, examine genomic discordance, conduct divergence-time estimation, and analyse contemporaneous patterns of gene flow and mechanical reproductive isolation.
KEY RESULTS: Despite strong genomic discordance, an underlying bifurcating tree is supported. Organellar genomes capture additional introgression events not detected in the nuclear genome. Most interfertility is found within clades, indicating that reproductive barriers arise with increasing genetic divergence. Species are generally not mechanically isolated, suggesting that mechanical isolation is unlikely to be the primary factor leading to reproductive isolation.
CONCLUSIONS: Rapid, recent speciation with some interspecific gene flow in conjunction with the onset of a Mediterranean-like climate is the underlying cause of extant diversity in Salvia subgenus Audibertia. Speciation has largely not been facilitated by gene flow. Its signal in the nuclear genome seems to mostly be erased by backcrossing, but organellar genomes each capture different instances of historical gene flow, probably characteristic of many CA-FP lineages. Mechanical reproductive isolation appears to be only part of a mosaic of factors limiting gene flow.
Affiliation(s)
- Jeffrey P Rose
  - Department of Biology, University of Nebraska at Kearney, Kearney, NE 68849, USA
  - Department of Botany, University of Wisconsin-Madison, 430 Lincoln Drive, Madison, WI 53706, USA
- Ricardo Kriebel
  - Department of Botany, University of Wisconsin-Madison, 430 Lincoln Drive, Madison, WI 53706, USA
  - California Academy of Sciences, San Francisco, CA 94118, USA
- Kenneth J Sytsma
  - Department of Botany, University of Wisconsin-Madison, 430 Lincoln Drive, Madison, WI 53706, USA
- Bryan T Drew
  - Department of Biology, University of Nebraska at Kearney, Kearney, NE 68849, USA
27
Naranjo JG, Sither CB, Conant GC. Shared single copy genes are generally reliable for inferring phylogenetic relationships among polyploid taxa. Mol Phylogenet Evol 2024; 196:108087. [PMID: 38677353 DOI: 10.1016/j.ympev.2024.108087] [Received: 12/12/2023] [Revised: 03/22/2024] [Accepted: 04/24/2024] [Indexed: 04/29/2024]
Abstract
Polyploidy, or whole-genome duplication, is expected to confound the inference of species trees with phylogenetic methods for two reasons. First, the presence of retained duplicated genes requires the reconciliation of the inferred gene trees to a proposed species tree. Second, even if the analyses are restricted to shared single copy genes, the occurrence of reciprocal gene loss, where the surviving genes in different species are paralogs from the polyploidy rather than orthologs, will mean that such genes will not have evolved under the corresponding species tree and may not produce gene trees that allow inference of that species tree. Here we analyze three different ancient polyploidy events, using synteny-based inferences of orthology and paralogy to infer gene trees from nearly 17,000 sets of homologous genes. We find that the simple use of single copy genes from polyploid organisms provides reasonably robust phylogenetic signals, despite the presence of reciprocal gene losses. Such gene trees are also most often in accord with the species relationships inferred from maximum likelihood models of gene loss after polyploidy: a completely distinct phylogenetic signal present in these genomes. As seen in other studies, however, we find that methods for inferring phylogenetic confidence yield high support values even in cases where the underlying data suggest meaningful conflict in the phylogenetic signals.
Affiliation(s)
- Jaells G Naranjo
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Charles B Sither
  - Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC, USA
- Gavin C Conant
  - Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
  - Genetics and Genomics Academy, North Carolina State University, Raleigh, NC, USA
  - Department of Biological Sciences, North Carolina State University, Raleigh, NC, USA
28
Ezcurra MD. Exploring the effects of weighting against homoplasy in genealogies of palaeontological phylogenetic matrices. Cladistics 2024; 40:242-281. [PMID: 38728134 DOI: 10.1111/cla.12581] [Received: 01/11/2024] [Revised: 04/15/2024] [Accepted: 04/16/2024] [Indexed: 05/12/2024] Open
Abstract
Although simulations have shown that implied weighting (IW) outperforms equal weighting (EW) in phylogenetic parsimony analyses, weighting against homoplasy lacks extensive usage in palaeontology. Iterative modifications of several phylogenetic matrices in the last decades resulted in extensive genealogies of datasets that allow the evaluation of differences in the stability of results for alternative character weighting methods directly on empirical data. Each generation was compared against the most recent generation in each genealogy because it is assumed that it is the most comprehensive (higher sampling), revised (fewer misscorings) and complete (lower amount of missing data) matrix of the genealogy. The analyses were conducted on six different genealogies under EW and IW and extended implied weighting (EIW) with a range of concavity constant values (k) between 3 and 30. Pairwise comparisons between trees were conducted using Robinson-Foulds distances normalized by the total number of groups, distortion coefficient, subtree pruning and regrafting moves, and the proportional sum of group dissimilarities. The results consistently show that IW and EIW produce results more similar to those of the last dataset than EW in the vast majority of genealogies and for all comparative measures. This is significant because almost all of these matrices were originally analysed only under EW. Implied weighting and EIW do not outperform each other unambiguously. Euclidean distances based on a principal components analysis of the comparative measures show that different ranges of k-values retrieve the most similar results to the last generation in different genealogies. There is a significant positive linear correlation between the optimal k-values and the number of terminals of the last generations. This could be employed to inform about the range of k-values to be used in phylogenetic analyses based on matrix size but with the caveat that this emergent relationship still relies on a low sample size of genealogies.
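One of the comparative measures listed above, the Robinson-Foulds distance normalized by the total number of groups, can be sketched as follows. This is an illustrative version under stated assumptions, not the implementation used in the study: trees are rooted and written as nested tuples of taxon names, groups are the non-trivial clades, and both the example trees and the function names are hypothetical.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree given as nested tuples,
    e.g. (("A", "B"), ("C", "D")), as a set of frozensets of leaf names."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root group is shared by every tree on these taxa
    return found


def normalized_rf(tree_a, tree_b):
    """Groups found in only one tree, divided by the total number of groups in both."""
    groups_a, groups_b = clades(tree_a), clades(tree_b)
    total = len(groups_a) + len(groups_b)
    if total == 0:
        return 0.0
    return len(groups_a ^ groups_b) / total


# Hypothetical earlier and latest generations of a dataset genealogy.
older = ((("A", "B"), "C"), ("D", "E"))
latest = ((("A", "C"), "B"), ("D", "E"))
print(normalized_rf(older, latest))
```

A value of 0 means the two trees contain exactly the same groups, and 1 means they share none, which makes scores comparable across matrices with different numbers of terminals.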
Affiliation(s)
- Martín D Ezcurra
  - Sección Paleontología de Vertebrados, CONICET-Museo Argentino de Ciencias Naturales, Ángel Gallardo 470, C1405DJR, Ciudad Autónoma de Buenos Aires, Argentina
  - School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, B15 2TT, Birmingham, UK
29
Goloboff PA, De Laet J. Farewell to the requirement for character independence: phylogenetic methods to incorporate different types of dependence between characters. Cladistics 2024; 40:209-241. [PMID: 38014464 DOI: 10.1111/cla.12564] [Received: 05/19/2023] [Revised: 10/15/2023] [Accepted: 10/18/2023] [Indexed: 11/29/2023] Open
Abstract
This paper discusses methods to take into account interactions between characters, in the context of parsimony analysis. These interactions can be in the form of some characters becoming inapplicable given certain states of other, primary characters; in the form of only certain states being allowed in some characters when a given state or set of states occurs for other characters; or in the form of transformation costs in some character being higher or lower when other characters have certain states or transformations between states. Character-state reconstructions and evaluation of trees under the assumption of independence may easily lead to ancestral assignments that violate elementary rules of biomechanics, well-established theories relating form and function or ideas about character co-variation. An obvious example is reconstructing an ancestral bird as wingless and flying at the same time; another is reconstructing a protein-coding gene as having a stop codon in some ancestors. If the characters are optimized independently, such chimeric ancestral reconstructions can occur even when no terminal displays the impossible combination of states. A set of conventions (implemented via new TNT commands and options) allows the definition of complex rules of interaction. By recoding groups of characters with proper step-matrix costs (and excluding impossible combinations from the set of permissible states), it is possible to find the ancestral reconstructions that maximize homology (and thus the degree to which similarities can be explained by common ancestry), within the constraints imposed by the rules specified by the user. We expect that considerations of biomechanics, functional morphology and natural history will be a source of many theories on possible character dependences, and that the present implementation will encourage users to take the possibility of character dependences into account in their phylogenetic analyses.
Affiliation(s)
- Pablo A Goloboff
  - Unidad Ejecutora Lillo, UEL (CONICET-Fundación Miguel Lillo), Miguel Lillo 251, 4000, S.M. de Tucumán, Argentina
- Jan De Laet
  - Meise Botanic Garden, Nieuwelaan 38, Meise, Belgium
30
Rick JA, Brock CD, Lewanski AL, Golcher-Benavides J, Wagner CE. Reference Genome Choice and Filtering Thresholds Jointly Influence Phylogenomic Analyses. Syst Biol 2024; 73:76-101. [PMID: 37881861 DOI: 10.1093/sysbio/syad065] [Received: 03/10/2022] [Revised: 09/20/2023] [Accepted: 10/20/2023] [Indexed: 10/27/2023] Open
Abstract
Molecular phylogenies are a cornerstone of modern comparative biology and are commonly employed to investigate a range of biological phenomena, such as diversification rates, patterns in trait evolution, biogeography, and community assembly. Recent work has demonstrated that significant biases may be introduced into downstream phylogenetic analyses from processing genomic data; however, it remains unclear whether there are interactions among bioinformatic parameters or biases introduced through the choice of reference genome for sequence alignment and variant calling. We address these knowledge gaps by employing a combination of simulated and empirical data sets to investigate the extent to which the choice of reference genome in upstream bioinformatic processing of genomic data influences phylogenetic inference, as well as the way that reference genome choice interacts with bioinformatic filtering choices and phylogenetic inference method. We demonstrate that more stringent minor allele filters bias inferred trees away from the true species tree topology, and that these biased trees tend to be more imbalanced and have a higher center of gravity than the true trees. We find the greatest topological accuracy when filtering sites for minor allele count (MAC) >3-4 in our 51-taxa data sets, while tree center of gravity was closest to the true value when filtering for sites with MAC >1-2. In contrast, filtering for missing data increased accuracy in the inferred topologies; however, this effect was small in comparison to the effect of minor allele filters and may be undesirable due to a subsequent mutation spectrum distortion. The bias introduced by these filters differs based on the reference genome used in short read alignment, providing further support that choosing a reference genome for alignment is an important bioinformatic decision with implications for downstream analyses. These results demonstrate that attributes of the study system and dataset (and their interaction) add important nuance for how best to assemble and filter short-read genomic data for phylogenetic inference.
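The minor allele count filter discussed in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used in the study: samples are taken to be diploid, sites biallelic, genotypes are coded as the number of alternate alleles (0, 1 or 2) with `None` for missing calls, and the example sites and threshold are made up.

```python
def minor_allele_count(genotypes):
    """Count copies of the rarer allele at one biallelic site.
    `genotypes` holds per-sample alternate-allele counts (0, 1, 2); None marks missing data."""
    called = [g for g in genotypes if g is not None]
    alt = sum(called)
    ref = 2 * len(called) - alt  # assumes diploid samples
    return min(alt, ref)


def filter_sites(sites, min_mac):
    """Keep only the sites whose minor allele count is at least `min_mac`."""
    return [site for site in sites if minor_allele_count(site) >= min_mac]


# Hypothetical genotype matrix: one row per site, one column per sample.
sites = [
    [0, 0, 1, 0, 0, None],  # singleton alternate allele, MAC = 1
    [1, 2, 0, 1, 0, 0],     # MAC = 4
    [2, 2, 2, 1, 2, 2],     # reference allele is the rare one, MAC = 1
]
kept = filter_sites(sites, min_mac=3)
print(len(kept), "of", len(sites), "sites retained")
```

Raising `min_mac` removes rare variants first, which is why stringent thresholds can discard the sites that carry information about recent divergences.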
Affiliation(s)
- Jessica A Rick
  - School of Natural Resources & the Environment, University of Arizona, Tucson, AZ 85719, USA
- Chad D Brock
  - Department of Biological Sciences, Tarleton State University, Stephenville, TX 76401, USA
- Alexander L Lewanski
  - Department of Integrative Biology and W.K. Kellogg Biological Station, Michigan State University, East Lansing, MI 48824, USA
- Jimena Golcher-Benavides
  - Department of Natural Resource Ecology and Management, Iowa State University, Ames, IA 50011, USA
- Catherine E Wagner
  - Program in Ecology and Evolution, University of Wyoming, Laramie, WY 82071, USA
  - Department of Botany, University of Wyoming, Laramie, WY 82071, USA
31
Jensen CG, Sumner JA, Kleinstein SH, Hoehn KB. Inferring B Cell Phylogenies from Paired H and L Chain BCR Sequences with Dowser. J Immunol 2024; 212:1579-1588. [PMID: 38557795 PMCID: PMC11073909 DOI: 10.4049/jimmunol.2300851] [Received: 12/11/2023] [Accepted: 03/07/2024] [Indexed: 04/04/2024]
Abstract
Antibodies (Abs) are vital to human immune responses and are composed of genetically variable heavy (H) and light (L) chains. These structures are initially expressed as B cell receptors (BCRs). BCR diversity is shaped through somatic hypermutation and selection during immune responses. This evolutionary process produces B cell clones, cells that descend from a common ancestor but differ by mutations. Phylogenetic trees inferred from BCR sequences can reconstruct the history of mutations within a clone. Until recently, BCR sequencing technologies separated H and L chains, but advancements in single-cell sequencing now pair H and L chains from individual cells. However, it is unclear how these separate genes should be combined to infer B cell phylogenies. In this study, we investigated strategies for using paired H and L chain sequences to build phylogenetic trees. We found that incorporating L chains significantly improved tree accuracy and reproducibility across all methods tested. This improvement was greater than the difference between tree-building methods and persisted even when mixing bulk and single-cell sequencing data. However, we also found that many phylogenetic methods estimated significantly biased branch lengths when some L chains were missing, such as when mixing single-cell and bulk BCR data. This bias was eliminated using maximum likelihood methods with separate branch lengths for H and L chain gene partitions. Thus, we recommend using maximum likelihood methods with separate H and L chain partitions, especially when mixing data types. We implemented these methods in the R package Dowser: https://dowser.readthedocs.io.
Affiliation(s)
- Cole G. Jensen
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
- Jacob A. Sumner
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
  - Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, 06520, USA
- Steven H. Kleinstein
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
  - Department of Pathology, Yale School of Medicine, New Haven, CT 06520, USA
  - Department of Immunobiology, Yale School of Medicine, New Haven, CT 06520, USA
- Kenneth B. Hoehn
  - Department of Pathology, Yale School of Medicine, New Haven, CT 06520, USA
  - Current address: Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
| |
Collapse
|
32
|
Tremble K, Henkel T, Bradshaw A, Domnauer C, Brown LM, Thám LX, Furci G, Aime MC, Moncalvo JM, Dentinger B. A revised phylogeny of Boletaceae using whole genome sequences. Mycologia 2024; 116:392-408. [PMID: 38551379 DOI: 10.1080/00275514.2024.2314963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 01/30/2024] [Indexed: 05/01/2024]
Abstract
The porcini mushroom family Boletaceae is a diverse, widespread group of ectomycorrhizal (ECM) mushroom-forming fungi that so far has eluded intrafamilial phylogenetic resolution based on morphology and multilocus data sets. In this study, we present a genome-wide molecular data set of 1764 single-copy gene families from a global sampling of 418 Boletaceae specimens. The resulting phylogenetic analysis has strong statistical support for most branches of the tree, including the first statistically robust backbone. The enigmatic Phylloboletellus chloephorus from non-ECM Argentinian subtropical forests was recovered as a new subfamily sister to the core Boletaceae. Time-calibrated branch lengths estimate that the family first arose in the early to mid-Cretaceous and underwent a rapid radiation in the Eocene, possibly when the ECM nutritional mode arose with the emergence and diversification of ECM angiosperms. Biogeographic reconstructions reveal a complex history of vicariance and episodic long-distance dispersal correlated with historical geologic events, including Gondwanan origins and inferred vicariance associated with its disarticulation. Together, this study represents the most comprehensively sampled, data-rich molecular phylogeny of the Boletaceae to date, establishing a foundation for future robust inferences of biogeography in the group.
Affiliation(s)
- Keaton Tremble
  - Natural History Museum of Utah and School of Biological Sciences, University of Utah, Salt Lake City, Utah 84108, USA
- Terry Henkel
  - Department of Biological Sciences, California State Polytechnic University, Humboldt, Arcata 95521, California
- Alexander Bradshaw
  - Natural History Museum of Utah and School of Biological Sciences, University of Utah, Salt Lake City, Utah 84108, USA
- Colin Domnauer
  - Natural History Museum of Utah and School of Biological Sciences, University of Utah, Salt Lake City, Utah 84108, USA
- Lyda M Brown
  - Natural History Museum of Utah and School of Biological Sciences, University of Utah, Salt Lake City, Utah 84108, USA
- Lê Xuân Thám
  - Laboratory for Computation and Applications in Life Sciences, Institute for Computation Science and Artificial Intelligence, Van Lang University, Ho Chi Minh City 700000, Viet Nam
  - Faculty of Applied Technology, School of Technology, Van Lang University, Ho Chi Minh City 700000, Viet Nam
- M Catherine Aime
  - Department of Botany and Plant Pathology, Purdue University, West Lafayette, Indiana 47906, USA
- Jean-Marc Moncalvo
  - Department of Natural History, Royal Ontario Museum and Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario M5S 2C6, Canada
- Bryn Dentinger
  - Natural History Museum of Utah and School of Biological Sciences, University of Utah, Salt Lake City, Utah 84108, USA

33
Wagle S, Markin A, Górecki P, Anderson TK, Eulenstein O. Asymmetric Cluster-Based Measures for Comparative Phylogenetics. J Comput Biol 2024; 31:312-327. [PMID: 38634854 PMCID: PMC11057527 DOI: 10.1089/cmb.2023.0338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024] Open
Abstract
Phylogenetic inference and reconstruction methods generate hypotheses on evolutionary history. Competing inference methods are frequently used, and the evaluation of the generated hypotheses is achieved using tree comparison costs. The Robinson-Foulds (RF) distance is a widely used cost to compare the topology of two trees, but this cost is sensitive to tree error and can overestimate tree differences. To overcome this limitation, a refined version of the RF distance called the Cluster Affinity (CA) distance was introduced. However, CA distances are symmetric and cannot compare different types of trees. These asymmetric comparisons occur when gene trees are compared with species trees, when disparate datasets are integrated into a supertree, or when tree comparison measures are used to infer a phylogenetic network. In this study, we introduce a relaxation of the original Affinity distance to compare heterogeneous trees called the asymmetric CA cost. We also develop a biologically interpretable cost, the Cluster Support cost that normalizes by cluster size across gene trees. The characteristics of these costs are similar to the symmetric CA cost. We describe efficient algorithms, derive the exact diameters, and use these to standardize the cost to be applicable in practice. These costs provide objective, fine-scale, and biologically interpretable values that can assess differences and similarities between phylogenetic trees.
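As a point of reference for the abstract above, the following is a minimal sketch of the standard Robinson-Foulds distance for rooted trees, computed as the number of clusters found in one tree but not the other. It is not the Cluster Affinity or Cluster Support cost from the paper. The nested-tuple tree encoding and the function names `clusters` and `rf_distance` are assumptions made for illustration.

```python
def clusters(tree):
    """Return the non-trivial clusters (frozensets of leaf labels) of a rooted tree.

    A tree is a nested tuple; anything that is not a tuple is a leaf label.
    This encoding is hypothetical and chosen only to keep the sketch short.
    """
    found = set()

    def leaves_below(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves_below(tree)
    found.discard(all_leaves)  # the root cluster is shared by every tree on these leaves
    return found


def rf_distance(tree_a, tree_b):
    """Rooted Robinson-Foulds distance: size of the symmetric difference of cluster sets."""
    return len(clusters(tree_a) ^ clusters(tree_b))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2))
```

Because every mismatched cluster counts as a full unit regardless of how close it is to a cluster in the other tree, a single misplaced leaf near the root can change many clusters at once, which is the sensitivity the abstract describes.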
Affiliation(s)
- Sanket Wagle
  - Department of Computer Science, Iowa State University, Ames, Iowa, USA
- Alexey Markin
  - National Animal Disease Center, USDA-ARS, Ames, Iowa, USA
- Paweł Górecki
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
- Oliver Eulenstein
  - Department of Computer Science, Iowa State University, Ames, Iowa, USA

34
Xie O, Morris JM, Hayes AJ, Towers RJ, Jespersen MG, Lees JA, Ben Zakour NL, Berking O, Baines SL, Carter GP, Tonkin-Hill G, Schrieber L, McIntyre L, Lacey JA, James TB, Sriprakash KS, Beatson SA, Hasegawa T, Giffard P, Steer AC, Batzloff MR, Beall BW, Pinho MD, Ramirez M, Bessen DE, Dougan G, Bentley SD, Walker MJ, Currie BJ, Tong SYC, McMillan DJ, Davies MR. Inter-species gene flow drives ongoing evolution of Streptococcus pyogenes and Streptococcus dysgalactiae subsp. equisimilis. Nat Commun 2024; 15:2286. [PMID: 38480728 PMCID: PMC10937727 DOI: 10.1038/s41467-024-46530-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 01/12/2024] [Accepted: 02/28/2024] [Indexed: 03/17/2024] Open
Abstract
Streptococcus dysgalactiae subsp. equisimilis (SDSE) is an emerging cause of human infection with invasive disease incidence and clinical manifestations comparable to the closely related species, Streptococcus pyogenes. Through systematic genomic analyses of 501 disseminated SDSE strains, we demonstrate extensive overlap between the genomes of SDSE and S. pyogenes. More than 75% of core genes are shared between the two species with one third demonstrating evidence of cross-species recombination. Twenty-five percent of mobile genetic element (MGE) clusters and 16 of 55 SDSE MGE insertion regions were shared across species. Assessing potential cross-protection from leading S. pyogenes vaccine candidates on SDSE, 12/34 preclinical vaccine antigen genes were shown to be present in >99% of isolates of both species. Relevant to possible vaccine evasion, six vaccine candidate genes demonstrated evidence of inter-species recombination. These findings demonstrate previously unappreciated levels of genomic overlap between these closely related pathogens with implications for streptococcal pathobiology, disease surveillance and prevention.
Affiliation(s)
- Ouli Xie
  - Department of Infectious Diseases, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
  - Monash Infectious Diseases, Monash Health, Melbourne, Australia
- Jacqueline M Morris
  - Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Andrew J Hayes
  - Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Rebecca J Towers
  - Menzies School of Health Research, Charles Darwin University, Darwin, Australia
- Magnus G Jespersen
  - Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- John A Lees
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Hinxton, Cambridgeshire, UK
- Nouri L Ben Zakour
  - Australian Infectious Diseases Research Centre and School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Olga Berking
  - Australian Infectious Diseases Research Centre and School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Sarah L Baines
  - Doherty Applied Microbial Genomics, Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Glen P Carter
  - Doherty Applied Microbial Genomics, Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Layla Schrieber
  - Faculty of Veterinary Science, The University of Sydney, Sydney, Australia
- Liam McIntyre
  - Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Jake A Lacey
  - Department of Infectious Diseases, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Taylah B James
  - Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- Kadaba S Sriprakash
  - Infection and Inflammation Program, QIMR Berghofer Medical Research Institute, Brisbane, Australia
  - School of Science & Technology, University of New England, Armidale, Australia
- Scott A Beatson
  - Australian Infectious Diseases Research Centre and School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Tadao Hasegawa
  - Department of Bacteriology, Nagoya City University Graduate School of Medical Sciences, Nagoya, Japan
- Phil Giffard
  - Menzies School of Health Research, Charles Darwin University, Darwin, Australia
- Andrew C Steer
  - Tropical Diseases, Murdoch Children's Research Institute, Parkville, Australia
- Michael R Batzloff
  - Infection and Inflammation Program, QIMR Berghofer Medical Research Institute, Brisbane, Australia
  - Institute for Glycomics, Griffith University, Southport, Australia
- Bernard W Beall
  - Respiratory Disease Branch, National Center for Immunizations and Respiratory Diseases, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Marcos D Pinho
  - Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal
- Mario Ramirez
  - Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal
- Debra E Bessen
  - Department of Pathology, Microbiology and Immunology, New York Medical College, Valhalla, NY, USA
- Gordon Dougan
  - Parasites and Microbes, Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
- Stephen D Bentley
  - Parasites and Microbes, Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
- Mark J Walker
  - Australian Infectious Diseases Research Centre and School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
- Bart J Currie
  - Menzies School of Health Research, Charles Darwin University, Darwin, Australia
- Steven Y C Tong
  - Department of Infectious Diseases, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
  - Victorian Infectious Disease Service, The Royal Melbourne Hospital at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
- David J McMillan
  - School of Science, Technology and Engineering, and Centre for Bioinnovation, University of the Sunshine Coast, Sippy Downs, Australia
- Mark R Davies
  - Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Melbourne, Australia

35
Kiledal EA, Reitz LA, Kuiper EQ, Evans J, Siddiqui R, Denef VJ, Dick GJ. Comparative genomic analysis of Microcystis strain diversity using conserved marker genes. Harmful Algae 2024; 132:102580. [PMID: 38331539 DOI: 10.1016/j.hal.2024.102580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 01/08/2024] [Accepted: 01/09/2024] [Indexed: 02/10/2024]
Abstract
Microcystis-dominated cyanobacterial harmful algal blooms (cyanoHABs) have a global impact on freshwater environments, affecting both wildlife and human health. Microcystis diversity and function in field samples and laboratory cultures can be determined by sequencing whole genomes of cultured isolates or natural populations, but these methods remain computationally and financially expensive. Amplicon sequencing of marker genes is a lower-cost, higher-throughput alternative for characterizing strain composition and diversity in mixed samples. However, the selection of appropriate marker gene region(s) and primers requires prior understanding of the relationship between single-gene genotype, whole-genome content, and phenotype. To identify phylogenetic markers of Microcystis strain diversity, we compared phylogenetic trees built from each of 2,351 individual core genes to an established phylogeny and assessed the ability of these core genes to predict whole-genome content and bioactive compound genotypes. We identified single-copy core genes better able to resolve Microcystis phylogenies than previously identified marker genes. We developed primers suitable for current Illumina-based amplicon sequencing with near-complete coverage of available Microcystis genomes and demonstrate that they outperform existing options for assessing Microcystis strain composition. Results showed that genetic markers can be used to infer Microcystis gene content and phenotypes such as potential production of bioactive compounds, although marker performance varies by bioactive compound gene and sequence similarity. Finally, we demonstrate that these markers can be used to characterize the Microcystis strain composition of laboratory or field samples like those collected for surveillance and modeling of Microcystis-dominated cyanobacterial harmful algal blooms.
Affiliation(s)
- E Anders Kiledal
  - Department of Earth and Environmental Sciences, University of Michigan, 2534 North University Building, 1100 North University Avenue, Rm. 2004, Ann Arbor, MI 48109-1005, USA
- Laura A Reitz
  - Department of Earth and Environmental Sciences, University of Michigan, 2534 North University Building, 1100 North University Avenue, Rm. 2004, Ann Arbor, MI 48109-1005, USA
- Esmée Q Kuiper
  - Department of Earth and Environmental Sciences, University of Michigan, 2534 North University Building, 1100 North University Avenue, Rm. 2004, Ann Arbor, MI 48109-1005, USA
- Jacob Evans
  - Department of Ecology and Evolutionary Biology, University of Michigan, 2220 Biological Sciences Building, 1105 North University Avenue, Ann Arbor, MI 48109-1005, USA
- Ruqaiya Siddiqui
  - Microbiome Core, University of Michigan, 1500 MSRB 1, 1150 W Medical Center Drive, Ann Arbor, MI 48109-5666, USA
- Vincent J Denef
  - Department of Ecology and Evolutionary Biology, University of Michigan, 2220 Biological Sciences Building, 1105 North University Avenue, Ann Arbor, MI 48109-1005, USA
- Gregory J Dick
  - Department of Earth and Environmental Sciences, University of Michigan, 2534 North University Building, 1100 North University Avenue, Rm. 2004, Ann Arbor, MI 48109-1005, USA; Cooperative Institute for Great Lakes Research, University of Michigan, 4040 Dana Building, 440 Church Street, Ann Arbor, MI 48109-1041, USA

36
Ma B, Gong H, Xu Q, Gao Y, Guan A, Wang H, Hua K, Luo R, Jin H. Bases-dependent Rapid Phylogenetic Clustering (Bd-RPC) enables precise and efficient phylogenetic estimation in viruses. Virus Evol 2024; 10:veae005. [PMID: 38361823 PMCID: PMC10868571 DOI: 10.1093/ve/veae005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/06/2024] [Accepted: 01/22/2024] [Indexed: 02/17/2024] Open
Abstract
Understanding phylogenetic relationships among species is essential for many biological studies, which require an accurate phylogenetic tree to interpret major evolutionary transitions. Phylogenetic analysis remains challenging in both estimation accuracy and computational efficiency, especially given the recent wave of severe emerging infectious disease outbreaks. Here, we introduce a novel, efficient framework called Bases-dependent Rapid Phylogenetic Clustering (Bd-RPC) for placing new viral samples on an existing tree. A new recoding method, Frequency Vector Recoding, was implemented to approximate the phylogenetic distance, and the Phylogenetic Simulated Annealing Search algorithm was developed to match the recoded distance matrix with the phylogenetic tree. In addition, indels (insertions/deletions) were used heuristically for foreign sequence recognition for the first time. We compared Bd-RPC with recent placement software (PAGAN2, EPA-ng, TreeBeST) and evaluated it on Alphacoronavirus, Alphaherpesvirinae, and Betacoronavirus using Split and Robinson-Foulds distances. The comparisons showed that Bd-RPC maintained the highest precision with great efficiency, demonstrating good performance in new sample placement for all three viral taxa. Finally, a user-friendly website (http://www.bd-rpc.xyz) is available for users to classify new samples instantly and to facilitate phylogenetic research in viruses, and Bd-RPC is available on GitHub (http://github.com/Bin-Ma/bd-rpc).
Affiliation(s)
- Bin Ma
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
  - College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- Huimin Gong
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
  - College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- Qianshuai Xu
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
  - College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- Yuan Gao
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
  - College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- Aohan Guan
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
  - College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- Haoyu Wang
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
  - College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- Kexin Hua
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
  - College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- Rui Luo
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
  - College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
- Hui Jin
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China
  - College of Veterinary Medicine, Huazhong Agricultural University, No.1 Shizishan Street, Wuhan, Hubei 430070, China

37
Wang L, Dong W, Yin Z, Sheng J, Ezeana CF, Yang L, Yu X, Wong SSY, Wan Z, Danforth RL, Han K, Gao D, Wong STC. Charting Single Cell Lineage Dynamics and Mutation Networks via Homing CRISPR. bioRxiv 2024:2024.01.05.574236. [PMID: 38260351 PMCID: PMC10802354 DOI: 10.1101/2024.01.05.574236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Single-cell lineage tracing, essential for unraveling cellular dynamics in disease evolution, is critical for developing targeted therapies. CRISPR-Cas9, known for inducing permanent and cumulative mutations, is a cornerstone of lineage tracing. The novel homing guide RNA (hgRNA) technology enhances this by enabling dynamic retargeting and facilitating ongoing genetic modifications. Charting these mutations, especially through successive hgRNA edits, poses a significant challenge. Our solution, LINEMAP, is a computational framework designed to trace and map these mutations with precision. LINEMAP discerns mutation alleles at single-cell resolution and maps their complex interrelationships through a mutation evolution network. By utilizing a Markov process model, we can predict mutation transition probabilities, revealing potential mutational routes and pathways. Our reconstruction algorithm, anchored in the Markov model's attributes, reconstructs cellular lineage pathways down to the level of individual cell divisions. Our findings reveal an intricate network of mutation evolution paired with a predictive Markov model, advancing our capability to reconstruct single-cell lineage via hgRNA. This has substantial implications for advancing our understanding of biological mechanisms and propelling medical research forward.
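To make the idea of mutation transition probabilities concrete, here is a minimal sketch that estimates a first-order Markov transition table from observed allele-to-allele edits by counting and row-normalizing. It is not LINEMAP's implementation; the function name `transition_probabilities`, the allele labels, and the input format (a list of parent/child allele pairs) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(transitions):
    """Estimate P(child allele | parent allele) from observed (parent, child) pairs.

    This is the maximum-likelihood estimate for a first-order Markov chain:
    count each observed transition, then divide by the parent's total count.
    """
    counts = defaultdict(Counter)
    for parent, child in transitions:
        counts[parent][child] += 1
    return {
        parent: {child: n / sum(children.values()) for child, n in children.items()}
        for parent, children in counts.items()
    }


# Hypothetical observations: an unedited barcode ("wt") and three edited alleles.
observed = [("wt", "m1"), ("wt", "m1"), ("wt", "m2"), ("m1", "m3")]
probs = transition_probabilities(observed)
print(probs["wt"])
```

In this toy input, two of the three edits starting from `wt` lead to `m1`, so the estimated probability of that route is 2/3, and chaining such rows gives the probability of a multi-step mutational path.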
Affiliation(s)
- Lin Wang
  - Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
- Wenjuan Dong
  - Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
- Zheng Yin
  - Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
  - Biostatistics and Bioinformatics Shared Resource, Houston Methodist Neal Cancer Center, Houston, Texas 77030
- Jianting Sheng
  - Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
- Chika F. Ezeana
  - Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
- Li Yang
  - T.T. and W. F. Chao Center for BRAIN, Houston Methodist Research Institute, Houston, Texas 77030
- Xiaohui Yu
  - Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
- Zhihao Wan
  - Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
- Rebecca L. Danforth
  - Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
- Kun Han
  - Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
- Dingcheng Gao
  - Department of Cell & Development Biology, Weill Cornell Medical College, New York, NY 10065
- Stephen T. C. Wong
  - Department of System Medicine and Bioengineering, Houston Methodist Neal Cancer Center, Houston, Texas 77030
  - Departments of Radiology, Pathology and Genomic Medicine, Houston Methodist Hospital, Weill Cornell Medical College, Houston, TX 77030

38
Pan X, Li H, Putta P, Zhang X. LinRace: cell division history reconstruction of single cells using paired lineage barcode and gene expression data. Nat Commun 2023; 14:8388. [PMID: 38104156 PMCID: PMC10725445 DOI: 10.1038/s41467-023-44173-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/04/2023] [Accepted: 12/03/2023] [Indexed: 12/19/2023] Open
Abstract
Lineage tracing technology using CRISPR/Cas9 genome editing has enabled simultaneous readouts of gene expression and lineage barcodes in single cells, which allows for inference of cell lineage and cell types at the whole-organism level. While most state-of-the-art methods for lineage reconstruction use only the lineage barcode data, methods that incorporate gene expression are emerging. Effectively incorporating gene expression data requires a reasonable model of how gene expression changes across generations of cell divisions. Here, we present LinRace (Lineage Reconstruction with asymmetric cell division model), which integrates lineage barcode and gene expression data using an asymmetric cell division model and infers cell lineages and ancestral cell states using Neighbor-Joining and maximum-likelihood heuristics. On both simulated and real data, LinRace outputs more accurate cell division trees than existing methods. With inferred ancestral states, LinRace can also show how a progenitor cell generates a large population of cells with various functionalities.
Affiliation(s)
- Xinhai Pan
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
- Hechen Li
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
- Pranav Putta
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
- Xiuwei Zhang
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA

39
Morreale DP, St Geme III JW, Planet PJ. Phylogenomic analysis of the understudied Neisseriaceae species reveals a poly- and paraphyletic Kingella genus. Microbiol Spectr 2023; 11:e0312323. [PMID: 37882538 PMCID: PMC10715097 DOI: 10.1128/spectrum.03123-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 09/15/2023] [Indexed: 10/27/2023] Open
Abstract
IMPORTANCE Understanding the evolutionary relationships between the species in the Neisseriaceae family has been a persistent challenge in bacterial systematics due to high recombination rates in these species. Previous studies of this family have focused on Neisseria meningitidis and N. gonorrhoeae. However, previously understudied Neisseriaceae species are gaining new attention, with Kingella kingae now recognized as a common human pathogen and with Alysiella and Simonsiella being unique in the bacterial world as multicellular organisms. A better understanding of the genomic evolution of the Neisseriaceae can lead to the identification of specific genes and traits that underlie the remarkable diversity of this family.
Affiliation(s)
- Daniel P. Morreale
  - Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Division of Infectious Diseases, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Joseph W. St Geme III
  - Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Division of Infectious Diseases, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Paul J. Planet
  - Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Division of Infectious Diseases, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
  - Comparative Genomics, American Museum of Natural History, New York, New York, USA

40
Li X, Trovão NS, Wertheim JO, Baele G, de Bernardi Schneider A. Optimizing ancestral trait reconstruction of large HIV Subtype C datasets through multiple-trait subsampling. Virus Evol 2023; 9:vead069. [PMID: 38046219 PMCID: PMC10691791 DOI: 10.1093/ve/vead069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 10/29/2023] [Accepted: 11/20/2023] [Indexed: 12/05/2023] Open
Abstract
Large datasets along with sampling bias represent a challenge for phylodynamic reconstructions, particularly when the study data are obtained from various heterogeneous sources and/or through convenience sampling. In this study, we use a comprehensive subsampling strategy to evaluate how unevenly human immunodeficiency virus Type 1 Subtype C sequences are sampled by collection date, location, and risk group, and we assess the impact of this imbalance on the reconstruction of the viral spatial and risk group dynamics using phylogenetic comparative methods. Our study shows that the most suitable dataset for ancestral trait reconstruction can be obtained through subsampling by all available traits, particularly using multigene datasets. We also demonstrate that sampling bias is inflated when considerable information for a given trait is unavailable or of poor quality, as we observed for the trait risk group. In conclusion, we suggest that, even if traits are not well recorded, deliberately including them rather than excluding them completely optimizes the representativeness of the original dataset. Therefore, we advise the inclusion of as many traits as possible with the aid of subsampling approaches in order to optimize the dataset for phylodynamic analysis while reducing the computational burden. This will benefit research communities investigating the evolutionary and spatio-temporal patterns of infectious diseases.
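The idea of subsampling by several traits at once can be illustrated with a minimal sketch: group records by the combination of their trait values and keep at most a fixed number per group. This is a generic stratified subsampling scheme, not the authors' pipeline. The function name `subsample_by_traits`, the record fields, and the choice to keep poorly recorded traits as an explicit `"unknown"` stratum are assumptions made for illustration.

```python
import random
from collections import defaultdict


def subsample_by_traits(records, traits, per_group, seed=0):
    """Keep at most `per_group` records for each combination of trait values.

    Missing or empty trait values are grouped under "unknown" instead of
    being dropped, so incompletely annotated sequences stay represented.
    """
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    groups = defaultdict(list)
    for record in records:
        key = tuple(str(record.get(trait) or "unknown") for trait in traits)
        groups[key].append(record)
    kept = []
    for key in sorted(groups):  # sorted so the output order does not depend on input order of groups
        members = groups[key]
        kept.extend(rng.sample(members, min(per_group, len(members))))
    return kept


# Hypothetical metadata: five sequences from one stratum, one with no risk group recorded.
records = [{"id": i, "year": 2010, "country": "ZA", "risk": "HET"} for i in range(5)]
records.append({"id": 5, "year": 2011, "country": "BR", "risk": None})
subset = subsample_by_traits(records, ["year", "country", "risk"], per_group=2)
print([r["id"] for r in subset])
```

Capping each stratum reduces the dominance of heavily sampled combinations while also shrinking the dataset, which matches the two goals the abstract names: representativeness and lower computational burden.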
Affiliation(s)
| | - Nídia S Trovão
- Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, 31 Center Dr, Bethesda, MA 20892, USA
| | - Joel O Wertheim
- Department of Medicine, University of California, La Jolla, San Diego, CA 92093, USA
| | - Guy Baele
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven BE-3000, Belgium
| | - Adriano de Bernardi Schneider
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Ningbo No.2 Hospital, Ningbo 315010, China
- Ningbo Institute of Life and Health Industry, University of Chinese Academy of Sciences, Ningbo 315000, China
| |
|
41
|
Hendriks KP, Kiefer C, Al-Shehbaz IA, Bailey CD, Hooft van Huysduynen A, Nikolov LA, Nauheimer L, Zuntini AR, German DA, Franzke A, Koch MA, Lysak MA, Toro-Núñez Ó, Özüdoğru B, Invernón VR, Walden N, Maurin O, Hay NM, Shushkov P, Mandáková T, Schranz ME, Thulin M, Windham MD, Rešetnik I, Španiel S, Ly E, Pires JC, Harkess A, Neuffer B, Vogt R, Bräuchler C, Rainer H, Janssens SB, Schmull M, Forrest A, Guggisberg A, Zmarzty S, Lepschi BJ, Scarlett N, Stauffer FW, Schönberger I, Heenan P, Baker WJ, Forest F, Mummenhoff K, Lens F. Global Brassicaceae phylogeny based on filtering of 1,000-gene dataset. Curr Biol 2023; 33:4052-4068.e6. [PMID: 37659415 DOI: 10.1016/j.cub.2023.08.026] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 02/03/2023] [Revised: 06/22/2023] [Accepted: 08/08/2023] [Indexed: 09/04/2023]
Abstract
The mustard family (Brassicaceae) is a scientifically and economically important family, containing the model plant Arabidopsis thaliana and numerous crop species that feed billions worldwide. Despite its relevance, most phylogenetic trees of the family are incompletely sampled and often contain poorly supported branches. Here, we present the most complete Brassicaceae genus-level family phylogenies to date (Brassicaceae Tree of Life or BrassiToL) based on nuclear (1,081 genes, 319 of the 349 genera; 57 of the 58 tribes) and plastome (60 genes, 265 genera; all tribes) data. We found cytonuclear discordance between the two, which is likely a result of rampant hybridization among closely and more distantly related lineages. To evaluate the impact of such hybridization on the nuclear phylogeny reconstruction, we performed five different gene sampling routines, which increasingly removed putatively paralog genes. Our cleaned subset of 297 genes revealed high support for the tribes, whereas support for the main lineages (supertribes) was moderate. Calibration based on the 20 most clock-like nuclear genes suggests a late Eocene to late Oligocene origin of the family. Finally, our results strongly support a recently published new family classification, dividing the family into two subfamilies (one with five supertribes), together representing 58 tribes. This includes five recently described or re-established tribes, including Arabidopsideae, a monogeneric tribe accommodating Arabidopsis without any close relatives. With a worldwide community of thousands of researchers working on Brassicaceae and its diverse members, our new genus-level family phylogeny will be an indispensable tool for studies on biodiversity and plant biology.
Affiliation(s)
- Kasper P Hendriks
- Department of Biology, Botany, University of Osnabrück, Barbarastraße 11, 49076 Osnabrück, Germany; Functional Traits Group, Naturalis Biodiversity Center, Darwinweg 2, 2333 CR Leiden, the Netherlands.
| | - Christiane Kiefer
- Centre for Organismal Studies (COS), Heidelberg University, Im Neuenheimer Feld 345, 69120 Heidelberg, Germany
| | | | - C Donovan Bailey
- Department of Biology, New Mexico State University, PO Box 30001, MSC 3AF, Las Cruces, NM 88003, USA
| | - Alex Hooft van Huysduynen
- Functional Traits Group, Naturalis Biodiversity Center, Darwinweg 2, 2333 CR Leiden, the Netherlands; Department of Biology, University of Antwerp, Groenenborgerlaan 171, 2020 Antwerp, Belgium
| | - Lachezar A Nikolov
- Department of Molecular, Cell and Developmental Biology, University of California, 610 Charles E. Young Dr. S., Los Angeles, CA 90095, USA
| | - Lars Nauheimer
- Australian Tropical Herbarium, James Cook University, PO Box 6811, Cairns, QLD 4870, Australia
| | | | - Dmitry A German
- South-Siberian Botanical Garden, Altai State University, Barnaul, Lesosechnaya Ulitsa, 25, Barnaul, Altai Krai, Russia
| | - Andreas Franzke
- Heidelberg Botanic Garden, Heidelberg University, Im Neuenheimer Feld 361, 69120 Heidelberg, Germany
| | - Marcus A Koch
- Centre for Organismal Studies (COS), Heidelberg University, Im Neuenheimer Feld 345, 69120 Heidelberg, Germany
| | - Martin A Lysak
- CEITEC-Central European Institute of Technology, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
| | - Óscar Toro-Núñez
- Departamento de Botánica, Universidad de Concepción, Barrio Universitario, Concepción, Chile
| | - Barış Özüdoğru
- Department of Biology, Hacettepe University, Beytepe, Ankara 06800, Türkiye
| | - Vanessa R Invernón
- Sorbonne Université, Muséum National d'Histoire Naturelle, Institut de Systématique, Évolution, Biodiversité (ISYEB), CP 39, 57 rue Cuvier, 75231 Paris Cedex 05, France
| | - Nora Walden
- Centre for Organismal Studies (COS), Heidelberg University, Im Neuenheimer Feld 345, 69120 Heidelberg, Germany
| | - Olivier Maurin
- Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK
| | - Nikolai M Hay
- Department of Biology, Duke University, Durham, NC 27708, USA
| | - Philip Shushkov
- Department of Chemistry, Indiana University, 800 E. Kirkwood Ave., Bloomington, IN 47405, USA
| | - Terezie Mandáková
- CEITEC-Central European Institute of Technology, Masaryk University, Kamenice 5, Brno 625 00, Czech Republic
| | - M Eric Schranz
- Biosystematics Group, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, the Netherlands
| | - Mats Thulin
- Department of Organismal Biology, Uppsala University, Norbyvägen 18, 752 36 Uppsala, Sweden
| | | | - Ivana Rešetnik
- Department of Biology, University of Zagreb, Marulićev trg 20/II, 10000 Zagreb, Croatia
| | - Stanislav Španiel
- Institute of Botany, Slovak Academy of Sciences, Plant Science and Biodiversity Centre, Dúbravská cesta 9, 845 23 Bratislava, Slovakia
| | - Elfy Ly
- Functional Traits Group, Naturalis Biodiversity Center, Darwinweg 2, 2333 CR Leiden, the Netherlands; Wetsus, European Centre of Excellence for Sustainable Water Technology, Oostergoweg 9, 8911 MA Leeuwarden, the Netherlands; Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, the Netherlands
| | - J Chris Pires
- Soil and Crop Sciences, Colorado State University, 307 University Ave., Fort Collins, CO 80523-1170, USA
| | - Alex Harkess
- HudsonAlpha Institute for Biotechnology, 601 Genome Way Northwest, Huntsville, AL 35806, USA
| | - Barbara Neuffer
- Department of Biology, Botany, University of Osnabrück, Barbarastraße 11, 49076 Osnabrück, Germany
| | - Robert Vogt
- Botanischer Garten und Botanisches Museum, Freie Universität Berlin, Königin-Luise-Straße 6-8, 14195 Berlin, Germany
| | - Christian Bräuchler
- Department of Botany, Natural History Museum Vienna, Burgring 7, 1010 Vienna, Austria
| | - Heimo Rainer
- Department of Botany, Natural History Museum Vienna, Burgring 7, 1010 Vienna, Austria
| | - Steven B Janssens
- Department of Biology, KU Leuven, Kasteelpark Arenberg 31 - box 2435, 3001 Leuven, Belgium; Meise Botanic Garden, Nieuwelaan 38, 1860 Meise, Belgium
| | - Michaela Schmull
- Harvard University Herbaria, 22 Divinity Ave., Cambridge, MA 02138, USA
| | - Alan Forrest
- Centre for Middle Eastern Plants, Royal Botanic Garden Edinburgh, 20A Inverleith Row, Edinburgh EH3 5LR, UK
| | - Alessia Guggisberg
- ETH Zürich, Institut für Integrative Biologie, Universitätstrasse 16, 8092 Zürich, Switzerland
| | - Sue Zmarzty
- Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK
| | - Brendan J Lepschi
- Australian National Herbarium, Centre for Australian National Biodiversity Research, Clunies Ross St, Acton, ACT 2601, Australia
| | - Neville Scarlett
- La Trobe University, Plenty Road and Kingsbury Dr., Bundoora, VIC 3086, Australia
| | - Fred W Stauffer
- Conservatory and Botanic Gardens of Geneva, CP 60, Chambésy, 1292 Geneva, Switzerland
| | - Ines Schönberger
- Manaaki Whenua Landcare Research, Allan Herbarium, PO Box 69040, Lincoln, New Zealand
| | - Peter Heenan
- Manaaki Whenua Landcare Research, Allan Herbarium, PO Box 69040, Lincoln, New Zealand
| | | | - Félix Forest
- Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK
| | - Klaus Mummenhoff
- Department of Biology, Botany, University of Osnabrück, Barbarastraße 11, 49076 Osnabrück, Germany.
| | - Frederic Lens
- Functional Traits Group, Naturalis Biodiversity Center, Darwinweg 2, 2333 CR Leiden, the Netherlands; Institute of Biology Leiden, Plant Sciences, Leiden University, Sylviusweg 72, 2333 BE Leiden, the Netherlands.
| |
|
42
|
Jensen CG, Sumner JA, Kleinstein SH, Hoehn KB. Inferring B cell phylogenies from paired heavy and light chain BCR sequences with Dowser. bioRxiv 2023:2023.09.29.560187. [PMID: 37873135 PMCID: PMC10592837 DOI: 10.1101/2023.09.29.560187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Antibodies are vital to human immune responses and are composed of genetically variable heavy and light chains. These structures are initially expressed as B cell receptors (BCRs). BCR diversity is shaped through somatic hypermutation and selection during immune responses. This evolutionary process produces B cell clones, cells that descend from a common ancestor but differ by mutations. Phylogenetic trees inferred from BCR sequences can reconstruct the history of mutations within a clone. Until recently, BCR sequencing technologies separated heavy and light chains, but advancements in single cell sequencing now pair heavy and light chains from individual cells. However, it is unclear how these separate genes should be combined to infer B cell phylogenies. In this study, we investigated strategies for using paired heavy and light chain sequences to build phylogenetic trees. We found incorporating light chains significantly improved tree accuracy and reproducibility across all methods tested. This improvement was greater than the difference between tree building methods and persisted even when mixing bulk and single cell sequencing data. However, we also found that many phylogenetic methods estimated significantly biased branch lengths when some light chains were missing, such as when mixing single cell and bulk BCR data. This bias was eliminated using maximum likelihood methods with separate branch lengths for heavy and light chain gene partitions. Thus, we recommend using maximum likelihood methods with separate heavy and light chain partitions, especially when mixing data types. We implemented these methods in the R package Dowser: https://dowser.readthedocs.io.
Affiliation(s)
- Cole G. Jensen
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
| | - Jacob A. Sumner
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
- Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, 06520, USA
| | - Steven H. Kleinstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
- Department of Pathology, Yale School of Medicine, New Haven, CT 06520, USA
- Department of Immunobiology, Yale School of Medicine, New Haven, CT 06520, USA
| | - Kenneth B. Hoehn
- Department of Pathology, Yale School of Medicine, New Haven, CT 06520, USA
- Current address: Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
| |
|
43
|
Simmons MP, Goloboff PA, Stöver BC, Springer MS, Gatesy J. Quantification of congruence among gene trees with polytomies using overall success of resolution for phylogenomic coalescent analyses. Cladistics 2023; 39:418-436. [PMID: 37096985 DOI: 10.1111/cla.12540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Revised: 02/22/2023] [Accepted: 03/24/2023] [Indexed: 04/26/2023] Open
Abstract
Gene-tree-inference error can cause species-tree-inference artefacts in summary phylogenomic coalescent analyses. Here we integrate two ways of accommodating these inference errors: collapsing arbitrarily or dubiously resolved gene-tree branches, and subsampling gene trees based on their pairwise congruence. We tested the effect of collapsing gene-tree branches with 0% approximate-likelihood-ratio-test (SH-like aLRT) support in likelihood analyses and strict consensus trees for parsimony, and then subsampled those partially resolved trees based on congruence measures that do not penalize polytomies. For this purpose we developed a new TNT script for congruence sorting (congsort), and used it to calculate topological incongruence for eight phylogenomic datasets using three distance measures: standard Robinson-Foulds (RF) distances; overall success of resolution (OSR), which is based on counting both matching and contradicting clades; and RF contradictions, which only counts contradictory clades. As expected, we found that gene-tree incongruence was often concentrated in clades that are arbitrarily or dubiously resolved and that there was greater congruence between the partially collapsed gene trees and the coalescent and concatenation topologies inferred from those genes. Coalescent branch lengths typically increased as the most incongruent gene trees were excluded, although branch supports typically did not. We investigated two successful and complementary approaches to prioritizing genes for investigation of alignment or homology errors. Coalescent-tree clades that contradicted concatenation-tree clades were generally less robust to gene-tree subsampling than congruent clades. Our preferred approach to collapsing likelihood gene-tree clades (0% SH-like aLRT support) and subsampling those trees (OSR) generally outperformed competing approaches for a large fungal dataset with respect to branch lengths, support and congruence. 
We recommend widespread application of this approach (and strict consensus trees for parsimony-based analyses) for improving quantification of gene-tree congruence/conflict, estimating coalescent branch lengths, testing robustness of coalescent analyses to gene-tree-estimation error, and improving topological robustness of summary coalescent analyses. This approach is quick and easy to implement, even for huge datasets.
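The difference between standard RF distances and RF contradictions on partially collapsed trees can be illustrated with a minimal Python sketch. It is not the authors' TNT script (congsort): trees are represented here simply as sets of rooted clades, the function names are hypothetical, and OSR is left out because its exact formula is not given in the abstract.

```python
# Illustrative sketch only: each tree is a set of non-trivial rooted clades,
# and each clade is a frozenset of taxon names (an assumed representation).

def rf_distance(clades_a, clades_b):
    """Standard Robinson-Foulds distance: clades found in exactly one tree."""
    return len(clades_a ^ clades_b)

def compatible(x, y):
    """Two rooted clades can coexist in one tree if disjoint or nested."""
    return x.isdisjoint(y) or x <= y or y <= x

def rf_contradictions(clades_a, clades_b):
    """Count only clades that conflict with some clade in the other tree,
    so an unresolved polytomy is not penalized."""
    contra_a = [x for x in clades_a if any(not compatible(x, y) for y in clades_b)]
    contra_b = [y for y in clades_b if any(not compatible(y, x) for x in clades_a)]
    return len(contra_a) + len(contra_b)

fs = frozenset
resolved = {fs("AB"), fs("CD"), fs("ABCD")}   # (((A,B),(C,D)),E)
collapsed = {fs("AB")}                        # ((A,B),C,D,E): polytomy
conflicting = {fs("AC")}                      # ((A,C),B,D,E)

print(rf_distance(resolved, collapsed), rf_contradictions(resolved, collapsed))
print(rf_distance(resolved, conflicting), rf_contradictions(resolved, conflicting))
```

In this toy case the collapsed tree differs from the resolved tree under standard RF but contradicts none of its clades, which is the behaviour that motivates congruence measures that do not penalize polytomies.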
Affiliation(s)
- Mark P Simmons
- Department of Biology, Colorado State University, Fort Collins, CO, 80523, USA
| | - Pablo A Goloboff
- CONICET, INSUE, Fundación Miguel Lillo, Miguel Lillo 251, 4000, S.M. de Tucumán, Argentina
| | - Ben C Stöver
- Institute for Evolution and Biodiversity, WWU Münster, 48149, Münster, Germany
| | - Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA, 92521, USA
| | - John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY, 10024, USA
| |
|
44
|
Prusokiene A, Prusokas A, Retkute R. Machine learning based lineage tree reconstruction improved with knowledge of higher level relationships between cells and genomic barcodes. NAR Genom Bioinform 2023; 5:lqad077. [PMID: 37608801 PMCID: PMC10440785 DOI: 10.1093/nargab/lqad077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Revised: 06/26/2023] [Accepted: 08/11/2023] [Indexed: 08/24/2023] Open
Abstract
Tracking cells as they divide and progress through differentiation is a fundamental step in understanding many biological processes, such as the development of organisms and progression of diseases. In this study, we investigate a machine learning approach to reconstruct lineage trees in experimental systems based on mutating synthetic genomic barcodes. We refine previously proposed methodology by embedding information of higher level relationships between cells and single-cell barcode values into a feature space. We test performance of the algorithm on shallow trees (up to 100 cells) and deep trees (up to 10 000 cells). Our proposed algorithm can improve tree reconstruction accuracy in comparison to reconstructions based on a maximum parsimony method, but this comes at a higher computational time requirement.
Affiliation(s)
- Alisa Prusokiene
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
| | | | - Renata Retkute
- Department of Plant Sciences, University of Cambridge, Downing Street, Cambridge CB2 3EA, UK
| |
|
45
|
Baldwin E, McNair M, Leebens-Mack J. Rampant chloroplast capture in Sarracenia revealed by plastome phylogeny. Front Plant Sci 2023; 14:1237749. [PMID: 37711293 PMCID: PMC10497973 DOI: 10.3389/fpls.2023.1237749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 07/20/2023] [Indexed: 09/16/2023]
Abstract
Introgression can produce novel genetic variation in organisms that hybridize. Sympatric species pairs in the carnivorous plant genus Sarracenia L. frequently hybridize, and all known hybrids are fertile. Although the genus is a desirable system for studying the evolutionary consequences of hybridization, knowledge of the extent to which introgression occurs in it is limited to a few species at only two field sites. Previous phylogenomic analysis of Sarracenia estimated a highly resolved species tree from 199 nuclear genes, but revealed a plastid genome that is highly discordant with the species tree. Such cytonuclear discordance could be caused by chloroplast introgression (i.e. chloroplast capture) or incomplete lineage sorting (ILS). To better understand the extent to which introgression is occurring in Sarracenia, the chloroplast capture and ILS hypotheses were formally evaluated. Plastomes were assembled de novo from sequencing reads generated from 17 individuals in addition to reads obtained from the previous study. Assemblies of 14 whole plastomes were generated and annotated, and the remaining fragmented assemblies were scaffolded to these whole-plastome assemblies. Coding sequences from 79 homologous genes were aligned and concatenated for maximum-likelihood phylogeny estimation. The plastome tree is extremely discordant with the published species tree. Plastome trees were simulated under the coalescent and tree distance from the species tree was calculated to generate a null distribution of discordance that is expected under ILS alone. A t-test rejected the null hypothesis that ILS could cause the level of discordance seen in the plastome tree, suggesting that chloroplast capture must be invoked to explain the discordance. Due to the extreme level of discordance in the plastome tree, it is likely that chloroplast capture has been common in the evolutionary history of Sarracenia.
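The logic of testing an observed tree distance against a simulated null distribution can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the abstract does not say how the t-test was set up, so a one-sample t statistic is assumed here, the function name is hypothetical, and the numbers are invented toy values rather than results from the study.

```python
import math
import statistics

def one_sample_t(simulated, observed):
    """t statistic for H0: the mean of the simulated (ILS-only) tree
    distances equals the observed plastome-tree distance."""
    n = len(simulated)
    mean = statistics.mean(simulated)
    std_err = statistics.stdev(simulated) / math.sqrt(n)
    return (mean - observed) / std_err

# Invented toy values: distances of coalescent-simulated trees to the species tree
simulated_distances = [10, 12, 14, 12, 12]
# Invented toy value: distance of the empirical plastome tree to the species tree
observed_distance = 30

t_stat = one_sample_t(simulated_distances, observed_distance)
print(f"t = {t_stat:.2f} with {len(simulated_distances) - 1} degrees of freedom")
```

A strongly negative statistic means the simulated distances fall well below the observed one, which is the pattern that would lead to rejecting ILS alone as an explanation.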
Affiliation(s)
- Ethan Baldwin
- Department of Plant Biology, University of Georgia, Athens, GA, United States
| | - Mason McNair
- Department of Plant & Environmental Science, Clemson University, Florence, SC, United States
| | - Jim Leebens-Mack
- Department of Plant Biology, University of Georgia, Athens, GA, United States
| |
|
46
|
Struck TH, Golombek A, Hoesel C, Dimitrov D, Elgetany AH. Mitochondrial Genome Evolution in Annelida-A Systematic Study on Conservative and Variable Gene Orders and the Factors Influencing its Evolution. Syst Biol 2023; 72:925-945. [PMID: 37083277 PMCID: PMC10405356 DOI: 10.1093/sysbio/syad023] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/17/2022] [Revised: 04/15/2023] [Accepted: 04/18/2023] [Indexed: 04/22/2023] Open
Abstract
The mitochondrial genomes of Bilateria are relatively conserved in their protein-coding, rRNA, and tRNA gene complement, but the order of these genes can range from very conserved to very variable depending on the taxon. The supposedly conserved gene order of Annelida has been used to support the placement of some taxa within Annelida. Recently, authors have cast doubts on the conserved nature of the annelid gene order. Various factors may influence gene order variability including, among others, increased substitution rates, base composition differences, structure of noncoding regions, parasitism, living in extreme habitats, short generation times, and biomineralization. However, these analyses were neither done systematically nor based on well-established reference trees. Several focused on only a few of these factors and biological factors were usually explored ad-hoc without rigorous testing or correlation analyses. Herein, we investigated the variability and evolution of the annelid gene order and the factors that potentially influenced its evolution, using a comprehensive and systematic approach. The analyses were based on 170 genomes, including 33 previously unrepresented species. Our analyses included 706 different molecular properties, 20 life-history and ecological traits, and a reference tree corresponding to recent improvements concerning the annelid tree. The results showed that the gene order with and without tRNAs is generally conserved. However, individual taxa exhibit higher degrees of variability. None of the analyzed life-history and ecological traits explained the observed variability across mitochondrial gene orders. In contrast, the combination and interaction of the best-predicting factors for substitution rate and base composition explained up to 30% of the observed variability. 
Accordingly, correlation analyses of different molecular properties of the mitochondrial genomes showed an intricate network of direct and indirect correlations between the different molecular factors. Hence, gene order evolution seems to be driven by molecular evolutionary aspects rather than by life history or ecology. On the other hand, variability of the gene order does not predict whether a taxon is difficult to place in molecular phylogenetic reconstructions using sequence data. We also discuss the molecular properties of annelid mitochondrial genomes considering canonical views on gene evolution and potential reasons why the canonical views do not always fit the observed patterns without some adjustments. [Annelida; compositional biases; ecology; gene order; life history; macroevolution; mitochondrial genomes; substitution rates.]
Affiliation(s)
- Torsten H Struck
- Natural History Museum, University of Oslo, P.O. Box 1172, Blindern, 0318 Oslo, Norway
- Centre of Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Bonn 53113, Germany
- FB05 Biology/Chemistry; University of Osnabrück, Osnabrück 49069, Germany
| | - Anja Golombek
- Centre of Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Bonn 53113, Germany
- FB05 Biology/Chemistry; University of Osnabrück, Osnabrück 49069, Germany
| | - Christoph Hoesel
- FB05 Biology/Chemistry; University of Osnabrück, Osnabrück 49069, Germany
| | - Dimitar Dimitrov
- Department of Natural History, University Museum of Bergen, University of Bergen, P.O. Box 7800, 5020 Bergen, Norway
| | - Asmaa Haris Elgetany
- Natural History Museum, University of Oslo, P.O. Box 1172, Blindern, 0318 Oslo, Norway
- Zoology Department, Faculty of Science, Damietta University, New Damietta, Central zone, 34517, Egypt
| |
|
47
|
Guerrini V, Conte A, Grossi R, Liti G, Rosone G, Tattini L. phyBWT2: phylogeny reconstruction via eBWT positional clustering. Algorithms Mol Biol 2023; 18:11. [PMID: 37537624 PMCID: PMC10399073 DOI: 10.1186/s13015-023-00232-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 06/10/2023] [Indexed: 08/05/2023] Open
Abstract
BACKGROUND Molecular phylogenetics studies the evolutionary relationships among the individuals of a population through their biological sequences. It may provide insights about the origin and the evolution of viral diseases, or highlight complex evolutionary trajectories. A key task is inferring phylogenetic trees from any type of sequencing data, including raw short reads. Yet, several tools require pre-processed input data, e.g. from complex computational pipelines based on de novo assembly or from mappings against a reference genome. As sequencing technologies keep becoming cheaper, this puts increasing pressure on designing methods that perform analysis directly on their outputs. From this viewpoint, there is growing interest in alignment-, assembly-, and reference-free methods that can work on several types of data, including raw reads. RESULTS We present phyBWT2, an improved version of phyBWT (Guerrini et al. in 22nd International Workshop on Algorithms in Bioinformatics (WABI) 242:23:1-23:19, 2022). Both tools directly reconstruct phylogenetic trees, bypassing both the alignment against a reference genome and de novo assembly. They exploit the combinatorial properties of the extended Burrows-Wheeler Transform (eBWT) and the corresponding eBWT positional clustering framework to detect relevant blocks of the longest shared substrings of varying length (unlike k-mer-based approaches, which need to fix the length k a priori). As a result, they provide novel alignment-, assembly-, and reference-free methods that build partition trees without relying on the pairwise comparison of sequences, thus avoiding the use of a distance matrix to infer phylogeny. In addition, phyBWT2 outperforms phyBWT in terms of running time, as the former reconstructs phylogenetic trees step-by-step by considering multiple partitions, instead of just one partition at a time, as previously done by the latter. 
CONCLUSIONS Based on the results of the experiments on sequencing data, we conclude that our method can produce trees of quality comparable to the benchmark phylogeny by handling datasets of different types (short reads, contigs, or entire genomes). Overall, the experiments confirm the effectiveness of phyBWT2 that improves the performance of its previous version phyBWT, while preserving the accuracy of the results.
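For readers unfamiliar with the eBWT that underlies both tools, the following is a naive Python sketch of the transform itself. It is not the phyBWT2 implementation and does not perform positional clustering; it assumes non-empty, primitive input strings (as in the standard definition of the eBWT), and the function names are hypothetical.

```python
from functools import cmp_to_key

def omega_cmp(u, v):
    """Compare the infinite repetitions u^w and v^w (omega-order).
    By the Fine and Wilf theorem, a prefix of length |u| + |v| suffices."""
    n = len(u) + len(v)
    pu = (u * (n // len(u) + 1))[:n]
    pv = (v * (n // len(v) + 1))[:n]
    return (pu > pv) - (pu < pv)

def ebwt(strings):
    """Naive extended BWT: sort every rotation of every string in
    omega-order and concatenate the last character of each rotation."""
    rotations = [s[i:] + s[:i] for s in strings for i in range(len(s))]
    rotations.sort(key=cmp_to_key(omega_cmp))
    return "".join(r[-1] for r in rotations)

print(ebwt(["ab", "b"]))  # rotations sort as ab < ba < b, giving "bab"
```

This quadratic-time version is only meant to show what is being computed; tools working on real sequencing data rely on far more efficient constructions.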
Affiliation(s)
| | - Alessio Conte
- Dipartimento di Informatica, University of Pisa, Pisa, Italy.
| | - Roberto Grossi
- Dipartimento di Informatica, University of Pisa, Pisa, Italy.
| | - Gianni Liti
- CNRS UMR 7284, INSERM U1081 Université Côte d'Azur, Nice, France
| | - Giovanna Rosone
- Dipartimento di Informatica, University of Pisa, Pisa, Italy.
| | - Lorenzo Tattini
- CNRS UMR 7284, INSERM U1081 Université Côte d'Azur, Nice, France
| |
|
48
|
Weisbecker V, Beck RMD, Guillerme T, Harrington AR, Lange-Hodgson L, Lee MSY, Mardon K, Phillips MJ. Multiple modes of inference reveal less phylogenetic signal in marsupial basicranial shape compared with the rest of the cranium. Philos Trans R Soc Lond B Biol Sci 2023; 378:20220085. [PMID: 37183893 PMCID: PMC10184248 DOI: 10.1098/rstb.2022.0085] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/15/2022] [Accepted: 12/17/2022] [Indexed: 05/16/2023] Open
Abstract
Incorporating morphological data into modern phylogenies allows integration of fossil evidence, facilitating divergence dating and macroevolutionary inferences. Improvements in the phylogenetic utility of morphological data have been sought via Procrustes-based geometric morphometrics (GMM), but with mixed success and little clarity over what anatomical areas are most suitable. Here, we assess GMM-based phylogenetic reconstructions in a heavily sampled source of discrete characters for mammalian phylogenetics, the basicranium, in 57 species of marsupial mammals, compared with the remainder of the cranium. We show less phylogenetic signal in the basicranium compared with a 'Rest of Cranium' partition, using diverse metrics of phylogenetic signal (Kmult, phylogenetically aligned principal components analysis, comparisons of UPGMA/neighbour-joining/parsimony trees and cophenetic distances to a reference phylogeny) for scaled, Procrustes-aligned landmarks and allometry-corrected residuals. Surprisingly, a similar pattern emerged from parsimony-based analyses of discrete cranial characters. The consistent results across methods suggest that easily computed metrics such as Kmult can provide good guidance on phylogenetic information in a landmarking configuration. In addition, GMM data may be less informative for intricate but conservative anatomical regions such as the basicranium, while better (but not necessarily novel) phylogenetic information can be expected for broadly characterized shapes such as entire bones. This article is part of the theme issue 'The mammalian skull: development, structure and function'.
Affiliation(s)
- Vera Weisbecker
- College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
| | - Robin M. D. Beck
- School of Science, Engineering and Environment, University of Salford, Salford, M5 4WT, UK
| | - Thomas Guillerme
- School of Biosciences, University of Sheffield, Sheffield, S10 2TN, UK
| | | | - Leonie Lange-Hodgson
- School of Biological Sciences, University of Queensland, Saint Lucia, Queensland, 4072, Australia
| | - Michael S. Y. Lee
- College of Science and Engineering, Flinders University, Adelaide, South Australia 5042, Australia
- Earth Sciences Section, South Australian Museum, Adelaide, South Australia, 5000 Australia
| | - Karine Mardon
- Centre of Advanced Imaging, University of Queensland, Saint Lucia, Queensland, 4072, Australia
| | - Matthew J. Phillips
- School of Biology & Environmental Science, Queensland University of Technology, Brisbane, Queensland, 4000, Australia
| |
|
49
|
Arora J, Buček A, Hellemans S, Beránková T, Arias JR, Fisher BL, Clitheroe C, Brune A, Kinjo Y, Šobotník J, Bourguignon T. Evidence of cospeciation between termites and their gut bacteria on a geological time scale. Proc Biol Sci 2023; 290:20230619. [PMID: 37339742 DOI: 10.1098/rspb.2023.0619] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/14/2023] [Accepted: 05/24/2023] [Indexed: 06/22/2023] Open
Abstract
Termites host diverse communities of gut microbes, including many bacterial lineages only found in this habitat. The bacteria endemic to termite guts are transmitted via two routes: a vertical route from parent colonies to daughter colonies and a horizontal route between colonies sometimes belonging to different termite species. The relative importance of both transmission routes in shaping the gut microbiota of termites remains unknown. Using bacterial marker genes derived from the gut metagenomes of 197 termites and one Cryptocercus cockroach, we show that bacteria endemic to termite guts are mostly transferred vertically. We identified 18 lineages of gut bacteria showing cophylogenetic patterns with termites over tens of millions of years. Horizontal transfer rates estimated for 16 bacterial lineages were within the range of those estimated for 15 mitochondrial genes, suggesting that horizontal transfers are uncommon and vertical transfers are the dominant transmission route in these lineages. Some of these associations probably date back more than 150 million years and are an order of magnitude older than the cophylogenetic patterns between mammalian hosts and their gut bacteria. Our results suggest that termites have cospeciated with their gut bacteria since first appearing in the geological record.
Affiliation(s)
- Jigyasa Arora
- Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan
- Aleš Buček
- Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan
- Faculty of Tropical AgriScience, Czech University of Life Sciences, Kamýcká 129, Suchdol, 165 00, Prague 6, Czech Republic
- Simon Hellemans
- Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan
- Tereza Beránková
- Faculty of Tropical AgriScience, Czech University of Life Sciences, Kamýcká 129, Suchdol, 165 00, Prague 6, Czech Republic
- Johanna Romero Arias
- Faculty of Tropical AgriScience, Czech University of Life Sciences, Kamýcká 129, Suchdol, 165 00, Prague 6, Czech Republic
- Brian L Fisher
- Madagascar Biodiversity Center, Parc Botanique et Zoologique de Tsimbazaza, Antananarivo 101, Madagascar
- California Academy of Sciences, San Francisco, CA, USA
- Crystal Clitheroe
- Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan
- Andreas Brune
- Research Group Insect Gut Microbiology and Symbiosis, Max Planck Institute for Terrestrial Microbiology, Marburg, 35043, Germany
- Yukihiro Kinjo
- Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan
- College of Economics and Environmental Policy, Okinawa International University, 2-6-1 Ginowan, Ginowan, 901-2701, Okinawa, Japan
- Jan Šobotník
- Faculty of Tropical AgriScience, Czech University of Life Sciences, Kamýcká 129, Suchdol, 165 00, Prague 6, Czech Republic
- College of Economics and Environmental Policy, Okinawa International University, 2-6-1 Ginowan, Ginowan, 901-2701, Okinawa, Japan
- Thomas Bourguignon
- Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan
- Faculty of Tropical AgriScience, Czech University of Life Sciences, Kamýcká 129, Suchdol, 165 00, Prague 6, Czech Republic
|
50
|
Simões TR, Vernygora OV, de Medeiros BAS, Wright AM. Handling Logical Character Dependency in Phylogenetic Inference: Extensive Performance Testing of Assumptions and Solutions Using Simulated and Empirical Data. Syst Biol 2023; 72:662-680. [PMID: 36773019 PMCID: PMC10276625 DOI: 10.1093/sysbio/syad006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Revised: 12/08/2022] [Accepted: 02/09/2023] [Indexed: 02/12/2023] Open
Abstract
Logical character dependency is a major conceptual and methodological problem in phylogenetic inference of morphological data sets, as it violates the assumption of character independence that is common to all phylogenetic methods. It is more frequently observed in higher-level phylogenies or in data sets characterizing major evolutionary transitions, as these represent parts of the tree of life where (primary) anatomical characters either originate or disappear entirely. As a result, secondary traits related to these primary characters become "inapplicable" across all sampled taxa in which that character is absent. Various solutions have been explored over the last three decades to handle character dependency, such as alternative character coding schemes and, more recently, new algorithmic implementations. However, the accuracy of the proposed solutions, or the impact of character dependency across distinct optimality criteria, has never been directly tested using standard performance measures. Here, we utilize simple and complex simulated morphological data sets analyzed under different maximum parsimony optimization procedures and Bayesian inference to test the accuracy of various coding and algorithmic solutions to character dependency. This is complemented by empirical analyses using a recoded data set on palaeognathid birds. We find that in small, simulated data sets, absent coding performs better than other popular coding strategies available (contingent and multistate), whereas in more complex simulations (larger data sets controlled for different tree structure and character distribution models) contingent coding is favored more frequently. Under contingent coding, a recently proposed weighting algorithm produces the most accurate results for maximum parsimony. 
However, Bayesian inference outperforms all parsimony-based solutions to handle character dependency due to fundamental differences in their optimization procedures, a simple alternative that has been long overlooked. Yet, we show that the more primary characters bearing secondary (dependent) traits there are in a data set, the harder it is to estimate the true phylogenetic tree, regardless of the optimality criterion, owing to a considerable expansion of the tree parameter space. [Bayesian inference, character dependency, character coding, distance metrics, morphological phylogenetics, maximum parsimony, performance, phylogenetic accuracy.]
Affiliation(s)
- Tiago R Simões
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts, USA
- Oksana V Vernygora
- Department of Entomology, University of Kentucky, Lexington, Kentucky, USA
- April M Wright
- Department of Biological Sciences, Southeastern Louisiana University, Hammond, Louisiana, USA
|