1. Ezcurra MD. Exploring the effects of weighting against homoplasy in genealogies of palaeontological phylogenetic matrices. Cladistics 2024; 40:242-281. [PMID: 38728134 DOI: 10.1111/cla.12581] [Received: 01/11/2024] [Revised: 04/15/2024] [Accepted: 04/16/2024] [Indexed: 05/12/2024] Open
Abstract
Although simulations have shown that implied weighting (IW) outperforms equal weighting (EW) in phylogenetic parsimony analyses, weighting against homoplasy lacks extensive usage in palaeontology. Iterative modifications of several phylogenetic matrices in the last decades resulted in extensive genealogies of datasets that allow the evaluation of differences in the stability of results for alternative character weighting methods directly on empirical data. Each generation was compared against the most recent generation in each genealogy because it is assumed that it is the most comprehensive (higher sampling), revised (fewer misscorings) and complete (lower amount of missing data) matrix of the genealogy. The analyses were conducted on six different genealogies under EW and IW and extended implied weighting (EIW) with a range of concavity constant values (k) between 3 and 30. Pairwise comparisons between trees were conducted using Robinson-Foulds distances normalized by the total number of groups, distortion coefficient, subtree pruning and regrafting moves, and the proportional sum of group dissimilarities. The results consistently show that IW and EIW produce results more similar to those of the last dataset than EW in the vast majority of genealogies and for all comparative measures. This is significant because almost all of these matrices were originally analysed only under EW. Implied weighting and EIW do not outperform each other unambiguously. Euclidean distances based on a principal components analysis of the comparative measures show that different ranges of k-values retrieve the most similar results to the last generation in different genealogies. There is a significant positive linear correlation between the optimal k-values and the number of terminals of the last generations. 
This could be employed to inform about the range of k-values to be used in phylogenetic analyses based on matrix size but with the caveat that this emergent relationship still relies on a low sample size of genealogies.
Affiliation(s)
- Martín D Ezcurra
  - Sección Paleontología de Vertebrados, CONICET-Museo Argentino de Ciencias Naturales, Ángel Gallardo 470, C1405DJR, Ciudad Autónoma de Buenos Aires, Argentina
  - School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, B15 2TT, Birmingham, UK
2. Minahan NT, Yen TY, Guo YLL, Shu PY, Tsai KH. Concatenated ScaA and TSA56 Surface Antigen Sequences Reflect Genome-Scale Phylogeny of Orientia tsutsugamushi: An Analysis Including Two Genomes from Taiwan. Pathogens 2024; 13:299. [PMID: 38668254 PMCID: PMC11054523 DOI: 10.3390/pathogens13040299] [Received: 03/08/2024] [Revised: 03/29/2024] [Accepted: 03/29/2024] [Indexed: 04/29/2024] Open
Abstract
Orientia tsutsugamushi is an obligate intracellular bacterium associated with trombiculid mites and is the causative agent of scrub typhus, a life-threatening febrile disease. Strain typing of O. tsutsugamushi is based on its immunodominant surface antigen, 56-kDa type-specific antigen (TSA56). However, TSA56 gene sequence-based phylogenetic analysis is only partially congruent with core genome-based phylogenetic analysis. Thus, this study investigated whether concatenated surface antigen sequences, including surface cell antigen (Sca) proteins, can reflect the genome-scale phylogeny of O. tsutsugamushi. Complete genomes were obtained for two common O. tsutsugamushi strains in Taiwan, TW-1 and TW-22, and the core genome/proteome was identified for 11 O. tsutsugamushi strains. Phylogenetic analysis was performed using maximum likelihood (ML) and neighbor-joining (NJ) methods, and the congruence between trees was assessed using a quartet similarity measure. Phylogenetic analysis based on 691 concatenated core protein sequences produced identical tree topologies with ML and NJ methods. Among TSA56 and core Sca proteins (ScaA, ScaC, ScaD, and ScaE), TSA56 trees were most similar to the core protein tree, and ScaA trees were the least similar. However, concatenated ScaA and TSA56 sequences produced trees that were highly similar to the core protein tree, the NJ tree being more similar. Strain-level characterization of O. tsutsugamushi may be improved by coanalyzing ScaA and TSA56 sequences, which are also important targets for their combined immunogenicity.
Affiliation(s)
- Nicholas T. Minahan
  - Institute of Environmental and Occupational Health Sciences, College of Public Health, National Taiwan University, Taipei 100025, Taiwan; (N.T.M.); (Y.-L.L.G.)
- Tsai-Ying Yen
  - Centers for Diagnostics and Vaccine Development, Centers for Disease Control, Ministry of Health and Welfare, Taipei 115210, Taiwan; (T.-Y.Y.); (P.-Y.S.)
- Yue-Liang Leon Guo
  - Institute of Environmental and Occupational Health Sciences, College of Public Health, National Taiwan University, Taipei 100025, Taiwan; (N.T.M.); (Y.-L.L.G.)
  - Department of Environmental and Occupational Medicine, National Taiwan University (NTU) College of Medicine and NTU Hospital, Taipei 100025, Taiwan
- Pei-Yun Shu
  - Centers for Diagnostics and Vaccine Development, Centers for Disease Control, Ministry of Health and Welfare, Taipei 115210, Taiwan; (T.-Y.Y.); (P.-Y.S.)
- Kun-Hsien Tsai
  - Institute of Environmental and Occupational Health Sciences, College of Public Health, National Taiwan University, Taipei 100025, Taiwan; (N.T.M.); (Y.-L.L.G.)
  - Global Health Program, College of Public Health, National Taiwan University, Taipei 100025, Taiwan
3. Kramer AM, Thornlow B, Ye C, De Maio N, McBroome J, Hinrichs AS, Lanfear R, Turakhia Y, Corbett-Detig R. Online Phylogenetics with matOptimize Produces Equivalent Trees and is Dramatically More Efficient for Large SARS-CoV-2 Phylogenies than de novo and Maximum-Likelihood Implementations. Syst Biol 2023; 72:1039-1051. [PMID: 37232476 PMCID: PMC10627557 DOI: 10.1093/sysbio/syad031] [Received: 11/29/2021] [Revised: 05/14/2023] [Accepted: 06/22/2023] [Indexed: 05/27/2023] Open
Abstract
Phylogenetics has been foundational to SARS-CoV-2 research and public health policy, assisting in genomic surveillance, contact tracing, and assessing emergence and spread of new variants. However, phylogenetic analyses of SARS-CoV-2 have often relied on tools designed for de novo phylogenetic inference, in which all data are collected before any analysis is performed and the phylogeny is inferred once from scratch. SARS-CoV-2 data sets do not fit this mold. There are currently over 14 million sequenced SARS-CoV-2 genomes in online databases, with tens of thousands of new genomes added every day. Continuous data collection, combined with the public health relevance of SARS-CoV-2, invites an "online" approach to phylogenetics, in which new samples are added to existing phylogenetic trees every day. The extremely dense sampling of SARS-CoV-2 genomes also invites a comparison between likelihood and parsimony approaches to phylogenetic inference. Maximum likelihood (ML) and pseudo-ML methods may be more accurate when there are multiple changes at a single site on a single branch, but this accuracy comes at a large computational cost, and the dense sampling of SARS-CoV-2 genomes means that these instances will be extremely rare because each internal branch is expected to be extremely short. Therefore, it may be that approaches based on maximum parsimony (MP) are sufficiently accurate for reconstructing phylogenies of SARS-CoV-2, and their simplicity means that they can be applied to much larger data sets. Here, we evaluate the performance of de novo and online phylogenetic approaches, as well as ML, pseudo-ML, and MP frameworks for inferring large and dense SARS-CoV-2 phylogenies. Overall, we find that online phylogenetics produces similar phylogenetic trees to de novo analyses for SARS-CoV-2, and that MP optimization with UShER and matOptimize produces equivalent SARS-CoV-2 phylogenies to some of the most popular ML and pseudo-ML inference tools. 
MP optimization with UShER and matOptimize is thousands of times faster than presently available implementations of ML, and online phylogenetics is faster than de novo inference. Our results therefore suggest that parsimony-based methods like UShER and matOptimize represent an accurate and more practical alternative to established ML implementations for large SARS-CoV-2 phylogenies and could be successfully applied to other similar data sets with particularly dense sampling and short branch lengths.
Affiliation(s)
- Alexander M Kramer
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Bryan Thornlow
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Cheng Ye
  - Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA 92093, USA
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Jakob McBroome
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Angie S Hinrichs
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Robert Lanfear
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Yatish Turakhia
  - Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA 92093, USA
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
4. Zhang Z, Smith MR, Ren X. The Cambrian cirratuliform Iotuba denotes an early annelid radiation. Proc Biol Sci 2023; 290:20222014. [PMID: 36722078 PMCID: PMC9890102 DOI: 10.1098/rspb.2022.2014] [Indexed: 02/02/2023] Open
Abstract
The principal animal lineages (phyla) diverged in the Cambrian, but most diversity at lower taxonomic ranks arose more gradually over the subsequent 500 Myr. Annelid worms seem to exemplify this pattern, based on molecular analyses and the fossil record: Cambrian Burgess Shale-type deposits host a single, early-diverging crown-group annelid alongside a morphologically and taxonomically conservative stem group; the polychaete sub-classes diverge in the Ordovician; and many orders and families are first documented in Carboniferous Lagerstätten. Fifteen new fossils of the 'phoronid' Iotuba (=Eophoronis) chengjiangensis from the early Cambrian Chengjiang Lagerstätte challenge this picture. A chaetal cephalic cage surrounds a retractile head with branchial plates, affiliating Iotuba with the derived polychaete families 'Flabelligeridae' and Acrocirridae. Unless this similarity represents profound convergent evolution, this relationship would pull back the origin of the nested crown groups of Cirratuliformia, Sedentaria and Pleistoannelida by tens of millions of years, indicating a dramatic unseen origin of modern annelid diversity in the heat of the Cambrian 'explosion'.
Affiliation(s)
- ZhiFei Zhang
  - State Key Laboratory of Continental Dynamics, Shaanxi Key Laboratory of Early Life and Environments and Department of Geology, Northwest University, Xi'an 710069, People's Republic of China
- Martin R. Smith
  - Department of Earth Sciences, Durham University, Mountjoy Site, South Road, Durham DH1 3LE, UK
- XinYi Ren
  - State Key Laboratory of Continental Dynamics, Shaanxi Key Laboratory of Early Life and Environments and Department of Geology, Northwest University, Xi'an 710069, People's Republic of China
5. Did some extinct South American native ungulates arise from an afrothere ancestor? A critical appraisal of Avilla and Mothé’s (2021) Sudamericungulata – Panameridiungulata hypothesis. J MAMM EVOL 2022. [DOI: 10.1007/s10914-022-09633-5] [Indexed: 11/11/2022]