1. Zwaans A, Seidel S, Manceau M, Stadler T. A Bayesian phylodynamic inference framework for single-cell CRISPR/Cas9 lineage tracing barcode data with dependent target sites. Philos Trans R Soc Lond B Biol Sci 2025; 380:20230318. [PMID: 39976408 PMCID: PMC11867110 DOI: 10.1098/rstb.2023.0318] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/31/2024] [Revised: 08/03/2024] [Accepted: 08/05/2024] [Indexed: 02/21/2025]
Abstract
Analysing single-cell lineage relationships of an organism is crucial to understanding the fundamental cellular dynamics that drive development. Clustered regularly interspaced short palindromic repeats (CRISPR)-based dynamic lineage tracing relies on recent advances in genome editing and sequencing technologies to generate inheritable, evolving genetic barcode sequences that enable reconstruction of such cell lineage trees, also referred to as phylogenetic trees. Recent work generated custom computational strategies to produce robust tree estimates from such data. We further capitalize on these advancements and introduce GESTALT analysis using Bayesian inference (GABI), which extends the analysis of genome editing of synthetic target arrays for lineage tracing (GESTALT) data to a fully integrated Bayesian phylogenetic inference framework in the software BEAST 2. This implementation allows users to represent the uncertainty in reconstructed trees and enables their scaling in absolute time. Furthermore, based on such time-scaled lineage trees, the underlying processes of growth, differentiation and apoptosis are quantified through so-called phylodynamic inference, typically relying on a birth-death or coalescent model. After validating its implementation, we demonstrate that our methodology results in robust estimates of growth dynamics characteristic of early Danio rerio development. GABI's codebase is publicly available at https://github.com/azwaans/GABI. This article is part of the theme issue '"A mathematical theory of evolution": phylogenetic models dating back 100 years'.
Affiliation(s)
- A. Zwaans
  - Department of Biosystems Science and Engineering, ETH Zurich, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- S. Seidel
  - Department of Biosystems Science and Engineering, ETH Zurich, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- M. Manceau
  - Department of Biosystems Science and Engineering, ETH Zurich, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- T. Stadler
  - Department of Biosystems Science and Engineering, ETH Zurich, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
2. Berling L, Collienne L, Gavryushkin A. Estimating the mean in the space of ranked phylogenetic trees. Bioinformatics 2024; 40:btae514. [PMID: 39177090 PMCID: PMC11364146 DOI: 10.1093/bioinformatics/btae514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 05/16/2024] [Accepted: 08/21/2024] [Indexed: 08/24/2024]
Abstract
MOTIVATION Reconstructing evolutionary histories of biological entities, such as genes, cells, organisms, populations, and species, from phenotypic and molecular sequencing data is central to many biological, palaeontological, and biomedical disciplines. Typically, due to uncertainties and incompleteness in data, the true evolutionary history (phylogeny) is challenging to estimate. Statistical modelling approaches address this problem by introducing and studying probability distributions over all possible evolutionary histories, but can also introduce uncertainties due to misspecification. In practice, computational methods are deployed to learn those distributions, typically by sampling them. This approach, however, is fundamentally challenging as it requires designing and implementing various statistical methods over a space of phylogenetic trees (or treespace). Although the problem of developing statistics over a treespace has received substantial attention in the literature and numerous breakthroughs have been made, it remains largely unsolved. The challenge of solving this problem is two-fold: first, a treespace has nontrivial, often counter-intuitive geometry, implying that much of classical Euclidean statistics does not immediately apply; second, many parametrizations of treespace with promising statistical properties are computationally hard, so they cannot be used in data analyses. As a result, there is no single conventional method for estimating even the most fundamental statistics over any treespace, such as mean and variance, and various heuristics are used in practice. Despite the existence of numerous tree summary methods to approximate means of probability distributions over a treespace based on its geometry, and the theoretical promise of this idea, none of the attempts resulted in a practical method for summarizing tree samples.
RESULTS In this paper, we present a tree summary method along with useful properties of our chosen treespace while focusing on its impact on phylogenetic analyses of real datasets. We perform an extensive benchmark study and demonstrate that our method outperforms the currently most popular methods with respect to a number of important 'quality' statistics. Further, we apply our method to three empirical datasets ranging from cancer evolution to linguistics and find novel insights into the corresponding evolutionary problems in all of them. We hence conclude that this treespace is a promising candidate to serve as a foundation for developing statistics over phylogenetic trees analytically, as well as new computational tools for evolutionary data analyses.
AVAILABILITY AND IMPLEMENTATION An implementation is available at https://github.com/bioDS/Centroid-Code.
Affiliation(s)
- Lars Berling
  - Biological Data Science Lab, School of Mathematics and Statistics, University of Canterbury, Christchurch 8041, New Zealand
- Lena Collienne
  - Biological Data Science Lab, School of Mathematics and Statistics, University of Canterbury, Christchurch 8041, New Zealand
- Alex Gavryushkin
  - Biological Data Science Lab, School of Mathematics and Statistics, University of Canterbury, Christchurch 8041, New Zealand
3. Ly-Trong N, Bielow C, De Maio N, Minh BQ. CMAPLE: Efficient Phylogenetic Inference in the Pandemic Era. Mol Biol Evol 2024; 41:msae134. [PMID: 38934791 PMCID: PMC11232695 DOI: 10.1093/molbev/msae134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 05/15/2024] [Accepted: 06/21/2024] [Indexed: 06/28/2024]
Abstract
We have recently introduced MAPLE (MAximum Parsimonious Likelihood Estimation), a new pandemic-scale phylogenetic inference method exclusively designed for genomic epidemiology. In response to the need for enhancing MAPLE's performance and scalability, here we present two key components: (i) CMAPLE software, a highly optimized C++ reimplementation of MAPLE with many new features and advancements, and (ii) CMAPLE library, a suite of application programming interfaces to facilitate the integration of the CMAPLE algorithm into existing phylogenetic inference packages. Notably, we have successfully integrated CMAPLE into the widely used IQ-TREE 2 software, enabling its rapid adoption in the scientific community. These advancements serve as a vital step toward better preparedness for future pandemics, offering researchers powerful tools for large-scale pathogen genomic analysis.
Affiliation(s)
- Nhan Ly-Trong
  - School of Computing, College of Engineering, Computing and Cybernetics, Australian National University, Canberra, ACT 2600, Australia
- Chris Bielow
  - Bioinformatics Solution Center, Freie Universität Berlin, 14195 Berlin, Germany
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Bui Quang Minh
  - School of Computing, College of Engineering, Computing and Cybernetics, Australian National University, Canberra, ACT 2600, Australia
4. Weber LL, Zhang C, Ochoa I, El-Kebir M. Phertilizer: Growing a clonal tree from ultra-low coverage single-cell DNA sequencing of tumors. PLoS Comput Biol 2023; 19:e1011544. [PMID: 37819942 PMCID: PMC10593221 DOI: 10.1371/journal.pcbi.1011544] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/29/2023] [Revised: 10/23/2023] [Accepted: 09/26/2023] [Indexed: 10/13/2023]
Abstract
Emerging ultra-low coverage single-cell DNA sequencing (scDNA-seq) technologies have enabled high resolution evolutionary studies of copy number aberrations (CNAs) within tumors. While these sequencing technologies are well suited for identifying CNAs due to the uniformity of sequencing coverage, the sparsity of coverage poses challenges for the study of single-nucleotide variants (SNVs). In order to maximize the utility of increasingly available ultra-low coverage scDNA-seq data and obtain a comprehensive understanding of tumor evolution, it is important to also analyze the evolution of SNVs from the same set of tumor cells. We present Phertilizer, a method to infer a clonal tree from ultra-low coverage scDNA-seq data of a tumor. Based on a probabilistic model, our method recursively partitions the data by identifying key evolutionary events in the history of the tumor. We demonstrate the performance of Phertilizer on simulated data as well as on two real datasets, finding that Phertilizer effectively utilizes the copy-number signal inherent in the data to more accurately uncover clonal structure and genotypes compared to previous methods.
Affiliation(s)
- Leah L. Weber
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana-Champaign, Illinois, United States of America
- Chuanyi Zhang
  - Department of Electrical & Computer Engineering, University of Illinois Urbana-Champaign, Urbana-Champaign, Illinois, United States of America
- Idoia Ochoa
  - Department of Electrical & Computer Engineering, University of Illinois Urbana-Champaign, Urbana-Champaign, Illinois, United States of America
  - Department of Electrical and Electronics Engineering, University of Navarre, Donostia, Spain
- Mohammed El-Kebir
  - Department of Electrical and Electronics Engineering, University of Navarre, Donostia, Spain
  - Cancer Center at Illinois, University of Illinois Urbana-Champaign, Urbana-Champaign, Illinois, United States of America
5. Liu X, Griffiths JI, Bishara I, Liu J, Bild AH, Chang JT. Phylogenetic inference from single-cell RNA-seq data. Sci Rep 2023; 13:12854. [PMID: 37553438 PMCID: PMC10409753 DOI: 10.1038/s41598-023-39995-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Accepted: 08/03/2023] [Indexed: 08/10/2023]
Abstract
Tumors are composed of subpopulations of cancer cells that harbor distinct genetic profiles and phenotypes that evolve over time and during treatment. By reconstructing the course of cancer evolution, we can understand the acquisition of the malignant properties that drive tumor progression. Unfortunately, recovering the evolutionary relationships of individual cancer cells linked to their phenotypes remains a difficult challenge. To address this need, we have developed PhylinSic, a method that reconstructs the phylogenetic relationships among cells linked to their gene expression profiles from single-cell RNA-sequencing (scRNA-Seq) data. This method calls nucleotide bases using a probabilistic smoothing approach and then estimates a phylogenetic tree using a Bayesian modeling algorithm. We showed that PhylinSic identified evolutionary relationships underpinning drug selection and metastasis and was sensitive enough to identify subclones from genetic drift. We found that breast cancer tumors resistant to chemotherapies harbored multiple genetic lineages that independently acquired high K-Ras and β-catenin, suggesting that therapeutic strategies may need to control multiple lineages to be durable. These results demonstrated that PhylinSic can reconstruct evolution and link the genotypes and phenotypes of cells across monophyletic tumors using scRNA-Seq.
Affiliation(s)
- Xuan Liu
  - Department of Integrative Biology & Pharmacology, University of Texas Health Science Center at Houston, 6431 Fannin St, MSB 4.218, Houston, TX, 77030, USA
- Jason I Griffiths
  - Division of Molecular Pharmacology, Department of Medical Oncology & Clinical Therapeutics, City of Hope, Monrovia, CA, USA
- Isaac Bishara
  - Division of Molecular Pharmacology, Department of Medical Oncology & Clinical Therapeutics, City of Hope, Monrovia, CA, USA
- Jiayi Liu
  - Department of Integrative Biology & Pharmacology, University of Texas Health Science Center at Houston, 6431 Fannin St, MSB 4.218, Houston, TX, 77030, USA
- Andrea H Bild
  - Division of Molecular Pharmacology, Department of Medical Oncology & Clinical Therapeutics, City of Hope, Monrovia, CA, USA
- Jeffrey T Chang
  - Department of Integrative Biology & Pharmacology, University of Texas Health Science Center at Houston, 6431 Fannin St, MSB 4.218, Houston, TX, 77030, USA
6. Drummond AJ, Chen K, Mendes FK, Xie D. LinguaPhylo: A probabilistic model specification language for reproducible phylogenetic analyses. PLoS Comput Biol 2023; 19:e1011226. [PMID: 37463154 PMCID: PMC10381047 DOI: 10.1371/journal.pcbi.1011226] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/15/2022] [Accepted: 05/30/2023] [Indexed: 07/20/2023]
Abstract
Phylogenetic models have become increasingly complex, and phylogenetic data sets have expanded in both size and richness. However, current inference tools lack a model specification language that can concisely describe a complete phylogenetic analysis while remaining independent of implementation details. We introduce a new lightweight and concise model specification language, 'LPhy', which is designed to be both human and machine-readable. A graphical user interface accompanies 'LPhy', allowing users to build models, simulate data, and create natural language narratives describing the models. These narratives can serve as the foundation for manuscript method sections. Additionally, we present a command-line interface for converting LPhy-specified models into analysis specification files (in XML format) compatible with the BEAST2 software platform. Collectively, these tools aim to enhance the clarity of descriptions and reporting of probabilistic models in phylogenetic studies, ultimately promoting reproducibility of results.
Affiliation(s)
- Alexei J Drummond
  - Centre for Computational Evolution, University of Auckland, Auckland, New Zealand
  - School of Biological Sciences, University of Auckland, Auckland, New Zealand
  - School of Computer Science, University of Auckland, Auckland, New Zealand
- Kylie Chen
  - Centre for Computational Evolution, University of Auckland, Auckland, New Zealand
  - School of Biological Sciences, University of Auckland, Auckland, New Zealand
  - School of Computer Science, University of Auckland, Auckland, New Zealand
- Fábio K Mendes
  - Centre for Computational Evolution, University of Auckland, Auckland, New Zealand
  - Department of Biology, Washington University in St. Louis, St. Louis, United States of America
- Dong Xie
  - Centre for Computational Evolution, University of Auckland, Auckland, New Zealand
  - School of Biological Sciences, University of Auckland, Auckland, New Zealand
  - School of Computer Science, University of Auckland, Auckland, New Zealand