1. Teo B, Bastide P, Ané C. Leveraging graphical model techniques to study evolution on phylogenetic networks. Philos Trans R Soc Lond B Biol Sci 2025; 380:20230310. [PMID: 39976402 PMCID: PMC11867149 DOI: 10.1098/rstb.2023.0310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Revised: 08/27/2024] [Accepted: 09/16/2024] [Indexed: 02/21/2025] Open
Abstract
The evolution of molecular and phenotypic traits is commonly modelled using Markov processes along a phylogeny. This phylogeny can be a tree, or a network if it includes reticulations, representing events such as hybridization or admixture. Computing the likelihood of data observed at the leaves is costly as the size and complexity of the phylogeny grows. Efficient algorithms exist for trees, but cannot be applied to networks. We show that a vast array of models for trait evolution along phylogenetic networks can be reformulated as graphical models, for which efficient belief propagation algorithms exist. We provide a brief review of belief propagation on general graphical models, then focus on linear Gaussian models for continuous traits. We show how belief propagation techniques can be applied for exact or approximate (but more scalable) likelihood and gradient calculations, and prove novel results for efficient parameter inference of some models. We highlight the possible fruitful interactions between graphical models and phylogenetic methods. For example, approximate likelihood approaches have the potential to greatly reduce computational costs for phylogenies with reticulations. This article is part of the theme issue '"A mathematical theory of evolution": phylogenetic models dating back 100 years'.
Affiliation(s)
- Benjamin Teo
  - Department of Statistics, University of Wisconsin-Madison, Madison, WI, USA
- Paul Bastide
  - IMAG, Université de Montpellier, CNRS, Montpellier, France
- Cécile Ané
  - Department of Statistics, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Botany, University of Wisconsin-Madison, Madison, WI, USA
2. Mohammadi F, Visagan S, Gross SM, Karginov L, Lagarde JC, Heiser LM, Meyer AS. A lineage tree-based hidden Markov model quantifies cellular heterogeneity and plasticity. Commun Biol 2022; 5:1258. [PMID: 36396800 PMCID: PMC9671968 DOI: 10.1038/s42003-022-04208-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/09/2021] [Accepted: 11/01/2022] [Indexed: 11/18/2022] Open
Abstract
Individual cells can assume a variety of molecular and phenotypic states and recent studies indicate that cells can rapidly adapt in response to therapeutic stress. Such phenotypic plasticity may confer resistance, but also presents opportunities to identify molecular programs that could be targeted for therapeutic benefit. Approaches to quantify tumor-drug responses typically focus on snapshot, population-level measurements. While informative, these methods lack lineage and temporal information, which are particularly critical for understanding dynamic processes such as cell state switching. As new technologies have become available to measure lineage relationships, modeling approaches will be needed to identify the forms of cell-to-cell heterogeneity present in these data. Here we apply a lineage tree-based adaptation of a hidden Markov model that employs single cell lineages as input to learn the characteristic patterns of phenotypic heterogeneity and state transitions. In benchmarking studies, we demonstrated that the model successfully classifies cells within experimentally-tractable dataset sizes. As an application, we analyzed experimental measurements in cancer and non-cancer cell populations under various treatments. We find evidence of multiple phenotypically distinct states, with considerable heterogeneity and unique drug responses. In total, this framework allows for the flexible modeling of single cell heterogeneity across lineages to quantify, understand, and control cell state switching.
Affiliation(s)
- Farnaz Mohammadi
  - Department of Bioengineering, University of California, Los Angeles, CA, USA
- Shakthi Visagan
  - Department of Bioengineering, University of California, Los Angeles, CA, USA
- Sean M Gross
  - Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- Luka Karginov
  - Department of Bioengineering, University of Illinois, Urbana Champaign, IL, USA
- J C Lagarde
  - Department of Bioengineering, University of California, Los Angeles, CA, USA
- Laura M Heiser
  - Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- Aaron S Meyer
  - Department of Bioengineering, University of California, Los Angeles, CA, USA
  - Department of Bioinformatics, University of California, Los Angeles, CA, USA
  - Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
  - Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, CA, USA
3. Lecocq M, Groussin M, Gouy M, Brochier-Armanet C. The Molecular Determinants of Thermoadaptation: Methanococcales as a Case Study. Mol Biol Evol 2021; 38:1761-1776. [PMID: 33450027 PMCID: PMC8097290 DOI: 10.1093/molbev/msaa312] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/07/2023] Open
Abstract
Previous reports have shown that environmental temperature impacts proteome evolution in Bacteria and Archaea. However, it is unknown whether thermoadaptation mainly occurs via the sequential accumulation of substitutions, massive horizontal gene transfers, or both. Measuring the real contribution of amino acid substitution to thermoadaptation is challenging, because of confounding environmental and genetic factors (e.g., pH, salinity, genomic G + C content) that also affect proteome evolution. Here, using Methanococcales, a major archaeal lineage, as a study model, we show that optimal growth temperature is the major factor affecting variations in amino acid frequencies of proteomes. By combining phylogenomic and ancestral sequence reconstruction approaches, we disclose a sequential substitutional scheme in which lysine plays a central role by fine tuning the pool of arginine, serine, threonine, glutamine, and asparagine, whose frequencies are strongly correlated with optimal growth temperature. Finally, we show that colonization to new thermal niches is not associated with high amounts of horizontal gene transfers. Altogether, although the acquisition of a few key proteins through horizontal gene transfer may have favored thermoadaptation in Methanococcales, our findings support sequential amino acid substitutions as the main factor driving thermoadaptation.
Affiliation(s)
- Michel Lecocq
  - Laboratoire de Biométrie et Biologie Évolutive, Université de Lyon, Université Lyon 1, CNRS, UMR5558, Villeurbanne, France
- Mathieu Groussin
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Manolo Gouy
  - Laboratoire de Biométrie et Biologie Évolutive, Université de Lyon, Université Lyon 1, CNRS, UMR5558, Villeurbanne, France
- Céline Brochier-Armanet
  - Laboratoire de Biométrie et Biologie Évolutive, Université de Lyon, Université Lyon 1, CNRS, UMR5558, Villeurbanne, France
4. Bastide P, Ho LST, Baele G, Lemey P, Suchard MA. Efficient Bayesian inference of general Gaussian models on large phylogenetic trees. Ann Appl Stat 2021. [DOI: 10.1214/20-aoas1419] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
Affiliation(s)
- Lam Si Tung Ho
  - Department of Mathematics and Statistics, Dalhousie University
- Guy Baele
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven
- Philippe Lemey
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven
- Marc A. Suchard
  - Departments of Biostatistics, Biomathematics, and Human Genetics, University of California, Los Angeles
5. Mitov V, Bartoszek K, Asimomitis G, Stadler T. Fast likelihood calculation for multivariate Gaussian phylogenetic models with shifts. Theor Popul Biol 2019; 131:66-78. [PMID: 31805292 DOI: 10.1016/j.tpb.2019.11.005] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 04/10/2019] [Revised: 11/19/2019] [Accepted: 11/20/2019] [Indexed: 11/27/2022]
Abstract
Phylogenetic comparative methods (PCMs) have been used to study the evolution of quantitative traits in various groups of organisms, ranging from micro-organisms to animal and plant species. A common approach has been to assume a Gaussian phylogenetic model for the trait evolution along the tree, such as a branching Brownian motion (BM) or an Ornstein-Uhlenbeck (OU) process. Then, the parameters of the process have been inferred based on a given tree and trait data for the sampled species. At the heart of this inference lie multiple calculations of the model likelihood, that is, the probability density of the observed trait data, conditional on the model parameters and the tree. With the increasing availability of big phylogenetic trees, spanning hundreds to several thousand sampled species, this approach is facing a two-fold challenge. First, the assumption of a single Gaussian process governing the entire tree is not adequate in the presence of heterogeneous evolutionary forces acting in different parts of the tree. Second, big trees present a computational challenge, due to the time and memory complexity of the model likelihood calculation. Here, we explore a sub-family, denoted GLInv, of the Gaussian phylogenetic models, with the transition density exhibiting the properties that the expectation depends Linearly on the ancestral trait value and the variance is Invariant with respect to the ancestral value. We show that GLInv contains the vast majority of Gaussian models currently used in PCMs, while supporting an efficient (linear in the number of nodes) algorithm for the likelihood calculation. The algorithm supports scenarios with missing data, as well as different types of trees, including trees with polytomies and non-ultrametric trees. To account for the heterogeneity in the evolutionary forces, the algorithm supports models with "shifts" occurring at specific points in the tree. 
Such shifts can include changes in some or all parameters, as well as the type of the model, provided that the model remains within the GLInv family. This contrasts with most of the current implementations where, due to slow likelihood calculation, the shifts are restricted to specific parameters in a single type of model, such as the long-term selection optima of an OU process, assuming that all of its other parameters, such as evolutionary rate and selection strength, are global for the entire tree. We provide an implementation of this likelihood calculation algorithm in an accompanying R-package called PCMBase. The package has been designed as a generic library that can be integrated with existing or novel maximum likelihood or Bayesian inference tools.
Affiliation(s)
- Venelin Mitov
  - Department of Biosystems Science and Engineering, Eidgenössische Technische Hochschule Zürich, Basel, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland.
- Krzysztof Bartoszek
  - Department of Computer and Information Science, Linköping University, Linköping, Sweden.
- Georgios Asimomitis
  - Department of Biosystems Science and Engineering, Eidgenössische Technische Hochschule Zürich, Basel, Switzerland.
- Tanja Stadler
  - Department of Biosystems Science and Engineering, Eidgenössische Technische Hochschule Zürich, Basel, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland.
6. Bastide P, Mariadassou M, Robin S. Detection of adaptive shifts on phylogenies by using shifted stochastic processes on a tree. J R Stat Soc Series B Stat Methodol 2016. [DOI: 10.1111/rssb.12206] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 12/01/2022]
Affiliation(s)
- Paul Bastide
  - AgroParisTech; Paris
  - Institut National de la Recherche Agronomique; Jouy-en-Josas and Paris
  - Université Paris-Saclay; France
- Mahendra Mariadassou
  - Institut National de la Recherche Agronomique; Jouy-en-Josas and Paris
  - Université Paris-Saclay; France
- Stéphane Robin
  - AgroParisTech; Paris
  - Institut National de la Recherche Agronomique; Jouy-en-Josas and Paris
  - Université Paris-Saclay; France
7. Clavel J, Escarguel G, Merceron G. mvMORPH: an R package for fitting multivariate evolutionary models to morphometric data. Methods Ecol Evol 2015. [DOI: 10.1111/2041-210x.12420] [Citation(s) in RCA: 251] [Impact Index Per Article: 25.1] [Indexed: 11/27/2022]
Affiliation(s)
- Julien Clavel
  - Ecole Normale Supérieure IBENS UMR 8197 CNRS 46 rue d'Ulm 75005 Paris France
  - Laboratoire de Géologie de Lyon, UMR 5276 CNRS, UCB Lyon 1, ENS Lyon Campus de la Doua 2 rue Raphaël Dubois 69622 Villeurbanne Cedex France
- Gilles Escarguel
  - Laboratoire de Géologie de Lyon, UMR 5276 CNRS, UCB Lyon 1, ENS Lyon Campus de la Doua 2 rue Raphaël Dubois 69622 Villeurbanne Cedex France
- Gildas Merceron
  - IPHEP, UMR 7262 CNRS, Université de Poitiers Bat. B35 – TSA‐51106 – 6 rue M. Brunet 86073 Poitiers Cedex 9 France