1
Huebert DNG, Ghorbani A, Lam SYB, Larijani M. Coevolution of Lentiviral Vif with Host A3F and A3G: Insights from Computational Modelling and Ancestral Sequence Reconstruction. Viruses 2025; 17:393. [PMID: 40143321 PMCID: PMC11946711 DOI: 10.3390/v17030393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2024] [Revised: 03/03/2025] [Accepted: 03/05/2025] [Indexed: 03/28/2025] Open
Abstract
The evolutionary arms race between host restriction factors and viral antagonists provides crucial insights into immune system evolution and viral adaptation. This study investigates the structural and evolutionary dynamics of the double-domain restriction factors A3F and A3G and their viral inhibitor, Vif, across diverse primate species. By constructing 3D structural homology models and integrating ancestral sequence reconstruction (ASR), we identified patterns of sequence diversity, structural conservation, and functional adaptation. Inactive CD1 (Catalytic Domain 1) domains displayed greater sequence diversity and more positive surface charges than active CD2 domains, aiding nucleotide chain binding and intersegmental transfer. Despite variability, the CD2 DNA-binding grooves remained structurally consistent with conserved residues maintaining critical functions. A3F and A3G diverged in loop 7' interaction strategies, utilising distinct molecular interactions to facilitate their roles. Vif exhibited charge variation linked to host species, reflecting its coevolution with A3 proteins. These findings illuminate how structural adaptations and charge dynamics enable both restriction factors and their viral antagonists to adapt to selective pressures. Our results emphasize the importance of studying structural evolution in host-virus interactions, with implications for understanding immune defense mechanisms, zoonotic risks, and viral evolution. This work establishes a foundation for further exploration of restriction factor diversity and coevolution across species.
Affiliation(s)
- David Nicolas Giuseppe Huebert
  - Immunology and Infectious Diseases Program, Division of Biomedical Sciences, Faculty of Medicine, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
  - Structural Biology and Immunology Program, Department of Molecular Biology and Biochemistry, Faculty of Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Atefeh Ghorbani
  - Immunology and Infectious Diseases Program, Division of Biomedical Sciences, Faculty of Medicine, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
- Shaw Yick Brian Lam
  - Structural Biology and Immunology Program, Department of Molecular Biology and Biochemistry, Faculty of Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Mani Larijani
  - Immunology and Infectious Diseases Program, Division of Biomedical Sciences, Faculty of Medicine, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
  - Structural Biology and Immunology Program, Department of Molecular Biology and Biochemistry, Faculty of Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
2
Jennings-Shaffer C, Rich DH, Macaulay M, Karcher MD, Ganapathy T, Kiami S, Kooperberg A, Zhang C, Suchard MA, Matsen FA. Finding high posterior density phylogenies by systematically extending a directed acyclic graph. Algorithms Mol Biol 2025; 20:2. [PMID: 40022201 PMCID: PMC11869616 DOI: 10.1186/s13015-025-00273-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2024] [Accepted: 02/13/2025] [Indexed: 03/03/2025] Open
Abstract
Bayesian phylogenetics typically estimates a posterior distribution, or aspects thereof, using Markov chain Monte Carlo methods. These methods integrate over tree space by applying local rearrangements to move a tree through its space as a random walk. Previous work explored the possibility of replacing this random walk with a systematic search, but was quickly overwhelmed by the large number of probable trees in the posterior distribution. In this paper we develop methods to sidestep this problem using a recently introduced structure called the subsplit directed acyclic graph (sDAG). This structure can represent many trees at once, and local rearrangements of trees translate to methods of enlarging the sDAG. Here we propose two methods of introducing, ranking, and selecting local rearrangements on sDAGs to produce a collection of trees with high posterior density. One of these methods successfully recovers the set of high posterior density trees across a range of data sets. However, we find that a simpler strategy of aggregating trees into an sDAG in fact is computationally faster and returns a higher fraction of probable trees.
Affiliation(s)
- Chris Jennings-Shaffer
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- David H Rich
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Matthew Macaulay
  - Australian Institute for Microbiology & Infection, University of Technology Sydney, Sydney, Australia
- Michael D Karcher
  - Department of Math & Computer Science, Muhlenberg College, Allentown, Pennsylvania, USA
- Tanvi Ganapathy
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Shosuke Kiami
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Anna Kooperberg
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Cheng Zhang
  - School of Mathematical Sciences, Peking University, Beijing, China
- Marc A Suchard
  - Department of Human Genetics, University of California, Los Angeles, USA
  - Department of Computational Medicine, University of California, Los Angeles, USA
  - Department of Biostatistics, University of California, Los Angeles, USA
- Frederick A Matsen
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
  - Department of Genome Sciences, University of Washington, Seattle, USA
  - Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
  - Computational Biology Program, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., Mail stop: S2-140, Seattle, 98109-1024, WA, USA
3
Jennings-Shaffer C, Rich DH, Macaulay M, Karcher MD, Ganapathy T, Kiami S, Kooperberg A, Zhang C, Suchard MA, Matsen FA. Finding high posterior density phylogenies by systematically extending a directed acyclic graph. arXiv 2024; arXiv:2411.09074v2. [PMID: 39606729 PMCID: PMC11601806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
Bayesian phylogenetics typically estimates a posterior distribution, or aspects thereof, using Markov chain Monte Carlo methods. These methods integrate over tree space by applying local rearrangements to move a tree through its space as a random walk. Previous work explored the possibility of replacing this random walk with a systematic search, but was quickly overwhelmed by the large number of probable trees in the posterior distribution. In this paper we develop methods to sidestep this problem using a recently introduced structure called the subsplit directed acyclic graph (sDAG). This structure can represent many trees at once, and local rearrangements of trees translate to methods of enlarging the sDAG. Here we propose two methods of introducing, ranking, and selecting local rearrangements on sDAGs to produce a collection of trees with high posterior density. One of these methods successfully recovers the set of high posterior density trees across a range of data sets. However, we find that a simpler strategy of aggregating trees into an sDAG in fact is computationally faster and returns a higher fraction of probable trees.
Affiliation(s)
- Chris Jennings-Shaffer
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- David H Rich
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Matthew Macaulay
  - University of Technology Sydney, Australian Institute for Microbiology & Infection, Sydney, Australia
- Michael D Karcher
  - Department of Math & Computer Science, Muhlenberg College, Allentown, Pennsylvania, USA
- Tanvi Ganapathy
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Shosuke Kiami
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Anna Kooperberg
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Cheng Zhang
  - School of Mathematical Sciences, Peking University, Beijing, China
- Marc A Suchard
  - Department of Human Genetics, University of California, Los Angeles, USA
  - Department of Computational Medicine, University of California, Los Angeles, USA
  - Department of Biostatistics, University of California, Los Angeles, USA
- Frederick A Matsen
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
  - Department of Genome Sciences, University of Washington, Seattle, USA
  - Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
4
Magee A, Karcher M, Matsen FA, Minin VM. How Trustworthy Is Your Tree? Bayesian Phylogenetic Effective Sample Size Through the Lens of Monte Carlo Error. Bayesian Analysis 2024; 19:565-593. [PMID: 38665694 PMCID: PMC11042687 DOI: 10.1214/22-ba1339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2024]
Abstract
Bayesian inference is a popular and widely-used approach to infer phylogenies (evolutionary trees). However, despite decades of widespread application, it remains difficult to judge how well a given Bayesian Markov chain Monte Carlo (MCMC) run explores the space of phylogenetic trees. In this paper, we investigate the Monte Carlo error of phylogenies, focusing on high-dimensional summaries of the posterior distribution, including variability in estimated edge/branch (known in phylogenetics as "split") probabilities and tree probabilities, and variability in the estimated summary tree. Specifically, we ask if there is any measure of effective sample size (ESS) applicable to phylogenetic trees which is capable of capturing the Monte Carlo error of these three summary measures. We find that there are some ESS measures capable of capturing the error inherent in using MCMC samples to approximate the posterior distributions on phylogenies. We term these tree ESS measures, and identify a set of three which are useful in practice for assessing the Monte Carlo error. Lastly, we present visualization tools that can improve comparisons between multiple independent MCMC runs by accounting for the Monte Carlo error present in each chain. Our results indicate that common post-MCMC workflows are insufficient to capture the inherent Monte Carlo error of the tree, and highlight the need for both within-chain mixing and between-chain convergence assessments.
Affiliation(s)
- Andrew Magee
  - Department of Biology, University of Washington, Seattle, WA, 98195, USA
- Michael Karcher
  - Department of Mathematics and Computer Science, Muhlenberg College, Allentown, PA, 18104, USA
- Frederick A. Matsen
  - Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Departments of Genome Sciences and Statistics, University of Washington, Seattle, WA, 98109, USA
- Volodymyr M. Minin
  - Department of Statistics, University of California, Irvine, Irvine, CA, 92697, USA
5
Hassler GW, Magee A, Zhang Z, Baele G, Lemey P, Ji X, Fourment M, Suchard MA. Data integration in Bayesian phylogenetics. Annual Review of Statistics and Its Application 2022; 10:353-377. [PMID: 38774036 PMCID: PMC11108065 DOI: 10.1146/annurev-statistics-033021-112532] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/24/2024]
Abstract
Researchers studying the evolution of viral pathogens and other organisms increasingly encounter and use large and complex data sets from multiple different sources. Statistical research in Bayesian phylogenetics has risen to this challenge. Researchers use phylogenetics not only to reconstruct the evolutionary history of a group of organisms, but also to understand the processes that guide its evolution and spread through space and time. To this end, it is now the norm to integrate numerous sources of data. For example, epidemiologists studying the spread of a virus through a region incorporate data including genetic sequences (e.g. DNA), time, location (both continuous and discrete) and environmental covariates (e.g. social connectivity between regions) into a coherent statistical model. Evolutionary biologists routinely do the same with genetic sequences, location, time, fossil and modern phenotypes, and ecological covariates. These complex, hierarchical models readily accommodate both discrete and continuous data and have enormous combined discrete/continuous parameter spaces including, at a minimum, phylogenetic tree topologies and branch lengths. The increased size and complexity of these statistical models have spurred advances in computational methods to make them tractable. We discuss both the modeling and computational advances below, as well as unsolved problems and areas of active research.
Affiliation(s)
- Gabriel W Hassler
  - Department of Computational Medicine, University of California, Los Angeles, USA, 90095
- Andrew Magee
  - Department of Biostatistics, University of California, Los Angeles, USA, 90095
- Zhenyu Zhang
  - Department of Biostatistics, University of California, Los Angeles, USA, 90095
- Guy Baele
  - Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium, 3000
- Philippe Lemey
  - Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium, 3000
- Xiang Ji
  - Department of Mathematics, Tulane University, New Orleans, USA, 70118
- Mathieu Fourment
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Ultimo NSW, Australia, 2007
- Marc A Suchard
  - Department of Computational Medicine, University of California, Los Angeles, USA, 90095
  - Department of Biostatistics, University of California, Los Angeles, USA, 90095
  - Department of Human Genetics, University of California, Los Angeles, USA, 90095
6
Harrington SM, Wishingrad V, Thomson RC. Properties of Markov Chain Monte Carlo Performance across Many Empirical Alignments. Mol Biol Evol 2021; 38:1627-1640. [PMID: 33185685 PMCID: PMC8042746 DOI: 10.1093/molbev/msaa295] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 11/18/2022] Open
Abstract
Nearly all current Bayesian phylogenetic applications rely on Markov chain Monte Carlo (MCMC) methods to approximate the posterior distribution for trees and other parameters of the model. These approximations are only reliable if Markov chains adequately converge and sample from the joint posterior distribution. Although several studies of phylogenetic MCMC convergence exist, these have focused on simulated data sets or select empirical examples. Therefore, much that is considered common knowledge about MCMC in empirical systems derives from a relatively small family of analyses under ideal conditions. To address this, we present an overview of commonly applied phylogenetic MCMC diagnostics and an assessment of patterns of these diagnostics across more than 18,000 empirical analyses. Many analyses appeared to perform well and failures in convergence were most likely to be detected using the average standard deviation of split frequencies, a diagnostic that compares topologies among independent chains. Different diagnostics yielded different information about failed convergence, demonstrating that multiple diagnostics must be employed to reliably detect problems. The number of taxa and average branch lengths in analyses have clear impacts on MCMC performance, with more taxa and shorter branches leading to more difficult convergence. We show that the usage of models that include both Γ-distributed among-site rate variation and a proportion of invariable sites is not broadly problematic for MCMC convergence but is also unnecessary. Changes to heating and the usage of model-averaged substitution models can both offer improved convergence in some cases, but neither are a panacea.
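The average standard deviation of split frequencies (ASDSF) mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code or any particular program's implementation: the function name `asdsf`, the representation of each tree as a set of split labels, the use of the sample standard deviation, and the 0.1 minimum-frequency cutoff are all assumptions made for illustration.

```python
from statistics import stdev


def asdsf(chains, min_freq=0.1):
    """Average standard deviation of split frequencies across chains.

    chains: list of chains; each chain is a list of sampled trees, and each
    tree is a set of hashable split labels (a hypothetical encoding).
    min_freq: assumed cutoff; a split is kept if it reaches this frequency
    in at least one chain.
    """
    if len(chains) < 2:
        raise ValueError("ASDSF needs at least two independent chains")

    # Per-chain frequency of every split: fraction of sampled trees containing it.
    freqs = []
    for trees in chains:
        counts = {}
        for tree in trees:
            for split in tree:
                counts[split] = counts.get(split, 0) + 1
        freqs.append({s: c / len(trees) for s, c in counts.items()})

    # Keep splits that are reasonably frequent in at least one chain.
    all_splits = set().union(*freqs)
    kept = [s for s in all_splits
            if max(f.get(s, 0.0) for f in freqs) >= min_freq]
    if not kept:
        return 0.0

    # Standard deviation of each split's frequency across chains, then average.
    deviations = [stdev([f.get(s, 0.0) for f in freqs]) for s in kept]
    return sum(deviations) / len(deviations)


# Two toy chains over four taxa; split labels are illustrative strings.
chain_a = [{"AB|CD"}] * 4
chain_b = [{"AB|CD"}] * 2 + [{"AC|BD"}] * 2
value = asdsf([chain_a, chain_b])
```

Values near zero suggest the chains sample similar topologies, while larger values indicate disagreement between chains. Any threshold used to declare convergence is a convention of the analysis software, not something this sketch defines.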
Affiliation(s)
- Van Wishingrad
  - School of Life Sciences, University of Hawai'i, Honolulu, HI