1
|
Gunnarsson EB, Leder K, Zhang X. Limit theorems for the site frequency spectrum of neutral mutations in an exponentially growing population. Stoch Process Their Appl 2025; 182:104565. [PMID: 40191782 PMCID: PMC11970945 DOI: 10.1016/j.spa.2025.104565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 04/09/2025]
Abstract
The site frequency spectrum (SFS) is a widely used summary statistic of genomic data. Motivated by recent evidence for the role of neutral evolution in cancer, we investigate the SFS of neutral mutations in an exponentially growing population. Using branching process techniques, we establish (first-order) almost sure convergence results for the SFS of a Galton-Watson process, evaluated either at a fixed time or at the stochastic time at which the population first reaches a certain size. We finally use our results to construct consistent estimators for the extinction probability and the effective mutation rate of a birth-death process.
Collapse
Affiliation(s)
- Einar Bjarki Gunnarsson
- Department of Industrial and Systems Engineering, University of Minnesota, Twin Cities, Minneapolis, MN, USA
- School of Mathematics, University of Minnesota, Twin Cities, Minneapolis, MN, USA
- Applied Mathematics Division, Science Institute, University of Iceland, Reykjavík, Iceland
| | - Kevin Leder
- Department of Industrial and Systems Engineering, University of Minnesota, Twin Cities, Minneapolis, MN, USA
| | - Xuanming Zhang
- Department of Industrial and Systems Engineering, University of Minnesota, Twin Cities, Minneapolis, MN, USA
| |
Collapse
|
2
|
Doulcier G, Lambert A. Neutral diversity in experimental metapopulations. Theor Popul Biol 2024; 158:89-108. [PMID: 38493997 DOI: 10.1016/j.tpb.2024.02.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2023] [Revised: 02/07/2024] [Accepted: 02/27/2024] [Indexed: 03/19/2024]
Abstract
New automated and high-throughput methods allow the manipulation and selection of numerous bacterial populations. In this manuscript we are interested in the neutral diversity patterns that emerge from such a setup in which many bacterial populations are grown in parallel serial transfers, in some cases with population-wide extinction and splitting events. We model bacterial growth by a birth-death process and use the theory of coalescent point processes. We show that there is a dilution factor that optimises the expected amount of neutral diversity for a given number of cycles, and study the power law behaviour of the mutation frequency spectrum for different experimental regimes. We also explore how neutral variation diverges between two recently split populations by establishing a new formula for the expected number of shared and private mutations. Finally, we show the interest of such a setup to select a phenotype of interest that requires multiple mutations.
Collapse
Affiliation(s)
- Guilhem Doulcier
- Macquarie University, Department of Philosophy, Sydney, Australia; Max Planck Institute for Evolutionary Biology, Department of Theoretical Biology, Plön, Germany.
| | - Amaury Lambert
- SMILE - Stochastic Models for the Inference of Life Evolution, Institut de Biologie de l'ENS (IBENS), École Normale Supérieure, CNRS UMR8197, INSERM U1024, France; Centre Interdisciplinaire de Recherche en Biologie (CIRB), Collège de France, CNRS UMR7241, INSERM U1050, PSL Université, Paris, France.
| |
Collapse
|
3
|
Johnson B, Shuai Y, Schweinsberg J, Curtius K. cloneRate: fast estimation of single-cell clonal dynamics using coalescent theory. Bioinformatics 2023; 39:btad561. [PMID: 37699006 PMCID: PMC10534056 DOI: 10.1093/bioinformatics/btad561] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2023] [Revised: 08/25/2023] [Indexed: 09/14/2023] Open
Abstract
MOTIVATION While evolutionary approaches to medicine show promise, measuring evolution itself is difficult due to experimental constraints and the dynamic nature of body systems. In cancer evolution, continuous observation of clonal architecture is impossible, and longitudinal samples from multiple timepoints are rare. Increasingly available DNA sequencing datasets at single-cell resolution enable the reconstruction of past evolution using mutational history, allowing for a better understanding of dynamics prior to detectable disease. There is an unmet need for an accurate, fast, and easy-to-use method to quantify clone growth dynamics from these datasets. RESULTS We derived methods based on coalescent theory for estimating the net growth rate of clones using either reconstructed phylogenies or the number of shared mutations. We applied and validated our analytical methods for estimating the net growth rate of clones, eliminating the need for complex simulations used in previous methods. When applied to hematopoietic data, we show that our estimates may have broad applications to improve mechanistic understanding and prognostic ability. Compared to clones with a single or unknown driver mutation, clones with multiple drivers have significantly increased growth rates (median 0.94 versus 0.25 per year; P = 1.6×10-6). Further, stratifying patients with a myeloproliferative neoplasm (MPN) by the growth rate of their fittest clone shows that higher growth rates are associated with shorter time to MPN diagnosis (median 13.9 versus 26.4 months; P = 0.0026). AVAILABILITY AND IMPLEMENTATION We developed a publicly available R package, cloneRate, to implement our methods (Package website: https://bdj34.github.io/cloneRate/). Source code: https://github.com/bdj34/cloneRate/.
Collapse
Affiliation(s)
- Brian Johnson
- Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, United States
| | - Yubo Shuai
- Department of Mathematics, University of California San Diego, La Jolla, CA 92093, United States
| | - Jason Schweinsberg
- Department of Mathematics, University of California San Diego, La Jolla, CA 92093, United States
| | - Kit Curtius
- Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, United States
- Moores Cancer Center, University of California San Diego, La Jolla, CA 92093, United States
- VA San Diego Healthcare System, San Diego, CA 92161, United States
| |
Collapse
|
4
|
Kurpas MK, Jaksik R, Kuś P, Kimmel M. Genomic Analysis of SARS-CoV-2 Alpha, Beta and Delta Variants of Concern Uncovers Signatures of Neutral and Non-Neutral Evolution. Viruses 2022; 14:2375. [PMID: 36366473 PMCID: PMC9695218 DOI: 10.3390/v14112375] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2022] [Revised: 10/20/2022] [Accepted: 10/21/2022] [Indexed: 01/31/2023] Open
Abstract
Due to the emergence of new variants of the SARS-CoV-2 coronavirus, the question of how the viral genomes evolved, leading to the formation of highly infectious strains, becomes particularly important. Three major emergent strains, Alpha, Beta and Delta, characterized by a significant number of missense mutations, provide a natural test field. We accumulated and aligned 4.7 million SARS-CoV-2 genomes from the GISAID database and carried out a comprehensive set of analyses. This collection covers the period until the end of October 2021, i.e., the beginnings of the Omicron variant. First, we explored combinatorial complexity of the genomic variants emerging and their timing, indicating very strong, albeit hidden, selection forces. Our analyses show that the mutations that define variants of concern did not arise gradually but rather co-evolved rapidly, leading to the emergence of the full variant strain. To explore in more detail the evolutionary forces at work, we developed time trajectories of mutations at all 29,903 sites of the SARS-CoV-2 genome, week by week, and stratified them into trends related to (i) point substitutions, (ii) deletions and (iii) non-sequenceable regions. We focused on classifying the genetic forces active at different ranges of the mutational spectrum. We observed the agreement of the lowest-frequency mutation spectrum with the Griffiths-Tavaré theory, under the Infinite Sites Model and neutrality. If we widen the frequency range, we observe the site frequency spectra much more consistently with the Tung-Durrett model assuming clone competition and selection. The coefficients of the fitting model indicate the possibility of selection acting to promote gradual growth slowdown, as observed in the history of the variants of concern. These results add up to a model of genomic evolution, which partly fits into the classical drift barrier ideas. Certain observations, such as mutation "bands" persistent over the epidemic history, suggest contribution of genetic forces different from mutation, drift and selection, including recombination or other genome transformations. In addition, we show that a "toy" mathematical model can qualitatively reproduce how new variants (clones) stem from rare advantageous driver mutations, and then acquire neutral or disadvantageous passenger mutations which gradually reduce their fitness so they can be then outcompeted by new variants due to other driver mutations.
Collapse
Affiliation(s)
- Monika Klara Kurpas
- Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland; (M.K.K.); (R.J.); (P.K.)
| | - Roman Jaksik
- Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland; (M.K.K.); (R.J.); (P.K.)
| | - Pawel Kuś
- Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland; (M.K.K.); (R.J.); (P.K.)
| | - Marek Kimmel
- Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland; (M.K.K.); (R.J.); (P.K.)
- Department of Statistics and Bioengineering, Rice University, 6100 Main Street, Houston, TX 77005, USA
| |
Collapse
|
5
|
Kurpas MK, Kimmel M. Modes of Selection in Tumors as Reflected by Two Mathematical Models and Site Frequency Spectra. Front Ecol Evol 2022; 10:889438. [PMID: 37333691 PMCID: PMC10275603 DOI: 10.3389/fevo.2022.889438] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/28/2024] Open
Abstract
The tug-of-war model was developed in a series of papers of McFarland and co-authors to account for existence of mutually counteracting rare advantageous driver mutations and more frequent slightly deleterious passenger mutations in cancer. In its original version, it was a state-dependent branching process. Because of its formulation, the tug-of-war model is of importance for tackling the problem as to whether evolution of cancerous tumors is "Darwinian" or "non-Darwinian." We define two Time-Continuous Markov Chain versions of the model, including identical mutation processes but adopting different drift and selection components. In Model A, drift and selection process preserves expected fitness whereas in Model B it leads to non-decreasing expected fitness. We investigate these properties using mathematical analysis and extensive simulations, which detect the effect of the so-called drift barrier in Model B but not in Model A. These effects are reflected in different structure of clone genealogies in the two models. Our work is related to the past theoretical work in the field of evolutionary genetics, concerning the interplay among mutation, drift and selection, in absence of recombination (asexual reproduction), where epistasis plays a major role. Finally, we use the statistics of mutation frequencies known as the Site Frequency Spectra (SFS), to compare the variant frequencies in DNA of sequenced HER2+ breast cancers, to those based on Model A and B simulations. The tumor-based SFS are better reproduced by Model A, pointing out a possible selection pattern of HER2+ tumor evolution. To put our models in context, we carried out an exploratory study of how publicly accessible data from breast, prostate, skin and ovarian cancers fit a range of models found in the literature.
Collapse
Affiliation(s)
- Monika K. Kurpas
- Department of Systems Biology and Engineering, Silesian University of Technology, Gliwice, Poland
| | - Marek Kimmel
- Department of Systems Biology and Engineering, Silesian University of Technology, Gliwice, Poland
- Department of Statistics and Bioengineering, Rice University, Houston, TX, United States
| |
Collapse
|
6
|
A characterisation of the reconstructed birth-death process through time rescaling. Theor Popul Biol 2020; 134:61-76. [PMID: 32439294 DOI: 10.1016/j.tpb.2020.05.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2019] [Revised: 04/15/2020] [Accepted: 05/05/2020] [Indexed: 11/23/2022]
Abstract
The dynamics of a population exhibiting exponential growth can be modelled as a birth-death process, which naturally captures the stochastic variation in population size over time. In this article, we consider a supercritical birth-death process, started at a random time in the past, and conditioned to have n sampled individuals at the present. The genealogy of individuals sampled at the present time is then described by the reversed reconstructed process (RRP), which traces the ancestry of the sample backwards from the present. We show that a simple, analytic, time rescaling of the RRP provides a straightforward way to derive its inter-event times. The same rescaling characterises other distributions underlying this process, obtained elsewhere in the literature via more cumbersome calculations. We also consider the case of incomplete sampling of the population, in which each leaf of the genealogy is retained with an independent Bernoulli trial with probability ψ, and we show that corresponding results for Bernoulli-sampled RRPs can be derived using time rescaling, for any values of the underlying parameters. A central result is the derivation of a scaling limit as ψ approaches 0, corresponding to the underlying population growing to infinity, using the time rescaling formalism. We show that in this setting, after a linear time rescaling, the event times are the order statistics of n logistic random variables with mode log(1∕ψ); moreover, we show that the inter-event times are approximately exponentially distributed.
Collapse
|