1. Renz J, Dauda KA, Aga ONL, Diaz-Uriarte R, Löhr IH, Blomberg B, Johnston IG. Evolutionary accumulation modeling in AMR: machine learning to infer and predict evolutionary dynamics of multi-drug resistance. mBio 2025:e0048825. [PMID: 40396716 DOI: 10.1128/mbio.00488-25] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2025] Open
Abstract
Can we understand and predict the evolutionary pathways by which bacteria acquire multi-drug resistance (MDR)? These questions have substantial potential impact in basic biology and in applied approaches to address the global health challenge of antimicrobial resistance (AMR). In this minireview, we discuss how a class of machine-learning approaches called evolutionary accumulation modeling (EvAM) may help reveal these dynamics using genetic and/or phenotypic AMR data sets, without requiring longitudinal sampling. These approaches are well-established in cancer progression and evolutionary biology but currently less used in AMR research. We discuss how EvAM can learn the evolutionary pathways by which drug resistances and other AMR features (for example, mutations driving these resistances) are acquired as pathogens evolve, predict next evolutionary steps, identify influences between AMR features, and explore differences in MDR evolution between regions, demographics, and more. We demonstrate a case study from the literature on MDR evolution in Mycobacterium tuberculosis and discuss the strengths and weaknesses of these approaches, providing links to some approaches for implementation.
Affiliation(s)
- Jessica Renz
  - Department of Mathematics, University of Bergen, Bergen, Norway
- Kazeem A Dauda
  - Department of Mathematics, University of Bergen, Bergen, Norway
- Olav N L Aga
  - Computational Biology Unit, University of Bergen, Bergen, Norway
  - Department of Clinical Science, University of Bergen, Bergen, Norway
- Ramon Diaz-Uriarte
  - Department of Biochemistry, School of Medicine, Universidad Autónoma de Madrid, Madrid, Community of Madrid, Spain
  - Instituto de Investigaciones Biomédicas Sols-Morreale (IIBM), CSIC-UAM, Madrid, Community of Madrid, Spain
- Iren H Löhr
  - Department of Clinical Science, University of Bergen, Bergen, Norway
  - Department of Medical Microbiology, Stavanger University Hospital, Stavanger, Norway
- Bjørn Blomberg
  - Department of Clinical Science, University of Bergen, Bergen, Norway
  - Department of Medicine, Haukeland University Hospital, Bergen, Norway
  - National Advisory Unit for Tropical Infectious Diseases, Haukeland University Hospital, Bergen, Norway
- Iain G Johnston
  - Department of Mathematics, University of Bergen, Bergen, Norway
  - Computational Biology Unit, University of Bergen, Bergen, Norway

2. Johnston IG, Diaz-Uriarte R. A hypercubic Mk model framework for capturing reversibility in disease, cancer, and evolutionary accumulation modelling. Bioinformatics 2024; 41:btae737. [PMID: 39666947 PMCID: PMC11681934 DOI: 10.1093/bioinformatics/btae737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2024] [Revised: 11/29/2024] [Accepted: 12/11/2024] [Indexed: 12/14/2024] Open
Abstract
MOTIVATION Accumulation models, where a system progressively acquires binary features over time, are common in the study of cancer progression, evolutionary biology, and other fields. Many approaches have been developed to infer the accumulation pathways by which features (e.g. mutations) are acquired over time. However, most of these approaches do not support reversibility: the loss of a feature once it has been acquired (e.g. the clearing of a mutation from a tumor or population). RESULTS Here, we demonstrate how the well-established Mk model from evolutionary biology, embedded on a hypercubic transition graph, can be used to infer the dynamics of accumulation processes, including the possibility of reversible transitions, from data which may be uncertain and cross-sectional, longitudinal, or phylogenetically/phylogenomically embedded. Positive and negative interactions between arbitrary sets of features (not limited to pairwise interactions) are supported. We demonstrate this approach with synthetic datasets and real data on bacterial drug resistance and cancer progression. While this implementation is limited in the number of features that can be considered, we discuss how this limitation may be relaxed to deal with larger systems. AVAILABILITY AND IMPLEMENTATION The code implementing this setup in R is freely available at https://github.com/StochasticBiology/hypermk.
Affiliation(s)
- Iain G Johnston
  - Department of Mathematics, University of Bergen, Realfagbygget, Bergen 5007, Norway
  - Computational Biology Unit, University of Bergen, Thormøhlensgate 55, Bergen 5008, Norway
- Ramon Diaz-Uriarte
  - Department of Biochemistry, School of Medicine, Universidad Autónoma de Madrid, Madrid 28029, Spain
  - Instituto de Investigaciones Biomédicas Sols-Morreale (IIBM), CSIC-UAM, Madrid 28029, Spain

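The hypercubic transition graph named in entry 2 can be illustrated with a minimal sketch. This is not the hypermk code (which the abstract says is written in R); the function name `hypercube_edges` and its `reversible` flag are hypothetical, chosen only for illustration. Each state is a binary tuple recording which features are present, forward edges acquire exactly one feature, and the reversible variant adds the matching loss edge.

```python
from itertools import product


def hypercube_edges(n_features, reversible=False):
    """Enumerate states and one-feature transitions on a binary hypercube.

    Illustrative only: the names and structure are assumptions, not the
    hypermk package's API.
    """
    # Every state is a tuple of 0/1 flags, one per feature.
    states = list(product((0, 1), repeat=n_features))
    edges = []
    for state in states:
        for i in range(n_features):
            if state[i] == 0:
                # Forward edge: acquire feature i.
                target = state[:i] + (1,) + state[i + 1:]
                edges.append((state, target))
                if reversible:
                    # Reverse edge: lose feature i again.
                    edges.append((target, state))
    return states, edges


states, forward = hypercube_edges(3)
_, both = hypercube_edges(3, reversible=True)
print(len(states), len(forward), len(both))
```

With L features there are 2^L states and L·2^(L-1) forward edges, so the graph grows quickly with L, which is consistent with the abstract's remark that the implementation is limited in the number of features it can consider.
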
3. Aga ONL, Brun M, Dauda KA, Diaz-Uriarte R, Giannakis K, Johnston IG. HyperTraPS-CT: Inference and prediction for accumulation pathways with flexible data and model structures. PLoS Comput Biol 2024; 20:e1012393. [PMID: 39231165 PMCID: PMC11404842 DOI: 10.1371/journal.pcbi.1012393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 09/16/2024] [Accepted: 08/06/2024] [Indexed: 09/06/2024] Open
Abstract
Accumulation processes, where many potentially coupled features are acquired over time, occur throughout the sciences from evolutionary biology to disease progression, and particularly in the study of cancer progression. Existing methods for learning the dynamics of such systems typically assume limited (often pairwise) relationships between feature subsets, cross-sectional or untimed observations, small feature sets, or discrete orderings of events. Here we introduce HyperTraPS-CT (Hypercubic Transition Path Sampling in Continuous Time) to compute posterior distributions on continuous-time dynamics of many, arbitrarily coupled, traits in unrestricted state spaces, accounting for uncertainty in observations and their timings. We demonstrate the capacity of HyperTraPS-CT to deal with cross-sectional, longitudinal, and phylogenetic data, which may have no, uncertain, or precisely specified sampling times. HyperTraPS-CT allows positive and negative interactions between arbitrary subsets of features (not limited to pairwise interactions), supporting Bayesian and maximum-likelihood inference approaches to identify these interactions, consequent pathways, and predictions of future and unobserved features. We also introduce a range of visualisations for the inferred outputs of these processes and demonstrate model selection and regularisation for feature interactions. We apply this approach to case studies on the accumulation of mutations in cancer progression and the acquisition of anti-microbial resistance genes in tuberculosis, demonstrating its flexibility and capacity to produce predictions aligned with applied priorities.
Affiliation(s)
- Olav N L Aga
  - Computational Biology Unit, University of Bergen, Bergen, Norway
  - Department of Clinical Science, University of Bergen, Bergen, Norway
- Morten Brun
  - Department of Mathematics, University of Bergen, Bergen, Norway
- Kazeem A Dauda
  - Department of Mathematics, University of Bergen, Bergen, Norway
- Ramon Diaz-Uriarte
  - Department of Biochemistry, School of Medicine, Universidad Autónoma de Madrid, Madrid, Spain
  - Instituto de Investigaciones Biomédicas Sols-Morreale (IIBM), CSIC-UAM, Madrid, Spain
- Konstantinos Giannakis
  - Department of Mathematics, University of Bergen, Bergen, Norway
  - Department of Disease Burden, Norwegian Institute of Public Health, Bergen, Norway
- Iain G Johnston
  - Computational Biology Unit, University of Bergen, Bergen, Norway
  - Department of Mathematics, University of Bergen, Bergen, Norway

4. Shuaibi A, Chitra U, Raphael BJ. A latent variable model for evaluating mutual exclusivity and co-occurrence between driver mutations in cancer. bioRxiv 2024:2024.04.24.590995. [PMID: 38712136 PMCID: PMC11071465 DOI: 10.1101/2024.04.24.590995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
A key challenge in cancer genomics is understanding the functional relationships and dependencies between combinations of somatic mutations that drive cancer development. Such driver mutations frequently exhibit patterns of mutual exclusivity or co-occurrence across tumors, and many methods have been developed to identify such dependency patterns from bulk DNA sequencing data of a cohort of patients. However, while mutual exclusivity and co-occurrence are described as properties of driver mutations, existing methods do not explicitly disentangle functional driver mutations from neutral passenger mutations. In particular, nearly all existing methods evaluate mutual exclusivity or co-occurrence at the gene level, marking a gene as mutated if any mutation, driver or passenger, is present. Since some genes have a large number of passenger mutations, existing methods either restrict their analyses to a small subset of suspected driver genes, limiting their ability to identify novel dependencies, or make spurious inferences of mutual exclusivity and co-occurrence involving genes with many passenger mutations. We introduce DIALECT, an algorithm to identify dependencies between pairs of driver mutations from somatic mutation counts. We derive a latent variable mixture model for drivers and passengers that combines existing probabilistic models of passenger mutation rates with a latent variable describing the unknown status of a mutation as a driver or passenger. We use an expectation maximization (EM) algorithm to estimate the parameters of our model, including the rates of mutual exclusivity and co-occurrence between drivers. We demonstrate that DIALECT more accurately infers mutual exclusivity and co-occurrence between driver mutations compared to existing methods on both simulated mutation data and somatic mutation data from 5 cancer types in The Cancer Genome Atlas (TCGA).
Affiliation(s)
- Ahmed Shuaibi
  - Department of Computer Science, Princeton University
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University
- Uthsav Chitra
  - Department of Computer Science, Princeton University

5. Fontana D, Crespiatico I, Crippa V, Malighetti F, Villa M, Angaroni F, De Sano L, Aroldi A, Antoniotti M, Caravagna G, Piazza R, Graudenzi A, Mologni L, Ramazzotti D. Evolutionary signatures of human cancers revealed via genomic analysis of over 35,000 patients. Nat Commun 2023; 14:5982. [PMID: 37749078 PMCID: PMC10519956 DOI: 10.1038/s41467-023-41670-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/29/2023] [Accepted: 09/13/2023] [Indexed: 09/27/2023] Open
Abstract
Recurring sequences of genomic alterations occurring across patients can highlight repeated evolutionary processes with significant implications for predicting cancer progression. Leveraging the ever-increasing availability of cancer omics data, here we unveil cancer's evolutionary signatures tied to distinct disease outcomes, representing "favored trajectories" of acquisition of driver mutations detected in patients with similar prognosis. We present a framework named ASCETIC (Agony-baSed Cancer EvoluTion InferenCe) to extract such signatures from sequencing experiments generated by different technologies such as bulk and single-cell sequencing data. We apply ASCETIC to (i) single-cell data from 146 myeloid malignancy patients and bulk sequencing from 366 acute myeloid leukemia patients, (ii) multi-region sequencing from 100 early-stage lung cancer patients, (iii) exome/genome data from 10,000+ Pan-Cancer Atlas samples, and (iv) targeted sequencing from 25,000+ MSK-MET metastatic patients, revealing subtype-specific single-nucleotide variant signatures associated with distinct prognostic clusters. Validations on several datasets underscore the robustness and generalizability of the extracted signatures.
Affiliation(s)
- Diletta Fontana
  - Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
- Ilaria Crespiatico
  - Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
- Valentina Crippa
  - Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
- Federica Malighetti
  - Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
- Matteo Villa
  - Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
- Fabrizio Angaroni
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
  - Center of Computational Biology, Human Technopole, Milano, Italy
- Luca De Sano
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
- Andrea Aroldi
  - Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
  - Hematology and Clinical Research Unit, Fondazione IRCCS San Gerardo dei Tintori, Monza, Italy
- Marco Antoniotti
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
  - Bicocca Bioinformatics, Biostatistics and Bioimaging Centre-B4, Milan, Italy
- Giulio Caravagna
  - Department of Mathematics and Geosciences, University of Trieste, Trieste, Italy
- Rocco Piazza
  - Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
- Alex Graudenzi
  - Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
  - Bicocca Bioinformatics, Biostatistics and Bioimaging Centre-B4, Milan, Italy
  - Institute of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan, Italy
- Luca Mologni
  - Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
- Daniele Ramazzotti
  - Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy

6. Diaz-Uriarte R, Herrera-Nieto P. EvAM-Tools: tools for evolutionary accumulation and cancer progression models. Bioinformatics 2022; 38:5457-5459. [PMID: 36287062 PMCID: PMC9750106 DOI: 10.1093/bioinformatics/btac710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Revised: 10/03/2022] [Accepted: 10/25/2022] [Indexed: 12/25/2022] Open
Abstract
SUMMARY EvAM-Tools is an R package and web application that provides a unified interface to state-of-the-art cancer progression models and, more generally, evolutionary models of event accumulation. The output includes, in addition to the fitted models, the transition (and transition rate) matrices between genotypes and the probabilities of evolutionary paths. Generation of random cancer progression models is also available. Using the GUI in the web application, users can easily construct models (modifying directed acyclic graphs of restrictions, matrices of mutual hazards or specifying genotype composition), generate data from them (with user-specified observational/genotyping error) and analyze the data. AVAILABILITY AND IMPLEMENTATION Implemented in R and C; open source code available under the GNU Affero General Public License v3.0 at https://github.com/rdiaz02/EvAM-Tools. Docker images freely available from https://hub.docker.com/u/rdiaz02. Web app freely accessible at https://iib.uam.es/evamtools. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pablo Herrera-Nieto
  - Department of Biochemistry, Universidad Autónoma de Madrid, Instituto de Investigaciones Biomédicas “Alberto Sols” (UAM-CSIC), Madrid, Spain

7. Angaroni F, Guidi A, Ascolani G, d'Onofrio A, Antoniotti M, Graudenzi A. J-SPACE: a Julia package for the simulation of spatial models of cancer evolution and of sequencing experiments. BMC Bioinformatics 2022; 23:269. [PMID: 35804300 PMCID: PMC9270769 DOI: 10.1186/s12859-022-04779-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2022] [Accepted: 06/09/2022] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND The combined effects of biological variability and measurement-related errors on cancer sequencing data remain largely unexplored. However, the spatio-temporal simulation of multi-cellular systems provides a powerful instrument to address this issue. In particular, efficient algorithmic frameworks are needed to overcome the harsh trade-off between scalability and expressivity, so as to allow one to simulate both realistic cancer evolution scenarios and the related sequencing experiments, which can then be used to benchmark downstream bioinformatics methods. RESULTS We introduce a Julia package for SPAtial Cancer Evolution (J-SPACE), which allows one to model and simulate a broad set of experimental scenarios, phenomenological rules and sequencing settings. Specifically, J-SPACE simulates the spatial dynamics of cells as a continuous-time multi-type birth-death stochastic process on an arbitrary graph, employing different rules of interaction and an optimised Gillespie algorithm. The evolutionary dynamics of genomic alterations (single-nucleotide variants and indels) is simulated either under the Infinite Sites Assumption or several different substitution models, including one based on mutational signatures. After mimicking the spatial sampling of tumour cells, J-SPACE returns the related phylogenetic model, and allows one to generate synthetic reads from several Next-Generation Sequencing (NGS) platforms, via the ART read simulator. The results are finally returned in standard FASTA, FASTQ, SAM, ALN and Newick file formats. CONCLUSION J-SPACE is designed to efficiently simulate the heterogeneous behaviour of a large number of cancer cells and produces a rich set of outputs. Our framework is useful to investigate the emergent spatial dynamics of cancer subpopulations, as well as to assess the impact of incomplete sampling and of experiment-specific errors. Importantly, the output of J-SPACE is designed to allow the performance assessment of downstream bioinformatics pipelines processing NGS data. J-SPACE is freely available at: https://github.com/BIMIB-DISCo/J-Space.jl.
Affiliation(s)
- Fabrizio Angaroni
  - Dept. of Informatics, Systems and Communication, Univ. of Milan-Bicocca, Milan, Italy
- Alessandro Guidi
  - Dept. of Informatics, Systems and Communication, Univ. of Milan-Bicocca, Milan, Italy
- Gianluca Ascolani
  - Dept. of Informatics, Systems and Communication, Univ. of Milan-Bicocca, Milan, Italy
- Alberto d'Onofrio
  - Department of Mathematics and Geosciences, Univ. of Trieste, Trieste, Italy
- Marco Antoniotti
  - Dept. of Informatics, Systems and Communication, Univ. of Milan-Bicocca, Milan, Italy
  - Bicocca Bioinformatics, Biostatistics and Bioimaging Centre (B4), Milan, Italy
- Alex Graudenzi
  - Dept. of Informatics, Systems and Communication, Univ. of Milan-Bicocca, Milan, Italy
  - Bicocca Bioinformatics, Biostatistics and Bioimaging Centre (B4), Milan, Italy
  - Inst. of Molecular Bioimaging and Physiology, National Research Council (IBFM-CNR), Segrate, Italy

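The Gillespie algorithm mentioned in entry 7 can be sketched for the simplest case, a single-type, non-spatial birth-death process. This is a generic textbook version in Python, not J-SPACE's optimised multi-type Julia implementation; the function name `gillespie_birth_death` and all parameters are hypothetical, and per-cell rates are assumed constant with `birth + death > 0`.

```python
import random


def gillespie_birth_death(n0, birth, death, t_max, seed=0):
    """Simulate one trajectory of a single-type birth-death process.

    Illustrative assumptions: constant per-cell rates, no spatial
    structure, birth + death > 0. Returns a list of (time, population).
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    trajectory = [(t, n)]
    while n > 0:
        # Total event rate scales with the current population size.
        total_rate = (birth + death) * n
        # Waiting time to the next event is exponentially distributed.
        t += rng.expovariate(total_rate)
        if t > t_max:
            break
        # Choose the event type in proportion to its rate.
        if rng.random() < birth / (birth + death):
            n += 1
        else:
            n -= 1
        trajectory.append((t, n))
    return trajectory


trajectory = gillespie_birth_death(n0=5, birth=1.0, death=0.5, t_max=2.0)
print(len(trajectory), trajectory[-1])
```

Each iteration draws one waiting time and one event, so the cost grows with the number of events; this is presumably one reason a simulator of many cells would need an optimised variant.
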