1. Satas G, Myers MA, McPherson A, Shah SP. Inferring active mutational processes in cancer using single cell sequencing and evolutionary constraints. bioRxiv 2025:2025.02.24.639589. [PMID: 40060559 PMCID: PMC11888314 DOI: 10.1101/2025.02.24.639589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2025]
Abstract
Ongoing mutagenesis in cancer drives genetic diversity throughout the natural history of cancers. As the activities of mutational processes are dynamic throughout evolution, distinguishing the mutational signatures of 'active' and 'historical' processes has important implications for studying how tumors evolve. This can aid in understanding mutagenic states at the time of presentation, and in associating active mutational process with therapeutic resistance. As bulk sequencing primarily captures historical mutational processes, we studied whether ultra-low-coverage single-cell whole-genome sequencing (scWGS), which measures the distribution of mutations across hundreds or thousands of individual cells, could enable the distinction between historical and active mutational processes. While technical challenges and data sparsity have limited mutation analysis in scWGS, we show that these data contain valuable information about dynamic mutational processes. To robustly interpret single nucleotide variants (SNVs) in scWGS, we introduce ArtiCull, a method to identify and remove SNV artifacts by leveraging evolutionary constraints, enabling reliable detection of mutations for signature analysis. Applying this approach to scWGS data from pancreatic ductal adenocarcinoma (PDAC), triple-negative breast cancer (TNBC), and high-grade serous ovarian cancer (HGSOC), we uncover temporal and spatial patterns in mutational processes. In PDAC, we observe a temporal increase in mismatch repair deficiency (MMRd). In cisplatin-treated TNBC patient-derived xenografts, we identify therapy-induced mutagenesis and inactivation of APOBEC3 activity. In HGSOC, we show distinct patterns of APOBEC3 mutagenesis, including late tumor-wide activation in one case and clade-specific enrichment in another. Additionally, we detect a clone-specific increase in SBS17 activity, in a clone previously linked to recurrence. 
Our findings establish ultra-low-coverage scWGS as a powerful approach for studying active mutational processes that may influence ongoing clonal evolution and therapeutic resistance.
Affiliation(s)
- Gryte Satas
  - Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - The Halvorsen Center for Computational Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Matthew A. Myers
  - Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - The Halvorsen Center for Computational Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Andrew McPherson
  - Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - The Halvorsen Center for Computational Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Sohrab P. Shah
  - Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - The Halvorsen Center for Computational Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
2. Wiens M, Farahani H, Scott RW, Underhill TM, Bashashati A. Benchmarking bulk and single-cell variant-calling approaches on Chromium scRNA-seq and scATAC-seq libraries. Genome Res 2024; 34:1196-1210. [PMID: 39147582 PMCID: PMC11444184 DOI: 10.1101/gr.277066.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2023] [Accepted: 08/12/2024] [Indexed: 08/17/2024]
Abstract
Single-cell sequencing methodologies such as scRNA-seq and scATAC-seq have become widespread and effective tools to interrogate tissue composition. Increasingly, variant callers are being applied to these methodologies to resolve the genetic heterogeneity of a sample, especially in the case of detecting the clonal architecture of a tumor. Typically, traditional bulk DNA variant callers are applied to the pooled reads of a single-cell library to detect candidate mutations. Recently, multiple studies have applied such callers on reads from individual cells, with some citing the ability to detect rare variants with higher sensitivity. Many studies apply these two approaches to the Chromium (10x Genomics) scRNA-seq and scATAC-seq methodologies. However, Chromium-based libraries may offer additional challenges to variant calling compared with existing single-cell methodologies, raising questions regarding the validity of variants obtained from such a workflow. To determine the merits and challenges of various variant-calling approaches on Chromium scRNA-seq and scATAC-seq libraries, we use sample libraries with matched bulk whole-genome sequencing to evaluate the performance of callers. We review caller performance, finding that bulk callers applied on pooled reads significantly outperform individual-cell approaches. We also evaluate variants unique to scRNA-seq and scATAC-seq methodologies, finding patterns of noise but also potential capture of RNA-editing events. Finally, we review the notion that variant calling at the single-cell level can detect rare somatic variants, providing empirical results that suggest resolving such variants is infeasible in single-cell Chromium libraries.
Affiliation(s)
- Matthew Wiens
  - School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia V6T 2B9, Canada
- Hossein Farahani
  - School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia V6T 2B9, Canada
- R Wilder Scott
  - School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia V6T 2B9, Canada
- T Michael Underhill
  - School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia V6T 2B9, Canada
  - Department of Cellular & Physiological Sciences, University of British Columbia, Vancouver, British Columbia V6T 2A1, Canada
- Ali Bashashati
  - School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia V6T 2B9, Canada
  - Department of Pathology & Laboratory Medicine, University of British Columbia, Vancouver, British Columbia V6T 1Z7, Canada
3. Xie D, An B, Yang M, Wang L, Guo M, Luo H, Huang S, Sun F. Application and research progress of single cell sequencing technology in leukemia. Front Oncol 2024; 14:1389468. [PMID: 39267837 PMCID: PMC11390353 DOI: 10.3389/fonc.2024.1389468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 08/08/2024] [Indexed: 09/15/2024]
Abstract
Leukemia is a malignant tumor with high heterogeneity and a complex evolutionary process. It is difficult to resolve the heterogeneity and clonal evolution of leukemia cells by applying traditional bulk sequencing techniques, thus preventing a deep understanding of the mechanisms of leukemia development and the identification of potential therapeutic targets. However, with the development and application of single-cell sequencing technology, it is now possible to investigate the gene expression profile, mutations, and epigenetic features of leukemia at the single-cell level, thus providing a new perspective for leukemia research. In this article, we review the recent applications and advances of single-cell sequencing technology in leukemia research, discuss its potential for enhancing our understanding of the mechanisms of leukemia development, discovering therapeutic targets and personalized treatment, and provide reference guidelines for the significance of this technology in clinical research.
Affiliation(s)
- Dan Xie
  - Medical College, Guizhou University, Guiyang, China
- Bangquan An
  - Guizhou Provincial People's Hospital, Guiyang, Guizhou, China
- Mingyue Yang
  - Medical College, Guizhou University, Guiyang, China
- Lei Wang
  - Medical College, Guizhou University, Guiyang, China
- Min Guo
  - Medical College, Guizhou University, Guiyang, China
- Heng Luo
  - State Key Laboratory of Functions and Applications of Medicinal Plants, Guizhou Medical University, Guiyang, Guizhou, China
  - Guizhou Provincial Engineering Research Center for Natural Drugs, Guiyang, Guizhou, China
- Shengwen Huang
  - Guizhou Provincial People's Hospital, Guiyang, Guizhou, China
- Fa Sun
  - Medical College, Guizhou University, Guiyang, China
4. Kang S, Borgsmüller N, Valecha M, Kuipers J, Alves JM, Prado-López S, Chantada D, Beerenwinkel N, Posada D, Szczurek E. SIEVE: joint inference of single-nucleotide variants and cell phylogeny from single-cell DNA sequencing data. Genome Biol 2022; 23:248. [PMID: 36451239 PMCID: PMC9714196 DOI: 10.1186/s13059-022-02813-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/23/2022] [Accepted: 11/08/2022] [Indexed: 12/02/2022]
Abstract
We present SIEVE, a statistical method for the joint inference of somatic variants and cell phylogeny under the finite-sites assumption from single-cell DNA sequencing. SIEVE leverages raw read counts for all nucleotides and corrects the acquisition bias of branch lengths. In our simulations, SIEVE outperforms other methods in phylogenetic reconstruction and variant calling accuracy, especially in the inference of homozygous variants. Applying SIEVE to three datasets, one for triple-negative breast (TNBC), and two for colorectal cancer (CRC), we find that double mutant genotypes are rare in CRC but unexpectedly frequent in the TNBC samples.
Affiliation(s)
- Senbai Kang
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
- Nico Borgsmüller
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
- Monica Valecha
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
- Jack Kuipers
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
- Joao M. Alves
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
- Sonia Prado-López
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
  - Institute of Solid State Electronics E362, Technische Universität Wien, Vienna, Austria
- Débora Chantada
  - Department of Pathology, Hospital Álvaro Cunqueiro, Vigo, Spain
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
- David Posada
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
  - Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Ewa Szczurek
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
5. Kuipers J, Singer J, Beerenwinkel N. Single-cell mutation calling and phylogenetic tree reconstruction with loss and recurrence. Bioinformatics 2022; 38:4713-4719. [PMID: 36000873 PMCID: PMC9563700 DOI: 10.1093/bioinformatics/btac577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Revised: 07/08/2022] [Accepted: 08/23/2022] [Indexed: 11/13/2022]
Abstract
Motivation: Tumours evolve as heterogeneous populations of cells, which may be distinguished by different genomic aberrations. The resulting intra-tumour heterogeneity plays an important role in cancer patient relapse and treatment failure, so that obtaining a clear understanding of each patient's tumour composition and evolutionary history is key for personalized therapies. Single-cell sequencing (SCS) now provides the possibility to resolve tumour heterogeneity at the highest resolution of individual tumour cells, but brings with it challenges related to the particular noise profiles of the sequencing protocols as well as the complexity of the underlying evolutionary process.
Results: By modelling the noise processes and allowing mutations to be lost or to reoccur during tumour evolution, we present a method to jointly call mutations in each cell, reconstruct the phylogenetic relationship between cells, and determine the locations of mutational losses and recurrences. Our Bayesian approach allows us to accurately call mutations as well as to quantify our certainty in such predictions. We show the advantages of allowing mutational loss or recurrence with simulated data and present its application to tumour SCS data.
Availability and implementation: SCIΦN is available at https://github.com/cbg-ethz/SCIPhIN.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jack Kuipers
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Jochen Singer
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
6. Rozhoňová H, Danciu D, Stark S, Rätsch G, Kahles A, Lehmann KV. SECEDO: SNV-based subclone detection using ultra-low coverage single-cell DNA sequencing. Bioinformatics 2022; 38:4293-4300. [PMID: 35900151 PMCID: PMC9477524 DOI: 10.1093/bioinformatics/btac510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Revised: 07/04/2022] [Indexed: 12/24/2022]
Abstract
Motivation: Several recently developed single-cell DNA sequencing technologies enable whole-genome sequencing of thousands of cells. However, the ultra-low coverage of the sequenced data (<0.05× per cell) mostly limits their usage to the identification of copy number alterations in multi-megabase segments. Many tumors are not copy number-driven, and thus single-nucleotide variant (SNV)-based subclone detection may contribute to a more comprehensive view on intra-tumor heterogeneity. Due to the low coverage of the data, the identification of SNVs is only possible when superimposing the sequenced genomes of hundreds of genetically similar cells. Thus, we have developed a new approach to efficiently cluster tumor cells based on a Bayesian filtering approach of relevant loci and exploiting read overlap and phasing.
Results: We developed Single Cell Data Tumor Clusterer (SECEDO, lat. 'to separate'), a new method to cluster tumor cells based solely on SNVs, inferred on ultra-low coverage single-cell DNA sequencing data. We applied SECEDO to a synthetic dataset simulating 7250 cells and eight tumor subclones from a single patient and were able to accurately reconstruct the clonal composition, detecting 92.11% of the somatic SNVs, with the smallest clusters representing only 6.9% of the total population. When applied to five real single-cell sequencing datasets from a breast cancer patient, each consisting of ≈2000 cells, SECEDO was able to recover the major clonal composition in each dataset at the original coverage of 0.03×, achieving an Adjusted Rand Index (ARI) score of ≈0.6. The current state-of-the-art SNV-based clustering method achieved an ARI score of ≈0, even after merging cells to create higher coverage data (factor 10 increase), and was only able to match SECEDO's performance when pooling data from all five datasets, in addition to artificially increasing the sequencing coverage by a factor of 7. Variant calling on the resulting clusters recovered more than twice as many SNVs as would have been detected if calling on all cells together. Further, the allelic ratio of the called SNVs on each subcluster was more than double relative to the allelic ratio of the SNVs called without clustering, thus demonstrating that calling variants on subclones, in addition to both increasing sensitivity of SNV detection and attaching SNVs to subclones, significantly increases the confidence of the called variants.
Availability and implementation: SECEDO is implemented in C++ and is publicly available at https://github.com/ratschlab/secedo. Instructions to download the data and the evaluation code to reproduce the findings in this paper are available at: https://github.com/ratschlab/secedo-evaluation. The code and data of the submitted version are archived at: https://doi.org/10.5281/zenodo.6516955.
Supplementary information: Supplementary data are available at Bioinformatics online.
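The SECEDO abstract reports clustering quality as an Adjusted Rand Index (ARI). As a reading aid, the following is a minimal sketch of the standard ARI formula computed from two label vectors using only the Python standard library. The function name and the toy labels are illustrative assumptions; this is not code from SECEDO or its evaluation repository.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Standard ARI between two clusterings of the same cells (illustrative helper)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # zero or one cell: the clusterings trivially agree

    # Contingency counts: cells sharing a (true label, predicted label) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of cell pairs grouped together in both, in truth, and in prediction.
    together_both = sum(comb(c, 2) for c in pair_counts.values())
    together_true = sum(comb(c, 2) for c in true_counts.values())
    together_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both clusterings are one big cluster
    return (together_both - expected) / (maximum - expected)


if __name__ == "__main__":
    # Toy example (made-up labels): two true clones, one of them over-split.
    true_clones = ["A", "A", "B", "B"]
    predicted = [0, 0, 1, 2]
    print(adjusted_rand_index(true_clones, predicted))
```

An ARI of 1 means the two partitions are identical up to relabelling, and values near 0 mean agreement no better than chance, which is how the ≈0.6 and ≈0 figures in the abstract should be read.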
Affiliation(s)
- Stefan Stark
  - Biomedical Informatics Group, Department of Computer Science, ETH Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Biomedical Informatics Research, University Hospital Zurich, Zurich, Switzerland
7. Valecha M, Posada D. Somatic variant calling from single-cell DNA sequencing data. Comput Struct Biotechnol J 2022; 20:2978-2985. [PMID: 35782734 PMCID: PMC9218383 DOI: 10.1016/j.csbj.2022.06.013] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/01/2022] [Revised: 06/06/2022] [Accepted: 06/06/2022] [Indexed: 11/03/2022]
Abstract
Single-cell sequencing has gained popularity in recent years. Despite its numerous applications, single-cell DNA sequencing data is highly error-prone due to technical biases arising from uneven sequencing coverage, allelic dropout, and amplification error. With these artifacts, the identification of somatic genomic variants becomes a challenging task, and over the years, several methods have been developed explicitly for this type of data. Single-cell variant callers implement distinct strategies, make different use of the data, and typically result in many discordant calls when applied to real data. Here, we review current approaches for single-cell variant calling, emphasizing single nucleotide variants. We highlight their potential benefits and shortcomings to help users choose a suitable tool for their data at hand.
Key Words
- ADO, allelic dropout
- Allele dropout
- Amplification error
- CNV, copy number variant
- Indel, short insertion or deletion
- LDO, locus dropout
- SNV, single nucleotide variant
- SV, structural variant
- Single-cell genomics
- Somatic variants
- VAF, variant allele frequency
- Variant calling
- hSNP, heterozygous single-nucleotide polymorphism
- scATAC-seq, single-cell sequencing assay for transposase-accessible chromatin
- scDNA-seq, single-cell DNA sequencing
- scHi-C, single-cell Hi-C sequencing
- scMethyl-seq, single-cell Methylation sequencing
- scRNA-seq, single-cell RNA sequencing
- scWGA, single-cell whole-genome amplification
Affiliation(s)
- Monica Valecha
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Spain
- David Posada
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Spain
  - Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, 36310 Vigo, Spain
8. Kozlov A, Alves JM, Stamatakis A, Posada D. CellPhy: accurate and fast probabilistic inference of single-cell phylogenies from scDNA-seq data. Genome Biol 2022; 23:37. [PMID: 35081992 PMCID: PMC8790911 DOI: 10.1186/s13059-021-02583-w] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 05/06/2021] [Accepted: 12/20/2021] [Indexed: 01/15/2023]
Abstract
We introduce CellPhy, a maximum likelihood framework for inferring phylogenetic trees from somatic single-cell single-nucleotide variants. CellPhy leverages a finite-site Markov genotype model with 16 diploid states and considers amplification error and allelic dropout. We implement CellPhy into RAxML-NG, a widely used phylogenetic inference package that provides statistical confidence measurements and scales well on large datasets with hundreds or thousands of cells. Comprehensive simulations suggest that CellPhy is more robust to single-cell genomics errors and outperforms state-of-the-art methods under realistic scenarios, both in accuracy and speed. CellPhy is freely available at https://github.com/amkozlov/cellphy .
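The CellPhy abstract mentions a genotype model with 16 diploid states that accounts for allelic dropout. One natural reading, assumed here, is that the 16 states are the ordered (phased) pairs of the four nucleotides. The sketch below enumerates them and adds a deliberately simplified toy dropout model; the function `toy_dropout_distribution` and its probabilities are hypothetical illustrations, not CellPhy's actual error model or API.

```python
from itertools import combinations_with_replacement, product

NUCLEOTIDES = "ACGT"

# Assumption: the 16 diploid states are ordered allele pairs, e.g. "AG" != "GA".
PHASED_GENOTYPES = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]

# Ignoring phase collapses them to 10 unordered genotypes.
UNPHASED_GENOTYPES = [
    "".join(pair) for pair in combinations_with_replacement(NUCLEOTIDES, 2)
]


def toy_dropout_distribution(genotype, dropout_rate):
    """Toy model (hypothetical): a heterozygous site loses one allele with
    probability dropout_rate, each allele being equally likely to drop out,
    and is then observed as homozygous for the surviving allele."""
    if not 0.0 <= dropout_rate <= 1.0:
        raise ValueError("dropout_rate must be in [0, 1]")
    first, second = genotype
    if first == second:
        # Homozygous sites look the same whichever copy is lost.
        return {genotype: 1.0}
    return {
        first + second: 1.0 - dropout_rate,
        first + first: dropout_rate / 2,
        second + second: dropout_rate / 2,
    }


if __name__ == "__main__":
    print(len(PHASED_GENOTYPES), PHASED_GENOTYPES)
    print(len(UNPHASED_GENOTYPES), UNPHASED_GENOTYPES)
    print(toy_dropout_distribution("AG", 0.2))
```

The point of the toy model is only to show why dropout matters for phylogeny inference: a truly heterozygous mutation can be observed as homozygous reference or homozygous alternate, so a likelihood model has to weigh those outcomes rather than trust the observed genotype.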
Affiliation(s)
- Alexey Kozlov
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, 76128 Karlsruhe, Germany
- Joao M. Alves
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, 36310 Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
- Alexandros Stamatakis
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- David Posada
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, 36310 Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain