2
Author Correction: Reconstructing evolutionary trajectories of mutation signature activities in cancer using TrackSig. Nat Commun 2022; 13:7567. [PMID: 36482170; PMCID: PMC9731941; DOI: 10.1038/s41467-022-32336-7]
3
Reconstructing cancer phylogenies using Pairtree, a clone tree reconstruction algorithm. STAR Protoc 2022; 3:101706. [PMID: 36129821; PMCID: PMC9494285; DOI: 10.1016/j.xpro.2022.101706] [Received: 06/01/2022; Revised: 07/21/2022; Accepted: 08/22/2022]
Abstract
Pairtree is a clone tree reconstruction algorithm that uses somatic point mutations to build clone trees describing the evolutionary history of individual cancers. Using the Pairtree software package, we describe steps to preprocess somatic mutation data, cluster mutations into subclones, search for clone trees, and visualize clone trees. Pairtree can build clone trees from up to 100 samples of a single cancer, including cancers with 30 or more subclonal populations. For complete details on the use and execution of this protocol, please refer to Wintersinger et al. (2022).
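A minimal sketch of the preprocessing idea, assuming each mutation comes with per-sample variant read counts, total read counts, and a variant read probability ω (the chance that a read from a cell carrying the mutation shows the variant; 0.5 for a heterozygous SNV in a diploid region). Pairtree's own input, as we understand it, records a comparable per-sample quantity, but the `Mutation` class and `subclonal_frequencies` function below are hypothetical names for illustration, not part of the Pairtree package. The clipped point estimate ignores the binomial uncertainty that a full method models.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mutation:
    """One somatic point mutation observed across several samples (hypothetical layout)."""
    mut_id: str
    var_reads: List[int]        # variant-supporting reads per sample
    total_reads: List[int]      # total reads per sample
    var_read_prob: List[float]  # omega: P(variant read | read from a cell carrying the mutation)


def subclonal_frequencies(mut: Mutation) -> List[float]:
    """Estimate, per sample, the fraction of cells carrying the mutation.

    Uses the simple moment estimate phi = (var / total) / omega, clipped to [0, 1].
    Samples with zero coverage get 0.0, since they carry no information.
    """
    freqs = []
    for v, n, omega in zip(mut.var_reads, mut.total_reads, mut.var_read_prob):
        if n == 0:
            freqs.append(0.0)
            continue
        phi = (v / n) / omega
        freqs.append(min(max(phi, 0.0), 1.0))
    return freqs


if __name__ == "__main__":
    # Heterozygous SNV in a diploid region (omega = 0.5) observed in two samples.
    m = Mutation("s0", var_reads=[25, 10], total_reads=[100, 100], var_read_prob=[0.5, 0.5])
    print(subclonal_frequencies(m))  # [0.5, 0.2]
```

Clipping at 1 absorbs sampling noise that can push the raw ratio above the biologically possible maximum.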
4
Characterizing genetic intra-tumor heterogeneity across 2,658 human cancer genomes. Cell 2021; 184:2239-2254.e39. [PMID: 33831375; PMCID: PMC8054914; DOI: 10.1016/j.cell.2021.03.009] [Received: 04/06/2020; Revised: 09/21/2020; Accepted: 03/03/2021]
Abstract
Intra-tumor heterogeneity (ITH) is a mechanism of therapeutic resistance and therefore an important clinical challenge. However, the extent, origin, and drivers of ITH across cancer types are poorly understood. To address this, we extensively characterize ITH across whole-genome sequences of 2,658 cancer samples spanning 38 cancer types. Nearly all informative samples (95.1%) contain evidence of distinct subclonal expansions with frequent branching relationships between subclones. We observe positive selection of subclonal driver mutations across most cancer types and identify cancer type-specific subclonal patterns of driver gene mutations, fusions, structural variants, and copy number alterations as well as dynamic changes in mutational processes between subclonal expansions. Our results underline the importance of ITH and its drivers in tumor evolution and provide a pan-cancer resource of comprehensively annotated subclonal events from whole-genome sequencing data.
5
A practical guide to cancer subclonal reconstruction from DNA sequencing. Nat Methods 2021; 18:144-155. [PMID: 33398189; PMCID: PMC7867630; DOI: 10.1038/s41592-020-01013-2] [Received: 09/28/2019; Accepted: 11/09/2020]
Abstract
Subclonal reconstruction from bulk tumor DNA sequencing has become a pillar of cancer evolution studies, providing insight into the clonality and relative ordering of mutations and mutational processes. We provide an outline of the complex computational approaches used for subclonal reconstruction from single and multiple tumor samples. We identify the underlying assumptions and uncertainties in each step and suggest best practices for analysis and quality assessment. This guide provides a pragmatic resource for the growing user community of subclonal reconstruction methods.
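Many subclonal reconstruction workflows start by converting a mutation's variant allele frequency (VAF) into a cancer cell fraction (CCF). A minimal sketch of the commonly used point estimate, assuming a known purity ρ, total tumour copy number C_T at the locus, normal copy number 2, and mutation multiplicity m, solves VAF = ρ·CCF·m / (ρ·C_T + 2(1 − ρ)) for CCF. The function name is hypothetical; real pipelines also propagate read-count uncertainty and estimate multiplicity rather than assuming it.

```python
def cancer_cell_fraction(var_reads: int, total_reads: int, purity: float,
                         tumour_cn: int, multiplicity: int = 1,
                         normal_cn: int = 2) -> float:
    """Point estimate of the cancer cell fraction (CCF) of one SNV.

    Expected VAF = purity * CCF * multiplicity
                   / (purity * tumour_cn + (1 - purity) * normal_cn),
    solved here for CCF. No uncertainty is propagated.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    vaf = var_reads / total_reads
    # Average number of DNA copies of this locus per cell in the bulk sample.
    avg_copies = purity * tumour_cn + (1.0 - purity) * normal_cn
    return vaf * avg_copies / (purity * multiplicity)


if __name__ == "__main__":
    # 60% purity, diploid tumour locus, 30 of 100 reads carry the variant: a clonal SNV.
    print(round(cancer_cell_fraction(30, 100, purity=0.6, tumour_cn=2), 3))  # 1.0
```

The same VAF can mean very different CCFs depending on purity and copy number, which is why the guide's emphasis on checking those upstream estimates matters.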
6
Reconstructing tumor evolutionary histories and clone trees in polynomial-time with SubMARine. PLoS Comput Biol 2021; 17:e1008400. [PMID: 33465079; PMCID: PMC7845980; DOI: 10.1371/journal.pcbi.1008400] [Received: 04/27/2020; Revised: 01/29/2021; Accepted: 09/22/2020]
Abstract
Tumors contain multiple subpopulations of genetically distinct cancer cells. Reconstructing their evolutionary history can improve our understanding of how cancers develop and respond to treatment. Subclonal reconstruction methods cluster mutations into groups that co-occur within the same subpopulations, estimate the frequency of cells belonging to each subpopulation, and infer the ancestral relationships among the subpopulations by constructing a clone tree. However, often multiple clone trees are consistent with the data and current methods do not efficiently capture this uncertainty; nor can these methods scale to clone trees with a large number of subclonal populations. Here, we formalize the notion of a partially-defined clone tree (partial clone tree for short) that defines a subset of the pairwise ancestral relationships in a clone tree, thereby implicitly representing the set of all clone trees that have these defined pairwise relationships. Also, we introduce a special partial clone tree, the Maximally-Constrained Ancestral Reconstruction (MAR), which summarizes all clone trees fitting the input data equally well. Finally, we extend commonly used clone tree validity conditions to apply to partial clone trees and describe SubMARine, a polynomial-time algorithm producing the subMAR, which approximates the MAR and guarantees that its defined relationships are a subset of those present in the MAR. We also extend SubMARine to work with subclonal copy number aberrations and define equivalence constraints for this purpose. Further, we extend SubMARine to permit noise in the estimates of the subclonal frequencies while retaining its validity conditions and guarantees. In contrast to other clone tree reconstruction methods, SubMARine runs in time and space that scale polynomially in the number of subclones. 
We show through extensive noise-free simulation, a large lung cancer dataset and a prostate cancer dataset that the subMAR equals the MAR in all cases where only a single clone tree exists and that it is a perfect match to the MAR in most of the other cases. Notably, SubMARine runs in less than 70 seconds on a single thread with less than 1 GB of memory on all datasets presented in this paper, including ones with 50 nodes in a clone tree. On the real-world data, SubMARine almost perfectly recovers the previously reported trees and identifies minor errors made in the expert-driven reconstructions of those trees. The freely available open-source code implementing SubMARine can be downloaded at https://github.com/morrislab/submarine.
Cancer cells accumulate mutations over time and consist of genetically distinct subpopulations. Their evolutionary history (as represented by tumor phylogenies) can be inferred from bulk cancer genome sequencing data. Current tumor phylogeny reconstruction methods have two main issues: they are slow, and they do not efficiently represent uncertainty in the reconstruction. To address these issues, we developed SubMARine, a fast algorithm that summarizes all valid phylogenies in an intuitive format. SubMARine solved all reconstruction problems in this manuscript in less than 70 seconds, orders of magnitude faster than other methods. These reconstruction problems included those with up to 50 subclones, which are too large for other algorithms to even attempt. SubMARine achieves these results because, unlike other algorithms, it performs its reconstruction by identifying an upper bound on the solution set of trees and the amount of noise in the estimates of the subclonal frequencies. In the vast majority of cases we checked (an extensive noise-free simulation, a lung cancer dataset and a prostate cancer dataset), this upper bound is tight: when only a single solution exists, SubMARine converges to it every time.
When multiple solutions exist, our algorithm correctly recovers the uncertain relationships in 71% of cases. In addition to solving these two major challenges, we introduce useful new concepts and open research problems in the field of tumor phylogeny reconstruction. Specifically, we formalize the concept of a partial clone tree, which provides a set of constraints on the solution set of clone trees, and we provide a complete set of conditions under which a partial clone tree is valid. These conditions guarantee that all trees in the solution set satisfy the constraints implied by the partial clone tree.
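To make "pairwise ancestral relationships" and "validity conditions" concrete, the sketch below represents one fully defined clone tree as a parent array, derives every pairwise relationship (ancestor, descendant, or on different branches), and checks the so-called sum rule, a commonly used validity condition stating that in every sample a subclone's frequency must be at least the summed frequencies of its children. This is an illustrative sketch with hypothetical names, not SubMARine's algorithm, which works with partially defined trees precisely to avoid enumerating complete ones.

```python
from typing import Dict, List, Tuple

ANCESTOR, DESCENDANT, BRANCHED = "ancestor", "descendant", "branched"


def ancestors(parents: List[int], node: int) -> List[int]:
    """All ancestors of `node`, nearest first; parents[root] == -1."""
    result = []
    p = parents[node]
    while p != -1:
        result.append(p)
        p = parents[p]
    return result


def pairwise_relationships(parents: List[int]) -> Dict[Tuple[int, int], str]:
    """Relationship of node i to node j for every ordered pair of distinct nodes."""
    n = len(parents)
    anc = [set(ancestors(parents, k)) for k in range(n)]
    rel = {}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if i in anc[j]:
                rel[(i, j)] = ANCESTOR    # i lies on the path from the root to j
            elif j in anc[i]:
                rel[(i, j)] = DESCENDANT
            else:
                rel[(i, j)] = BRANCHED    # i and j sit on different branches
    return rel


def satisfies_sum_rule(parents: List[int], freqs: List[List[float]],
                       tol: float = 1e-9) -> bool:
    """True if, in every sample, each node's frequency >= the sum of its children's.

    freqs[k][s] is the frequency of subclone k in sample s.
    """
    n = len(parents)
    for s in range(len(freqs[0])):
        child_sum = [0.0] * n
        for k in range(n):
            if parents[k] != -1:
                child_sum[parents[k]] += freqs[k][s]
        if any(child_sum[k] > freqs[k][s] + tol for k in range(n)):
            return False
    return True


if __name__ == "__main__":
    # Node 0 is the root; nodes 1 and 2 are its children; node 3 descends from node 1.
    parents = [-1, 0, 0, 1]
    freqs = [[1.0, 1.0], [0.6, 0.5], [0.3, 0.4], [0.2, 0.5]]
    rel = pairwise_relationships(parents)
    print(rel[(1, 3)], rel[(2, 3)])            # ancestor branched
    print(satisfies_sum_rule(parents, freqs))  # True
```

A partial clone tree in the abstract's sense would leave some of these pairwise entries undefined; judging its validity then means asking whether complete trees consistent with the defined entries can satisfy conditions such as the sum rule.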
7
Reconstructing evolutionary trajectories of mutation signature activities in cancer using TrackSig. Nat Commun 2020; 11:731. [PMID: 32024834; PMCID: PMC7002414; DOI: 10.1038/s41467-020-14352-7] [Received: 11/22/2018; Accepted: 12/23/2019]
Abstract
The type and genomic context of cancer mutations depend on their causes. These causes have been characterized using signatures that represent mutation types that co-occur in the same tumours. However, it remains unclear how mutation processes change during cancer evolution due to the lack of reliable methods to reconstruct evolutionary trajectories of mutational signature activity. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole-genome sequencing data from 2658 cancers across 38 tumour types, we present TrackSig, a new method that reconstructs these trajectories using optimal, joint segmentation and deconvolution of mutation type and allele frequencies from a single tumour sample. In simulations, we find TrackSig has a 3-5% activity reconstruction error and a 12% false detection rate. It outperforms an aggressive baseline in situations with branching evolution, CNA gains, and neutral mutations. Applied to data from 2658 tumours and 38 cancer types, TrackSig permits pan-cancer insight into evolutionary changes in mutational processes.
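The "deconvolution" half of this idea can be sketched for a single bin of mutations: given fixed signatures (probability distributions over mutation types), estimate the signature activities that best explain the bin's mutation-type counts by expectation-maximization for a multinomial mixture. This is a simplified sketch under that assumption, not TrackSig's code; the signatures and counts below are made up, and the segmentation search over CCF-ordered bins that TrackSig performs is omitted.

```python
from typing import List


def fit_activities(counts: List[float], signatures: List[List[float]],
                   n_iter: int = 1000, tol: float = 1e-10) -> List[float]:
    """EM estimate of signature activities (mixture weights) for one bin of mutations.

    counts[m]        : number of mutations of type m in the bin
    signatures[k][m] : probability that signature k produces a mutation of type m
    Returns weights pi[k] >= 0 summing to 1 that approximately maximise the
    multinomial log-likelihood sum_m counts[m] * log(sum_k pi[k] * signatures[k][m]).
    """
    if sum(counts) <= 0:
        raise ValueError("the bin contains no mutations")
    n_sigs = len(signatures)
    pi = [1.0 / n_sigs] * n_sigs  # start from equal activities
    for _ in range(n_iter):
        expected = [0.0] * n_sigs
        for m, c in enumerate(counts):
            if c == 0:
                continue
            # E-step: split the c mutations of type m among signatures
            # in proportion to pi[k] * signatures[k][m].
            mix = [pi[k] * signatures[k][m] for k in range(n_sigs)]
            denom = sum(mix)
            if denom == 0.0:
                continue  # no signature can produce this mutation type
            for k in range(n_sigs):
                expected[k] += c * mix[k] / denom
        # M-step: new activities are the normalised expected counts.
        total = sum(expected)
        new_pi = [e / total for e in expected]
        if max(abs(a - b) for a, b in zip(new_pi, pi)) < tol:
            return new_pi
        pi = new_pi
    return pi


if __name__ == "__main__":
    # Two made-up signatures over four mutation types.
    sigs = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
    # Counts generated exactly as 100 * (0.3 * sigs[0] + 0.7 * sigs[1]).
    print([round(w, 3) for w in fit_activities([28, 10, 10, 52], sigs)])  # [0.3, 0.7]
```

Running this per bin and comparing activities across bins ordered by allele frequency gives a crude trajectory; deciding where activities genuinely change is the segmentation problem the abstract describes.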
8
Abstract
Cancer develops through a process of somatic evolution [1,2]. Sequencing data from a single biopsy represent a snapshot of this process that can reveal the timing of specific genomic aberrations and the changing influence of mutational processes [3]. Here, by whole-genome sequencing analysis of 2,658 cancers as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) [4], we reconstruct the life history and evolution of mutational processes and driver mutation sequences of 38 types of cancer. Early oncogenesis is characterized by mutations in a constrained set of driver genes, and specific copy number gains, such as trisomy 7 in glioblastoma and isochromosome 17q in medulloblastoma. The mutational spectrum changes significantly throughout tumour evolution in 40% of samples. A nearly fourfold diversification of driver genes and increased genomic instability are features of later stages. Copy number alterations often occur in mitotic crises, and lead to simultaneous gains of chromosomal segments. Timing analyses suggest that driver mutations often precede diagnosis by many years, if not decades. Together, these results determine the evolutionary trajectories of cancer, and highlight opportunities for early cancer detection.
9
Abstract 218: The evolutionary history of 2,658 cancers. Cancer Res 2018. [DOI: 10.1158/1538-7445.am2018-218]
Abstract
Cancer develops through a continuous process of somatic evolution. Whole genome sequencing provides a snapshot of the tumor genome at the point of sampling; however, the data can contain information that permits reconstruction of a tumor's evolutionary past.
Here, we apply such life history analyses on an unprecedented scale to a set of 2,658 tumors spanning 39 cancer types. We estimated the timing of large chromosomal gains during tumor evolution by comparing the rates of doubled to non-doubled point mutations within gained regions. Although we find that such events typically occur in the second half of clonal evolution, we also observe distinctive early chromosomal gains in some cancer types, such as gains of chromosomes 7, 19 and 20 in glioblastoma, and isochromosome 17q in medulloblastoma. By integrating these results with the qualitative timing of individual driver mutations, we obtained an overall ranking, from early to late, of frequent somatic events per cancer type, which both identified novel patterns of tumor evolution and added detail to known models, such as the APC-KRAS-TP53 progression in colorectal cancer proposed by Vogelstein and Fearon.
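The doubled versus non-doubled comparison can be illustrated with a simplified estimator. It assumes a single clonal gain in which one or more homologs each go from one to two copies, a constant mutation rate per chromosome copy, no later loss of mutated copies, and mutation multiplicities already called. Under these assumptions the fraction t of clonal mutational time before the gain is t = C·n2 / (k·(2·n2 + n1)), where C is the final total copy number, k the number of duplicated homologs, n2 the number of doubled mutations and n1 the number of single-copy mutations; for a trisomy (C = 3, k = 1) this reduces to t = 3·n2 / (2·n2 + n1). The function name is hypothetical, and this is not necessarily the exact model used in the study.

```python
def gain_timing(n_single: int, n_double: int, total_cn: int,
                n_duplicated: int = 1) -> float:
    """Fraction t of clonal mutational time that elapsed before a copy number gain.

    Model (clonal time scaled to 1, mutation rate r per chromosome copy):
      before the gain (duration t): total_cn - n_duplicated homologs are present.
        Mutations on the n_duplicated homologs later become doubled:
          n_double = n_duplicated * r * t
        The remaining homologs add single-copy mutations:
          (total_cn - 2 * n_duplicated) * r * t
      after the gain (duration 1 - t): total_cn copies, all new mutations single-copy:
          total_cn * r * (1 - t)
    Summing gives 2 * n_double + n_single = total_cn * r, so r and then t follow.
    """
    denom = 2 * n_double + n_single
    if denom == 0:
        raise ValueError("no clonal mutations in the region")
    rate = denom / total_cn
    t = n_double / (n_duplicated * rate)
    return min(t, 1.0)  # counting noise can push the raw estimate above 1


if __name__ == "__main__":
    # Trisomy (2+1): 20 doubled and 140 single-copy clonal mutations.
    print(round(gain_timing(140, 20, total_cn=3), 3))  # 0.333
    # Copy-neutral LOH (2+0): 30 doubled and 40 single-copy clonal mutations.
    print(round(gain_timing(40, 30, total_cn=2), 3))   # 0.6
```

An early gain leaves many doubled mutations relative to single-copy ones, so a high t corresponds to a late gain and a low t to an early one such as those described above.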
To estimate how mutational processes acting on the tumor genome change over time, we classified mutations in each sample according to three broad time periods (early clonal, late clonal, and subclonal) and quantified the activity of mutational signatures in each period. Most mutational processes appear to remain remarkably constant; however, certain signatures show clear and consistent changes during clonal evolution. In particular, mutational signatures associated with exposure to carcinogens, such as smoking and UV light, tend to decrease over time. In contrast, signatures associated with defective endogenous processes, such as APOBEC mutagenesis and defective double-strand break repair, show an increase between early and late phases of tumor evolution.
Making use of clock-like mutational signatures, we converted mutational time estimates for large events, such as whole genome duplication (WGD) and the emergence of the most recent common ancestor (MRCA), into real-time estimates, which allowed us to combine our analyses into overall timelines of cancer evolution per tumor type. For example, the typical timeline of ovarian adenocarcinoma development shows that early tumor evolution is characterized by TP53 mutations and widespread genome instability, with WGD events taking place on average 8 years prior to diagnosis. In later stages of evolution, signatures of defective repair processes increase, and the MRCA emerges on average 1 year before diagnosis.
Taken together, these data reveal the common and divergent evolutionary trajectories available to a cancer, which might be crucial in understanding specific tumor biology, and in providing new opportunities for early detection and cancer prevention.
Citation Format: Clemency Jolly, Moritz Gerstung, Ignaty Leshchiner, Stefan C. Dentro, Santiago Gonzalez, Thomas J. Mitchell, Yulia Rubanova, Pavana Anur, Daniel Rosebrock, Kaixian Yu, Maxime Tarabichi, Amit Deshwar, Jeff Wintersinger, Kortine Kleinheinz, Ignacio Vásquez-García, Kerstin Haase, Subhajit Sengupta, Geoff Macintyre, Salem Malikic, Nilgun Donmez, Dimitri G. Livitz, Mark Cmero, Jonas Demeulemeester, Steve Schumacher, Yu Fan, Xiaotong Yao, Juhee Lee, Matthias Schlesner, Paul C. Boutros, David D. Bowtell, Hongtu Zhu, Gad Getz, Marcin Imielinski, Rameen Beroukhim, S Cenk Sahinalp, Yuan Ji, Martin Peifer, Florian Markowetz, Ville Mustonen, Ke Juan, Wenyi Wang, Quaid D. Morris, Paul T. Spellman, David C. Wedge, Peter Van Loo, PCAWG Evolution and Heterogeneity Working Group. The evolutionary history of 2,658 cancers [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 218.
10
33 PAN-cancer whole genome sequencing reveals patterns of subclonal mutations, signature changes and selection. ESMO Open 2018. [DOI: 10.1136/esmoopen-2018-eacr25.33]
11
The Evolutionary Landscape of Localized Prostate Cancers Drives Clinical Aggression. Cell 2018; 173:1003-1013.e15. [DOI: 10.1016/j.cell.2018.03.029] [Received: 06/20/2017; Revised: 01/01/2018; Accepted: 03/13/2018]
12
Abstract B2-59: PhyloSpan: Using multi-mutation reads to resolve subclonal architectures from heterogeneous tumor samples. Cancer Res 2015. [DOI: 10.1158/1538-7445.compsysbio-b2-59]
Abstract
We have developed a new method that uses high-throughput reads that span multiple somatic point mutations to reconstruct multiple, genetically diverse subclonal populations from one or more heterogeneous tumor samples.
Tumors often contain multiple, genetically diverse subclonal populations, as predicted by the clonal theory of cancer. These subclonal populations develop through successive waves of expansion and selection and have differing abilities to metastasize and resist treatment. Identifying these sub-populations and their evolutionary relationships can help identify driver mutations associated with cancer development and progression.
Subclonal reconstruction algorithms attempt to infer the prevalence and genotype of multiple, genetically related subclonal populations using the variant allele frequency (VAF) of somatic variants. To date, these algorithms exclusively use data on individual somatic mutations. This restriction greatly reduces their ability to fully resolve phylogenetic ambiguities. In some cases, it is possible to determine the mutation status of more than one mutation in a single cell, for example, when single reads cover multiple single nucleotide variants (SNVs). This type of information can add considerable power to the phylogenetic reconstruction of the tumor subclonal population. We have developed the PhyloSpan algorithm, which attempts to infer the states of multiple SNVs in single cells and then exploits that information in subclonal reconstruction.
Our algorithm starts by phasing somatic SNVs, looking for reads or read pairs that cover both a somatic mutation and a germline heterozygous single nucleotide polymorphism (SNP). These germline SNPs are often available through profiling of normal tissue. PhyloSpan then identifies SNVs that are on the same chromosome and close enough to be covered by a single read or read pair. These pairs of mutations provide more phylogenetic certainty than can be obtained by looking at mutations independently. For example, if the two SNVs lie on the same evolutionary branch, we expect to see some reads containing both mutations. If, however, the SNVs are on separate branches, no read should show both SNVs. PhyloSpan integrates this phylogenetic information with the VAF of each somatic SNV to perform subclonal reconstruction. Incorporating these various types of information, especially given the substantial uncertainty in phasing and NGS read content, requires a rigorous statistical approach, so we have developed a Bayesian non-parametric tree-based clustering algorithm based on our existing PhyloWGS method. This algorithm not only infers the number of subclonal populations and their genotypes but also provides a measure of uncertainty about this inference, enabling users to determine which parts of the subclonal reconstruction are certain and which remain ambiguous.
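The read-level logic described above can be sketched as a deterministic, noise-free rule for two nearby SNVs A and B. It assumes both SNVs have already been phased onto the same chromosome copy, the infinite-sites assumption holds, and reads are error-free; each read (or read pair) covering both positions is summarized as a pair of booleans. PhyloSpan itself is Bayesian and combines this evidence with VAFs, so this is only an illustration, and `classify_pair` and its labels are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_pair(reads: Iterable[Tuple[bool, bool]], min_informative: int = 5) -> str:
    """Noise-free rule for two somatic SNVs A and B phased to the same haplotype.

    Each element is (has_A, has_B) for one read or read pair covering both positions.
    Under the infinite-sites assumption:
      - reads carrying both variants mean A and B share a lineage;
      - if, in addition, only A-only reads occur, A is ancestral to B (and vice versa);
      - no read with both, but each variant seen alone, means separate branches.
    """
    c = Counter(reads)
    both, a_only, b_only = c[(True, True)], c[(True, False)], c[(False, True)]
    # Reads with neither variant say nothing about the relationship.
    if both + a_only + b_only < min_informative:
        return "insufficient data"
    if both > 0:
        if a_only > 0 and b_only > 0:
            return "inconsistent"  # violates infinite sites; likely read errors
        if a_only > 0:
            return "A ancestral to B"
        if b_only > 0:
            return "B ancestral to A"
        return "same subclone or indistinguishable"
    if a_only > 0 and b_only > 0:
        return "separate branches"
    return "undetermined"


if __name__ == "__main__":
    reads = [(True, True)] * 4 + [(True, False)] * 6 + [(False, False)] * 10
    print(classify_pair(reads))  # A ancestral to B
```

Even a few such informative read pairs can rule out tree topologies that VAFs alone leave open, which is the abstract's motivation for using them.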
Although few SNVs lie within a short-read length of another SNV, a handful of such pairs is all that is needed to eliminate a substantial amount of ambiguity in subclonal reconstruction. Furthermore, long-read technologies (>10 kb reads), such as PacBio, can be used to supplement short-read sequencing. Our approach generalizes to permit the integration of single-cell sequencing with bulk tumor sequencing. We can also use our framework to identify a small number of SNVs for which low-throughput assays would be most useful in resolving subclonal reconstruction ambiguity.
We will present results applying our algorithm to whole genome sequencing data showing the added value of considering multiple SNVs compared to independent SNVs.
Citation Format: Amit G. Deshwar, Levi Boyles, Jeff Wintersinger, Paul C. Boutros, Yee Whye Teh, Quaid Morris. PhyloSpan: Using multi-mutation reads to resolve subclonal architectures from heterogeneous tumor samples. [abstract]. In: Proceedings of the AACR Special Conference on Computational and Systems Biology of Cancer; Feb 8-11 2015; San Francisco, CA. Philadelphia (PA): AACR; Cancer Res 2015;75(22 Suppl 2):Abstract nr B2-59.
13
Abstract 4865: PhyloSpan: using multi-mutation reads to resolve subclonal architectures from heterogeneous tumor samples. Cancer Res 2015. [DOI: 10.1158/1538-7445.am2015-4865]
Abstract
We have developed a new method that uses high-throughput reads that span multiple somatic point mutations to reconstruct multiple, genetically diverse subclonal populations from one or more heterogeneous tumor samples. Subclonal reconstruction algorithms attempt to infer the prevalence and genotype of multiple, genetically related subclonal populations using the variant allele frequency (VAF) of somatic variants. To date, these algorithms exclusively use data on individual somatic mutations. This restriction greatly reduces their ability to fully resolve phylogenetic ambiguities. In some cases, it is possible to determine the mutation status of more than one mutation in a single cell, for example, when single reads cover multiple single nucleotide variants (SNVs). This type of information can add considerable power to the phylogenetic reconstruction of the tumor subclonal population. We have developed the PhyloSpan algorithm, which attempts to infer the states of multiple SNVs in single cells and then exploits that information in subclonal reconstruction. Our algorithm starts by phasing somatic SNVs, looking for reads or read pairs that cover both a somatic mutation and a germline heterozygous single nucleotide polymorphism (SNP). These germline SNPs are often available through profiling of normal tissue. PhyloSpan then identifies SNVs that are on the same chromosome and close enough to be covered by a single read or read pair. These pairs of mutations provide more phylogenetic certainty than can be obtained by looking at mutations independently. For example, if the two SNVs lie on the same evolutionary branch, we expect to see some reads containing both mutations. If, however, the SNVs are on separate branches, no read should show both SNVs. PhyloSpan integrates this phylogenetic information with the VAF of each somatic SNV to perform subclonal reconstruction.
Incorporating these various types of information requires a rigorous statistical approach, so we have developed a Bayesian non-parametric tree-based clustering algorithm. This algorithm not only infers the number of subclonal populations and their genotypes but also provides a measure of uncertainty about this inference, enabling users to determine which parts of the subclonal reconstruction are certain and which remain ambiguous. Although few SNVs lie within a short-read length of another SNV, a handful of such pairs is all that is needed to eliminate a substantial amount of ambiguity in subclonal reconstruction. Furthermore, long-read technologies, such as PacBio, can be used to supplement short reads. Our approach generalizes to permit the integration of single-cell sequencing with bulk tumor sequencing. We will present results applying our algorithm to whole genome sequencing data showing the added value of considering multiple SNVs compared to independent SNVs.
Citation Format: Amit G. Deshwar, Levi Boyles, Jeff Wintersinger, Paul C. Boutros, Yee Whye Teh, Quaid Morris. PhyloSpan: using multi-mutation reads to resolve subclonal architectures from heterogeneous tumor samples. [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr 4865. doi:10.1158/1538-7445.AM2015-4865