1
Seillier L, Peifer M. Reconstructing Phylogenetic Relationship in Bladder Cancer: A Methodological Overview. Methods Mol Biol 2023; 2684:113-132. [PMID: 37410230 DOI: 10.1007/978-1-0716-3291-8_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/07/2023]
Abstract
Bladder cancer (BC) is a highly heterogeneous disease at both the histological and molecular level, often occurring as synchronous or metachronous multifocal disease with a high risk of recurrence and the potential to metastasize. Multiple sequencing studies of both non-muscle-invasive bladder cancer (NMIBC) and muscle-invasive bladder cancer (MIBC) have given insights into the extent of inter- and intrapatient heterogeneity, but many questions about clonal evolution in BC remain unanswered. In this review article, we provide an overview of the technical and theoretical concepts linked to reconstructing evolutionary trajectories in BC and propose a set of tools and established software for phylogenetic analysis.
Affiliation(s)
- Martin Peifer
- Department of Translational Genomics, University of Cologne, Cologne, Germany
2
Vavoulis DV, Cutts A, Taylor JC, Schuh A. A statistical approach for tracking clonal dynamics in cancer using longitudinal next-generation sequencing data. Bioinformatics 2021; 37:147-154. [PMID: 32722772 PMCID: PMC8055230 DOI: 10.1093/bioinformatics/btaa672] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/29/2020] [Revised: 05/13/2020] [Accepted: 07/20/2020] [Indexed: 01/22/2023] Open
Abstract
MOTIVATION Tumours are composed of distinct cancer cell populations (clones), which continuously adapt to their local micro-environment. Standard methods for clonal deconvolution seek to identify groups of mutations and estimate the prevalence of each group in the tumour, while considering its purity and copy number profile. These methods have been applied on cross-sectional data and on longitudinal data after discarding information on the timing of sample collection. Two key questions are how can we incorporate such information in our analyses and is there any benefit in doing so? RESULTS We developed a clonal deconvolution method, which incorporates explicitly the temporal spacing of longitudinally sampled tumours. By merging a Dirichlet Process Mixture Model with Gaussian Process priors and using as input a sequence of several sparsely collected samples, our method can reconstruct the temporal profile of the abundance of any mutation cluster supported by the data as a continuous function of time. We benchmarked our method on whole genome, whole exome and targeted sequencing data from patients with chronic lymphocytic leukaemia, on liquid biopsy data from a patient with melanoma and on synthetic data and we found that incorporating information on the timing of tissue collection improves model performance, as long as data of sufficient volume and complexity are available for estimating free model parameters. Thus, our approach is particularly useful when collecting a relatively long sequence of tumour samples is feasible, as in liquid cancers (e.g. leukaemia) and liquid biopsies. AVAILABILITY AND IMPLEMENTATION The statistical methodology presented in this paper is freely available at github.com/dvav/clonosGP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
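The purity and copy number correction mentioned in the motivation can be illustrated with the commonly used conversion from variant allele frequency (VAF) to cancer cell fraction. This is a minimal sketch under stated assumptions, not the clonosGP model or its API: the function name, its parameters, and the defaults (diploid normal cells, one mutated copy per cancer cell) are hypothetical choices made for illustration.

```python
def cancer_cell_fraction(vaf, purity, tumour_cn=2, normal_cn=2, multiplicity=1):
    """Estimate the fraction of cancer cells carrying a mutation.

    Assumes VAF = purity * multiplicity * CCF /
                  (purity * tumour_cn + (1 - purity) * normal_cn),
    i.e. reads come from a mixture of tumour and normal cells.
    All argument names are illustrative, not taken from any package.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell in the sample.
    mean_copies = purity * tumour_cn + (1 - purity) * normal_cn
    return vaf * mean_copies / (purity * multiplicity)


# A heterozygous clonal mutation in a diploid region of a 50% pure sample
# is expected at VAF 0.25, which maps back to a cell fraction of 1.0.
print(cancer_cell_fraction(0.25, 0.5))
```

Values noticeably above 1 usually indicate that the assumed multiplicity or copy number is wrong, which is one reason deconvolution methods model these quantities jointly rather than correcting each mutation in isolation.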
Affiliation(s)
- Dimitrios V Vavoulis
- Department of Oncology, University of Oxford, Oxford, OX3 7DQ, UK
- Nuffield Department of Medicine, Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Trust, Oxford, OX3 9DU, UK
- Department of Oncology, Molecular Diagnostic Centre, University of Oxford, Oxford OX3 9DU, UK
- Anthony Cutts
- Department of Oncology, University of Oxford, Oxford, OX3 7DQ, UK
- Department of Oncology, Molecular Diagnostic Centre, University of Oxford, Oxford OX3 9DU, UK
- Jenny C Taylor
- Nuffield Department of Medicine, Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Trust, Oxford, OX3 9DU, UK
- Anna Schuh
- Department of Oncology, University of Oxford, Oxford, OX3 7DQ, UK
- NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Trust, Oxford, OX3 9DU, UK
- Department of Oncology, Molecular Diagnostic Centre, University of Oxford, Oxford OX3 9DU, UK
- Department of Haematology, Oxford University Hospitals NHS Trust, Oxford OX3 9DU, UK
3
Dentro SC, Leshchiner I, Haase K, Tarabichi M, Wintersinger J, Deshwar AG, Yu K, Rubanova Y, Macintyre G, Demeulemeester J, Vázquez-García I, Kleinheinz K, Livitz DG, Malikic S, Donmez N, Sengupta S, Anur P, Jolly C, Cmero M, Rosebrock D, Schumacher SE, Fan Y, Fittall M, Drews RM, Yao X, Watkins TBK, Lee J, Schlesner M, Zhu H, Adams DJ, McGranahan N, Swanton C, Getz G, Boutros PC, Imielinski M, Beroukhim R, Sahinalp SC, Ji Y, Peifer M, Martincorena I, Markowetz F, Mustonen V, Yuan K, Gerstung M, Spellman PT, Wang W, Morris QD, Wedge DC, Van Loo P. Characterizing genetic intra-tumor heterogeneity across 2,658 human cancer genomes. Cell 2021; 184:2239-2254.e39. [PMID: 33831375 PMCID: PMC8054914 DOI: 10.1016/j.cell.2021.03.009] [Citation(s) in RCA: 199] [Impact Index Per Article: 66.3] [Received: 04/06/2020] [Revised: 09/21/2020] [Accepted: 03/03/2021] [Indexed: 02/07/2023]
Abstract
Intra-tumor heterogeneity (ITH) is a mechanism of therapeutic resistance and therefore an important clinical challenge. However, the extent, origin, and drivers of ITH across cancer types are poorly understood. To address this, we extensively characterize ITH across whole-genome sequences of 2,658 cancer samples spanning 38 cancer types. Nearly all informative samples (95.1%) contain evidence of distinct subclonal expansions with frequent branching relationships between subclones. We observe positive selection of subclonal driver mutations across most cancer types and identify cancer type-specific subclonal patterns of driver gene mutations, fusions, structural variants, and copy number alterations as well as dynamic changes in mutational processes between subclonal expansions. Our results underline the importance of ITH and its drivers in tumor evolution and provide a pan-cancer resource of comprehensively annotated subclonal events from whole-genome sequencing data.
Affiliation(s)
- Stefan C Dentro
- Cancer Genomics Laboratory, The Francis Crick Institute, London NW1 1AT, UK; Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK; Big Data Institute, University of Oxford, Oxford OX3 7LF, UK
- Kerstin Haase
- Cancer Genomics Laboratory, The Francis Crick Institute, London NW1 1AT, UK
- Maxime Tarabichi
- Cancer Genomics Laboratory, The Francis Crick Institute, London NW1 1AT, UK; Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK
- Jeff Wintersinger
- University of Toronto, Toronto, ON M5S 3E1, Canada; Vector Institute, Toronto, ON M5G 1L7, Canada
- Amit G Deshwar
- University of Toronto, Toronto, ON M5S 3E1, Canada; Vector Institute, Toronto, ON M5G 1L7, Canada
- Kaixian Yu
- The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Yulia Rubanova
- University of Toronto, Toronto, ON M5S 3E1, Canada; Vector Institute, Toronto, ON M5G 1L7, Canada
- Geoff Macintyre
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK
- Jonas Demeulemeester
- Cancer Genomics Laboratory, The Francis Crick Institute, London NW1 1AT, UK; Department of Human Genetics, University of Leuven, 3000 Leuven, Belgium
- Ignacio Vázquez-García
- Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK; University of Cambridge, Cambridge CB2 0QQ, UK; Computational Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Irving Institute for Cancer Dynamics, Columbia University, New York, NY 10027, USA
- Kortine Kleinheinz
- German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany; Heidelberg University, 69120 Heidelberg, Germany
- Salem Malikic
- Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD 20892, USA
- Nilgun Donmez
- Simon Fraser University, Burnaby, BC V5A 1S6, Canada; Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
- Pavana Anur
- Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR 97231, USA
- Clemency Jolly
- Cancer Genomics Laboratory, The Francis Crick Institute, London NW1 1AT, UK
- Marek Cmero
- University of Melbourne, Melbourne, VIC 3010, Australia; Walter + Eliza Hall Institute, Melbourne, VIC 3000, Australia
- Yu Fan
- The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Matthew Fittall
- Cancer Genomics Laboratory, The Francis Crick Institute, London NW1 1AT, UK
- Ruben M Drews
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK
- Xiaotong Yao
- Weill Cornell Medicine, New York, NY 10065, USA; New York Genome Center, New York, NY 10013, USA
- Thomas B K Watkins
- Cancer Evolution and Genome Instability Laboratory, The Francis Crick Institute, London NW1 1AT, UK
- Juhee Lee
- University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Hongtu Zhu
- The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- David J Adams
- Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK
- Nicholas McGranahan
- Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London WC1E 6BT, UK; Cancer Genome Evolution Research Group, University College London Cancer Institute, London WC1E 6DD, UK
- Charles Swanton
- Cancer Evolution and Genome Instability Laboratory, The Francis Crick Institute, London NW1 1AT, UK; Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London WC1E 6BT, UK; Department of Medical Oncology, University College London Hospitals, London NW1 2BU, UK
- Gad Getz
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Massachusetts General Hospital Center for Cancer Research, Charlestown, MA 02129, USA; Massachusetts General Hospital, Department of Pathology, Boston, MA 02114, USA; Harvard Medical School, Boston, MA 02215, USA
- Paul C Boutros
- University of Toronto, Toronto, ON M5S 3E1, Canada; Ontario Institute for Cancer Research, Toronto, ON M5G 0A3, Canada; University of California, Los Angeles, Los Angeles, CA 90095, USA
- Marcin Imielinski
- Weill Cornell Medicine, New York, NY 10065, USA; New York Genome Center, New York, NY 10013, USA
- Rameen Beroukhim
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Dana-Farber Cancer Institute, Boston, MA 02215, USA
- S Cenk Sahinalp
- Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD 20892, USA
- Yuan Ji
- NorthShore University HealthSystem, Evanston, IL 60201, USA; The University of Chicago, Chicago, IL 60637, USA
- Martin Peifer
- Department of Translational Genomics, Center for Integrated Oncology Cologne-Bonn, Medical Faculty, University of Cologne, 50931 Cologne, Germany
- Florian Markowetz
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK
- Ville Mustonen
- Organismal and Evolutionary Biology Research Programme, Department of Computer Science, Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland
- Ke Yuan
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge CB2 0RE, UK; School of Computing Science, University of Glasgow, Glasgow G12 8RZ, UK
- Moritz Gerstung
- Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge CB10 1SD, UK; European Molecular Biology Laboratory, Genome Biology Unit, 69117 Heidelberg, Germany
- Paul T Spellman
- Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR 97231, USA
- Wenyi Wang
- The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Quaid D Morris
- University of Toronto, Toronto, ON M5S 3E1, Canada; Vector Institute, Toronto, ON M5G 1L7, Canada; Ontario Institute for Cancer Research, Toronto, ON M5G 0A3, Canada; Computational and Systems Biology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- David C Wedge
- Big Data Institute, University of Oxford, Oxford OX3 7LF, UK; Oxford NIHR Biomedical Research Centre, Oxford OX4 2PG, UK; Manchester Cancer Research Centre, University of Manchester, Manchester M20 4GJ, UK
- Peter Van Loo
- Cancer Genomics Laboratory, The Francis Crick Institute, London NW1 1AT, UK
4
Sadeqi Azer E, Rashidi Mehrabadi F, Malikić S, Li XC, Bartok O, Litchfield K, Levy R, Samuels Y, Schäffer AA, Gertz EM, Day CP, Pérez-Guijarro E, Marie K, Lee MP, Merlino G, Ergun F, Sahinalp SC. PhISCS-BnB: a fast branch and bound algorithm for the perfect tumor phylogeny reconstruction problem. Bioinformatics 2021; 36:i169-i176. [PMID: 32657358 DOI: 10.1093/bioinformatics/btaa464] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/27/2022] Open
Abstract
MOTIVATION Recent advances in single-cell sequencing (SCS) offer an unprecedented insight into tumor emergence and evolution. Principled approaches to tumor phylogeny reconstruction via SCS data are typically based on general computational methods for solving an integer linear program, or a constraint satisfaction program, which, although guaranteeing convergence to the most likely solution, are very slow. Others based on Monte Carlo Markov Chain or alternative heuristics not only offer no such guarantee, but also are not faster in practice. As a result, novel methods that can scale up to handle the size and noise characteristics of emerging SCS data are highly desirable to fully utilize this technology. RESULTS We introduce PhISCS-BnB (phylogeny inference using SCS via branch and bound), a branch and bound algorithm to compute the most likely perfect phylogeny on an input genotype matrix extracted from an SCS dataset. PhISCS-BnB not only offers an optimality guarantee, but is also 10-100 times faster than the best available methods on simulated tumor SCS data. We also applied PhISCS-BnB on a recently published large melanoma dataset derived from the sublineages of a cell line involving 20 clones with 2367 mutations, which returned the optimal tumor phylogeny in <4 h. The resulting phylogeny agrees with and extends the published results by providing a more detailed picture on the clonal evolution of the tumor. AVAILABILITY AND IMPLEMENTATION https://github.com/algo-cancer/PhISCS-BnB. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Erfan Sadeqi Azer
- Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- Farid Rashidi Mehrabadi
- Department of Computer Science, Indiana University, Bloomington, IN 47408, USA; Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Salem Malikić
- Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- Xuan Cindy Li
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA; Program in Computational Biology, Bioinformatics and Genomics, University of Maryland, College Park, MD 20742, USA
- Osnat Bartok
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Kevin Litchfield
- Cancer Evolution and Genome Instability Laboratory, Francis Crick Institute, London NW1 1AT, UK; Cancer Research UK Lung Cancer Centre of Excellence London, University College London Cancer Institute, London WC1E 6DD, UK
- Ronen Levy
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Yardena Samuels
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Alejandro A Schäffer
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- E Michael Gertz
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Chi-Ping Day
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Eva Pérez-Guijarro
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Kerrie Marie
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Maxwell P Lee
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Glenn Merlino
- Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Funda Ergun
- Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- S Cenk Sahinalp
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
5
Baek M, Chang JT, Echeverria GV. Methodological Advancements for Investigating Intra-tumoral Heterogeneity in Breast Cancer at the Bench and Bedside. J Mammary Gland Biol Neoplasia 2020; 25:289-304. [PMID: 33300087 PMCID: PMC7960623 DOI: 10.1007/s10911-020-09470-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/17/2020] [Accepted: 11/12/2020] [Indexed: 12/20/2022] Open
Abstract
There is a major need to overcome therapeutic resistance and metastasis that eventually arises in many breast cancer patients. Therapy resistant and metastatic tumors are increasingly recognized to possess intra-tumoral heterogeneity (ITH), a diversity of cells within an individual tumor. First hypothesized in the 1970s, the possibility that this complex ITH may endow tumors with adaptability and evolvability to metastasize and evade therapies is now supported by multiple lines of evidence. Our understanding of ITH has been driven by recent methodological advances including next-generation sequencing, computational modeling, lineage tracing, single-cell technologies, and multiplexed in situ approaches. These have been applied across a range of specimens, including patient tumor biopsies, liquid biopsies, cultured cell lines, and mouse models. In this review, we discuss these approaches and how they have deepened our understanding of the mechanistic origins of ITH amongst tumor cells, including stem cell-like differentiation hierarchies and Darwinian evolution, and the functional role for ITH in breast cancer progression. While ITH presents a challenge for combating tumor evolution, in-depth analyses of ITH in clinical biopsies and laboratory models hold promise to elucidate therapeutic strategies that should ultimately improve outcomes for breast cancer patients.
Affiliation(s)
- Mokryun Baek
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Medicine, Baylor College of Medicine, Houston, TX, 77030, USA
- Jeffrey T Chang
- Department of Pharmacology and Integrative Biology, University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Gloria V Echeverria
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Medicine, Baylor College of Medicine, Houston, TX, 77030, USA
- Dan L. Duncan Cancer Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
6
Sadeqi Azer E, Haghir Ebrahimabadi M, Malikić S, Khardon R, Sahinalp SC. Tumor Phylogeny Topology Inference via Deep Learning. iScience 2020; 23:101655. [PMID: 33117968 PMCID: PMC7582044 DOI: 10.1016/j.isci.2020.101655] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/20/2020] [Revised: 08/10/2020] [Accepted: 10/02/2020] [Indexed: 01/24/2023] Open
Abstract
Principled computational approaches for tumor phylogeny reconstruction via single-cell sequencing typically aim to build the most likely perfect phylogeny tree from the noisy genotype matrix - which represents genotype calls of single cells. This problem is NP-hard, and as a result, existing approaches aim to solve relatively small instances of it through combinatorial optimization techniques or Bayesian inference. As expected, even when the goal is to infer basic topological features of the tumor phylogeny, rather than reconstructing the topology entirely, these approaches could be prohibitively slow. In this paper, we introduce fast deep learning solutions to the problems of inferring whether the most likely tree has a linear (chain) or branching topology and whether a perfect phylogeny is feasible from a given genotype matrix. We also present a reinforcement learning approach for reconstructing the most likely tumor phylogeny. This preliminary work demonstrates that data-driven approaches can reconstruct key features of tumor evolution.
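The question of whether a perfect phylogeny is feasible for a given genotype matrix has a classical exact answer when the matrix is binary, noise-free, and rooted at the all-zero (unmutated) genotype: no pair of mutations may show all three patterns (1,0), (0,1) and (1,1) across cells. The sketch below implements that textbook pairwise check for illustration only; it is not the deep learning approach of the paper, and the function name and the row-per-cell input layout are assumptions.

```python
from itertools import combinations


def admits_perfect_phylogeny(genotypes):
    """Return True if a binary cell-by-mutation matrix is conflict-free.

    genotypes: list of rows, one per cell, with entries 0 (absent) or
    1 (present). Assumes no missing entries and an all-zero root.
    """
    if not genotypes:
        return True
    n_mutations = len(genotypes[0])
    conflict = {(1, 0), (0, 1), (1, 1)}
    for i, j in combinations(range(n_mutations), 2):
        # Collect the joint patterns that columns i and j show across cells.
        seen = {(row[i], row[j]) for row in genotypes}
        if conflict <= seen:
            return False  # the two mutations cannot sit on one tree
    return True


# Mutation 1 is nested inside mutation 0, so a tree exists.
print(admits_perfect_phylogeny([[1, 0], [1, 1], [0, 0]]))
# Each mutation occurs alone and both occur together: a conflict.
print(admits_perfect_phylogeny([[1, 0], [0, 1], [1, 1]]))
```

The check is quadratic in the number of mutations. The hard part addressed by the methods in this listing is deciding which noisy entries to flip so that the corrected matrix passes it.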
Affiliation(s)
- Erfan Sadeqi Azer
- Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- Mohammad Haghir Ebrahimabadi
- Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Salem Malikić
- Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Roni Khardon
- Department of Computer Science, Indiana University, Bloomington, IN 47408, USA
- S. Cenk Sahinalp
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
7
Abstract
BACKGROUND Bacterial cells during many replication cycles accumulate spontaneous mutations, which result in the birth of novel clones. As a result of this clonal expansion, an evolving bacterial population has different clonal composition over time, as revealed in the long-term evolution experiments (LTEEs). Accurately inferring the haplotypes of novel clones as well as the clonal frequencies and the clonal evolutionary history in a bacterial population is useful for the characterization of the evolutionary pressure on multiple correlated mutations instead of that on individual mutations. RESULTS In this paper, we study the computational problem of reconstructing the haplotypes of bacterial clones from the variant allele frequencies observed from an evolving bacterial population at multiple time points. We formalize the problem using a maximum likelihood function, which is defined under the assumption that mutations occur spontaneously, and thus the likelihood of a mutation occurring in a specific clone is proportional to the frequency of the clone in the population when the mutation occurs. We develop a series of heuristic algorithms to address the maximum likelihood inference, and show through simulation experiments that the algorithms are fast and achieve near optimal accuracy that is practically plausible under the maximum likelihood framework. We also validate our method using experimental data obtained from a recent study on long-term evolution of Escherichia coli. CONCLUSION We developed efficient algorithms to reconstruct the clonal evolution history from time course genomic sequencing data. Our algorithm can also incorporate clonal sequencing data to improve the reconstruction results when they are available. Based on the evaluation on both simulated and experimental sequencing data, our algorithms can achieve satisfactory results on the genome sequencing data from long-term evolution experiments. 
AVAILABILITY The program (ClonalTREE) is available as open-source software on GitHub at https://github.com/COL-IU/ClonalTREE.
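The likelihood assumption stated in the abstract, that a new mutation arises in a clone with probability proportional to that clone's frequency at the time, can be sketched as a scoring function over candidate ancestries. This is an illustrative reading of that one sentence, not the ClonalTREE implementation: the function name, the parent-index encoding, and the per-clone frequency dictionaries are all hypothetical.

```python
import math


def ancestry_log_likelihood(parents, freqs_at_birth):
    """Score a candidate clonal tree under the proportionality assumption.

    parents[k]        : index of the parent of clone k, or None for the founder.
    freqs_at_birth[k] : dict mapping existing clone index -> population
                        frequency just before clone k appeared.
    Both structures are illustrative assumptions, not a published format.
    """
    total = 0.0
    for k, parent in enumerate(parents):
        if parent is None:
            continue  # the founder clone has no parent to score
        freqs = freqs_at_birth[k]
        norm = sum(freqs.values())
        share = freqs.get(parent, 0.0) / norm if norm > 0 else 0.0
        if share == 0.0:
            return -math.inf  # the parent was absent when the mutation arose
        total += math.log(share)
    return total


# Clone 2 appears when clone 0 is at 75% and clone 1 at 25%.
freqs = [{}, {0: 1.0}, {0: 0.75, 1: 0.25}]
print(ancestry_log_likelihood([None, 0, 0], freqs))  # clone 0 as parent
print(ancestry_log_likelihood([None, 0, 1], freqs))  # clone 1 as parent
```

Under this scoring the tree that attaches clone 2 to the more abundant clone 0 receives the higher log-likelihood, which is the preference the abstract describes.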
Affiliation(s)
- Wazim Mohammed Ismail
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA
- Haixu Tang
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA
8
Ismail WM, Nzabarushimana E, Tang H. Algorithmic approaches to clonal reconstruction in heterogeneous cell populations. Quantitative Biology 2019; 7:255-265. [PMID: 32431959 PMCID: PMC7236794 DOI: 10.1007/s40484-019-0188-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/02/2019] [Revised: 08/09/2019] [Accepted: 08/25/2019] [Indexed: 12/15/2022]
Abstract
BACKGROUND The reconstruction of clonal haplotypes and their evolutionary history in evolving populations is a common problem in both microbial evolutionary biology and cancer biology. The clonal theory of evolution provides a theoretical framework for modeling the evolution of clones. RESULTS In this paper, we review the theoretical framework and assumptions over which the clonal reconstruction problem is formulated. We formally define the problem and then discuss the complexity and solution space of the problem. Various methods have been proposed to find the phylogeny that best explains the observed data. We categorize these methods based on the type of input data that they use (space-resolved or time-resolved), and also based on their computational formulation as either combinatorial or probabilistic. It is crucial to understand the different types of input data because each provides essential but distinct information for drastically reducing the solution space of the clonal reconstruction problem. Complementary information provided by single cell sequencing or from whole genome sequencing of randomly isolated clones can also improve the accuracy of clonal reconstruction. We briefly review the existing algorithms and their relationships. Finally, we summarize the tools that are developed for either directly solving the clonal reconstruction problem or a related computational problem. CONCLUSIONS In this review, we discuss the various formulations of the problem of inferring the clonal evolutionary history from allele frequency data, review existing algorithms and categorize them according to their problem formulation and solution approaches. We note that most of the available clonal inference algorithms were developed for elucidating tumor evolution, whereas clonal reconstruction for unicellular genomes is less addressed. We conclude the review by discussing more open problems such as the lack of benchmark datasets and comparison of performance between available tools.
Affiliation(s)
- Wazim Mohammed Ismail
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN 47405-7000, USA
- Etienne Nzabarushimana
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN 47405-7000, USA
- Haixu Tang
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN 47405-7000, USA
9
Malikic S, Mehrabadi FR, Ciccolella S, Rahman MK, Ricketts C, Haghshenas E, Seidman D, Hach F, Hajirasouliha I, Sahinalp SC. PhISCS: a combinatorial approach for subperfect tumor phylogeny reconstruction via integrative use of single-cell and bulk sequencing data. Genome Res 2019; 29:1860-1877. [PMID: 31628256 PMCID: PMC6836735 DOI: 10.1101/gr.234435.118] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 08/15/2018] [Accepted: 09/11/2019] [Indexed: 12/29/2022]
Abstract
Available computational methods for tumor phylogeny inference via single-cell sequencing (SCS) data typically aim to identify the most likely perfect phylogeny tree satisfying the infinite sites assumption (ISA). However, the limitations of SCS technologies including frequent allele dropout and variable sequence coverage may prohibit a perfect phylogeny. In addition, ISA violations are commonly observed in tumor phylogenies due to the loss of heterozygosity, deletions, and convergent evolution. In order to address such limitations, we introduce the optimal subperfect phylogeny problem which asks to integrate SCS data with matching bulk sequencing data by minimizing a linear combination of potential false negatives (due to allele dropout or variance in sequence coverage), false positives (due to read errors) among mutation calls, and the number of mutations that violate ISA (real or because of incorrect copy number estimation). We then describe a combinatorial formulation to solve this problem which ensures that several lineage constraints imposed by the use of variant allele frequencies (VAFs, derived from bulk sequence data) are satisfied. We express our formulation both in the form of an integer linear program (ILP) and—as a first in tumor phylogeny reconstruction—a Boolean constraint satisfaction problem (CSP) and solve them by leveraging state-of-the-art ILP/CSP solvers. The resulting method, which we name PhISCS, is the first to integrate SCS and bulk sequencing data while accounting for ISA violating mutations. In contrast to the alternative methods, typically based on probabilistic approaches, PhISCS provides a guarantee of optimality in reported solutions. Using simulated and real data sets, we demonstrate that PhISCS is more general and accurate than all available approaches.
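The objective described in the abstract, a linear combination of corrected false negatives, corrected false positives, and mutations set aside as ISA violations, can be written down directly for a given candidate solution. The sketch below only evaluates that cost; it does not search for the optimum as the ILP/CSP formulations do, and it ignores the bulk-derived VAF constraints. The function name, argument layout, and weight values are hypothetical and not taken from PhISCS.

```python
def subperfect_cost(observed, corrected, removed,
                    w_fn=1.0, w_fp=10.0, w_isa=5.0):
    """Cost of explaining a noisy genotype matrix by a corrected one.

    observed, corrected : cell-by-mutation 0/1 matrices of equal shape.
    removed             : set of column indices treated as ISA-violating
                          and excluded from the tree.
    w_fn, w_fp, w_isa   : illustrative weights, chosen arbitrarily here.
    """
    false_negatives = 0  # observed 0 but corrected to 1 (e.g. allele dropout)
    false_positives = 0  # observed 1 but corrected to 0 (e.g. read error)
    for obs_row, cor_row in zip(observed, corrected):
        for j, (obs, cor) in enumerate(zip(obs_row, cor_row)):
            if j in removed:
                continue  # removed mutations are charged once via w_isa
            if obs == 0 and cor == 1:
                false_negatives += 1
            elif obs == 1 and cor == 0:
                false_positives += 1
    return (w_fn * false_negatives
            + w_fp * false_positives
            + w_isa * len(removed))


observed = [[1, 0, 1],
            [0, 1, 1]]
corrected = [[1, 1, 1],
             [0, 1, 0]]
print(subperfect_cost(observed, corrected, removed=set()))  # one FN, one FP
print(subperfect_cost(observed, corrected, removed={2}))    # one FN, one removal
```

With these example weights, dropping the third mutation is cheaper than flipping an observed 1 to 0, which shows how the relative weights decide whether an inconsistency is attributed to noise or to an ISA violation.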
Affiliation(s)
- Salem Malikic
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Farid Rashidi Mehrabadi
- Department of Computer Science, Indiana University, Bloomington, Indiana 47408, USA; Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Simone Ciccolella
- Department of Computer Systems and Communication, University of Milano-Bicocca, 20136 Milan, Italy; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, USA
- Md Khaledur Rahman
- Department of Computer Science, Indiana University, Bloomington, Indiana 47408, USA
| | - Camir Ricketts
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, USA.,Tri-I Computational Biology and Medicine Graduate Program, Cornell University, New York, New York 10065, USA
| | - Ehsan Haghshenas
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
| | - Daniel Seidman
- Tri-I Computational Biology and Medicine Graduate Program, Cornell University, New York, New York 10065, USA
| | - Faraz Hach
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada.,Department of Urologic Sciences, University of British Columbia, Vancouver, BC V5Z 1M9, Canada.,Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
| | - Iman Hajirasouliha
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, USA.,Department of Physiology and Biophysics, Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, New York 10065, USA
| | - S Cenk Sahinalp
- Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
| |
Collapse
|
10
Ricketts C, Seidman D, Popic V, Hormozdiari F, Batzoglou S, Hajirasouliha I. Meltos: multi-sample tumor phylogeny reconstruction for structural variants. Bioinformatics 2019; 36:1082-1090. [PMID: 31584621 PMCID: PMC8215921 DOI: 10.1093/bioinformatics/btz737] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/15/2018] [Revised: 08/10/2019] [Accepted: 09/25/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION: We propose Meltos, a novel computational framework to address the challenging problem of building tumor phylogeny trees using somatic structural variants (SVs) among multiple samples. Meltos leverages the tumor phylogeny tree built on somatic single nucleotide variants (SNVs) to identify high-confidence SVs and produce a comprehensive tumor lineage tree, using a novel optimization formulation. While we do not assume that the evolutionary progression of SVs is necessarily the same as that of SNVs, we show that a tumor phylogeny tree built from high-quality somatic SNVs can act as a guide for calling and assigning somatic SVs on a tree. Meltos utilizes multiple genomic read signals for potential SV breakpoints in whole genome sequencing data and proposes a probabilistic formulation for estimating variant allele fractions (VAFs) of SV events. RESULTS: To assess the ability of Meltos to correctly refine SNV trees with SV information, we tested Meltos on two simulated datasets, each containing five genomes. We also assessed Meltos on two real cancer datasets: multiple samples from a liposarcoma tumor, and a multi-sample breast cancer dataset (Yates et al., 2015) for which the authors provide validated structural variation events together with deep, targeted sequencing for a collection of somatic SNVs. We show that Meltos can place high-confidence validated SV calls on a refined tumor phylogeny tree. We also show the flexibility of Meltos to either estimate VAFs directly from genomic data or use copy-number-corrected estimates. AVAILABILITY AND IMPLEMENTATION: Meltos is available at https://github.com/ih-lab/Meltos. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Victoria Popic
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Fereydoun Hormozdiari
  - Department of Biochemistry and Molecular Medicine, MIND Institute and Genome Center, University of California, Davis, CA 95616, USA
- Serafim Batzoglou
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
11
Karpov N, Malikic S, Rahman MK, Sahinalp SC. A multi-labeled tree dissimilarity measure for comparing "clonal trees" of tumor progression. Algorithms Mol Biol 2019; 14:17. [PMID: 31372179 PMCID: PMC6661107 DOI: 10.1186/s13015-019-0152-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/31/2019] [Accepted: 07/15/2019] [Indexed: 12/18/2022] Open
Abstract
We introduce a new dissimilarity measure between a pair of "clonal trees", each representing the progression and mutational heterogeneity of a tumor sample, constructed by the use of single cell or bulk high throughput sequencing data. In a clonal tree, each vertex represents a specific tumor clone, and is labeled with one or more mutations in a way that each mutation is assigned to the oldest clone that harbors it. Given two clonal trees, our multi-labeled tree dissimilarity (MLTD) measure is defined as the minimum number of mutation/label deletions, (empty) leaf deletions, and vertex (clonal) expansions, applied in any order, to convert each of the two trees to the maximum common tree. We show that the MLTD measure can be computed efficiently in polynomial time and it captures the similarity between trees of different clonal granularity well.
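The clonal-tree labeling convention described here (each mutation is attached to the oldest clone that harbors it) means a clone's full mutation set is the union of the labels on its path to the root. The following is a minimal sketch of that data structure only, not of the MLTD computation; the function name `clone_genotypes`, the dictionary layout, and the clone and mutation names are assumptions made for illustration.

```python
def clone_genotypes(parent, labels):
    """Expand a clonal tree into the full mutation set of every clone.

    parent: dict mapping each clone to its parent clone (root maps to None).
    labels: dict mapping each clone to the mutations first acquired there.
    """
    genotypes = {}

    def resolve(clone):
        # A clone carries its own labels plus everything its ancestors carry.
        if clone not in genotypes:
            up = parent[clone]
            inherited = resolve(up) if up is not None else frozenset()
            genotypes[clone] = inherited | frozenset(labels.get(clone, ()))
        return genotypes[clone]

    for clone in parent:
        resolve(clone)
    return genotypes


# Hypothetical clonal tree: root -> A -> B, and root -> C.
parent = {"root": None, "A": "root", "B": "A", "C": "root"}
labels = {"root": {"m1"}, "A": {"m2", "m3"}, "B": {"m4"}, "C": {"m5"}}

genotypes = clone_genotypes(parent, labels)
for clone in parent:
    print(clone, sorted(genotypes[clone]))
```

Two trees of different clonal granularity can induce related genotype sets even when their vertex labelings differ, which is the situation an edit-based measure such as MLTD is designed to handle.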
Affiliation(s)
- Nikolai Karpov
  - Department of Computer Science, Indiana University, Bloomington, IN, USA
- Salem Malikic
  - School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
- S. Cenk Sahinalp
  - Department of Computer Science, Indiana University, Bloomington, IN, USA
12
Khakabimamaghani S, Malikic S, Tang J, Ding D, Morin R, Chindelevitch L, Ester M. Collaborative intra-tumor heterogeneity detection. Bioinformatics 2019; 35:i379-i388. [PMID: 31510674 PMCID: PMC6612880 DOI: 10.1093/bioinformatics/btz355] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/11/2023] Open
Abstract
MOTIVATION: Despite remarkable advances in sequencing and computational techniques, noise in the data and the complexity of the underlying biological mechanisms make it difficult to deconvolve the phylogenetic relationships between cancer mutations. Moreover, the majority of existing datasets consist of bulk sequencing data from a single tumor sample per individual. Accurate inference of the phylogenetic order of mutations is particularly challenging in these cases, and existing methods face several theoretical limitations. To overcome these limitations, new methods are required for integrating and harnessing the full potential of the existing data. RESULTS: We introduce a method called Hintra for intra-tumor heterogeneity detection. Hintra integrates sequencing data for a cohort of tumors and infers the tumor phylogeny for each individual based on the evolutionary information shared between different tumors. Through an iterative process, Hintra learns the repeating evolutionary patterns and uses this information to resolve the phylogenetic ambiguities of individual tumors. The results of synthetic experiments show improved performance compared to two state-of-the-art methods. The experimental results with a recent breast cancer dataset are consistent with existing knowledge and provide potentially interesting findings. AVAILABILITY AND IMPLEMENTATION: The source code for Hintra is available at https://github.com/sahandk/HINTRA.
Affiliation(s)
- Salem Malikic
  - School of Computing Science, Simon Fraser University, Burnaby, BC
- Jeffrey Tang
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC
- Dujian Ding
  - School of Computing Science, Simon Fraser University, Burnaby, BC
- Ryan Morin
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC
  - Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC
- Martin Ester
  - School of Computing Science, Simon Fraser University, Burnaby, BC
  - Vancouver Prostate Centre, Vancouver, BC, Canada
13
Malikic S, Jahn K, Kuipers J, Sahinalp SC, Beerenwinkel N. Integrative inference of subclonal tumour evolution from single-cell and bulk sequencing data. Nat Commun 2019; 10:2750. [PMID: 31227714 PMCID: PMC6588593 DOI: 10.1038/s41467-019-10737-5] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Received: 07/04/2018] [Accepted: 05/30/2019] [Indexed: 02/07/2023] Open
Abstract
Understanding the clonal architecture and evolutionary history of a tumour is one of the key challenges in overcoming treatment failure due to resistant cell populations. Previously, studies on subclonal tumour evolution have been based primarily on bulk sequencing and, in some recent cases, on single-cell sequencing data. Either data type alone has shortcomings with regard to this task, but methods integrating both data types have been lacking. Here, we present B-SCITE, the first computational approach that infers tumour phylogenies from combined single-cell and bulk sequencing data. Using a comprehensive set of simulated data, we show that B-SCITE systematically outperforms existing methods with respect to tree reconstruction accuracy and subclone identification. B-SCITE provides high-fidelity reconstructions even with a modest number of single cells and in cases where bulk allele frequencies are affected by copy number changes. On real tumour data, B-SCITE-generated mutation histories show high concordance with expert-generated trees. Intra-tumour heterogeneity provides important information about subclonal tumour evolution. Here, the authors develop B-SCITE, a computational method for inferring tumour phylogenies from combined single-cell and bulk sequencing data.
Affiliation(s)
- Salem Malikic
  - School of Computing Science, Simon Fraser University, Burnaby, V5A 1S6, BC, Canada
  - Vancouver Prostate Centre, Vancouver, V6H 3Z6, BC, Canada
- Katharina Jahn
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- Jack Kuipers
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- S Cenk Sahinalp
  - Department of Computer Science, Indiana University, Bloomington, 47405, IN, USA
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
14
Tolkach Y, Thomann S, Kristiansen G. Three-dimensional reconstruction of prostate cancer architecture with serial immunohistochemical sections: hallmarks of tumour growth, tumour compartmentalisation, and implications for grading and heterogeneity. Histopathology 2018; 72:1051-1059. [PMID: 29323728 DOI: 10.1111/his.13467] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 11/07/2017] [Revised: 12/29/2017] [Accepted: 01/08/2018] [Indexed: 12/15/2022]
Abstract
AIMS: Conventional morphology of prostate cancer considers only the two-dimensional (2D) architecture of the tumour. Our aim was to examine the feasibility of three-dimensional (3D) reconstruction of tumour morphology based on multiple consecutive histological sections and to decipher relevant features of prostate cancer architecture. METHODS AND RESULTS: Seventy-five consecutive histological sections (5 μm) of a typical prostate adenocarcinoma (Gleason score of 3 + 4 = 7) were immunostained (pan-cytokeratin) and scanned for further 3D reconstructions with Fiji/ImageJ software. The main findings related to the prostate cancer architecture in this case were: (i) continuity of all glands, with the tumour being an integrated system, even in Gleason pattern 4 with poorly formed glands, and with no short-range migration of cells in Gleason pattern 4 (poorly formed glands); (ii) no repeated interconnections between the glands, with the tumour building a tree-like branched structure with very 'plastic' branches (maximal depth of investigation 375 μm); (iii) very stark compartmentalisation of the tumour related to extensive branching, the coexistence of independent terminal units of such branches in one 2D slice explaining intratumoral heterogeneity; (iv) evidence of a craniocaudal growth direction in interglandular regions of the prostate and of a lateromedial growth direction in subcapsular posterolateral regions; and (v) a 3D architecture-based description of Gleason pattern 4 with poorly formed glands, and its continuum with Gleason pattern 3. CONCLUSIONS: Consecutive histological sections provide high-quality material for 3D reconstructions of the tumour architecture, with excellent resolution. The reconstruction of multiple regions in this typical case of a Gleason score 3 + 4 = 7 tumour provides insights into relevant aspects of tumour growth, the continuity of Gleason patterns 3 and 4, and tumour heterogeneity.
Affiliation(s)
- Yuri Tolkach
  - Institute of Pathology, University Hospital Bonn, Bonn, Germany
- Stefan Thomann
  - Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany
15
Kuipers J, Jahn K, Raphael BJ, Beerenwinkel N. Single-cell sequencing data reveal widespread recurrence and loss of mutational hits in the life histories of tumors. Genome Res 2017; 27:1885-1894. [PMID: 29030470 PMCID: PMC5668945 DOI: 10.1101/gr.220707.117] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Received: 01/16/2017] [Accepted: 09/20/2017] [Indexed: 01/04/2023]
Abstract
Intra-tumor heterogeneity poses substantial challenges for cancer treatment. A tumor's composition can be deduced by reconstructing its mutational history. Central to current approaches is the infinite sites assumption that every genomic position can only mutate once over the lifetime of a tumor. The validity of this assumption has never been quantitatively assessed. We developed a rigorous statistical framework to test the infinite sites assumption with single-cell sequencing data. Our framework accounts for the high noise and contamination present in such data. We found strong evidence for the same genomic position being mutationally affected multiple times in individual tumors for 11 of 12 single-cell sequencing data sets from a variety of human cancers. Seven cases involved the loss of earlier mutations, five of which occurred at sites unaffected by large-scale genomic deletions. Four cases exhibited a parallel mutation, potentially indicating convergent evolution at the base pair level. Our results refute the general validity of the infinite sites assumption and indicate that more complex models are needed to adequately quantify intra-tumor heterogeneity for more effective cancer treatment.
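In the idealized, error-free case, the infinite sites assumption has a simple combinatorial consequence: for any two mutations, the sets of cells carrying them must be nested or disjoint (the "three-gamete" condition for a tree rooted at the unmutated genotype). The following is a minimal sketch of that check only. It is not the statistical framework of the paper, which models the noise and contamination of real single-cell data; the function name `isa_conflicts` and the toy matrices are assumptions made for illustration.

```python
from itertools import combinations


def isa_conflicts(matrix):
    """Return pairs of mutation columns that violate the three-gamete condition.

    matrix: cell-by-mutation 0/1 matrix (list of rows), assumed error-free.
    A pair (a, b) conflicts when some cell has both mutations, some cell has
    only a, and some cell has only b.
    """
    n_mutations = len(matrix[0]) if matrix else 0
    carriers = [
        {i for i, row in enumerate(matrix) if row[j] == 1}
        for j in range(n_mutations)
    ]
    conflicts = []
    for a, b in combinations(range(n_mutations), 2):
        shared = carriers[a] & carriers[b]
        only_a = carriers[a] - carriers[b]
        only_b = carriers[b] - carriers[a]
        if shared and only_a and only_b:
            conflicts.append((a, b))
    return conflicts


# Carrier sets are nested or disjoint: consistent with a perfect phylogeny.
consistent = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
]

# Mutations 0 and 1 co-occur in cell 0 but also occur separately.
conflicting = [
    [1, 1],
    [1, 0],
    [0, 1],
]

print(isa_conflicts(consistent))
print(isa_conflicts(conflicting))
```

A conflicting pair in clean data cannot be placed on a tree without a recurrent or lost mutation; in real single-cell data the same pattern can also arise from dropout or false-positive calls, which is why a statistical test is needed.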
Affiliation(s)
- Jack Kuipers
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland
- Katharina Jahn
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland
- Benjamin J Raphael
  - Department of Computer Science, Princeton University, Princeton, New Jersey 08540, USA
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, 4058, Switzerland