1
Rossi N, Gigante N, Vitacolonna N, Piazza C. Inferring Markov Chains to Describe Convergent Tumor Evolution With CIMICE. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:106-119. [PMID: 38015671 DOI: 10.1109/tcbb.2023.3337258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2023]
Abstract
The field of tumor phylogenetics focuses on studying the differences within cancer cell populations. The scientific community has devoted considerable effort to building cancer progression models that try to explain the heterogeneity of such diseases. These models are highly dependent on the kind of data used for their construction; therefore, as experimental technologies evolve, it is of major importance to exploit their peculiarities. In this work we describe a cancer progression model based on Single Cell DNA Sequencing data. When constructing the model, we focus on tailoring the formalism to the specificity of the data. We operate by defining a minimal set of assumptions needed to reconstruct a flexible DAG-structured model, capable of identifying progression beyond the limitation of the infinite site assumption. Our proposal is conservative in the sense that we aim to neither discard nor infer knowledge which is not represented in the data. We provide simulations and analytical results to show the features of our model, test it on real data, and show how it can be integrated with other approaches to cope with input noise. Moreover, our framework can be exploited to produce simulated data that follows our theoretical assumptions. Finally, we provide an open source R implementation of our approach, called CIMICE, that is publicly available on Bioconductor.
2
Liu X, Griffiths JI, Bishara I, Liu J, Bild AH, Chang JT. Phylogenetic inference from single-cell RNA-seq data. Sci Rep 2023; 13:12854. [PMID: 37553438 PMCID: PMC10409753 DOI: 10.1038/s41598-023-39995-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Accepted: 08/03/2023] [Indexed: 08/10/2023] Open
Abstract
Tumors are comprised of subpopulations of cancer cells that harbor distinct genetic profiles and phenotypes that evolve over time and during treatment. By reconstructing the course of cancer evolution, we can understand the acquisition of the malignant properties that drive tumor progression. Unfortunately, recovering the evolutionary relationships of individual cancer cells linked to their phenotypes remains a difficult challenge. To address this need, we have developed PhylinSic, a method that reconstructs the phylogenetic relationships among cells linked to their gene expression profiles from single cell RNA-sequencing (scRNA-Seq) data. This method calls nucleotide bases using a probabilistic smoothing approach and then estimates a phylogenetic tree using a Bayesian modeling algorithm. We showed that PhylinSic identified evolutionary relationships underpinning drug selection and metastasis and was sensitive enough to identify subclones from genetic drift. We found that breast cancer tumors resistant to chemotherapies harbored multiple genetic lineages that independently acquired high K-Ras and β-catenin, suggesting that therapeutic strategies may need to control multiple lineages to be durable. These results demonstrated that PhylinSic can reconstruct evolution and link the genotypes and phenotypes of cells across monophyletic tumors using scRNA-Seq.
Affiliation(s)
- Xuan Liu
  - Department of Integrative Biology & Pharmacology, University of Texas Health Science Center at Houston, 6431 Fannin St, MSB 4.218, Houston, TX, 77030, USA
- Jason I Griffiths
  - Division of Molecular Pharmacology, Department of Medical Oncology & Clinical Therapeutics, City of Hope, Monrovia, CA, USA
- Isaac Bishara
  - Division of Molecular Pharmacology, Department of Medical Oncology & Clinical Therapeutics, City of Hope, Monrovia, CA, USA
- Jiayi Liu
  - Department of Integrative Biology & Pharmacology, University of Texas Health Science Center at Houston, 6431 Fannin St, MSB 4.218, Houston, TX, 77030, USA
- Andrea H Bild
  - Division of Molecular Pharmacology, Department of Medical Oncology & Clinical Therapeutics, City of Hope, Monrovia, CA, USA
- Jeffrey T Chang
  - Department of Integrative Biology & Pharmacology, University of Texas Health Science Center at Houston, 6431 Fannin St, MSB 4.218, Houston, TX, 77030, USA
3
Miura S, Dolker T, Sanderford M, Kumar S. Improving cellular phylogenies through the integrated use of mutation order and optimality principles. Comput Struct Biotechnol J 2023; 21:3894-3903. [PMID: 37602230 PMCID: PMC10432911 DOI: 10.1016/j.csbj.2023.07.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Revised: 07/10/2023] [Accepted: 07/19/2023] [Indexed: 08/22/2023] Open
Abstract
The study of tumor evolution is being revolutionized by single-cell sequencing technologies that survey the somatic variation of cancer cells. In these endeavors, reliable inference of the evolutionary relationship of single cells is a key step. However, single-cell sequences contain many errors and missing bases, which necessitate advancing standard molecular phylogenetics approaches for applications in analyzing these datasets. We have developed a computational approach that integratively applies standard phylogenetic optimality principles and patterns of co-occurrence of sequence variations to produce more expansive and accurate cellular phylogenies from single-cell sequence datasets. We found the new approach to also perform well for CRISPR/Cas9 genome editing datasets, suggesting that it can be useful for various applications. We apply the new approach to some empirical datasets to showcase its use for reconstructing recurrent mutations and mutational reversals as well as for phylodynamics analysis to infer metastatic cell migrations between tumors.
Affiliation(s)
- Sayaka Miura
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Tenzin Dolker
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Maxwell Sanderford
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
4
Chen Z, Zhang B, Gong F, Wan L, Ma L. RobustTree: An adaptive, robust PCA algorithm for embedded tree structure recovery from single-cell sequencing data. Front Genet 2023; 14:1110899. [PMID: 36968591 PMCID: PMC10030613 DOI: 10.3389/fgene.2023.1110899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Accepted: 02/13/2023] [Indexed: 03/11/2023] Open
Abstract
Robust Principal Component Analysis (RPCA) offers a powerful tool for recovering a low-rank matrix from highly corrupted data, with growing applications in computational biology. Biological processes commonly form intrinsic hierarchical structures, such as tree structures of cell development trajectories and tumor evolutionary history. The rapid development of single-cell sequencing (SCS) technology calls for the recovery of embedded tree structures from noisy and heterogeneous SCS data. In this study, we propose RobustTree, a unified framework to reconstruct the inherent topological structure underlying high-dimensional data with noise. By extending RPCA to handle tree structure optimization, RobustTree leverages data denoising, clustering, and tree structure reconstruction. It solves the tree optimization problem with an adaptive parameter selection scheme that we proposed. In addition to recovering real datasets, RobustTree can reconstruct continuous topological structure and discrete-state topological structure of underlying SCS data. We apply RobustTree on multiple synthetic and real datasets and demonstrate its high accuracy and robustness when analyzing high-noise SCS data with embedded complex structures. The code is available at https://github.com/ucasdp/RobustTree.
Affiliation(s)
- Ziwei Chen
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, United States
- Bingwei Zhang
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
- Fuzhou Gong
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
- Lin Wan
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
- Liang Ma
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
  - Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- *Correspondence: Lin Wan; Liang Ma
5
A phylogenetic approach to study the evolution of somatic mutational processes in cancer. Commun Biol 2022; 5:617. [PMID: 35732905 PMCID: PMC9217972 DOI: 10.1038/s42003-022-03560-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2021] [Accepted: 06/07/2022] [Indexed: 11/09/2022] Open
Abstract
Cancer cell genomes change continuously due to mutations, and mutational processes change over time in patients, leaving dynamic signatures in the accumulated genomic variation in tumors. Many computational methods detect the relative activities of known mutation signatures. However, these methods may produce erroneous signatures when applied to individual branches in cancer cell phylogenies. Here, we show that the inference of branch-specific mutational signatures can be improved through a joint analysis of the collections of mutations mapped on proximal branches of the cancer cell phylogeny. This approach reduces the false-positive discovery rate of branch-specific signatures and can sometimes detect faint signatures. An analysis of empirical data from 61 lung cancer patients supports trends based on computer-simulated datasets for which the correct signatures are known. In lung cancer somatic variation, we detect a decreasing trend of smoking-related mutational processes over time and an increasing influence of APOBEC mutational processes as the tumor evolution progresses. These analyses also reveal patterns of conservation and divergence of mutational processes in cell lineages within patients.
6
Valecha M, Posada D. Somatic variant calling from single-cell DNA sequencing data. Comput Struct Biotechnol J 2022; 20:2978-2985. [PMID: 35782734 PMCID: PMC9218383 DOI: 10.1016/j.csbj.2022.06.013] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/01/2022] [Revised: 06/06/2022] [Accepted: 06/06/2022] [Indexed: 11/03/2022] Open
Abstract
Single-cell sequencing has gained popularity in recent years. Despite its numerous applications, single-cell DNA sequencing data is highly error-prone due to technical biases arising from uneven sequencing coverage, allelic dropout, and amplification error. With these artifacts, the identification of somatic genomic variants becomes a challenging task, and over the years, several methods have been developed explicitly for this type of data. Single-cell variant callers implement distinct strategies, make different use of the data, and typically result in many discordant calls when applied to real data. Here, we review current approaches for single-cell variant calling, emphasizing single nucleotide variants. We highlight their potential benefits and shortcomings to help users choose a suitable tool for their data at hand.
Key Words
- ADO, allelic dropout
- Allele dropout
- Amplification error
- CNV, copy number variant
- Indel, short insertion or deletion
- LDO, locus dropout
- SNV, single nucleotide variant
- SV, structural variant
- Single-cell genomics
- Somatic variants
- VAF, variant allele frequency
- Variant calling
- hSNP, heterozygous single-nucleotide polymorphism
- scATAC-seq, single-cell sequencing assay for transposase-accessible chromatin
- scDNA-seq, single-cell DNA sequencing
- scHi-C, single-cell Hi-C sequencing
- scMethyl-seq, single-cell Methylation sequencing
- scRNA-seq, single-cell RNA sequencing
- scWGA, single-cell whole-genome amplification
Affiliation(s)
- Monica Valecha
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Spain
- David Posada
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Spain
  - Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, 36310 Vigo, Spain
7
Feng X, Chen L. SCSilicon: a tool for synthetic single-cell DNA sequencing data generation. BMC Genomics 2022; 23:359. [PMID: 35546390 PMCID: PMC9092674 DOI: 10.1186/s12864-022-08566-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2022] [Accepted: 04/19/2022] [Indexed: 11/25/2022] Open
Abstract
Background: Single-cell DNA sequencing is becoming indispensable in the study of cell-specific cancer genomics. The performance of computational tools that tackle single-cell genome aberrations may nevertheless be undervalued or overvalued, owing to the insufficient size of benchmarking data. In silico simulation is a cost-effective approach to generate as many single-cell genomes as possible in a controlled manner for reliable and valid benchmarking. Results: This study proposes a new tool, SCSilicon, which efficiently generates single-cell in silico DNA reads with minimum manual intervention. SCSilicon automatically creates a set of genomic aberrations, including SNP, SNV, Indel, and CNV. In addition, SCSilicon yields the ground truth of CNV segmentation breakpoints and subclone cell labels. We have manually inspected a series of synthetic variations. We conducted a sanity check of the state-of-the-art single-cell CNV callers and found SCYN was the most robust one. Conclusions: SCSilicon is a user-friendly software package for users to develop and benchmark single-cell CNV callers. Source code of SCSilicon is available at https://github.com/xikanfeng2/SCSilicon. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08566-w.
Affiliation(s)
- Xikang Feng
  - School of Software, Northwestern Polytechnical University, Xi'an, Shaanxi, 710072, China
- Lingxi Chen
  - Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China
8
Chen Z, Gong F, Wan L, Ma L. BiTSC$^2$: Bayesian inference of tumor clonal tree by joint analysis of single-cell SNV and CNA data. Brief Bioinform 2022; 23:6562684. [PMID: 35368055 PMCID: PMC9116244 DOI: 10.1093/bib/bbac092] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 12/06/2021] [Revised: 01/29/2022] [Accepted: 02/23/2022] [Indexed: 12/14/2022] Open
Abstract
The rapid development of single-cell DNA sequencing (scDNA-seq) technology has greatly enhanced the resolution of tumor cell profiling, providing an unprecedented perspective in characterizing intra-tumoral heterogeneity and understanding tumor progression and metastasis. However, prominent algorithms for constructing tumor phylogeny based on scDNA-seq data usually only take single nucleotide variations (SNVs) as markers, failing to consider the effect caused by copy number alterations (CNAs). Here, we propose BiTSC$^2$, Bayesian inference of Tumor clonal Tree by joint analysis of Single-Cell SNV and CNA data. BiTSC$^2$ takes raw reads from scDNA-seq as input, accounts for the overlapping of CNA and SNV, models allelic dropout rate, sequencing errors and missing rate, as well as assigns single cells into subclones. By applying Markov Chain Monte Carlo sampling, BiTSC$^2$ can simultaneously estimate the subclonal scCNA and scSNV genotype matrices, subclonal assignments and tumor subclonal evolutionary tree. In comparison with existing methods on synthetic and real tumor data, BiTSC$^2$ shows high accuracy in genotype recovery, subclonal assignment and tree reconstruction. BiTSC$^2$ also performs robustly in dealing with scDNA-seq data with low sequencing depth and variant missing rate. BiTSC$^2$ software is available at https://github.com/ucasdp/BiTSC2.
Affiliation(s)
- Ziwei Chen
  - Institute of Zoology, Chinese Academy of Sciences, Beichen West Road, 100101, Beijing, China
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Zhongguancun East Road, 100190, Beijing, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Yuquan Road, 100049, Beijing, China
- Fuzhou Gong
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Zhongguancun East Road, 100190, Beijing, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Yuquan Road, 100049, Beijing, China
- Lin Wan
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Zhongguancun East Road, 100190, Beijing, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Yuquan Road, 100049, Beijing, China
- Liang Ma
  - Institute of Zoology, Chinese Academy of Sciences, Beichen West Road, 100101, Beijing, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Yuquan Road, 100049, Beijing, China
9
Huzar J, Kim H, Kumar S, Miura S. MOCA for Integrated Analysis of Gene Expression and Genetic Variation in Single Cells. Front Genet 2022; 13:831040. [PMID: 35432484 PMCID: PMC9009314 DOI: 10.3389/fgene.2022.831040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2021] [Accepted: 03/07/2022] [Indexed: 11/17/2022] Open
Abstract
In cancer, somatic mutations occur continuously, causing cell populations to evolve. These somatic mutations result in the evolution of cellular gene expression patterns that can also change due to epigenetic modifications and environmental changes. By exploring the concordance of gene expression changes with molecular evolutionary trajectories of cells, we can examine the role of somatic variation on the evolution of gene expression patterns. We present Multi-Omics Concordance Analysis (MOCA) software to jointly analyze gene expressions and genetic variations from single-cell RNA sequencing profiles. MOCA outputs cells and genes showing convergent and divergent gene expression patterns in functional genomics.
Affiliation(s)
- Jared Huzar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, United States
  - Department of Biology, Temple University, Philadelphia, PA, United States
- Hannah Kim
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, United States
  - Department of Biology, Temple University, Philadelphia, PA, United States
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, United States
  - Department of Biology, Temple University, Philadelphia, PA, United States
  - Center for Excellence in Genomic Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Sayaka Miura
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, United States
  - Department of Biology, Temple University, Philadelphia, PA, United States
10
Farswan A, Gupta R, Gupta A. ARCANE-ROG: Algorithm for Reconstruction of Cancer Evolution from single-cell data using Robust Graph Learning. J Biomed Inform 2022; 129:104055. [PMID: 35337943 DOI: 10.1016/j.jbi.2022.104055] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/22/2021] [Revised: 02/17/2022] [Accepted: 03/12/2022] [Indexed: 11/27/2022]
Abstract
Tumor heterogeneity, marked by the presence of divergent clonal subpopulations of tumor cells, impedes the treatment response in cancer patients. Single-cell sequencing technology provides substantial prospects to gain an in-depth understanding of the cellular phenotypic variability driving tumor progression. A comprehensive insight into the intra-tumor heterogeneity may further assist in dealing with the treatment-resistant clones in cancer patients, thereby improving their overall survival. However, this task is hampered due to the challenges associated with the single-cell data, such as false positives, false negatives and missing bases, and the increase in their size. As a result, the computational cost of the existing methods increases, thereby limiting their usage. In this work, we propose a robust graph learning-based method, ARCANE-ROG (Algorithm for Reconstruction of CANcer Evolution via RObust Graph learning), for inferring clonal evolution from single-cell datasets. The first step of the proposed method is a joint framework of denoising with data imputation for the noisy and incomplete matrix while simultaneously learning an adjacency graph. Both the operations in the joint framework boost each other such that the overall performance of the denoising algorithm is improved. In the second step, an optimal number of clusters are identified via the Leiden method. In the last step, clonal evolution trees are inferred via a minimum spanning tree algorithm. The method has been benchmarked against a state-of-the-art method, RobustClone, using simulated datasets of varying sizes and five real datasets. The performance of our proposed method is found to be significantly superior (p-value < 0.05) in terms of reconstruction error, False Positive to False Negative (FPFN) ratio, tree distance error and V-measure compared to the other method. 
Overall, the proposed method is an improvement over the existing methods as it enhances cluster assignment and inference on clonal hierarchies.
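The last step described in this abstract, inferring a clonal tree with a minimum spanning tree, can be illustrated with a minimal sketch. It is not the ARCANE-ROG implementation: the clone names, the binary mutation profiles, the choice of Hamming distance as edge weight, and the helper names `hamming` and `minimum_spanning_tree` are all assumptions made for illustration.

```python
def hamming(a, b):
    """Number of mutation sites at which two binary profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Prim's algorithm on the complete graph of clone profiles.

    `profiles` maps a clone name to a tuple of 0/1 mutation calls.
    Returns a list of (parent, child, distance) edges, grown from the
    first clone in the mapping.
    """
    names = list(profiles)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge from the tree to a clone not yet attached;
        # ties are broken by clone names to keep the result deterministic.
        u, v = min(
            ((u, v) for u in in_tree for v in names if v not in in_tree),
            key=lambda e: (hamming(profiles[e[0]], profiles[e[1]]), e),
        )
        edges.append((u, v, hamming(profiles[u], profiles[v])))
        in_tree.add(v)
    return edges


# Hypothetical denoised clone genotypes (rows: clones, columns: mutation sites).
clones = {
    "normal": (0, 0, 0, 0),
    "c1": (1, 0, 0, 0),
    "c2": (1, 1, 0, 0),
    "c3": (1, 0, 1, 1),
}
tree = minimum_spanning_tree(clones)
for parent, child, dist in tree:
    print(f"{parent} -> {child} (distance {dist})")
```

Rooting the tree at a normal (mutation-free) profile is a design choice of this sketch; the abstract does not state how the published method roots or weights its tree.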
Affiliation(s)
- Akanksha Farswan
  - SBILab, Department of ECE, Indraprastha Institute of Information Technology, New Delhi, India
- Ritu Gupta
  - Laboratory Oncology Unit, Dr. B.R.A. IRCH, AIIMS, New Delhi, India
- Anubha Gupta
  - SBILab, Department of ECE, Indraprastha Institute of Information Technology, New Delhi, India
11
Kozlov A, Alves JM, Stamatakis A, Posada D. CellPhy: accurate and fast probabilistic inference of single-cell phylogenies from scDNA-seq data. Genome Biol 2022; 23:37. [PMID: 35081992 PMCID: PMC8790911 DOI: 10.1186/s13059-021-02583-w] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 05/06/2021] [Accepted: 12/20/2021] [Indexed: 01/15/2023] Open
Abstract
We introduce CellPhy, a maximum likelihood framework for inferring phylogenetic trees from somatic single-cell single-nucleotide variants. CellPhy leverages a finite-site Markov genotype model with 16 diploid states and considers amplification error and allelic dropout. We implement CellPhy into RAxML-NG, a widely used phylogenetic inference package that provides statistical confidence measurements and scales well on large datasets with hundreds or thousands of cells. Comprehensive simulations suggest that CellPhy is more robust to single-cell genomics errors and outperforms state-of-the-art methods under realistic scenarios, both in accuracy and speed. CellPhy is freely available at https://github.com/amkozlov/cellphy.
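As a rough illustration of where a count of 16 diploid states can come from, the sketch below enumerates ordered pairs of the four nucleotides and shows what allelic dropout does to a heterozygous genotype. Reading the 16 states as ordered allele pairs is an assumption of this sketch, and the helper `dropout_observations` is hypothetical; neither is taken from the CellPhy code or its likelihood model.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# Assumed state space: ordered allele pairs, 4 x 4 = 16 genotypes.
genotypes = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]
homozygous = [g for g in genotypes if g[0] == g[1]]
heterozygous = [g for g in genotypes if g[0] != g[1]]


def dropout_observations(genotype):
    """Genotypes that could be observed if exactly one allele drops out.

    Losing one allele of a diploid site makes the cell look homozygous
    for the surviving allele (hypothetical helper for illustration only).
    """
    return {allele * 2 for allele in genotype}


print(len(genotypes), len(homozygous), len(heterozygous))
print(sorted(dropout_observations("AC")))
```

Under this reading, a true heterozygous site such as `AC` can be miscalled as either homozygote, which is the kind of error an explicit dropout term is meant to absorb.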
Affiliation(s)
- Alexey Kozlov
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, 76128 Karlsruhe, Germany
- Joao M. Alves
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, 36310 Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
- Alexandros Stamatakis
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- David Posada
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, 36310 Vigo, Spain
  - Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
12
Abstract
Integration of ecological and evolutionary features has begun to illuminate the interplay of tumor heterogeneity, microenvironment, and metastatic potential. Developing a theoretical framework is intrinsic to deciphering tumors' tremendous spatial and longitudinal genetic variation patterns in patients. Here, we propose that tumors can be considered evolutionary island-like ecosystems, that is, isolated systems that undergo evolutionary and spatiotemporal dynamic processes that shape tumor microenvironments and drive the migration of cancer cells. We examine attributes of insular systems and causes of insularity, such as physical distance and connectivity. These properties modulate migration rates of cancer cells through processes causing spatial and temporal isolation of the organs and tissues functioning as a supply of cancer cells for new colonizations. We discuss hypotheses, predictions, and limitations of the tumors-as-islands analogy. We present emerging evidence of tumor insularity in different cancer types and discuss their relevance to the islands model. We suggest that the engagement of tumor insularity into conceptual and mathematical models holds promise to illuminate cancer evolution, tumor heterogeneity, and metastatic potential of cells.
Affiliation(s)
- Antonia Chroni
  - Institute for Genomics and Evolutionary Medicine, Temple University, USA
  - Department of Biology, Temple University, USA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, USA
  - Department of Biology, Temple University, USA
  - Center for Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah, Saudi Arabia
13
RDAClone: Deciphering Tumor Heterozygosity through Single-Cell Genomics Data Analysis with Robust Deep Autoencoder. Genes (Basel) 2021; 12:genes12121847. [PMID: 34946794 PMCID: PMC8701080 DOI: 10.3390/genes12121847] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/22/2021] [Revised: 11/19/2021] [Accepted: 11/22/2021] [Indexed: 12/27/2022] Open
Abstract
Rapid advances in single-cell genomics sequencing (SCGS) have allowed researchers to characterize tumor heterozygosity with unprecedented resolution and reveal the phylogenetic relationships between tumor cells or clones. However, high sequencing error rates of current SCGS data, i.e., false positives, false negatives, and missing bases, severely limit its application. Here, we present a deep learning framework, RDAClone, to recover genotype matrices from noisy data with an extended robust deep autoencoder, cluster cells into subclones by the Louvain-Jaccard method, and further infer evolutionary relationships between subclones by the minimum spanning tree. Studies on both simulated and real datasets demonstrate its robustness and superiority in data denoising, cell clustering, and evolutionary tree reconstruction, particularly for large datasets.
14
Benchmarking pipelines for subclonal deconvolution of bulk tumour sequencing data. Nat Commun 2021; 12:6396. [PMID: 34737285 PMCID: PMC8569188 DOI: 10.1038/s41467-021-26698-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/06/2021] [Accepted: 10/20/2021] [Indexed: 11/09/2022] Open
Abstract
Intratumour heterogeneity provides tumours with the ability to adapt and acquire treatment resistance. The development of more effective and personalised treatments for cancers, therefore, requires accurate characterisation of the clonal architecture of tumours, enabling evolutionary dynamics to be tracked. Many methods exist for achieving this from bulk tumour sequencing data, involving identifying mutations and performing subclonal deconvolution, but there is a lack of systematic benchmarking to inform researchers on which are most accurate, and how dataset characteristics impact performance. To address this, we use the most comprehensive tumour genome simulation tool available for such purposes to create 80 bulk tumour whole exome sequencing datasets of differing depths, tumour complexities, and purities, and use these to benchmark subclonal deconvolution pipelines. We conclude that i) tumour complexity does not impact accuracy, ii) increasing either purity or purity-corrected sequencing depth improves accuracy, and iii) the optimal pipeline consists of Mutect2, FACETS and PyClone-VI. We have made our benchmarking datasets publicly available for future use.
15
Kumar S, Tao Q, Weaver S, Sanderford M, Caraballo-Ortiz MA, Sharma S, Pond SLK, Miura S. An Evolutionary Portrait of the Progenitor SARS-CoV-2 and Its Dominant Offshoots in COVID-19 Pandemic. Mol Biol Evol 2021; 38:3046-3059. [PMID: 33942847 PMCID: PMC8135569 DOI: 10.1093/molbev/msab118] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Indexed: 12/14/2022] Open
Abstract
Global sequencing of genomes of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has continued to reveal new genetic variants that are the key to unraveling its early evolutionary history and tracking its global spread over time. Here we present the heretofore cryptic mutational history and spatiotemporal dynamics of SARS-CoV-2 from an analysis of thousands of high-quality genomes. We report the likely most recent common ancestor of SARS-CoV-2, reconstructed through a novel application and advancement of computational methods initially developed to infer the mutational history of tumor cells in a patient. This progenitor genome differs from genomes of the first coronaviruses sampled in China by three variants, implying that none of the earliest patients represent the index case or gave rise to all the human infections. However, multiple coronavirus infections in China and the United States harbored the progenitor genetic fingerprint in January 2020 and later, suggesting that the progenitor was spreading worldwide months before and after the first reported cases of COVID-19 in China. Mutations of the progenitor and its offshoots have produced many dominant coronavirus strains that have spread episodically over time. Fingerprinting based on common mutations reveals that the same coronavirus lineage has dominated North America for most of the pandemic in 2020. There have been multiple replacements of predominant coronavirus strains in Europe and Asia as well as continued presence of multiple high-frequency strains in Asia and North America. We have developed a continually updating dashboard of global evolution and spatiotemporal trends of SARS-CoV-2 spread (http://sars2evo.datamonkey.org/).
Affiliation(s)
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Qiqing Tao
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Steven Weaver
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Maxwell Sanderford
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Marcos A Caraballo-Ortiz
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Sudip Sharma
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Sergei L K Pond
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Sayaka Miura
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
16
Yu Z, Liu H, Du F, Tang X. GRMT: Generative Reconstruction of Mutation Tree From Scratch Using Single-Cell Sequencing Data. Front Genet 2021; 12:692964. [PMID: 34149820 PMCID: PMC8212059 DOI: 10.3389/fgene.2021.692964] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/09/2021] [Accepted: 05/17/2021] [Indexed: 12/11/2022]
Abstract
Single-cell sequencing (SCS) now promises to reveal the landscape of genetic diversity at the single-cell level and is particularly useful for reconstructing the evolutionary history of tumors. Multiple types of noise make SCS data notoriously error-prone and significantly complicate tumor tree reconstruction. Existing methods for tumor phylogeny estimation suffer from either high computational cost or a low-resolution picture of clonal architecture, which creates a need for new methods that reconstruct tumor trees efficiently and accurately. We introduce GRMT (Generative Reconstruction of Mutation Tree from scratch), a method for inferring a tumor mutation tree from SCS data. GRMT exploits the k-Dollo parsimony model, which allows each mutation to be gained once and lost at most k times. Under this constraint on mutation evolution, GRMT searches for mutation tree structures by generating the tree from scratch: an iterative process grows the tree by introducing one new mutation at a time until a complete tree structure that contains all mutations is obtained. This enables GRMT to efficiently recover the chronological order of mutations and scale well to large datasets. Extensive evaluations on simulated and real datasets suggest GRMT outperforms state-of-the-art methods in multiple performance metrics. The GRMT software is freely available at https://github.com/qasimyu/grmt.
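To make the k-Dollo constraint named in this abstract concrete (each mutation gained once and lost at most k times), here is a minimal Python sketch that checks whether a tree labeled with gain and loss events satisfies it. The function name `satisfies_k_dollo` and the `parent` and `events` dictionaries are hypothetical choices made for illustration; this is not GRMT's code and it does not reproduce GRMT's tree search.

```python
def satisfies_k_dollo(parent, events, k):
    """Check a labeled tree against the k-Dollo constraint.

    parent: dict mapping each node to its parent (the root maps to None).
    events: dict mapping a node to a list of (mutation, kind) pairs placed
            on the edge entering that node, with kind "gain" or "loss".
    """
    gain_node = {}   # mutation -> node where it is gained
    loss_nodes = {}  # mutation -> set of nodes where it is lost
    for node, node_events in events.items():
        for mutation, kind in node_events:
            if kind == "gain":
                if mutation in gain_node:
                    return False  # gained more than once
                gain_node[mutation] = node
            else:
                loss_nodes.setdefault(mutation, set()).add(node)

    for mutation, nodes in loss_nodes.items():
        if mutation not in gain_node or len(nodes) > k:
            return False  # lost without a gain, or lost more than k times
        for node in nodes:
            # Walk towards the root: the gain must be found strictly above
            # the loss, with no other loss of the same mutation in between.
            ancestor = parent[node]
            while ancestor is not None and ancestor != gain_node[mutation]:
                if ancestor in nodes:
                    return False  # lost twice along one lineage
                ancestor = parent[ancestor]
            if ancestor is None:
                return False  # the loss is not below the gain
    return True


# Hypothetical toy tree: m1 is gained at "a" and lost on two separate branches.
parent = {"root": None, "a": "root", "b": "a", "c": "a", "d": "b"}
events = {
    "a": [("m1", "gain")],
    "b": [("m2", "gain")],
    "c": [("m1", "loss")],
    "d": [("m1", "loss")],
}
print(satisfies_k_dollo(parent, events, k=1))
print(satisfies_k_dollo(parent, events, k=2))
```

With k = 0 the same check reduces to the infinite sites case, where no losses are allowed at all.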
Affiliation(s)
- Zhenhua Yu
  - School of Information Engineering, Ningxia University, Yinchuan, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, China
- Huidong Liu
  - School of Information Engineering, Ningxia University, Yinchuan, China
- Fang Du
  - School of Information Engineering, Ningxia University, Yinchuan, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, China
- Xiaofen Tang
  - School of Information Engineering, Ningxia University, Yinchuan, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, China
17
Kumar S, Tao Q, Weaver S, Sanderford M, Caraballo-Ortiz MA, Sharma S, Pond SLK, Miura S. An evolutionary portrait of the progenitor SARS-CoV-2 and its dominant offshoots in COVID-19 pandemic. bioRxiv 2021:2020.09.24.311845. [PMID: 32995781 PMCID: PMC7523107 DOI: 10.1101/2020.09.24.311845] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 01/02/2023]
Abstract
We report the likely most recent common ancestor of SARS-CoV-2 - the coronavirus that causes COVID-19. This progenitor SARS-CoV-2 genome was recovered through a novel application and advancement of computational methods initially developed to reconstruct the mutational history of tumor cells in a patient. The progenitor differs from the earliest coronaviruses sampled in China by three variants, implying that none of the earliest patients represent the index case or gave rise to all the human infections. However, multiple coronavirus infections in China and the USA harbored the progenitor genetic fingerprint in January 2020 and later, suggesting that the progenitor was spreading worldwide as soon as weeks after the first reported cases of COVID-19. Mutations of the progenitor and its offshoots have produced many dominant coronavirus strains, which have spread episodically over time. Fingerprinting based on common mutations reveals that the same coronavirus lineage has dominated North America for most of the pandemic. There have been multiple replacements of predominant coronavirus strains in Europe and Asia and the continued presence of multiple high-frequency strains in Asia and North America. We provide a continually updating dashboard of global evolution and spatiotemporal trends of SARS-CoV-2 spread (http://sars2evo.datamonkey.org/).
Affiliation(s)
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
- Qiqing Tao
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
- Steven Weaver
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
- Maxwell Sanderford
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
- Marcos A. Caraballo-Ortiz
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
- Sudip Sharma
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
- Sergei L. K. Pond
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
- Sayaka Miura
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
18
Chen Z, Gong F, Wan L, Ma L. RobustClone: a robust PCA method for tumor clone and evolution inference from single-cell sequencing data. Bioinformatics 2020; 36:3299-3306. [PMID: 32159762 DOI: 10.1093/bioinformatics/btaa172] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 06/10/2019] [Revised: 02/10/2020] [Accepted: 03/06/2020] [Indexed: 12/17/2022]
Abstract
MOTIVATION Single-cell sequencing (SCS) data provide unprecedented insights into intratumoral heterogeneity. With SCS, we can better characterize clonal genotypes and reconstruct phylogenetic relationships of tumor cells/clones. However, SCS data are often error-prone, making their computational analysis challenging. RESULTS To infer the clonal evolution in tumor from the error-prone SCS data, we developed an efficient computational framework, termed RobustClone. It recovers the true genotypes of subclones based on the extended robust principal component analysis, a low-rank matrix decomposition method, and reconstructs the subclonal evolutionary tree. RobustClone is a model-free method, which can be applied to both single-cell single nucleotide variation (scSNV) and single-cell copy-number variation (scCNV) data. It is efficient and scalable to large-scale datasets. We conducted a set of systematic evaluations on simulated datasets and demonstrated that RobustClone outperforms state-of-the-art methods in large-scale data both in accuracy and efficiency. We further validated RobustClone on two scSNV and two scCNV datasets and demonstrated that RobustClone could recover genotype matrix and infer the subclonal evolution tree accurately under various scenarios. In particular, RobustClone revealed the spatial progression patterns of subclonal evolution on the large-scale 10X Genomics scCNV breast cancer dataset. AVAILABILITY AND IMPLEMENTATION RobustClone software is available at https://github.com/ucasdp/RobustClone. CONTACT lwan@amss.ac.cn or maliang@ioz.ac.cn. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ziwei Chen
  - NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Fuzhou Gong
  - NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Lin Wan
  - NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Liang Ma
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
19
Wu Y. Accurate and efficient cell lineage tree inference from noisy single cell data: the maximum likelihood perfect phylogeny approach. Bioinformatics 2020; 36:742-750. [PMID: 31504211 DOI: 10.1093/bioinformatics/btz676] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/20/2019] [Revised: 08/21/2019] [Accepted: 08/27/2019] [Indexed: 11/14/2022]
Abstract
MOTIVATION Cells in an organism share a common evolutionary history, called cell lineage tree. Cell lineage tree can be inferred from single cell genotypes at genomic variation sites. Cell lineage tree inference from noisy single cell data is a challenging computational problem. Most existing methods for cell lineage tree inference assume uniform uncertainty in genotypes. A key missing aspect is that real single cell data usually has non-uniform uncertainty in individual genotypes. Moreover, existing methods are often sampling based and can be very slow for large data. RESULTS In this article, we propose a new method called ScisTree, which infers cell lineage tree and calls genotypes from noisy single cell genotype data. Different from most existing approaches, ScisTree works with genotype probabilities of individual genotypes (which can be computed by existing single cell genotype callers). ScisTree assumes the infinite sites model. Given uncertain genotypes with individualized probabilities, ScisTree implements a fast heuristic for inferring cell lineage tree and calling the genotypes that allow the so-called perfect phylogeny and maximize the likelihood of the genotypes. Through simulation, we show that ScisTree performs well on the accuracy of inferred trees, and is much more efficient than existing methods. The efficiency of ScisTree enables new applications including imputation of the so-called doublets. AVAILABILITY AND IMPLEMENTATION The program ScisTree is available for download at: https://github.com/yufengwudcs/ScisTree. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
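Under the infinite sites model mentioned in this abstract, each mutation arises once and is never lost, so error-free binary genotypes must fit a perfect phylogeny. The sketch below shows the standard pairwise compatibility check for that property: with an unmutated root, the sets of cells carrying any two mutations must be either disjoint or nested. The function name `admits_perfect_phylogeny` is hypothetical, and the check assumes a complete 0/1 cell-by-mutation matrix with no missing entries; it is not ScisTree's likelihood-based heuristic, only the condition its output genotypes are meant to satisfy.

```python
from itertools import combinations


def admits_perfect_phylogeny(matrix):
    """Return True if a binary cell-by-mutation matrix fits a rooted perfect phylogeny.

    matrix: list of rows (one per cell), each a list of 0/1 entries
            (one per mutation). The root is assumed to carry no mutations.
    """
    n_sites = len(matrix[0]) if matrix else 0
    # For each mutation, collect the set of cells that carry it.
    carriers = [
        {cell for cell, row in enumerate(matrix) if row[site] == 1}
        for site in range(n_sites)
    ]
    for first, second in combinations(carriers, 2):
        shared = first & second
        # Two mutations conflict when their carrier sets overlap
        # without one containing the other.
        if shared and shared != first and shared != second:
            return False
    return True


# Hypothetical toy data: three cells, mutations as columns.
compatible = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
conflicting = [[1, 0], [0, 1], [1, 1]]
print(admits_perfect_phylogeny(compatible))
print(admits_perfect_phylogeny(conflicting))
```

On noisy single-cell data this check usually fails, which is why methods in this list either correct genotypes until it holds or relax the infinite sites assumption.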
Affiliation(s)
- Yufeng Wu
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
20
DiNardo Z, Tomlinson K, Ritz A, Oesper L. Distance measures for tumor evolutionary trees. Bioinformatics 2020; 36:2090-2097. [PMID: 31750900 PMCID: PMC7141873 DOI: 10.1093/bioinformatics/btz869] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 05/21/2019] [Revised: 09/04/2019] [Accepted: 11/19/2019] [Indexed: 12/14/2022]
Abstract
MOTIVATION There has been recent increased interest in using algorithmic methods to infer the evolutionary tree underlying the developmental history of a tumor. Quantitative measures that compare such trees are vital to a number of different applications including benchmarking tree inference methods and evaluating common inheritance patterns across patients. However, few appropriate distance measures exist, and those that do have low resolution for differentiating trees or do not fully account for the complex relationship between tree topology and the inheritance of the mutations labeling that topology. RESULTS Here, we present two novel distance measures, Common Ancestor Set distance (CASet) and Distinctly Inherited Set Comparison distance (DISC), that are specifically designed to account for the subclonal mutation inheritance patterns characteristic of tumor evolutionary trees. We apply CASet and DISC to multiple simulated datasets and two breast cancer datasets and show that our distance measures allow for more nuanced and accurate delineation between tumor evolutionary trees than existing distance measures. AVAILABILITY AND IMPLEMENTATION Implementations of CASet and DISC are freely available at: https://bitbucket.org/oesperlab/stereodist. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
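The abstract does not spell out how a common-ancestor-based tree distance is computed, so the following Python sketch is only an illustration of the general idea under stated assumptions: each tree node carries exactly one mutation, both trees share the same mutation set, and the distance is the Jaccard distance between common-ancestor sets averaged over all mutation pairs. The names `ancestor_set`, `jaccard_distance` and `common_ancestor_distance` are hypothetical; for the exact definitions of CASet and DISC, the paper and the authors' implementation remain the reference.

```python
from itertools import combinations


def ancestor_set(parent, node):
    """Return the node itself plus every ancestor up to the root."""
    ancestors = set()
    while node is not None:
        ancestors.add(node)
        node = parent[node]
    return ancestors


def jaccard_distance(a, b):
    """Jaccard distance between two sets; two empty sets have distance 0."""
    union = a | b
    return 0.0 if not union else 1 - len(a & b) / len(union)


def common_ancestor_distance(parent_a, parent_b):
    """Average, over mutation pairs, of the Jaccard distance between the
    sets of ancestors that the two mutations share in each tree.

    parent_a, parent_b: dicts mapping each mutation to its parent mutation
    (the root maps to None). Both trees need the same two or more mutations.
    """
    pairs = list(combinations(sorted(parent_a), 2))
    total = 0.0
    for i, j in pairs:
        shared_a = ancestor_set(parent_a, i) & ancestor_set(parent_a, j)
        shared_b = ancestor_set(parent_b, i) & ancestor_set(parent_b, j)
        total += jaccard_distance(shared_a, shared_b)
    return total / len(pairs)


# Hypothetical toy trees on mutations a, b, c: a linear chain and a star.
chain = {"a": None, "b": "a", "c": "b"}
star = {"a": None, "b": "a", "c": "a"}
print(common_ancestor_distance(chain, chain))
print(common_ancestor_distance(chain, star))
```

In the toy example only the pair (b, c) differs between the two trees, so the distance comes from that one pair averaged over three pairs.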
Affiliation(s)
- Zach DiNardo
  - Department of Computer Science, Carleton College, Northfield, MN 55057, USA
- Kiran Tomlinson
  - Department of Computer Science, Carleton College, Northfield, MN 55057, USA
  - Department of Computer Science, Cornell University, Ithaca, NY 14853, USA
- Anna Ritz
  - Department of Biology, Reed College, Portland, OR 97202, USA
- Layla Oesper
  - Department of Computer Science, Carleton College, Northfield, MN 55057, USA
21
Miura S, Vu T, Deng J, Buturla T, Oladeinde O, Choi J, Kumar S. Power and pitfalls of computational methods for inferring clone phylogenies and mutation orders from bulk sequencing data. Sci Rep 2020; 10:3498. [PMID: 32103044 PMCID: PMC7044161 DOI: 10.1038/s41598-020-59006-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/03/2020] [Accepted: 01/23/2020] [Indexed: 12/13/2022]
Abstract
Tumors harbor extensive genetic heterogeneity in the form of distinct clone genotypes that arise over time and across different tissues and regions in cancer. Many computational methods produce clone phylogenies from population bulk sequencing data collected from multiple tumor samples from a patient. These clone phylogenies are used to infer mutation order and clone origins during tumor progression, rendering the selection of the appropriate clonal deconvolution method critical. Surprisingly, absolute and relative accuracies of these methods in correctly inferring clone phylogenies are yet to be consistently assessed. Therefore, we evaluated the performance of seven computational methods. The accuracy of the reconstructed mutation order and inferred clone groupings varied extensively among methods. All the tested methods showed limited ability to identify ancestral clone sequences present in tumor samples correctly. The presence of copy number alterations, the occurrence of multiple seeding events among tumor sites during metastatic tumor evolution, and extensive intermixture of cancer cells among tumors hindered the detection of clones and the inference of clone phylogenies for all methods tested. Overall, CloneFinder, MACHINA, and LICHeE showed the highest overall accuracy, but none of the methods performed well for all simulated datasets. We therefore present guidelines for selecting methods for data analysis.
Affiliation(s)
- Sayaka Miura
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Tracy Vu
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Jiamin Deng
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Tiffany Buturla
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Olumide Oladeinde
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Jiyeong Choi
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA, 19122, USA
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
22
Yu Z, Du F, Sun X, Li A. SCSsim: an integrated tool for simulating single-cell genome sequencing data. Bioinformatics 2020; 36:1281-1282. [PMID: 31584615 PMCID: PMC7703785 DOI: 10.1093/bioinformatics/btz713] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/18/2019] [Revised: 08/20/2019] [Accepted: 09/15/2019] [Indexed: 11/30/2022]
Abstract
Motivation: Allele dropout (ADO) and unbalanced amplification of alleles are the main technical issues of single-cell sequencing (SCS), and effectively emulating these issues is necessary for reliably benchmarking SCS-based bioinformatics tools. Unfortunately, currently available sequencing simulators do not model the whole-genome amplification involved in SCS and are therefore not suited for generating SCS datasets. We develop a new software package (SCSsim) that can efficiently simulate SCS datasets in parallel with minimal user intervention. SCSsim first constructs the genome sequence of a single cell by introducing a user-controlled complement of genomic variations, then amplifies the genome according to the MALBAC technique, and finally yields sequencing reads from the amplified products based on inferred sequencing profiles. Comprehensive evaluation in simulating different ADO rates, variation detection efficiency and genome coverage demonstrates that SCSsim is a useful and efficient tool for mimicking single-cell sequencing data. Availability and implementation: SCSsim is freely available at https://github.com/qasimyu/scssim. Supplementary information: Supplementary data are available at Bioinformatics online.
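As a rough illustration of what allele dropout does to single-cell genotypes, the Python sketch below applies ADO directly to a binary genotype matrix. It is a deliberately simplified, genotype-level model and not SCSsim's read-level simulation of MALBAC amplification: every site is assumed diploid, mutated sites are assumed heterozygous, and each of the two alleles is lost independently with probability `ado_rate`. The function name `apply_allele_dropout` and its parameters are hypothetical.

```python
import random


def apply_allele_dropout(genotypes, ado_rate, seed=None):
    """Return an observed genotype matrix after simulated allele dropout.

    genotypes: list of rows (cells) with entries 1 (heterozygous mutation)
               or 0 (no mutation).
    ado_rate:  probability that a single allele fails to amplify.
    Observed entries are 1, 0, or None (both alleles lost, site missing).
    """
    rng = random.Random(seed)
    observed = []
    for row in genotypes:
        observed_row = []
        for genotype in row:
            # For a mutated site the first draw is the mutant allele;
            # for an unmutated site both alleles are reference.
            first_lost = rng.random() < ado_rate
            second_lost = rng.random() < ado_rate
            if first_lost and second_lost:
                observed_row.append(None)   # nothing amplified: missing value
            elif genotype == 1 and first_lost:
                observed_row.append(0)      # mutant allele lost: false negative
            else:
                observed_row.append(genotype)
        observed.append(observed_row)
    return observed


# Hypothetical toy matrix: two cells, three sites.
truth = [[1, 0, 1], [0, 1, 1]]
print(apply_allele_dropout(truth, ado_rate=0.2, seed=1))
```

Setting `ado_rate` to 0 returns the input unchanged and setting it to 1 turns every entry into a missing value, which gives two easy sanity checks for the model.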
Affiliation(s)
- Zhenhua Yu
  - Department of Software Engineering, Ningxia University, Yinchuan 750021, China
- Fang Du
  - Department of Software Engineering, Ningxia University, Yinchuan 750021, China
- Xuehong Sun
  - Department of Software Engineering, Ningxia University, Yinchuan 750021, China
- Ao Li
  - Department of Electronic Science and Technology, University of Science and Technology of China, Hefei 230027, China
23
Somarelli JA, Gardner H, Cannataro VL, Gunady EF, Boddy AM, Johnson NA, Fisk JN, Gaffney SG, Chuang JH, Li S, Ciccarelli FD, Panchenko AR, Megquier K, Kumar S, Dornburg A, DeGregori J, Townsend JP. Molecular Biology and Evolution of Cancer: From Discovery to Action. Mol Biol Evol 2020; 37:320-326. [PMID: 31642480 PMCID: PMC6993850 DOI: 10.1093/molbev/msz242] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Indexed: 12/26/2022]
Abstract
Cancer progression is an evolutionary process. During this process, evolving cancer cell populations encounter restrictive ecological niches within the body, such as the primary tumor, circulatory system, and diverse metastatic sites. Efforts to prevent or delay cancer evolution (and progression) require a deep understanding of the underlying molecular evolutionary processes. Herein we discuss a suite of concepts and tools from evolutionary and ecological theory that can inform cancer biology in new and meaningful ways. We also highlight current challenges to applying these concepts, and propose ways in which incorporating these concepts could identify new therapeutic modes and vulnerabilities in cancer.
Affiliation(s)
- Jason A Somarelli
  - Department of Medicine, Duke University Medical Center, Durham, NC
  - Duke Cancer Institute, Duke University Medical Center, Durham, NC
- Heather Gardner
  - Sackler School of Graduate Biomedical Sciences, Tufts University, Medford, MA
- Ella F Gunady
  - Department of Medicine, Duke University Medical Center, Durham, NC
- Amy M Boddy
  - Department of Anthropology, University of California, Santa Barbara, CA
- Stephen G Gaffney
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT
- Sheng Li
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT
- Francesca D Ciccarelli
  - Cancer Systems Biology Laboratory, The Francis Crick Institute, London, United Kingdom
  - King’s College London, London, United Kingdom
- Anna R Panchenko
  - Department of Pathology and Molecular Medicine, School of Medicine, Queen’s University, Kingston, ON, Canada
  - Ontario Institute of Cancer Research, Toronto, ON, Canada
- Kate Megquier
  - Broad Institute, Massachusetts Institute of Technology and Harvard University
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, and Department of Biology, Temple University, Philadelphia, PA
- Alex Dornburg
  - North Carolina Museum of Natural Sciences, Raleigh, NC
- James DeGregori
  - Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO
- Jeffrey P Townsend
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT
24
Ismail WM, Nzabarushimana E, Tang H. Algorithmic approaches to clonal reconstruction in heterogeneous cell populations. Quantitative Biology 2019; 7:255-265. [PMID: 32431959 PMCID: PMC7236794 DOI: 10.1007/s40484-019-0188-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/02/2019] [Revised: 08/09/2019] [Accepted: 08/25/2019] [Indexed: 12/15/2022]
Abstract
BACKGROUND: The reconstruction of clonal haplotypes and their evolutionary history in evolving populations is a common problem in both microbial evolutionary biology and cancer biology. The clonal theory of evolution provides a theoretical framework for modeling the evolution of clones. RESULTS: In this paper, we review the theoretical framework and assumptions over which the clonal reconstruction problem is formulated. We formally define the problem and then discuss the complexity and solution space of the problem. Various methods have been proposed to find the phylogeny that best explains the observed data. We categorize these methods based on the type of input data that they use (space-resolved or time-resolved), and also based on their computational formulation as either combinatorial or probabilistic. It is crucial to understand the different types of input data because each provides essential but distinct information for drastically reducing the solution space of the clonal reconstruction problem. Complementary information provided by single cell sequencing or from whole genome sequencing of randomly isolated clones can also improve the accuracy of clonal reconstruction. We briefly review the existing algorithms and their relationships. Finally, we summarize the tools that are developed for either directly solving the clonal reconstruction problem or a related computational problem. CONCLUSIONS: In this review, we discuss the various formulations of the problem of inferring the clonal evolutionary history from allele frequency data, review existing algorithms and categorize them according to their problem formulation and solution approaches. We note that most of the available clonal inference algorithms were developed for elucidating tumor evolution, whereas clonal reconstruction for unicellular genomes is less addressed. We conclude the review by discussing more open problems such as the lack of benchmark datasets and comparison of performance between available tools.
Affiliation(s)
- Wazim Mohammed Ismail
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN 47405-7000, USA
- Etienne Nzabarushimana
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN 47405-7000, USA
- Haixu Tang
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN 47405-7000, USA
25
Chroni A, Vu T, Miura S, Kumar S. Delineation of Tumor Migration Paths by Using a Bayesian Biogeographic Approach. Cancers (Basel) 2019; 11:E1880. [PMID: 31783570 PMCID: PMC6966534 DOI: 10.3390/cancers11121880] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/17/2019] [Revised: 11/20/2019] [Accepted: 11/26/2019] [Indexed: 12/20/2022]
Abstract
Understanding tumor progression and metastatic potential are important in cancer biology. Metastasis is the migration and colonization of clones in secondary tissues. Here, we posit that clone migration events between tumors resemble the dispersal of individuals between distinct geographic regions. This similarity makes Bayesian biogeographic analysis suitable for inferring cancer cell migration paths. We evaluated the accuracy of a Bayesian biogeography method (BBM) in inferring metastatic patterns and compared it with the accuracy of a parsimony-based approach (metastatic and clonal history integrative analysis, MACHINA) that has been specifically developed to infer clone migration patterns among tumors. We used computer-simulated datasets in which simple to complex migration patterns were modeled. BBM and MACHINA were effective in reliably reconstructing simple migration patterns from primary tumors to metastases. However, both of them exhibited a limited ability to accurately infer complex migration paths that involve the migration of clones from one metastatic tumor to another and from metastasis to the primary tumor. Therefore, advanced computational methods are still needed for the biologically realistic tracing of migration paths and to assess the relative preponderance of different types of seeding and reseeding events during cancer progression in patients.
Affiliation(s)
- Antonia Chroni
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Tracy Vu
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Sayaka Miura
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia