Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 20 (from Reference Citation Analysis)
Article PDFs: 10
Cited by > 0: 16
Searched Name: Amit G. Deshwar

Number	Citation Analysis
1	Author Correction: The evolutionary history of 2,658 cancers. Nature 2023;614:E42. [PMID: 36697833 PMCID: PMC9931577 DOI: 10.1038/s41586-022-05601-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 01/26/2023]
Key Words: cancer genomics; computational biology and bioinformatics; molecular evolution
2	Author Correction: Genomic basis for RNA alterations in cancer. Nature 2023;614:E37. [PMID: 36697831 PMCID: PMC9931574 DOI: 10.1038/s41586-022-05596-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
Key Words: cancer genomics; data integration
3	Author Correction: Reconstructing evolutionary trajectories of mutation signature activities in cancer using TrackSig. Nat Commun 2022;13:7567. [PMID: 36482170 PMCID: PMC9731941 DOI: 10.1038/s41467-022-32336-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
Key Words: cancer genetics; computational models
Grants: 27176 Cancer Research UK
4	Transcriptome-Wide Off-Target Effects of Steric-Blocking Oligonucleotides. Nucleic Acid Ther 2021;31:392-403. [PMID: 34388351 PMCID: PMC8713556 DOI: 10.1089/nat.2020.0921] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 11/26/2020] [Accepted: 07/06/2021] [Indexed: 11/29/2022]
Abstract: Steric-blocking oligonucleotides (SBOs) are short, single-stranded nucleic acids designed to modulate gene expression by binding to RNA transcripts and blocking access from cellular machinery such as splicing factors. SBOs have the potential to bind to near-complementary sites in the transcriptome, causing off-target effects. In this study, we used RNA-seq to evaluate the off-target differential splicing events of 81 SBOs and differential expression events of 46 SBOs. Our results suggest that differential splicing events are predominantly hybridization driven, whereas differential expression events are more common and driven by other mechanisms (including spurious experimental variation). We further evaluated the performance of in silico screens for off-target splicing events, and found an edit distance cutoff of three to result in a sensitivity of 14% and false discovery rate (FDR) of 99%. A machine learning model incorporating splicing predictions substantially improved the ability to prioritize low edit distance hits, increasing sensitivity from 4% to 26% at a fixed FDR of 90%. Despite these large improvements in performance, this approach does not detect the majority of events at an FDR <99%. Our results suggest that in silico methods are currently of limited use for predicting the off-target effects of SBOs, and experimental screening by RNA-seq should be the preferred approach.
Key Words: off-target effects; splice-switching; steric-blocking oligonucleotides
MeSH Headings: Alternative Splicing; Oligonucleotides/genetics; Oligonucleotides, Antisense; RNA/genetics; RNA/metabolism; RNA Splicing/genetics; Transcriptome
5	Characterizing genetic intra-tumor heterogeneity across 2,658 human cancer genomes. Cell 2021;184:2239-2254.e39. [PMID: 33831375 PMCID: PMC8054914 DOI: 10.1016/j.cell.2021.03.009] [Citation(s) in RCA: 199] [Impact Index Per Article: 66.3] [Received: 04/06/2020] [Revised: 09/21/2020] [Accepted: 03/03/2021] [Indexed: 02/07/2023]
Abstract: Intra-tumor heterogeneity (ITH) is a mechanism of therapeutic resistance and therefore an important clinical challenge. However, the extent, origin, and drivers of ITH across cancer types are poorly understood. To address this, we extensively characterize ITH across whole-genome sequences of 2,658 cancer samples spanning 38 cancer types. Nearly all informative samples (95.1%) contain evidence of distinct subclonal expansions with frequent branching relationships between subclones. We observe positive selection of subclonal driver mutations across most cancer types and identify cancer type-specific subclonal patterns of driver gene mutations, fusions, structural variants, and copy number alterations as well as dynamic changes in mutational processes between subclonal expansions. Our results underline the importance of ITH and its drivers in tumor evolution and provide a pan-cancer resource of comprehensively annotated subclonal events from whole-genome sequencing data.
Key Words: branching evolution; cancer driver genes; cancer evolution; intra-tumor heterogeneity; pan-cancer genomics; subclonal reconstruction; tumor phylogeny; whole-genome sequencing
MeSH Headings: DNA Copy Number Variations; DNA, Neoplasm/chemistry; DNA, Neoplasm/metabolism; Databases, Genetic; Drug Resistance, Neoplasm/genetics; Genetic Heterogeneity; Humans; Neoplasms/genetics; Neoplasms/pathology; Polymorphism, Single Nucleotide; Whole Genome Sequencing
Grants: 211179/Z/18/Z Wellcome Trust; P30 CA016672 NCI NIH HHS; 27176 Cancer Research UK; FC001202 Medical Research Council; R01 CA183793 NCI NIH HHS; U24 CA210957 NCI NIH HHS; FC001202 Cancer Research UK; WT097678 Wellcome Trust; MR/V000292/1 Medical Research Council; FC001169 Cancer Research UK; FC001169 Medical Research Council; 21777 Cancer Research UK; U24 CA143799 NCI NIH HHS; P30 CA008748 NCI NIH HHS; FC001202 Arthritis Research UK; R01 CA239342 NCI NIH HHS; MR/L016311/1 Medical Research Council; FC001169 Wellcome Trust; P50 CA211015 NCI NIH HHS; FC001169 Arthritis Research UK; 24956 Cancer Research UK; 21717 Cancer Research UK; Wellcome Trust; U24 CA210999 NCI NIH HHS; R01 CA132897 NCI NIH HHS; FC001202 Wellcome Trust
6	A practical guide to cancer subclonal reconstruction from DNA sequencing. Nat Methods 2021;18:144-155. [PMID: 33398189 PMCID: PMC7867630 DOI: 10.1038/s41592-020-01013-2] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 09/28/2019] [Accepted: 11/09/2020] [Indexed: 01/28/2023]
Abstract: Subclonal reconstruction from bulk tumor DNA sequencing has become a pillar of cancer evolution studies, providing insight into the clonality and relative ordering of mutations and mutational processes. We provide an outline of the complex computational approaches used for subclonal reconstruction from single and multiple tumor samples. We identify the underlying assumptions and uncertainties in each step and suggest best practices for analysis and quality assessment. This guide provides a pragmatic resource for the growing user community of subclonal reconstruction methods.
MeSH Headings: Algorithms; DNA, Neoplasm/genetics; Humans; Neoplasms/genetics; Polymorphism, Single Nucleotide; Sequence Analysis, DNA/methods
Grants: FC001202 Wellcome Trust; U24 CA248265 NCI NIH HHS; P30 CA016042 NCI NIH HHS; R01 CA244729 NCI NIH HHS; P50 CA211015 NCI NIH HHS; FC001202 Medical Research Council; Wellcome Trust; P30 CA008748 NCI NIH HHS; FC001202 Arthritis Research UK; U01 CA214194 NCI NIH HHS; FC001202 Cancer Research UK
7	ATP7B variant c.1934T > G p.Met645Arg causes Wilson disease by promoting exon 6 skipping. NPJ Genom Med 2020;5:16. [PMID: 32284880 PMCID: PMC7142117 DOI: 10.1038/s41525-020-0123-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/09/2019] [Accepted: 03/06/2020] [Indexed: 12/30/2022]
Abstract: Wilson disease is a recessive genetic disorder caused by pathogenic loss-of-function variants in the ATP7B gene. It is characterized by disrupted copper homeostasis resulting in liver disease and/or neurological abnormalities. The variant NM_000053.3:c.1934T > G (Met645Arg) has been reported as compound heterozygous, and is highly prevalent among Wilson disease patients of Spanish descent. Accordingly, it is classified as pathogenic by leading molecular diagnostic centers. However, functional studies suggest that the amino acid change does not alter protein function, leading one ClinVar submitter to question its pathogenicity. Here, we used a minigene system and gene-edited HepG2 cells to demonstrate that c.1934T > G causes ~70% skipping of exon 6. Exon 6 skipping results in frameshift and stop-gain, leading to loss of ATP7B function. The elucidation of the mechanistic effect for this variant resolves any doubt about its pathogenicity and enables the development of genetic medicines for restoring correct splicing.
Key Words: Medical genetics; Molecular medicine
8	Reconstructing evolutionary trajectories of mutation signature activities in cancer using TrackSig. Nat Commun 2020;11:731. [PMID: 32024834 PMCID: PMC7002414 DOI: 10.1038/s41467-020-14352-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 11/22/2018] [Accepted: 12/23/2019] [Indexed: 12/14/2022]
Abstract: The type and genomic context of cancer mutations depend on their causes. These causes have been characterized using signatures that represent mutation types that co-occur in the same tumours. However, it remains unclear how mutation processes change during cancer evolution due to the lack of reliable methods to reconstruct evolutionary trajectories of mutational signature activity. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole-genome sequencing data from 2658 cancers across 38 tumour types, we present TrackSig, a new method that reconstructs these trajectories using optimal, joint segmentation and deconvolution of mutation type and allele frequencies from a single tumour sample. In simulations, we find TrackSig has a 3-5% activity reconstruction error, and 12% false detection rate. It outperforms an aggressive baseline in situations with branching evolution, CNA gain, and neutral mutations. Applied to data from 2658 tumours and 38 cancer types, TrackSig permits pan-cancer insight into evolutionary changes in mutational processes.
Key Words: cancer genetics; computational models
MeSH Headings: Computational Biology/methods; Computer Simulation; Evolution, Molecular; Gene Frequency; Genome, Human; Humans; Mutation; Neoplasms/genetics; Neoplasms/pathology; Polymorphism, Single Nucleotide; Whole Genome Sequencing
Grants: 23924 Cancer Research UK; P30 CA008748 NCI NIH HHS; P30 CA016672 NCI NIH HHS; R01 CA183793 NCI NIH HHS
9	Pan-cancer analysis of whole genomes. Nature 2020;578:82-93. [PMID: 32025007 PMCID: PMC7025898 DOI: 10.1038/s41586-020-1969-6] [Citation(s) in RCA: 1435] [Impact Index Per Article: 358.8] [Received: 07/29/2018] [Accepted: 12/11/2019] [Indexed: 02/07/2023]
Abstract: Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale [1-3]. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter [4]; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation [5,6]; analyses timings and patterns of tumour evolution [7]; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity [8,9]; and evaluates a range of more-specialized features of cancer genomes [8,10-18].
Key Words: cancer genomics
MeSH Headings: Cell Proliferation/genetics; Cellular Senescence/genetics; Chromothripsis; Cloud Computing; DNA Mutational Analysis; Evolution, Molecular; Female; Genome, Human/genetics; Genomics; Germ-Line Mutation/genetics; High-Throughput Nucleotide Sequencing; Humans; Information Dissemination; Male; Mutagenesis/genetics; Mutation; Neoplasms/classification; Neoplasms/genetics; Neoplasms/pathology; Oncogenes/genetics; Promoter Regions, Genetic/genetics; RNA Splicing/genetics; Reproducibility of Results; Telomerase/genetics; Telomere/genetics
Grants: 15874 Cancer Research UK; 27815 Cancer Research UK; U24 CA210950 NCI NIH HHS; 18387 Cancer Research UK; MR/L016311/1 Medical Research Council; 22720 Cancer Research UK; T32 GM008313 NIGMS NIH HHS; R35 GM127029 NIGMS NIH HHS; 23433 Cancer Research UK; U24 CA211000 NCI NIH HHS; P30 ES010126 NIEHS NIH HHS; P30 CA016672 NCI NIH HHS; 20952 Cancer Research UK; 23916 Cancer Research UK; MC_UU_00016/11 Medical Research Council; MC_UU_00007/16 Medical Research Council; R01 CA235162 NCI NIH HHS; G0300648 Medical Research Council; 14545 Cancer Research UK; 16942 Cancer Research UK; 27176 Cancer Research UK; MC_U137961146 Medical Research Council; 22932 Cancer Research UK; R01 GM109031 NIGMS NIH HHS; R01 HG007069 NHGRI NIH HHS; T32 HG002295 NHGRI NIH HHS; U01 CA217842 NCI NIH HHS; RIF2015_A06_JAMIESON Pancreatic Cancer UK; U24 CA180951 NCI NIH HHS; R01 CA218668 NCI NIH HHS; U24 CA210974 NCI NIH HHS; 22131 Cancer Research UK; G1000729 Medical Research Council; 23924 Cancer Research UK; P01 CA240239 NCI NIH HHS; R01 CA183793 NCI NIH HHS; P30 CA014236 NCI NIH HHS; MR/L008963/1 Medical Research Council; MC_UU_12022/2 Medical Research Council; U24 CA210949 NCI NIH HHS; U24 CA210969 NCI NIH HHS; U24 CA210999 NCI NIH HHS; 23917 Cancer Research UK; 25813 Cancer Research UK; MC_UU_12009/11 Medical Research Council; 26718 Cancer Research UK; UG1 CA233339 NCI NIH HHS; 088177 Wellcome Trust; U24 CA210990 NCI NIH HHS
10	A community effort to create standards for evaluating tumor subclonal reconstruction. Nat Biotechnol 2020;38:97-107. [PMID: 31919445 PMCID: PMC6956735 DOI: 10.1038/s41587-019-0364-z] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 04/27/2018] [Accepted: 11/18/2019] [Indexed: 02/03/2023]
Abstract: Tumor DNA sequencing data can be interpreted by computational methods that analyze genomic heterogeneity to infer evolutionary dynamics. A growing number of studies have used these approaches to link cancer evolution with clinical progression and response to therapy. Although the inference of tumor phylogenies is rapidly becoming standard practice in cancer genome analyses, standards for evaluating them are lacking. To address this need, we systematically assess methods for reconstructing tumor subclonality. First, we elucidate the main algorithmic problems in subclonal reconstruction and develop quantitative metrics for evaluating them. Then we simulate realistic tumor genomes that harbor all known clonal and subclonal mutation types and processes. Finally, we benchmark 580 tumor reconstructions, varying tumor read depth, tumor type and somatic variant detection. Our analysis provides a baseline for the establishment of gold-standard methods to analyze tumor heterogeneity.
MeSH Headings: Algorithms; Clone Cells; Computer Simulation; DNA Copy Number Variations/genetics; Gene Dosage; Genome; Humans; Mutation/genetics; Neoplasms/genetics; Neoplasms/pathology; Polymorphism, Single Nucleotide/genetics; Reference Standards
Grants: R01 AI134384 NIAID NIH HHS; R01 GM109031 NIGMS NIH HHS; U41 HG006620 NHGRI NIH HHS; P30 CA016042 NCI NIH HHS; R01 CA180778 NCI NIH HHS; Wellcome Trust; U24 CA210990 NCI NIH HHS; P30 CA008748 NCI NIH HHS; MR/L016311/1 Medical Research Council; FC001202 Arthritis Research UK; R35 GM133346 NIGMS NIH HHS; U24 CA143858 NCI NIH HHS; FC001202 Medical Research Council; R01 CA183793 NCI NIH HHS
11	COSSMO: predicting competitive alternative splice site selection using deep learning. Bioinformatics 2019;34:i429-i437. [PMID: 29949959 PMCID: PMC6022534 DOI: 10.1093/bioinformatics/bty244] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 11/13/2022]
Abstract: Motivation: Alternative splice site selection is inherently competitive and the probability of a given splice site to be used also depends on the strength of neighboring sites. Here, we present a new model named the competitive splice site model (COSSMO), which explicitly accounts for these competitive effects and predicts the percent selected index (PSI) distribution over any number of putative splice sites. We model an alternative splicing event as the choice of a 3′ acceptor site conditional on a fixed upstream 5′ donor site or the choice of a 5′ donor site conditional on a fixed 3′ acceptor site. We build four different architectures that use convolutional layers, communication layers, long short-term memory and residual networks, respectively, to learn relevant motifs from sequence alone. We also construct a new dataset from genome annotations and RNA-Seq read data that we use to train our model. Results: COSSMO is able to predict the most frequently used splice site with an accuracy of 70% on unseen test data, and achieve an R² of 0.6 in modeling the PSI distribution. We visualize the motifs that COSSMO learns from sequence and show that COSSMO recognizes the consensus splice site sequences and many known splicing factors with high specificity. Availability and implementation: Model predictions, our training dataset, and code are available from http://cossmo.genes.toronto.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
12	Abstract B2-59: PhyloSpan: Using multi-mutation reads to resolve subclonal architectures from heterogeneous tumor samples. Cancer Res 2015. [DOI: 10.1158/1538-7445.compsysbio-b2-59] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/16/2022]
Abstract: We have developed a new method that uses high-throughput reads that span multiple somatic point mutations to reconstruct multiple, genetically diverse subclonal populations from one or more heterogeneous tumor samples. Tumors often contain multiple, genetically diverse subclonal populations, as predicted by the clonal theory of cancer. These subclonal populations develop through successive waves of expansion and selection and have differing abilities to metastasize and resist treatment. Identifying these sub-populations and their evolutionary relationships can help identify driver mutations associated with cancer development and progression. Subclonal reconstruction algorithms attempt to infer the prevalence and genotype of multiple, genetically-related subclonal populations using the variant allele frequency (VAF) of somatic variants. To date, these algorithms exclusively use data on individual somatic mutations. This restriction greatly reduces their ability to fully resolve phylogenic ambiguities. In some cases, it is possible to determine the mutation status of >1 mutation in a single cell, for example, when single reads cover multiple single nucleotide variants (SNVs). This type of information can add considerable power to the phylogenetic reconstruction of the tumor subclonal population. We have developed the PhyloSpan algorithm which attempts to infer the states of multiple SNVs in single cells, and then exploits that information in subclonal reconstruction. Our algorithm starts with phasing somatic SNVs by looking for reads / read-pairs that cover both a somatic mutation and germline heterozygous single nucleotide polymorphism (SNP). These germline SNPs are often available through profiling of normal tissue. PhyloSpan then identifies SNVs that are on the same chromosome and close enough to be covered by a single read or paired reads. These pairs of mutations provide more phylogenetic certainty than can be found by looking at mutations independently. For example, if those SNVs are found in the same evolutionary branch, then we expect to see some reads containing both mutations. If, however, the SNVs are on separate branches, then no reads should show both SNVs. PhyloSpan integrates this phylogenetic information, along with information about the VAF of each somatic SNV in order to perform subclonal reconstruction. Incorporating these various types of information, especially given the substantial uncertainty in phasing and NGS read content, requires a rigorous statistical approach and so we have developed a Bayesian non-parametric tree-based clustering algorithm, based on our existing PhyloWGS method. This algorithm not only infers the number of subclonal populations and their genotype but also provides a measure of uncertainty about this inference, enabling users to determine which parts of the subclonal reconstruction are certain and which parts remain ambiguous. While the number of SNVs a short-read length distance away from another SNV is small, a handful of such pairs are all that is needed to eliminate a substantial amount of ambiguity in subclonal reconstruction. Furthermore, long (>10k) read technologies, such as PacBio, can be used to supplement short read sequence. Our approach generalizes to permit the integration of single-cell sequencing with bulk tumor sequencing. Furthermore, we can also use our framework to identify a small number of SNVs for which low throughput assays would be most useful to resolve subclonal reconstruction ambiguity. We will present results applying our algorithm to whole genome sequencing data showing the added value of considering multiple SNVs compared to independent SNVs.
Citation Format: Amit G. Deshwar, Levi Boyles, Jeff Wintersinger, Paul C. Boutros, Yee Whye Teh, Quaid Morris. PhyloSpan: Using multi-mutation reads to resolve subclonal architectures from heterogeneous tumor samples. [abstract]. In: Proceedings of the AACR Special Conference on Computational and Systems Biology of Cancer; Feb 8-11 2015; San Francisco, CA. Philadelphia (PA): AACR; Cancer Res 2015;75(22 Suppl 2):Abstract nr B2-59.
13	Abstract 4865: PhyloSpan: using multi-mutation reads to resolve subclonal architectures from heterogeneous tumor samples. Cancer Res 2015. [DOI: 10.1158/1538-7445.am2015-4865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract: We have developed a new method that uses high-throughput reads that span multiple somatic point mutations to reconstruct multiple, genetically diverse subclonal populations from one or more heterogeneous tumor samples. Subclonal reconstruction algorithms attempt to infer the prevalence and genotype of multiple, genetically-related subclonal populations using the variant allele frequency (VAF) of somatic variants. To date, these algorithms exclusively use data on individual somatic mutations. This restriction greatly reduces their ability to fully resolve phylogenic ambiguities. In some cases, it is possible to determine the mutation status of >1 mutation in a single cell, for example, when single reads cover multiple single nucleotide variants (SNVs). This type of information can add considerable power to the phylogenetic reconstruction of the tumor subclonal population. We have developed the PhyloSpan algorithm which attempts to infer the states of multiple SNVs in single cells, and then exploits that information in subclonal reconstruction. Our algorithm starts with phasing somatic SNVs by looking for reads / read-pairs that cover both a somatic mutation and germline heterozygous single nucleotide polymorphism (SNP). These germline SNPs are often available through profiling of normal tissue. PhyloSpan then identifies SNVs that are on the same chromosome and close enough to be covered by a single read or paired reads. These pairs of mutations provide more phylogenetic certainty than can be found by looking at mutations independently. For example, if those SNVs are found in the same evolutionary branch, then we expect to see some reads containing both mutations. If, however, the SNVs are on separate branches, then no reads should show both SNVs. PhyloSpan integrates this phylogenetic information, along with information about the VAF of each somatic SNV in order to perform subclonal reconstruction. Incorporating these various types of information requires a rigorous statistical approach, and so we have developed a Bayesian non-parametric tree-based clustering algorithm. This algorithm not only infers the number of subclonal populations and their genotype but also provides a measure of uncertainty about this inference, enabling users to determine which parts of the subclonal reconstruction are certain and which parts remain ambiguous. While the number of SNVs a short-read length distance away from another SNV is small, a handful of such pairs are all that is needed to eliminate a substantial amount of ambiguity in subclonal reconstruction. Furthermore, long read technologies, such as PacBio, can be used to supplement short reads. Our approach generalizes to permit the integration of single-cell sequencing with bulk tumor sequencing. We will present results applying our algorithm to whole genome sequencing data showing the added value of considering multiple SNVs compared to independent SNVs.
Citation Format: Amit G. Deshwar, Levi Boyles, Jeff Wintersinger, Paul C. Boutros, Yee Whye Teh, Quaid Morris. PhyloSpan: using multi-mutation reads to resolve subclonal architectures from heterogeneous tumor samples. [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr 4865. doi:10.1158/1538-7445.AM2015-4865
14	ISOpureR: an R implementation of a computational purification algorithm of mixed tumour profiles. BMC Bioinformatics 2015;16:156. [PMID: 25972088 PMCID: PMC4429941 DOI: 10.1186/s12859-015-0597-x] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 11/21/2014] [Accepted: 04/27/2015] [Indexed: 01/23/2023]
Abstract: Background: Tumour samples containing distinct sub-populations of cancer and normal cells present challenges in the development of reproducible biomarkers, as these biomarkers are based on bulk signals from mixed tumour profiles. ISOpure is the only mRNA computational purification method to date that does not require a paired tumour-normal sample, provides a personalized cancer profile for each patient, and has been tested on clinical data. Replacing mixed tumour profiles with ISOpure-preprocessed cancer profiles led to better prognostic gene signatures for lung and prostate cancer. Results: To simplify the integration of ISOpure into standard R-based bioinformatics analysis pipelines, the algorithm has been implemented as an R package. The ISOpureR package performs analogously to the original code in estimating the fraction of cancer cells and the patient cancer mRNA abundance profile from tumour samples in four cancer datasets. Conclusions: The ISOpureR package estimates the fraction of cancer cells and personalized patient cancer mRNA abundance profile from a mixed tumour profile. This open-source R implementation enables integration into existing computational pipelines, as well as easy testing, modification and extension of the model. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0597-x) contains supplementary material, which is available to authorized users.
15	PhyloWGS: reconstructing subclonal composition and evolution from whole-genome sequencing of tumors. Genome Biol 2015;16:35. [PMID: 25786235 PMCID: PMC4359439 DOI: 10.1186/s13059-015-0602-8] [Citation(s) in RCA: 243] [Impact Index Per Article: 27.0] [Received: 06/27/2014] [Accepted: 01/29/2015] [Indexed: 01/08/2023]
Abstract: Tumors often contain multiple subpopulations of cancerous cells defined by distinct somatic mutations. We describe a new method, PhyloWGS, which can be applied to whole-genome sequencing data from one or more tumor samples to reconstruct complete genotypes of these subpopulations based on variant allele frequencies (VAFs) of point mutations and population frequencies of structural variations. We introduce a principled phylogenic correction for VAFs in loci affected by copy number alterations and we show that this correction greatly improves subclonal reconstruction compared to existing methods. PhyloWGS is free, open-source software, available at https://github.com/morrislab/phylowgs.
16	Comparing nonparametric Bayesian tree priors for clonal reconstruction of tumors. Pacific Symposium on Biocomputing 2015:20-31. [PMID: 25592565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023] Abstract: Statistical machine learning methods, especially nonparametric Bayesian methods, have become increasingly popular to infer clonal population structure of tumors. Here we describe the treeCRP, an extension of the Chinese restaurant process (CRP), a popular construction used in nonparametric mixture models, to infer the phylogeny and genotype of major subclonal lineages represented in the population of cancer cells. We also propose new split-merge updates tailored to the subclonal reconstruction problem that improve the mixing time of Markov chains. In comparisons with the tree-structured stick breaking prior used in PhyloSub, we demonstrate superior mixing and running time using the treeCRP with our new split-merge procedures. We also show that given the same number of samples, TSSB and treeCRP have similar ability to recover the subclonal structure of a tumor… MESH Headings: Algorithms; Bayes Theorem; Computational Biology; Computer Simulation; Gene Frequency; Genotype; Humans; Leukemia, Lymphocytic, Chronic, B-Cell/genetics; Leukemia, Lymphocytic, Chronic, B-Cell/pathology; Likelihood Functions; Machine Learning; Models, Biological; Models, Statistical; Mutation; Neoplasms/genetics; Neoplasms/pathology; Neoplastic Stem Cells/pathology; Phylogeny; Statistics, Nonparametric
17	Relapsing-remitting multiple sclerosis classification using elastic net logistic regression on gene expression data. Systems Biomedicine 2014. [DOI: 10.4161/sysb.26131] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
18	Inferring clonal evolution of tumors from single nucleotide somatic mutations. BMC Bioinformatics 2014;15:35. [PMID: 24484323 PMCID: PMC3922638 DOI: 10.1186/1471-2105-15-35] [Citation(s) in RCA: 182] [Impact Index Per Article: 18.2] [Received: 05/21/2013] [Accepted: 01/24/2014] [Indexed: 01/13/2023] Abstract: Background: High-throughput sequencing allows the detection and quantification of frequencies of somatic single nucleotide variants (SNV) in heterogeneous tumor cell populations. In some cases, the evolutionary history and population frequency of the subclonal lineages of tumor cells present in the sample can be reconstructed from these SNV frequency measurements. But automated methods to do this reconstruction are not available and the conditions under which reconstruction is possible have not been described. Results: We describe the conditions under which the evolutionary history can be uniquely reconstructed from SNV frequencies from single or multiple samples from the tumor population and we introduce a new statistical model, PhyloSub, that infers the phylogeny and genotype of the major subclonal lineages represented in the population of cancer cells. It uses a Bayesian nonparametric prior over trees that groups SNVs into major subclonal lineages and automatically estimates the number of lineages and their ancestry. We sample from the joint posterior distribution over trees to identify evolutionary histories and cell population frequencies that have the highest probability of generating the observed SNV frequency data. When multiple phylogenies are consistent with a given set of SNV frequencies, PhyloSub represents the uncertainty in the tumor phylogeny using a “partial order plot”. Experiments on a simulated dataset and two real datasets comprising tumor samples from acute myeloid leukemia and chronic lymphocytic leukemia patients demonstrate that PhyloSub can infer both linear (or chain) and branching lineages and its inferences are in good agreement with ground truth, where it is available. Conclusions: PhyloSub can be applied to frequencies of any “binary” somatic mutation, including SNVs as well as small insertions and deletions. The PhyloSub and partial order plot software is available from https://github.com/morrislab/phylosub/.
19	PLIDA: cross-platform gene expression normalization using perturbed topic models. Bioinformatics 2013;30:956-61. [PMID: 24123674 DOI: 10.1093/bioinformatics/btt574] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022] Abstract: Motivation: Gene expression data are currently collected on a wide range of platforms. Differences between platforms make it challenging to combine and compare data collected on different platforms. We propose a new method of cross-platform normalization that uses topic models to summarize the expression patterns in each dataset before normalizing the topics learned from each dataset using per-gene multiplicative weights. Results: This method allows for cross-platform normalization even when samples profiled on different platforms have systematic differences, allows the simultaneous normalization of data from an arbitrary number of platforms and, after suitable training, allows for online normalization of expression data collected individually or in small batches. In addition, our method outperforms existing state-of-the-art platform normalization tools. Availability and implementation: MATLAB code is available at http://morrislab.med.utoronto.ca/plida/.
20	Computational purification of individual tumor gene expression profiles leads to significant improvements in prognostic prediction. Genome Med 2013;5:29. [PMID: 23537167 PMCID: PMC3706990 DOI: 10.1186/gm433] [Citation(s) in RCA: 84] [Impact Index Per Article: 7.6] [Received: 09/13/2012] [Accepted: 03/28/2013] [Indexed: 11/10/2022] Abstract: Tumor heterogeneity is a limiting factor in cancer treatment and in the discovery of biomarkers to personalize it. We describe a computational purification tool, ISOpure, to directly address the effects of variable normal tissue contamination in clinical tumor specimens. ISOpure uses a set of tumor expression profiles and a panel of healthy tissue expression profiles to generate a purified cancer profile for each tumor sample and an estimate of the proportion of RNA originating from cancerous cells. Applying ISOpure before identifying gene signatures leads to significant improvements in the prediction of prognosis and other clinical variables in lung and prostate cancer.