1. Atzeni R, Massidda M, Pieroni E, Rallo V, Pisu M, Angius A. A Novel Affordable and Reliable Framework for Accurate Detection and Comprehensive Analysis of Somatic Mutations in Cancer. Int J Mol Sci 2024; 25:8044. [PMID: 39125613 PMCID: PMC11311285 DOI: 10.3390/ijms25158044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 07/11/2024] [Accepted: 07/22/2024] [Indexed: 08/12/2024]
Abstract
Accurate detection and analysis of somatic variants in cancer involve multiple third-party tools with complex dependencies and configurations, leading to laborious, error-prone, and time-consuming data conversions. This approach lacks accuracy, reproducibility, and portability, limiting clinical application. Musta was developed to address these issues as an end-to-end pipeline for detecting, classifying, and interpreting cancer mutations. Musta is based on a Python command-line tool designed to manage tumor-normal samples for precise somatic mutation analysis. The core is a Snakemake-based workflow that covers all key cancer genomics steps, including variant calling, mutational signature deconvolution, variant annotation, driver gene detection, pathway analysis, and tumor heterogeneity estimation. Musta is easy to install on any system via Docker, with a Makefile handling installation, configuration, and execution, allowing for full or partial pipeline runs. Musta has been validated at the CRS4-NGS Core facility and tested on large datasets from The Cancer Genome Atlas and the Beijing Institute of Genomics. Musta has proven robust and flexible for somatic variant analysis in cancer. It is user-friendly, requiring no specialized programming skills, and enables data processing with a single command line. Its reproducibility ensures consistent results across users following the same protocol.
Affiliation(s)
- Rossano Atzeni
  - Center for Advanced Studies, Research and Development in Sardinia (CRS4), 09050 Pula, Italy
- Matteo Massidda
  - Department of Medical, Surgical and Experimental Sciences, University of Sassari, 07100 Sassari, Italy
- Enrico Pieroni
  - Center for Advanced Studies, Research and Development in Sardinia (CRS4), 09050 Pula, Italy
- Vincenzo Rallo
  - Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cittadella Universitaria di Cagliari, 09042 Monserrato, Italy
- Massimo Pisu
  - Center for Advanced Studies, Research and Development in Sardinia (CRS4), 09050 Pula, Italy
- Andrea Angius
  - Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cittadella Universitaria di Cagliari, 09042 Monserrato, Italy
2. Krull JE, Wenzl K, Hopper MA, Manske MK, Sarangi V, Maurer MJ, Larson MC, Mondello P, Yang Z, Novak JP, Serres M, Whitaker KR, Villasboas Bisneto JC, Habermann TM, Witzig TE, Link BK, Rimsza LM, King RL, Ansell SM, Cerhan JR, Novak AJ. Follicular lymphoma B cells exhibit heterogeneous transcriptional states with associated somatic alterations and tumor microenvironments. Cell Rep Med 2024; 5:101443. [PMID: 38428430 PMCID: PMC10983045 DOI: 10.1016/j.xcrm.2024.101443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Revised: 08/14/2023] [Accepted: 02/05/2024] [Indexed: 03/03/2024]
Abstract
Follicular lymphoma (FL) is an indolent non-Hodgkin lymphoma of germinal center origin, which presents with significant biologic and clinical heterogeneity. Using RNA-seq on B cells sorted from 87 FL biopsies, combined with machine-learning approaches, we identify 3 transcriptional states that divide the biological ontology of FL B cells into inflamed, proliferative, and chromatin-modifying states, with relationship to prior GC B cell phenotypes. When integrated with whole-exome sequencing and immune profiling, we find that each state is associated with a combination of mutations in chromatin modifiers, copy-number alterations to TNFAIP3, and T follicular helper (Tfh) cell interactions, or primarily with a microenvironment rich in activated T cells. Altogether, these data define FL B cell transcriptional states across a large cohort of patients, contribute to our understanding of FL heterogeneity at the tumor cell level, and provide a foundation for guiding therapeutic intervention.
Affiliation(s)
- Kerstin Wenzl
  - Division of Hematology, Mayo Clinic, Rochester, MN, USA
- Matthew J Maurer
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Melissa C Larson
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- ZhiZhang Yang
  - Division of Hematology, Mayo Clinic, Rochester, MN, USA
- Brian K Link
  - Division of Hematology, Oncology, and Blood & Marrow Transplantation, University of Iowa, Iowa City, IA, USA
- Lisa M Rimsza
  - Department of Laboratory Medicine and Pathology, Mayo Clinic, Scottsdale, AZ, USA
- Rebecca L King
  - Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
- James R Cerhan
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Anne J Novak
  - Division of Hematology, Mayo Clinic, Rochester, MN, USA
3. Shiraishi Y, Koya J, Chiba K, Okada A, Arai Y, Saito Y, Shibata T, Kataoka K. Precise characterization of somatic complex structural variations from tumor/control paired long-read sequencing data with nanomonsv. Nucleic Acids Res 2023; 51:e74. [PMID: 37336583 PMCID: PMC10415145 DOI: 10.1093/nar/gkad526] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 02/14/2023] [Revised: 05/23/2023] [Accepted: 06/07/2023] [Indexed: 06/21/2023]
Abstract
We present our novel software, nanomonsv, for detecting somatic structural variations (SVs) at single-base resolution using tumor and matched control long-read sequencing data. The current version of nanomonsv includes two detection modules: the Canonical SV module and the Single breakend SV module. Using tumor/control paired long-read sequencing data from three cancer cell lines and their matched lymphoblastoid lines, we demonstrate that the Canonical SV module can identify somatic SVs that can be captured by short-read technologies with higher precision and recall than existing methods. In addition, we have developed a workflow to classify mobile element insertions while elucidating their in-depth properties, such as 5' truncations, internal inversions, and source sites for 3' transductions. Furthermore, the Single breakend SV module enables the detection of complex SVs that can only be identified by long reads, such as SVs involving highly repetitive centromeric sequences and LINE1- and virus-mediated rearrangements. In summary, our approaches applied to cancer long-read sequencing data can reveal various features of somatic SVs and will lead to a better understanding of mutational processes and functional consequences of somatic SVs.
Affiliation(s)
- Yuichi Shiraishi
  - Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Junji Koya
  - Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan
- Kenichi Chiba
  - Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Ai Okada
  - Division of Genome Analysis Platform Development, National Cancer Center Research Institute, Tokyo, Japan
- Yasuhito Arai
  - Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
- Yuki Saito
  - Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan
  - Department of Gastroenterology, Keio University School of Medicine, Tokyo, Japan
- Tatsuhiro Shibata
  - Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan
  - Laboratory of Molecular Medicine, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Keisuke Kataoka
  - Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan
  - Department of Hematology, Keio University School of Medicine, Tokyo, Japan
4. Boßelmann CM, Leu C, Lal D. Technological and computational approaches to detect somatic mosaicism in epilepsy. Neurobiol Dis 2023:106208. [PMID: 37343892 DOI: 10.1016/j.nbd.2023.106208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2023] [Revised: 06/03/2023] [Accepted: 06/16/2023] [Indexed: 06/23/2023]
Abstract
Lesional epilepsy is a common and severe disease commonly associated with malformations of cortical development, including focal cortical dysplasia and hemimegalencephaly. Recent advances in sequencing and variant calling technologies have identified several genetic causes, including both short/single nucleotide and structural somatic variation. In this review, we aim to provide a comprehensive overview of the methodological advancements in this field while highlighting the unresolved technological and computational challenges that persist, including ultra-low variant allele fractions in bulk tissue, low availability of paired control samples, spatial variability of mutational burden within the lesion, and the issue of false-positive calls and validation procedures. Information from genetic testing in focal epilepsy may be integrated into clinical care to inform histopathological diagnosis, postoperative prognosis, and candidate precision therapies.
Affiliation(s)
- Christian M Boßelmann
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA; Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH, USA
- Costin Leu
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA; Department of Clinical and Experimental Epilepsy, Institute of Neurology, University College London, London, UK
- Dennis Lal
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA; Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and M.I.T., Cambridge, MA, USA; Cologne Center for Genomics (CCG), University of Cologne, Cologne, Germany
5. Li S, Hu R, Small C, Kang TY, Liu CC, Zhou XJ, Li W. cfSNV: a software tool for the sensitive detection of somatic mutations from cell-free DNA. Nat Protoc 2023; 18:1563-1583. [PMID: 36849599 PMCID: PMC10411976 DOI: 10.1038/s41596-023-00807-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/25/2022] [Accepted: 11/24/2022] [Indexed: 03/01/2023]
Abstract
Cell-free DNA (cfDNA) in blood, viewed as a surrogate for tumor biopsy, has many clinical applications, including diagnosing cancer, guiding cancer treatment and monitoring treatment response. All these applications depend on an indispensable, yet underdeveloped task: detecting somatic mutations from cfDNA. The task is challenging because of the low tumor fraction in cfDNA. Recently, we developed the computational method cfSNV, the first method that comprehensively considers the properties of cfDNA for the sensitive detection of mutations from cfDNA. cfSNV vastly outperformed the conventional methods that were developed primarily for calling mutations from solid tumor tissues. cfSNV can accurately detect mutations in cfDNA even with medium-coverage (e.g., ≥200×) sequencing, which makes whole-exome sequencing (WES) of cfDNA a viable option for various clinical utilities. Here, we present a user-friendly cfSNV package that exhibits fast computation and convenient user options. We also built a Docker image of it, which is designed to enable researchers and clinicians with a limited computational background to easily carry out analyses on both high-performance computing platforms and local computers. Mutation calling from a standard preprocessed WES dataset (~250× and ~70 million base pair target size) can be carried out in 3 h on a server with eight virtual CPUs and 32 GB of random access memory.
Affiliation(s)
- Shuo Li
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA, USA
- Ran Hu
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA, USA
  - Bioinformatics Interdepartmental Graduate Program, University of California at Los Angeles, Los Angeles, CA, USA
  - Institute for Quantitative & Computational Biosciences, University of California at Los Angeles, Los Angeles, CA, USA
- Colin Small
  - Institute for Quantitative & Computational Biosciences, University of California at Los Angeles, Los Angeles, CA, USA
- Chun-Chi Liu
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA, USA
  - EarlyDiagnostics Inc., Los Angeles, CA, USA
- Xianghong Jasmine Zhou
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA, USA
  - Institute for Quantitative & Computational Biosciences, University of California at Los Angeles, Los Angeles, CA, USA
  - EarlyDiagnostics Inc., Los Angeles, CA, USA
- Wenyuan Li
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA, USA
  - EarlyDiagnostics Inc., Los Angeles, CA, USA
6. Vaisband M, Schubert M, Gassner FJ, Geisberger R, Greil R, Zaborsky N, Hasenauer J. Validation of genetic variants from NGS data using deep convolutional neural networks. BMC Bioinformatics 2023; 24:158. [PMID: 37081386 PMCID: PMC10116675 DOI: 10.1186/s12859-023-05255-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 03/27/2023] [Indexed: 04/22/2023]
Abstract
Accurate somatic variant calling from next-generation sequencing data is one of the most important tasks in personalised cancer therapy. The sophistication of the available technologies is ever-increasing, yet manual candidate refinement is still a necessary step in state-of-the-art processing pipelines. This limits reproducibility and introduces a bottleneck with respect to scalability. We demonstrate that the validation of genetic variants can be improved using a machine learning approach resting on a Convolutional Neural Network, trained using existing human annotation. In contrast to existing approaches, we introduce a way in which contextual data from sequencing tracks can be incorporated into the automated assessment. A rigorous evaluation shows that the resulting model is robust and performs on par with trained researchers following a published standard operating procedure.
Affiliation(s)
- Marc Vaisband
  - Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
  - Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
- Maria Schubert
  - Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Franz Josef Gassner
  - Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Roland Geisberger
  - Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Richard Greil
  - Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Nadja Zaborsky
  - Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Jan Hasenauer
  - Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
7. Huang AY, Lee EA. Identification of Somatic Mutations From Bulk and Single-Cell Sequencing Data. Front Aging 2022; 2:800380. [PMID: 35822012 PMCID: PMC9261417 DOI: 10.3389/fragi.2021.800380] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/23/2021] [Accepted: 12/08/2021] [Indexed: 12/26/2022]
Abstract
Somatic mutations are DNA variants that occur after the fertilization of zygotes and accumulate during the developmental and aging processes in the human lifespan. Somatic mutations have long been known to cause cancer, and more recently have been implicated in a variety of non-cancer diseases. The patterns of somatic mutations, or mutational signatures, also shed light on the underlying mechanisms of the mutational process. Advances in next-generation sequencing over the decades have enabled genome-wide profiling of DNA variants in a high-throughput manner; however, unlike germline mutations, somatic mutations are carried only by a subset of the cell population. Thus, sensitive bioinformatic methods are required to distinguish mutant alleles from sequencing and base calling errors in bulk tissue samples. An alternative way to study somatic mutations, especially those present in an extremely small number of cells or even in a single cell, is to sequence single-cell genomes after whole-genome amplification (WGA); however, it is critical and technically challenging to exclude numerous technical artifacts arising during error-prone and uneven genome amplification in current WGA methods. To address these challenges, multiple bioinformatic tools have been developed. In this review, we summarize the latest progress in methods for identification of somatic mutations and the challenges that remain to be addressed in the future.
Affiliation(s)
- August Yue Huang
  - Division of Genetics and Genomics, Manton Center for Orphan Diseases, Boston Children's Hospital, Boston, MA, United States; Department of Pediatrics, Harvard Medical School, Boston, MA, United States
- Eunjung Alice Lee
  - Division of Genetics and Genomics, Manton Center for Orphan Diseases, Boston Children's Hospital, Boston, MA, United States; Department of Pediatrics, Harvard Medical School, Boston, MA, United States
8. Identification and Validation of Ikaros (IKZF1) as a Cancer Driver Gene for Marek’s Disease Virus-Induced Lymphomas. Microorganisms 2022; 10:401. [PMID: 35208856 PMCID: PMC8877892 DOI: 10.3390/microorganisms10020401] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/11/2021] [Revised: 01/24/2022] [Accepted: 01/25/2022] [Indexed: 12/29/2022]
Abstract
Marek’s disease virus (MDV) is the causative agent for Marek’s disease (MD), which is characterized by T-cell lymphomas in chickens. While the viral Meq oncogene is necessary for transformation, it is insufficient, as not every bird infected with virulent MDV goes on to develop a gross tumor. Thus, we postulated that the chicken genome contains cancer driver genes; i.e., ones with somatic mutations that promote tumors, as is the case for most human cancers. To test this hypothesis, MD tumors and matching control tissues were sequenced. Using a custom bioinformatics pipeline, 9 of the 22 tumors analyzed contained one or more somatic mutations in Ikaros (IKZF1), a transcription factor that acts as the master regulator of lymphocyte development. The mutations found were in key Zn-finger DNA-binding domains that also commonly occur in human cancers such as B-cell acute lymphoblastic leukemia (B-ALL). To validate that IKZF1 was a cancer driver gene, recombinant MDVs that expressed either wild-type or a mutated Ikaros allele were used to infect chickens. As predicted, birds infected with MDV expressing the mutant Ikaros allele had high tumor incidences (~90%), while there were only a few minute tumors (~12%) produced in birds infected with the virus expressing wild-type Ikaros. Thus, in addition to Meq, key somatic mutations in Ikaros or other potential cancer driver genes in the chicken genome are necessary for MDV to induce lymphomas.
9. Ji S, Montierth MD, Wang W. MuSE: A Novel Approach to Mutation Calling with Sample-Specific Error Modeling. Methods Mol Biol 2022; 2493:21-27. [PMID: 35751806 DOI: 10.1007/978-1-0716-2293-3_2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
Accurate detection of somatic mutations in genetically heterogeneous tumor cell populations using next-generation sequencing remains challenging. We have developed MuSE, Mutation calling using a Markov Substitution model for Evolution, a novel approach for modeling the evolution of the allelic composition of tumor and normal tissue at each reference base. It adopts a sample-specific error model to depict inter-tumor heterogeneity, which greatly improves the overall accuracy. Here, we describe the method and provide a tutorial on the installation and application of MuSE.
Affiliation(s)
- Shuangxi Ji
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Matthew D Montierth
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
  - Quantitative Computational Biology, Baylor College of Medicine, Houston, TX, USA
- Wenyi Wang
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
10. Chang TC, Xu K, Cheng Z, Wu G. Somatic and Germline Variant Calling from Next-Generation Sequencing Data. Adv Exp Med Biol 2022; 1361:37-54. [DOI: 10.1007/978-3-030-91836-1_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
11. A Retrospective Statistical Validation Approach for Panel of Normal-Based Single-Nucleotide Variant Detection in Tumor Sequencing. J Mol Diagn 2022; 24:41-47. [PMID: 34974877 DOI: 10.1016/j.jmoldx.2021.09.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2021] [Revised: 08/28/2021] [Accepted: 09/28/2021] [Indexed: 11/22/2022]
Abstract
An important step of somatic variant calling algorithms for deep sequencing data is quantifying the errors. For targeted sequencing in which hotspot mutations are of interest, site-specific error estimation allows more accurate calling. The site-specific error rates are often estimated from a panel of normal samples, which has limited size and is subject to sampling bias and variance. We propose a novel statistical validation method for single-nucleotide variation (SNV) calling based on historical data. The validation method extracts the high-quality reads from the Binary Alignment/Map (BAM) files, finds the negative samples in the data, and builds a statistical model to call individual samples. It is particularly useful in detecting low-frequency variants that may be missed by traditional panel of normal-based SNV methods. The proposed method makes it possible to launch a simple and parallel validation pipeline for SNV calling and improve the detection limit.
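The idea of calling a variant against a site-specific error rate can be illustrated with a minimal sketch. This is not the method of the paper above: it assumes a plain binomial noise model, and the function names, the pseudocount, and the read counts are all hypothetical choices made for illustration.

```python
from math import exp, lgamma, log, log1p

def site_error_rate(neg_alt_reads, neg_total_reads, pseudocount=0.5):
    """Pooled error-rate estimate at one site from negative (variant-free) samples.

    The pseudocount is an illustrative assumption; it keeps the estimate
    strictly between 0 and 1 even when no error reads were observed.
    """
    alt = sum(neg_alt_reads) + pseudocount
    total = sum(neg_total_reads) + 2 * pseudocount
    return alt / total

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(1.0, total)

# Hypothetical counts at one hotspot: four negative samples at 1000x depth.
rate = site_error_rate([1, 0, 2, 1], [1000, 1000, 1000, 1000])

# A test sample with 12 alternate reads out of 1000 is unlikely under noise alone,
# whereas 2 alternate reads out of 1000 is compatible with the error rate.
p_variant = binomial_tail(12, 1000, rate)
p_noise = binomial_tail(2, 1000, rate)
```

A small tail probability suggests the alternate reads are not explained by the estimated site error alone; the threshold at which to call a variant is a separate design decision that this sketch leaves open.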
12. Sensitive detection of tumor mutations from blood and its application to immunotherapy prognosis. Nat Commun 2021; 12:4172. [PMID: 34234141 PMCID: PMC8263778 DOI: 10.1038/s41467-021-24457-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 04/16/2020] [Accepted: 06/18/2021] [Indexed: 02/06/2023]
Abstract
Cell-free DNA (cfDNA) is attractive for many applications, including detecting cancer, identifying the tissue of origin, and monitoring. A fundamental task underlying these applications is SNV calling from cfDNA, which is hindered by the very low tumor content. Thus sensitive and accurate detection of low-frequency mutations (<5%) remains challenging for existing SNV callers. Here we present cfSNV, a method incorporating multi-layer error suppression and hierarchical mutation calling, to address this challenge. Furthermore, by leveraging cfDNA's comprehensive coverage of tumor clonal landscape, cfSNV can profile mutations in subclones. In both simulated and real patient data, cfSNV outperforms existing tools in sensitivity while maintaining high precision. cfSNV enhances the clinical utilities of cfDNA by improving mutation detection performance in medium-depth sequencing data, therefore making Whole-Exome Sequencing a viable option. As an example, we demonstrate that the tumor mutation profile from cfDNA WES data can provide an effective biomarker to predict immunotherapy outcomes.
13. Spatial Distribution of Private Gene Mutations in Clear Cell Renal Cell Carcinoma. Cancers (Basel) 2021; 13:2163. [PMID: 33946379 PMCID: PMC8124666 DOI: 10.3390/cancers13092163] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/01/2021] [Revised: 04/02/2021] [Accepted: 04/27/2021] [Indexed: 12/15/2022]
Simple Summary
Tumours consist of multiple groups of similar cells resulting from differing evolutionary trajectories, i.e., subclones. These subclones are prevalent in clear cell renal cell carcinoma (ccRCC). The aim of this study is to determine how similar or dissimilar the subclones in 89 ccRCC tumours are from one another regarding their gene mutations and expression profiles, i.e., the extent of intra-tumour heterogeneity. The implications of these alterations with respect to signalling pathways are also assessed. Deep sequencing allows for the identification of mutations with low allele frequencies, providing a more comprehensive view of the heterogeneity present in the tumours. With an average of 62% of mutations having been identified in only one of the two biopsies, some of which in turn are found to impact gene expression, the complex makeup of ccRCC tumours is evident, and this can drastically influence treatment outcome.
Abstract
Intra-tumour heterogeneity is the molecular hallmark of renal cancer, and the molecular tumour composition determines the treatment outcome of renal cancer patients. In renal cancer tumourigenesis, in general, different tumour clones evolve over time. We analysed intra-tumour heterogeneity and subclonal mutation patterns in 178 tumour samples obtained from 89 clear cell renal cell carcinoma patients. In an initial discovery phase, whole-exome and transcriptome sequencing data from paired tumour biopsies from 16 ccRCC patients were used to design a gene panel for follow-up analysis. In this second phase, 826 selected genes were targeted at deep coverage in an extended cohort of 89 patients for a detailed analysis of tumour heterogeneity. On average, we found 22 mutations per patient. Pairwise comparison of the two biopsies from the same tumour revealed that on average, 62% of the mutations in a patient were detected in only one of the two samples. In addition to commonly mutated genes (VHL, PBRM1, SETD2 and BAP1), frequent subclonal mutations with low variant allele frequency (<10%) were observed in TP53 and in mucin coding genes MUC6, MUC16, and MUC3A. Of the 89 ccRCC tumours, 87 (~98%) harboured private mutations, occurring in only one of the paired tumour samples. Clonally exclusive pathway pairs were identified using the WES data set from 16 ccRCC patients. Our findings imply that shared and private mutations significantly contribute to the complexity of differential gene expression and pathway interaction and might explain the clonal evolution of different molecular renal cancer subgroups. Multi-regional sequencing is central for the identification of subclones within ccRCC.
14. Ramirez-Valles EG, Rodríguez-Pulido A, Barraza-Salas M, Martínez-Velis I, Meneses-Morales I, Ayala-García VM, Alba-Fierro CA. A Quest for New Cancer Diagnosis, Prognosis and Prediction Biomarkers and Their Use in Biosensors Development. Technol Cancer Res Treat 2020; 19:1533033820957033. [PMID: 33107395 PMCID: PMC7607814 DOI: 10.1177/1533033820957033] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/14/2023]
Abstract
Traditional techniques for cancer diagnosis, such as nuclear magnetic resonance, ultrasound and tissue analysis, require sophisticated devices and highly trained personnel, and therefore carry high operating costs. The use of biomarkers has emerged as an alternative for cancer diagnosis, prognosis and prediction because their measurement in tissues or fluids, such as blood, urine or saliva, requires shorter processing times. However, the biomarkers used currently, and the techniques used for their measurement, including ELISA, western blot, polymerase chain reaction (PCR) or immunohistochemistry, possess low sensitivity and specificity. Therefore, the search for new proteomic, genomic or immunological biomarkers and the development of new noninvasive, easier and cheaper techniques that meet the sensitivity and specificity criteria for the diagnosis, prognosis and prediction of this disease has become a relevant topic. The purpose of this review is to provide an overview of the search for new cancer biomarkers, including the strategies that must be followed to identify them, and to present the latest advances in the development of biosensors that possess a high potential for cancer diagnosis, prognosis and prediction, mainly focusing on their relevance in lung, prostate and breast cancers.
Affiliation(s)
- Eda G Ramirez-Valles
  - Facultad de Ciencias Químicas, Universidad Juárez del Estado de Durango, Dgo, Mexico
- Marcelo Barraza-Salas
  - Facultad de Ciencias Químicas, Universidad Juárez del Estado de Durango, Dgo, Mexico
- Isaac Martínez-Velis
  - Facultad de Ciencias Químicas, Universidad Juárez del Estado de Durango, Dgo, Mexico
- Iván Meneses-Morales
  - Facultad de Ciencias Químicas, Universidad Juárez del Estado de Durango, Dgo, Mexico
- Víctor M Ayala-García
  - Facultad de Ciencias Químicas, Universidad Juárez del Estado de Durango, Dgo, Mexico
- Carlos A Alba-Fierro
  - Facultad de Ciencias Químicas, Universidad Juárez del Estado de Durango, Dgo, Mexico
|
15
|
Wang M, Luo W, Jones K, Bian X, Williams R, Higson H, Wu D, Hicks B, Yeager M, Zhu B. SomaticCombiner: improving the performance of somatic variant calling based on evaluation tests and a consensus approach. Sci Rep 2020; 10:12898. [PMID: 32732891 PMCID: PMC7393490 DOI: 10.1038/s41598-020-69772-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 03/08/2020] [Accepted: 07/16/2020] [Indexed: 02/06/2023]
Abstract
It is challenging to identify somatic variants from high-throughput sequence reads due to tumor heterogeneity, sub-clonality, and sequencing artifacts. In this study, we evaluated the performance of eight primary somatic variant callers and multiple ensemble methods using both real and synthetic whole-genome sequencing, whole-exome sequencing, and deep targeted sequencing datasets with the NA12878 cell line. The test results showed that a simple consensus approach can significantly improve performance even with a limited number of callers and is more robust and stable than machine learning based ensemble approaches. To fully exploit the multi-callers, we also developed a software package, SomaticCombiner, that can combine multiple callers and integrates a new variant allelic frequency (VAF) adaptive majority voting approach, which can maintain sensitive detection for variants with low VAFs.
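The consensus idea in this abstract can be illustrated with a minimal Python sketch. It is not SomaticCombiner's actual rule: the function name `consensus`, the `low_vaf` cut-off of 0.1, and the choice to require one vote fewer for low-VAF variants are all assumptions made for illustration.

```python
from collections import Counter


def consensus(calls_by_caller, vaf, low_vaf=0.1):
    """Keep variants supported by enough callers.

    calls_by_caller: dict mapping caller name -> set of variant keys
    vaf: dict mapping variant key -> estimated variant allele frequency
    low_vaf: hypothetical cut-off below which the vote threshold is relaxed
    """
    n_callers = len(calls_by_caller)
    majority = n_callers // 2 + 1       # strict majority of callers
    relaxed = max(1, majority - 1)      # assumed relaxed threshold for low-VAF variants
    votes = Counter(v for calls in calls_by_caller.values() for v in calls)

    kept = set()
    for variant, count in votes.items():
        # Variants without a VAF estimate are treated as high-VAF (strict rule).
        needed = relaxed if vaf.get(variant, 1.0) < low_vaf else majority
        if count >= needed:
            kept.add(variant)
    return kept
```

With three callers, a variant at VAF 0.4 needs two votes under this sketch, while a variant at VAF 0.05 is kept with a single vote, which is one way to keep low-VAF detection sensitive.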
Affiliation(s)
- Mingyi Wang
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
- Wen Luo
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
- Kristine Jones
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
- Xiaopeng Bian
  - Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD, 20850, USA
- Russell Williams
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
- Herbert Higson
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
- Dongjing Wu
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
- Belynda Hicks
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
- Meredith Yeager
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
- Bin Zhu
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
|
16
|
Hunter SM, Dall GV, Doyle MA, Lupat R, Li J, Allan P, Rowley SM, Bowtell D, Campbell IG, Gorringe KL. Molecular comparison of pure ovarian fibroma with serous benign ovarian tumours. BMC Res Notes 2020; 13:349. [PMID: 32698852 PMCID: PMC7376903 DOI: 10.1186/s13104-020-05194-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/01/2020] [Accepted: 07/17/2020] [Indexed: 01/10/2023]
Abstract
OBJECTIVE Ovarian fibromas and adenofibromas are rare ovarian tumours. They are benign tumours composed of spindle-like stromal cells (pure fibroma) or a mixture of fibroblast and epithelial components (adenofibroma). We have previously shown that 40% of benign serous ovarian tumours are likely primary fibromas because the neoplastic alterations are restricted to the stromal compartment of these tumours. We further explore this finding by comparing benign serous tumours to pure fibromas. RESULTS Performing copy number aberration (CNA) analysis on the stromal component of 45 benign serous tumours and 8 pure fibromas, we have again shown that trisomy of chromosome 12 is the most common aberration in ovarian fibromas. CNAs were more frequent in the pure fibromas than in the benign serous tumours (88% vs 33%), and pure fibromas more frequently harboured more than one CNA event compared with benign serous tumours. As these extra CNA events observed in the pure fibromas were unique to this subset, our data indicate a unique tumour evolution. Gene expression analysis on the two cohorts was unable to show gene expression changes that differed based on tumour subtype. Exome analysis did not reveal any recurrently mutated genes.
Affiliation(s)
- Sally M Hunter
  - Cancer Genomics Program, Peter MacCallum Cancer Centre, East Melbourne, Australia
- Genevieve V Dall
  - Cancer Genomics Program, Peter MacCallum Cancer Centre, East Melbourne, Australia
- Maria A Doyle
  - Bioinformatics Core Facility Peter MacCallum Cancer Centre, East Melbourne, Victoria, Australia
- Richard Lupat
  - Bioinformatics Core Facility Peter MacCallum Cancer Centre, East Melbourne, Victoria, Australia
- Jason Li
  - Bioinformatics Core Facility Peter MacCallum Cancer Centre, East Melbourne, Victoria, Australia
- Prue Allan
  - Anatomical Pathology, Peter MacCallum Cancer Centre, East Melbourne, Victoria, Australia
- Simone M Rowley
  - Cancer Genomics Program, Peter MacCallum Cancer Centre, East Melbourne, Australia
- David Bowtell
  - Cancer Genomics Program, Peter MacCallum Cancer Centre, East Melbourne, Australia
  - The Department of Pathology, University of Melbourne, Parkville, Australia
  - The Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, Australia
- Ian G Campbell
  - Cancer Genomics Program, Peter MacCallum Cancer Centre, East Melbourne, Australia
  - The Department of Pathology, University of Melbourne, Parkville, Australia
  - The Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, Australia
- Kylie L Gorringe
  - Cancer Genomics Program, Peter MacCallum Cancer Centre, East Melbourne, Australia
  - The Department of Pathology, University of Melbourne, Parkville, Australia
  - The Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, Australia
  - Peter MacCallum Cancer Centre, Locked Bag 1, A'Beckett Street, Melbourne, VIC, 8006, Australia
|
17
|
Moriyama T, Imoto S, Hayashi S, Shiraishi Y, Miyano S, Yamaguchi R. A Bayesian model integration for mutation calling through data partitioning. Bioinformatics 2020; 35:4247-4254. [PMID: 30924874 PMCID: PMC6821361 DOI: 10.1093/bioinformatics/btz233] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/23/2018] [Revised: 09/06/2018] [Accepted: 03/28/2019] [Indexed: 11/25/2022]
Abstract
Motivation Detection of somatic mutations from tumor and matched normal sequencing data has become among the most important analysis methods in cancer research. Some existing mutation callers have focused on additional information, e.g. heterozygous single-nucleotide polymorphisms (SNPs) nearby mutation candidates or overlapping paired-end read information. However, existing methods cannot take multiple information sources into account simultaneously. Existing Bayesian hierarchical model-based methods construct two generative models, the tumor model and error model, and limited information sources have been modeled. Results We proposed a Bayesian model integration framework named as partitioning-based model integration. In this framework, through introducing partitions for paired-end reads based on given information sources, we integrate existing generative models and utilize multiple information sources. Based on that, we constructed a novel Bayesian hierarchical model-based method named as OHVarfinDer. In both the tumor model and error model, we introduced partitions for a set of paired-end reads that cover a mutation candidate position, and applied a different generative model for each category of paired-end reads. We demonstrated that our method can utilize both heterozygous SNP information and overlapping paired-end read information effectively in simulation datasets and real datasets. Availability and implementation https://github.com/takumorizo/OHVarfinDer. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Takuya Moriyama
  - Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Seiya Imoto
  - Health Intelligence Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Shuto Hayashi
  - Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Yuichi Shiraishi
  - Center for Cancer Genomics and Advanced Therapeutics, National Cancer Center, Tokyo, Japan
- Satoru Miyano
  - Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
  - Health Intelligence Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Rui Yamaguchi
  - Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
|
18
|
Cao C, Mak L, Jin G, Gordon P, Ye K, Long Q. PRESM: personalized reference editor for somatic mutation discovery in cancer genomics. Bioinformatics 2020; 35:1445-1452. [PMID: 30247633 DOI: 10.1093/bioinformatics/bty812] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/22/2018] [Revised: 08/27/2018] [Accepted: 09/19/2018] [Indexed: 12/16/2022]
Abstract
MOTIVATION Accurate detection of somatic mutations is a crucial step toward understanding cancer. Various tools have been developed to detect somatic mutations from cancer genome sequencing data by mapping reads to a universal reference genome and inferring likelihoods from complex statistical models. However, read mapping is frequently obstructed by mismatches between germline and somatic mutations on a read and the reference genome. Previous attempts to develop personalized genome tools are not compatible with downstream statistical models for somatic mutation detection. RESULTS We present PRESM, a tool that builds personalized reference genomes by integrating germline mutations into the reference genome. The aforementioned obstacle is circumvented by using a two-step germline substitution procedure, maintaining positional fidelity using an innovative workaround. Reads derived from tumor tissue can be positioned more accurately along a personalized reference than a universal reference due to the reduced genetic distance between the subject (tumor genome) and the target (the personalized genome). Application of PRESM's personalized genome reduced false-positive (FP) somatic mutation calls by as much as 55.5%, and facilitated the discovery of a novel somatic point mutation on a germline insertion in PDE1A, a phosphodiesterase associated with melanoma. Moreover, all improvements in calling accuracy were achieved without parameter optimization, as PRESM itself is parameter-free. Hence, similar increases in read mapping and decreases in the FP rate will persist when PRESM-built genomes are applied to any user-provided dataset. AVAILABILITY AND IMPLEMENTATION The software is available at https://github.com/precisionomics/PRESM. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chen Cao
  - Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
- Lauren Mak
  - Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
- Guangxu Jin
  - Department of Cancer Biology, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Paul Gordon
  - Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
- Kai Ye
  - Department of Bioinformatics, Electronic and Information Engineering School, Xi'an Jiaotong University, Xi'an, China
- Quan Long
  - Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
|
19
|
Abstract
A standard strategy to discover somatic mutations in a cancer genome is to use next-generation sequencing (NGS) technologies to sequence the tumor tissue and its matched normal (commonly blood or adjacent normal tissue) for side-by-side comparison. However, when interrogating entire genomes (or even just the coding regions), sequencing errors easily outnumber real somatic mutations by orders of magnitude. Here, we describe SomaticSeq, which incorporates multiple somatic mutation detection algorithms and then uses machine learning to vastly improve the accuracy of the somatic mutation call sets.
|
20
|
Mannakee BK, Gutenkunst RN. BATCAVE: calling somatic mutations with a tumor- and site-specific prior. NAR Genom Bioinform 2020; 2:lqaa004. [PMID: 32051931 PMCID: PMC7003682 DOI: 10.1093/nargab/lqaa004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/08/2019] [Revised: 01/13/2020] [Accepted: 01/23/2020] [Indexed: 02/06/2023]
Abstract
Detecting somatic mutations within tumors is key to understanding treatment resistance, patient prognosis and tumor evolution. Mutations at low allelic frequency, those present in only a small portion of tumor cells, are particularly difficult to detect. Many algorithms have been developed to detect such mutations, but none models a key aspect of tumor biology. Namely, every tumor has its own profile of mutation types that it tends to generate. We present BATCAVE (Bayesian Analysis Tools for Context-Aware Variant Evaluation), an algorithm that first learns the individual tumor mutational profile and mutation rate then uses them in a prior for evaluating potential mutations. We also present an R implementation of the algorithm, built on the popular caller MuTect. Using simulations, we show that adding the BATCAVE algorithm to MuTect improves variant detection. It also improves the calibration of posterior probabilities, enabling more principled tradeoff between precision and recall. We also show that BATCAVE performs well on real data. Our implementation is computationally inexpensive and straightforward to incorporate into existing MuTect pipelines. More broadly, the algorithm can be added to other variant callers, and it can be extended to include additional biological features that affect mutation generation.
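The role of a tumor- and site-specific prior can be shown with plain Bayes' rule. The Python sketch below is a simplified illustration, not BATCAVE's published model: the function names, the use of context counts as the "profile", and the product form of the prior are assumptions.

```python
def context_prior(context_counts, context, mutation_rate):
    """Hypothetical site prior: per-site mutation rate scaled by the share of
    already-observed mutations that fall in this sequence context."""
    total = sum(context_counts.values())
    return mutation_rate * context_counts[context] / total


def posterior_mutation_prob(lik_mut, lik_ref, prior):
    """Bayes' rule for two hypotheses (site mutated vs. not mutated).

    lik_mut: likelihood of the reads if the site carries a mutation
    lik_ref: likelihood of the reads if the site is reference (errors only)
    prior:   prior probability that the site is mutated
    """
    numerator = lik_mut * prior
    return numerator / (numerator + lik_ref * (1.0 - prior))
```

For the same read evidence, a context that the tumor mutates often receives a larger prior and therefore a larger posterior than a rarely mutated context.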
Affiliation(s)
- Brian K Mannakee
  - Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson, AZ 85721, USA
- Ryan N Gutenkunst
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
|
21
|
Geng Y, Zhao Z, Liu J. [Reconstruction of tumor clonal haplotypes based on an improved spanning algorithm]. Nan Fang Yi Ke Da Xue Xue Bao = Journal of Southern Medical University 2019; 39:1287-1292. [PMID: 31852653 DOI: 10.12122/j.issn.1673-4254.2019.11.04] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
OBJECTIVE To reconstruct tumor clonal haplotypes based on third-generation sequencing data to effectively identify tumor heterogeneity. METHODS We developed an algorithm for extracting somatic mutational events from the mixed tumor data and determining the connection weight of each somatic mutation site through a probability function. A haplotype reconstruction algorithm was designed based on the maximum spanning tree, and following the principle of inheritance between tumor clones, the connection pattern was determined at each mutation site in the clonal maximum spanning tree in a stepwise manner. The number, ratio and evolution of the sub-clones were estimated using the depth stripping method. RESULTS In the simulation experiments, we analyzed the accuracy of the algorithm based on 4 indexes, namely the coverage, read length, subclone number and somatic variant rate, and the results demonstrated good robustness of the algorithm. The results of the experiments showed that the mean sub-clone haplotype accuracy exceeded 97%, suggesting that this algorithm significantly outperformed previous methods. CONCLUSIONS The proposed method can accurately reconstruct tumor subclonal haplotypes and clarify the process of tumor clonal evolution, and can thus provide a theoretical basis for tumor heterogeneity research and assist in clinical decision-making.
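The maximum spanning tree step named in the methods can be sketched with Kruskal's algorithm run on edges sorted by decreasing weight. This is a generic textbook construction, not the authors' improved algorithm; the node numbering and the example weights are hypothetical stand-ins for mutation sites and their connection weights.

```python
def maximum_spanning_tree(n_nodes, edges):
    """Kruskal's algorithm on edges sorted by descending weight.

    n_nodes: number of nodes, labelled 0 .. n_nodes - 1
    edges:   iterable of (weight, u, v) tuples
    Returns the chosen edges as (u, v, weight), heaviest first.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:        # edge joins two separate components
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree
```

Each accepted edge links the two mutation sites with the strongest remaining connection that does not close a cycle, so the result spans all connected sites with maximal total weight.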
Affiliation(s)
- Yu Geng
  - School of Health Management, Jinzhou Medical University, Jinzhou 121001, China
  - School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Zhongmeng Zhao
  - School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
- Jianye Liu
  - School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
|
22
|
Liu F, Zhang Y, Zhang L, Li Z, Fang Q, Gao R, Zhang Z. Systematic comparative analysis of single-nucleotide variant detection methods from single-cell RNA sequencing data. Genome Biol 2019; 20:242. [PMID: 31744515 PMCID: PMC6862814 DOI: 10.1186/s13059-019-1863-4] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 07/01/2019] [Accepted: 10/23/2019] [Indexed: 12/30/2022]
Abstract
BACKGROUND Systematic interrogation of single-nucleotide variants (SNVs) is one of the most promising approaches to delineate the cellular heterogeneity and phylogenetic relationships at the single-cell level. While SNV detection from abundant single-cell RNA sequencing (scRNA-seq) data is applicable and cost-effective in identifying expressed variants, inferring sub-clones, and deciphering genotype-phenotype linkages, there is a lack of computational methods specifically developed for SNV calling in scRNA-seq. Although variant callers for bulk RNA-seq have been sporadically used in scRNA-seq, the performances of different tools have not been assessed. RESULTS Here, we perform a systematic comparison of seven tools including SAMtools, the GATK pipeline, CTAT, FreeBayes, MuTect2, Strelka2, and VarScan2, using both simulation and scRNA-seq datasets, and identify multiple elements influencing their performance. While the specificities are generally high, with sensitivities exceeding 90% for most tools when calling homozygous SNVs in high-confident coding regions with sufficient read depths, such sensitivities dramatically decrease when calling SNVs with low read depths, low variant allele frequencies, or in specific genomic contexts. SAMtools shows the highest sensitivity in most cases especially with low supporting reads, despite the relatively low specificity in introns or high-identity regions. Strelka2 shows consistently good performance when sufficient supporting reads are provided, while FreeBayes shows good performance in the cases of high variant allele frequencies. CONCLUSIONS We recommend SAMtools, Strelka2, FreeBayes, or CTAT, depending on the specific conditions of usage. Our study provides the first benchmarking to evaluate the performances of different SNV detection tools for scRNA-seq data.
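The sensitivity and specificity figures used in such benchmarks follow from a comparison of called sites against a truth set. The Python sketch below shows the standard definitions; the function name and the representation of sites as set members are assumptions, not the evaluation code of the study.

```python
def sensitivity_specificity(truth, called, universe):
    """Compute (sensitivity, specificity) for one caller.

    truth:    set of sites that really carry a variant
    called:   set of sites the caller reported
    universe: set of all evaluated sites (variant and non-variant)
    """
    tp = len(truth & called)                 # true variants that were called
    fn = len(truth - called)                 # true variants that were missed
    fp = len(called - truth)                 # calls at non-variant sites
    tn = len(universe - truth - called)      # non-variant sites left uncalled
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

Restricting `universe` to, for example, low-depth sites or intronic sites gives the stratified comparisons described in the abstract.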
Affiliation(s)
- Fenglin Liu
  - School of Life Sciences and BIOPIC, Peking University, Beijing, China
- Yuanyuan Zhang
  - School of Life Sciences and BIOPIC, Peking University, Beijing, China
- Lei Zhang
  - Beijing Advanced Innovation Centre for Genomics, Peking-Tsinghua Centre for Life Sciences, Peking University, Beijing, China
- Ziyi Li
  - School of Life Sciences and BIOPIC, Peking University, Beijing, China
- Qiao Fang
  - Beijing Advanced Innovation Centre for Genomics, Peking-Tsinghua Centre for Life Sciences, Peking University, Beijing, China
- Ranran Gao
  - School of Life Sciences and BIOPIC, Peking University, Beijing, China
- Zemin Zhang
  - School of Life Sciences and BIOPIC, Peking University, Beijing, China
  - Beijing Advanced Innovation Centre for Genomics, Peking-Tsinghua Centre for Life Sciences, Peking University, Beijing, China
|
23
|
Bewicke-Copley F, Arjun Kumar E, Palladino G, Korfi K, Wang J. Applications and analysis of targeted genomic sequencing in cancer studies. Comput Struct Biotechnol J 2019; 17:1348-1359. [PMID: 31762958 PMCID: PMC6861594 DOI: 10.1016/j.csbj.2019.10.004] [Citation(s) in RCA: 90] [Impact Index Per Article: 15.0] [Received: 03/31/2019] [Revised: 10/18/2019] [Accepted: 10/22/2019] [Indexed: 12/31/2022]
Abstract
Next Generation Sequencing (NGS) has dramatically improved the flexibility and outcomes of cancer research and clinical trials, providing highly sensitive and accurate high-throughput platforms for large-scale genomic testing. In contrast to whole-genome (WGS) or whole-exome sequencing (WES), targeted genomic sequencing (TS) focuses on a panel of genes or targets known to have strong associations with pathogenesis of disease and/or clinical relevance, offering greater sequencing depth with reduced costs and data burden. This allows targeted sequencing to identify low frequency variants in targeted regions with high confidence, thus suitable for profiling low-quality and fragmented clinical DNA samples. As a result, TS has been widely used in clinical research and trials for patient stratification and the development of targeted therapeutics. However, its transition to routine clinical use has been slow. Many technical and analytical obstacles still remain and need to be discussed and addressed before large-scale and cross-centre implementation. Gold-standard and state-of-the-art procedures and pipelines are urgently needed to accelerate this transition. In this review we first present how TS is conducted in cancer research, including various target enrichment platforms, the construction of target panels, and selected research and clinical studies utilising TS to profile clinical samples. We then present a generalised analytical workflow for TS data discussing important parameters and filters in detail, aiming to provide the best practices of TS usage and analyses.
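Filters on depth and variant allele frequency (VAF) are among the parameters such a workflow has to set. The Python sketch below shows the arithmetic only; the default thresholds `min_depth=50` and `min_vaf=0.02` are hypothetical placeholders, not values recommended by the review.

```python
def variant_allele_frequency(alt_reads, ref_reads):
    """VAF = alternate-allele reads / all reads covering the site."""
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("site has no coverage")
    return alt_reads / depth


def passes_filter(alt_reads, ref_reads, min_depth=50, min_vaf=0.02):
    """Keep a call only if the site is deep enough and the VAF is high enough.

    Both thresholds are illustrative assumptions and would normally be
    tuned to the panel, the chemistry and the background error rate.
    """
    depth = alt_reads + ref_reads
    if depth < min_depth:
        return False
    return variant_allele_frequency(alt_reads, ref_reads) >= min_vaf
```

The high depth of targeted panels is what makes a low `min_vaf` usable: 5 supporting reads out of 100 give a VAF of 0.05, whereas 1 read out of 10 fails the depth check under these settings.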
Key Words
- BAM, Binary Alignment Map
- BWA, Burrows-Wheeler Aligner
- Background error
- CLL, Chronic Lymphocytic Leukaemia
- COSMIC, Catalogue of Somatic Mutations in Cancer
- Cancer genomics
- Clinical samples
- ESP, Exome Sequencing Project
- FF, Fresh Frozen
- FFPE, Formalin Fixed Paraffin Embedded
- FL, Follicular Lymphoma
- GATK, Genome Analysis Toolkit
- ICGC, International Cancer Genome Consortium
- MBC, Molecular Barcode
- NCCN, the National Comprehensive Cancer Network®
- NGS, Next Generation Sequencing
- NHL, Non-Hodgkin Lymphoma
- NSCLC, Non-Small Cell Lung Carcinoma
- PCR duplicates
- QC, Quality Control
- SAM, Sequence Alignment Map
- TCGA, The Cancer Genome Atlas
- TS, Targeted Sequencing
- Targeted sequencing
- UMI, Unique Molecular Identifiers
- VAF, Variant Allele Frequency
- Variant calling
- WES, Whole Exome Sequencing
- WGS, Whole Genome Sequencing
- tFL, Transformed Follicular Lymphoma
Affiliation(s)
- Findlay Bewicke-Copley
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Emil Arjun Kumar
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
  - Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Giuseppe Palladino
  - Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Koorosh Korfi
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Jun Wang
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
|
24
|
Bartha Á, Győrffy B. Comprehensive Outline of Whole Exome Sequencing Data Analysis Tools Available in Clinical Oncology. Cancers (Basel) 2019; 11:E1725. [PMID: 31690036 PMCID: PMC6895801 DOI: 10.3390/cancers11111725] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 09/30/2019] [Revised: 10/31/2019] [Accepted: 11/01/2019] [Indexed: 12/17/2022]
Abstract
Whole exome sequencing (WES) enables the analysis of all protein coding sequences in the human genome. This technology enables the investigation of cancer-related genetic aberrations that are predominantly located in the exonic regions. WES delivers high-throughput results at a reasonable price. Here, we review analysis tools enabling utilization of WES data in clinical and research settings. Technically, WES initially allows the detection of single nucleotide variants (SNVs) and copy number variations (CNVs), and data obtained through these methods can be combined and further utilized. Variant calling algorithms for SNVs range from standalone tools to machine learning-based combined pipelines. Tools for CNV detection compare the number of reads aligned to a dedicated segment. Both SNVs and CNVs help to identify mutations resulting in pharmacologically druggable alterations. The identification of homologous recombination deficiency enables the use of PARP inhibitors. Determining microsatellite instability and tumor mutation burden helps to select patients eligible for immunotherapy. To pave the way for clinical applications, we have to recognize some limitations of WES, including its restricted ability to detect CNVs, low coverage compared to targeted sequencing, and the missing consensus regarding references and minimal application requirements. Recently, Galaxy became the leading platform in non-command line-based WES data processing. The maturation of next-generation sequencing is reinforced by Food and Drug Administration (FDA)-approved methods for cancer screening, detection, and follow-up. WES is on the verge of becoming an affordable and sufficiently evolved technology for everyday clinical use.
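The read-count comparison behind CNV detection is commonly expressed as a log2 ratio of normalised tumor and normal coverage for a segment. The Python sketch below illustrates that calculation under simple assumptions (library-size normalisation only, no GC or purity correction); the function name and arguments are hypothetical and do not correspond to any specific tool in the review.

```python
import math


def segment_log2_ratio(tumor_reads, normal_reads, tumor_total, normal_total):
    """log2 of tumor vs. normal read counts for one segment,
    after scaling each count by its library's total read count.

    A value near 0 suggests no copy number change, positive values
    suggest a gain and negative values suggest a loss.
    """
    if min(tumor_reads, normal_reads, tumor_total, normal_total) <= 0:
        raise ValueError("all counts must be positive")
    ratio = (tumor_reads * normal_total) / (normal_reads * tumor_total)
    return math.log2(ratio)
```

With equally sized libraries, twice as many tumor reads as normal reads in a segment gives a log2 ratio of 1, and half as many gives -1.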
Affiliation(s)
- Áron Bartha
  - Semmelweis University, Department of Bioinformatics and 2nd Department of Pediatrics, H-1094 Budapest, Hungary
  - TTK Cancer Biomarker Research Group, Institute of Enzymology, Magyar tudósok körútja 2., H-1117 Budapest, Hungary
- Balázs Győrffy
  - Semmelweis University, Department of Bioinformatics and 2nd Department of Pediatrics, H-1094 Budapest, Hungary
  - TTK Cancer Biomarker Research Group, Institute of Enzymology, Magyar tudósok körútja 2., H-1117 Budapest, Hungary
|
25
|
Cho Y, Lee S, Hong JH, Kim BJ, Hong WY, Jung J, Lee HB, Sung J, Kim HN, Kim HL, Jung J. Development of the variant calling algorithm, ADIScan, and its use to estimate discordant sequences between monozygotic twins. Nucleic Acids Res 2019; 46:e92. [PMID: 29873758 PMCID: PMC6125643 DOI: 10.1093/nar/gky445] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 06/16/2017] [Accepted: 05/15/2018] [Indexed: 12/30/2022]
Abstract
Calling variants from next-generation sequencing (NGS) data or discovering discordant sequences between two NGS data sets is challenging. We developed a computer algorithm, ADIScan1, to call variants by comparing the fractions of allelic reads in a tester to the universal reference genome. We then created ADIScan2 by modifying the algorithm to directly compare two sets of NGS data and predict discordant sequences between two testers. ADIScan1 detected >99.7% of variants called by GATK with an additional 724 393 SNVs. ADIScan2 identified ∼500 candidates of discordant sequences in each of two pairs of the monozygotic twins. About 200 of these candidates were included in the ∼2800 predicted by VarScan2. We verified 66 true discordant sequences among the candidates that ADIScan2 and VarScan2 exclusively predicted. ADIScan2 detected many discordant sequences overlooked by VarScan2 and Mutect, which specialize in detecting low frequency mutations in genetically heterogeneous cancerous tissues. Numbers of verified sequences alone were >5 times more than expected based on recently estimated mutation rates from whole genome sequences. Estimated post-zygotic mutation rates were 1.68 × 10⁻⁷ in this study. ADIScan1 and 2 would complement existing tools in screening causative mutations of diverse genetic diseases and comparing two sets of genome sequences, respectively.
Affiliation(s)
- Yangrae Cho
  - Syntekabio Incorporated, Techno-2ro B-512, Yuseong-gu, Daejeon 34025, Republic of Korea
  - DFTBA, CALS, Chonnam National University, Gwangju 61186, Republic of Korea
- Sunho Lee
  - Syntekabio Incorporated, Techno-2ro B-512, Yuseong-gu, Daejeon 34025, Republic of Korea
  - School of Computer Science and Engineering, Seoul National University, Seoul, 151-742, Republic of Korea
- Jong Hui Hong
  - Syntekabio Incorporated, Techno-2ro B-512, Yuseong-gu, Daejeon 34025, Republic of Korea
  - Research Institute of Pharmaceutical Sciences, College of Pharmacy, Seoul National University, Seoul 08826, Republic of Korea
- Byong Joon Kim
  - Syntekabio Incorporated, Techno-2ro B-512, Yuseong-gu, Daejeon 34025, Republic of Korea
- Woon-Young Hong
  - Syntekabio Incorporated, Techno-2ro B-512, Yuseong-gu, Daejeon 34025, Republic of Korea
- Jongcheol Jung
  - Syntekabio Incorporated, Techno-2ro B-512, Yuseong-gu, Daejeon 34025, Republic of Korea
- Hyang Burm Lee
  - DFTBA, CALS, Chonnam National University, Gwangju 61186, Republic of Korea
- Joohon Sung
  - Complex Disease and Genome Epidemiology Branch, Department of Epidemiology, School of Public Health, Seoul National University, Seoul 08826, Republic of Korea
- Han-Na Kim
  - Department of Biochemistry, School of Medicine, Ewha Woman's University, Seoul 07985, Republic of Korea
- Hyung-Lae Kim
  - Department of Biochemistry, School of Medicine, Ewha Woman's University, Seoul 07985, Republic of Korea
- Jongsun Jung
  - Syntekabio Incorporated, Techno-2ro B-512, Yuseong-gu, Daejeon 34025, Republic of Korea
|
26
|
Singer J, Irmisch A, Ruscheweyh HJ, Singer F, Toussaint NC, Levesque MP, Stekhoven DJ, Beerenwinkel N. Bioinformatics for precision oncology. Brief Bioinform 2019; 20:778-788. [PMID: 29272324 PMCID: PMC6585151 DOI: 10.1093/bib/bbx143] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 06/06/2017] [Revised: 09/29/2017] [Indexed: 12/13/2022] Open
Abstract
Molecular profiling of tumor biopsies plays an increasingly important role not only in cancer research, but also in the clinical management of cancer patients. Multi-omics approaches hold the promise of improving diagnostics, prognostics and personalized treatment. To deliver on this promise of precision oncology, appropriate bioinformatics methods for managing, integrating and analyzing large and complex data are necessary. Here, we discuss the specific requirements of bioinformatics methods and software that arise in the setting of clinical oncology, owing to a stricter regulatory environment and the need for rapid, highly reproducible and robust procedures. We describe the workflow of a molecular tumor board and the specific bioinformatics support that it requires, from the primary analysis of raw molecular profiling data to the automatic generation of a clinical report and its delivery to decision-making clinical oncologists. Such workflows have to various degrees been implemented in many clinical trials, as well as in molecular tumor boards at specialized cancer centers and university hospitals worldwide. We review these and more recent efforts to include other high-dimensional multi-omics patient profiles into the tumor board, as well as the state of clinical decision support software to translate molecular findings into treatment recommendations.
Affiliation(s)
- Jochen Singer
- Department of Biosystems Science and Engineering of ETH Zurich in Basel, Switzerland
- Anja Irmisch
- Department of Dermatology at the University of Zurich Hospital in Zurich, Switzerland
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering of ETH Zurich in Basel, Switzerland

27
Calling Variants in the Clinic: Informed Variant Calling Decisions Based on Biological, Clinical, and Laboratory Variables. Comput Struct Biotechnol J 2019; 17:561-569. [PMID: 31049166 PMCID: PMC6482431 DOI: 10.1016/j.csbj.2019.04.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 11/21/2018] [Revised: 03/12/2019] [Accepted: 04/03/2019] [Indexed: 01/10/2023] Open
Abstract
Deep sequencing genomic analysis is becoming increasingly common in clinical research and practice, enabling accurate identification of diagnostic, prognostic, and predictive determinants. Variant calling, distinguishing between true mutations and experimental errors, is a central task of genomic analysis and often requires sophisticated statistical, computational, and/or heuristic techniques. Although variant callers seek to overcome noise inherent in biological experiments, variant calling can be significantly affected by outside factors including those used to prepare, store, and analyze samples. The goal of this review is to discuss known experimental features, such as sample preparation, library preparation, and sequencing, alongside diverse biological and clinical variables, and evaluate their effect on variant caller selection and optimization.

28
Dorri F, Jewell S, Bouchard-Côté A, Shah SP. Somatic mutation detection and classification through probabilistic integration of clonal population information. Commun Biol 2019; 2:44. [PMID: 30729182 PMCID: PMC6355807 DOI: 10.1038/s42003-019-0291-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 05/17/2018] [Accepted: 12/20/2018] [Indexed: 01/06/2023] Open
Abstract
Somatic mutations are a primary contributor to malignancy in human cells. Accurate detection of mutations is needed to define the clonal composition of tumours whereby clones may have distinct phenotypic properties. Although analysis of mutations over multiple tumour samples from the same patient has the potential to enhance identification of clones, few analytic methods exploit the correlation structure across samples. We posited that incorporating clonal information into joint analysis over multiple samples would improve mutation detection, particularly those with low prevalence. In this paper, we develop a new procedure called MuClone, for detection of mutations across multiple tumour samples of a patient from whole genome or exome sequencing data. In addition to mutation detection, MuClone classifies mutations into biologically meaningful groups and allows us to study clonal dynamics. We show that, on lung and ovarian cancer datasets, MuClone improves somatic mutation detection sensitivity over competing approaches without compromising specificity.
MESH Headings
- Female
- Humans
- Carcinoma, Non-Small-Cell Lung/diagnosis
- Carcinoma, Non-Small-Cell Lung/genetics
- Carcinoma, Non-Small-Cell Lung/metabolism
- Carcinoma, Non-Small-Cell Lung/pathology
- Clone Cells
- Cystadenocarcinoma, Serous/diagnosis
- Cystadenocarcinoma, Serous/genetics
- Cystadenocarcinoma, Serous/metabolism
- Cystadenocarcinoma, Serous/pathology
- Datasets as Topic
- Exome
- Gene Expression
- Genetic Loci
- Genome, Human
- Lung Neoplasms/diagnosis
- Lung Neoplasms/genetics
- Lung Neoplasms/metabolism
- Lung Neoplasms/pathology
- Models, Statistical
- Multigene Family
- Mutation
- Neoplasm Proteins/genetics
- Neoplasm Proteins/metabolism
- Ovarian Neoplasms/diagnosis
- Ovarian Neoplasms/genetics
- Ovarian Neoplasms/metabolism
- Ovarian Neoplasms/pathology
- Software
- Whole Genome Sequencing
Affiliation(s)
- Fatemeh Dorri
- Department of Computer Science, University of British Columbia, 201- 2366 Main Mall, V6T 1Z4 Vancouver, Canada
- Sean Jewell
- Department of Statistics, University of Washington, B313 Padelford Hall, Northeast Stevens Way, Seattle, WA 24105 USA
- Alexandre Bouchard-Côté
- Department of Statistics, University of British Columbia, 3182 Earth Sciences Building, 2207 Main Mall, V6T 1Z4 Vancouver, Canada
- Sohrab P. Shah
- Department of Molecular Oncology, University of British Columbia, 675 West 10th Avenue, V5Z 1L3 Vancouver, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Rm. G227 - 2211 Wesbrook Mall, 24105 Vancouver, Canada
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, Kettering Cancer Center, 417 E 68th Street, New York, NY 10065 USA

29
Sun S, Murray SS. Bioinformatics Basics for High-Throughput Hybridization-Based Targeted DNA Sequencing from FFPE-Derived Tumor Specimens: From Reads to Variants. Methods Mol Biol 2019; 1908:37-48. [PMID: 30649719 DOI: 10.1007/978-1-4939-9004-7_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The use of next-generation sequencing and hybridization-based capture for target enrichment have enabled the interrogation of coding regions of several clinically significant cancer genes in tumor specimens using both targeted panels of a few to hundreds of genes, to whole-exome panels encompassing coding regions of all genes in the genome. Next-generation sequencing (NGS) technologies produce millions of relatively short segments of sequences or reads that require bioinformatics tools to map reads back to a reference genome using various read alignment tools, as well as to determine differences between single bases (single nucleotide variants or SNVs) or multiple bases (insertions and deletions or indels) between the aligned reads and the reference genome to call variants. In addition to single nucleotide changes or small insertions and deletions, high copy gains and losses can also be gleaned from NGS data to call gene amplifications and deletions. Throughout these processes, numerous quality control metrics can be assessed at each step to ensure that the resulting called variants are of high quality and are accurate. In this chapter we review common tools used to generate reads from Illumina-derived sequence data, align reads, and call variants from hybridization-based targeted NGS panel data generated from tumor FFPE-derived DNA specimens as well as basic quality metrics to assess for each assayed specimen.
Affiliation(s)
- Shulei Sun
- Center for Advanced Laboratory Medicine, University of California San Diego Health, La Jolla, CA, USA
- Sarah S Murray
- Center for Advanced Laboratory Medicine, University of California San Diego Health, La Jolla, CA, USA.
- Department of Pathology, University of California San Diego, La Jolla, CA, USA.

30
Abstract
This chapter contains a step-by-step protocol for identifying somatic SNPs and small Indels from next-generation sequencing data of tumor samples and matching normal samples. The workflow presented here is largely based on the Broad Institute's "Best Practices" guidelines and makes use of their Genome Analysis Toolkit (GATK) platform. Variants are annotated with population allele frequencies and curated resources such as GnomAD and ClinVar and curated effect predictions from dbNSFP using VCFtools, SnpEff, and SnpSift.
Affiliation(s)
- Peter J Ulintz
- BRCF Bioinformatics Core, University of Michigan, Ann Arbor, MI, USA.
- Division of Hematology and Oncology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA.
- Weisheng Wu
- BRCF Bioinformatics Core, University of Michigan, Ann Arbor, MI, USA
- Chris M Gates
- BRCF Bioinformatics Core, University of Michigan, Ann Arbor, MI, USA

31
Wang C, Liang C. MSIpred: a python package for tumor microsatellite instability classification from tumor mutation annotation data using a support vector machine. Sci Rep 2018; 8:17546. [PMID: 30510242 PMCID: PMC6277498 DOI: 10.1038/s41598-018-35682-z] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 06/21/2018] [Accepted: 10/28/2018] [Indexed: 12/19/2022] Open
Abstract
Microsatellite instability (MSI) is characterized by high degree of polymorphism in microsatellite lengths due to deficiency in mismatch repair (MMR) system. MSI is associated with several tumor types and its status can be considered as an important indicator for patient prognosis. Conventional clinical diagnosis of MSI examines PCR products of a panel of microsatellite markers using electrophoresis (MSI-PCR), which is laborious, costly, and time consuming. We developed MSIpred, a python package for automatic MSI classification using a machine learning technology – support vector machine (SVM). MSIpred computes 22 features characterizing tumor somatic mutational load from mutation data in mutation annotation format (MAF) generated from paired tumor-normal exome sequencing data, subsequently using these features to predict tumor MSI status with a SVM classifier trained by MAF data of 1074 tumors belonging to four types. Evaluation of MSIpred on an independent testing set, MAF data of another 358 tumors, achieved overall accuracy of ≥98% and area under receiver operating characteristic (ROC) curve of 0.967. Further analysis on discrepant cases revealed that discrepancies were partially due to misclassification of MSI-PCR. Additional testing of MSIpred on non-TCGA data also validated its good classification performance. These results indicated that MSIpred is a robust pan-tumor MSI classification tool and can serve as a complementary diagnostic to MSI-PCR in MSI diagnosis.
Affiliation(s)
- Chen Wang
- Department of Biology, Miami University, Oxford, OH, 45056, USA
- Chun Liang
- Department of Biology, Miami University, Oxford, OH, 45056, USA; Department of Computer Science & Software Engineering, Miami University, Oxford, OH, 45056, USA.

32
Chamberlain CE, German MS, Yang K, Wang J, VanBrocklin H, Regan M, Shokat KM, Ducker GS, Kim GE, Hann B, Donner DB, Warren RS, Venook AP, Bergsland EK, Lee D, Wang Y, Nakakura EK. A Patient-derived Xenograft Model of Pancreatic Neuroendocrine Tumors Identifies Sapanisertib as a Possible New Treatment for Everolimus-resistant Tumors. Mol Cancer Ther 2018; 17:2702-2709. [PMID: 30254185 DOI: 10.1158/1535-7163.mct-17-1204] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 12/03/2017] [Revised: 07/18/2018] [Accepted: 09/20/2018] [Indexed: 12/11/2022]
Abstract
Patients with pancreatic neuroendocrine tumors (PNET) commonly develop advanced disease and require systemic therapy. However, treatment options remain limited, in part, because experimental models that reliably emulate PNET disease are lacking. We therefore developed a patient-derived xenograft model of PNET (PDX-PNET), which we then used to evaluate two mTOR inhibitor drugs: FDA-approved everolimus and the investigational new drug sapanisertib. PDX-PNETs maintained a PNET morphology and PNET-specific gene expression signature with serial passage. PDX-PNETs also harbored mutations in genes previously associated with PNETs (such as MEN1 and PTEN), displayed activation of the mTOR pathway, and could be detected by Gallium-68 DOTATATE PET-CT. Treatment of PDX-PNETs with either everolimus or sapanisertib strongly inhibited growth. As seen in patients, some PDX-PNETs developed resistance to everolimus. However, sapanisertib, a more potent inhibitor of the mTOR pathway, caused tumor shrinkage in most everolimus-resistant tumors. Our PDX-PNET model is the first available, validated PDX model for PNET, and preclinical data from the use of this model suggest that sapanisertib may be an effective new treatment option for patients with PNET or everolimus-resistant PNET.
Affiliation(s)
- Chester E Chamberlain
- Center for Regeneration Medicine, University of California, San Francisco, California.
- Diabetes Center, University of California, San Francisco, California
- Department of Medicine, University of California, San Francisco, California
- Michael S German
- Center for Regeneration Medicine, University of California, San Francisco, California
- Diabetes Center, University of California, San Francisco, California
- Department of Medicine, University of California, San Francisco, California
- Katherine Yang
- Center for Regeneration Medicine, University of California, San Francisco, California
- Diabetes Center, University of California, San Francisco, California
- Department of Medicine, University of California, San Francisco, California
- Jason Wang
- Center for Regeneration Medicine, University of California, San Francisco, California
- Diabetes Center, University of California, San Francisco, California
- Department of Medicine, University of California, San Francisco, California
- Henry VanBrocklin
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, California
- Melanie Regan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, California
- Kevan M Shokat
- Department of Cellular Molecular Pharmacology, University of California, San Francisco, California
- Gregory S Ducker
- Department of Cellular Molecular Pharmacology, University of California, San Francisco, California
- Grace E Kim
- Department of Pathology, University of California, San Francisco, California
- Byron Hann
- Helen Diller Family HDF Comprehensive Cancer Center, University of California, San Francisco, California
- David B Donner
- Helen Diller Family HDF Comprehensive Cancer Center, University of California, San Francisco, California
- Department of Surgery, University of California, San Francisco, California
- Robert S Warren
- Helen Diller Family HDF Comprehensive Cancer Center, University of California, San Francisco, California
- Department of Surgery, University of California, San Francisco, California
- Alan P Venook
- Department of Medicine, University of California, San Francisco, California
- Helen Diller Family HDF Comprehensive Cancer Center, University of California, San Francisco, California
- Emily K Bergsland
- Department of Medicine, University of California, San Francisco, California
- Helen Diller Family HDF Comprehensive Cancer Center, University of California, San Francisco, California
- Danny Lee
- Helen Diller Family HDF Comprehensive Cancer Center, University of California, San Francisco, California
- Department of Surgery, University of California, San Francisco, California
- Yucheng Wang
- Helen Diller Family HDF Comprehensive Cancer Center, University of California, San Francisco, California
- Department of Surgery, University of California, San Francisco, California
- Eric K Nakakura
- Helen Diller Family HDF Comprehensive Cancer Center, University of California, San Francisco, California.
- Department of Surgery, University of California, San Francisco, California

33
Sun Z, Bhagwate A, Prodduturi N, Yang P, Kocher JPA. Indel detection from RNA-seq data: tool evaluation and strategies for accurate detection of actionable mutations. Brief Bioinform 2018; 18:973-983. [PMID: 27473065 PMCID: PMC5862335 DOI: 10.1093/bib/bbw069] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 02/16/2016] [Indexed: 11/29/2022] Open
Abstract
Driver somatic mutations are a hallmark of a tumor that can be used for diagnosis and targeted therapy. Mutations are primarily detected from tumor DNA. As dynamic molecules of gene activities, transcriptome profiling by RNA sequence (RNA-seq) is becoming increasingly popular, which not only measures gene expression but also structural variations such as mutations and fusion transcripts. Although single-nucleotide variants (SNVs) can be easily identified from RNA-seq, intermediate long insertions/deletions (indels > 2 bases and less than sequence reads) cause significant challenges and are ignored by most RNA-seq analysis tools. This study evaluates commonly used RNA-seq analysis programs along with variant and somatic mutation callers in a series of data sets with simulated and known indels. The aim is to develop strategies for accurate indel detection. Our results show that the RNA-seq alignment is the most important step for indel identification and the evaluated programs have a wide range of sensitivity to map sequence reads with indels, from not at all to decently sensitive. The sensitivity is impacted by sequence read lengths. Most variant calling programs rely on hard evidence indels marked in the alignment and the programs with realignment may use soft-clipped reads for indel inferencing. Based on the observations, we have provided practical recommendations for indel detection when different RNA-seq aligners are used and demonstrated the best option with highly reliable results. With careful customization of bioinformatics algorithms, RNA-seq can be reliably used for both SNV and indel mutation detection that can be used for clinical decision-making.
Affiliation(s)
- Zhifu Sun
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
- Corresponding author: Zhifu Sun, Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 First St SW, Rochester, MN 55905, USA. Tel.: 507-266-1894; Fax: 507-284-0360; E-mail:
- Aditya Bhagwate
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
- Naresh Prodduturi
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
- Ping Yang
- Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Jean-Pierre A Kocher
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA

34
Ren Y, Zhao J, Li R, Xie Y, Jiang S, Zhou H, Liu H, You Y, Chen F, Wang W, Gao Y, Meng Y, Lu Y. Noninvasive prenatal test for FGFR3-related skeletal dysplasia based on next-generation sequencing and plasma cell-free DNA: Test performance analysis and feasibility exploration. Prenat Diagn 2018; 38:821-828. [PMID: 30048571 DOI: 10.1002/pd.5334] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 01/29/2018] [Revised: 07/13/2018] [Accepted: 07/14/2018] [Indexed: 12/22/2022]
Abstract
OBJECTIVE To explore the feasibility and accuracy of a noninvasive prenatal test for fibroblast growth factor receptor 3 (FGFR3)-related skeletal dysplasia based on next-generation sequencing (NGS) of plasma cell-free DNA. METHOD Fragmented genome DNA (gDNA) of fetuses with achondroplasia (ACH) and thanatophoric dysplasia type I (TD I) was mixed with postdelivery maternal plasma cell-free DNA to generate spiked samples of different modeled fetal fractions. Multiplex polymerase chain reaction was used to amplify the 19 FGFR3 loci, and the amplification products were then sequenced by NGS to detect the fetal mutant alleles. Then, maternal plasma samples of pregnant women carrying ACH (n = 4) and TD I fetuses (n = 2), as well as healthy controls (n = 15), were tested by NGS, and the test performance was evaluated. RESULTS Fetal FGFR3 mutations were detected in all artificial mixtures with fetal gDNA concentrations above 3%. In clinical validation, our method identified all fetal FGFR3 mutant alleles from maternal plasma, with no false positive results. The sensitivity and specificity of our method were 100% (95% CI, 54.1%-100%) and 100% (78.2%-100%), respectively. CONCLUSION Our method had a favorable performance for noninvasively detecting fetal FGFR3 mutations in maternal plasma, highlighting its promising value in developing a noninvasive prenatal test for de novo and paternally inherited disorders.
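The confidence intervals quoted above (54.1%-100% for 6 of 6 affected pregnancies, 78.2%-100% for 15 of 15 controls) are consistent with an exact binomial (Clopper-Pearson) interval. The abstract does not name the method, so that attribution is an assumption. A minimal standard-library sketch that reproduces the two lower bounds follows; the function names are illustrative.

```python
import math


def binom_tail_ge(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def _bisect_increasing(f, target: float) -> float:
    """Solve f(p) = target on [0, 1] for a function f increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes: int, trials: int, confidence: float = 0.95):
    """Exact two-sided binomial confidence interval (lower, upper)."""
    alpha = 1 - confidence
    if successes == 0:
        lower = 0.0
    else:
        # Lower bound: the p at which P(X >= successes) equals alpha / 2.
        lower = _bisect_increasing(
            lambda p: binom_tail_ge(successes, trials, p), alpha / 2)
    if successes == trials:
        upper = 1.0
    else:
        # Upper bound: P(X <= successes) = alpha / 2, which is the same as
        # P(X >= successes + 1) = 1 - alpha / 2.
        upper = _bisect_increasing(
            lambda p: binom_tail_ge(successes + 1, trials, p), 1 - alpha / 2)
    return lower, upper


sens_lower, sens_upper = clopper_pearson(6, 6)    # 4 ACH + 2 TD I cases
spec_lower, spec_upper = clopper_pearson(15, 15)  # 15 healthy controls
print(f"sensitivity 95% CI: {sens_lower:.1%} to {sens_upper:.0%}")
print(f"specificity 95% CI: {spec_lower:.1%} to {spec_upper:.0%}")
```

When every trial succeeds, the lower bound reduces to (α/2)^(1/n), which gives 0.025^(1/6) ≈ 0.541 and 0.025^(1/15) ≈ 0.782. The wide interval for sensitivity reflects the small number of affected cases rather than any weakness of the assay itself.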
Affiliation(s)
- Yuan Ren
- Department of Obstetrics and Gynecology, Chinese PLA General Hospital, Beijing, China
- Jia Zhao
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Ruibing Li
- Department of Obstetrics and Gynecology, Chinese PLA General Hospital, Beijing, China
- Shufang Jiang
- Department of Obstetrics and Gynecology, Chinese PLA General Hospital, Beijing, China
- Honghui Zhou
- Department of Obstetrics and Gynecology, Chinese PLA General Hospital, Beijing, China
- Yanqin You
- Department of Obstetrics and Gynecology, Chinese PLA General Hospital, Beijing, China
- Fang Chen
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China; Laboratory of Genomics and Molecular Biomedicine, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Ya Gao
- BGI-Shenzhen, Shenzhen, China; China National GeneBank, BGI-Shenzhen, Shenzhen, China
- Yuanguang Meng
- Department of Obstetrics and Gynecology, Chinese PLA General Hospital, Beijing, China
- Yanping Lu
- Department of Obstetrics and Gynecology, Chinese PLA General Hospital, Beijing, China

35
A computational tool to detect DNA alterations tailored to formalin-fixed paraffin-embedded samples in cancer clinical sequencing. Genome Med 2018; 10:44. [PMID: 29880027 PMCID: PMC5992758 DOI: 10.1186/s13073-018-0547-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 09/07/2017] [Accepted: 05/07/2018] [Indexed: 12/16/2022] Open
Abstract
Advanced cancer genomics technologies are now being employed in clinical sequencing, where next-generation sequencers are used to simultaneously identify multiple types of DNA alterations for prescription of molecularly targeted drugs. However, no computational tool is available to accurately detect DNA alterations in formalin-fixed paraffin-embedded (FFPE) samples commonly used in hospitals. Here, we developed a computational tool tailored to the detection of single nucleotide variations, indels, fusions, and copy number alterations in FFPE samples. Elaborated multilayer noise filters reduced the inherent noise while maintaining high sensitivity, as evaluated in tumor-unmatched normal samples using orthogonal technologies. This tool, cisCall, should facilitate clinical sequencing in everyday diagnostics. It is available at https://www.ciscall.org.

36
Wang Y, Guo L, Feng L, Zhang W, Xiao T, Di X, Chen G, Zhang K. Single nucleotide variant profiles of viable single circulating tumour cells reveal CTC behaviours in breast cancer. Oncol Rep 2018; 39:2147-2159. [PMID: 29565466 PMCID: PMC5928770 DOI: 10.3892/or.2018.6325] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 11/01/2017] [Accepted: 03/16/2018] [Indexed: 12/21/2022] Open
Abstract
Circulating tumour cell (CTC) behaviours are distinct from those of bulk tissues. Thus, treatments to eliminate CTCs differ from the regimens followed to reduce the primary tumour and its metastases. Accordingly, comprehensively deciphering the single nucleotide variant (SNV) profiles in CTCs, which partially determine CTC behaviours, is a priority. Using viable CTCs isolated with the oHSV1-hTERT-GFP virus coupled with fluorescence-activated cell sorting (FACS), the whole genome was amplified using the multiple annealing and looping-based amplification cycle (MALBAC) method. CTC behaviours were evaluated using the SNVs found to be recurrently mutated in different cells (termed CTC-shared SNVs). Analysis of the sequencing data of 11 CTCs from 8 patients demonstrated that SNVs accumulated sporadically among CTCs and their matched primary tumours (22 co-occurring mutated genes were identified in the exomes of CTCs and their matched primary tissues and metastases), and 394 SNVs were shared by at least two CTCs. Mutated APC and LRP1B genes co-occurred in CTC-shared and bulk-tissue SNVs. Additionally, the breast-originating identity of the CTC-shared SNVs was verified, and they demonstrated the following CTC behaviours: i) intravasation competency; ii) increased migration or motility; iii) enhanced cell-cell interactions; iv) variation in energy metabolism; v) an activated platelet or coagulation system; and vi) dysfunctional mitosis. These results demonstrated that it is feasible to capture and amplify the genomes of single CTCs using the described pipeline. CTC-shared SNVs are a potential signature for identifying the origin of the primary tumour in a liquid biopsy. Furthermore, CTCs demonstrated some behaviours that are unique from those of bulk tissues. Therefore, therapies to eradicate these precursors of metastasis may differ from the existing traditional regimens.
Affiliation(s)
- Yipeng Wang
- Department of Breast Surgery, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, P.R. China
- Liping Guo
- State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, P.R. China
- Lin Feng
- State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, P.R. China
- Wen Zhang
- Department of Immunology, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, P.R. China
- Ting Xiao
- State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, P.R. China
- Xuebing Di
- State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, P.R. China
- Guoji Chen
- Department of Breast Surgery, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, P.R. China
- Kaitai Zhang
- State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, P.R. China

37
Muller E, Goardon N, Brault B, Rousselin A, Paimparay G, Legros A, Fouillet R, Bruet O, Tranchant A, Domin F, San C, Quesnelle C, Frebourg T, Ricou A, Krieger S, Vaur D, Castera L. OutLyzer: software for extracting low-allele-frequency tumor mutations from sequencing background noise in clinical practice. Oncotarget 2018; 7:79485-79493. [PMID: 27825131 PMCID: PMC5346729 DOI: 10.18632/oncotarget.13103] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 05/26/2016] [Accepted: 10/11/2016] [Indexed: 01/01/2023] Open
Abstract
Highlighting tumoral mutations is a key step in oncology for personalizing care. Considering the genetic heterogeneity in a tumor, software used for detecting mutations should clearly distinguish real tumor events of interest that could be predictive markers for personalized medicine from false positives. OutLyzer is a new variant-caller designed for the specific and sensitive detection of mutations for research and diagnostic purposes. It is based on statistic and local evaluation of sequencing background noise to highlight potential true positive variants. 130 previously genotyped patients were sequenced after enrichment by capturing the exons of 22 genes. Sequencing data were analyzed by HaplotypeCaller, LofreqStar, Varscan2 and OutLyzer. OutLyzer had the best sensitivity and specificity with a fixed limit of detection for all tools of 1% for SNVs and 2% for Indels. OutLyzer is a useful tool for detecting mutations of interest in tumors including low allele-frequency mutations, and could be adopted in standard practice for delivering targeted therapies in cancer treatment.
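The abstract above states only that OutLyzer relies on a statistical, local evaluation of sequencing background noise. As a rough illustration of that general idea, and explicitly not OutLyzer's actual algorithm, the sketch below flags positions whose non-reference allele fraction stands out from neighbouring positions. The window size, the z-score cutoff, and the function name are assumptions; the 1% minimum fraction mirrors the SNV limit of detection quoted in the abstract.

```python
import statistics


def noise_outliers(alt_fractions, window=10, z_cutoff=4.0, min_fraction=0.01):
    """Return indices whose alternate-allele fraction is an outlier
    relative to the surrounding positions (hypothetical approach)."""
    fractions = list(alt_fractions)
    calls = []
    for i, frac in enumerate(fractions):
        if frac < min_fraction:
            continue  # below the assumed limit of detection
        lo = max(0, i - window)
        hi = min(len(fractions), i + window + 1)
        # Local background: neighbouring positions, excluding position i.
        background = fractions[lo:i] + fractions[i + 1:hi]
        if len(background) < 2:
            continue  # not enough neighbours to estimate noise
        mean = statistics.fmean(background)
        # Floor the spread so a perfectly flat background cannot divide by zero.
        spread = max(statistics.stdev(background), 1e-6)
        if (frac - mean) / spread >= z_cutoff:
            calls.append(i)
    return calls


# Toy per-position non-reference fractions; index 5 carries a 5% variant.
toy_fractions = [0.002, 0.003, 0.001, 0.002, 0.004, 0.05,
                 0.003, 0.002, 0.001, 0.003, 0.002]
flagged = noise_outliers(toy_fractions)
```

On the toy input only index 5 is flagged, because every other position falls below the 1% minimum fraction. A production caller would estimate noise per base change and per strand and would account for read depth, none of which is modelled here.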
Affiliation(s)
- Etienne Muller
- Department of Cancer Biology and Genetics, CCC François Baclesse, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Caen, France; Inserm U1079, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Rouen, France
- Nicolas Goardon
- Department of Cancer Biology and Genetics, CCC François Baclesse, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Caen, France
- Baptiste Brault
- Department of Cancer Biology and Genetics, CCC François Baclesse, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Caen, France
- Antoine Rousselin
- Department of Cancer Biology and Genetics, CCC François Baclesse, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Caen, France
- Germain Paimparay
- Department of Cancer Biology and Genetics, CCC François Baclesse, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Caen, France
- Angelina Legros
- Department of Cancer Biology and Genetics, CCC François Baclesse, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Caen, France
- Robin Fouillet
- Department of Cancer Biology and Genetics, CCC François Baclesse, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Caen, France
- Olivia Bruet
- Department of Cancer Biology and Genetics, CCC François Baclesse, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Caen, France
- Aurore Tranchant
- Department of Cancer Biology and Genetics, CCC François Baclesse, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Caen, France
- Florian Domin
- Department of Cancer Biology and Genetics, CCC François Baclesse, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Caen, France
- Chankannira San
- Department of Cancer Biology and Genetics, CCC François Baclesse, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Caen, France
- Céline Quesnelle
- Department of Cancer Biology and Genetics, CCC François Baclesse, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Caen, France
- Thierry Frebourg
- Inserm U1079, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Rouen, France; Genetic Department, Rouen University Hospital, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Rouen, France; Rouen University, France
- Agathe Ricou
- Department of Cancer Biology and Genetics, CCC François Baclesse, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Caen, France
- Sophie Krieger
- Department of Cancer Biology and Genetics, CCC François Baclesse, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Caen, France; Inserm U1079, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Rouen, France; Caen University, France
- Dominique Vaur
- Department of Cancer Biology and Genetics, CCC François Baclesse, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Caen, France; Inserm U1079, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Rouen, France
- Laurent Castera
- Department of Cancer Biology and Genetics, CCC François Baclesse, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Caen, France; Inserm U1079, Genomic and Personalized Medicine in Cancer and Neurological Disorders Unit, Rouen, France
38
Sun JX, He Y, Sanford E, Montesion M, Frampton GM, Vignot S, Soria JC, Ross JS, Miller VA, Stephens PJ, Lipson D, Yelensky R. A computational approach to distinguish somatic vs. germline origin of genomic alterations from deep sequencing of cancer specimens without a matched normal. PLoS Comput Biol 2018; 14:e1005965. [PMID: 29415044 PMCID: PMC5832436 DOI: 10.1371/journal.pcbi.1005965] [Citation(s) in RCA: 197] [Impact Index Per Article: 28.1] [Received: 09/25/2015] [Revised: 03/01/2018] [Accepted: 01/05/2018] [Indexed: 12/31/2022] Open
Abstract
A key constraint in genomic testing in oncology is that matched normal specimens are not commonly obtained in clinical practice. Thus, while well-characterized genomic alterations do not require normal tissue for interpretation, a significant number of alterations will be unknown in whether they are germline or somatic, in the absence of a matched normal control. We introduce SGZ (somatic-germline-zygosity), a computational method for predicting somatic vs. germline origin and homozygous vs. heterozygous or sub-clonal state of variants identified from deep massively parallel sequencing (MPS) of cancer specimens. The method does not require a patient matched normal control, enabling broad application in clinical research. SGZ predicts the somatic vs. germline status of each alteration identified by modeling the alteration’s allele frequency (AF), taking into account the tumor content, tumor ploidy, and the local copy number. Accuracy of the prediction depends on the depth of sequencing and copy number model fit, which are achieved in our clinical assay by sequencing to high depth (>500x) using MPS, covering 394 cancer-related genes and over 3,500 genome-wide single nucleotide polymorphisms (SNPs). Calls are made using a statistic based on read depth and local variability of SNP AF. To validate the method, we first evaluated performance on samples from 30 lung and colon cancer patients, where we sequenced tumors and matched normal tissue. We examined predictions for 17 somatic hotspot mutations and 20 common germline SNPs in 20,182 clinical cancer specimens. To assess the impact of stromal admixture, we examined three cell lines, which were titrated with their matched normal to six levels (10–75%). Overall, predictions were made in 85% of cases, with 95–99% of variants predicted correctly, a significantly superior performance compared to a basic approach based on AF alone. We then applied the SGZ method to the COSMIC database of known somatic variants in cancer and found >50 that are in fact more likely to be germline.

We introduce SGZ, a computational method for predicting somatic vs. germline origin and homozygous vs. heterozygous or sub-clonal state of variants identified from deep massively parallel sequencing of clinical formalin-fixed, paraffin embedded (FFPE) cancer specimens. The method does not require fresh tissue or a patient matched normal control, enabling broad application in clinical research. It supports functional prioritization and interpretation of alterations discovered on routine testing and may inform clinical decision making and ultimately expand treatment choices for cancer patients.
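The abstract states that SGZ models a variant's allele frequency from tumor content, ploidy, and local copy number. A minimal sketch of the usual mixture arithmetic behind that kind of model is shown below: it computes the allele frequency expected for a somatic versus a heterozygous germline variant in a tumor/normal cell mixture. It assumes diploid normal cells and is not the SGZ implementation or its statistical test; the function and parameter names are hypothetical.

```python
def expected_af(purity, tumor_copies, variant_copies, germline):
    """Expected allele frequency of a variant in a tumor/normal mixture.

    Assumptions (illustrative only): normal cells are diploid; tumor cells
    carry `tumor_copies` copies of the locus, `variant_copies` of which hold
    the variant; a germline variant is heterozygous in normal cells.
    """
    total = purity * tumor_copies + 2.0 * (1.0 - purity)
    variant = purity * variant_copies
    if germline:
        # Normal cells contribute one variant copy each.
        variant += 1.0 - purity
    return variant / total

# 50% tumor content, copy-neutral locus, variant on one tumor copy.
print(expected_af(0.5, 2, 1, germline=False))  # 0.25
print(expected_af(0.5, 2, 1, germline=True))   # 0.5
```

The gap between the two expected values is what makes the somatic/germline distinction possible from a tumor-only sample; it shrinks as tumor content approaches 100%, where a copy-neutral heterozygous variant sits near 0.5 in both cases.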
Affiliation(s)
- James X. Sun
- Foundation Medicine, Inc., Cambridge, MA, United States of America
- * E-mail:
- Yuting He
- Foundation Medicine, Inc., Cambridge, MA, United States of America
- Eric Sanford
- Foundation Medicine, Inc., Cambridge, MA, United States of America
- Meagan Montesion
- Foundation Medicine, Inc., Cambridge, MA, United States of America
- Stéphane Vignot
- Institut National de la Santé et de la Recherche Médicale (INSERM) U981, Gustave Roussy, Villejuif Grand, Paris, France
- Oncology and Hematology Department, Hôpitaux de Chartres, Chartres, France
- Jean-Charles Soria
- Institut National de la Santé et de la Recherche Médicale (INSERM) U981, Gustave Roussy, Villejuif Grand, Paris, France
- Jeffrey S. Ross
- Foundation Medicine, Inc., Cambridge, MA, United States of America
- Albany Medical College, Albany, NY, United States of America
- Phil J. Stephens
- Foundation Medicine, Inc., Cambridge, MA, United States of America
- Doron Lipson
- Foundation Medicine, Inc., Cambridge, MA, United States of America
- Roman Yelensky
- Foundation Medicine, Inc., Cambridge, MA, United States of America
39
Xu C. A review of somatic single nucleotide variant calling algorithms for next-generation sequencing data. Comput Struct Biotechnol J 2018; 16:15-24. [PMID: 29552334 PMCID: PMC5852328 DOI: 10.1016/j.csbj.2018.01.003] [Citation(s) in RCA: 153] [Impact Index Per Article: 21.9] [Received: 09/08/2017] [Revised: 01/20/2018] [Accepted: 01/28/2018] [Indexed: 02/06/2023] Open
Abstract
Detection of somatic mutations holds great potential in cancer treatment and has been a very active research field in the past few years, especially since the breakthrough of the next-generation sequencing technology. A collection of variant calling pipelines have been developed with different underlying models, filters, input data requirements, and targeted applications. This review aims to enumerate these unique features of the state-of-the-art variant callers, in the hope to provide a practical guide for selecting the appropriate pipeline for specific applications. We will focus on the detection of somatic single nucleotide variants, ranging from traditional variant callers based on whole genome or exome sequencing of paired tumor-normal samples to recent low-frequency variant callers designed for targeted sequencing protocols with unique molecular identifiers. The variant callers have been extensively benchmarked with inconsistent performances across these studies. We will review the reference materials, datasets, and performance metrics that have been used in the benchmarking studies. In the end, we will discuss emerging trends and future directions of the variant calling algorithms.
Affiliation(s)
- Chang Xu
- Life Science Research and Foundation, Qiagen Sciences, Inc., 6951 Executive Way, Frederick, Maryland 21703, USA
40
Manheimer KB, Richter F, Edelmann LJ, D'Souza SL, Shi L, Shen Y, Homsy J, Boskovski MT, Tai AC, Gorham J, Yasso C, Goldmuntz E, Brueckner M, Lifton RP, Chung WK, Seidman CE, Seidman JG, Gelb BD. Robust identification of mosaic variants in congenital heart disease. Hum Genet 2018; 137:183-193. [PMID: 29417219 PMCID: PMC5997246 DOI: 10.1007/s00439-018-1871-6] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 10/23/2017] [Accepted: 01/30/2018] [Indexed: 12/15/2022]
Abstract
Mosaicism due to somatic mutations can cause multiple diseases including cancer, developmental and overgrowth syndromes, neurodevelopmental disorders, autoinflammatory diseases, and atrial fibrillation. With the increased use of next generation sequencing technology, multiple tools have been developed to identify low-frequency variants, specifically from matched tumor-normal tissues in cancer studies. To investigate whether mosaic variants are implicated in congenital heart disease (CHD), we developed a pipeline using the cancer somatic variant caller MuTect to identify mosaic variants in whole-exome sequencing (WES) data from a cohort of parent/affected child trios (n = 715) and a cohort of healthy individuals (n = 416). This is a novel application of the somatic variant caller designed for cancer to WES trio data. We identified two cases with mosaic KMT2D mutations that are likely pathogenic for CHD, but conclude that, overall, mosaicism detectable in peripheral blood or saliva does not account for a significant portion of CHD etiology.
Affiliation(s)
- Kathryn B Manheimer
- Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Felix Richter
- Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Lisa J Edelmann
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sunita L D'Souza
- Department of Cell, Developmental and Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Lisong Shi
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yufeng Shen
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
- Department of Biomedical Informatics, Columbia University Medical Center, New York, NY, USA
- Jason Homsy
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Marko T Boskovski
- Division of Cardiac Surgery, The Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Angela C Tai
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Joshua Gorham
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Elizabeth Goldmuntz
- Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Division of Cardiology, The Children's Hospital of Philadelphia, The University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Martina Brueckner
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Department of Pediatrics, Yale University School of Medicine, New Haven, CT, USA
- Richard P Lifton
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Howard Hughes Medical Institute, Yale University, New Haven, CT, USA
- Yale Center for Mendelian Genomics, New Haven, CT, USA
- Yale Center for Genome Analysis, Yale University, New Haven, CT, USA
- Department of Internal Medicine, Yale University School of Medicine, New Haven, CT, USA
- Wendy K Chung
- Department of Pediatrics, Columbia University Medical Center, New York, NY, USA
- Department of Medicine, Columbia University Medical Center, New York, NY, USA
- Christine E Seidman
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Department of Medicine (Cardiology), Brigham and Women's Hospital, Boston, MA, USA
- The Howard Hughes Medical Institute, Chevy Chase, MD, USA
- J G Seidman
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Bruce D Gelb
- Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
41
Santos C, Almeida NF, Alves ML, Horres R, Krezdorn N, Leitão ST, Aznar-Fernández T, Rotter B, Winter P, Rubiales D, Vaz Patto MC. First genetic linkage map of Lathyrus cicera based on RNA sequencing-derived markers: Key tool for genetic mapping of disease resistance. Hortic Res 2018; 5:45. [PMID: 30181885 PMCID: PMC6119197 DOI: 10.1038/s41438-018-0047-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/11/2018] [Revised: 03/05/2018] [Accepted: 04/30/2018] [Indexed: 05/10/2023]
Abstract
The Lathyrus cicera transcriptome was analysed in response to rust (Uromyces pisi) infection to develop novel molecular breeding tools with potential for genetic mapping of resistance in this robust orphan legume species. One RNA-seq library each was generated from control and rust-inoculated leaves from two L. cicera genotypes with contrasting quantitative resistance, de novo assembled into contigs and sequence polymorphisms were identified. In toto, 19,224 SNPs differentiate the susceptible from the partially resistant genotype's transcriptome. In addition, we developed and tested 341 expressed E-SSR markers from the contigs, of which 60.7% varied between the two L. cicera genotypes. A first L. cicera linkage map was created using part of the developed markers in a RIL population from the cross of the two genotypes. This map contains 307 markers, covered 724.2 cM and is organised in 7 major and 2 minor linkage groups, with an average mapping interval of 2.4 cM. The genic markers also enabled us to compare their position in L. cicera map with the physical position of the same markers mapped on Medicago truncatula genome, highlighting a high macrosyntenic conservation between both species. This study provides a large new set of genic polymorphic molecular markers with potential for mapping rust resistances. It represents the first step towards genomics-assisted precision breeding in L. cicera.
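The reported average mapping interval can be checked against the map length and marker count quoted in the abstract. The abstract does not say which convention was used, so the short sketch below computes both common ones (length per marker, and length per interval between adjacent markers within a linkage group); the variable names are illustrative.

```python
map_length_cm = 724.2       # total map length from the abstract
markers = 307               # mapped markers from the abstract
linkage_groups = 7 + 2      # 7 major and 2 minor linkage groups

# Convention 1: total length divided by the number of markers.
per_marker = map_length_cm / markers
# Convention 2: total length divided by the number of intervals,
# since each linkage group with n markers has n - 1 intervals.
per_interval = map_length_cm / (markers - linkage_groups)

print(round(per_marker, 1), round(per_interval, 1))  # 2.4 2.4
```

Both conventions round to the 2.4 cM stated in the abstract, so the reported figures are internally consistent either way.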
Affiliation(s)
- Carmen Santos
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras, 2780-157 Portugal
- Nuno Felipe Almeida
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras, 2780-157 Portugal
- Mara Lisa Alves
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras, 2780-157 Portugal
- Ralf Horres
- GenXPro GmbH, Frankfurt am Main, D-60438 Germany
- Susana Trindade Leitão
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras, 2780-157 Portugal
- Björn Rotter
- GenXPro GmbH, Frankfurt am Main, D-60438 Germany
- Peter Winter
- GenXPro GmbH, Frankfurt am Main, D-60438 Germany
- Diego Rubiales
- Institute for Sustainable Agriculture, CSIC, Córdoba, E-14004 Spain
- Maria Carlota Vaz Patto
- Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras, 2780-157 Portugal
42
Dimitrakopoulos L, Prassas I, Diamandis EP, Charames GS. Onco-proteogenomics: Multi-omics level data integration for accurate phenotype prediction. Crit Rev Clin Lab Sci 2017; 54:414-432. [DOI: 10.1080/10408363.2017.1384446] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 01/10/2023]
Affiliation(s)
- Lampros Dimitrakopoulos
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Ioannis Prassas
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
- Eleftherios P. Diamandis
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Department of Clinical Biochemistry, University Health Network, Toronto, ON, Canada
- George S. Charames
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
43
Bohnert R, Vivas S, Jansen G. Comprehensive benchmarking of SNV callers for highly admixed tumor data. PLoS One 2017; 12:e0186175. [PMID: 29020110 PMCID: PMC5636151 DOI: 10.1371/journal.pone.0186175] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 01/20/2017] [Accepted: 09/26/2017] [Indexed: 12/30/2022] Open
Abstract
Precision medicine attempts to individualize cancer therapy by matching tumor-specific genetic changes with effective targeted therapies. A crucial first step in this process is the reliable identification of cancer-relevant variants, which is considerably complicated by the impurity and heterogeneity of clinical tumor samples. We compared the impact of admixture of non-cancerous cells and low somatic allele frequencies on the sensitivity and precision of 19 state-of-the-art SNV callers. We studied both whole exome and targeted gene panel data and up to 13 distinct parameter configurations for each tool. We found vast differences among callers. Based on our comprehensive analyses we recommend joint tumor-normal calling with MuTect, EBCall or Strelka for whole exome somatic variant calling, and HaplotypeCaller or FreeBayes for whole exome germline calling. For targeted gene panel data on a single tumor sample, LoFreqStar performed best. We further found that tumor impurity and admixture had a negative impact on precision, and in particular, sensitivity in whole exome experiments. At admixture levels of 60% to 90% sometimes seen in pathological biopsies, sensitivity dropped significantly, even when variants were originally present in the tumor at 100% allele frequency. Sensitivity to low-frequency SNVs improved with targeted panel data, but whole exome data allowed more efficient identification of germline variants. Effective somatic variant calling requires high-quality pathological samples with minimal admixture, a consciously selected sequencing strategy, and the appropriate variant calling tool with settings optimized for the chosen type of data.
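The benchmarking terms in the abstract can be made concrete with a small sketch. It shows the standard definitions of sensitivity and precision, plus the simple dilution arithmetic that explains why high admixture hurts sensitivity: a variant present in every tumor cell is observed at a much lower allele frequency once non-cancerous cells are mixed in. The dilution function assumes equal copy number in tumor and normal cells; the counts and function names are made up for illustration and are not taken from the study.

```python
def sensitivity(tp, fn):
    """Fraction of true variants that the caller reports."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Fraction of reported variants that are true."""
    return tp / (tp + fp)

def diluted_af(tumor_af, admixture):
    """Observed allele frequency when a fraction `admixture` of the cells is
    non-cancerous and lacks the variant (assumes equal copy number)."""
    return tumor_af * (1.0 - admixture)

# A variant at 100% allele frequency in the tumor, at 60% and 90% admixture.
for admixture in (0.6, 0.9):
    print(admixture, round(diluted_af(1.0, admixture), 2))

# Hypothetical benchmark counts.
print(sensitivity(tp=80, fn=20), round(precision(tp=80, fp=5), 3))
```

At 90% admixture the fully clonal variant is expected at roughly 10% allele frequency, which is in the range where many callers start to lose sensitivity on exome-depth data.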
44
Roman T, Xie L, Schwartz R. Automated deconvolution of structured mixtures from heterogeneous tumor genomic data. PLoS Comput Biol 2017; 13:e1005815. [PMID: 29059177 PMCID: PMC5695636 DOI: 10.1371/journal.pcbi.1005815] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 03/09/2017] [Revised: 11/02/2017] [Accepted: 10/10/2017] [Indexed: 11/23/2022] Open
Abstract
With increasing appreciation for the extent and importance of intratumor heterogeneity, much attention in cancer research has focused on profiling heterogeneity on a single patient level. Although true single-cell genomic technologies are rapidly improving, they remain too noisy and costly at present for population-level studies. Bulk sequencing remains the standard for population-scale tumor genomics, creating a need for computational tools to separate contributions of multiple tumor clones and assorted stromal and infiltrating cell populations to pooled genomic data. All such methods are limited to coarse approximations of only a few cell subpopulations, however. In prior work, we demonstrated the feasibility of improving cell type deconvolution by taking advantage of substructure in genomic mixtures via a strategy called simplicial complex unmixing. We improve on past work by introducing enhancements to automate learning of substructured genomic mixtures, with specific emphasis on genome-wide copy number variation (CNV) data, as well as the ability to process quantitative RNA expression data, and heterogeneous combinations of RNA and CNV data. We introduce methods for dimensionality estimation to better decompose mixture model substructure; fuzzy clustering to better identify substructure in sparse, noisy data; and automated model inference methods for other key model parameters. We further demonstrate their effectiveness in identifying mixture substructure in true breast cancer CNV data from the Cancer Genome Atlas (TCGA). Source code is available at https://github.com/tedroman/WSCUnmix.
Affiliation(s)
- Theodore Roman
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Lu Xie
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Russell Schwartz
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Biological Sciences Department, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
45
Kohmoto T, Masuda K, Naruto T, Tange S, Shoda K, Hamada J, Saito M, Ichikawa D, Tajima A, Otsuji E, Imoto I. Construction of a combinatorial pipeline using two somatic variant calling methods for whole exome sequence data of gastric cancer. J Med Invest 2017; 64:233-240. [PMID: 28954988 DOI: 10.2152/jmi.64.233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Abstract
High-throughput next-generation sequencing is a powerful tool to identify the genotypic landscapes of somatic variants and therapeutic targets in various cancers including gastric cancer, forming the basis for personalized medicine in the clinical setting. Although the advent of many computational algorithms leads to higher accuracy in somatic variant calling, no standard method exists due to the limitations of each method. Here, we constructed a new pipeline. We combined two different somatic variant callers with different algorithms, Strelka and VarScan 2, and evaluated performance using whole exome sequencing data obtained from 19 Japanese cases with gastric cancer (GC); then, we characterized these tumors based on identified driver molecular alterations. More single nucleotide variants (SNVs) and small insertions/deletions were detected by Strelka and VarScan 2, respectively. SNVs detected by both tools showed higher accuracy for estimating somatic variants compared with those detected by only one of the two tools and accurately showed the mutation signature and mutations of driver genes reported for GC. Our combinatorial pipeline may have an advantage in detection of somatic mutations in GC and may be useful for further genomic characterization of Japanese patients with GC to improve the efficacy of GC treatments.
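The core idea reported above, keeping the SNVs that two independent callers agree on, can be sketched as a set intersection over normalized variant keys. This is only an illustration of the consensus step, not the authors' pipeline: the function name, the `(chrom, pos, ref, alt)` key, and the placeholder calls are all assumptions, and real VCF output would first need parsing and normalization.

```python
def consensus_calls(calls_a, calls_b):
    """Return the variants reported by both callers.

    Each call is a hypothetical (chrom, pos, ref, alt) tuple.
    """
    return set(calls_a) & set(calls_b)

# Placeholder call sets standing in for the output of two callers.
caller_one = {("chr1", 1000, "A", "G"), ("chr2", 2000, "C", "T"), ("chr3", 3000, "G", "A")}
caller_two = {("chr1", 1000, "A", "G"), ("chr3", 3000, "G", "A"), ("chr4", 4000, "T", "C")}

shared = consensus_calls(caller_one, caller_two)
only_one_caller = (caller_one | caller_two) - shared

print(sorted(shared))
print(sorted(only_one_caller))
```

Taking the intersection trades sensitivity for precision: calls unique to one tool are set aside, which matches the abstract's observation that SNVs detected by both tools were more accurate than those detected by only one.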
Affiliation(s)
- Tomohiro Kohmoto
- Department of Human Genetics, Graduate School of Biomedical Sciences, Tokushima University
- Kiyoshi Masuda
- Department of Human Genetics, Graduate School of Biomedical Sciences, Tokushima University
- Takuya Naruto
- Department of Human Genetics, Graduate School of Biomedical Sciences, Tokushima University
- Shoichiro Tange
- Department of Human Genetics, Graduate School of Biomedical Sciences, Tokushima University
- Katsutoshi Shoda
- Department of Human Genetics, Graduate School of Biomedical Sciences, Tokushima University; Division of Digestive Surgery, Department of Surgery, Kyoto Prefectural University of Medicine
- Junichi Hamada
- Department of Human Genetics, Graduate School of Biomedical Sciences, Tokushima University; Division of Digestive Surgery, Department of Surgery, Kyoto Prefectural University of Medicine
- Masako Saito
- Department of Human Genetics, Graduate School of Biomedical Sciences, Tokushima University
- Daisuke Ichikawa
- Division of Digestive Surgery, Department of Surgery, Kyoto Prefectural University of Medicine
- Atsushi Tajima
- Department of Human Genetics, Graduate School of Biomedical Sciences, Tokushima University; Department of Bioinformatics and Genomics, Graduate School of Advanced Preventive Medical Sciences, Kanazawa University
- Eigo Otsuji
- Division of Digestive Surgery, Department of Surgery, Kyoto Prefectural University of Medicine
- Issei Imoto
- Department of Human Genetics, Graduate School of Biomedical Sciences, Tokushima University
46
Huang AY, Zhang Z, Ye AY, Dou Y, Yan L, Yang X, Zhang Y, Wei L. MosaicHunter: accurate detection of postzygotic single-nucleotide mosaicism through next-generation sequencing of unpaired, trio, and paired samples. Nucleic Acids Res 2017; 45:e76. [PMID: 28132024 PMCID: PMC5449543 DOI: 10.1093/nar/gkx024] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 08/18/2016] [Revised: 12/24/2016] [Accepted: 01/26/2017] [Indexed: 02/07/2023] Open
Abstract
Genomic mosaicism arising from postzygotic mutations has long been associated with cancer and more recently with non-cancer diseases. It has also been detected in healthy individuals including healthy parents of children affected with genetic disorders, highlighting its critical role in the origin of genetic mutations. However, most existing software for the genome-wide identification of single-nucleotide mosaicisms (SNMs) requires a paired control tissue obtained from the same individual which is often unavailable for non-cancer individuals and sometimes missing in cancer studies. Here, we present MosaicHunter (http://mosaichunter.cbi.pku.edu.cn), a bioinformatics tool that can identify SNMs in whole-genome and whole-exome sequencing data of unpaired samples without matched controls using Bayesian genotypers. We evaluate the accuracy of MosaicHunter on both simulated and real data and demonstrate that it has improved performance compared with other somatic mutation callers. We further demonstrate that incorporating sequencing data of the parents can be an effective approach to significantly improve the accuracy of detecting SNMs in an individual when a matched control sample is unavailable. Finally, MosaicHunter also has a paired mode that can take advantage of matched control samples when available, making it a useful tool for detecting SNMs in both non-cancer and cancer studies.
Affiliation(s)
- August Yue Huang
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, People's Republic of China
- National Institute of Biological Sciences, Beijing 102206, People's Republic of China
- Zheng Zhang
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, People's Republic of China
- School of Life Sciences, Tsinghua-Peking Joint Center for Life Sciences, Tsinghua University, Beijing 100084, People's Republic of China
- Adam Yongxin Ye
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, People's Republic of China
- Peking-Tsinghua Center for Life Sciences, Beijing, People's Republic of China
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, People's Republic of China
- Yanmei Dou
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, People's Republic of China
- National Institute of Biological Sciences, Beijing 102206, People's Republic of China
- Linlin Yan
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, People's Republic of China
- Xiaoxu Yang
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, People's Republic of China
- Yuehua Zhang
- Peking University First Hospital, Peking University, Beijing 100034, People's Republic of China
- Liping Wei
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, People's Republic of China
| |
Collapse
|
47
|
Flynn A, Dwight T, Benn D, Deb S, Colebatch AJ, Fox S, Harris J, Duncan EL, Robinson B, Hogg A, Ellul J, To H, Duong C, Miller JA, Yates C, James P, Trainer A, Gill AJ, Clifton-Bligh R, Hicks RJ, Tothill RW. Cousins not twins: intratumoural and intertumoural heterogeneity in syndromic neuroendocrine tumours. J Pathol 2017; 242:273-283. [PMID: 28369925 DOI: 10.1002/path.4900] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 01/27/2017] [Revised: 03/01/2017] [Accepted: 03/23/2017] [Indexed: 12/23/2022]
Abstract
Hereditary endocrine neoplasias, including phaeochromocytoma/paraganglioma and medullary thyroid cancer, are caused by autosomal dominant mutations in several familial cancer genes. A common feature of these diseases is the presentation of multiple primary tumours, or multifocal disease representing independent tumour clones that have arisen from the same initiating genetic lesion, but have undergone independent clonal evolution. Such tumours provide an opportunity to discover common cooperative changes required for tumourigenesis, while controlling for the genetic background of the individual. We performed genomic analysis of synchronous and metachronous tumours from five patients bearing germline mutations in the genes SDHB, RET, and MAX. Using whole exome sequencing and high-density single-nucleotide polymorphism arrays, we analysed two to four primary tumours from each patient. We also applied multi-region sampling, to assess intratumoural heterogeneity and clonal evolution, in two cases involving paraganglioma and medullary thyroid cancer, respectively. Heterogeneous patterns of genomic change existed between synchronous or metachronous tumours, with evidence of branching evolution. We observed striking examples of evolutionary convergence involving the same rare somatic copy-number events in synchronous primary phaeochromocytoma/paraganglioma. Convergent events also occurred during clonal evolution of metastatic medullary thyroid cancer. These observations suggest that genetic or epigenetic changes acquired early within precursor cells, or pre-existing within the genetic background of the individual, create contingencies that determine the evolutionary trajectory of the tumour. Copyright © 2017 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.
Affiliation(s)
- Aidan Flynn
  - The Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
  - Department of Pathology, University of Melbourne, Melbourne, Victoria, Australia
- Trisha Dwight
  - Cancer Genetics, Kolling Institute, Royal North Shore Hospital, Sydney, NSW, Australia
  - University of Sydney, Sydney, NSW, Australia
- Diana Benn
  - Cancer Genetics, Kolling Institute, Royal North Shore Hospital, Sydney, NSW, Australia
  - University of Sydney, Sydney, NSW, Australia
- Siddhartha Deb
  - Anatomical Pathology, Anatpath, Melbourne, Victoria, Australia
- Andrew J Colebatch
  - The Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
  - Department of Pathology, University of Melbourne, Melbourne, Victoria, Australia
- Stephen Fox
  - The Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
  - Department of Pathology, University of Melbourne, Melbourne, Victoria, Australia
  - The Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, Victoria, Australia
- Jessica Harris
  - Queensland University of Technology, Brisbane, Queensland, Australia
- Emma L Duncan
  - Queensland University of Technology, Brisbane, Queensland, Australia
  - Faculty of Medicine, University of Queensland, Brisbane, Queensland, Australia
  - Department of Endocrinology, Royal Brisbane and Women's Hospital, Brisbane, Queensland, Australia
- Bruce Robinson
  - Cancer Genetics, Kolling Institute, Royal North Shore Hospital, Sydney, NSW, Australia
  - University of Sydney, Sydney, NSW, Australia
- Annette Hogg
  - The Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
- Jason Ellul
  - The Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
- Henry To
  - Department of Surgery, Royal Melbourne Hospital, Melbourne, Victoria, Australia
- Cuong Duong
  - The Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
- Julie A Miller
  - Department of Surgery, Royal Melbourne Hospital, Melbourne, Victoria, Australia
  - Department of Surgery, Epworth Hospital, Melbourne, Victoria, Australia
- Christopher Yates
  - Department of Diabetes and Endocrinology, Royal Melbourne Hospital, Melbourne, Victoria, Australia
  - Department of Diabetes and Endocrinology, Western Health, Melbourne, Victoria, Australia
- Paul James
  - The Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
  - The Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, Victoria, Australia
- Alison Trainer
  - The Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
  - Department of Pathology, University of Melbourne, Melbourne, Victoria, Australia
- Anthony J Gill
  - University of Sydney, Sydney, NSW, Australia
  - Department of Anatomical Pathology, Royal North Shore Hospital, Sydney, NSW, Australia
- Roderick Clifton-Bligh
  - Cancer Genetics, Kolling Institute, Royal North Shore Hospital, Sydney, NSW, Australia
  - University of Sydney, Sydney, NSW, Australia
- Rodney J Hicks
  - The Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
  - The Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, Victoria, Australia
- Richard W Tothill
  - The Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
  - Department of Pathology, University of Melbourne, Melbourne, Victoria, Australia
  - The Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, Victoria, Australia
|
48
|
Kuipers J, Jahn K, Beerenwinkel N. Advances in understanding tumour evolution through single-cell sequencing. Biochim Biophys Acta Rev Cancer 2017; 1867:127-138. [PMID: 28193548 PMCID: PMC5813714 DOI: 10.1016/j.bbcan.2017.02.001] [Citation(s) in RCA: 79] [Impact Index Per Article: 9.9] [Received: 11/01/2016] [Revised: 02/02/2017] [Accepted: 02/04/2017] [Indexed: 12/14/2022]
Abstract
The mutational heterogeneity observed within tumours poses additional challenges to the development of effective cancer treatments. A thorough understanding of a tumour's subclonal composition and its mutational history is essential to open up the design of treatments tailored to individual patients. Comparative studies on a large number of tumours permit the identification of mutational patterns which may refine forecasts of cancer progression, response to treatment and metastatic potential. The composition of tumours is shaped by evolutionary processes. Recent advances in next-generation sequencing offer the possibility to analyse the evolutionary history and accompanying heterogeneity of tumours at an unprecedented resolution, by sequencing single cells. New computational challenges arise when moving from bulk to single-cell sequencing data, leading to the development of novel modelling frameworks. In this review, we present the state-of-the-art methods for understanding the phylogeny encoded in bulk or single-cell sequencing data, and highlight future directions for developing more comprehensive and informative pictures of tumour evolution. This article is part of a Special Issue entitled: Evolutionary principles - heterogeneity in cancer?, edited by Dr. Robert A. Gatenby.
MESH Headings
- Adaptation, Physiological
- Animals
- Biomarkers, Tumor/genetics
- Biomarkers, Tumor/metabolism
- Cell Transformation, Neoplastic/genetics
- Cell Transformation, Neoplastic/metabolism
- Cell Transformation, Neoplastic/pathology
- Evolution, Molecular
- Gene Expression Regulation, Neoplastic
- Genetic Fitness
- Genetic Heterogeneity
- Genetic Predisposition to Disease
- Heredity
- Humans
- Models, Genetic
- Mutation
- Neoplasms/drug therapy
- Neoplasms/genetics
- Neoplasms/metabolism
- Neoplasms/pathology
- Pedigree
- Phenotype
- Phylogeny
- Sequence Analysis, DNA
- Signal Transduction/genetics
- Single-Cell Analysis/methods
- Time Factors
Affiliation(s)
- Jack Kuipers
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Basel, Switzerland
- Katharina Jahn
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Basel, Switzerland
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Basel, Switzerland
|
49
|
Moriyama T, Shiraishi Y, Chiba K, Yamaguchi R, Imoto S, Miyano S. OVarCall: Bayesian Mutation Calling Method Utilizing Overlapping Paired-End Reads. IEEE Trans Nanobioscience 2017; 16:116-122. [PMID: 28278479 DOI: 10.1109/tnb.2017.2670601] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022]
Abstract
Detection of somatic mutations from tumor and matched normal sequencing data has become a standard approach in cancer research. Although a number of mutation callers have been developed, it is still difficult to detect mutations with low allele frequency even in exome sequencing. We expect that overlapping paired-end read information is effective for this purpose, but no mutation caller has modeled overlapping information statistically in a proper form in exome sequence data. Here, we develop a Bayesian hierarchical method, OVarCall (https://github.com/takumorizo/OVarCall), where overlapping paired-end read information improves the accuracy of low allele frequency mutation detection. Firstly, we construct two generative models: one is for reads with somatic variants generated from tumor cells and the other is for reads that do not have somatic variants but potentially include sequence errors. Secondly, we calculate the marginal likelihood for each model using a variational Bayesian algorithm to compute the Bayes factor for the detection of somatic mutations. We empirically evaluated the performance of OVarCall and confirmed its better performance than other existing methods.
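To make the Bayes-factor step concrete, here is a minimal Python sketch that compares a "somatic variant" model with a "sequencing error" model for one site. It is a deliberately simplified stand-in, not OVarCall's hierarchical model: it uses closed-form Beta-Binomial marginal likelihoods instead of a variational Bayesian algorithm, it ignores overlapping paired-end reads entirely, and the function names and Beta prior parameters are assumptions made for illustration.

```python
from math import lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal(alt_reads, depth, a, b):
    """Log Beta-Binomial marginal likelihood under a Beta(a, b) prior.

    The binomial coefficient is omitted because it is identical in both
    models and cancels in the Bayes factor.
    """
    return log_beta(alt_reads + a, depth - alt_reads + b) - log_beta(a, b)


def log_bayes_factor(alt_reads, depth,
                     variant_prior=(1.0, 1.0),   # hypothetical: flat allele fraction
                     error_prior=(1.0, 999.0)):  # hypothetical: error rate near 0.1%
    """Log Bayes factor of the variant model against the error model."""
    return (log_marginal(alt_reads, depth, *variant_prior)
            - log_marginal(alt_reads, depth, *error_prior))


if __name__ == "__main__":
    for alt in (0, 2, 10):
        print(alt, round(log_bayes_factor(alt, 100), 2))
```

A positive log Bayes factor favours the variant model and a negative one favours the error model. In practice a call would be made by thresholding this value, with the threshold chosen empirically.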
|
50
|
Salehi S, Steif A, Roth A, Aparicio S, Bouchard-Côté A, Shah SP. ddClone: joint statistical inference of clonal populations from single cell and bulk tumour sequencing data. Genome Biol 2017; 18:44. [PMID: 28249593 PMCID: PMC5333399 DOI: 10.1186/s13059-017-1169-3] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 11/08/2016] [Accepted: 02/10/2017] [Indexed: 12/16/2022] Open
Abstract
Next-generation sequencing (NGS) of bulk tumour tissue can identify constituent cell populations in cancers and measure their abundance. This requires computational deconvolution of allelic counts from somatic mutations, which may be incapable of fully resolving the underlying population structure. Single cell sequencing (SCS) is a more direct method, although its replacement of NGS is impeded by technical noise and sampling limitations. We propose ddClone, which analytically integrates NGS and SCS data, leveraging their complementary attributes through joint statistical inference. We show on real and simulated datasets that ddClone produces more accurate results than can be achieved by either method alone.
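The "deconvolution of allelic counts" mentioned above starts from a simple relationship between a mutation's variant allele fraction (VAF) in bulk data and the fraction of tumour cells carrying it. The Python sketch below shows that standard point estimate only; it is not ddClone's joint inference, and the function name, parameter names, and default copy-number values are assumptions for illustration.

```python
def cancer_cell_fraction(vaf, purity, tumour_copy_number=2,
                         multiplicity=1, normal_copy_number=2):
    """Point estimate of the fraction of tumour cells carrying a mutation.

    vaf: variant allele fraction observed in the bulk sample
    purity: fraction of cells in the sample that are tumour cells
    multiplicity: number of mutated copies per mutated tumour cell
    All defaults are hypothetical (diploid locus, one mutated copy).
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell in the mixed sample
    average_copies = (purity * tumour_copy_number
                      + (1.0 - purity) * normal_copy_number)
    estimate = vaf * average_copies / (purity * multiplicity)
    # Sampling noise can push the raw estimate above 1, so clip it
    return min(estimate, 1.0)


if __name__ == "__main__":
    # Diploid locus, pure tumour sample, VAF of 25%
    print(cancer_cell_fraction(0.25, 1.0))
```

A single VAF cannot distinguish, for example, a clonal mutation on one of four copies from a subclonal mutation on one of two, which is one reason bulk data alone may not resolve the population structure and why combining it with single-cell data can help.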
Affiliation(s)
- Sohrab Salehi
  - Bioinformatics Graduate Program, University of British Columbia, 570 West 7th Avenue, Vancouver, V5Z 4S6, BC, Canada
- Adi Steif
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, V6T 2B5, BC, Canada
  - Department of Molecular Oncology, British Columbia Cancer Agency, 675 West 10th Avenue, Vancouver, V5Z 1L3, BC, Canada
- Andrew Roth
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, V6T 2B5, BC, Canada
  - Department of Molecular Oncology, British Columbia Cancer Agency, 675 West 10th Avenue, Vancouver, V5Z 1L3, BC, Canada
- Samuel Aparicio
  - Bioinformatics Graduate Program, University of British Columbia, 570 West 7th Avenue, Vancouver, V5Z 4S6, BC, Canada
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, V6T 2B5, BC, Canada
- Alexandre Bouchard-Côté
  - Department of Statistics, University of British Columbia, 2207 Main Mall, Vancouver, V6T 1Z4, BC, Canada
- Sohrab P Shah
  - Bioinformatics Graduate Program, University of British Columbia, 570 West 7th Avenue, Vancouver, V5Z 4S6, BC, Canada
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, V6T 2B5, BC, Canada
|