1
Srivastava R. Advancing precision oncology with AI-powered genomic analysis. Front Pharmacol 2025; 16:1591696. [PMID: 40371349 PMCID: PMC12075946 DOI: 10.3389/fphar.2025.1591696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2025] [Accepted: 04/21/2025] [Indexed: 05/16/2025] Open
Abstract
Multiomics data integration approaches offer a comprehensive functional understanding of biological systems, with significant applications in disease therapeutics. However, the quantitative integration of multiomics data presents a complex challenge, requiring highly specialized computational methods. By providing deep insights into disease-associated molecular mechanisms, multiomics facilitates precision medicine by accounting for individual omics profiles, enabling early disease detection and prevention, aiding biomarker discovery for diagnosis, prognosis, and treatment monitoring, and identifying molecular targets for innovative drug development or the repurposing of existing therapies. AI-driven bioinformatics plays a crucial role in multiomics by computing scores to prioritize available drugs, assisting clinicians in selecting optimal treatments. This review explains the potential of AI and multiomics data integration for disease understanding and therapeutics. It highlights the challenges in the quantitative integration of diverse omics data and in clinical workflows involving AI in cancer genomics, and addresses the ethical and privacy concerns related to AI-driven applications in oncology. The scope of this text is broad yet focused, providing readers with a comprehensive overview of how AI-powered bioinformatics and integrative multiomics approaches are transforming precision oncology. Starting from the role of bioinformatics in genomics, it explores integrative multiomics strategies for drug selection, genome profiling, and tumor clonality analysis, along with the clinical application of drug prioritization tools, and addresses the technical, ethical, and practical hurdles in deploying AI-driven genomics tools.
2
Ma W, Tang W, Kwok JS, Tong AH, Lo CW, Chu AT, Chung BH. A review on trends in development and translation of omics signatures in cancer. Comput Struct Biotechnol J 2024; 23:954-971. [PMID: 38385061 PMCID: PMC10879706 DOI: 10.1016/j.csbj.2024.01.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 01/31/2024] [Accepted: 01/31/2024] [Indexed: 02/23/2024] Open
Abstract
The field of cancer genomics and transcriptomics has evolved from targeted profiling to swift sequencing of individual tumor genomes and transcriptomes. The steady growth in genome, epigenome, and transcriptome datasets on a genome-wide scale has significantly increased our capability to capture signatures that represent both the intrinsic and extrinsic biological features of tumors. These biological differences can help in precise molecular subtyping of cancer and in predicting tumor progression, metastatic potential, and resistance to therapeutic agents. In this review, we summarize the current development of genomic, methylomic, transcriptomic, proteomic, and metabolic signatures in cancer research and highlight their potential in clinical applications to improve diagnosis, prognosis, and treatment decisions in cancer patients.
Affiliation(s)
- Wei Ma
  - Hong Kong Genome Institute, Hong Kong, China
- Wenshu Tang
  - Hong Kong Genome Institute, Hong Kong, China
- Brian H.Y. Chung
  - Hong Kong Genome Institute, Hong Kong, China
  - Department of Pediatrics and Adolescent Medicine, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Hong Kong Genome Project
  - Hong Kong Genome Institute, Hong Kong, China
  - Department of Pediatrics and Adolescent Medicine, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong, China
3
Jiang N, Wu Y, Rozen SG. Benchmarking 13 tools for mutational signature attribution, including a new and improved algorithm. Brief Bioinform 2024; 26:bbaf042. [PMID: 39910776 PMCID: PMC11798676 DOI: 10.1093/bib/bbaf042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2024] [Revised: 01/01/2025] [Accepted: 01/20/2025] [Indexed: 02/07/2025] Open
Abstract
Mutational signatures are characteristic patterns of mutations caused by endogenous mutational processes or by exogenous mutational exposures. There has been little benchmarking of approaches for determining which signatures are present in a sample and estimating the number of mutations due to each signature. This problem is referred to as "signature attribution." We show that there are often many combinations of signatures that can reconstruct the patterns of mutations in a sample reasonably well, even after encouraging sparse solutions. We benchmarked 13 approaches to signature attribution, including a new approach called Presence Attribute Signature Activity (PASA), on large synthetic data sets (2700 synthetic samples in total). These data sets recapitulated the single-base, insertion-deletion, and doublet-base mutational signature repertoires of nine cancer types. For single-base substitution mutations, PASA and MuSiCal outperformed other approaches on all the cancer types combined. However, the ranking of approaches varied by cancer type. For doublet-base substitutions and small insertions and deletions, while PASA outperformed the other approaches in most of the nine cancer types, the ranking of approaches again varied by cancer type. We believe that this variation reflects inherent difficulties in signature attribution. These difficulties stem from the fact that there are often many attributions that can reasonably explain the pattern of mutations in a sample and from the combinatorial search space due to the need to impose sparsity. Tables herein can provide guidance on the selection of mutational signature attribution approaches that are best suited to particular cancer types and study objectives.
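The attribution problem described in this abstract can be illustrated with a minimal sketch. It assumes a sample's mutation counts are modelled as a non-negative combination of known signature profiles, and it estimates the exposures with multiplicative updates for non-negative least squares. This is a generic stand-in chosen for illustration, not PASA, MuSiCal, or any other benchmarked tool, and it imposes no sparsity. The function name `attribute_exposures` and the four-channel toy signatures are hypothetical.

```python
def attribute_exposures(catalog, signatures, n_iter=500):
    """Estimate non-negative exposures h so that
    sum_k h[k] * signatures[k][c] approximates catalog[c] for every channel c.

    Illustrative only: multiplicative updates for non-negative least squares,
    not the algorithm of any published attribution tool.
    """
    n_sig = len(signatures)
    n_chan = len(catalog)
    h = [1.0] * n_sig  # strictly positive start keeps the updates well defined
    for _ in range(n_iter):
        # Reconstruct the catalog from the current exposures.
        recon = [sum(h[k] * signatures[k][c] for k in range(n_sig))
                 for c in range(n_chan)]
        for k in range(n_sig):
            num = sum(signatures[k][c] * catalog[c] for c in range(n_chan))
            den = sum(signatures[k][c] * recon[c] for c in range(n_chan))
            if den > 0:
                h[k] *= num / den  # the ratio is non-negative, so h stays >= 0
    return h


# Hypothetical toy data: two signatures over four mutation channels.
toy_signatures = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
# Counts built as 100 mutations from the first signature plus 50 from the second.
toy_catalog = [75.0, 15.0, 15.0, 45.0]

exposures = attribute_exposures(toy_catalog, toy_signatures)
print([round(x, 3) for x in exposures])
```

On this noise-free toy input the estimate converges to roughly 100 and 50 mutations. With real catalogs, noise and similar signature profiles mean that several different exposure vectors can reconstruct the counts almost equally well, which is the ambiguity the abstract highlights.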
Affiliation(s)
- Nanhai Jiang
  - Centre for Computational Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore
  - Programme in Cancer and Stem Cell Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore
- Yang Wu
  - Centre for Computational Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore
  - Programme in Cancer and Stem Cell Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore
- Steven G Rozen
  - Centre for Computational Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore
  - Programme in Cancer and Stem Cell Biology, Duke-NUS Medical School, 8 College Road, Singapore 169857, Singapore
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, 2424 Erwin Road, Durham, NC 27710, United States
4
Hollizeck S, Wang N, Wong SQ, Litchfield C, Guinto J, Ftouni S, Rebello R, Kanwal S, Dong R, Grimmond S, Sandhu S, Mileshkin L, Tothill RW, Chandrananda D, Dawson SJ. Unravelling mutational signatures with plasma circulating tumour DNA. Nat Commun 2024; 15:9876. [PMID: 39543119 PMCID: PMC11564803 DOI: 10.1038/s41467-024-54193-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Accepted: 11/04/2024] [Indexed: 11/17/2024] Open
Abstract
The use of circulating tumour DNA (ctDNA) to profile mutational signatures represents a non-invasive opportunity for understanding cancer mutational processes. Here we present MisMatchFinder, a liquid biopsy approach for mutational signature detection using low-coverage whole-genome sequencing of ctDNA. Through analysis of 375 plasma samples across 9 cancers, we demonstrate that MisMatchFinder accurately infers single-base and doublet-base substitutions, as well as insertions and deletions to enhance the detection of ctDNA and clinically relevant mutational signatures.
Affiliation(s)
- Sebastian Hollizeck
  - Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, VIC, Australia
- Ning Wang
  - Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, VIC, Australia
- Stephen Q Wong
  - Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, VIC, Australia
- Jerick Guinto
  - Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Sarah Ftouni
  - Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Richard Rebello
  - Centre for Cancer Research, The University of Melbourne, Melbourne, VIC, Australia
- Sehrish Kanwal
  - Centre for Cancer Research, The University of Melbourne, Melbourne, VIC, Australia
- Ruining Dong
  - Centre for Cancer Research, The University of Melbourne, Melbourne, VIC, Australia
- Sean Grimmond
  - Centre for Cancer Research, The University of Melbourne, Melbourne, VIC, Australia
- Shahneen Sandhu
  - Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, VIC, Australia
- Linda Mileshkin
  - Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, VIC, Australia
- Richard W Tothill
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, VIC, Australia
  - Centre for Cancer Research, The University of Melbourne, Melbourne, VIC, Australia
  - Department of Clinical Pathology, The University of Melbourne, Melbourne, VIC, Australia
- Dineika Chandrananda
  - Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, VIC, Australia
- Sarah-Jane Dawson
  - Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, VIC, Australia
  - Centre for Cancer Research, The University of Melbourne, Melbourne, VIC, Australia
5
Medo M, Ng CKY, Medová M. A comprehensive comparison of tools for fitting mutational signatures. Nat Commun 2024; 15:9467. [PMID: 39487150 PMCID: PMC11530434 DOI: 10.1038/s41467-024-53711-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Accepted: 10/18/2024] [Indexed: 11/04/2024] Open
Abstract
Mutational signatures connect characteristic mutational patterns in the genome with biological or chemical processes that take place in cancers. Analysis of mutational signatures can help elucidate tumor evolution, prognosis, and therapeutic strategies. Although tools for extracting mutational signatures de novo have been extensively benchmarked, a similar effort is lacking for tools that fit known mutational signatures to a given catalog of mutations. We fill this gap by comprehensively evaluating twelve signature fitting tools on synthetic mutational catalogs with empirically driven signature weights corresponding to eight cancer types. On average, SigProfilerSingleSample and SigProfilerAssignment/MuSiCal perform best for small and large numbers of mutations per sample, respectively. We further show that ad hoc constraining the list of reference signatures is likely to produce inferior results. Evaluation of real mutational catalogs suggests that the activity of signatures that are absent in the reference catalog poses considerable problems to all evaluated tools.
Affiliation(s)
- Matúš Medo
  - Department of Radiation Oncology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
  - Department for BioMedical Research, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Charlotte K Y Ng
  - Department for BioMedical Research, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
  - IRCCS Humanitas Research Hospital, Rozzano, Milan, Italy
- Michaela Medová
  - Department of Radiation Oncology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
  - Department for BioMedical Research, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
6
Jayakrishnan R, Kwiatkowski DJ, Rose MG, Nassar AH. Topography of mutational signatures in non-small cell lung cancer: emerging concepts, clinical applications, and limitations. Oncologist 2024; 29:833-841. [PMID: 38907669 PMCID: PMC11449018 DOI: 10.1093/oncolo/oyae091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 04/16/2024] [Indexed: 06/24/2024] Open
Abstract
The genome of a cell is continuously battered by a plethora of exogenous and endogenous processes that can lead to damaged DNA. Repair mechanisms correct this damage most of the time, but failure to do so leaves mutations. Mutations do not occur in a random manner, but rather typically follow a more or less specific pattern due to known or imputed mutational processes. Mutational signature analysis is the process by which the predominant mutational process can be inferred for a cancer and can be used in several contexts to study both the genesis of cancer and its response to therapy. Recent pan-cancer genomic efforts such as "The Cancer Genome Atlas" have identified numerous mutational signatures that can be categorized into single base substitutions, doublet base substitutions, or small insertions/deletions. Understanding these mutational signatures as they occur in non-small cell lung cancer could improve efforts at prevention, predict response to personalized treatments, and guide the development of therapies targeting tumor evolution. For non-small cell lung cancer, several mutational signatures have been identified that correlate with exposures such as tobacco smoking and radon and can also reflect endogenous processes such as aging, APOBEC activity, and loss of mismatch repair. Herein, we provide an overview of the current knowledge of mutational signatures in non-small cell lung cancer.
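As an illustration of how single base substitutions are commonly tabulated before signature analysis, the sketch below labels each mutation by its trinucleotide context, reported relative to the pyrimidine (C or T) of the mutated base pair. That yields the 96 channels used by widely used reference catalogs such as COSMIC. The function names and the example mutations are hypothetical, and real pipelines read the context from a reference genome rather than taking it as an argument.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbs96_channel(context, alt):
    """Label a substitution as, e.g., 'T[C>T]A'.

    context: three reference bases centred on the mutated base.
    alt: the mutant base.
    Purine-reference mutations are reported on the opposite strand so that
    the reference base in the label is always C or T.
    """
    context, alt = context.upper(), alt.upper()
    ref = context[1]
    if ref == alt:
        raise ValueError("alt must differ from the reference base")
    if ref in "AG":
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
        ref = context[1]
    return f"{context[0]}[{ref}>{alt}]{context[2]}"


# Hypothetical mutations given as (trinucleotide context, alternate base).
mutations = [("TGA", "A"), ("TCA", "T"), ("ACG", "T"), ("CAG", "G")]
catalog = Counter(sbs96_channel(ctx, alt) for ctx, alt in mutations)
print(catalog)
```

The first two example mutations describe the same event seen from opposite strands, so they fall into the same channel. Collapsing strands in this way is what reduces the 192 possible context-specific substitutions to 96.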
Affiliation(s)
- Ritujith Jayakrishnan
  - Department of Internal Medicine, Yale School of Medicine, New Haven, CT, United States
- David J Kwiatkowski
  - Department of Pulmonary Medicine, Brigham and Women's Hospital, Boston, MA, 02115, United States
- Michal G Rose
  - Yale University School of Medicine and Cancer Center, Veterans Affairs Connecticut Healthcare System, West Haven, CT 06516, United States
  - Department of Medicine, Medical Oncology Division, Yale Cancer Center, New Haven, CT, United States
- Amin H Nassar
  - Yale University School of Medicine and Cancer Center, Veterans Affairs Connecticut Healthcare System, West Haven, CT 06516, United States
7
Wiens M, Farahani H, Scott RW, Underhill TM, Bashashati A. Benchmarking bulk and single-cell variant-calling approaches on Chromium scRNA-seq and scATAC-seq libraries. Genome Res 2024; 34:1196-1210. [PMID: 39147582 PMCID: PMC11444184 DOI: 10.1101/gr.277066.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2023] [Accepted: 08/12/2024] [Indexed: 08/17/2024]
Abstract
Single-cell sequencing methodologies such as scRNA-seq and scATAC-seq have become widespread and effective tools to interrogate tissue composition. Increasingly, variant callers are being applied to these methodologies to resolve the genetic heterogeneity of a sample, especially in the case of detecting the clonal architecture of a tumor. Typically, traditional bulk DNA variant callers are applied to the pooled reads of a single-cell library to detect candidate mutations. Recently, multiple studies have applied such callers on reads from individual cells, with some citing the ability to detect rare variants with higher sensitivity. Many studies apply these two approaches to the Chromium (10x Genomics) scRNA-seq and scATAC-seq methodologies. However, Chromium-based libraries may offer additional challenges to variant calling compared with existing single-cell methodologies, raising questions regarding the validity of variants obtained from such a workflow. To determine the merits and challenges of various variant-calling approaches on Chromium scRNA-seq and scATAC-seq libraries, we use sample libraries with matched bulk whole-genome sequencing to evaluate the performance of callers. We review caller performance, finding that bulk callers applied on pooled reads significantly outperform individual-cell approaches. We also evaluate variants unique to scRNA-seq and scATAC-seq methodologies, finding patterns of noise but also potential capture of RNA-editing events. Finally, we review the notion that variant calling at the single-cell level can detect rare somatic variants, providing empirical results that suggest resolving such variants is infeasible in single-cell Chromium libraries.
Affiliation(s)
- Matthew Wiens
  - School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia V6T 2B9, Canada
- Hossein Farahani
  - School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia V6T 2B9, Canada
- R Wilder Scott
  - School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia V6T 2B9, Canada
- T Michael Underhill
  - School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia V6T 2B9, Canada
  - Department of Cellular & Physiological Sciences, University of British Columbia, Vancouver, British Columbia V6T 2A1, Canada
- Ali Bashashati
  - School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia V6T 2B9, Canada
  - Department of Pathology & Laboratory Medicine, University of British Columbia, Vancouver, British Columbia V6T 1Z7, Canada
8
Flynn A, Waszak SM, Weischenfeldt J. Somatic CpG hypermutation is associated with mismatch repair deficiency in cancer. Mol Syst Biol 2024; 20:1006-1024. [PMID: 39026103 PMCID: PMC11369196 DOI: 10.1038/s44320-024-00054-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Revised: 06/17/2024] [Accepted: 06/28/2024] [Indexed: 07/20/2024] Open
Abstract
Somatic hypermutation in cancer has gained momentum with the increased use of tumour mutation burden as a biomarker for immune checkpoint inhibitors. Spontaneous deamination of 5-methylcytosine to thymine at CpG dinucleotides is one of the most ubiquitous endogenous mutational processes in normal and cancer cells. Here, we performed a systematic investigation of somatic CpG hypermutation at a pan-cancer level. We studied 30,191 cancer patients and 103 cancer types and developed an algorithm to identify somatic CpG hypermutation. Across cancer types, we observed the highest prevalence in paediatric leukaemia (3.5%), paediatric high-grade glioma (1.7%), and colorectal cancer (1%). We discovered germline variants and somatic mutations in the mismatch repair complex MutSα (MSH2-MSH6) as genetic drivers of somatic CpG hypermutation in cancer, which frequently converged on CpG sites and TP53 driver mutations. We further observe an association between somatic CpG hypermutation and response to immune checkpoint inhibitors. Overall, our study identified novel cancer types that display somatic CpG hypermutation, strong association with MutSα-deficiency, and potential utility in cancer immunotherapy.
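The mutation class at the centre of this abstract, C-to-T changes at CpG dinucleotides, can be tallied with a few lines of code. The sketch below is only an illustration of that class, assuming each mutation arrives with its trinucleotide reference context. It is not the authors' detection algorithm, which the abstract does not describe, and it deliberately sets no threshold for calling a sample hypermutated. The function names and example mutations are hypothetical.

```python
def is_cpg_transition(context, alt):
    """Return True for a C>T change at a CpG site on either strand.

    context: three reference bases centred on the mutated base.
    alt: the mutant base.
    """
    context, alt = context.upper(), alt.upper()
    ref = context[1]
    # Forward strand: C>T where the next base is G.
    if ref == "C" and alt == "T" and context[2] == "G":
        return True
    # Same event seen from the opposite strand: G>A where the previous base is C.
    if ref == "G" and alt == "A" and context[0] == "C":
        return True
    return False


def cpg_transition_fraction(mutations):
    """Fraction of (context, alt) mutations that are C>T changes at CpG sites."""
    if not mutations:
        return 0.0
    hits = sum(1 for context, alt in mutations if is_cpg_transition(context, alt))
    return hits / len(mutations)


# Hypothetical mutations given as (trinucleotide context, alternate base).
example = [("ACG", "T"), ("CGT", "A"), ("ACA", "T"), ("ACG", "A")]
print(cpg_transition_fraction(example))
```

Only the first two example mutations qualify: the third is a C>T change outside a CpG context, and the fourth is a C>A change at a CpG site.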
Affiliation(s)
- Aidan Flynn
  - Biotech Research & Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark
  - The Finsen Laboratory, Copenhagen University Hospital - Rigshospitalet, Copenhagen, Denmark
  - Department of Clinical Pathology and Centre for Cancer Research, University of Melbourne, Parkville, VIC, Australia
- Sebastian M Waszak
  - Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - Centre for Molecular Medicine Norway, Nordic EMBL Partnership, University of Oslo and Oslo University Hospital, Oslo, Norway
  - Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Joachim Weischenfeldt
  - Biotech Research & Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark
  - The Finsen Laboratory, Copenhagen University Hospital - Rigshospitalet, Copenhagen, Denmark
  - The DCCC Brain Tumor Center, Danish Comprehensive Cancer Center, Copenhagen, Denmark
  - Department of Urology, Charité University Hospital, Berlin, Germany
9
Battuello P, Corti G, Bartolini A, Lorenzato A, Sogari A, Russo M, Di Nicolantonio F, Bardelli A, Crisafulli G. Mutational signatures of colorectal cancers according to distinct computational workflows. Brief Bioinform 2024; 25:bbae249. [PMID: 38783705 PMCID: PMC11116831 DOI: 10.1093/bib/bbae249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 04/15/2024] [Accepted: 05/13/2024] [Indexed: 05/25/2024] Open
Abstract
Tumor mutational signatures have gained prominence in cancer research, yet the lack of standardized methods hinders reproducibility and robustness. Leveraging colorectal cancer (CRC) as a model, we explored the influence of computational parameters on mutational signature analyses across 230 CRC cell lines and 152 CRC patients. Results were validated in three independent datasets: 483 endometrial cancer patients stratified by mismatch repair (MMR) status, 35 lung cancer patients by smoking status and 12 patient-derived organoids (PDOs) annotated for colibactin exposure. Assessing various bioinformatic tools, reference datasets and input data sizes including whole genome sequencing, whole exome sequencing and a pan-cancer gene panel, we demonstrated significant variability in the results. We report that the use of distinct algorithms and references led to statistically different results, highlighting how arbitrary choices may induce variability in the mutational signature contributions. Furthermore, we found a differential contribution of mutational signatures between coding and intergenic regions and defined the minimum number of somatic variants required for reliable mutational signature assignment. To facilitate the identification of the most suitable workflows, we developed Comparative Mutational Signature analysis on Coding and Extragenic Regions (CoMSCER), a bioinformatic tool which allows researchers to easily perform comparative mutational signature analysis by coupling the results from several tools and public reference datasets and to assess mutational signature contributions in coding and non-coding genomic regions. In conclusion, our study provides a comparative framework to elucidate the impact of distinct computational workflows on mutational signatures.
Affiliation(s)
- Paolo Battuello
  - Department of Oncology, Molecular Biotechnology Center, University of Turin, Piazza Nizza 44, 10126, Turin, Italy
  - Genomics of Cancer and Targeted Therapies Unit, IFOM ETS, The AIRC Institute of Molecular Oncology, Via Adamello 16, 20139, Milan, Italy
- Giorgio Corti
  - Department of Oncology, Molecular Biotechnology Center, University of Turin, Piazza Nizza 44, 10126, Turin, Italy
  - Candiolo Cancer Institute, FPO - IRCCS, Strada Provinciale 142 - km 3.95, 10060, Candiolo, Turin, Italy
- Alice Bartolini
  - Candiolo Cancer Institute, FPO - IRCCS, Strada Provinciale 142 - km 3.95, 10060, Candiolo, Turin, Italy
- Annalisa Lorenzato
  - Department of Oncology, Molecular Biotechnology Center, University of Turin, Piazza Nizza 44, 10126, Turin, Italy
- Alberto Sogari
  - Department of Oncology, Molecular Biotechnology Center, University of Turin, Piazza Nizza 44, 10126, Turin, Italy
  - Genomics of Cancer and Targeted Therapies Unit, IFOM ETS, The AIRC Institute of Molecular Oncology, Via Adamello 16, 20139, Milan, Italy
- Mariangela Russo
  - Department of Oncology, Molecular Biotechnology Center, University of Turin, Piazza Nizza 44, 10126, Turin, Italy
  - Genomics of Cancer and Targeted Therapies Unit, IFOM ETS, The AIRC Institute of Molecular Oncology, Via Adamello 16, 20139, Milan, Italy
- Federica Di Nicolantonio
  - Department of Oncology, Molecular Biotechnology Center, University of Turin, Piazza Nizza 44, 10126, Turin, Italy
  - Candiolo Cancer Institute, FPO - IRCCS, Strada Provinciale 142 - km 3.95, 10060, Candiolo, Turin, Italy
- Alberto Bardelli
  - Department of Oncology, Molecular Biotechnology Center, University of Turin, Piazza Nizza 44, 10126, Turin, Italy
  - Genomics of Cancer and Targeted Therapies Unit, IFOM ETS, The AIRC Institute of Molecular Oncology, Via Adamello 16, 20139, Milan, Italy
- Giovanni Crisafulli
  - Genomics of Cancer and Targeted Therapies Unit, IFOM ETS, The AIRC Institute of Molecular Oncology, Via Adamello 16, 20139, Milan, Italy
10
Jin H, Gulhan DC, Geiger B, Ben-Isvy D, Geng D, Ljungström V, Park PJ. Accurate and sensitive mutational signature analysis with MuSiCal. Nat Genet 2024; 56:541-552. [PMID: 38361034 PMCID: PMC10937379 DOI: 10.1038/s41588-024-01659-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/16/2022] [Accepted: 01/08/2024] [Indexed: 02/17/2024]
Abstract
Mutational signature analysis is a recent computational approach for interpreting somatic mutations in the genome. Its application to cancer data has enhanced our understanding of mutational forces driving tumorigenesis and demonstrated its potential to inform prognosis and treatment decisions. However, methodological challenges remain for discovering new signatures and assigning proper weights to existing signatures, thereby hindering broader clinical applications. Here we present Mutational Signature Calculator (MuSiCal), a rigorous analytical framework with algorithms that solve major problems in the standard workflow. Our simulation studies demonstrate that MuSiCal outperforms state-of-the-art algorithms for both signature discovery and assignment. By reanalyzing more than 2,700 cancer genomes, we provide an improved catalog of signatures and their assignments, discover nine indel signatures absent in the current catalog, resolve long-standing issues with the ambiguous 'flat' signatures and give insights into signatures with unknown etiologies. We expect MuSiCal and the improved catalog to be a step towards establishing best practices for mutational signature analysis.
Affiliation(s)
- Hu Jin
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Doga C Gulhan
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Benedikt Geiger
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Daniel Ben-Isvy
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- David Geng
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Viktor Ljungström
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Peter J Park
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
11
Kaufmann TL, Schwarz RF. Improved identification of cancer mutational processes. Nat Genet 2024; 56:365-366. [PMID: 38454020 DOI: 10.1038/s41588-024-01679-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2024]
Affiliation(s)
- Tom L Kaufmann
  - BIFOLD - Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
  - Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
  - Institute for Computational Cancer Biology (ICCB), Center for Integrated Oncology (CIO), Cancer Research Center Cologne Essen (CCCE), Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Roland F Schwarz
  - BIFOLD - Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
  - Institute for Computational Cancer Biology (ICCB), Center for Integrated Oncology (CIO), Cancer Research Center Cologne Essen (CCCE), Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
12
Edsjö A, Holmquist L, Geoerger B, Nowak F, Gomon G, Alix-Panabières C, Ploeger C, Lassen U, Le Tourneau C, Lehtiö J, Ott PA, von Deimling A, Fröhling S, Voest E, Klauschen F, Dienstmann R, Alshibany A, Siu LL, Stenzinger A. Precision cancer medicine: Concepts, current practice, and future developments. J Intern Med 2023; 294:455-481. [PMID: 37641393 DOI: 10.1111/joim.13709] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 08/31/2023]
Abstract
Precision cancer medicine is a multidisciplinary team effort that requires the involvement and commitment of many stakeholders, including society at large. Building on the success of significant advances in precision therapy for oncological patients over the last two decades, future developments will be significantly shaped by improvements in scalable molecular diagnostics in which increasingly complex multilayered datasets require transformation into clinically useful information guiding patient management at fast turnaround times. Adaptive profiling strategies involving tissue- and liquid-based testing that account for the immense plasticity of cancer during the patient's journey and also include early detection approaches are already finding their way into clinical routine and will become paramount. A second major driver is the development of smart clinical trials and trial concepts which, complemented by real-world evidence, rapidly broaden the spectrum of therapeutic options. Tight coordination with regulatory agencies and health technology assessment bodies is crucial in this context. Multicentric networks operating nationally and internationally are key in implementing precision oncology in clinical practice and support developing and improving the ecosystem and framework needed to turn innovation into benefits for patients. The review provides an overview of the diagnostic tools, innovative clinical studies, and collaborative efforts needed to realize precision cancer medicine.
Affiliation(s)
- Anders Edsjö
- Department of Clinical Genetics, Pathology and Molecular Diagnostics, Office for Medical Services, Region Skåne, Lund, Sweden
- Division of Pathology, Department of Clinical Sciences, Lund University, Lund, Sweden
- Genomic Medicine Sweden (GMS), Kristianstad, Sweden
- Louise Holmquist
- Department of Clinical Genetics, Pathology and Molecular Diagnostics, Office for Medical Services, Region Skåne, Lund, Sweden
- Genomic Medicine Sweden (GMS), Kristianstad, Sweden
- Birgit Geoerger
- Department of Pediatric and Adolescent Oncology, Gustave Roussy Cancer Campus, Université Paris-Saclay, Villejuif, France
- INSERM U1015, Gustave Roussy Cancer Campus, Université Paris-Saclay, Villejuif, France
- Georgy Gomon
- Department of Molecular Oncology and Immunology, The Netherlands Cancer Institute, Antoni van Leeuwenhoek Hospital, Amsterdam, The Netherlands
- Department of Medical Oncology, Leiden University Medical Centre, Leiden, The Netherlands
- Catherine Alix-Panabières
- Laboratory of Rare Human Circulating Cells, University Medical Center of Montpellier, Montpellier, France
- CREEC, MIVEGEC, University of Montpellier, Montpellier, France
- Carolin Ploeger
- Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany
- Centers for Personalized Medicine (ZPM), Heidelberg, Germany
- Ulrik Lassen
- Department of Oncology, Copenhagen University Hospital, Copenhagen, Denmark
- Christophe Le Tourneau
- Department of Drug Development and Innovation (D3i), Institut Curie, Paris, France
- INSERM U900 Research Unit, Saint-Cloud, France
- Faculty of Medicine, Paris-Saclay University, Paris, France
- Janne Lehtiö
- Department of Oncology Pathology, Karolinska Institutet, Science for Life Laboratory, Stockholm, Sweden
- Patrick A Ott
- Department of Medical Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, Massachusetts, USA
- Andreas von Deimling
- Clinical Cooperation Unit Neuropathology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Neuropathology, Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Stefan Fröhling
- Division of Translational Medical Oncology, National Center for Tumor Diseases (NCT) Heidelberg and German Cancer Research Center (DKFZ), Heidelberg, Germany
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Emile Voest
- Department of Molecular Oncology and Immunology, The Netherlands Cancer Institute, Antoni van Leeuwenhoek Hospital, Amsterdam, The Netherlands
- Oncode Institute, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Frederick Klauschen
- Institute of Pathology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- German Cancer Consortium (DKTK), Partner Site Berlin, and German Cancer Research Center (DKFZ), Heidelberg, Germany
- BIFOLD - Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Institute of Pathology, Ludwig-Maximilians-University, Munich, Germany
- German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Munich Partner Site, Heidelberg, Germany
- Lillian L Siu
- Princess Margaret Cancer Centre, Toronto, Ontario, Canada
- Albrecht Stenzinger
- Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany
- Centers for Personalized Medicine (ZPM), Heidelberg, Germany
13
Wu AJ, Perera A, Kularatnarajah L, Korsakova A, Pitt JJ. Mutational signature assignment heterogeneity is widespread and can be addressed by ensemble approaches. Brief Bioinform 2023; 24:bbad331. [PMID: 37742051 PMCID: PMC10518036 DOI: 10.1093/bib/bbad331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Revised: 08/03/2023] [Accepted: 08/27/2023] [Indexed: 09/25/2023]
Abstract
Single-base substitution (SBS) mutational signatures have become standard practice in cancer genomics. In lieu of de novo signature extraction, reference signature assignment allows users to estimate the activities of pre-established SBS signatures within individual malignancies. Several tools have been developed for this purpose, each with differing methodologies. However, due to a lack of standardization, there may be inter-tool variability in signature assignment. We deeply characterized three assignment strategies and five SBS signature assignment tools. We observed that assignment strategy choice can significantly influence results and interpretations. Despite varying recommendations by tools, Refit performed best by reducing overfitting and maximizing reconstruction of the original mutational spectra. Even after uniform application of Refit, tools varied remarkably in signature assignments both qualitatively (Jaccard index = 0.38-0.83) and quantitatively (Kendall tau-b = 0.18-0.76). This phenomenon was exacerbated for 'flat' signatures such as the homologous recombination deficiency signature SBS3. An ensemble approach (EnsembleFit), which leverages output from all five tools, increased SBS3 assignment accuracy in BRCA1/2-deficient breast carcinomas. After generating synthetic mutational profiles for thousands of pan-cancer tumors, EnsembleFit reduced signature activity assignment error 15.9-24.7% on average using Catalogue of Somatic Mutations In Cancer and non-standard reference signature sets. We have also released the EnsembleFit web portal (https://www.ensemblefit.pittlabgenomics.com) for users to generate or download ensemble-based SBS signature assignments using any strategy and combination of tools. Overall, we show that signature assignment heterogeneity across tools and strategies is non-negligible and propose a viable, ensemble solution.
Affiliation(s)
- Andy J Wu
- Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
- School of Medicine, National University of Singapore, Singapore, Singapore
- Akila Perera
- Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
- School of Computing, National University of Singapore, Singapore, Singapore
- Anna Korsakova
- Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
- Jason J Pitt
- Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
- NUS Centre for Cancer Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
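The abstract above compares tools by the Jaccard index of their assigned signature sets and combines tool outputs into an ensemble. A minimal sketch of both ideas follows. The tool names and activity values are hypothetical, and the plain averaging shown here is an assumption for illustration only, not the EnsembleFit algorithm.

```python
def jaccard_index(a, b):
    """Jaccard index between the sets of signatures two tools report as active."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty assignments are treated as identical
    return len(a & b) / len(a | b)


def mean_ensemble(assignments):
    """Average each signature's activity across tools (absent signature = 0)."""
    signatures = sorted({s for tool in assignments.values() for s in tool})
    n_tools = len(assignments)
    return {
        s: sum(tool.get(s, 0.0) for tool in assignments.values()) / n_tools
        for s in signatures
    }


# Hypothetical per-tool activities (fraction of mutations) for one tumour.
assignments = {
    "tool_a": {"SBS1": 0.5, "SBS3": 0.3, "SBS5": 0.2},
    "tool_b": {"SBS1": 0.6, "SBS5": 0.4},
    "tool_c": {"SBS1": 0.4, "SBS3": 0.2, "SBS5": 0.4},
}

print(jaccard_index(assignments["tool_a"], assignments["tool_b"]))
print(mean_ensemble(assignments))
```

In this toy input, tool_b never assigns SBS3, which lowers its Jaccard index against the other two tools; this mirrors the disagreement on 'flat' signatures described in the abstract.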
14
Pelizzola M, Laursen R, Hobolth A. Model selection and robust inference of mutational signatures using Negative Binomial non-negative matrix factorization. BMC Bioinformatics 2023; 24:187. [PMID: 37158829 PMCID: PMC10165836 DOI: 10.1186/s12859-023-05304-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Accepted: 04/25/2023] [Indexed: 05/10/2023]
Abstract
BACKGROUND: The spectrum of mutations in a collection of cancer genomes can be described by a mixture of a few mutational signatures. The mutational signatures can be found using non-negative matrix factorization (NMF). To extract the mutational signatures we have to assume a distribution for the observed mutational counts and a number of mutational signatures. In most applications, the mutational counts are assumed to be Poisson distributed, and the rank is chosen by comparing the fit of several models with the same underlying distribution and different values for the rank using classical model selection procedures. However, the counts are often overdispersed, and thus the Negative Binomial distribution is more appropriate. RESULTS: We propose a Negative Binomial NMF with a patient specific dispersion parameter to capture the variation across patients and derive the corresponding update rules for parameter estimation. We also introduce a novel model selection procedure inspired by cross-validation to determine the number of signatures. Using simulations, we study the influence of the distributional assumption on our method together with other classical model selection procedures. We also present a simulation study with a method comparison where we show that state-of-the-art methods are highly overestimating the number of signatures when overdispersion is present. We apply our proposed analysis on a wide range of simulated data and on two real data sets from breast and prostate cancer patients. On the real data we describe a residual analysis to investigate and validate the model choice. CONCLUSIONS: With our results on simulated and real data we show that our model selection procedure is more robust at determining the correct number of signatures under model misspecification. We also show that our model selection procedure is more accurate than the available methods in the literature for finding the true number of signatures. Lastly, the residual analysis clearly emphasizes the overdispersion in the mutational count data. The code for our model selection procedure and Negative Binomial NMF is available in the R package SigMoS and can be found at https://github.com/MartaPelizzola/SigMoS.
Affiliation(s)
- Marta Pelizzola
- Department of Mathematics, Aarhus University, Aarhus, Denmark
- Asger Hobolth
- Department of Mathematics, Aarhus University, Aarhus, Denmark
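The abstract above rests on the observation that mutational counts are often overdispersed relative to a Poisson model. One quick way to illustrate this is the variance-to-mean ratio (index of dispersion), which is close to 1 for Poisson data and clearly above 1 for overdispersed data. The sketch below is only that simple diagnostic, not the SigMoS procedure, and the counts are made up for illustration.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 under a Poisson model."""
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / m


# Hypothetical counts of one mutation type across eight patients.
counts = [12, 15, 9, 140, 11, 13, 210, 8]

# A ratio far above 1 suggests that a Negative Binomial model
# may describe the counts better than a Poisson model.
print(dispersion_index(counts))
```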
15
Bae JH, Liu R, Roberts E, Nguyen E, Tabrizi S, Rhoades J, Blewett T, Xiong K, Gydush G, Shea D, An Z, Patel S, Cheng J, Sridhar S, Liu MH, Lassen E, Skytte AB, Grońska-Pęski M, Shoag JE, Evrony GD, Parsons HA, Mayer EL, Makrigiorgos GM, Golub TR, Adalsteinsson VA. Single duplex DNA sequencing with CODEC detects mutations with high sensitivity. Nat Genet 2023; 55:871-879. [PMID: 37106072 PMCID: PMC10181940 DOI: 10.1038/s41588-023-01376-0] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 03/02/2022] [Accepted: 03/21/2023] [Indexed: 04/29/2023]
Abstract
Detecting mutations from single DNA molecules is crucial in many fields but challenging. Next-generation sequencing (NGS) affords tremendous throughput but cannot directly sequence double-stranded DNA molecules ('single duplexes') to discern the true mutations on both strands. Here we present Concatenating Original Duplex for Error Correction (CODEC), which confers single duplex resolution to NGS. CODEC affords 1,000-fold higher accuracy than NGS, using up to 100-fold fewer reads than duplex sequencing. CODEC revealed mutation frequencies of 2.72 × 10⁻⁸ in sperm of a 39-year-old individual, and somatic mutations acquired with age in blood cells. CODEC detected genome-wide, clonal hematopoiesis mutations from single DNA molecules, single mutated duplexes from tumor genomes and liquid biopsies, microsatellite instability with 10-fold greater sensitivity and mutational signatures, and specific tumor mutations with up to 100-fold fewer reads. CODEC enables more precise genetic testing and reveals biologically significant mutations, which are commonly obscured by NGS errors.
Affiliation(s)
- Jin H Bae
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ruolin Liu
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Erica Nguyen
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Shervin Tabrizi
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Koch Institute for Integrative Cancer Research at MIT, Cambridge, MA, USA
- Massachusetts General Hospital, Boston, MA, USA
- Kan Xiong
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Douglas Shea
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Zhenyi An
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Sahil Patel
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Koch Institute for Integrative Cancer Research at MIT, Cambridge, MA, USA
- Massachusetts General Hospital, Boston, MA, USA
- Ju Cheng
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Mei Hong Liu
- Center for Human Genetics and Genomics, Departments of Pediatrics and Neuroscience & Physiology, New York University Grossman School of Medicine, New York City, NY, USA
- Marta Grońska-Pęski
- Center for Human Genetics and Genomics, Departments of Pediatrics and Neuroscience & Physiology, New York University Grossman School of Medicine, New York City, NY, USA
- Jonathan E Shoag
- University Hospitals Cleveland Medical Center, Case Western Reserve University School of Medicine, Case Comprehensive Cancer Center, Cleveland, OH, USA
- Gilad D Evrony
- Center for Human Genetics and Genomics, Departments of Pediatrics and Neuroscience & Physiology, New York University Grossman School of Medicine, New York City, NY, USA
- Todd R Golub
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Dana-Farber Cancer Institute, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
16
Mingard C, Battey JND, Takhaveev V, Blatter K, Hürlimann V, Sierro N, Ivanov NV, Sturla SJ. Dissection of Cancer Mutational Signatures with Individual Components of Cigarette Smoking. Chem Res Toxicol 2023; 36:714-723. [PMID: 36976926 PMCID: PMC10114081 DOI: 10.1021/acs.chemrestox.3c00021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2023]
Abstract
Tobacco smoke delivers a complex mixture of hazardous and potentially hazardous chemicals. Some of these may induce the formation of DNA mutations, which increases the risk of various cancers that display characteristic patterns of accumulated mutations arising from the causative exposures. Tracking the contributions of individual mutagens to mutational signatures present in human cancers can help understand cancer etiology and advance disease prevention strategies. To characterize the potential contributions of individual constituents of tobacco smoke to tobacco exposure-associated mutational signatures, we first assessed the toxic potential of 13 tobacco-relevant compounds by determining their impact on the viability of a human bronchial lung epithelial cell line (BEAS-2B). Experimentally derived high-resolution mutational profiles were characterized for the seven most potent compounds by sequencing the genomes of clonally expanded mutants that arose after exposure to the individual chemicals. Analogous to the classification of mutagenic processes on the basis of signatures from human cancers, we extracted mutational signatures from the mutant clones. We confirmed the formation of previously characterized benzo[a]pyrene mutational signatures. Furthermore, we discovered three novel mutational signatures. The mutational signatures arising from benzo[a]pyrene and norharmane were similar to human lung cancer signatures attributed to tobacco smoking. However, the signatures arising from N-methyl-N'-nitro-N-nitrosoguanidine and 4-[(acetoxymethyl)nitrosamino]-1-(3-pyridyl)-1-butanone were not directly related to known tobacco-linked mutational signatures from human cancers. This new data set expands the scope of the in vitro mutational signature catalog and advances understanding of how environmental agents mutate DNA.
Affiliation(s)
- Cécile Mingard
- Department of Health Sciences and Technology, ETH Zurich, Schmelzbergstrasse 9, Zürich, CH 8092, Switzerland
- James N D Battey
- PMI R&D, Philip Morris Products SA, Quai Jeanrenaud 5, Neuchâtel, CH 2000, Switzerland
- Vakil Takhaveev
- Department of Health Sciences and Technology, ETH Zurich, Schmelzbergstrasse 9, Zürich, CH 8092, Switzerland
- Katharina Blatter
- Department of Health Sciences and Technology, ETH Zurich, Schmelzbergstrasse 9, Zürich, CH 8092, Switzerland
- Vera Hürlimann
- Department of Health Sciences and Technology, ETH Zurich, Schmelzbergstrasse 9, Zürich, CH 8092, Switzerland
- Nicolas Sierro
- PMI R&D, Philip Morris Products SA, Quai Jeanrenaud 5, Neuchâtel, CH 2000, Switzerland
- Nikolai V Ivanov
- PMI R&D, Philip Morris Products SA, Quai Jeanrenaud 5, Neuchâtel, CH 2000, Switzerland
- Shana J Sturla
- Department of Health Sciences and Technology, ETH Zurich, Schmelzbergstrasse 9, Zürich, CH 8092, Switzerland
17
Patterson A, Elbasir A, Tian B, Auslander N. Computational Methods Summarizing Mutational Patterns in Cancer: Promise and Limitations for Clinical Applications. Cancers (Basel) 2023; 15:1958. [PMID: 37046619 PMCID: PMC10093138 DOI: 10.3390/cancers15071958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 02/24/2023] [Accepted: 03/09/2023] [Indexed: 03/29/2023]
Abstract
Since the rise of next-generation sequencing technologies, the catalogue of mutations in cancer has been continuously expanding. To address the complexity of the cancer-genomic landscape and extract meaningful insights, numerous computational approaches have been developed over the last two decades. In this review, we survey the current leading computational methods to derive intricate mutational patterns in the context of clinical relevance. We begin with mutation signatures, explaining first how mutation signatures were developed and then examining the utility of studies using mutation signatures to correlate environmental effects on the cancer genome. Next, we examine current clinical research that employs mutation signatures and discuss the potential use cases and challenges of mutation signatures in clinical decision-making. We then examine computational studies developing tools to investigate complex patterns of mutations beyond the context of mutational signatures. We survey methods to identify cancer-driver genes, from single-driver studies to pathway and network analyses. In addition, we review methods inferring complex combinations of mutations for clinical tasks and using mutations integrated with multi-omics data to better predict cancer phenotypes. We examine the use of these tools for either discovery or prediction, including prediction of tumor origin, treatment outcomes, prognosis, and cancer typing. We further discuss the main limitations preventing widespread clinical integration of computational tools for the diagnosis and treatment of cancer. We end by proposing solutions to address these challenges using recent advances in machine learning.
Affiliation(s)
- Andrew Patterson
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- The Wistar Institute, Philadelphia, PA 19104, USA
- Bin Tian
- The Wistar Institute, Philadelphia, PA 19104, USA
- Noam Auslander
- The Wistar Institute, Philadelphia, PA 19104, USA
- Department of Cancer Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
18
Liu M, Wu Y, Jiang N, Boot A, Rozen S. mSigHdp: hierarchical Dirichlet process mixture modeling for mutational signature discovery. NAR Genom Bioinform 2023; 5:lqad005. [PMID: 36694663 PMCID: PMC9869330 DOI: 10.1093/nargab/lqad005] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 10/25/2022] [Accepted: 01/19/2023] [Indexed: 01/24/2023]
Abstract
Mutational signatures are characteristic patterns of mutations caused by endogenous or exogenous mutational processes. These signatures can be discovered by analyzing mutations in large sets of samples, usually somatic mutations in tumor samples. Most programs for discovering mutational signatures are based on non-negative matrix factorization (NMF). Alternatively, signatures can be discovered using hierarchical Dirichlet process (HDP) mixture models, an approach that has been less explored. These models assign mutations to clusters and view each cluster as being generated from the signature of a particular mutational process. Here, we describe mSigHdp, an improved approach to using HDP mixture models to discover mutational signatures. We benchmarked mSigHdp and state-of-the-art NMF-based approaches on four realistic synthetic data sets. These data sets encompassed 18 cancer types. In total, they contained 3.5 × 10⁷ single-base-substitution mutations representing 32 signatures and 6.1 × 10⁶ small insertion and deletion mutations representing 13 signatures. For three of the four data sets, mSigHdp had the best positive predictive value for discovering mutational signatures, and for all four data sets, it had the best true positive rate. Its CPU usage was similar to that of the NMF-based approaches. Thus, mSigHdp is an important and practical addition to the set of tools available for discovering mutational signatures.
Affiliation(s)
- Nanhai Jiang
- Programme in Cancer & Stem Cell Biology, Duke–NUS Medical School, 169857 Singapore
- Centre for Computational Biology, Duke–NUS Medical School, 169857 Singapore
- Arnoud Boot
- Programme in Cancer & Stem Cell Biology, Duke–NUS Medical School, 169857 Singapore
- Centre for Computational Biology, Duke–NUS Medical School, 169857 Singapore
- Steven G Rozen
- To whom correspondence should be addressed. Tel: +65 65164945
19
Pancotti C, Rollo C, Birolo G, Benevenuta S, Fariselli P, Sanavia T. Unravelling the instability of mutational signatures extraction via archetypal analysis. Front Genet 2023; 13:1049501. [PMID: 36685831 PMCID: PMC9846778 DOI: 10.3389/fgene.2022.1049501] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/20/2022] [Accepted: 12/07/2022] [Indexed: 01/06/2023]
Abstract
The high cosine similarity between some single-base substitution mutational signatures and their characteristic flat profiles could suggest the presence of overfitting and mathematical artefacts. The newest version (v3.3) of the signature database available in the Catalogue Of Somatic Mutations In Cancer (COSMIC) provides a collection of 79 mutational signatures, which has more than doubled with respect to the previous version (30 profiles available in COSMIC signatures v2), making the associations between signatures and specific mutagenic processes more critical. This study both provides a systematic assessment of the de novo extraction task through simulation scenarios based on the latest version of the COSMIC signatures and highlights, through a novel approach using archetypal analysis, which COSMIC signatures are redundant and more likely to be considered as mathematical artefacts. 29 archetypes were able to reconstruct the profile of all the COSMIC signatures with cosine similarity > 0.8. Interestingly, these archetypes tend to group similar original signatures sharing either the same aetiology or similar biological processes. We believe that these findings will be useful to encourage the development of new de novo extraction methods avoiding the redundancy of information among the signatures while preserving the biological interpretation.
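Cosine similarity, the measure used in the abstract above, can be sketched in a few lines. The two 96-channel profiles below are hypothetical stand-ins for 'flat' signatures, not COSMIC data. They show how two visibly different flat profiles can still clear the 0.8 threshold mentioned in the abstract.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equally long, non-zero profiles."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for an all-zero profile")
    return dot / (norm_u * norm_v)


# Hypothetical 96-channel SBS profiles: one perfectly flat, one alternating
# 20% above and below the flat level (both sum to 1).
flat = [1 / 96] * 96
wavy = [(1.2 if i % 2 == 0 else 0.8) / 96 for i in range(96)]

print(cosine_similarity(flat, wavy))
```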
20
Pan-cancer landscape of AID-related mutations, composite mutations, and their potential role in the ICI response. NPJ Precis Oncol 2022; 6:89. [PMID: 36456685 PMCID: PMC9715662 DOI: 10.1038/s41698-022-00331-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Accepted: 11/02/2022] [Indexed: 12/03/2022]
Abstract
Activation-induced cytidine deaminase, AICDA or AID, is a driver of somatic hypermutation and class-switch recombination in immunoglobulins. In addition, this deaminase belonging to the APOBEC family may have off-target effects genome-wide, but its effects at pan-cancer level are not well elucidated. Here, we used different pan-cancer datasets, totaling more than 50,000 samples analyzed by whole-genome, whole-exome, or targeted sequencing. AID mutations are present at pan-cancer level with higher frequency in hematological cancers and higher presence at transcriptionally active TAD domains. AID synergizes initial hotspot mutations by a second composite mutation. AID mutational load was found to be independently associated with a favorable outcome in immune-checkpoint inhibitors (ICI) treated patients across cancers after analyzing 2000 samples. Finally, we found that AID-related neoepitopes, resulting from mutations at more frequent hotspots if compared to other mutational signatures, enhance CXCL13/CCR5 expression, immunogenicity, and T-cell exhaustion, which may increase ICI sensitivity.
21
Jiménez‐Santos MJ, García‐Martín S, Fustero‐Torre C, Di Domenico T, Gómez‐López G, Al‐Shahrour F. Bioinformatics roadmap for therapy selection in cancer genomics. Mol Oncol 2022; 16:3881-3908. [PMID: 35811332 PMCID: PMC9627786 DOI: 10.1002/1878-0261.13286] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/26/2022] [Revised: 06/22/2022] [Accepted: 07/08/2022] [Indexed: 12/24/2022]
Abstract
Tumour heterogeneity is one of the main characteristics of cancer and can be categorised into inter- or intratumour heterogeneity. This heterogeneity has been revealed as one of the key causes of treatment failure and relapse. Precision oncology is an emerging field that seeks to design tailored treatments for each cancer patient according to epidemiological, clinical and omics data. This discipline relies on bioinformatics tools designed to compute scores to prioritise available drugs, with the aim of helping clinicians in treatment selection. In this review, we describe the current approaches for therapy selection depending on which type of tumour heterogeneity is being targeted and the available next-generation sequencing data. We cover intertumour heterogeneity studies and individual treatment selection using genomics variants, expression data or multi-omics strategies. We also describe intratumour dissection through clonal inference and single-cell transcriptomics, in each case providing bioinformatics tools for tailored treatment selection. Finally, we discuss how these therapy selection workflows could be integrated into the clinical practice.
Affiliation(s)
- Coral Fustero‐Torre
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Tomás Di Domenico
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Gonzalo Gómez‐López
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Fátima Al‐Shahrour
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
22
Abstract
The evolutionary history of hepatobiliary cancers is embedded in their genomes. By analysing their catalogue of somatic mutations and the DNA sequence context in which they occur, it is possible to infer the mechanisms underpinning tumorigenesis. These mutational signatures reflect the exogenous and endogenous origins of genetic damage as well as the capacity of hepatobiliary cells to repair and replicate DNA. Genomic analysis of thousands of patients with hepatobiliary cancers has highlighted the diversity of mutagenic processes active in these malignancies, highlighting a prominent source of the inter-cancer-type, inter-patient, intertumour and intratumoural heterogeneity that is observed clinically. However, a substantial proportion of mutational signatures detected in hepatocellular carcinoma and biliary tract cancer remain of unknown cause, emphasizing the important contribution of processes yet to be identified. Exploiting mutational signatures to retrospectively understand hepatobiliary carcinogenesis could advance preventative management of these aggressive tumours as well as potentially predict treatment response and guide the development of therapies targeting tumour evolution.
23
Abstract
Distilling biologically meaningful information from cancer genome sequencing data requires comprehensive identification of somatic alterations using rigorous computational methods. As the amount and complexity of sequencing data have increased, so has the number of tools for analysing them. Here, we describe the main steps involved in the bioinformatic analysis of cancer genomes, review key algorithmic developments and highlight popular tools and emerging technologies. These tools include those that identify point mutations, copy number alterations, structural variations and mutational signatures in cancer genomes. We also discuss issues in experimental design, the strengths and limitations of sequencing modalities and methodological challenges for the future.
24
Lee D, Wang D, Yang XR, Shi J, Landi MT, Zhu B. SUITOR: Selecting the number of mutational signatures through cross-validation. PLoS Comput Biol 2022; 18:e1009309. [PMID: 35377867 PMCID: PMC9009674 DOI: 10.1371/journal.pcbi.1009309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2021] [Revised: 04/14/2022] [Accepted: 03/09/2022] [Indexed: 11/19/2022]
Abstract
For de novo mutational signature analysis, the critical first step is to decide how many signatures should be expected in a cancer genomics study. An incorrect number could mislead downstream analyses. Here we present SUITOR (Selecting the nUmber of mutatIonal signaTures thrOugh cRoss-validation), an unsupervised cross-validation method that requires few assumptions and no numerical approximations to select the optimal number of signatures without overfitting the data. In vitro studies and in silico simulations demonstrated that SUITOR can correctly identify signatures, some of which were missed by other widely used methods. Applied to 2,540 whole-genome sequenced tumors across 22 cancer types, SUITOR selected signatures with the smallest prediction errors and almost all signatures of breast cancer selected by SUITOR were validated in an independent breast cancer study. SUITOR is a powerful tool to select the optimal number of mutational signatures, facilitating downstream analyses with etiological or therapeutic importance.
Affiliation(s)
- Donghyuk Lee
- Department of Statistics, Pusan National University, Busan, Korea
- Difei Wang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Xiaohong R. Yang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Jianxin Shi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Maria Teresa Landi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Bin Zhu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
| |
Collapse
|
25
|
Gan SKE, Phua SX, Yeo JY. Sagacious epitope selection for vaccines, and both antibody-based therapeutics and diagnostics: tips from virology and oncology. Antib Ther 2022; 5:63-72. [PMID: 35372784 PMCID: PMC8972324 DOI: 10.1093/abt/tbac005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/03/2021] [Revised: 01/24/2022] [Accepted: 02/12/2022] [Indexed: 11/12/2022] Open
Abstract
The target of an antibody plays a significant role in the success of antibody-based therapeutics and diagnostics, and vaccine development. This importance centers on the target binding site, the epitope, where epitope selection, as a part of design thinking beyond traditional antigen selection using whole cell or whole protein immunization, can positively impact success. With purified recombinant protein production and peptide synthesis to display limited or selected epitopes, intrinsic factors that can affect the functioning of the resulting antibodies can be more easily selected for. Many of these factors stem from the location of the epitope, which can impact accessibility of the antibody to the epitope at a cellular or molecular level, direct inhibition of target antigen activity, conservation of function despite escape mutations, and even non-competitive inhibition sites. By incorporating novel computational methods for predicting antigen changes into model-informed drug discovery and development, superior vaccines and antibody-based therapeutics or diagnostics can be easily designed to mitigate failures. With detailed examples, this review highlights the new opportunities, factors and methods of predicting antigenic changes for consideration in sagacious epitope selection.
Affiliation(s)
- Samuel Ken-En Gan
  - Antibody & Product Development Lab, EDDC-BII, Agency for Science, Technology and Research (A*STAR), Singapore 138672, Singapore
  - APD SKEG Pte Ltd, Singapore 439444, Singapore
- Ser-Xian Phua
  - Antibody & Product Development Lab, EDDC-BII, Agency for Science, Technology and Research (A*STAR), Singapore 138672, Singapore
- Joshua Yi Yeo
  - Antibody & Product Development Lab, EDDC-BII, Agency for Science, Technology and Research (A*STAR), Singapore 138672, Singapore
|
26
|
Pandey P, Arora S, Rosen GL. MetaMutationalSigs: comparison of mutational signature refitting results made easy. Bioinformatics 2022; 38:2344-2347. [PMID: 35157026 PMCID: PMC9004636 DOI: 10.1093/bioinformatics/btac091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2021] [Revised: 12/07/2021] [Accepted: 02/09/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: The analysis of mutational signatures is becoming increasingly common in cancer genetics, with emerging implications in cancer evolution, classification, treatment decision and prognosis. Recently, several packages have been developed for mutational signature analysis, each using a different methodology and yielding significantly different results. Because of the non-trivial differences in the tools' refitting results, researchers may wish to survey and compare the available tools in order to objectively evaluate the results for their specific research question, such as which mutational signatures are prevalent in different cancer types. RESULTS: To meet the need for effective comparison of mutational signature refitting results, we introduce user-friendly software that can aggregate and visually present results from different refitting packages. AVAILABILITY AND IMPLEMENTATION: MetaMutationalSigs is implemented in R and Python, can be installed using Docker, and is available at: https://github.com/EESI/MetaMutationalSigs.
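Refitting, as the abstract above uses the term, means estimating how strongly each already-known signature contributes to one tumor's mutation catalogue. The sketch below is only an illustration of that idea, assuming a non-negative least-squares formulation solved with multiplicative updates. It is not the method of MetaMutationalSigs or of any package it compares. The function name `refit_exposures` and the four-channel toy signatures are hypothetical.

```python
def refit_exposures(catalogue, signatures, n_iter=500, eps=1e-12):
    """Estimate non-negative exposures h with sum_k h[k] * signatures[k] ~ catalogue.

    Assumption: squared-error objective, minimized with the multiplicative
    update h <- h * (W^T v) / (W^T W h), which keeps every exposure >= 0.
    """
    k = len(signatures)
    n = len(catalogue)
    if any(len(sig) != n for sig in signatures):
        raise ValueError("every signature must have one value per mutation channel")

    # Precompute W^T v (length k) and W^T W (k x k); W has signatures as columns.
    wtv = [sum(sig[i] * catalogue[i] for i in range(n)) for sig in signatures]
    wtw = [[sum(a[i] * b[i] for i in range(n)) for b in signatures]
           for a in signatures]

    # Start from equal, strictly positive exposures.
    h = [sum(catalogue) / k] * k
    for _ in range(n_iter):
        denom = [sum(wtw[r][c] * h[c] for c in range(k)) for r in range(k)]
        h = [h[r] * wtv[r] / (denom[r] + eps) for r in range(k)]
    return h


# Hypothetical toy signatures over four mutation channels (not real COSMIC values).
SIG_A = [0.7, 0.1, 0.1, 0.1]
SIG_B = [0.1, 0.1, 0.1, 0.7]

# Synthetic catalogue built from 30 mutations of SIG_A and 10 of SIG_B.
catalogue = [30 * a + 10 * b for a, b in zip(SIG_A, SIG_B)]
exposures = refit_exposures(catalogue, [SIG_A, SIG_B])
print([round(x, 3) for x in exposures])
```

On this noise-free toy input the estimated exposures should land very close to 30 and 10. Real refitting tools differ in objective, regularization and signature selection, which is one reason their results can disagree.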
Affiliation(s)
- Palash Pandey
  - Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical and Computer Engineering, College of Engineering, Drexel University, Philadelphia, PA 19104, USA
  - Cancer Prevention and Control Program, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
|
27
|
Wu Y, Chua EHZ, Ng AWT, Boot A, Rozen SG. Accuracy of mutational signature software on correlated signatures. Sci Rep 2022; 12:390. [PMID: 35013428 PMCID: PMC8748538 DOI: 10.1038/s41598-021-04207-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/28/2021] [Accepted: 12/17/2021] [Indexed: 11/09/2022] Open
Abstract
Mutational signatures are characteristic patterns of mutations generated by exogenous mutagens or by endogenous mutational processes. Mutational signatures are important for research into DNA damage and repair, aging, cancer biology, genetic toxicology, and epidemiology. Unsupervised learning can infer mutational signatures from the somatic mutations in large numbers of tumors, and separating correlated signatures is a notable challenge for this task. To investigate which methods can best meet this challenge, we assessed 18 computational methods for inferring mutational signatures on 20 synthetic data sets that incorporated varying degrees of correlated activity of two common mutational signatures. Performance varied widely, and four methods noticeably outperformed the others: hdp (based on hierarchical Dirichlet processes), SigProExtractor (based on multiple non-negative matrix factorizations over resampled data), TCSM (based on an approach used in document topic analysis), and mutSpec.NMF (also based on non-negative matrix factorization). The results underscored the complexities of mutational signature extraction, including the importance and difficulty of determining the correct number of signatures and the importance of hyperparameters. Our findings indicate directions for improvement of the software and show a need for care when interpreting results from any of these methods, including the need for assessing sensitivity of the results to input parameters.
Affiliation(s)
- Yang Wu
  - Programme in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore, 169857, Singapore
  - Centre for Computational Biology, Duke-NUS Medical School, Singapore, 169857, Singapore
- Ellora Hui Zhen Chua
  - Department of Biological Sciences, National University of Singapore, Singapore, 117558, Singapore
- Alvin Wei Tian Ng
  - Programme in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore, 169857, Singapore
  - Centre for Computational Biology, Duke-NUS Medical School, Singapore, 169857, Singapore
- Arnoud Boot
  - Programme in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore, 169857, Singapore
  - Centre for Computational Biology, Duke-NUS Medical School, Singapore, 169857, Singapore
- Steven G Rozen
  - Programme in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore, 169857, Singapore
  - Centre for Computational Biology, Duke-NUS Medical School, Singapore, 169857, Singapore
|
28
|
Wong JKL, Aichmüller C, Schulze M, Hlevnjak M, Elgaafary S, Lichter P, Zapatka M. Association of mutation signature effectuating processes with mutation hotspots in driver genes and non-coding regions. Nat Commun 2022; 13:178. [PMID: 35013316 PMCID: PMC8748499 DOI: 10.1038/s41467-021-27792-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/23/2020] [Accepted: 12/09/2021] [Indexed: 02/06/2023] Open
Abstract
Cancer-driving mutations are difficult to identify, especially in the non-coding part of the genome. Here, we present sigDriver, an algorithm dedicated to calling driver mutations. Using 3813 whole-genome sequenced tumors from the International Cancer Genome Consortium, The Cancer Genome Atlas Program, and a childhood pan-cancer cohort, we employ mutational signatures based on single-base substitutions in the context of tri- and penta-nucleotide motifs for hotspot discovery. Knowledge-based annotations on mutational hotspots reveal enrichment in coding regions and regulatory elements for 6 mutational signatures, including APOBEC and somatic hypermutation signatures. APOBEC activity is associated with 32 hotspots of which 11 are known and 11 are putative regulatory drivers. Somatic single-nucleotide variant clusters detected at hypermutation-associated hotspots are distinct from translocations or gene amplifications. Patients carrying APOBEC-induced PIK3CA driver mutations show lower occurrence of signature SBS39. In summary, sigDriver uncovers mutational processes associated with known and putative tumor drivers and hotspots, particularly in the non-coding regions of the genome.
Affiliation(s)
- John K L Wong
  - Division of Molecular Genetics and German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Christian Aichmüller
  - Division of Molecular Genetics and German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Markus Schulze
  - Computational Oncology Group, Molecular Precision Oncology Program, National Center for Tumor Diseases (NCT) and DKFZ, Heidelberg, Germany
- Mario Hlevnjak
  - Computational Oncology Group, Molecular Precision Oncology Program, National Center for Tumor Diseases (NCT) and DKFZ, Heidelberg, Germany
- Shaymaa Elgaafary
  - Gynecologic Oncology, National Center for Tumor Diseases (NCT) and University of Heidelberg, Heidelberg, Germany
  - Molecular Precision Oncology Program at the National Center for Tumor Diseases (NCT) and DKFZ, Heidelberg, Germany
- Peter Lichter
  - Division of Molecular Genetics and German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Molecular Precision Oncology Program at the National Center for Tumor Diseases (NCT) and DKFZ, Heidelberg, Germany
- Marc Zapatka
  - Division of Molecular Genetics and German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
|
29
|
Yaacov A, Vardi O, Blumenfeld B, Greenberg A, Massey DJ, Koren A, Adar S, Simon I, Rosenberg S. Cancer Mutational Processes Vary in Their Association with Replication Timing and Chromatin Accessibility. Cancer Res 2021; 81:6106-6116. [PMID: 34702725 DOI: 10.1158/0008-5472.can-21-2039] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/24/2021] [Revised: 09/15/2021] [Accepted: 10/19/2021] [Indexed: 11/16/2022]
Abstract
Cancer somatic mutations are the product of multiple mutational and repair processes, both of which are tightly associated with DNA replication. Distinctive patterns of somatic mutation accumulation, termed mutational signatures, are indicative of processes sustained within tumors. However, the association of various mutational processes with replication timing (RT) remains an open question. In this study, we systematically analyzed the mutational landscape of 2,787 tumors from 32 tumor types separately for early and late replicating regions using sequence context normalization and chromatin data to account for sequence and chromatin accessibility differences. To account for sequence differences between various genomic regions, an artificial genome-based approach was developed to expand the signature analyses to doublet base substitutions and small insertions and deletions. The association of mutational processes and RT was signature specific: Some signatures were associated with early or late replication (such as SBS7b and SBS7a, respectively), and others had no association. Most associations existed even after normalizing for genome accessibility. A focused mutational signature identification approach was also developed that uses RT information to improve signature identification; this approach found that SBS16, which is biased toward early replication, is strongly associated with better survival rates in liver cancer. Overall, this novel and comprehensive approach provides a better understanding of the etiology of mutational signatures, which may lead to improved cancer prevention, diagnosis, and treatment. SIGNIFICANCE: Many mutational processes associate with early or late replication timing regions independently of chromatin accessibility, enabling development of a focused identification approach to improve mutational signature detection.
Affiliation(s)
- Adar Yaacov
  - The Gaffin Center for Neuro-Oncology, Sharett Institute for Oncology, Hebrew University-Hadassah Medical Center, Jerusalem, Israel
  - The Wohl Institute for Translational Medicine, Hadassah-Hebrew University Medical Center, Jerusalem, Israel
  - Department of Microbiology and Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Oriya Vardi
  - Department of Microbiology and Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Britny Blumenfeld
  - Department of Microbiology and Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Avraham Greenberg
  - Department of Microbiology and Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Dashiell J Massey
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York
- Amnon Koren
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York
- Sheera Adar
  - Department of Microbiology and Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Itamar Simon
  - Department of Microbiology and Molecular Genetics, IMRIC, Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Shai Rosenberg
  - The Gaffin Center for Neuro-Oncology, Sharett Institute for Oncology, Hebrew University-Hadassah Medical Center, Jerusalem, Israel
  - The Wohl Institute for Translational Medicine, Hadassah-Hebrew University Medical Center, Jerusalem, Israel
|
30
|
Abbasi A, Alexandrov LB. Significance and limitations of the use of next-generation sequencing technologies for detecting mutational signatures. DNA Repair (Amst) 2021; 107:103200. [PMID: 34411908 PMCID: PMC9478565 DOI: 10.1016/j.dnarep.2021.103200] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 04/21/2021] [Revised: 07/30/2021] [Accepted: 08/03/2021] [Indexed: 12/13/2022]
Abstract
Next generation sequencing technologies (NGS) have been critical in characterizing the genomic landscape and untangling the genetic heterogeneity of human cancer. Since its advent, NGS has played a pivotal role in identifying the patterns of somatic mutations imprinted on cancer genomes and in deciphering the signatures of the mutational processes that have generated these patterns. Mutational signatures serve as phenotypic molecular footprints of exposures to environmental factors as well as deficiency and infidelity of DNA replication and repair pathways. Since the first roadmap of mutational signatures in human cancer was generated from whole-genome and whole-exome sequencing data, there has been a growing interest to extract mutational signatures from other NGS technologies such as targeted panel sequencing, RNA sequencing, single-cell sequencing, duplex sequencing, reduced representation sequencing, and long-read sequencing. Many of these technologies have their inherent sequencing biases and produce technical artifacts that can confound the extraction of reliable and interpretable mutational signatures. In this review, we highlight the relevance, limitations, and prospects of using different NGS technologies for examining mutational patterns and for deciphering mutational signatures.
Affiliation(s)
- Ammal Abbasi
  - Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA, 92093, USA
  - Department of Bioengineering, UC San Diego, La Jolla, CA, 92093, USA
  - Moores Cancer Center, UC San Diego, La Jolla, CA, 92037, USA
- Ludmil B Alexandrov
  - Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA, 92093, USA
  - Department of Bioengineering, UC San Diego, La Jolla, CA, 92093, USA
  - Moores Cancer Center, UC San Diego, La Jolla, CA, 92037, USA
|
31
|
Koh G, Degasperi A, Zou X, Momen S, Nik-Zainal S. Mutational signatures: emerging concepts, caveats and clinical applications. Nat Rev Cancer 2021; 21:619-637. [PMID: 34316057 DOI: 10.1038/s41568-021-00377-7] [Citation(s) in RCA: 147] [Impact Index Per Article: 36.8] [Accepted: 06/08/2021] [Indexed: 02/05/2023]
Abstract
Whole-genome sequencing has brought the cancer genomics community into new territory. Thanks to the sheer power provided by the thousands of mutations present in each patient's cancer, we have been able to discern generic patterns of mutations, termed 'mutational signatures', that arise during tumorigenesis. These mutational signatures provide new insights into the causes of individual cancers, revealing both endogenous and exogenous factors that have influenced cancer development. This Review brings readers up to date in a field that is expanding in computational, experimental and clinical directions. We focus on recent conceptual advances, underscoring some of the caveats associated with using the mutational signature frameworks and highlighting the latest experimental insights. We conclude by bringing attention to areas that are likely to see advancements in clinical applications.
Affiliation(s)
- Gene Koh
  - Department of Medical Genetics, School of Clinical Medicine, University of Cambridge, Cambridge, UK
  - MRC Cancer Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Andrea Degasperi
  - Department of Medical Genetics, School of Clinical Medicine, University of Cambridge, Cambridge, UK
  - MRC Cancer Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Xueqing Zou
  - Department of Medical Genetics, School of Clinical Medicine, University of Cambridge, Cambridge, UK
  - MRC Cancer Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Sophie Momen
  - Department of Medical Genetics, School of Clinical Medicine, University of Cambridge, Cambridge, UK
  - MRC Cancer Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Serena Nik-Zainal
  - Department of Medical Genetics, School of Clinical Medicine, University of Cambridge, Cambridge, UK
  - MRC Cancer Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
|
32
|
Abécassis J, Reyal F, Vert JP. CloneSig can jointly infer intra-tumor heterogeneity and mutational signature activity in bulk tumor sequencing data. Nat Commun 2021; 12:5352. [PMID: 34504064 PMCID: PMC8429716 DOI: 10.1038/s41467-021-24992-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/12/2019] [Accepted: 07/12/2021] [Indexed: 02/07/2023] Open
Abstract
Systematic DNA sequencing of cancer samples has highlighted the importance of two aspects of cancer genomics: intra-tumor heterogeneity (ITH) and mutational processes. These two aspects may not always be independent, as different mutational processes could be involved in different stages or regions of the tumor, but existing computational approaches to study them largely ignore this potential dependency. Here, we present CloneSig, a computational method to jointly infer ITH and mutational processes in a tumor from bulk-sequencing data. Extensive simulations show that CloneSig outperforms current methods for ITH inference and detection of mutational processes when the distribution of mutational signatures changes between clones. Applied to a large cohort of 8,951 tumors with whole-exome sequencing data from The Cancer Genome Atlas, and on a pan-cancer dataset of 2,632 whole-genome sequencing tumor samples from the Pan-Cancer Analysis of Whole Genomes initiative, CloneSig obtains results overall coherent with previous studies.
Affiliation(s)
- Judith Abécassis
  - Institut Curie, PSL Research University, Translational Research Department, INSERM, U932 Immunity and Cancer, Residual Tumor & Response to Treatment Laboratory (RT2Lab), Paris, France
  - MINES ParisTech, PSL University, CBIO - Centre for Computational Biology, Paris, France
  - Institut Curie, PSL Research University, Paris, France
- Fabien Reyal
  - Institut Curie, PSL Research University, Translational Research Department, INSERM, U932 Immunity and Cancer, Residual Tumor & Response to Treatment Laboratory (RT2Lab), Paris, France
  - Department of Surgery, Institut Curie, Paris, France
- Jean-Philippe Vert
  - MINES ParisTech, PSL University, CBIO - Centre for Computational Biology, Paris, France
  - Google Research, Brain team, Paris, France
|
33
|
Kim YA, Leiserson MDM, Moorjani P, Sharan R, Wojtowicz D, Przytycka TM. Mutational Signatures: From Methods to Mechanisms. Annu Rev Biomed Data Sci 2021; 4:189-206. [PMID: 34465178 DOI: 10.1146/annurev-biodatasci-122320-120920] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/09/2022]
Abstract
Mutations are the driving force of evolution, yet they underlie many diseases, in particular cancer. They are thought to arise from a combination of stochastic errors in DNA processing, naturally occurring DNA damage (e.g., the spontaneous deamination of methylated CpG sites), replication errors, and dysregulation of DNA repair mechanisms. High-throughput sequencing has made it possible to generate large datasets to study mutational processes in health and disease. Since the emergence of the first mutational process studies in 2012, this field has gained increasing attention and has already accumulated a host of computational approaches and biomedical applications.
Affiliation(s)
- Yoo-Ah Kim
  - National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Mark D M Leiserson
  - Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA
- Priya Moorjani
  - Department of Molecular and Cell Biology and Center for Computational Biology, University of California, Berkeley, California 94720, USA
- Roded Sharan
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
- Damian Wojtowicz
  - National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Teresa M Przytycka
  - National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
|
34
|
Jeong HY, Yoo J, Kim H, Kim TM. Identification of potential candidate genes for lip and oral cavity cancer using network analysis. Genomics Inform 2021; 19:e40. [PMID: 35172473 PMCID: PMC8752981 DOI: 10.5808/gi.21047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2021] [Accepted: 09/28/2021] [Indexed: 11/21/2022] Open
Abstract
Mutation signatures represent unique sequence footprints of somatic mutations resulting from specific DNA mutagenic and repair processes. However, their causal associations and the potential utility for genome research remain largely unknown. In this study, we performed PanCancer-scale correlative analyses to identify the genomic features associated with tumor mutation burdens (TMB) and individual mutation signatures. We observed that TMB was correlated with tumor purity, ploidy, and the level of aneuploidy, as well as with the expression of cell proliferation-related genes representing genomic covariates in evaluating TMB. Correlative analyses of mutation signature levels with genes belonging to specific DNA damage-repair processes revealed that deficiencies of NHEJ1 and ALKBH3 may contribute to mutations in the settings of APOBEC cytidine deaminase activation and DNA mismatch repair deficiency, respectively. We further employed a strategy to identify feature-driven, de novo mutation signatures and demonstrated that mutation signatures can be reconstructed using known causal features. Using the strategy, we further identified tumor hypoxia-related mutation signatures similar to the APOBEC-related mutation signatures, suggesting that APOBEC activity mediates hypoxia-related mutational consequences in cancer genomes. Our study advances the mechanistic insights into the TMB and signature-based DNA mutagenic and repair processes in cancer genomes. We also propose that feature-driven mutation signature analysis can further extend the categories of cancer-relevant mutation signatures and their causal relationships.
Affiliation(s)
- Hye Young Jeong
  - Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
  - Cancer Research Institute, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
  - Department of Biomedicine and Health Sciences, Graduate School, The Catholic University of Korea, Seoul 06591, Korea
- Jinseon Yoo
  - Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
  - Cancer Research Institute, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
  - Department of Biomedicine and Health Sciences, Graduate School, The Catholic University of Korea, Seoul 06591, Korea
- Hyunwoo Kim
  - Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
  - Cancer Research Institute, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
- Tae-Min Kim
  - Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
  - Cancer Research Institute, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
  - Department of Biomedicine and Health Sciences, Graduate School, The Catholic University of Korea, Seoul 06591, Korea
|
35
|
Díaz-Gay M, Alexandrov LB. Unraveling the genomic landscape of colorectal cancer through mutational signatures. Adv Cancer Res 2021; 151:385-424. [PMID: 34148618 DOI: 10.1016/bs.acr.2021.03.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/14/2022]
Abstract
Colorectal cancer, along with most other cancer types, is driven by somatic mutations. Characteristic patterns of somatic mutations, known as mutational signatures, arise as a result of the activities of different mutational processes. Mutational signatures have diverse origins, including exogenous and endogenous sources. In the case of colorectal cancer, the analysis of mutational signatures has elucidated specific signatures for classically associated DNA repair deficiencies, namely mismatch repair (leading to microsatellite instability), base excision repair (due to MUTYH or NTHL1 mutations), and polymerase proofreading (due to POLE and POLD1 exonuclease domain mutations). Additional signatures also play a role in colorectal cancer, including those related to normal aging and those associated with gut microbiota, as well as a number of signatures with unknown etiologies. This chapter provides an overview of the current knowledge of mutational signatures, with a focus on colorectal cancer and on the recently reported signatures in physiologically normal and inflammatory bowel disease-affected somatic colon tissues.
Affiliation(s)
- Marcos Díaz-Gay
  - Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA, United States
  - Department of Bioengineering, UC San Diego, La Jolla, CA, United States
  - Moores Cancer Center, UC San Diego, La Jolla, CA, United States
- Ludmil B Alexandrov
  - Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA, United States
  - Department of Bioengineering, UC San Diego, La Jolla, CA, United States
  - Moores Cancer Center, UC San Diego, La Jolla, CA, United States
|
36
|
Yang Z, Pandey P, Marjoram P, Siegmund KD. iMutSig: a web application to identify the most similar mutational signature using shiny. F1000Res 2020; 9:586. [PMID: 33299548 PMCID: PMC7702159 DOI: 10.12688/f1000research.24435.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/10/2020] [Indexed: 11/20/2022] Open
Abstract
There are two frameworks for characterizing mutational signatures which are commonly used to describe the nucleotide patterns that arise from mutational processes. Estimated mutational signatures from fitting these two methods in human cancer can be found online, in the Catalogue Of Somatic Mutations In Cancer (COSMIC) website or a GitHub repository. The two frameworks make differing assumptions regarding independence of base pairs and for that reason may produce different results. Consequently, there is a need to compare and contrast the results of the two methods, but no such tool currently exists. In this paper, we provide a simple and intuitive interface that allows comparisons of pairs of mutational signatures to be easily performed. Cosine similarity measures the extent of signature similarity. To compare mutational signatures of different formats, one signature type (COSMIC or pmsignature) is converted to the format of the other before the signatures are compared. iMutSig provides a simple and user-friendly web application allowing researchers to download published mutational signatures of either type and to compare signatures from COSMIC to those from pmsignature, and vice versa. Furthermore, iMutSig allows users to input a self-defined mutational signature and examine its similarity to published signatures from both data sources. iMutSig is accessible online and source code is available for download from GitHub.
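The abstract above names cosine similarity as the measure of how alike two signatures are. A minimal sketch of that comparison follows, assuming both signatures have already been converted to the same channel layout. It is not iMutSig's code. The function names and the six-channel toy vectors are hypothetical and are not real COSMIC or pmsignature values.

```python
import math


def cosine_similarity(sig_a, sig_b):
    """Cosine of the angle between two signature vectors of equal length."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must use the same mutation channels")
    dot = sum(x * y for x, y in zip(sig_a, sig_b))
    norm_a = math.sqrt(sum(x * x for x in sig_a))
    norm_b = math.sqrt(sum(y * y for y in sig_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for an all-zero signature")
    return dot / (norm_a * norm_b)


def most_similar(query, reference_signatures):
    """Return (name, similarity) of the reference signature closest to the query."""
    best_name = max(
        reference_signatures,
        key=lambda name: cosine_similarity(query, reference_signatures[name]),
    )
    return best_name, cosine_similarity(query, reference_signatures[best_name])


# Hypothetical reference set over six mutation channels.
REFERENCE = {
    "toy_sig_1": [0.50, 0.30, 0.10, 0.05, 0.03, 0.02],
    "toy_sig_2": [0.05, 0.05, 0.10, 0.20, 0.30, 0.30],
}

# A user-supplied signature to look up against the reference set.
query = [0.45, 0.35, 0.10, 0.05, 0.03, 0.02]
name, score = most_similar(query, REFERENCE)
print(name, round(score, 3))
```

Because mutational signatures are non-negative, the score falls between 0 (no shared channels) and 1 (identical shape), and it ignores overall scale, so a signature expressed as counts compares the same as one expressed as proportions.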
Affiliation(s)
- Zhi Yang
  - Department of Preventive Medicine, Keck School of Medicine of the University of Southern California, 2001 N.Soto Street, Los Angeles, CA, 91003, USA
- Priyatama Pandey
  - Department of Preventive Medicine, Keck School of Medicine of the University of Southern California, 2001 N.Soto Street, Los Angeles, CA, 91003, USA
- Paul Marjoram
  - Department of Preventive Medicine, Keck School of Medicine of the University of Southern California, 2001 N.Soto Street, Los Angeles, CA, 91003, USA
- Kimberly D Siegmund
  - Department of Preventive Medicine, Keck School of Medicine of the University of Southern California, 2001 N.Soto Street, Los Angeles, CA, 91003, USA
|
37
|
Hu X, Xu Z, De S. Characteristics of mutational signatures of unknown etiology. NAR Cancer 2020; 2:zcaa026. [PMID: 33015626 PMCID: PMC7520824 DOI: 10.1093/narcan/zcaa026] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 05/21/2020] [Revised: 09/01/2020] [Accepted: 09/23/2020] [Indexed: 12/25/2022]
Abstract
Although not all somatic mutations are cancer drivers, their mutational signatures, i.e. the patterns of genomic alterations at a genome-wide scale, provide insights into past exposure to mutagens, DNA damage and repair processes. Computational deconvolution of somatic mutation patterns and expert curation in pan-cancer studies have identified a number of mutational signatures associated with point mutations, dinucleotide substitutions, insertions and deletions, and rearrangements, and have established etiologies for a subset of these signatures. However, the mechanisms underlying nearly one-third of all mutational signatures are not yet understood. The signatures with established etiology and those with hitherto unknown origin appear to have some differences in strand bias, GC content and nucleotide context diversity. It is possible that some of the hitherto ‘unknown’ signatures predominantly occur outside gene regions. While nucleotide contexts might be adequate to establish etiologies of some mutational signatures, in other cases additional features, such as broader (epi)genomic contexts, including chromatin, replication timing, processivity and local mutational patterns, may help fully understand the underlying DNA damage and repair processes. Nonetheless, remarkable progress in characterization of mutational signatures has provided fundamental insights into the biology of cancer, informed disease etiology and opened up new opportunities for cancer prevention, risk management, and therapeutic decision making.
Affiliation(s)
- Xiaoju Hu
  - Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
- Zhuxuan Xu
  - Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
- Subhajyoti De
  - Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
|
38
|
Beal MA, Meier MJ, LeBlanc DP, Maurice C, O'Brien JM, Yauk CL, Marchetti F. Chemically induced mutations in a MutaMouse reporter gene inform mechanisms underlying human cancer mutational signatures. Commun Biol 2020; 3:438. [PMID: 32796912 PMCID: PMC7429849 DOI: 10.1038/s42003-020-01174-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 11/27/2019] [Accepted: 07/24/2020] [Indexed: 02/07/2023]
Abstract
Transgenic rodent (TGR) models use bacterial reporter genes to quantify in vivo mutagenesis. Pairing TGR assays with next-generation sequencing (NGS) enables comprehensive mutation pattern analysis to inform mutational mechanisms. We used this approach to identify 2751 independent lacZ mutations in the bone marrow of MutaMouse animals exposed to four chemical mutagens: benzo[a]pyrene, N-ethyl-N-nitrosourea, procarbazine, and triethylenemelamine. We also collected published data for 706 lacZ mutations from eight additional environmental mutagens. We report that lacZ gene sequencing generates chemical-specific mutation signatures observed in human cancers with established environmental causes. For example, the mutation signature of benzo[a]pyrene, a carcinogen present in tobacco smoke, matched the signature associated with tobacco-induced lung cancers. Our results suggest that the analysis of chemically induced mutations in the lacZ gene shortly after exposure provides an effective approach to characterize human-relevant mechanisms of carcinogenesis and propose novel environmental causes of mutation signatures observed in human cancers.
Affiliation(s)
- Marc A Beal
  - Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, Ontario, K1A 0K9, Canada
  - Existing Substances Risk Assessment Bureau, Health Canada, Ottawa, ON, Canada
- Matthew J Meier
  - Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, Ontario, K1A 0K9, Canada
- Danielle P LeBlanc
  - Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, Ontario, K1A 0K9, Canada
- Clotilde Maurice
  - Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, Ontario, K1A 0K9, Canada
  - Existing Substances Risk Assessment Bureau, Health Canada, Ottawa, ON, Canada
- Jason M O'Brien
  - National Wildlife Research Centre, Environment and Climate Change Canada, Ottawa, ON, K1A 0H3, Canada
- Carole L Yauk
  - Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, Ontario, K1A 0K9, Canada
- Francesco Marchetti
  - Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, Ontario, K1A 0K9, Canada
|
39
|
pyCancerSig: subclassifying human cancer with comprehensive single nucleotide, structural and microsatellite mutational signature deconstruction from whole genome sequencing. BMC Bioinformatics 2020; 21:128. [PMID: 32245405 PMCID: PMC7118897 DOI: 10.1186/s12859-020-3451-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/30/2019] [Accepted: 03/10/2020] [Indexed: 12/28/2022]
Abstract
Background: DNA damage accumulates over the course of cancer development. The often-substantial number of somatic mutations in cancer poses a challenge to traditional methods that characterize tumors based on driver mutations. However, advances in machine learning technology can take advantage of this substantial amount of data. Results: We developed a command-line Python package, pyCancerSig, to perform sample profiling by integrating single nucleotide variation (SNV), structural variation (SV) and microsatellite instability (MSI) profiles into a unified profile. It also provides a command to decipher underlying cancer processes, employing an unsupervised learning technique, Non-negative Matrix Factorization, and a command to visualize the results. The package accepts common standard file formats (vcf, bam). The program was evaluated using a cohort of breast and colorectal cancers from The Cancer Genome Atlas project (TCGA). The results showed that by integrating multiple mutation modes, the tool can correctly identify cases with known clear mutational signatures and can strengthen signatures in cases with unclear signal from an SNV-only profile. The software package is available at https://github.com/jessada/pyCancerSig. Conclusions: pyCancerSig has demonstrated its capability in identifying known and unknown cancer processes and, at the same time, illuminates the associations within and between the mutation modes.
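The Non-negative Matrix Factorization step mentioned in this abstract can be sketched as follows. This is a generic illustration using the Lee-Seung multiplicative updates on a toy count matrix, written in plain Python. It is not pyCancerSig's implementation, and the matrix values, function names, and parameters are assumptions made for the example. Rows of `v` are mutation categories, columns are samples; `w` holds candidate signatures and `h` holds per-sample exposures.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH (the reconstruction error)."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    )


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor non-negative V (categories x samples) into W (categories x rank)
    and H (rank x samples) with multiplicative updates."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    # Strictly positive random start so no entry is stuck at zero initially.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [
            [h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_cols)]
            for k in range(rank)
        ]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [
            [w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
            for i in range(n_rows)
        ]
    return w, h


# Toy matrix: 4 mutation categories x 3 samples (hypothetical counts).
v = [
    [24, 0, 8],
    [3, 2, 2],
    [3, 2, 2],
    [0, 16, 8],
]
w, h = nmf(v, rank=2)
print("reconstruction error:", frobenius_error(v, w, h))
```

The updates only multiply by non-negative ratios, so `w` and `h` stay non-negative throughout, which is what allows the columns of `w` to be read as additive mutational profiles.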
|
40
|
Bergstrom EN, Huang MN, Mahto U, Barnes M, Stratton MR, Rozen SG, Alexandrov LB. SigProfilerMatrixGenerator: a tool for visualizing and exploring patterns of small mutational events. BMC Genomics 2019; 20:685. [PMID: 31470794 PMCID: PMC6717374 DOI: 10.1186/s12864-019-6041-2] [Citation(s) in RCA: 164] [Impact Index Per Article: 27.3] [Received: 05/30/2019] [Accepted: 08/19/2019] [Indexed: 12/15/2022]
Abstract
BACKGROUND: Cancer genomes are peppered with somatic mutations imprinted by different mutational processes. The mutational pattern of a cancer genome can be used to identify and understand the etiology of the underlying mutational processes. A plethora of prior research has focused on examining mutational signatures and mutational patterns from single base substitutions and their immediate sequencing context. We recently demonstrated that further classification of small mutational events (including substitutions, insertions, deletions, and doublet substitutions) can be used to provide a deeper understanding of the mutational processes that have molded a cancer genome. However, there has been no standard tool that allows fast, accurate, and comprehensive classification for all types of small mutational events. RESULTS: Here, we present SigProfilerMatrixGenerator, a computational tool designed for optimized exploration and visualization of mutational patterns for all types of small mutational events. SigProfilerMatrixGenerator is written in Python with an R wrapper package provided for users who prefer working in an R environment. SigProfilerMatrixGenerator produces fourteen distinct matrices by considering transcriptional strand bias of individual events and by incorporating distinct classifications for single base substitutions, doublet base substitutions, and small insertions and deletions. While the tool provides a comprehensive classification of mutations, SigProfilerMatrixGenerator is also faster and more memory efficient than existing tools that generate only a single matrix. CONCLUSIONS: SigProfilerMatrixGenerator provides a standardized method for classifying small mutational events that is both efficient and scalable to large datasets. In addition to extending the classification of single base substitutions, the tool is the first to provide support for classifying doublet base substitutions and small insertions and deletions. SigProfilerMatrixGenerator is freely available at https://github.com/AlexandrovLab/SigProfilerMatrixGenerator with extensive documentation at https://osf.io/s93d5/wiki/home/.
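To make the idea of classifying single base substitutions by their immediate sequence context concrete, here is a minimal Python sketch of the widely used 96-channel scheme (six pyrimidine-referenced substitution types times sixteen flanking-base combinations). It is an illustration only and does not use or reproduce the SigProfilerMatrixGenerator API. The function names and toy mutations are hypothetical, and the other matrices the abstract mentions (strand bias, doublet substitutions, indels) are not covered.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# All 96 channel labels, e.g. "A[C>T]G": 6 substitutions x 4 x 4 flanking bases.
ALL_CHANNELS = [
    f"{five}[{sub}]{three}"
    for sub in SUBSTITUTIONS
    for five, three in product("ACGT", repeat=2)
]


def reverse_complement(seq):
    """Reverse-complement a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbs96_channel(context, alt):
    """Map a substitution to its channel label.

    context: 3-base reference sequence centered on the mutated base.
    alt: the alternate base observed at the center position.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1:
        raise ValueError("expected a 3-base context and a 1-base alt allele")
    if any(base not in COMPLEMENT for base in context + alt):
        raise ValueError("only A, C, G, T are supported")
    if context[1] == alt:
        raise ValueError("alt allele equals the reference base")
    # Report purine-referenced changes on the opposite strand so the
    # mutated base is always a pyrimidine (C or T).
    if context[1] in "AG":
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def count_sbs96(mutations):
    """Return a 96-long count vector ordered like ALL_CHANNELS."""
    counts = Counter(sbs96_channel(ctx, alt) for ctx, alt in mutations)
    return [counts[channel] for channel in ALL_CHANNELS]


# Toy input: (trinucleotide context, alt base) pairs, hypothetical.
mutations = [("ACG", "T"), ("CGT", "A"), ("TGA", "A")]
vector = count_sbs96(mutations)
print(sbs96_channel("TGA", "A"), sum(vector))
```

Folding purine-referenced changes onto the opposite strand is what reduces the 192 possible strand-specific contexts to 96 channels; a strand-aware matrix would skip that step.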
Affiliation(s)
- Erik N Bergstrom
  - Department of Cellular and Molecular Medicine and Department of Bioengineering and Moores Cancer Center, University of California, San Diego, La Jolla, CA, 92093, USA
- Mi Ni Huang
  - Centre for Computational Biology and Programme in Cancer & Stem Cell Biology, Duke-NUS Medical School, 8 College Rd, Singapore, 169857, Singapore
- Uma Mahto
  - Department of Cellular and Molecular Medicine and Department of Bioengineering and Moores Cancer Center, University of California, San Diego, La Jolla, CA, 92093, USA
- Mark Barnes
  - Department of Cellular and Molecular Medicine and Department of Bioengineering and Moores Cancer Center, University of California, San Diego, La Jolla, CA, 92093, USA
- Michael R Stratton
  - Cancer, Ageing and Somatic Mutation, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
- Steven G Rozen
  - Centre for Computational Biology and Programme in Cancer & Stem Cell Biology, Duke-NUS Medical School, 8 College Rd, Singapore, 169857, Singapore
- Ludmil B Alexandrov
  - Department of Cellular and Molecular Medicine and Department of Bioengineering and Moores Cancer Center, University of California, San Diego, La Jolla, CA, 92093, USA
|