1
Morrill Gavarró L, Couturier DL, Markowetz F. A Dirichlet-multinomial mixed model for determining differential abundance of mutational signatures. BMC Bioinformatics 2025; 26:59. [PMID: 39966709 PMCID: PMC11837616 DOI: 10.1186/s12859-025-06055-x] [Received: 07/07/2024] [Accepted: 01/16/2025] [Indexed: 02/20/2025]
Abstract
BACKGROUND Mutational processes of diverse origin leave their imprints in the genome during tumour evolution. These imprints are called mutational signatures and they have been characterised for point mutations, structural variants and copy number changes. Each signature has an exposure, or abundance, per sample, which indicates how much a process has contributed to the overall genomic change. Mutational processes are not static, and a better understanding of their dynamics is key to characterise tumour evolution and identify cancer cell vulnerabilities that can be exploited during treatment. However, the structure of the data typically collected in this context makes it difficult to test whether signature exposures differ between conditions or time-points when comparing groups of samples. In general, the data consists of multivariate count mutational data (e.g. signature exposures) with two observations per patient, each reflecting a group.
RESULTS We propose a mixed-effects Dirichlet-multinomial model: within-patient correlations are taken into account with random effects, possible correlations between signatures by making such random effects multivariate, and a group-specific dispersion parameter can deal with particularities of the groups. Moreover, the model is flexible in its fixed-effects structure, so that the two-group comparison can be generalised to several groups, or to a regression setting. We apply our approach to characterise differences of mutational processes between clonal and subclonal mutations across 23 cancer types of the PCAWG cohort. We find ubiquitous differential abundance of clonal and subclonal signatures across cancer types, and higher dispersion of signatures in the subclonal group, indicating higher variability between patients at subclonal level, possibly due to the presence of different clones with distinct active mutational processes.
CONCLUSIONS Mutational signature analysis is an expanding field and we envision our framework to be used widely to detect global changes in mutational process activity. Our methodology is available in the R package CompSign and offers an ample toolkit for the analysis and visualisation of differential abundance of compositional data such as, but not restricted to, mutational signatures.
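As a rough illustration of the likelihood a Dirichlet-multinomial model places on one sample's signature exposures, the Python sketch below evaluates the Dirichlet-multinomial log-probability of a count vector. It is not CompSign's implementation (CompSign is an R package) and it omits the fixed and random effects of the mixed model; the mean-and-concentration parameterisation and the names `dm_logpmf`, `alpha_from_mean` and `variance_inflation` are assumptions made for illustration. A smaller concentration means more overdispersion, which is one hypothetical way to express a group-specific dispersion parameter.

```python
import math
from typing import List, Sequence


def dm_logpmf(counts: Sequence[int], alpha: Sequence[float]) -> float:
    """Log-probability of a count vector under a Dirichlet-multinomial(alpha)."""
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    if any(a <= 0 for a in alpha):
        raise ValueError("all alpha values must be positive")
    n = sum(counts)
    a0 = sum(alpha)
    # Terms shared by all categories: log n! + log Gamma(A) - log Gamma(n + A)
    logp = math.lgamma(n + 1) + math.lgamma(a0) - math.lgamma(n + a0)
    for x, a in zip(counts, alpha):
        # Per-category terms: log Gamma(x + a) - log x! - log Gamma(a)
        logp += math.lgamma(x + a) - math.lgamma(x + 1) - math.lgamma(a)
    return logp


def alpha_from_mean(p: Sequence[float], concentration: float) -> List[float]:
    """Hypothetical parameterisation: Dirichlet parameters = concentration * p."""
    return [concentration * pi for pi in p]


def variance_inflation(n: int, concentration: float) -> float:
    """Factor by which each count's variance exceeds the multinomial variance.

    Var(x_k) = n * p_k * (1 - p_k) * (n + A) / (1 + A), with A the concentration.
    """
    return (n + concentration) / (1 + concentration)


# Usage with made-up exposures of three signatures in a single sample.
counts = [12, 5, 3]
p = [0.6, 0.25, 0.15]
for conc in (5.0, 50.0):  # e.g. a more dispersed and a less dispersed group
    alpha = alpha_from_mean(p, conc)
    print(conc, dm_logpmf(counts, alpha), variance_inflation(sum(counts), conc))
```

With 20 mutations, the sketch's variance inflation factor is about 4.2 at concentration 5 and about 1.4 at concentration 50, so the lower-concentration group behaves as the more dispersed one.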
Affiliation(s)
- Lena Morrill Gavarró
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
- Human Oncology & Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, USA
- Dominique-Laurent Couturier
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Medical Research Council Biostatistics Unit, University of Cambridge, Cambridge, UK
- Florian Markowetz
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
2
Park JE, Smith MA, Van Alsten SC, Walens A, Wu D, Hoadley KA, Troester MA, Love MI. Diffsig: Associating Risk Factors with Mutational Signatures. Cancer Epidemiol Biomarkers Prev 2024; 33:721-730. [PMID: 38426904 PMCID: PMC11062813 DOI: 10.1158/1055-9965.epi-23-0728] [Received: 07/03/2023] [Revised: 10/12/2023] [Accepted: 02/28/2024] [Indexed: 03/02/2024]
Abstract
BACKGROUND Somatic mutational signatures elucidate molecular vulnerabilities to therapy, and therefore detecting signatures and classifying tumors with respect to signatures has clinical value. However, identifying the etiology of the mutational signatures remains a statistical challenge, with both small sample sizes and high variability in classification algorithms posing barriers. As a result, few signatures have been strongly linked to particular risk factors.
METHODS Here, we develop a statistical model, Diffsig, for estimating the association of one or more continuous or categorical risk factors with DNA mutational signatures. Diffsig takes into account the uncertainty associated with assigning signatures to samples as well as multiple risk factors' simultaneous effect on observed DNA mutations.
RESULTS We applied Diffsig to breast cancer data to assess relationships between five established breast-relevant mutational signatures and etiologic variables, confirming known mechanisms of cancer development. In simulation, our model was capable of accurately estimating expected associations in a variety of contexts.
CONCLUSIONS Diffsig allows researchers to quantify and perform inference on the associations of risk factors with mutational signatures.
IMPACT We expect Diffsig to provide more robust associations of risk factors with signatures to lead to better understanding of the tumor development process and improved models of tumorigenesis.
Affiliation(s)
- Ji-Eun Park
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Markia A. Smith
- Department of Pathology and Laboratory Medicine, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Sarah C. Van Alsten
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Andrea Walens
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Di Wu
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Division of Oral and Craniofacial Health Sciences, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Katherine A. Hoadley
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Melissa A. Troester
- Department of Pathology and Laboratory Medicine, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, USA
- Michael I. Love
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
3
Park JE, Smith MA, Van Alsten SC, Walens A, Wu D, Hoadley KA, Troester MA, Love MI. Diffsig: Associating Risk Factors With Mutational Signatures. bioRxiv [Preprint] 2023:2023.02.09.527740. [PMID: 36798154 PMCID: PMC9934616 DOI: 10.1101/2023.02.09.527740] [Indexed: 02/12/2023]
Abstract
Somatic mutational signatures elucidate molecular vulnerabilities to therapy and therefore detecting signatures and classifying tumors with respect to signatures has clinical value. However, identifying the etiology of the mutational signatures remains a statistical challenge, with both small sample sizes and high variability in classification algorithms posing barriers. As a result, few signatures have been strongly linked to particular risk factors. Here we present Diffsig, a model and R package for estimating the association of risk factors with mutational signatures, suggesting etiologies for the pre-defined mutational signatures. Diffsig is a Bayesian Dirichlet-multinomial hierarchical model that allows testing of any type of risk factor while taking into account the uncertainty associated with samples with a low number of observations. In simulation, we found that our method can accurately estimate risk factor-mutational signal associations. We applied Diffsig to breast cancer data to assess relationships between five established breast-relevant mutational signatures and etiologic variables, confirming known mechanisms of cancer development. Diffsig is implemented as an R package available at: https://github.com/jennprk/diffsig.
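Diffsig's inference is Bayesian; the Python sketch below shows only the generative side of a Dirichlet-multinomial model in which a risk factor shifts the expected signature proportions, to illustrate why samples with few mutations carry more uncertainty. The softmax link, the single shared concentration, and all function and parameter names (`softmax`, `simulate_counts`, `beta0`, `beta1`) are hypothetical and are not taken from the Diffsig package.

```python
import math
import random
from collections import Counter
from typing import List, Sequence


def softmax(eta: Sequence[float]) -> List[float]:
    """Map a linear predictor to proportions that sum to one."""
    m = max(eta)
    e = [math.exp(v - m) for v in eta]  # subtract the max for numerical stability
    s = sum(e)
    return [v / s for v in e]


def simulate_counts(x: float, beta0: Sequence[float], beta1: Sequence[float],
                    concentration: float, n_mut: int,
                    rng: random.Random) -> List[int]:
    """Draw signature counts for one sample whose risk factor value is x.

    Hypothetical parameterisation: mean proportions = softmax(beta0 + beta1 * x),
    Dirichlet parameters = concentration * mean proportions.
    """
    mean = softmax([b0 + b1 * x for b0, b1 in zip(beta0, beta1)])
    # Dirichlet draw via independent Gamma variates, normalised to sum to one
    g = [rng.gammavariate(concentration * m, 1.0) for m in mean]
    total = sum(g)
    theta = [v / total for v in g]
    # Multinomial draw: assign each of the n_mut mutations to a signature
    labels = rng.choices(range(len(theta)), weights=theta, k=n_mut)
    tally = Counter(labels)
    return [tally.get(k, 0) for k in range(len(theta))]


# Usage: made-up coefficients for three signatures; the risk factor
# shifts weight towards the second signature.
rng = random.Random(2023)
beta0 = [0.0, -0.5, -1.0]
beta1 = [0.0, 1.0, 0.0]
for n_mut in (10, 1000):
    for x in (0.0, 1.0):
        counts = simulate_counts(x, beta0, beta1, 20.0, n_mut, rng)
        print(n_mut, x, [round(c / n_mut, 2) for c in counts])
```

With only 10 mutations the observed proportions can be expected to scatter widely, whereas with 1000 they settle close to the sample's Dirichlet draw; a hierarchical model can use this difference to down-weight low-count samples.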
4
Kim YA, Leiserson MDM, Moorjani P, Sharan R, Wojtowicz D, Przytycka TM. Mutational Signatures: From Methods to Mechanisms. Annu Rev Biomed Data Sci 2021; 4:189-206. [PMID: 34465178 DOI: 10.1146/annurev-biodatasci-122320-120920] [Indexed: 11/09/2022]
Abstract
Mutations are the driving force of evolution, yet they underlie many diseases, in particular, cancer. They are thought to arise from a combination of stochastic errors in DNA processing, naturally occurring DNA damage (e.g., the spontaneous deamination of methylated CpG sites), replication errors, and dysregulation of DNA repair mechanisms. High-throughput sequencing has made it possible to generate large datasets to study mutational processes in health and disease. Since the emergence of the first mutational process studies in 2012, this field has been gaining increasing attention and has already accumulated a host of computational approaches and biomedical applications.
Affiliation(s)
- Yoo-Ah Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Mark D M Leiserson
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA
- Priya Moorjani
- Department of Molecular and Cell Biology and Center for Computational Biology, University of California, Berkeley, California 94720, USA
- Roded Sharan
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
- Damian Wojtowicz
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Teresa M Przytycka
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
5
Yang Z, Pandey P, Marjoram P, Siegmund KD. iMutSig: a web application to identify the most similar mutational signature using shiny. F1000Res 2020; 9:586. [PMID: 33299548 PMCID: PMC7702159 DOI: 10.12688/f1000research.24435.2] [Accepted: 11/10/2020] [Indexed: 11/20/2022]
Abstract
There are two frameworks for characterizing mutational signatures which are commonly used to describe the nucleotide patterns that arise from mutational processes. Estimated mutational signatures from fitting these two methods in human cancer can be found online, in the Catalogue Of Somatic Mutations In Cancer (COSMIC) website or a GitHub repository. The two frameworks make differing assumptions regarding independence of base pairs and for that reason may produce different results. Consequently, there is a need to compare and contrast the results of the two methods, but no such tool currently exists. In this paper, we provide a simple and intuitive interface that allows comparisons of pairs of mutational signatures to be easily performed. Cosine similarity measures the extent of signature similarity. To compare mutational signatures of different formats, one signature type (COSMIC or pmsignature) is converted to the format of the other before the signatures are compared. iMutSig provides a simple and user-friendly web application allowing researchers to download published mutational signatures of either type and to compare signatures from COSMIC to those from pmsignature, and vice versa. Furthermore, iMutSig allows users to input a self-defined mutational signature and examine its similarity to published signatures from both data sources. iMutSig is accessible online and source code is available for download from GitHub.
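Cosine similarity, which iMutSig uses to measure how alike two signatures are, is the dot product of two signature vectors divided by the product of their lengths. The Python sketch below assumes both signatures are already in the same format over the same mutation channels, so it does not cover the COSMIC/pmsignature conversion step; the function names `cosine_similarity` and `most_similar` and the toy signatures are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two signature vectors over the same channels."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(query: Sequence[float],
                 references: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the name and similarity of the closest reference signature."""
    return max(((name, cosine_similarity(query, sig))
                for name, sig in references.items()),
               key=lambda pair: pair[1])


# Usage with made-up four-channel signatures (COSMIC SBS signatures use 96 channels).
refs = {"sigA": [0.7, 0.1, 0.1, 0.1], "sigB": [0.1, 0.1, 0.1, 0.7]}
print(most_similar([0.6, 0.2, 0.1, 0.1], refs))
```

Because mutational signatures are non-negative, the similarity ranges from 0 (no shared channels) to 1 (identical up to scaling).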
Affiliation(s)
- Zhi Yang
- Department of Preventive Medicine, Keck School of Medicine of the University of Southern California, 2001 N. Soto Street, Los Angeles, CA, 91003, USA
- Priyatama Pandey
- Department of Preventive Medicine, Keck School of Medicine of the University of Southern California, 2001 N. Soto Street, Los Angeles, CA, 91003, USA
- Paul Marjoram
- Department of Preventive Medicine, Keck School of Medicine of the University of Southern California, 2001 N. Soto Street, Los Angeles, CA, 91003, USA
- Kimberly D Siegmund
- Department of Preventive Medicine, Keck School of Medicine of the University of Southern California, 2001 N. Soto Street, Los Angeles, CA, 91003, USA
6
Pandey P, Yang Z, Shibata D, Marjoram P, Siegmund KD. Mutational signatures in colon cancer. BMC Res Notes 2019; 12:788. [PMID: 31796096 PMCID: PMC6889194 DOI: 10.1186/s13104-019-4820-0] [Received: 04/30/2019] [Accepted: 11/21/2019] [Indexed: 11/30/2022]
Abstract
OBJECTIVE Recently, many tumor sequencing studies have inferred and reported on mutational signatures, short nucleotide patterns at which particular somatic base substitutions appear more often. A number of signatures reflect biological processes in the patient and factors associated with cancer risk. Our goal is to infer mutational signatures appearing in colon cancer, a cancer for which environmental risk factors vary by cancer subtype, and compare the signatures to those in adult stem cells from normal colon. We also compare the mutational signatures to others in the literature.
RESULTS We apply a probabilistic mutation signature model to somatic mutations previously reported for six adult normal colon stem cells and 431 colon adenocarcinomas. We infer six mutational signatures in colon cancer, four being specific to tumors with hypermutation. Just two signatures explained the majority of mutations in the small number of normal aging colon samples. All six signatures are independently identified in a series of 295 Chinese colorectal cancers.
Affiliation(s)
- Priyatama Pandey
- Department of Preventive Medicine, Keck School of Medicine of the University of Southern California, 2001 N. Soto Street, Los Angeles, CA 90032 USA
- Zhi Yang
- Department of Preventive Medicine, Keck School of Medicine of the University of Southern California, 2001 N. Soto Street, Los Angeles, CA 90032 USA
- Darryl Shibata
- Department of Pathology, Keck School of Medicine of the University of Southern California, 2011 Zonal Ave, Los Angeles, CA 90033 USA
- Paul Marjoram
- Department of Preventive Medicine, Keck School of Medicine of the University of Southern California, 2001 N. Soto Street, Los Angeles, CA 90032 USA
- Kimberly D. Siegmund
- Department of Preventive Medicine, Keck School of Medicine of the University of Southern California, 2001 N. Soto Street, Los Angeles, CA 90032 USA