1
|
Diagnostic utility of whole genome sequencing in adults with B-other acute lymphoblastic leukemia. Blood Adv 2023; 7:3862-3873. [PMID: 36867579 PMCID: PMC10405200 DOI: 10.1182/bloodadvances.2022008992] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/20/2022] [Accepted: 02/12/2023] [Indexed: 03/04/2023] Open
Abstract
Genomic profiling during the diagnosis of B-cell precursor acute lymphoblastic leukemia (BCP-ALL) in adults is used to guide disease classification, risk stratification, and treatment decisions. Patients for whom diagnostic screening fails to identify disease-defining or risk-stratifying lesions are classified as having B-other ALL. We screened a cohort of 652 BCP-ALL cases enrolled in UKALL14 to identify and perform whole genome sequencing (WGS) of paired tumor-normal samples. For 52 patients with B-other ALL, we compared the WGS findings with data from clinical and research cytogenetics. WGS identified a cancer-associated event in 51 of 52 patients, including established subtype-defining genetic alterations that had been missed by standard-of-care (SoC) genetics in 5 of them. Of the 47 true B-other ALL cases, we identified a recurrent driver in 41 (87%). A complex karyotype by cytogenetics emerges as a heterogeneous group, including distinct genetic alterations associated with either favorable (DUX4-r) or poor outcomes (MEF2D-r and IGK::BCL2). For a subset of 31 cases, we integrated the findings from RNA sequencing (RNA-seq) analysis to include fusion gene detection and classification based on gene expression. Compared with RNA-seq, WGS was sufficient to detect and resolve recurrent genetic subtypes; however, RNA-seq can provide orthogonal validation of findings. In conclusion, we demonstrated that WGS can identify clinically relevant genetic abnormalities missed with SoC testing as well as identify leukemia driver events in virtually all cases of B-other ALL.
|
2
|
|
3
|
Author Correction: Genomic basis for RNA alterations in cancer. Nature 2023; 614:E37. [PMID: 36697831 PMCID: PMC9931574 DOI: 10.1038/s41586-022-05596-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
|
4
|
Variant Library Annotation Tool (VaLiAnT): an oligonucleotide library design and annotation tool for saturation genome editing and other deep mutational scanning experiments. Bioinformatics 2022; 38:892-899. [PMID: 34791067 PMCID: PMC8796380 DOI: 10.1093/bioinformatics/btab776] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/05/2021] [Revised: 07/13/2021] [Accepted: 11/10/2021] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION CRISPR/Cas9-based technology allows for the functional analysis of genetic variants at single nucleotide resolution whilst maintaining genomic context. This approach, known as saturation genome editing (SGE), a form of deep mutational scanning, systematically alters each position in a target region to explore its function. SGE experiments require the design and synthesis of oligonucleotide variant libraries which are introduced into the genome. This technology is applicable to diverse fields such as disease variant identification, drug development, structure-function studies, synthetic biology, evolutionary genetics and host-pathogen interactions. Here, we present the Variant Library Annotation Tool (VaLiAnT) which can be used to generate variant libraries from user-defined genomic coordinates and standard input files. The software can accommodate user-specified species, reference sequences and transcript annotations. RESULTS Coordinates for a genomic range are provided by the user to retrieve a corresponding oligonucleotide reference sequence. A user-specified range within this sequence is then subject to systematic, nucleotide and/or amino acid saturating mutator functions. VaLiAnT provides a novel way to retrieve, mutate and annotate genomic sequences for oligonucleotide library generation. Specific features for SGE library generation can be employed. In addition, VaLiAnT is configurable, allowing for cDNA and prime editing saturation library generation, with other diverse applications possible. AVAILABILITY AND IMPLEMENTATION VaLiAnT is a command line tool written in Python. Source code, testing data, example input and output files and executables are available (https://github.com/cancerit/VaLiAnT) in addition to a detailed user manual (https://github.com/cancerit/VaLiAnT/wiki). VaLiAnT is licensed under AGPLv3. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
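The nucleotide-saturating step described above can be illustrated with a minimal sketch. This is not VaLiAnT's code or API: the names `Variant` and `saturate_snvs` are hypothetical, the reference is assumed to be an uppercase DNA string, and the target range is assumed to be given as 0-based, end-exclusive offsets into it.

```python
from typing import Iterator, NamedTuple

BASES = "ACGT"


class Variant(NamedTuple):
    """One single-nucleotide variant of the reference (hypothetical record type)."""
    position: int   # 0-based offset into the reference sequence
    ref: str        # reference base at that offset
    alt: str        # substituted base
    sequence: str   # full oligonucleotide sequence carrying the substitution


def saturate_snvs(reference: str, start: int, end: int) -> Iterator[Variant]:
    """Yield every single-nucleotide substitution in reference[start:end].

    Each targeted position is replaced in turn by each of the other bases,
    so a range of n A/C/G/T positions gives 3 * n variants.
    """
    if not 0 <= start <= end <= len(reference):
        raise ValueError("target range must lie within the reference sequence")
    for pos in range(start, end):
        ref_base = reference[pos]
        for alt_base in BASES:
            if alt_base != ref_base:
                mutated = reference[:pos] + alt_base + reference[pos + 1:]
                yield Variant(pos, ref_base, alt_base, mutated)


library = list(saturate_snvs("GATTACA", 2, 5))
```

Here `library` holds the nine variants of the three targeted positions; an amino acid saturating function would follow the same pattern at codon rather than base level.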
|
5
|
Convergent somatic mutations in metabolism genes in chronic liver disease. Nature 2021; 598:473-478. [PMID: 34646017 DOI: 10.1038/s41586-021-03974-6] [Citation(s) in RCA: 69] [Impact Index Per Article: 23.0] [Received: 06/17/2020] [Accepted: 08/31/2021] [Indexed: 02/08/2023]
Abstract
The progression of chronic liver disease to hepatocellular carcinoma is caused by the acquisition of somatic mutations that affect 20-30 cancer genes [1-8]. Burdens of somatic mutations are higher and clonal expansions larger in chronic liver disease [9-13] than in normal liver [13-16], which enables positive selection to shape the genomic landscape [9-13]. Here we analysed somatic mutations from 1,590 genomes across 34 liver samples, including healthy controls, alcohol-related liver disease and non-alcoholic fatty liver disease. Seven of the 29 patients with liver disease had mutations in FOXO1, the major transcription factor in insulin signalling. These mutations affected a single hotspot within the gene, impairing the insulin-mediated nuclear export of FOXO1. Notably, six of the seven patients with FOXO1 S22W hotspot mutations showed convergent evolution, with variants acquired independently by up to nine distinct hepatocyte clones per patient. CIDEB, which regulates lipid droplet metabolism in hepatocytes [17-19], and GPAM, which produces storage triacylglycerol from free fatty acids [20,21], also had a significant excess of mutations. We again observed frequent convergent evolution: up to fourteen independent clones per patient with CIDEB mutations and up to seven clones per patient with GPAM mutations. Mutations in metabolism genes were distributed across multiple anatomical segments of the liver, increased clone size and were seen in both alcohol-related liver disease and non-alcoholic fatty liver disease, but rarely in hepatocellular carcinoma. Master regulators of metabolic pathways are a frequent target of convergent somatic mutation in alcohol-related and non-alcoholic fatty liver disease.
|
6
|
RNAmut: robust identification of somatic mutations in acute myeloid leukemia using RNA-sequencing. Haematologica 2020; 105:e290-e293. [PMID: 31649132 PMCID: PMC7271607 DOI: 10.3324/haematol.2019.230821] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/09/2022] Open
|
7
|
Abstract
Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale [1-3]. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter [4]; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation [5,6]; analyses timings and patterns of tumour evolution [7]; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity [8,9]; and evaluates a range of more-specialized features of cancer genomes [8,10-18].
|
8
|
Characterizing Mutational Signatures in Human Cancer Cell Lines Reveals Episodic APOBEC Mutagenesis. Cell 2019; 176:1282-1294.e20. [PMID: 30849372 PMCID: PMC6424819 DOI: 10.1016/j.cell.2019.02.012] [Citation(s) in RCA: 236] [Impact Index Per Article: 59.0] [Received: 10/05/2017] [Revised: 09/19/2018] [Accepted: 01/27/2019] [Indexed: 12/20/2022]
Abstract
Multiple signatures of somatic mutations have been identified in cancer genomes. Exome sequences of 1,001 human cancer cell lines and 577 xenografts revealed most common mutational signatures, indicating past activity of the underlying processes, usually in appropriate cancer types. To investigate ongoing patterns of mutational-signature generation, cell lines were cultured for extended periods and subsequently DNA sequenced. Signatures of discontinued exposures, including tobacco smoke and ultraviolet light, were not generated in vitro. Signatures of normal and defective DNA repair and replication continued to be generated at roughly stable mutation rates. Signatures of APOBEC cytidine deaminase DNA-editing exhibited substantial fluctuations in mutation rate over time with episodic bursts of mutations. The initiating factors for the bursts are unclear, although retrotransposon mobilization may contribute. The examined cell lines constitute a resource of live experimental models of mutational processes, which potentially retain patterns of activity and regulation operative in primary human cancers. Highlights: Annotation of mutational signatures across 1,001 cancer cell lines and 577 PDXs. Activities of mutational processes determined over time in cancer cell lines. APOBEC-associated mutagenesis is often ongoing and can be episodic. Detection of mutational signatures by single-cell sequencing.
|
9
|
Impact of Climate Change and Land Use on Groundwater Salinization in Southern Bangladesh-Implications for Other Asian Deltas. Environmental Management 2019; 64:640-649. [PMID: 31655864 DOI: 10.1007/s00267-019-01220-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/05/2019] [Accepted: 10/19/2019] [Indexed: 06/10/2023]
Abstract
Pervasive salinity in soil and water is affecting agricultural yield and the health of millions of delta dwellers in Asia. This is also being exacerbated by climate change through increases in sea level and tropical storm surges. One consequence of this has been a widespread introduction of salt water shrimp farming. Here, we show, using field data and modeling, how changes in climate and land use are likely to result in increased salinization of shallow groundwater in SE Asian mega-deltas. We also explore possible adaptation options. We find that possible future increase of episodic inundation events, combined with salt water shrimp farming, will cause rapid salinization of groundwater in the region making it less suitable for drinking water and irrigation. However, modified land use and water management practices can mitigate the impacts on groundwater, as well as the overlying soil, from future salinization. The study therefore provides guidance for adaptation planning to reduce future salinization in Asian deltas.
|
10
|
Abstract
BACKGROUND Myeloproliferative neoplasms, such as polycythemia vera, essential thrombocythemia, and myelofibrosis, are chronic hematologic cancers with varied progression rates. The genomic characterization of patients with myeloproliferative neoplasms offers the potential for personalized diagnosis, risk stratification, and treatment. METHODS We sequenced coding exons from 69 myeloid cancer genes in patients with myeloproliferative neoplasms, comprehensively annotating driver mutations and copy-number changes. We developed a genomic classification for myeloproliferative neoplasms and multistage prognostic models for predicting outcomes in individual patients. Classification and prognostic models were validated in an external cohort. RESULTS A total of 2035 patients were included in the analysis. A total of 33 genes had driver mutations in at least 5 patients, with mutations in JAK2, CALR, or MPL being the sole abnormality in 45% of the patients. The numbers of driver mutations increased with age and advanced disease. Driver mutations, germline polymorphisms, and demographic variables independently predicted whether patients received a diagnosis of essential thrombocythemia as compared with polycythemia vera or a diagnosis of chronic-phase disease as compared with myelofibrosis. We defined eight genomic subgroups that showed distinct clinical phenotypes, including blood counts, risk of leukemic transformation, and event-free survival. Integrating 63 clinical and genomic variables, we created prognostic models capable of generating personally tailored predictions of clinical outcomes in patients with chronic-phase myeloproliferative neoplasms and myelofibrosis. The predicted and observed outcomes correlated well in internal cross-validation of a training cohort and in an independent external cohort. Even within individual categories of existing prognostic schemas, our models substantially improved predictive accuracy. 
CONCLUSIONS Comprehensive genomic characterization identified distinct genetic subgroups and provided a classification of myeloproliferative neoplasms on the basis of causal biologic mechanisms. Integration of genomic data with clinical variables enabled the personalized predictions of patients' outcomes and may support the treatment of patients with myeloproliferative neoplasms. (Funded by the Wellcome Trust and others.).
|
11
|
Unsupervised correction of gene-independent cell responses to CRISPR-Cas9 targeting. BMC Genomics 2018; 19:604. [PMID: 30103702 PMCID: PMC6088408 DOI: 10.1186/s12864-018-4989-y] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Received: 01/21/2018] [Accepted: 07/31/2018] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND Genome editing by CRISPR-Cas9 technology allows large-scale screening of gene essentiality in cancer. A confounding factor when interpreting CRISPR-Cas9 screens is the high false-positive rate in detecting essential genes within copy number amplified regions of the genome. We have developed the computational tool CRISPRcleanR which is capable of identifying and correcting gene-independent responses to CRISPR-Cas9 targeting. CRISPRcleanR uses an unsupervised approach based on the segmentation of single-guide RNA fold change values across the genome, without making any assumption about the copy number status of the targeted genes. RESULTS Applying our method to existing and newly generated genome-wide essentiality profiles from 15 cancer cell lines, we demonstrate that CRISPRcleanR reduces false positives when calling essential genes, correcting biases within and outside of amplified regions, while maintaining true positive rates. Established cancer dependencies and essentiality signals of amplified cancer driver genes are detectable post-correction. CRISPRcleanR reports sgRNA fold changes and normalised read counts, is therefore compatible with downstream analysis tools, and works with multiple sgRNA libraries. CONCLUSIONS CRISPRcleanR is a versatile open-source tool for the analysis of CRISPR-Cas9 knockout screens to identify essential genes.
|
12
|
Analysis of the genomic landscape of multiple myeloma highlights novel prognostic markers and disease subgroups. Leukemia 2018; 32:2604-2616. [PMID: 29789651 PMCID: PMC6092251 DOI: 10.1038/s41375-018-0037-9] [Citation(s) in RCA: 123] [Impact Index Per Article: 20.5] [Received: 08/18/2017] [Revised: 10/28/2017] [Accepted: 11/10/2017] [Indexed: 12/19/2022]
Abstract
In multiple myeloma, next-generation sequencing (NGS) has expanded our knowledge of genomic lesions, and highlighted a dynamic and heterogeneous composition of the tumor. Here we used NGS to characterize the genomic landscape of 418 multiple myeloma cases at diagnosis and correlate this with prognosis and classification. Translocations and copy number abnormalities (CNAs) had a preponderant contribution over gene mutations in defining the genotype and prognosis of each case. Known and novel independent prognostic markers were identified in our cohort of proteasome inhibitor and immunomodulatory drug-treated patients with long follow-up, including events with context-specific prognostic value, such as deletions of the PRDM1 gene. Taking advantage of the comprehensive genomic annotation of each case, we used innovative statistical approaches to identify potential novel myeloma subgroups. We observed clusters of patients stratified based on the overall number of mutations and number/type of CNAs, with distinct effects on survival, suggesting that extended genotype of multiple myeloma at diagnosis may lead to improved disease classification and prognostication.
|
13
|
Timing the Landmark Events in the Evolution of Clear Cell Renal Cell Cancer: TRACERx Renal. Cell 2018; 173:611-623.e17. [PMID: 29656891 PMCID: PMC5927631 DOI: 10.1016/j.cell.2018.02.020] [Citation(s) in RCA: 324] [Impact Index Per Article: 54.0] [Received: 07/30/2017] [Revised: 11/10/2017] [Accepted: 02/07/2018] [Indexed: 02/07/2023]
Abstract
Clear cell renal cell carcinoma (ccRCC) is characterized by near-universal loss of the short arm of chromosome 3, deleting several tumor suppressor genes. We analyzed whole genomes from 95 biopsies across 33 patients with clear cell renal cell carcinoma. We find hotspots of point mutations in the 5' UTR of TERT, targeting a MYC-MAX-MAD1 repressor associated with telomere lengthening. The most common structural abnormality generates simultaneous 3p loss and 5q gain (36% patients), typically through chromothripsis. This event occurs in childhood or adolescence, generally as the initiating event that precedes emergence of the tumor's most recent common ancestor by years to decades. Similar genomic changes drive inherited ccRCC. Modeling differences in age incidence between inherited and sporadic cancers suggests that the number of cells with 3p loss capable of initiating sporadic tumors is no more than a few hundred. Early development of ccRCC follows well-defined evolutionary trajectories, offering opportunity for early intervention.
|
14
|
Analysis of the genomic landscape of multiple myeloma highlights novel prognostic markers and disease subgroups. Leukemia 2017. [DOI: 10.1038/leu.2017.344] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
|
15
|
Abstract
Chordoma is a malignant, often incurable bone tumour showing notochordal differentiation. Here, we defined the somatic driver landscape of 104 cases of sporadic chordoma. We reveal somatic duplications of the notochordal transcription factor brachyury (T) in up to 27% of cases. These variants recapitulate the rearrangement architecture of the pathogenic germline duplications of T that underlie familial chordoma. In addition, we find potentially clinically actionable PI3K signalling mutations in 16% of cases. Intriguingly, one of the most frequently altered genes, mutated exclusively by inactivating mutation, was LYST (10%), which may represent a novel cancer gene in chordoma. Chordoma is a rare, often incurable malignant bone tumour. Here, the authors investigate driver mutations of sporadic chordoma in 104 cases, revealing duplications in notochordal transcription factor brachyury (T), PI3K signalling mutations, and mutations in LYST, a potential novel cancer gene in chordoma.
|
16
|
Appraising the relevance of DNA copy number loss and gain in prostate cancer using whole genome DNA sequence data. PLoS Genet 2017; 13:e1007001. [PMID: 28945760 PMCID: PMC5628936 DOI: 10.1371/journal.pgen.1007001] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 02/15/2017] [Revised: 10/05/2017] [Accepted: 08/28/2017] [Indexed: 12/13/2022] Open
Abstract
A variety of models have been proposed to explain regions of recurrent somatic copy number alteration (SCNA) in human cancer. Our study employs Whole Genome DNA Sequence (WGS) data from tumor samples (n = 103) to comprehensively assess the role of the Knudson two-hit genetic model in SCNA generation in prostate cancer. 64 recurrent regions of loss and gain were detected, of which 28 were novel, including regions of loss with more than 15% frequency at Chr4p15.2-p15.1 (15.53%), Chr6q27 (16.50%) and Chr18q12.3 (17.48%). Comprehensive mutation screens of genes, lincRNA encoding sequences, control regions and conserved domains within SCNAs demonstrated that a two-hit genetic model was supported in only a minor proportion of recurrent SCNA losses examined (15/40). We found that recurrent breakpoints and regions of inversion often occur within Knudson model SCNAs, leading to the identification of ZNF292 as a target gene for the deletion at 6q14.3-q15 and NKX3.1 as a two-hit target at 8p21.3-p21.2. The importance of alterations of lincRNA sequences was illustrated by the identification of a novel mutational hotspot at the KCCAT42, FENDRR, CAT1886 and STCAT2 loci at the 16q23.1-q24.3 loss. Our data confirm that the burden of SCNAs is predictive of biochemical recurrence, define nine individual regions that are associated with relapse, and highlight the possible importance of ion channel and G-protein coupled-receptor (GPCR) pathways in cancer development. We concluded that a two-hit genetic model accounts for about one third of SCNA losses, indicating that other mechanisms, such as haploinsufficiency and epigenetic inactivation, account for the remaining SCNA losses.
|
17
|
Recurrent mutation of IGF signalling genes and distinct patterns of genomic rearrangement in osteosarcoma. Nat Commun 2017; 8:15936. [PMID: 28643781 PMCID: PMC5490007 DOI: 10.1038/ncomms15936] [Citation(s) in RCA: 142] [Impact Index Per Article: 20.3] [Received: 10/12/2016] [Accepted: 05/15/2017] [Indexed: 02/08/2023] Open
Abstract
Osteosarcoma is a primary malignancy of bone that affects children and adults. Here, we present the largest sequencing study of osteosarcoma to date, comprising 112 childhood and adult tumours encompassing all major histological subtypes. A key finding of our study is the identification of mutations in insulin-like growth factor (IGF) signalling genes in 8/112 (7%) of cases. We validate this observation using fluorescence in situ hybridization (FISH) in an additional 87 osteosarcomas, with IGF1 receptor (IGF1R) amplification observed in 14% of tumours. These findings may inform patient selection in future trials of IGF1R inhibitors in osteosarcoma. Analysing patterns of mutation, we identify distinct rearrangement profiles including a process characterized by chromothripsis and amplification. This process operates recurrently at discrete genomic regions and generates driver mutations. It may represent an age-independent mutational mechanism that contributes to the development of osteosarcoma in children and adults alike.
|
18
|
cgpCaVEManWrapper: Simple Execution of CaVEMan in Order to Detect Somatic Single Nucleotide Variants in NGS Data. Current Protocols in Bioinformatics 2016; 56:15.10.1-15.10.18. [PMID: 27930805 DOI: 10.1002/cpbi.20] [Citation(s) in RCA: 113] [Impact Index Per Article: 14.1] [Indexed: 12/30/2022]
Abstract
CaVEMan is an expectation maximization-based somatic substitution-detection algorithm written in C. The algorithm analyzes sequence data from a test sample, such as a tumor, relative to a reference normal sample from the same patient and the reference genome. It performs a comparative analysis of the tumor and normal sample to derive a probabilistic estimate for putative somatic substitutions. When combined with a set of validated post-hoc filters, CaVEMan generates a set of somatic substitution calls with high recall and positive predictive value. Here we provide instructions for using a wrapper script called cgpCaVEManWrapper, which runs the CaVEMan algorithm and additional downstream post-hoc filters. We describe both a simple one-shot run of cgpCaVEManWrapper and a more in-depth implementation suited to large-scale compute farms. © 2016 by John Wiley & Sons, Inc.
|
19
|
ascatNgs: Identifying Somatically Acquired Copy-Number Alterations from Whole-Genome Sequencing Data. Current Protocols in Bioinformatics 2016; 56:15.9.1-15.9.17. [PMID: 27930809 PMCID: PMC6097604 DOI: 10.1002/cpbi.17] [Citation(s) in RCA: 74] [Impact Index Per Article: 9.3] [Indexed: 01/10/2023]
Abstract
We have developed ascatNgs to aid researchers in carrying out Allele-Specific Copy number Analysis of Tumours (ASCAT). ASCAT is capable of detecting DNA copy number changes affecting a tumor genome when comparing to a matched normal sample. Additionally, the algorithm estimates the amount of tumor DNA in the sample, known as Aberrant Cell Fraction (ACF). ASCAT itself is an R-package which requires the generation of many file types. Here, we present a suite of tools to help handle this for the user. Our code is available on our GitHub site (https://github.com/cancerit). This unit describes both 'one-shot' execution and approaches more suitable for large-scale compute farms. © 2016 by John Wiley & Sons, Inc.
|
20
|
Mutational signatures of ionizing radiation in second malignancies. Nat Commun 2016; 7:12605. [PMID: 27615322 PMCID: PMC5027243 DOI: 10.1038/ncomms12605] [Citation(s) in RCA: 172] [Impact Index Per Article: 21.5] [Received: 09/07/2015] [Accepted: 07/13/2016] [Indexed: 02/07/2023] Open
Abstract
Ionizing radiation is a potent carcinogen, inducing cancer through DNA damage. The signatures of mutations arising in human tissues following in vivo exposure to ionizing radiation have not been documented. Here, we searched for signatures of ionizing radiation in 12 radiation-associated second malignancies of different tumour types. Two signatures of somatic mutation characterize ionizing radiation exposure irrespective of tumour type. Compared with 319 radiation-naive tumours, radiation-associated tumours carry a median extra 201 deletions genome-wide, sized 1-100 base pairs often with microhomology at the junction. Unlike deletions of radiation-naive tumours, these show no variation in density across the genome or correlation with sequence context, replication timing or chromatin structure. Furthermore, we observe a significant increase in balanced inversions in radiation-associated tumours. Both small deletions and inversions generate driver mutations. Thus, ionizing radiation generates distinctive mutational signatures that explain its carcinogenic potential.
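The phrase "microhomology at the junction" can be made concrete with a small sketch. It is not the study's method: the function name `microhomology_length` is hypothetical, and the definition used here (the longer of the two runs over which the deleted segment matches the adjacent flank, which is also the number of bases by which the deletion can be shifted in that direction without changing the resulting sequence) is one common convention, assumed for illustration.

```python
def microhomology_length(left_flank: str, deleted: str, right_flank: str) -> int:
    """Length of microhomology at a deletion junction (illustrative definition).

    Counts how many bases at the start of the deleted segment match the bases
    immediately after it, and how many bases at the end of the deleted segment
    match the bases immediately before it, then returns the larger count.
    """
    # Bases shared between the start of the deletion and the 3' flank.
    right = 0
    while (right < len(deleted) and right < len(right_flank)
           and deleted[right] == right_flank[right]):
        right += 1

    # Bases shared between the end of the deletion and the 5' flank.
    left = 0
    while (left < len(deleted) and left < len(left_flank)
           and deleted[-1 - left] == left_flank[-1 - left]):
        left += 1

    return max(left, right)


# Reference ACGT[GATTC]GAAAC: the deleted GATTC starts with GA, as does the 3' flank.
mh = microhomology_length("ACGT", "GATTC", "GAAAC")
```

In this example `mh` is 2: deleting `GATTC` or the shifted segment `TTCGA` leaves the same sequence, `ACGTGAAAC`.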
|
21
|
Abstract
BACKGROUND Recent studies have provided a detailed census of genes that are mutated in acute myeloid leukemia (AML). Our next challenge is to understand how this genetic diversity defines the pathophysiology of AML and informs clinical practice. METHODS We enrolled a total of 1540 patients in three prospective trials of intensive therapy. Combining driver mutations in 111 cancer genes with cytogenetic and clinical data, we defined AML genomic subgroups and their relevance to clinical outcomes. RESULTS We identified 5234 driver mutations across 76 genes or genomic regions, with 2 or more drivers identified in 86% of the patients. Patterns of co-mutation compartmentalized the cohort into 11 classes, each with distinct diagnostic features and clinical outcomes. In addition to currently defined AML subgroups, three heterogeneous genomic categories emerged: AML with mutations in genes encoding chromatin, RNA-splicing regulators, or both (in 18% of patients); AML with TP53 mutations, chromosomal aneuploidies, or both (in 13%); and, provisionally, AML with IDH2(R172) mutations (in 1%). Patients with chromatin-spliceosome and TP53-aneuploidy AML had poor outcomes, with the various class-defining mutations contributing independently and additively to the outcome. In addition to class-defining lesions, other co-occurring driver mutations also had a substantial effect on overall survival. The prognostic effects of individual mutations were often significantly altered by the presence or absence of other driver mutations. Such gene-gene interactions were especially pronounced for NPM1-mutated AML, in which patterns of co-mutation identified groups with a favorable or adverse prognosis. These predictions require validation in prospective clinical trials. CONCLUSIONS The driver landscape in AML reveals distinct molecular subgroups that reflect discrete paths in the evolution of AML, informing disease classification and prognostic stratification. 
(Funded by the Wellcome Trust and others; ClinicalTrials.gov number, NCT00146120.).
|
22
|
Abstract
VAGrENT is a tool that provides biological context and effect prediction for genomic sequence variants. It annotates single base substitutions and small insertions and deletions by comparing them to reference information within or close to genes or other transcribed elements. This information provides the critical insight required to inform the biological or clinical significance of variant data generated from sequencing studies. The software has been optimized to run efficiently against the large numbers and diverse classes of variants that are typically generated from next generation sequencing technologies. This unit describes how to configure and use VAGrENT and also contains support protocols for extending and adapting its default behavior.
|
23
|
cgpPindel: Identifying Somatically Acquired Insertion and Deletion Events from Paired End Sequencing. Curr Protoc Bioinformatics 2015; 52:15.7.1-15.7.12. [PMID: 26678382 DOI: 10.1002/0471250953.bi1507s52] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Indexed: 11/07/2022]
Abstract
cgpPindel is a modified version of Pindel that is optimized for detecting somatic insertions and deletions (indels) in cancer genomes and other samples compared to a reference control. Post-hoc filters remove false positive calls, resulting in a high-quality dataset for downstream analysis. This unit provides concise instructions for both a simple 'one-shot' execution of cgpPindel and a more detailed approach suitable for large-scale compute farms.
|
24
|
A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing. Nat Commun 2015; 6:10001. [PMID: 26647970 PMCID: PMC4682041 DOI: 10.1038/ncomms10001] [Citation(s) in RCA: 205] [Impact Index Per Article: 22.8] [Received: 06/16/2015] [Accepted: 10/23/2015] [Indexed: 12/13/2022] Open
Abstract
As whole-genome sequencing for cancer genome analysis becomes a clinical tool, a full understanding of the variables affecting sequencing analysis output is required. Here using tumour-normal sample pairs from two different types of cancer, chronic lymphocytic leukaemia and medulloblastoma, we conduct a benchmarking exercise within the context of the International Cancer Genome Consortium. We compare sequencing methods, analysis pipelines and validation methods. We show that using PCR-free methods and increasing sequencing depth to ∼100× shows benefits, as long as the tumour:control coverage ratio remains balanced. We observe widely varying mutation call rates and low concordance among analysis pipelines, reflecting the artefact-prone nature of the raw data and lack of standards for dealing with the artefacts. However, we show that, using the benchmark mutation set we have created, many issues are in fact easy to remedy and have an immediate positive impact on mutation detection accuracy.
|
25
|
Origins and functional consequences of somatic mitochondrial DNA mutations in human cancer. eLife 2014; 3:e02935. [PMID: 25271376 PMCID: PMC4371858 DOI: 10.7554/elife.02935] [Citation(s) in RCA: 270] [Impact Index Per Article: 27.0] [Received: 03/28/2014] [Accepted: 09/26/2014] [Indexed: 01/04/2023] Open
Abstract
Recent sequencing studies have extensively explored the somatic alterations present in the nuclear genomes of cancers. Although mitochondria control energy metabolism and apoptosis, the origins and impact of cancer-associated mutations in mtDNA are unclear. In this study, we analyzed somatic alterations in mtDNA from 1675 tumors. We identified 1907 somatic substitutions, which exhibited dramatic replicative strand bias, predominantly C > T and A > G on the mitochondrial heavy strand. This strand-asymmetric signature differs from those found in nuclear cancer genomes but matches the inferred germline process shaping primate mtDNA sequence content. A number of mtDNA mutations showed considerable heterogeneity across tumor types. Missense mutations were selectively neutral and often gradually drifted towards homoplasmy over time. In contrast, mutations resulting in protein truncation undergo negative selection and were almost exclusively heteroplasmic. Our findings indicate that the endogenous mutational mechanism has far greater impact than any other external mutagens in mitochondria and is fundamentally linked to mtDNA replication.
|
26
|
Polygenic in vivo validation of cancer mutations using transposons. Genome Biol 2014; 15:455. [PMID: 25260652 PMCID: PMC4210617 DOI: 10.1186/s13059-014-0455-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/10/2014] [Accepted: 08/27/2014] [Indexed: 01/22/2023] Open
Abstract
The in vivo validation of cancer mutations and genes identified in cancer genomics is resource-intensive because of the low throughput of animal experiments. We describe a mouse model that allows multiple cancer mutations to be validated in each animal line. Animal lines are generated with multiple candidate cancer mutations using transposons. The candidate cancer genes are tagged and randomly expressed in somatic cells, allowing easy identification of the cancer genes involved in the generated tumours. This system presents a useful, generalised and efficient means for animal validation of cancer genes.
|
27
|
Mobile DNA in cancer. Extensive transduction of nonrepetitive DNA mediated by L1 retrotransposition in cancer genomes. Science 2014; 345:1251343. [PMID: 25082706 PMCID: PMC4380235 DOI: 10.1126/science.1251343] [Citation(s) in RCA: 277] [Impact Index Per Article: 27.7] [Indexed: 12/12/2022]
Abstract
Long interspersed nuclear element-1 (L1) retrotransposons are mobile repetitive elements that are abundant in the human genome. L1 elements propagate through RNA intermediates. In the germ line, neighboring, nonrepetitive sequences are occasionally mobilized by the L1 machinery, a process called 3' transduction. Because 3' transductions are potentially mutagenic, we explored the extent to which they occur somatically during tumorigenesis. Studying cancer genomes from 244 patients, we found that tumors from 53% of the patients had somatic retrotranspositions, of which 24% were 3' transductions. Fingerprinting of donor L1s revealed that a handful of source L1 elements in a tumor can spawn from tens to hundreds of 3' transductions, which can themselves seed further retrotranspositions. The activity of individual L1 elements fluctuated during tumor evolution and correlated with L1 promoter hypomethylation. The 3' transductions disseminated genes, exons, and regulatory elements to new locations, most often to heterochromatic regions of the genome.
|
28
|
Processed pseudogenes acquired somatically during cancer development. Nat Commun 2014; 5:3644. [PMID: 24714652 PMCID: PMC3996531 DOI: 10.1038/ncomms4644] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Received: 02/06/2014] [Accepted: 03/13/2014] [Indexed: 12/14/2022] Open
Abstract
Cancer evolves by mutation, with somatic reactivation of retrotransposons being one such mutational process. Germline retrotransposition can cause processed pseudogenes, but whether this occurs somatically has not been evaluated. Here we screen sequencing data from 660 cancer samples for somatically acquired pseudogenes. We find 42 events in 17 samples, especially non-small cell lung cancer (5/27) and colorectal cancer (2/11). Genomic features mirror those of germline LINE element retrotranspositions, with frequent target-site duplications (67%), consensus TTTTAA sites at insertion points, inverted rearrangements (21%), 5' truncation (74%) and polyA tails (88%). Transcriptional consequences include expression of pseudogenes from UTRs or introns of target genes. In addition, a somatic pseudogene that integrated into the promoter and first exon of the tumour suppressor gene, MGA, abrogated expression from that allele. Thus, formation of processed pseudogenes represents a new class of mutation occurring during cancer development, with potentially diverse functional consequences depending on genomic context.
|
29
|
RAG-mediated recombination is the predominant driver of oncogenic rearrangement in ETV6-RUNX1 acute lymphoblastic leukemia. Nat Genet 2014; 46:116-25. [PMID: 24413735 PMCID: PMC3960636 DOI: 10.1038/ng.2874] [Citation(s) in RCA: 261] [Impact Index Per Article: 26.1] [Received: 07/04/2013] [Accepted: 12/13/2013] [Indexed: 12/16/2022]
Abstract
The ETV6-RUNX1 fusion gene, found in 25% of childhood acute lymphoblastic leukemia (ALL) cases, is acquired in utero but requires additional somatic mutations for overt leukemia. We used exome and low-coverage whole-genome sequencing to characterize secondary events associated with leukemic transformation. RAG-mediated deletions emerge as the dominant mutational process, characterized by recombination signal sequence motifs near breakpoints, incorporation of non-templated sequence at junctions, ∼30-fold enrichment at promoters and enhancers of genes actively transcribed in B cell development and an unexpectedly high ratio of recurrent to non-recurrent structural variants. Single-cell tracking shows that this mechanism is active throughout leukemic evolution, with evidence of localized clustering and reiterated deletions. Integration of data on point mutations and rearrangements identifies ATF7IP and MGA as two new tumor-suppressor genes in ALL. Thus, a remarkably parsimonious mutational process transforms ETV6-RUNX1-positive lymphoblasts, targeting the promoters, enhancers and first exons of genes that normally regulate B cell differentiation.
|
30
|
Heterogeneity of genomic evolution and mutational profiles in multiple myeloma. Nat Commun 2014; 5:2997. [PMID: 24429703 PMCID: PMC3905727 DOI: 10.1038/ncomms3997] [Citation(s) in RCA: 655] [Impact Index Per Article: 65.5] [Received: 07/18/2013] [Accepted: 11/25/2013] [Indexed: 12/25/2022] Open
Abstract
Multiple myeloma is an incurable plasma cell malignancy with a complex and incompletely understood molecular pathogenesis. Here we use whole-exome sequencing, copy-number profiling and cytogenetics to analyse 84 myeloma samples. Most cases have a complex subclonal structure and show clusters of subclonal variants, including subclonal driver mutations. Serial sampling reveals diverse patterns of clonal evolution, including linear evolution, differential clonal response and branching evolution. Diverse processes contribute to the mutational repertoire, including kataegis and somatic hypermutation, and their relative contribution changes over time. We find heterogeneity of mutational spectrum across samples, with few recurrent genes. We identify new candidate genes, including truncations of SP140, LTB, ROBO1 and clustered missense mutations in EGR1. The myeloma genome is heterogeneous across the cohort, and exhibits diversity in clonal admixture and in dynamics of evolution, which may impact prognostic stratification, therapeutic approaches and assessment of disease response to treatment.
|
31
|
Abstract
BACKGROUND Somatic mutations in the Janus kinase 2 gene (JAK2) occur in many myeloproliferative neoplasms, but the molecular pathogenesis of myeloproliferative neoplasms with nonmutated JAK2 is obscure, and the diagnosis of these neoplasms remains a challenge. METHODS We performed exome sequencing of samples obtained from 151 patients with myeloproliferative neoplasms. The mutation status of the gene encoding calreticulin (CALR) was assessed in an additional 1345 hematologic cancers, 1517 other cancers, and 550 controls. We established phylogenetic trees using hematopoietic colonies. We assessed calreticulin subcellular localization using immunofluorescence and flow cytometry. RESULTS Exome sequencing identified 1498 mutations in 151 patients, with medians of 6.5, 6.5, and 13.0 mutations per patient in samples of polycythemia vera, essential thrombocythemia, and myelofibrosis, respectively. Somatic CALR mutations were found in 70 to 84% of samples of myeloproliferative neoplasms with nonmutated JAK2, in 8% of myelodysplasia samples, in occasional samples of other myeloid cancers, and in none of the other cancers. A total of 148 CALR mutations were identified with 19 distinct variants. Mutations were located in exon 9 and generated a +1 base-pair frameshift, which would result in a mutant protein with a novel C-terminal. Mutant calreticulin was observed in the endoplasmic reticulum without increased cell-surface or Golgi accumulation. Patients with myeloproliferative neoplasms carrying CALR mutations presented with higher platelet counts and lower hemoglobin levels than patients with mutated JAK2. Mutation of CALR was detected in hematopoietic stem and progenitor cells. Clonal analyses showed CALR mutations in the earliest phylogenetic node, a finding consistent with its role as an initiating mutation in some patients. 
CONCLUSIONS Somatic mutations in the endoplasmic reticulum chaperone CALR were found in a majority of patients with myeloproliferative neoplasms with nonmutated JAK2. (Funded by the Kay Kendall Leukaemia Fund and others.)
|
32
|
|
33
|
Abstract
All cancers are caused by somatic mutations; however, understanding of the biological processes generating these mutations is limited. The catalogue of somatic mutations from a cancer genome bears the signatures of the mutational processes that have been operative. Here we analysed 4,938,362 mutations from 7,042 cancers and extracted more than 20 distinct mutational signatures. Some are present in many cancer types, notably a signature attributed to the APOBEC family of cytidine deaminases, whereas others are confined to a single cancer class. Certain signatures are associated with age of the patient at cancer diagnosis, known mutagenic exposures or defects in DNA maintenance, but many are of cryptic origin. In addition to these genome-wide mutational signatures, hypermutation localized to small genomic regions, 'kataegis', is found in many cancer types. The results reveal the diversity of mutational processes underlying the development of cancer, with potential implications for understanding of cancer aetiology, prevention and therapy.
|
35
|
Whole exome sequencing of adenoid cystic carcinoma. J Clin Invest 2013; 123:2965-8. [PMID: 23778141 DOI: 10.1172/jci67201] [Citation(s) in RCA: 200] [Impact Index Per Article: 18.2] [Received: 10/09/2012] [Accepted: 04/11/2013] [Indexed: 12/28/2022] Open
Abstract
Adenoid cystic carcinoma (ACC) is a rare malignancy that can occur in multiple organ sites and is primarily found in the salivary gland. While the identification of recurrent fusions of the MYB-NFIB genes has begun to shed light on the molecular underpinnings, little else is known about the molecular genetics of this frequently fatal cancer. We have undertaken exome sequencing in a series of 24 ACC to further delineate the genetics of the disease. We identified multiple mutated genes that, combined, implicate chromatin deregulation in half of cases. Further, mutations were identified in known cancer genes, including PIK3CA, ATM, CDKN2A, SF3B1, SUFU, TSC1, and CYLD. Mutations in NOTCH1/2 were identified in 3 cases, and we identify the negative NOTCH signaling regulator, SPEN, as a new cancer gene in ACC with mutations in 5 cases. Finally, the identification of 3 likely activating mutations in the tyrosine kinase receptor FGFR2, analogous to those reported in ovarian and endometrial carcinoma, points to potential therapeutic avenues for a subset of cases.
|
36
|
Abstract 5143: From sequencing data to mutation spectra: a high throughput analysis pipeline. Cancer Res 2013. [DOI: 10.1158/1538-7445.am2013-5143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Advances in massively parallel sequencing technology have revolutionized the way we characterise cancer genomes and provided significant new insights to our understanding of the mechanisms that underpin oncogenesis. A diverse range of mutation types including single base-pair changes, insertions, deletions, copy number alterations and larger structural variations are common in cancer genomes.
To rapidly and accurately screen next generation sequencing data for these somatic mutations in cancer, the Cancer Genome Project (CGP) has developed a high throughput analysis pipeline utilising a suite of analysis software developed by the group. Built around a compute farm of ∼2,000 nodes and using a Lustre filesystem, raw data files (BAM etc.), analysis results files and version information are efficiently stored and tracked in our archive/storage system, FileTrk. Lane data is aligned using Burrows-Wheeler Aligner (BWA) and web interfaces have been developed to allow scientific staff to rapidly QC aligned lanes. Once QC'd and desired coverage is reached, lanes are merged into a single sample BAM file and the sample is then ready for analysis.
In house algorithms are used to detect point mutations (CaVEMan), structural variation breakpoints (Brass) and copy number changes (ASCAT and PICNIC), whilst Pindel is used to detect small insertions/deletions. Post-processing filters then remove false positives and the results are uploaded into a database. Mutations are annotated to the protein and RNA levels using standard nomenclature (Vagrent, in-house software). Downstream analysis software has been developed (CANDI, in-house software) which produces a range of plots to aid visualisation of mutation context and mutation spectra patterns in related cancer samples.
Current IT development is focussed on converting the pipeline to produce and store VCF output, incorporate further downstream analysis software and automate data export to COSMIC and the ICGC data portal.
Citation Format: David Jones, Adam P. Butler, Jon W. Teague, Keiran M. Raine, Andrew Menzies, John Marshall, Jonathan Hinton, Serge Dronov, Lucy Stebbings, Alagu Jayakumar, Catherine Leroy, Jorge Zamora, Manasa Ramakrishna, Elli Papaemmanuil, Helen Davies, Susanna L. Cooke, Serena Nik-Zainal, Ultan McDermott, Michael R. Stratton, Peter Campbell. From sequencing data to mutation spectra: a high throughput analysis pipeline. [abstract]. In: Proceedings of the 104th Annual Meeting of the American Association for Cancer Research; 2013 Apr 6-10; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2013;73(8 Suppl):Abstract nr 5143. doi:10.1158/1538-7445.AM2013-5143
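The "mutation spectra" mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes the common convention of reporting each single-base substitution from the pyrimidine strand, which gives six classes; the abstract does not describe how CANDI computes its plots, so the function names and the input format (pairs of reference and alternate bases) are hypothetical.

```python
from collections import Counter

# Watson-Crick complements, used to re-express purine-reference calls
# on the pyrimidine strand (an assumed convention, not taken from the abstract).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def substitution_class(ref: str, alt: str) -> str:
    """Map a single-base substitution to one of six pyrimidine-based classes."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref!r}>{alt!r}")
    if ref in ("A", "G"):  # purine reference: report the complementary change
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def mutation_spectrum(calls):
    """Count substitutions per class; calls is an iterable of (ref, alt) pairs."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in calls)
    return {cls: counts.get(cls, 0) for cls in CLASSES}


if __name__ == "__main__":
    example_calls = [("G", "A"), ("C", "T"), ("A", "C"), ("T", "G")]
    print(mutation_spectrum(example_calls))
```

Collapsing to six classes treats G>A and C>T as the same event seen from opposite strands, which is why the example input yields only two non-zero classes.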
|
37
|
Mutational processes molding the genomes of 21 breast cancers. Cell 2012; 149:979-93. [PMID: 22608084 PMCID: PMC3414841 DOI: 10.1016/j.cell.2012.04.024] [Citation(s) in RCA: 1367] [Impact Index Per Article: 113.9] [Received: 12/14/2011] [Revised: 03/12/2012] [Accepted: 04/30/2012] [Indexed: 12/14/2022]
Abstract
All cancers carry somatic mutations. The patterns of mutation in cancer genomes reflect the DNA damage and repair processes to which cancer cells and their precursors have been exposed. To explore these mechanisms further, we generated catalogs of somatic mutation from 21 breast cancers and applied mathematical methods to extract mutational signatures of the underlying processes. Multiple distinct single- and double-nucleotide substitution signatures were discernible. Cancers with BRCA1 or BRCA2 mutations exhibited a characteristic combination of substitution mutation signatures and a distinctive profile of deletions. Complex relationships between somatic mutation prevalence and transcription were detected. A remarkable phenomenon of localized hypermutation, termed “kataegis,” was observed. Regions of kataegis differed between cancers but usually colocalized with somatic rearrangements. Base substitutions in these regions were almost exclusively of cytosine at TpC dinucleotides. The mechanisms underlying most of these mutational signatures are unknown. However, a role for the APOBEC family of cytidine deaminases is proposed.
|
38
|
Abstract
Cancer evolves dynamically as clonal expansions supersede one another driven by shifting selective pressures, mutational processes, and disrupted cancer genes. These processes mark the genome, such that a cancer's life history is encrypted in the somatic mutations present. We developed algorithms to decipher this narrative and applied them to 21 breast cancers. Mutational processes evolve across a cancer's lifespan, with many emerging late but contributing extensive genetic variation. Subclonal diversification is prominent, and most mutations are found in just a fraction of tumor cells. Every tumor has a dominant subclonal lineage, representing more than 50% of tumor cells. Minimal expansion of these subclones occurs until many hundreds to thousands of mutations have accumulated, implying the existence of long-lived, quiescent cell lineages capable of substantial proliferation upon acquisition of enabling genomic changes. Expansion of the dominant subclone to an appreciable mass may therefore represent the final rate-limiting step in a breast cancer's development, triggering diagnosis.
|
39
|
Abstract 3967: The Cancer Genome Project high throughput analysis pipeline. Cancer Res 2012. [DOI: 10.1158/1538-7445.am2012-3967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
The Cancer Genome Project (CGP) was set up in 2000 to use systematic mutation screening methods to increase our understanding of human cancer. With the advent of next generation sequencing, and the large volumes of data that it generates, a new suite of software was required to rapidly and accurately screen these data for somatic changes. We have built an analysis pipeline to track and analyse large numbers of tumour samples, using in-house and externally available tools. The analysis pipeline is built around a ∼2,000 node compute farm and Lustre filesystem which outputs into our archive and data storage system, FileTrk. FileTrk holds the raw data files (BAM, CEL etc.), the results of the analysis and any versioning information about the software used to generate these results. Sample lanes are aligned back to the genome using Burrows-Wheeler Aligner (BWA) and lane-to-lane comparisons are made to ensure data integrity. Lanes from each sample are merged into a single sample BAM file; once 30-40x coverage is reached and the lanes have been quality assessed, the sample is locked and ready for analysis. Mutation callers detect point mutations (Caveman, in-house software), small insertions/deletions (Pindel), breakpoints (BRASS, in-house software) and copy number changes (ASCAT & PICNIC, in-house software). The resulting mutations are post-processed to remove false positives, annotated to the RNA and protein level using standard nomenclature (Vagrent, in-house software) and uploaded to a database. Interfaces have been developed to enable the selection of random sets of mutations for validation; the outcome of the validations is recorded so that specificity can be calculated for each sample in the system. IT systems are being developed to automatically export lists of somatic changes to COSMIC, the ICGC data portal and raw data to the European Genome-Phenome Archive (EGA).
Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 103rd Annual Meeting of the American Association for Cancer Research; 2012 Mar 31-Apr 4; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2012;72(8 Suppl):Abstract nr 3967. doi:10.1158/1538-7445.AM2012-3967
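The validation step described in the abstract above (random selection of calls, then a per-sample figure derived from the recorded outcomes) might be sketched as follows in Python. This is an illustration under stated assumptions, not the CGP implementation: the function names are hypothetical, and the per-sample figure is taken here to be the fraction of validated calls that were confirmed, since only called mutations are sent for validation.

```python
import random


def pick_for_validation(calls, n, seed=0):
    """Draw up to n calls at random, without replacement.

    The fixed seed is an assumption made so that a selection can be reproduced.
    """
    rng = random.Random(seed)
    calls = list(calls)
    return rng.sample(calls, min(n, len(calls)))


def confirmed_fraction(outcomes):
    """Fraction of validated calls that were confirmed.

    outcomes is an iterable of booleans (True = confirmed on validation).
    Returns None when nothing has been validated yet.
    """
    outcomes = list(outcomes)
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    selected = pick_for_validation([f"call_{i}" for i in range(100)], n=4)
    print(selected)
    print(confirmed_fraction([True, True, False, True]))
```

Returning None for an unvalidated sample keeps "no data" distinct from a sample in which every validated call failed.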
|
40
|
Abstract
BACKGROUND Myelodysplastic syndromes are a diverse and common group of chronic hematologic cancers. The identification of new genetic lesions could facilitate new diagnostic and therapeutic strategies. METHODS We used massively parallel sequencing technology to identify somatically acquired point mutations across all protein-coding exons in the genome in 9 patients with low-grade myelodysplasia. Targeted resequencing of the gene encoding RNA splicing factor 3B, subunit 1 (SF3B1), was also performed in a cohort of 2087 patients with myeloid or other cancers. RESULTS We identified 64 point mutations in the 9 patients. Recurrent somatically acquired mutations were identified in SF3B1. Follow-up revealed SF3B1 mutations in 72 of 354 patients (20%) with myelodysplastic syndromes, with particularly high frequency among patients whose disease was characterized by ring sideroblasts (53 of 82 [65%]). The gene was also mutated in 1 to 5% of patients with a variety of other tumor types. The observed mutations were less deleterious than was expected on the basis of chance, suggesting that the mutated protein retains structural integrity with altered function. SF3B1 mutations were associated with down-regulation of key gene networks, including core mitochondrial pathways. Clinically, patients with SF3B1 mutations had fewer cytopenias and longer event-free survival than patients without SF3B1 mutations. CONCLUSIONS Mutations in SF3B1 implicate abnormalities of messenger RNA splicing in the pathogenesis of myelodysplastic syndromes. (Funded by the Wellcome Trust and others.)
|
41
|
Data mining using the Catalogue of Somatic Mutations in Cancer BioMart. Database (Oxford) 2011; 2011:bar018. [PMID: 21609966 PMCID: PMC3263736 DOI: 10.1093/database/bar018] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Received: 03/17/2011] [Revised: 04/15/2011] [Accepted: 04/19/2011] [Indexed: 01/10/2023]
Abstract
Catalogue of Somatic Mutations in Cancer (COSMIC) (http://www.sanger.ac.uk/cosmic) is a publicly available resource providing information on somatic mutations implicated in human cancer. Release v51 (January 2011) includes data from just over 19,000 genes, 161,787 coding mutations and 5573 gene fusions, described in more than 577,000 tumour samples. COSMICMart (COSMIC BioMart) provides a flexible way to mine these data and combine somatic mutations with other biologically relevant data sets. This article describes the data available in COSMIC along with examples of how to successfully mine and integrate data sets using COSMICMart. DATABASE URL: http://www.sanger.ac.uk/genetics/CGP/cosmic/biomart/martview/.
|
42
|
Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 2011; 144:27-40. [PMID: 21215367 PMCID: PMC3065307 DOI: 10.1016/j.cell.2010.11.055] [Citation(s) in RCA: 1672] [Impact Index Per Article: 128.6] [Received: 08/20/2010] [Revised: 11/03/2010] [Accepted: 11/24/2010] [Indexed: 12/13/2022]
Abstract
Cancer is driven by somatically acquired point mutations and chromosomal rearrangements, conventionally thought to accumulate gradually over time. Using next-generation sequencing, we characterize a phenomenon, which we term chromothripsis, whereby tens to hundreds of genomic rearrangements occur in a one-off cellular crisis. Rearrangements involving one or a few chromosomes crisscross back and forth across involved regions, generating frequent oscillations between two copy number states. These genomic hallmarks are highly improbable if rearrangements accumulate over time and instead imply that nearly all occur during a single cellular catastrophe. The stamp of chromothripsis can be seen in at least 2%–3% of all cancers, across many subtypes, and is present in ∼25% of bone cancers. We find that one, or indeed more than one, cancer-causing lesion can emerge out of the genomic crisis. This phenomenon has important implications for the origins of genomic remodeling and temporal emergence of cancer.
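The hallmark described in the abstract above, frequent oscillation between two copy number states, can be made concrete with a short Python sketch. It is an illustration only: the input is assumed to be an ordered list of per-segment copy numbers along one chromosome, the function names are hypothetical, and the threshold of 10 switches is an arbitrary choice for the example rather than a criterion from the paper.

```python
def copy_number_oscillation(segment_copy_numbers):
    """Summarise a chromosome's copy-number profile.

    Returns (number of changes between adjacent segments,
             sorted list of distinct copy-number states).
    """
    cn = list(segment_copy_numbers)
    switches = sum(1 for a, b in zip(cn, cn[1:]) if a != b)
    return switches, sorted(set(cn))


def looks_oscillating(segment_copy_numbers, min_switches=10):
    """Flag profiles that alternate many times between exactly two states.

    min_switches is an assumed, illustrative threshold.
    """
    switches, states = copy_number_oscillation(segment_copy_numbers)
    return len(states) == 2 and switches >= min_switches


if __name__ == "__main__":
    shattered = [2, 1] * 6          # alternates between two states
    stepwise = [2, 3, 4, 5] * 3     # many changes, but four states
    print(copy_number_oscillation(shattered), looks_oscillating(shattered))
    print(copy_number_oscillation(stepwise), looks_oscillating(stepwise))
```

Requiring exactly two states separates the alternating pattern from profiles that change often but pass through many different copy numbers.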
|
43
|
The patterns and dynamics of genomic instability in metastatic pancreatic cancer. Nature 2010; 467:1109-13. [PMID: 20981101 DOI: 10.1038/nature09460] [Citation(s) in RCA: 996] [Impact Index Per Article: 71.1] [Received: 05/07/2010] [Accepted: 08/24/2010] [Indexed: 11/09/2022]
Abstract
Pancreatic cancer is an aggressive malignancy with a five-year mortality of 97-98%, usually due to widespread metastatic disease. Previous studies indicate that this disease has a complex genomic landscape, with frequent copy number changes and point mutations, but genomic rearrangements have not been characterized in detail. Despite the clinical importance of metastasis, there remain fundamental questions about the clonal structures of metastatic tumours, including phylogenetic relationships among metastases, the scale of ongoing parallel evolution in metastatic and primary sites, and how the tumour disseminates. Here we harness advances in DNA sequencing to annotate genomic rearrangements in 13 patients with pancreatic cancer and explore clonal relationships among metastases. We find that pancreatic cancer acquires rearrangements indicative of telomere dysfunction and abnormal cell-cycle control, namely dysregulated G1-to-S-phase transition with intact G2-M checkpoint. These initiate amplification of cancer genes and occur predominantly in early cancer development rather than the later stages of the disease. Genomic instability frequently persists after cancer dissemination, resulting in ongoing, parallel and even convergent evolution among different metastases. We find evidence that there is genetic heterogeneity among metastasis-initiating cells, that seeding metastasis may require driver mutations beyond those required for primary tumours, and that phylogenetic trees across metastases show organ-specific branches. These data attest to the richness of genetic variation in cancer, brought about by the tandem forces of genomic instability and evolutionary selection.
|
44
|
Architectures of somatic genomic rearrangement in human cancer amplicons at sequence-level resolution. Genome Res 2007; 17:1296-303. [PMID: 17675364 PMCID: PMC1950898 DOI: 10.1101/gr.6522707] [Citation(s) in RCA: 146] [Impact Index Per Article: 8.6] [Indexed: 11/24/2022]
Abstract
For decades, cytogenetic studies have demonstrated that somatically acquired structural rearrangements of the genome are a common feature of most classes of human cancer. However, the characteristics of these rearrangements at sequence-level resolution have thus far been subject to very limited description. One process that is dependent upon somatic genome rearrangement is gene amplification, a mechanism often exploited by cancer cells to increase copy number and hence expression of dominantly acting cancer genes. The mechanisms underlying gene amplification are complex but must involve chromosome breakage and rejoining. We sequenced 133 different genomic rearrangements identified within four cancer amplicons involving the frequently amplified cancer genes MYC, MYCN, and ERBB2. The observed architectures of rearrangement were diverse and highly distinctive, with evidence for sister chromatid breakage-fusion-bridge cycles, formation and reinsertion of double minutes, and the presence of bizarre clusters of small genomic fragments. There were characteristic features of sequences at the breakage-fusion junctions, indicating roles for nonhomologous end joining and homologous recombination-mediated repair mechanisms together with nontemplated DNA synthesis. Evidence was also found for sequence-dependent variation in susceptibility of the genome to somatic rearrangement. The results therefore provide insights into the DNA breakage and repair processes operative in somatic genome rearrangement and illustrate how the evolutionary histories of individual cancers can be reconstructed from large-scale cancer genome sequencing.
|
45
|
Stochastic modelling of landfill processes incorporating waste heterogeneity and data uncertainty. Waste Management (New York, N.Y.) 2004; 24:241-250. [PMID: 15016413 DOI: 10.1016/j.wasman.2003.12.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Accepted: 12/08/2003] [Indexed: 05/24/2023]
Abstract
A landfill is a very complex heterogeneous environment and as such it presents many modelling challenges. Attempts to develop models that reproduce these complexities generally involve the use of large numbers of spatially dependent parameters that cannot be properly characterised in the face of data uncertainty. An alternative method is presented, which couples a simplified microbial degradation model with a stochastic hydrological and contaminant transport model. This provides a framework for incorporating the complex effects of spatial heterogeneity within the landfill in a simplified manner, along with other key variables. A methodology for handling data uncertainty is also integrated into the model structure. Illustrative examples of the model's output are presented to demonstrate effects of data uncertainty on leachate composition and gas volume prediction.
|
46
|
Stochastic modelling of landfill leachate and biogas production incorporating waste heterogeneity. Model formulation and uncertainty analysis. Waste Management (New York, N.Y.) 2004; 24:453-462. [PMID: 15120429 DOI: 10.1016/j.wasman.2003.09.010] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Accepted: 09/24/2003] [Indexed: 05/24/2023]
Abstract
A mathematical model simulating the hydrological and biochemical processes occurring in landfilled waste is presented and demonstrated. The model combines biochemical and hydrological models into an integrated representation of the landfill environment. Waste decomposition is modelled using traditional biochemical waste decomposition pathways combined with a simplified methodology for representing the rate of decomposition. Water flow through the waste is represented using a statistical velocity model capable of representing the effects of waste heterogeneity on leachate flow through the waste. Given the limitations in data capture from landfill sites, significant emphasis is placed on improving parameter identification and reducing parameter requirements. A sensitivity analysis is performed, highlighting the model's response to changes in input variables. A model test run is also presented, demonstrating the model capabilities. A parameter perturbation model sensitivity analysis was also performed. This has been able to show that although the model is sensitive to certain key parameters, its overall intuitive response provides a good basis for making reasonable predictions of the future state of the landfill system. Finally, due to the high uncertainty associated with landfill data, a tool for handling input data uncertainty is incorporated in the model's structure. It is concluded that the model can be used as a reasonable tool for modelling landfill processes and that further work should be undertaken to assess the model's performance.
|
47
|
Soil transport and plant uptake of radio-iodine from near-surface groundwater. Journal of Environmental Radioactivity 2003; 70:99-114. [PMID: 12915063 DOI: 10.1016/s0265-931x(03)00121-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 05/24/2023]
Abstract
This paper describes a 12-month experiment designed to study the extent of upward migration of ¹²⁵I (as a surrogate for ¹²⁹I) from near-surface groundwater, through a 50-cm column of soil and into perennial ryegrass. The water table was established at a depth of 45 cm below the soil surface. By 3 months, ¹²⁵I had migrated about half way up the soil column. After this, it tended to accumulate just above this mid-point, with only very small amounts being transported to the upper 20 cm of soil. This behaviour seemed to be explained well by soil moisture and redox conditions. The experiment indicated that ¹²⁵I was mobile only within the saturated/low redox zone at the base of the soil column and accumulated in the zone of transition between anoxic and oxic soil conditions. Uptake of ¹²⁵I by the ryegrass was found to be low.
|
48
|
Abstract
The finished sequence of human chromosome 20 comprises 59,187,298 base pairs (bp) and represents 99.4% of the euchromatic DNA. A single contig of 26 megabases (Mb) spans the entire short arm, and five contigs separated by gaps totalling 320 kb span the long arm of this metacentric chromosome. An additional 234,339 bp of sequence has been determined within the pericentromeric region of the long arm. We annotated 727 genes and 168 pseudogenes in the sequence. About 64% of these genes have a 5' and a 3' untranslated region and a complete open reading frame. Comparative analysis of the sequence of chromosome 20 to whole-genome shotgun-sequence data of two other vertebrates, the mouse Mus musculus and the puffer fish Tetraodon nigroviridis, provides an independent measure of the efficiency of gene annotation, and indicates that this analysis may account for more than 95% of all coding exons and almost all genes.
|
49
|
Abstract
Overexpression of ornithine decarboxylase (ODC) is an important oncogenic event in tumorigenesis. Although ODC was one of the first genes described whose product is inducible by 12-O-tetradecanoylphorbol-13-acetate (TPA), the mechanisms of ODC transcriptional regulation have remained elusive. In this study, we systematically analyzed the rat ODC core promoter region for novel TPA response elements. Analysis of linker scanning mutants of the ODC promoter from the TATA box to the transcription start site demonstrated that mutation of the TATA box reduced the TPA induction ratio by 40%, while the basal ODC promoter activity was not significantly changed. A novel region between nt -20 and -10 was shown to be critical for both basal promoter activity and induction by TPA. Random mutagenesis of this region showed that conversion of the GC-rich wild-type sequence into a T-rich sequence could either substantially increase the basal promoter activity and decrease the TPA induction ratio or dramatically reduce the basal promoter activity, depending on the T content. Mutant R5, containing an ATTT sequence at nt -15 to -12, caused a more than twofold increase of basal promoter activity and 80% reduction of TPA induction ratio. We suggest that this region interacts with components of the general transcription machinery and that the strength of this interaction is mediated by the T-content in this region.
|
50
|
Genetic analysis of susceptibility to spontaneous and UV-induced carcinogenesis in Xiphophorus hybrid fish. Marine Biotechnology (New York, N.Y.) 2001; 3:S24-S36. [PMID: 14961297 DOI: 10.1007/s1012601-0004-7] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.4] [Indexed: 05/24/2023]
Abstract
Xiphophorus interspecies hybrids provide genetically controlled models of tumor formation. Spontaneous melanomas form in first-generation backcross (BC₁) hybrids produced from backcrossing F₁ hybrids derived from the platyfish X. maculatus Jp 163 A and the swordtail X. helleri to the X. helleri parental strain (the Gordon-Kosswig hybrid cross). Nodular melanomas originate in the dorsal fin from cells constituting the spotted dorsal (Sd) pigment pattern. A parallel genetic cross, with X. maculatus Jp 163 B, exhibits the spotted side (Sp) pigment pattern instead of Sd, and produces BC₁ hybrids exhibiting a much lower frequency of spontaneous melanoma formation. These hybrids are susceptible to melanoma development if irradiated with UV light as fry. Other hybrids involving these two strains of X. maculatus and different swordtail and platyfish backcross parents also have been investigated as potential tumor models, and show differing susceptibilities to UV-induced and spontaneous melanomas. Genotyping of individual BC₁ hybrids from several Xiphophorus crosses has implicated a locus, CDKN2X (a Xiphophorus homologue of the mammalian CDKN2 gene family, residing on Xiphophorus linkage group V), in enhancing pigmentation and the susceptibility to spontaneous and UV-induced melanoma formation in BC₁ hybrids from some crosses, but not others. Homozygosity for X. helleri and X. couchianus CDKN2X alleles in BC₁ hybrids can predispose individuals to melanoma, but this susceptibility is modified in other crosses depending both on the contributing sex-linked pigment pattern locus from X. maculatus (Sd or Sp), and the genetic constitution of the backcross parent. Xiphophorus BC₁ hybrids constitute unique genetic models offering the potential to analyze the contributions of specific genes to spontaneous and induced tumor formation in different, but comparable genetic backgrounds.
|