1. Ajore R, Niroula A, Pertesi M, Cafaro C, Thodberg M, Went M, Bao EL, Duran-Lozano L, Lopez de Lapuente Portilla A, Olafsdottir T, Ugidos-Damboriena N, Magnusson O, Samur M, Lareau CA, Halldorsson GH, Thorleifsson G, Norddahl GL, Gunnarsdottir K, Försti A, Goldschmidt H, Hemminki K, van Rhee F, Kimber S, Sperling AS, Kaiser M, Anderson K, Jonsdottir I, Munshi N, Rafnar T, Waage A, Weinhold N, Thorsteinsdottir U, Sankaran VG, Stefansson K, Houlston R, Nilsson B. Functional dissection of inherited non-coding variation influencing multiple myeloma risk. Nat Commun 2022; 13:151. [PMID: 35013207 PMCID: PMC8748989 DOI: 10.1038/s41467-021-27666-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 11/05/2020] [Accepted: 12/02/2021] [Indexed: 12/16/2022] Open
Abstract
Thousands of non-coding variants have been associated with increased risk of human diseases, yet the causal variants and their mechanisms-of-action remain obscure. In an integrative study combining massively parallel reporter assays (MPRA), expression analyses (eQTL, meQTL, PCHiC) and chromatin accessibility analyses in primary cells (caQTL), we investigate 1,039 variants associated with multiple myeloma (MM). We demonstrate that MM susceptibility is mediated by gene-regulatory changes in plasma cells and B-cells, and identify putative causal variants at six risk loci (SMARCD3, WAC, ELL2, CDCA7L, CEP120, and PREX1). Notably, three of these variants co-localize with significant plasma cell caQTLs, signaling the presence of causal activity at these precise genomic positions in an endogenous chromosomal context in vivo. Our results provide a systematic functional dissection of risk loci for a hematologic malignancy.
Affiliation(s)
- Ram Ajore
  - Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
- Abhishek Niroula
  - Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
  - Broad Institute of Massachusetts Institute of Technology and Harvard University, 415 Main Street, Boston, MA, 02142, USA
- Maroulio Pertesi
  - Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
- Caterina Cafaro
  - Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
- Malte Thodberg
  - Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
- Molly Went
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, 123 Old Brompton Road, London, SW7 3RP, United Kingdom
- Erik L Bao
  - Broad Institute of Massachusetts Institute of Technology and Harvard University, 415 Main Street, Boston, MA, 02142, USA
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Laura Duran-Lozano
  - Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
- Nerea Ugidos-Damboriena
  - Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden
- Olafur Magnusson
  - deCODE Genetics/Amgen Inc., Sturlugata 8, 101, Reykjavik, Iceland
- Mehmet Samur
  - Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Caleb A Lareau
  - Broad Institute of Massachusetts Institute of Technology and Harvard University, 415 Main Street, Boston, MA, 02142, USA
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Asta Försti
  - German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, D-69120, Heidelberg, Germany
  - Hopp Children's Cancer Center, Heidelberg, Germany
- Hartmut Goldschmidt
  - Department of Internal Medicine V, University Hospital of Heidelberg, 69120, Heidelberg, Germany
- Kari Hemminki
  - German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, D-69120, Heidelberg, Germany
  - Faculty of Medicine and Biomedical Center in Pilsen, Charles University in Prague, Prague, 30605, Czech Republic
- Scott Kimber
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, 123 Old Brompton Road, London, SW7 3RP, United Kingdom
- Adam S Sperling
  - Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Martin Kaiser
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, 123 Old Brompton Road, London, SW7 3RP, United Kingdom
- Kenneth Anderson
  - Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Nikhil Munshi
  - Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Thorunn Rafnar
  - deCODE Genetics/Amgen Inc., Sturlugata 8, 101, Reykjavik, Iceland
- Anders Waage
  - Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Box 8905, N-7491, Trondheim, Norway
- Niels Weinhold
  - German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, D-69120, Heidelberg, Germany
  - Department of Internal Medicine V, University Hospital of Heidelberg, 69120, Heidelberg, Germany
- Vijay G Sankaran
  - Broad Institute of Massachusetts Institute of Technology and Harvard University, 415 Main Street, Boston, MA, 02142, USA
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
  - Harvard Stem Cell Institute, Cambridge, MA, USA
- Kari Stefansson
  - deCODE Genetics/Amgen Inc., Sturlugata 8, 101, Reykjavik, Iceland
- Richard Houlston
  - Division of Genetics and Epidemiology, The Institute of Cancer Research, 123 Old Brompton Road, London, SW7 3RP, United Kingdom
- Björn Nilsson
  - Hematology and Transfusion Medicine, Department of Laboratory Medicine, BMC B13, 221 84, Lund, Sweden.
  - Broad Institute of Massachusetts Institute of Technology and Harvard University, 415 Main Street, Boston, MA, 02142, USA.

2. Girimurugan SB, Liu Y, Lung PY, Vera DL, Dennis JH, Bass HW, Zhang J. iSeg: an efficient algorithm for segmentation of genomic and epigenomic data. BMC Bioinformatics 2018; 19:131. [PMID: 29642840 PMCID: PMC5896135 DOI: 10.1186/s12859-018-2140-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 09/05/2017] [Accepted: 03/26/2018] [Indexed: 11/16/2022] Open
Abstract
Background Identification of functional elements of a genome often requires dividing a sequence of measurements along a genome into segments where adjacent segments have different properties, such as different mean values. Despite dozens of algorithms developed to address this problem in genomics research, methods with improved accuracy and speed are still needed to effectively tackle both existing and emerging genomic and epigenomic segmentation problems. Results We designed an efficient algorithm, called iSeg, for segmentation of genomic and epigenomic profiles. iSeg first utilizes dynamic programming to identify candidate segments and test for significance. It then uses a novel data structure based on two coupled balanced binary trees to detect overlapping significant segments and update them simultaneously during searching and refinement stages. Refinement and merging of significant segments are performed at the end to generate the final set of segments. By using an objective function based on the p-values of the segments, the algorithm can serve as a general computational framework to be combined with different assumptions on the distributions of the data. As a general segmentation method, it can segment different types of genomic and epigenomic data, such as DNA copy number variation, nucleosome occupancy, nuclease sensitivity, and differential nuclease sensitivity data. Using simple t-tests to compute p-values across multiple datasets of different types, we evaluate iSeg using both simulated and experimental datasets and show that it performs satisfactorily when compared with some other popular methods, which often employ more sophisticated statistical models. Implemented in C++, iSeg is also very computationally efficient, well suited for large numbers of input profiles and data with very long sequences. 
Conclusions We have developed an efficient general-purpose segmentation tool and showed that it had comparable or more accurate results than many of the most popular segment-calling algorithms used in contemporary genomic data analysis. iSeg is capable of analyzing datasets that have both positive and negative values. Tunable parameters allow users to readily adjust the statistical stringency to best match the biological nature of individual datasets, including widely or sparsely mapped genomic datasets or those with non-normal distributions.
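The abstract above says that iSeg scores candidate segments with p-values from simple t-tests. The following is a minimal sketch of that one idea only, not of iSeg itself: it does not reproduce the dynamic programming, the coupled balanced binary trees, or the refinement and merging stages. The function names, the fixed window width, and the zero-mean null hypothesis are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def t_statistic(values):
    """One-sample t statistic for the null hypothesis that the segment mean is zero."""
    n = len(values)
    if n < 2:
        raise ValueError("a segment needs at least two points")
    s = stdev(values)
    if s == 0.0:
        # Constant segment: infinitely significant unless the mean is exactly zero.
        return math.inf if mean(values) != 0 else 0.0
    return mean(values) / (s / math.sqrt(n))


def best_window(signal, width):
    """Return (start, t) for the fixed-width window with the largest |t| (hypothetical helper)."""
    best_start, best_t = 0, 0.0
    for start in range(len(signal) - width + 1):
        t = t_statistic(signal[start:start + width])
        if abs(t) > abs(best_t):
            best_start, best_t = start, t
    return best_start, best_t


# Toy profile: baseline near zero with an elevated stretch at positions 4..7.
signal = [0.1, -0.2, 0.0, 0.1, 1.9, 2.1, 2.0, 1.8, -0.1, 0.2, 0.0, -0.1]
start, t = best_window(signal, 4)
print(start, round(t, 1))
```

Converting the statistic to a p-value would need a Student t distribution, which is left out here to keep the sketch free of third-party dependencies.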
Affiliation(s)
- Yuhang Liu
  - Department of Statistics, Florida State University, Tallahassee, FL, USA
- Pei-Yau Lung
  - Department of Statistics, Florida State University, Tallahassee, FL, USA
- Daniel L Vera
  - Center for Genomics and Personalized Medicine, Florida State University, Tallahassee, FL, USA
- Jonathan H Dennis
  - Department of Biological Science, Florida State University, Tallahassee, FL, USA
- Hank W Bass
  - Department of Biological Science, Florida State University, Tallahassee, FL, USA
- Jinfeng Zhang
  - Department of Statistics, Florida State University, Tallahassee, FL, USA.

3. Ajore R, Raiser D, McConkey M, Jöud M, Boidol B, Mar B, Saksena G, Weinstock DM, Armstrong S, Ellis SR, Ebert BL, Nilsson B. Deletion of ribosomal protein genes is a common vulnerability in human cancer, especially in concert with TP53 mutations. EMBO Mol Med 2017; 9:498-507. [PMID: 28264936 PMCID: PMC5376749 DOI: 10.15252/emmm.201606660] [Citation(s) in RCA: 80] [Impact Index Per Article: 10.0] [Indexed: 12/29/2022] Open
Abstract
Heterozygous inactivating mutations in ribosomal protein genes (RPGs) are associated with hematopoietic and developmental abnormalities, activation of p53, and altered risk of cancer in humans and model organisms. Here we performed a large‐scale analysis of cancer genome data to examine the frequency and selective pressure of RPG lesions across human cancers. We found that hemizygous RPG deletions are common, occurring in about 43% of 10,744 cancer specimens and cell lines. Consistent with p53‐dependent negative selection, such lesions are underrepresented in TP53‐intact tumors (P ≪ 10^-10), and shRNA‐mediated knockdown of RPGs activated p53 in TP53‐wild‐type cells. In contrast, we did not see negative selection of RPG deletions in TP53‐mutant tumors. RPGs are conserved with respect to homozygous deletions, and shRNA screening data from 174 cell lines demonstrate that further suppression of hemizygously deleted RPGs inhibits cell growth. Our results establish RPG haploinsufficiency as a strikingly common vulnerability of human cancers that associates with TP53 mutations and could be targetable therapeutically.
Affiliation(s)
- Ram Ajore
  - Hematology and Transfusion Medicine, Department of Laboratory Medicine, Lund University, Lund, Sweden
- David Raiser
  - Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Marie McConkey
  - Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Magnus Jöud
  - Hematology and Transfusion Medicine, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Bernd Boidol
  - Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Brenton Mar
  - Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Steven R Ellis
  - Department of Biochemistry and Molecular Biology, University of Louisville, Louisville, KY, USA
- Benjamin L Ebert
  - Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Broad Institute, 7 Cambridge Center, Cambridge, MA, USA
- Björn Nilsson
  - Hematology and Transfusion Medicine, Department of Laboratory Medicine, Lund University, Lund, Sweden
  - Broad Institute, 7 Cambridge Center, Cambridge, MA, USA

4. Hugerth LW, Larsson J, Alneberg J, Lindh MV, Legrand C, Pinhassi J, Andersson AF. Metagenome-assembled genomes uncover a global brackish microbiome. Genome Biol 2015; 16:279. [PMID: 26667648 PMCID: PMC4699468 DOI: 10.1186/s13059-015-0834-7] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Received: 08/19/2015] [Accepted: 11/12/2015] [Indexed: 02/03/2023] Open
Abstract
BACKGROUND Microbes are the main drivers of biogeochemical cycles in oceans and lakes. Although the genome is a foundation for understanding the metabolism, ecology and evolution of an organism, few bacterioplankton genomes have been sequenced, partly due to difficulties in cultivating them. RESULTS We use automatic binning to reconstruct a large number of bacterioplankton genomes from a metagenomic time-series from the Baltic Sea, one of the world's largest brackish water bodies. These genomes represent novel species within typical freshwater and marine clades, including clades not previously sequenced. The genomes' seasonal dynamics follow phylogenetic patterns, but with fine-grained lineage-specific variations, reflected in gene content. Signs of streamlining are evident in most genomes, and estimated genome sizes correlate with abundance variation across filter size fractions. Comparing the genomes with globally distributed metagenomes reveals significant fragment recruitment at high sequence identity from brackish waters in North America, but little from lakes or oceans. This suggests the existence of a global brackish metacommunity whose populations diverged from freshwater and marine relatives over 100,000 years ago, long before the Baltic Sea was formed (8000 years ago). This contrasts markedly with most Baltic Sea multicellular organisms, which are locally adapted populations of freshwater or marine counterparts. CONCLUSIONS We describe the gene content, temporal dynamics and biogeography of a large set of new bacterioplankton genomes assembled from metagenomes. We propose that brackish environments exert such strong selection that lineages adapted to them flourish globally with limited influence from surrounding aquatic communities.
Affiliation(s)
- Luisa W Hugerth
  - KTH Royal Institute of Technology, Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, Stockholm, Sweden.
- John Larsson
  - Centre for Ecology and Evolution in Microbial model Systems - EEMiS, Linnaeus University, Barlastgatan 11, SE-39182, Kalmar, Sweden.
- Johannes Alneberg
  - KTH Royal Institute of Technology, Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, Stockholm, Sweden.
- Markus V Lindh
  - Centre for Ecology and Evolution in Microbial model Systems - EEMiS, Linnaeus University, Barlastgatan 11, SE-39182, Kalmar, Sweden.
- Catherine Legrand
  - Centre for Ecology and Evolution in Microbial model Systems - EEMiS, Linnaeus University, Barlastgatan 11, SE-39182, Kalmar, Sweden.
- Jarone Pinhassi
  - Centre for Ecology and Evolution in Microbial model Systems - EEMiS, Linnaeus University, Barlastgatan 11, SE-39182, Kalmar, Sweden.
- Anders F Andersson
  - KTH Royal Institute of Technology, Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, Stockholm, Sweden.

5. Anjum S, Morganella S, D'Angelo F, Iavarone A, Ceccarelli M. VEGAWES: variational segmentation on whole exome sequencing for copy number detection. BMC Bioinformatics 2015; 16:315. [PMID: 26416038 PMCID: PMC4587906 DOI: 10.1186/s12859-015-0748-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/17/2015] [Accepted: 09/16/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Copy number variations are important in the detection and progression of tumors and other diseases. Whole exome sequencing (WES) has recently gained popularity for copy number variation detection owing to its low cost and better efficiency. In this work, we developed VEGAWES for accurate and robust detection of copy number variations on WES data. VEGAWES is an extension of a variational segmentation algorithm, VEGA (Variational estimator for genomic aberrations), which has previously outperformed several algorithms on segmenting array comparative genomic hybridization data. RESULTS We tested this algorithm on synthetic data and 100 Glioblastoma Multiforme primary tumor samples. The results on the real data were analyzed using segmentation obtained from single-nucleotide polymorphism data as ground truth. We compared our results with two other segmentation algorithms and assessed the performance based on accuracy and time. CONCLUSIONS In terms of both accuracy and time, VEGAWES provided better results on the synthetic data and tumor samples, demonstrating its potential for robust detection of aberrant regions in the genome.
Affiliation(s)
- Samreen Anjum
  - Computational Sciences and Engineering, Qatar Computing Research Institute, Doha, P. O. Box 5825, Qatar.
- Sandro Morganella
  - European Molecular Biology Laboratory, European Bioinformatics Institute, (EMBL -EBI), Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK.
- Antonio Iavarone
  - Institute for Cancer Genetics, Columbia University, New York, 10027, USA.
- Michele Ceccarelli
  - Computational Sciences and Engineering, Qatar Computing Research Institute, Doha, P. O. Box 5825, Qatar.
  - Department of Science and Technology, University of Sannio, Benevento, 82100, Italy.

6. Zhou X, Yang C, Wan X, Zhao H, Yu W. Multisample aCGH data analysis via total variation and spectral regularization. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:230-235. [PMID: 23702561 PMCID: PMC3715577 DOI: 10.1109/tcbb.2012.166] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 06/02/2023]
Abstract
DNA copy number variation (CNV) accounts for a large proportion of genetic variation. One commonly used approach to detecting CNVs is array-based comparative genomic hybridization (aCGH). Although many methods have been proposed to analyze aCGH data, it is not clear how to combine information from multiple samples to improve CNV detection. In this paper, we propose to use a matrix to approximate the multisample aCGH data and minimize the total variation of each sample as well as the nuclear norm of the whole matrix. In this way, we can make use of the smoothness property of each sample and the correlation among multiple samples simultaneously in a convex optimization framework. We also developed an efficient and scalable algorithm to handle large-scale data. Experiments demonstrate that the proposed method outperforms the state-of-the-art techniques under a wide range of scenarios and it is capable of processing large data sets with millions of probes.
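The optimization described in this abstract can be written as a single convex program. The form below is a sketch under assumed notation, not the paper's own statement: $Y \in \mathbb{R}^{n \times p}$ holds the aCGH measurements for $n$ samples and $p$ probes, $X$ is the approximating matrix, $\lambda_1, \lambda_2 \ge 0$ are assumed tuning weights, and the squared Frobenius data-fit term is an assumption.

```latex
\min_{X \in \mathbb{R}^{n \times p}} \;
\tfrac{1}{2}\,\lVert Y - X \rVert_F^2
\;+\; \lambda_1 \sum_{i=1}^{n} \sum_{j=1}^{p-1} \lvert X_{i,j+1} - X_{i,j} \rvert
\;+\; \lambda_2 \,\lVert X \rVert_*
```

The middle term is the total variation of each sample (row), which favours piecewise-constant profiles, and $\lVert X \rVert_*$ is the nuclear norm (sum of singular values), which favours a low-rank matrix and so captures correlation among samples.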
Affiliation(s)
- Xiaowei Zhou
  - Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China.

7. Seifert M, Gohr A, Strickert M, Grosse I. Parsimonious higher-order hidden Markov models for improved array-CGH analysis with applications to Arabidopsis thaliana. PLoS Comput Biol 2012; 8:e1002286. [PMID: 22253580 PMCID: PMC3257270 DOI: 10.1371/journal.pcbi.1002286] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 05/05/2011] [Accepted: 10/11/2011] [Indexed: 12/19/2022] Open
Abstract
Array-based comparative genomic hybridization (Array-CGH) is an important technology in molecular biology for the detection of DNA copy number polymorphisms between closely related genomes. Hidden Markov Models (HMMs) are popular tools for the analysis of Array-CGH data, but current methods are only based on first-order HMMs having constrained abilities to model spatial dependencies between measurements of closely adjacent chromosomal regions. Here, we develop parsimonious higher-order HMMs enabling the interpolation between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling spatial dependencies. We apply parsimonious higher-order HMMs to the analysis of Array-CGH data of the accessions C24 and Col-0 of the model plant Arabidopsis thaliana. We compare these models against first-order HMMs and other existing methods using a reference of known deletions and sequence deviations. We find that parsimonious higher-order HMMs clearly improve the identification of these polymorphisms. Moreover, we perform a functional analysis of identified polymorphisms revealing novel details of genomic differences between C24 and Col-0. Additional model evaluations are done on widely considered Array-CGH data of human cell lines indicating that parsimonious HMMs are also well-suited for the analysis of non-plant specific data. All these results indicate that parsimonious higher-order HMMs are useful for Array-CGH analyses. An implementation of parsimonious higher-order HMMs is available as part of the open source Java library Jstacs (www.jstacs.de/index.php/PHHMM).
Array-based comparative genomics is a standard approach for the identification of DNA copy number polymorphisms between closely related genomes. The huge amounts of data produced by these experiments require efficient and accurate bioinformatics tools for the identification of copy number polymorphisms. Hidden Markov Models (HMMs) are frequently used for analyzing such data sets, but current models are based on first-order HMMs only having limited capabilities to model spatial dependencies between measurements of closely adjacent chromosomal regions. We develop parsimonious higher-order HMMs enabling the interpolation between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling these dependencies to overcome this limitation. In an in-depth case study with Arabidopsis thaliana, we find that parsimonious higher-order HMMs clearly improve the identification of copy number polymorphisms in comparison to standard first-order HMMs and other frequently used methods. Functional analysis of identified polymorphisms revealed details of genomic differences between the accessions C24 and Col-0 of Arabidopsis thaliana. An additional study on human cell lines further indicates that parsimonious HMMs are well-suited for the analysis of Array-CGH data.
Affiliation(s)
- Michael Seifert
  - Department of Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany.

8. Predictive genes in adjacent normal tissue are preferentially altered by sCNV during tumorigenesis in liver cancer and may be rate limiting. PLoS One 2011; 6:e20090. [PMID: 21750698 PMCID: PMC3130029 DOI: 10.1371/journal.pone.0020090] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.2] [Received: 08/31/2010] [Accepted: 04/25/2011] [Indexed: 11/19/2022] Open
Abstract
Background In hepatocellular carcinoma (HCC), genes predictive of survival have been found in both adjacent normal (AN) and tumor (TU) tissues. The relationships between these two sets of predictive genes and the general process of tumorigenesis and disease progression remain unclear. Methodology/Principal Findings Here we have investigated HCC tumorigenesis by comparing gene expression, DNA copy number variation and survival using ∼250 AN and TU samples representing, respectively, the pre-cancer state and the result of tumorigenesis. Genes that participate in tumorigenesis were defined using a gene-gene correlation meta-analysis procedure that compared AN versus TU tissues. Genes predictive of survival in AN (AN-survival genes) were found to be enriched in the differential gene-gene correlation gene set, indicating that they directly participate in the process of tumorigenesis. Additionally, the AN-survival genes were mostly not predictive after tumorigenesis in TU tissue, and this transition was associated with and could largely be explained by the effect of somatic DNA copy number variation (sCNV) in cis and in trans. The data were consistent with the variance of AN-survival genes being rate-limiting steps in tumorigenesis, and this was confirmed using a treatment that promotes HCC tumorigenesis that selectively altered AN-survival genes and genes differentially correlated between AN and TU. Conclusions/Significance This suggests that the process of tumor evolution involves rate-limiting steps related to the background from which the tumor evolved, where these were frequently predictive of clinical outcome. Additionally, treatments that alter the likelihood of tumorigenesis occurring may act by altering AN-survival genes, suggesting that the process can be manipulated. Further, sCNV explains a substantial fraction of tumor-specific expression and may therefore be a causal driver of tumor evolution in HCC and perhaps many solid tumor types.

9. Mermel CH, Schumacher SE, Hill B, Meyerson ML, Beroukhim R, Getz G. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol 2011; 12:R41. [PMID: 21527027 PMCID: PMC3218867 DOI: 10.1186/gb-2011-12-4-r41] [Citation(s) in RCA: 2474] [Impact Index Per Article: 176.7] [Received: 08/18/2010] [Revised: 02/14/2011] [Accepted: 04/28/2011] [Indexed: 12/18/2022] Open
Abstract
We describe methods with enhanced power and specificity to identify genes targeted by somatic copy-number alterations (SCNAs) that drive cancer growth. By separating SCNA profiles into underlying arm-level and focal alterations, we improve the estimation of background rates for each category. We additionally describe a probabilistic method for defining the boundaries of selected-for SCNA regions with user-defined confidence. Here we detail this revised computational approach, GISTIC2.0, and validate its performance in real and simulated datasets.
Affiliation(s)
- Craig H Mermel
  - Cancer Program, The Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA

10. Wang S, Wang Y, Xie Y, Xiao G. A novel approach to DNA copy number data segmentation. J Bioinform Comput Biol 2011; 9:131-48. [PMID: 21328710 PMCID: PMC3084615 DOI: 10.1142/s0219720011005343] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/10/2010] [Revised: 11/02/2010] [Accepted: 11/04/2010] [Indexed: 11/18/2022]
Abstract
DNA copy number (DCN) is the number of copies of DNA at a region of a genome. Alterations of DCN are strongly associated with the development of different tumors. Microarray technologies have recently been employed to detect DCN changes at many loci simultaneously in tumor samples. The resulting DCN data are often very noisy, and the tumor sample is often contaminated by normal cells. The goal of computational analysis of array-based DCN data is to infer the underlying DCNs from raw DCN data. Previous methods for this task do not model the tumor/normal cell mixture ratio explicitly and they cannot output segments with DCN annotations. We developed a novel model-based method using the minimum description length (MDL) principle for DCN data segmentation. Our new method can output the underlying DCN for each chromosomal segment and, at the same time, infer the underlying tumor proportion in the test samples. Empirical results show that our method achieves better accuracies on average as compared to three previous methods, namely Circular Binary Segmentation, Hidden Markov Model and Ultrasome.
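To make the tumor/normal mixture concrete, the sketch below computes the log2 ratio one would expect to observe for a segment with a given tumor copy number when only a fraction of the cells in the sample are tumor cells. This is a standard linear-mixture relationship used here as an illustrative assumption; it is not the paper's MDL method, and the function name and the diploid normal default are hypothetical.

```python
import math


def expected_log2_ratio(tumor_copies, tumor_fraction, normal_copies=2):
    """Expected log2(sample / reference) signal under a linear tumor/normal mixture.

    Assumes contaminating normal cells carry `normal_copies` copies and that the
    reference is also `normal_copies`; both are illustrative assumptions.
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must lie in [0, 1]")
    mixed = tumor_fraction * tumor_copies + (1.0 - tumor_fraction) * normal_copies
    if mixed <= 0:
        raise ValueError("mixture has zero copies; log ratio is undefined")
    return math.log2(mixed / normal_copies)


# A single-copy loss looks much weaker at 50% purity than in a pure tumor.
for purity in (1.0, 0.5):
    print(purity, round(expected_log2_ratio(1, purity), 3))
```

The attenuation toward zero at low purity is the reason a method that ignores the mixture ratio can miscall the underlying copy number.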
Affiliation(s)
- Siling Wang
  - Department of Computer Science and Engineering, Southern Methodist University, Dallas, Texas 75205, USA.

11. Lamy P, Wiuf C, Ørntoft TF, Andersen CL. Rseg--an R package to optimize segmentation of SNP array data. Bioinformatics 2010; 27:419-20. [DOI: 10.1093/bioinformatics/btq668] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022] Open

12. Morganella S, Cerulo L, Viglietto G, Ceccarelli M. VEGA: variational segmentation for copy number detection. Bioinformatics 2010; 26:3020-7. [PMID: 20959380 DOI: 10.1093/bioinformatics/btq586] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Abstract
MOTIVATION Genomic copy number (CN) information is useful to study genetic traits of many diseases. Using array comparative genomic hybridization (aCGH), researchers are able to measure the copy number of thousands of DNA loci at the same time. Therefore, a current challenge in bioinformatics is the development of efficient algorithms to detect the map of aberrant chromosomal regions. METHODS We describe an approach for the segmentation of copy number aCGH data. Variational estimator for genomic aberrations (VEGA) adopts a variational model used in image segmentation. The optimal segmentation is modeled as the minimum of an energy functional encompassing both the quality of interpolation of the data and the complexity of the solution measured by the length of the boundaries between segmented regions. This solution is obtained by a region growing process where the stop condition is completely data driven. RESULTS VEGA is compared with three algorithms that represent the state of the art in CN segmentation. Performance assessment is made both on synthetic and real data. Synthetic data simulate different noise conditions. Results on these data show the robustness with respect to noise of variational models and the accuracy of VEGA in terms of recall and precision. Eight mantle cell lymphoma cell lines and two samples of glioblastoma multiforme are used to evaluate the behavior of VEGA on real biological data. Comparison between results and current biological knowledge shows the ability of the proposed method in detecting known chromosomal aberrations. AVAILABILITY VEGA has been implemented in R and is available at the address http://www.dsba.unisannio.it/Members/ceccarelli/vega in the section Download.
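The energy-minimisation idea in the METHODS paragraph can be illustrated with a small greedy region-merging sketch. It assumes a piecewise-constant energy of the form (sum of squared deviations from each segment mean) + `lam` × (number of boundaries), and it merges adjacent regions while doing so lowers that energy. This is not the VEGA implementation: VEGA is distributed as R code and its stop condition is data driven, whereas `lam` here is a fixed, hypothetical user parameter, and all function names are made up for the example.

```python
def merge_cost(n_a, mean_a, n_b, mean_b):
    """Increase in squared error when two adjacent regions are merged into one."""
    return n_a * n_b / (n_a + n_b) * (mean_a - mean_b) ** 2


def segment(signal, lam):
    """Greedy region merging; returns a list of (start, end, mean), end exclusive."""
    regions = [(i, i + 1, float(x)) for i, x in enumerate(signal)]
    while len(regions) > 1:
        # Cost of removing each boundary between neighbouring regions.
        costs = [
            merge_cost(a[1] - a[0], a[2], b[1] - b[0], b[2])
            for a, b in zip(regions, regions[1:])
        ]
        k = min(range(len(costs)), key=costs.__getitem__)
        # Removing a boundary saves lam; stop once no merge lowers the energy.
        if costs[k] >= lam:
            break
        (s0, e0, m0), (s1, e1, m1) = regions[k], regions[k + 1]
        n0, n1 = e0 - s0, e1 - s1
        regions[k:k + 2] = [(s0, e1, (n0 * m0 + n1 * m1) / (n0 + n1))]
    return regions


# Toy log-ratio profile with one change point between positions 3 and 4.
signal = [0.0, 0.1, -0.1, 0.0, 1.0, 1.1, 0.9, 1.0]
result = segment(signal, lam=0.3)
print([(s, e, round(m, 2)) for s, e, m in result])
```

A larger `lam` penalises boundaries more heavily and yields fewer, longer segments; a smaller one keeps more of the fine structure, including noise.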
Affiliation(s)
- Sandro Morganella
- Department of Biological and Environmental Studies, University of Sannio, Benevento, Italy
|
13
|
Ivakhno S, Tavaré S. CNAnova: a new approach for finding recurrent copy number abnormalities in cancer SNP microarray data. Bioinformatics 2010; 26:1395-402. [PMID: 20403815 DOI: 10.1093/bioinformatics/btq145] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
MOTIVATION: The current generation of single nucleotide polymorphism (SNP) arrays allows measurement of copy number aberrations (CNAs) in cancer at more than one million locations in the genome in hundreds of tumour samples. Most research has focused on single-sample CNA discovery, the so-called segmentation problem. The availability of high-density, large sample-size SNP array datasets makes the identification of recurrent copy number changes in cancer an important issue that can be addressed using cross-sample information. RESULTS: We present a novel approach for finding regions of recurrent copy number aberrations, called CNAnova, from Affymetrix SNP 6.0 array data. The method derives its statistical properties from a control dataset composed of normal samples and, in contrast to previous methods, does not require segmentation and permutation steps. For rigorous testing of the algorithm and comparison to existing methods, we developed a simulation scheme that uses the noise distribution present in Affymetrix arrays. Application of the method to 128 acute lymphoblastic leukaemia samples shows that CNAnova achieves a lower error rate than a popular alternative approach. We also describe an extension of the CNAnova framework to identify recurrent CNA regions with intra-tumour heterogeneity, present in either primary or relapsed samples from the same patients. AVAILABILITY: The CNAnova package and synthetic datasets are available at http://www.compbio.group.cam.ac.uk/software.html.
Affiliation(s)
- Sergii Ivakhno
- Cancer Research UK Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.
|
14
|
Abstract
Comprehensive analysis of the cancer genome has become a standard approach to identifying new disease loci, and ultimately will guide therapeutic decisions. A key technology in this effort, single nucleotide polymorphism arrays, has been applied in hematologic malignancies to detect deletions, amplifications, and loss of heterozygosity (LOH) at high resolution. An inherent challenge of such studies lies in correctly distinguishing somatically acquired, cancer-specific lesions from patient-specific inherited copy number variations or segments of homozygosity. Failure to include appropriate normal DNA reference samples for each patient in retrospective or prospective studies makes it difficult to identify small somatic deletions not evident by standard cytogenetic analysis. In addition, the lack of proper controls can also lead to vastly overestimated frequencies of LOH without accompanying loss of DNA copies, so-called copy-neutral LOH. Here we use examples from patients with myeloid malignancies to demonstrate the superiority of matched tumor and normal DNA samples (paired studies) over multiple unpaired samples with respect to reducing false discovery rates in high-resolution single nucleotide polymorphism array analysis. Comparisons between matched tumor and normal samples will continue to be critical as the field moves from high resolution array analysis to deep sequencing to detect abnormalities in the cancer genome.
|
15
|
Zhang Q, Ding L, Larson DE, Koboldt DC, McLellan MD, Chen K, Shi X, Kraja A, Mardis ER, Wilson RK, Borecki IB, Province MA. CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution data. Bioinformatics 2009; 26:464-9. [PMID: 20031968 DOI: 10.1093/bioinformatics/btp708] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Indexed: 01/08/2023]
Abstract
MOTIVATION: DNA copy number aberration (CNA) is a hallmark of genomic abnormality in tumor cells. Recurrent CNA (RCNA) occurs in multiple cancer samples across the same chromosomal region and has greater implications for tumorigenesis. Commonly used methods for RCNA identification require CNA calling for individual samples before cross-sample analysis. This two-step strategy may result in a heavy computational burden, as well as a loss of overall statistical power due to segmentation and discretization of each individual sample's data. We propose a population-based approach for RCNA detection that needs no single-sample analysis and is statistically powerful, computationally efficient and particularly suitable for high-resolution and large-population studies. RESULTS: Our approach, correlation matrix diagonal segmentation (CMDS), identifies RCNAs based on a between-chromosomal-site correlation analysis. By directly using the raw intensity ratio data from all samples and adopting a diagonal transformation strategy, CMDS substantially reduces the computational burden and can obtain results very quickly from large datasets. Our simulation indicates that the statistical power of CMDS is higher than that of two-step approaches based on single-sample CNA calling. We applied CMDS to two real datasets of lung cancer and brain cancer from Affymetrix and Illumina array platforms, respectively, and successfully identified known regions of CNA associated with EGFR, KRAS and other important oncogenes. CMDS provides a fast, powerful and easily implemented tool for the RCNA analysis of large-scale data from cancer genomes.
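The idea of scoring sites by between-site correlation across samples, without calling CNAs per sample, can be illustrated with a small sketch. This is not the CMDS algorithm: the abstract does not specify its diagonal transformation, so the code below simply assumes that each site is scored by its mean Pearson correlation with neighbouring sites inside a narrow band around the diagonal of the site-by-site correlation matrix. The function names and the `half_width` parameter are hypothetical.

```python
from math import sqrt


def standardize(column):
    """Center a column and scale it to unit (population) standard deviation."""
    n = len(column)
    mean = sum(column) / n
    centered = [v - mean for v in column]
    sd = sqrt(sum(c * c for c in centered) / n)
    # A constant site carries no cross-sample signal; treat it as all zeros.
    return [c / sd for c in centered] if sd > 0 else [0.0] * n


def band_correlation_scores(ratios, half_width=2):
    """Score each site by its mean correlation with nearby sites.

    ratios: list of samples, each a list of per-site intensity ratios.
    Only site pairs within `half_width` positions of each other are used,
    i.e. a band around the diagonal of the correlation matrix (an assumed
    simplification, not the published transformation).
    """
    n_samples, n_sites = len(ratios), len(ratios[0])
    cols = [standardize([row[j] for row in ratios]) for j in range(n_sites)]
    scores = []
    for j in range(n_sites):
        neighbours = [
            k for k in range(j - half_width, j + half_width + 1)
            if k != j and 0 <= k < n_sites
        ]
        # Mean product of standardized columns equals the Pearson correlation.
        corrs = [
            sum(a * b for a, b in zip(cols[j], cols[k])) / n_samples
            for k in neighbours
        ]
        scores.append(sum(corrs) / len(corrs) if corrs else 0.0)
    return scores
```

Sites inside a region that is gained or lost in the same subset of samples vary together across the population, so under these assumptions their scores tend to be high, while sites dominated by independent noise score near zero.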
Affiliation(s)
- Qunyuan Zhang
- Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA.
|