1
Brahme A, Hultén M, Bengtsson C, Hultgren A, Zetterberg A. Radiation-Induced Chromosomal Breaks may be DNA Repair Fragile Sites with Larger-scale Correlations to Eight Double-Strand-Break Related Data Sets over the Human Genome. Radiat Res 2019; 192:562-576. [PMID: 31545677 DOI: 10.1667/rr15424.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Abstract
In this work, we compared the genomic distribution of common radiation-induced chromosomal breaks to eight different data sets covering the whole human genome. Sites with a high probability of chromatid breakage after exposure to low and high ionization density radiations were often located inside common and rare fragile sites, indicating that they may be a new and more local type of DNA repair-related fragility. Breaks in specific chromosome bands after acute exposure to oil and benzene also showed strong correlation with these sites and fragile sites. In addition, close correlation was found with cytologically detected chiasma and MLH1 immunofluorescence sites and with the HapMap recombination density distributions. Also, of interest, copy number changes occurred predominantly at radiation-induced breaks and fragile sites, at least for breast cancers with poor prognosis, and they decreased weakly but significantly in regions with increasing recombination and CpG density. An increased CpG density is linked to regions of high gene density to secure high-fidelity reproduction and survival. To minimize cancer induction, cancer-related genes are often located in regions of decreased recombination density and/or higher-than-average CpG density. It is compelling that all these data sets were influenced by the cells' handling of double-strand breaks and, more generally, DNA damage on its genome. In fact, the DNA repair genes systematically avoid regions with a high recombination density, as they need to be intact to accurately handle repairable DNA lesions.
Affiliation(s)
- Anders Brahme
  - Department of Oncology-Pathology, Karolinska Institutet, Box 260, SE-171 76 Stockholm, Sweden
- Maj Hultén
  - Department of Molecular Medicine and Surgery, Karolinska Institutet, Karolinska University Hospital, S-171 76 Stockholm, Sweden
- Carin Bengtsson
  - Department of Oncology-Pathology, Karolinska Institutet, Box 260, SE-171 76 Stockholm, Sweden
- Andreas Hultgren
  - Department of Oncology-Pathology, Karolinska Institutet, Box 260, SE-171 76 Stockholm, Sweden
- Anders Zetterberg
  - Department of Oncology-Pathology, Karolinska Institutet, Box 260, SE-171 76 Stockholm, Sweden
2
Shankar G, Rossi MR, Mcquaid DE, Conroy JM, Gaile DG, Cowell JK, Nowak NJ, Liang P. aCGHViewer: A Generic Visualization Tool for aCGH Data. Cancer Inform 2017. [DOI: 10.1177/117693510600200023] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 12/27/2022] Open
Abstract
Array-Comparative Genomic Hybridization (aCGH) is a powerful high throughput technology for detecting chromosomal copy number aberrations (CNAs) in cancer, aiming at identifying related critical genes from the affected genomic regions. However, advancing from a dataset with thousands of tabular lines to a few candidate genes can be an onerous and time-consuming process. To expedite the aCGH data analysis process, we have developed a user-friendly aCGH data viewer (aCGHViewer) as a conduit between the aCGH data tables and a genome browser. The data from a given aCGH analysis are displayed in a genomic view comprised of individual chromosome panels which can be rapidly scanned for interesting features. A chromosome panel containing a feature of interest can be selected to launch a detail window for that single chromosome. Selecting a data point of interest in the detail window launches a query to the UCSC or NCBI genome browser to allow the user to explore the gene content in the chromosomal region. Additionally, aCGHViewer can display aCGH and expression array data concurrently to visually correlate the two. aCGHViewer is a stand alone Java visualization application that should be used in conjunction with separate statistical programs. It operates on all major computer platforms and is freely available at http://falcon.roswellpark.org/aCGHview/ .
Affiliation(s)
- Ganesh Shankar
  - Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263 USA
- Michael R. Rossi
  - Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263 USA
- Devin E. Mcquaid
  - Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263 USA
- Jeffrey M. Conroy
  - Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263 USA
- Daniel G. Gaile
  - Department of Biostatistics, The State University of New York at Buffalo, Buffalo, NY 14214 USA
- John K. Cowell
  - Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263 USA
- Norma J. Nowak
  - Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263 USA
- Ping Liang
  - Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263 USA
3
Lo KC, Shankar G, Turpaz Y, Bailey D, Rossi MR, Burkhardt T, Liang P, Cowell JK. Overlay Tool© for aCGHViewer©: An Analysis Module Built for aCGHViewer© used to Perform Comparisons of Data Derived from Different Microarray Platforms. Cancer Inform 2017. [DOI: 10.1177/117693510700300003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022] Open
Abstract
The Overlay Tool© has been developed to combine high throughput data derived from various microarray platforms. This tool analyzes high-resolution correlations between gene expression changes and either copy number abnormalities (CNAs) or loss of heterozygosity events detected using array comparative genomic hybridization (aCGH). Using an overlay analysis which is designed to be performed using data from multiple microarray platforms on a single biological sample, the Overlay Tool© identifies potentially important genes whose expression profiles are changed as a result of losses, gains and amplifications in the cancer genome. In addition, the Overlay Tool© will incorporate loss of heterozygosity (LOH) probability data into this overlay procedure. To facilitate this analysis, we developed an application which computationally combines two or more high throughput datasets (e.g. aCGH/expression) into a single categorized dataset for visualization and interrogation using a gene-centric approach. As such, data from virtually any microarray platform can be incorporated without the need to remap entire datasets individually. The resultant categorized (overlay) data set can be conveniently viewed using our in-house visualization tool, aCGHViewer© (Shankar et al. 2006), which serves as a conduit to public databases such as UCSC and NCBI, to rapidly investigate genes of interest.
Affiliation(s)
- Ken C. Lo
  - Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263
- Ganesh Shankar
  - Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263
- Michael R. Rossi
  - Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263
  - Yale University School of Medicine, Department of Cancer Genetics, New Haven, CT 06520
- Tania Burkhardt
  - Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263
- Ping Liang
  - Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263
- John K. Cowell
  - Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263
4
Morishita M, Muramatsu T, Suto Y, Hirai M, Konishi T, Hayashi S, Shigemizu D, Tsunoda T, Moriyama K, Inazawa J. Chromothripsis-like chromosomal rearrangements induced by ionizing radiation using proton microbeam irradiation system. Oncotarget 2016; 7:10182-92. [PMID: 26862731 PMCID: PMC4891112 DOI: 10.18632/oncotarget.7186] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 10/21/2015] [Accepted: 01/24/2016] [Indexed: 12/11/2022] Open
Abstract
Chromothripsis is the massive but highly localized chromosomal rearrangement in response to a one-step catastrophic event, rather than an accumulation of a series of subsequent and random alterations. Chromothripsis occurs commonly in various human cancers and is thought to be associated with increased malignancy and carcinogenesis. However, the causes and consequences of chromothripsis remain unclear. Therefore, to identify the mechanism underlying the generation of chromothripsis, we investigated whether chromothripsis could be artificially induced by ionizing radiation. We first elicited DNA double-strand breaks in an oral squamous cell carcinoma cell line HOC313-P and its highly metastatic subline HOC313-LM, using Single Particle Irradiation system to Cell (SPICE), a focused vertical microbeam system designed to irradiate a spot within the nuclei of adhesive cells, and then established irradiated monoclonal sublines from them, respectively. SNP array analysis detected a number of chromosomal copy number alterations (CNAs) in these sublines, and one HOC313-LM-derived monoclonal subline irradiated with 200 protons by the microbeam displayed multiple CNAs involved locally in chromosome 7. Multi-color FISH showed a complex translocation of chromosome 7 involving chromosomes 11 and 12. Furthermore, whole genome sequencing analysis revealed multiple de novo complex chromosomal rearrangements localized in chromosomes 2, 5, 7, and 20, resembling chromothripsis. These findings suggested that localized ionizing irradiation within the nucleus may induce chromothripsis-like complex chromosomal alterations via local DNA damage in the nucleus.
Affiliation(s)
- Maki Morishita
  - Department of Molecular Cytogenetics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan; Department of Maxillofacial Orthognathics, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan; Research Fellow of the Japan Society for the Promotion of Science, Chiyoda-ku, Tokyo, Japan
- Tomoki Muramatsu
  - Department of Molecular Cytogenetics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Yumiko Suto
  - Biodosimetry Research Team, Research Center for Radiation Emergency Medicine, National Institute of Radiological Sciences, Inage-ku, Chiba-shi, Chiba, Japan
- Momoki Hirai
  - Biodosimetry Research Team, Research Center for Radiation Emergency Medicine, National Institute of Radiological Sciences, Inage-ku, Chiba-shi, Chiba, Japan
- Teruaki Konishi
  - Research Development and Support Center, National Institute of Radiological Sciences, Inage-ku, Chiba-shi, Chiba, Japan
- Shin Hayashi
  - Department of Molecular Cytogenetics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Daichi Shigemizu
  - Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan; Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Tsurumi, Yokohama, Japan
- Tatsuhiko Tsunoda
  - Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan; Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Tsurumi, Yokohama, Japan
- Keiji Moriyama
  - Department of Maxillofacial Orthognathics, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan; Bioresource Research Center, Tokyo Medical and Dental University, Bunkyo-ku, Tokyo, Japan
- Johji Inazawa
  - Department of Molecular Cytogenetics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan; Bioresource Research Center, Tokyo Medical and Dental University, Bunkyo-ku, Tokyo, Japan
5
Loohuis LO, Witzel A, Mishra B. Improving Detection of Driver Genes: Power-Law Null Model of Copy Number Variation in Cancer. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:1260-1263. [PMID: 26357061 DOI: 10.1109/tcbb.2014.2351805] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
In this paper, we study Copy Number Variation (CNV) data. The underlying process generating CNV segments is generally assumed to be memory-less, giving rise to an exponential distribution of segment lengths. In this paper, we provide evidence from cancer patient data, which suggests that this generative model is too simplistic, and that segment lengths follow a power-law distribution instead. We conjecture a simple preferential attachment generative model that provides the basis for the observed power-law distribution. We then show how an existing statistical method for detecting cancer driver genes can be improved by incorporating the power-law distribution in the null model.
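The contrast between the two null models in this abstract can be made concrete with a small maximum-likelihood comparison. The sketch below is illustrative only and is not the authors' code: it assumes a continuous power law (Pareto form) with a known lower cutoff `x_min`, compares it with an exponential shifted to the same cutoff, and runs on synthetic segment lengths. The function names are hypothetical.

```python
import math
import random


def fit_exponential(lengths, x_min):
    """MLE for an exponential shifted to x_min; returns (rate, log-likelihood)."""
    n = len(lengths)
    excess = sum(x - x_min for x in lengths)
    rate = n / excess
    loglik = n * math.log(rate) - rate * excess
    return rate, loglik


def fit_power_law(lengths, x_min):
    """MLE for p(x) = (alpha - 1) / x_min * (x / x_min) ** -alpha, x >= x_min."""
    n = len(lengths)
    log_sum = sum(math.log(x / x_min) for x in lengths)
    alpha = 1.0 + n / log_sum
    loglik = n * math.log((alpha - 1.0) / x_min) - alpha * log_sum
    return alpha, loglik


# Synthetic segment lengths (assumption: heavy-tailed, arbitrary units).
# random.paretovariate(a) has density a / x ** (a + 1), so the exponent is a + 1.
rng = random.Random(0)
lengths = [rng.paretovariate(1.5) for _ in range(2000)]

rate, ll_exp = fit_exponential(lengths, x_min=1.0)
alpha, ll_pow = fit_power_law(lengths, x_min=1.0)
print(f"exponential: rate={rate:.3f}, logL={ll_exp:.1f}")
print(f"power law:   alpha={alpha:.3f}, logL={ll_pow:.1f}")
```

Both models have a single free parameter here, so their log-likelihoods can be compared directly. On real segment lengths the choice of `x_min` matters and would itself need to be estimated or justified.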
6
Vandeweyer G, Kooy RF. Detection and interpretation of genomic structural variation in health and disease. Expert Rev Mol Diagn 2014; 13:61-82. [DOI: 10.1586/erm.12.119] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 01/04/2023]
7
Pronold M, Vali M, Pique-Regi R, Asgharzadeh S. Copy number variation signature to predict human ancestry. BMC Bioinformatics 2012; 13:336. [PMID: 23270563 PMCID: PMC3598683 DOI: 10.1186/1471-2105-13-336] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/19/2012] [Accepted: 12/06/2012] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND Copy number variations (CNVs) are genomic structural variants that are found in healthy populations and have been observed to be associated with disease susceptibility. Existing methods for CNV detection are often performed on a sample-by-sample basis, which is not ideal for large datasets where common CNVs must be estimated by comparing the frequency of CNVs in the individual samples. Here we describe a simple and novel approach to locate genome-wide CNVs common to a specific population, using human ancestry as the phenotype. RESULTS We utilized our previously published Genome Alteration Detection Analysis (GADA) algorithm to identify common ancestry CNVs (caCNVs) and built a caCNV model to predict population structure. We identified a 73 caCNV signature using a training set of 225 healthy individuals from European, Asian, and African ancestry. The signature was validated on an independent test set of 300 individuals with similar ancestral background. The error rate in predicting ancestry in this test set was 2% using the 73 caCNV signature. Among the caCNVs identified, several were previously confirmed experimentally to vary by ancestry. Our signature also contains a caCNV region with a single microRNA (MIR270), which represents the first reported variation of microRNA by ancestry. CONCLUSIONS We developed a new methodology to identify common CNVs and demonstrated its performance by building a caCNV signature to predict human ancestry with high accuracy. The utility of our approach could be extended to large case-control studies to identify CNV signatures for other phenotypes such as disease susceptibility and drug response.
Affiliation(s)
- Melissa Pronold
  - Department of Pediatrics, Children's Hospital Los Angeles and The Saban Research Institute, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
8
Seifert M, Gohr A, Strickert M, Grosse I. Parsimonious higher-order hidden Markov models for improved array-CGH analysis with applications to Arabidopsis thaliana. PLoS Comput Biol 2012; 8:e1002286. [PMID: 22253580 PMCID: PMC3257270 DOI: 10.1371/journal.pcbi.1002286] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 05/05/2011] [Accepted: 10/11/2011] [Indexed: 12/19/2022] Open
Abstract
Array-based comparative genomic hybridization (Array-CGH) is an important technology in molecular biology for the detection of DNA copy number polymorphisms between closely related genomes. Hidden Markov Models (HMMs) are popular tools for the analysis of Array-CGH data, but current methods are only based on first-order HMMs having constrained abilities to model spatial dependencies between measurements of closely adjacent chromosomal regions. Here, we develop parsimonious higher-order HMMs enabling the interpolation between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling spatial dependencies. We apply parsimonious higher-order HMMs to the analysis of Array-CGH data of the accessions C24 and Col-0 of the model plant Arabidopsis thaliana. We compare these models against first-order HMMs and other existing methods using a reference of known deletions and sequence deviations. We find that parsimonious higher-order HMMs clearly improve the identification of these polymorphisms. Moreover, we perform a functional analysis of identified polymorphisms revealing novel details of genomic differences between C24 and Col-0. Additional model evaluations are done on widely considered Array-CGH data of human cell lines indicating that parsimonious HMMs are also well-suited for the analysis of non-plant specific data. All these results indicate that parsimonious higher-order HMMs are useful for Array-CGH analyses. An implementation of parsimonious higher-order HMMs is available as part of the open source Java library Jstacs (www.jstacs.de/index.php/PHHMM).
Array-based comparative genomics is a standard approach for the identification of DNA copy number polymorphisms between closely related genomes. The huge amounts of data produced by these experiments require efficient and accurate bioinformatics tools for the identification of copy number polymorphisms. Hidden Markov Models (HMMs) are frequently used for analyzing such data sets, but current models are based on first-order HMMs only having limited capabilities to model spatial dependencies between measurements of closely adjacent chromosomal regions. We develop parsimonious higher-order HMMs enabling the interpolation between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling these dependencies to overcome this limitation. In an in-depth case study with Arabidopsis thaliana, we find that parsimonious higher-order HMMs clearly improve the identification of copy number polymorphisms in comparison to standard first-order HMMs and other frequently used methods. Functional analysis of identified polymorphisms revealed details of genomic differences between the accessions C24 and Col-0 of Arabidopsis thaliana. An additional study on human cell lines further indicates that parsimonious HMMs are well-suited for the analysis of Array-CGH data.
Affiliation(s)
- Michael Seifert
  - Department of Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany.
9
Stjernqvist S, Rydén T, Greenman CD. Model-integrated estimation of normal tissue contamination for cancer SNP allelic copy number data. Cancer Inform 2011; 10:159-73. [PMID: 21695067 PMCID: PMC3118450 DOI: 10.4137/cin.s6873] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/12/2023] Open
Abstract
SNP allelic copy number data provides intensity measurements for the two different alleles separately. We present a method that estimates the number of copies of each allele at each SNP position, using a continuous-index hidden Markov model. The method is especially suited for cancer data, since it includes the fraction of normal tissue contamination, often present when studying data from cancer tumors, into the model. The continuous-index structure takes into account the distances between the SNPs, and is thereby appropriate also when SNPs are unequally spaced. In a simulation study we show that the method performs favorably compared to previous methods even with as much as 70% normal contamination. We also provide results from applications to clinical data produced using the Affymetrix genome-wide SNP 6.0 platform.
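The idea of a continuous-index chain, where the transition probabilities between neighbouring SNPs depend on the distance separating them, can be illustrated with the simplest possible case. The sketch below is not the model from the paper, which tracks allelic copy numbers and normal-tissue contamination; it assumes a hypothetical two-state chain (for example "normal" and "altered") with made-up rates per base pair, and uses the closed-form transition matrix of a two-state continuous-time Markov chain.

```python
import math


def transition_matrix(distance, rate_01, rate_10):
    """Transition probabilities of a two-state continuous-index Markov chain.

    distance : gap between two consecutive SNPs (e.g. in base pairs)
    rate_01  : assumed rate of leaving state 0 for state 1, per unit distance
    rate_10  : assumed rate of leaving state 1 for state 0, per unit distance
    """
    total = rate_01 + rate_10
    pi0 = rate_10 / total  # stationary probability of state 0
    pi1 = rate_01 / total  # stationary probability of state 1
    decay = math.exp(-total * distance)
    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],
    ]


# Unequally spaced SNP positions (illustrative values only).
positions = [1_000, 1_500, 250_000, 2_000_000]
for left, right in zip(positions, positions[1:]):
    p = transition_matrix(right - left, rate_01=2e-6, rate_10=1e-6)
    print(right - left, [[round(x, 4) for x in row] for row in p])
```

Closely spaced SNPs almost always share a state, while widely separated SNPs approach the stationary distribution, which is why such a chain copes with unequal spacing without any resampling of the data.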
Affiliation(s)
- Susann Stjernqvist
  - Centre for Mathematical Sciences, Lund University, Box 118, 221 00 Lund, Sweden; Department of Mathematics, Royal Institute of Technology, 100 44 Stockholm, Sweden
10
He D, Hormozdiari F, Furlotte N, Eskin E. Efficient algorithms for tandem copy number variation reconstruction in repeat-rich regions. Bioinformatics 2011; 27:1513-20. [PMID: 21505028 DOI: 10.1093/bioinformatics/btr169] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 01/07/2023] Open
Abstract
MOTIVATION Structural variations and in particular copy number variations (CNVs) have dramatic effects on disease and traits. Technologies for identifying CNVs have been an active area of research for over 10 years. The current generation of high-throughput sequencing techniques presents new opportunities for identification of CNVs. Methods that utilize these technologies map sequencing reads to a reference genome and look for signatures which might indicate the presence of a CNV. These methods work well when CNVs lie within unique genomic regions. However, the problem of CNV identification and reconstruction becomes much more challenging when CNVs are in repeat-rich regions, due to the multiple mapping positions of the reads. RESULTS In this study, we propose an efficient algorithm to handle these multi-mapping reads such that the CNVs can be reconstructed with high accuracy even for repeat-rich regions. To our knowledge, this is the first attempt to both identify and reconstruct CNVs in repeat-rich regions. Our experiments show that our method is not only computationally efficient but also accurate.
Affiliation(s)
- Dan He
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA 90095, USA
11
Chen H, Xing H, Zhang NR. Estimation of parent specific DNA copy number in tumors using high-density genotyping arrays. PLoS Comput Biol 2011; 7:e1001060. [PMID: 21298078 PMCID: PMC3029233 DOI: 10.1371/journal.pcbi.1001060] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 01/07/2010] [Accepted: 12/17/2010] [Indexed: 01/01/2023] Open
Abstract
Chromosomal gains and losses comprise an important type of genetic change in tumors, and can now be assayed using microarray hybridization-based experiments. Most current statistical models for DNA copy number estimate total copy number, which does not distinguish between the underlying quantities of the two inherited chromosomes. This latter information, sometimes called parent specific copy number, is important for identifying allele-specific amplifications and deletions, for quantifying normal cell contamination, and for giving a more complete molecular portrait of the tumor. We propose a stochastic segmentation model for parent-specific DNA copy number in tumor samples, and give an estimation procedure that is computationally efficient and can be applied to data from the current high density genotyping platforms. The proposed method does not require matched normal samples, and can estimate the unknown genotypes simultaneously with the parent specific copy number. The new method is used to analyze 223 glioblastoma samples from the Cancer Genome Atlas (TCGA) project, giving a more comprehensive summary of the copy number events in these samples. Detailed case studies on these samples reveal the additional insights that can be gained from an allele-specific copy number analysis, such as the quantification of fractional gains and losses, the identification of copy neutral loss of heterozygosity, and the characterization of regions of simultaneous changes of both inherited chromosomes.
Many genetic diseases are related to copy number aberrations of some regions of the genome. As we know, each chromosome normally has two copies. However, under some circumstances, for some regions, either one or both of the chromosomes change. Genotyping microarray data provides the copy number of the two alleles of polymorphic sites along the chromosomes, which makes the inference of the copy number aberrations of the chromosome feasible. One difficulty is that genotyping microarray data cannot provide the haplotype of the two copies of a chromosome. In this paper, we model the copy number along the chromosome as a two-dimensional Markov Chain. Using the observed copy number of both alleles of all the sites, we can determine the parent specific copy number along the chromosome as well as infer the haplotypes of the two copies of the inherited chromosomes in regions where there is allelic imbalance. Simulation results show high sensitivity and specificity of the method. Applying this method to glioblastoma samples from the Cancer Genome Atlas data illustrates the insights gained from allele-specific copy number analysis.
Affiliation(s)
- Hao Chen
  - Department of Statistics, Stanford University, Stanford, California, United States of America
- Haipeng Xing
  - Department of Applied Mathematics and Statistics, SUNY at Stony Brook, Stony Brook, New York, United States of America
- Nancy R. Zhang
  - Department of Statistics, Stanford University, Stanford, California, United States of America
12
Yuan A, Chen G, Xiong J, He W, Rotimi C. Bayesian-Frequentist Hybrid Model with Application to the Analysis of Gene Copy Number Changes. J Appl Stat 2011; 38:987-1005. [PMID: 24014930 PMCID: PMC3762327 DOI: 10.1080/02664761003692449] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]
Abstract
Gene copy number (GCN) changes are common characteristics of many genetic diseases. Comparative genomic hybridization (CGH) is a new technology widely used today to screen the GCN changes in mutant cells with high resolution genome-wide. Statistical methods for analyzing such CGH data have been evolving. Existing methods are either frequentist or fully Bayesian. The former often has a computational advantage, while the latter can incorporate prior information into the model, but could be misleading when one does not have sound prior information. In an attempt to take full advantage of both approaches, we develop a Bayesian-frequentist hybrid approach, in which a subset of the model parameters is inferred by the Bayesian method, while the remaining parameters are inferred by the frequentist method. This new hybrid approach provides advantages over the Bayesian or frequentist method used alone. This is especially the case when sound prior information is available on part of the parameters, and the sample size is relatively small. Spatial dependence and false discovery rate are also discussed, and the parameter estimation is efficient. As an illustration, we used the proposed hybrid approach to analyze a real CGH data set.
Affiliation(s)
- Ao Yuan
  - National Human Genome Center, Howard University, Washington D.C. USA
13
He D, Furlotte N, Eskin E. Detection and reconstruction of tandemly organized de novo copy number variations. BMC Bioinformatics 2010; 11 Suppl 11:S12. [PMID: 21172047 PMCID: PMC3024866 DOI: 10.1186/1471-2105-11-s11-s12] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022] Open
Abstract
Background The characterization of structural variations (SV) such as insertions, deletions and copy number variations is a critical step in the process of understanding the full genetic architecture of organisms. Copy number variations (CNV) have attracted much recent attention due to their effects on gene expression and disease status. Results In this paper, we present a method that utilizes next-generation sequencing technologies (NGS), in order to both detect and reconstruct CNVs. We focus on a special type of CNV, namely tandemly organized de novo CNVs, which have been shown to occur with high frequency in the mouse genome. Conclusions We apply our method to CNV regions randomly inserted into the reference mouse genome and show that our method achieves good performance for both detection and reconstruction of tandemly organized de novo CNVs.
Affiliation(s)
- Dan He
  - Dept. of Comp. Sci., Univ. of California Los Angeles, Los Angeles, CA 90095, USA.
14
A Bayesian analysis for identifying DNA copy number variations using a compound Poisson process. EURASIP Journal on Bioinformatics & Systems Biology 2010; 2010:268513. [PMID: 20976296 DOI: 10.1155/2010/268513] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/03/2010] [Revised: 07/29/2010] [Accepted: 08/06/2010] [Indexed: 11/17/2022]
Abstract
To study chromosomal aberrations that may lead to cancer formation or genetic diseases, the array-based Comparative Genomic Hybridization (aCGH) technique is often used for detecting DNA copy number variants (CNVs). Various methods have been developed for gaining CNVs information based on aCGH data. However, most of these methods make use of the log-intensity ratios in aCGH data without taking advantage of other information such as the DNA probe (e.g., biomarker) positions/distances contained in the data. Motivated by the specific features of aCGH data, we developed a novel method that takes into account the estimation of a change point or locus of the CNV in aCGH data with its associated biomarker position on the chromosome using a compound Poisson process. We used a Bayesian approach to derive the posterior probability for the estimation of the CNV locus. To detect loci of multiple CNVs in the data, a sliding window process combined with our derived Bayesian posterior probability was proposed. To evaluate the performance of the method in the estimation of the CNV locus, we first performed simulation studies. Finally, we applied our approach to real data from aCGH experiments, demonstrating its applicability.
15
Morganella S, Cerulo L, Viglietto G, Ceccarelli M. VEGA: variational segmentation for copy number detection. Bioinformatics 2010; 26:3020-7. [PMID: 20959380 DOI: 10.1093/bioinformatics/btq586] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Abstract
MOTIVATION Genomic copy number (CN) information is useful to study genetic traits of many diseases. Using array comparative genomic hybridization (aCGH), researchers are able to measure the copy number of thousands of DNA loci at the same time. Therefore, a current challenge in bioinformatics is the development of efficient algorithms to detect the map of aberrant chromosomal regions. METHODS We describe an approach for the segmentation of copy number aCGH data. Variational estimator for genomic aberrations (VEGA) adopts a variational model used in image segmentation. The optimal segmentation is modeled as the minimum of an energy functional encompassing both the quality of interpolation of the data and the complexity of the solution, measured by the length of the boundaries between segmented regions. This solution is obtained by a region growing process where the stop condition is completely data driven. RESULTS VEGA is compared with three algorithms that represent the state of the art in CN segmentation. Performance assessment is made on both synthetic and real data. Synthetic data simulate different noise conditions. Results on these data show the robustness of variational models with respect to noise and the accuracy of VEGA in terms of recall and precision. Eight mantle cell lymphoma cell lines and two samples of glioblastoma multiforme are used to evaluate the behavior of VEGA on real biological data. Comparison between results and current biological knowledge shows the ability of the proposed method to detect known chromosomal aberrations. AVAILABILITY VEGA has been implemented in R and is available at the address http://www.dsba.unisannio.it/Members/ceccarelli/vega in the section Download.
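The energy-minimizing region growing described in this abstract can be illustrated with a minimal Python sketch. It is not the VEGA implementation: it assumes a piecewise-constant energy (squared residuals to each segment mean plus a penalty per boundary), and the fixed penalty `lam` is a hypothetical stand-in for VEGA's data-driven stop condition. The function name `merge_segments` is also illustrative.

```python
import numpy as np

def merge_segments(y, lam):
    """Greedy bottom-up merging of adjacent segments.

    Assumed energy: sum of squared residuals to each segment mean
    plus lam times the number of boundaries (lam is hypothetical).
    """
    y = np.asarray(y, dtype=float)
    # Start with one segment per probe, stored as [start, end) index pairs.
    segs = [(i, i + 1) for i in range(len(y))]

    def merge_cost(a, b):
        # Increase in squared error when two adjacent segments are joined.
        na, nb = a[1] - a[0], b[1] - b[0]
        ma, mb = y[a[0]:a[1]].mean(), y[b[0]:b[1]].mean()
        return na * nb / (na + nb) * (ma - mb) ** 2

    while len(segs) > 1:
        costs = [merge_cost(segs[i], segs[i + 1]) for i in range(len(segs) - 1)]
        k = int(np.argmin(costs))
        # Removing a boundary saves lam; stop when no merge lowers the energy.
        if costs[k] >= lam:
            break
        segs[k:k + 2] = [(segs[k][0], segs[k + 1][1])]
    return segs
```

For a noiseless step profile such as five probes at 0 followed by five at 1, a small penalty keeps the single true boundary, while a large penalty merges everything into one segment.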
Affiliation(s)
- Sandro Morganella
- Department of Biological and Environmental Studies, University of Sannio, Benevento, Italy
|
16
|
Zhang NR, Siegmund DO, Ji H, Li JZ. Detecting simultaneous changepoints in multiple sequences. Biometrika 2010; 97:631-645. [PMID: 22822250 DOI: 10.1093/biomet/asq025] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open
Abstract
We discuss the detection of local signals that occur at the same location in multiple one-dimensional noisy sequences, with particular attention to relatively weak signals that may occur in only a fraction of the sequences. We propose simple scan and segmentation algorithms based on the sum of the chi-squared statistics for each individual sample, which is equivalent to the generalized likelihood ratio for a model where the errors in each sample are independent. The simple geometry of the statistic allows us to derive accurate analytic approximations to the significance level of such scans. The formulation of the model is motivated by the biological problem of detecting recurrent DNA copy number variants in multiple samples. We show using replicates and parent-child comparisons that pooling data across samples results in more accurate detection of copy number variants. We also apply the multisample segmentation algorithm to the analysis of a cohort of tumour samples containing complex nested and overlapping copy number aberrations, for which our method gives a sparse and intuitive cross-sample summary.
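A scan based on the sum of per-sample chi-squared statistics can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: each sequence is assumed to be standardized (zero baseline, unit noise variance), the scan is a brute-force search over intervals up to a hypothetical `max_len`, and the analytic significance approximations from the paper are not reproduced.

```python
import numpy as np

def chi2_scan(x, max_len):
    """Return (best_stat, start, end) maximising the summed chi-squared statistic.

    x: array of shape (n_samples, n_positions), assumed standardised so that
       noise has mean 0 and variance 1 in every sample.
    The interval is half-open: positions start .. end-1.
    """
    x = np.asarray(x, dtype=float)
    n, t = x.shape
    # Prefix sums per sample, so any interval sum is a difference of two entries.
    csum = np.concatenate([np.zeros((n, 1)), np.cumsum(x, axis=1)], axis=1)
    best = (-np.inf, 0, 0)
    for start in range(t):
        for end in range(start + 1, min(start + max_len, t) + 1):
            # Standardised interval sum per sample; its square is chi-squared(1)
            # under the null, and the squares are summed across samples.
            z = (csum[:, end] - csum[:, start]) / np.sqrt(end - start)
            stat = float(np.sum(z ** 2))
            if stat > best[0]:
                best = (stat, start, end)
    return best
```

Because the squared statistics are summed rather than the raw signals, a shift present in only some of the samples still contributes to the pooled score.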
Affiliation(s)
- Nancy R Zhang
- Department of Statistics , Stanford University , 390 Serra Mall, Stanford, California 94305-4065 , U.S.A.
|
17
|
Wu LY, Chipman HA, Bull SB, Briollais L, Wang K. A Bayesian segmentation approach to ascertain copy number variations at the population level. Bioinformatics 2009; 25:1669-79. [PMID: 19389735 DOI: 10.1093/bioinformatics/btp270] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
MOTIVATION Efficient and accurate ascertainment of copy number variations (CNVs) at the population level is essential to understand the evolutionary process and population genetics, and to apply CNVs in population-based genome-wide association studies for complex human diseases. We propose a novel Bayesian segmentation approach to identify CNVs in a defined population of any size. It is computationally efficient and provides statistical evidence for the detected CNVs through the Bayes factor. This approach has the unique feature of carrying out segmentation and assigning copy number status simultaneously, a desirable property that current segmentation methods do not share. RESULTS In comparisons with popular two-step segmentation methods for a single individual using benchmark simulation studies, we find that the new approach performs competitively with respect to false discovery rate and sensitivity in breakpoint detection. In a simulation study of multiple samples with recurrent copy numbers, the new approach outperforms two leading single sample methods. We further demonstrate the effectiveness of our approach in population-level analysis of previously published HapMap data. We also apply our approach in studying population genetics of CNVs. AVAILABILITY R programs are available at http://www.mshri.on.ca/mitacs/software/SOFTWARE.HTML
Affiliation(s)
- Long Yang Wu
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada.
|
18
|
Abstract
DNA microarrays have become a mainstream tool in experimental plant biology. Constant improvements in the technological platforms have enabled the development of tiling DNA microarrays that cover the whole genome, which in turn has catalyzed a wide variety of creative applications of such microarrays in areas as diverse as global studies of genetic variation, DNA-binding proteins, DNA methylation, and chromatin and transcriptome dynamics. This chapter attempts to summarize such applications and discusses some technical and strategic issues that are particular to the use of tiling microarrays.
|
19
|
Yuan A, Chen G, Zhou ZC, Bonney G, Rotimi C. Gene copy number analysis for family data using semiparametric copula model. Bioinform Biol Insights 2008; 2:343-55. [PMID: 19812787 PMCID: PMC2735963 DOI: 10.4137/bbi.s839] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022] Open
Abstract
Gene copy number changes are common characteristics of many genetic disorders. A new technology, array comparative genomic hybridization (a-CGH), is widely used today to screen for gains and losses in cancers and other genetic diseases with high resolution at the genome level or for specific chromosomal regions. Statistical methods for analyzing such a-CGH data have been developed. However, most of the existing methods are for data from unrelated individuals, and their results explain horizontal variations in copy number changes. It is potentially meaningful to develop a statistical method that will allow for the analysis of family data to investigate the vertical kinship effects as well. Here we consider a semiparametric model based on a clustering method in which the marginal distributions are estimated nonparametrically, and the familial dependence structure is modeled by a copula. The model is illustrated and evaluated using simulated data. Our results show that the proposed method is more robust than the commonly used multivariate normal model. Finally, we demonstrated the utility of our method using a real dataset.
Affiliation(s)
- Ao Yuan
- National Human Genome Center, Howard University, Washington, DC, 20059 USA.
|
20
|
Huang H, Nguyen N, Oraintara S, Vo A. Array CGH data modeling and smoothing in Stationary Wavelet Packet Transform domain. BMC Genomics 2008; 9 Suppl 2:S17. [PMID: 18831782 PMCID: PMC2559881 DOI: 10.1186/1471-2164-9-s2-s17] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Background Array-based comparative genomic hybridization (array CGH) is a highly efficient technique, allowing the simultaneous measurement of genomic DNA copy number at hundreds or thousands of loci and the reliable detection of local one-copy-level variations. Characterization of these DNA copy number changes is important for both the basic understanding of cancer and its diagnosis. To develop effective methods for identifying aberration regions from array CGH data, much recent research has focused on both smoothing-based and segmentation-based data processing. In this paper, we propose a stationary wavelet packet transform based approach to smooth array CGH data. Our purpose is to remove CGH noise across the whole frequency range while keeping the true signal by using a bivariate model. Results In both synthetic and real CGH data, the Stationary Wavelet Packet Transform (SWPT) is the best wavelet transform for analyzing the CGH signal across the whole frequency range. We also introduce a new bivariate shrinkage model which shows the relationship of CGH noisy coefficients of two scales in SWPT. Before smoothing, the symmetric extension is considered as a preprocessing step to save information at the border. Conclusion We have designed the SWPT and the SWPT-Bi, which use the stationary wavelet packet transform with hard thresholding and the new bivariate shrinkage estimator, respectively, to smooth the array CGH data. We demonstrate the effectiveness of our approach through theoretical and experimental exploration of a set of array CGH data, including both synthetic data and real data. The comparison results show that our method outperforms the previous approaches.
Affiliation(s)
- Heng Huang
- Department of Computer Science and Engineering, University of Texas at Arlington, TX, USA.
|
21
|
Ionita-Laza I, Laird NM, Raby BA, Weiss ST, Lange C. On the frequency of copy number variants. Bioinformatics 2008; 24:2350-5. [PMID: 18689430 DOI: 10.1093/bioinformatics/btn421] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
MOTIVATION Estimating the frequency distribution of copy number variants (CNVs) is an important aspect of the effort to characterize this new type of genetic variation. Currently, most studies report a strong skew toward low-frequency CNVs. In this article, our goal is to investigate the frequencies of CNVs. We employ a two-step procedure for the CNV frequency estimation process. We use family information a posteriori to select only the most reliable CNV regions, i.e. those showing high rates of Mendelian transmission. RESULTS Our results suggest that the current skew toward low-frequency CNVs may not be representative of the true frequency distribution, but may be due, among other reasons, to the non-negligible false negative rates that characterize CNV detection methods. Moreover, false positives are also likely, as low-frequency CNVs are hard to detect with small sample sizes and technologies that are not ideally suited for their detection. Without appropriate validation methods, such as incorporation of biologically relevant information (for example, in our case, the transmission of heritable CNVs from parents to offspring), it is difficult to assess the validity of specific CNVs, and even harder to obtain reliable frequency estimates.
Affiliation(s)
- Iuliana Ionita-Laza
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA.
|
22
|
Wu H, Kim KJ, Mehta K, Paxia S, Sundstrom A, Anantharaman T, Kuraishy AI, Doan T, Ghosh J, Pyle AD, Clark A, Lowry W, Fan G, Baxter T, Mishra B, Sun Y, Teitell MA. Copy number variant analysis of human embryonic stem cells. Stem Cells 2008; 26:1484-9. [PMID: 18369100 DOI: 10.1634/stemcells.2007-0993] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Differences between individual DNA sequences provide the basis for human genetic variability. Forms of genetic variation include single-nucleotide polymorphisms, insertions/duplications, deletions, and inversions/translocations. The genome of human embryonic stem cells (hESCs) has been characterized mainly by karyotyping and comparative genomic hybridization (CGH), techniques whose relatively low resolution at 2-10 megabases (Mb) cannot accurately determine most copy number variability, which is estimated to involve 10%-20% of the genome. In this brief technical study, we examined HSF1 and HSF6 hESCs using array-comparative genomic hybridization (aCGH) to determine copy number variants (CNVs) as a higher-resolution method for characterizing hESCs. Our approach used five samples for each hESC line and showed four consistent CNVs for HSF1 and five consistent CNVs for HSF6. These consistent CNVs included amplifications and deletions that ranged in size from 20 kilobases to 1.48 megabases, involved seven different chromosomes, were both shared and unique between hESCs, and were maintained during neuronal stem/progenitor cell differentiation or drug selection. Thirty HSF1 and 40 HSF6 less consistently scored but still highly significant candidate CNVs were also identified. Overall, aCGH provides a promising approach for uniquely identifying hESCs and their derivatives and highlights a potential genomic source for distinct differentiation and functional potentials that lower-resolution karyotype and CGH techniques could miss. Disclosure of potential conflicts of interest is found at the end of this article.
Affiliation(s)
- Hao Wu
- Department of Psychiatry and Biobehavioral Sciences, University of California Los Angeles, Los Angeles, California 90095-1732, USA
|
23
|
Climent J, Garcia JL, Mao JH, Arsuaga J, Perez-Losada J. Characterization of breast cancer by array comparative genomic hybridization. Biochem Cell Biol 2008; 85:497-508. [PMID: 17713584 DOI: 10.1139/o07-072] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open
Abstract
Cancer progression is due to the accumulation of recurrent genomic alterations that induce growth advantage and clonal expansion. Most of these genomic changes can be detected using the array comparative genomic hybridization (CGH) technique. The accurate classification of these genomic alterations is expected to have an important impact on translational and basic research. Here we review recent advances in CGH technology used in the characterization of different features of breast cancer. First, we present bioinformatics methods that have been developed for the analysis of CGH arrays; next, we discuss the use of array CGH technology to classify tumor stages and to identify and stratify subgroups of patients with different prognoses and clinical behaviors. We finish our review with a discussion of how CGH arrays are being used to identify oncogenes, tumor suppressor genes, and breast cancer susceptibility genes.
Affiliation(s)
- J Climent
- Comprehensive Cancer Center, University of California, San Francisco, CA 94143, USA
|
24
|
Nilsson B, Johansson M, Heyden A, Nelander S, Fioretos T. An improved method for detecting and delineating genomic regions with altered gene expression in cancer. Genome Biol 2008; 9:R13. [PMID: 18208590 PMCID: PMC2395254 DOI: 10.1186/gb-2008-9-1-r13] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2007] [Accepted: 01/21/2008] [Indexed: 11/22/2022] Open
Abstract
A method is presented for identifying genomic regions with altered gene expression in gene expression maps. Genomic regions with altered gene expression are a characteristic feature of cancer cells. We present a novel method for identifying such regions in gene expression maps. This method is based on total variation minimization, a classical signal restoration technique. In systematic evaluations, we show that our method combines top-notch detection performance with an ability to delineate relevant regions without excessive over-segmentation, making it a significant advance over existing methods. Software (Rendersome) is provided.
Affiliation(s)
- Björn Nilsson
- Department of Clinical Genetics, Lund University Hospital, SE-221 85 Lund, Sweden.
|
25
|
Affiliation(s)
- X Shirley Liu
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard School of Public Health, Boston, Massachusetts, USA.
|
26
|
Lo KC, Shankar G, Turpaz Y, Bailey D, Rossi MR, Burkhardt T, Liang P, Cowell JK. Overlay tool for aCGHViewer: an analysis module built for aCGHViewer used to perform comparisons of data derived from different microarray platforms. Cancer Inform 2007; 3:307-19. [PMID: 19455250 PMCID: PMC2675835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open
Abstract
The Overlay Tool has been developed to combine high throughput data derived from various microarray platforms. This tool analyzes high-resolution correlations between gene expression changes and either copy number abnormalities (CNAs) or loss of heterozygosity events detected using array comparative genomic hybridization (aCGH). Using an overlay analysis which is designed to be performed using data from multiple microarray platforms on a single biological sample, the Overlay Tool identifies potentially important genes whose expression profiles are changed as a result of losses, gains and amplifications in the cancer genome. In addition, the Overlay Tool will incorporate loss of heterozygosity (LOH) probability data into this overlay procedure. To facilitate this analysis, we developed an application which computationally combines two or more high throughput datasets (e.g. aCGH/expression) into a single categorized dataset for visualization and interrogation using a gene-centric approach. As such, data from virtually any microarray platform can be incorporated without the need to remap entire datasets individually. The resultant categorized (overlay) data set can be conveniently viewed using our in-house visualization tool, aCGHViewer (Shankar et al. 2006), which serves as a conduit to public databases such as UCSC and NCBI, to rapidly investigate genes of interest.
Affiliation(s)
- Ken C. Lo
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263
- Ganesh Shankar
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263
- Michael R. Rossi
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263; Yale University School of Medicine, Department of Cancer Genetics, New Haven, CT 06520
- Tania Burkhardt
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263
- Ping Liang
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263
- John K. Cowell
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263. Correspondence: John K. Cowell, Ph.D., D.Sc. FRCPath, Department of Cancer Genetics, Roswell Park Cancer Institute, Elm and Carlton Streets, Buffalo, NY 14263. Tel: (716)845-5714; Fax: (716)845-1698
|
27
|
Rueda OM, Díaz-Uriarte R. Flexible and accurate detection of genomic copy-number changes from aCGH. PLoS Comput Biol 2007; 3:e122. [PMID: 17590078 PMCID: PMC1894821 DOI: 10.1371/journal.pcbi.0030122] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2007] [Accepted: 05/16/2007] [Indexed: 11/18/2022] Open
Abstract
Genomic DNA copy-number alterations (CNAs) are associated with complex diseases, including cancer: CNAs are indeed related to tumoral grade, metastasis, and patient survival. CNAs discovered from array-based comparative genomic hybridization (aCGH) data have been instrumental in identifying disease-related genes and potential therapeutic targets. To be immediately useful in both clinical and basic research scenarios, aCGH data analysis requires accurate methods that do not impose unrealistic biological assumptions and that provide direct answers to the key question, "What is the probability that this gene/region has CNAs?" Current approaches fail, however, to meet these requirements. Here, we introduce reversible jump aCGH (RJaCGH), a new method for identifying CNAs from aCGH; we use a nonhomogeneous hidden Markov model fitted via reversible jump Markov chain Monte Carlo; and we incorporate model uncertainty through Bayesian model averaging. RJaCGH provides an estimate of the probability that a gene/region has CNAs while incorporating interprobe distance and the capability to analyze data on a chromosome or genome-wide basis. RJaCGH outperforms alternative methods, and the performance difference is even larger with noisy data and highly variable interprobe distance, both commonly found features in aCGH data. Furthermore, our probabilistic method allows us to identify minimal common regions of CNAs among samples and can be extended to incorporate expression data. In summary, we provide a rigorous statistical framework for locating genes and chromosomal regions with CNAs with potential applications to cancer and other complex human diseases.
Affiliation(s)
- Oscar M Rueda
- Structural and Computational Biology Programme, Spanish National Cancer Centre (CNIO), Madrid, Spain
- * To whom correspondence should be addressed. E-mail: (OMR), (RDU)
- Ramón Díaz-Uriarte
- Structural and Computational Biology Programme, Spanish National Cancer Centre (CNIO), Madrid, Spain
- * To whom correspondence should be addressed. E-mail: (OMR), (RDU)
|
28
|
Yu T, Ye H, Sun W, Li KC, Chen Z, Jacobs S, Bailey DK, Wong DT, Zhou X. A forward-backward fragment assembling algorithm for the identification of genomic amplification and deletion breakpoints using high-density single nucleotide polymorphism (SNP) array. BMC Bioinformatics 2007; 8:145. [PMID: 17477871 PMCID: PMC1868765 DOI: 10.1186/1471-2105-8-145] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2006] [Accepted: 05/03/2007] [Indexed: 01/16/2023] Open
Abstract
BACKGROUND DNA copy number aberration (CNA) is one of the key characteristics of cancer cells. Recent studies demonstrated the feasibility of utilizing high density single nucleotide polymorphism (SNP) genotyping arrays to detect CNA. Compared with the two-color array-based comparative genomic hybridization (array-CGH), the SNP arrays offer much higher probe density and lower signal-to-noise ratio at the single SNP level. To accurately identify small segments of CNA from SNP array data, segmentation methods that are sensitive to CNA while resistant to noise are required. RESULTS We have developed a highly sensitive algorithm for the edge detection of copy number data which is especially suitable for the SNP array-based copy number data. The method consists of an over-sensitive edge-detection step and a test-based forward-backward edge selection step. CONCLUSION Using simulations constructed from real experimental data, the method shows high sensitivity and specificity in detecting small copy number changes in focused regions. The method is implemented in an R package FASeg, which includes data processing and visualization utilities, as well as libraries for processing Affymetrix SNP array data.
Affiliation(s)
- Tianwei Yu
- Department of Biostatistics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Hui Ye
- Center for Molecular Biology of Oral Diseases, College of Dentistry, University of Illinois at Chicago, Chicago, IL, USA
- Shanghai Children's Medical Center, Shanghai Jiao-Tong University, Shanghai, China
- Wei Sun
- Department of Statistics, University of California at Los Angeles, CA, USA
- Ker-Chau Li
- Department of Statistics, University of California at Los Angeles, CA, USA
- Zugen Chen
- Department of Human Genetics & Microarray Core, University of California at Los Angeles, Los Angeles, CA, USA
- Sharoni Jacobs
- Affymetrix, Inc., 3420 Central Expressway, Santa Clara, CA, USA
- Dione K Bailey
- Affymetrix, Inc., 3420 Central Expressway, Santa Clara, CA, USA
- David T Wong
- Dental Research Institute, School of Dentistry, David Geffen School of Medicine & Henry Samueli School of Engineering & Jonsson Comprehensive Cancer Center, University of California at Los Angeles, Los Angeles, CA, USA
- Xiaofeng Zhou
- Center for Molecular Biology of Oral Diseases, College of Dentistry, University of Illinois at Chicago, Chicago, IL, USA
- Guanghua School & Research Institute of Stomatology, Sun Yat-Sen University, Guangzhou, China
|
29
|
Stjernqvist S, Rydén T, Sköld M, Staaf J. Continuous-index hidden Markov modelling of array CGH copy number data. Bioinformatics 2007; 23:1006-14. [PMID: 17309894 DOI: 10.1093/bioinformatics/btm059] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
Abstract
MOTIVATION In recent years, a range of techniques for analysis and segmentation of array comparative genomic hybridization (aCGH) data have been proposed. For array designs in which clones are of unequal lengths, are unevenly spaced or overlap, the discrete-index view typically adopted by such methods may be questionable or could be improved. RESULTS We describe a continuous-index hidden Markov model for aCGH data as well as a Monte Carlo EM algorithm to estimate its parameters. It is shown that for a dataset from the BT-474 cell line analysed on 32K BAC tiling microarrays, this model yields considerably better model fit in terms of lag-1 residual autocorrelations compared to a discrete-index HMM, and it is also shown how to use the model for, e.g., estimation of change points on the base-pair scale and for estimation of conditional state probabilities across the genome. In addition, the model is applied to the Glioblastoma Multiforme data used in the comparative study by Lai et al. (Lai, W.R. et al. (2005) Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics, 21, 3763-3770), giving results similar to theirs but with certain features highlighted in the continuous-index setting.
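The core idea of a continuous-index hidden Markov model, that the transition probabilities between neighbouring probes depend on the genomic distance separating them, can be shown with a small Python sketch. It is an illustration only: it assumes a two-state chain with hypothetical rates `rate_01` and `rate_10`, for which the matrix exponential has a simple closed form, and it does not reproduce the paper's state space or its Monte Carlo EM estimation.

```python
import math

def transition_matrix(distance, rate_01, rate_10):
    """Transition probabilities of a two-state continuous-time Markov chain
    over a genomic distance (closed form of the matrix exponential).

    rate_01 and rate_10 are hypothetical per-unit-distance switching rates.
    """
    total = rate_01 + rate_10
    # Memory of the starting state decays exponentially with distance.
    decay = math.exp(-total * distance)
    p01 = rate_01 / total * (1.0 - decay)
    p10 = rate_10 / total * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]
```

At zero distance the matrix is the identity, so overlapping or adjacent clones are forced into the same state, while for widely separated clones each row approaches the stationary distribution. A discrete-index HMM would instead use the same matrix for every pair of consecutive probes, regardless of spacing.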
|
30
|
Hu J, Gao JB, Cao Y, Bottinger E, Zhang W. Exploiting noise in array CGH data to improve detection of DNA copy number change. Nucleic Acids Res 2007; 35:e35. [PMID: 17272296 PMCID: PMC1994778 DOI: 10.1093/nar/gkl730] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
Developing effective methods for analyzing array-CGH data to detect chromosomal aberrations is very important for the diagnosis of pathogenesis of cancer and other diseases. Current analysis methods, being largely based on smoothing and/or segmentation, are not quite capable of detecting both the aberration regions and the boundary break points very accurately. Furthermore, when evaluating the accuracy of an algorithm for analyzing array-CGH data, it is commonly assumed that noise in the data follows a normal distribution. A fundamental question is whether noise in array-CGH is indeed Gaussian, and if not, can one exploit the characteristics of noise to develop novel analysis methods that are capable of accurately detecting the aberration regions and the boundary break points simultaneously? By analyzing bacterial artificial chromosome (BAC) arrays with an average 1 Mb resolution, 19 k oligo arrays with an average probe spacing <100 kb and 385 k oligo arrays with an average probe spacing of about 6 kb, we show that when there are aberrations, noise in all three types of arrays is highly non-Gaussian and possesses long-range spatial correlations, and that such noise leads to worse performance of existing methods for detecting aberrations in array-CGH than in the Gaussian noise case. We further develop a novel method, which optimally exploits the character of the noise and is capable of identifying both aberration regions and the boundary break points very accurately. Finally, we propose a new concept, posteriori signal-to-noise ratio (p-SNR), to assign a confidence level to each detected aberration region and its boundaries.
Affiliation(s)
- Jian-Bo Gao
- *Correspondence may also be addressed to Jian-Bo Gao.
- Yinhe Cao
- Department of Electrical and Computer Engineering, University of Florida Gainesville, FL 32611, Biosieve 1026 Springfield Drive, Campbell, CA 95008 and Department of Medicine, Mount Sinai School of Medicine One Gustave L. Levy Place, New York, NY 10029, USA
- Erwin Bottinger
- Department of Electrical and Computer Engineering, University of Florida Gainesville, FL 32611, Biosieve 1026 Springfield Drive, Campbell, CA 95008 and Department of Medicine, Mount Sinai School of Medicine One Gustave L. Levy Place, New York, NY 10029, USA
- Weijia Zhang
- Department of Electrical and Computer Engineering, University of Florida Gainesville, FL 32611, Biosieve 1026 Springfield Drive, Campbell, CA 95008 and Department of Medicine, Mount Sinai School of Medicine One Gustave L. Levy Place, New York, NY 10029, USA
- *To whom correspondence should be addressed. +1 21224128831 2128492643
|
31
|
Hicks J, Krasnitz A, Lakshmi B, Navin NE, Riggs M, Leibu E, Esposito D, Alexander J, Troge J, Grubor V, Yoon S, Wigler M, Ye K, Børresen-Dale AL, Naume B, Schlicting E, Norton L, Hägerström T, Skoog L, Auer G, Månér S, Lundin P, Zetterberg A. Novel patterns of genome rearrangement and their association with survival in breast cancer. Genome Res 2007; 16:1465-79. [PMID: 17142309 PMCID: PMC1665631 DOI: 10.1101/gr.5460106] [Citation(s) in RCA: 255] [Impact Index Per Article: 14.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
Representational Oligonucleotide Microarray Analysis (ROMA) detects genomic amplifications and deletions with boundaries defined at a resolution of approximately 50 kb. We have used this technique to examine 243 breast tumors from two separate studies for which detailed clinical data were available. The very high resolution of this technology has enabled us to identify three characteristic patterns of genomic copy number variation in diploid tumors and to measure correlations with patient survival. One of these patterns is characterized by multiple closely spaced amplicons, or "firestorms," limited to single chromosome arms. These multiple amplifications are highly correlated with aggressive disease and poor survival even when the rest of the genome is relatively quiet. Analysis of a selected subset of clinical material suggests that a simple genomic calculation, based on the number and proximity of genomic alterations, correlates with life-table estimates of the probability of overall survival in patients with primary breast cancer. Based on this sample, we generate the working hypothesis that copy number profiling might provide information useful in making clinical decisions, especially regarding the use or not of systemic therapies (hormonal therapy, chemotherapy), in the management of operable primary breast cancer with ostensibly good prognosis, for example, small, node-negative, hormone-receptor-positive diploid cases.
Affiliation(s)
- James Hicks
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA.
|
32
|
Ionita I, Daruwala RS, Mishra B. Mapping tumor-suppressor genes with multipoint statistics from copy-number-variation data. Am J Hum Genet 2006; 79:13-22. [PMID: 16773561 PMCID: PMC1474131 DOI: 10.1086/504354] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2005] [Accepted: 03/17/2006] [Indexed: 11/03/2022] Open
Abstract
Array-based comparative genomic hybridization (arrayCGH) is a microarray-based comparative genomic hybridization technique that has been used to compare tumor genomes with normal genomes, thus providing rapid genomic assays of tumor genomes in terms of copy-number variations of those chromosomal segments that have been gained or lost. When properly interpreted, these assays are likely to shed important light on genes and mechanisms involved in the initiation and progression of cancer. Specifically, chromosomal segments, deleted in one or both copies of the diploid genomes of a group of patients with cancer, point to locations of tumor-suppressor genes (TSGs) implicated in the cancer. In this study, we focused on automatic methods for reliable detection of such genes and their locations, and we devised an efficient statistical algorithm to map TSGs, using a novel multipoint statistical score function. The proposed algorithm estimates the location of TSGs by analyzing segmental deletions (hemi- or homozygous) in the genomes of patients with cancer and the spatial relation of the deleted segments to any specific genomic interval. The algorithm assigns, to an interval of consecutive probes, a multipoint score that parsimoniously captures the underlying biology. It also computes a P value for every putative TSG by using concepts from the theory of scan statistics. Furthermore, it can identify smaller sets of predictive probes that can be used as biomarkers for diagnosis and therapeutics. We validated our method using different simulated artificial data sets and one real data set, and we report encouraging results. We discuss how, with suitable modifications to the underlying statistical model, this algorithm can be applied generally to a wider class of problems (e.g., detection of oncogenes).
Affiliation(s)
- Iuliana Ionita
- Courant Institute of Mathematical Sciences, New York, NY 10012, USA
|
33
|
Daser A, Thangavelu M, Pannell R, Forster A, Sparrow L, Chung G, Dear PH, Rabbitts TH. Interrogation of genomes by molecular copy-number counting (MCC). Nat Methods 2006; 3:447-53. [PMID: 16721378 DOI: 10.1038/nmeth880] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2006] [Accepted: 04/17/2006] [Indexed: 11/08/2022]
Abstract
Human cancers and some congenital traits are characterized by cytogenetic aberrations including translocations, amplifications, duplications or deletions that can involve gain or loss of genetic material. We have developed a simple method to precisely delineate such regions with known or cryptic genomic alterations. Molecular copy-number counting (MCC) uses PCR to interrogate minuscule amounts of genomic DNA and allows progressive delineation of DNA content to within a few hundred base pairs of a genomic alteration. As an example, we have located the junctions of a recurrent nonreciprocal translocation between chromosomes 3 and 5 in human renal cell carcinoma, facilitating cloning of the breakpoint without recourse to genomic libraries. The analysis also revealed additional cryptic chromosomal changes close to the translocation junction. MCC is a fast and flexible method for characterizing a wide range of chromosomal aberrations.
Affiliation(s)
- Angelika Daser
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK
|
34
|
Huang J, Wei W, Chen J, Zhang J, Liu G, Di X, Mei R, Ishikawa S, Aburatani H, Jones KW, Shapero MH. CARAT: a novel method for allelic detection of DNA copy number changes using high density oligonucleotide arrays. BMC Bioinformatics 2006; 7:83. [PMID: 16504045 PMCID: PMC1402331 DOI: 10.1186/1471-2105-7-83] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2005] [Accepted: 02/21/2006] [Indexed: 12/13/2022] Open
Abstract
Background: DNA copy number alterations are one of the main characteristics of the cancer cell karyotype and can contribute to the complex phenotype of these cells. These alterations can lead to gains in cellular oncogenes as well as losses in tumor suppressor genes and can span small intervals as well as involve entire chromosomes. The ability to accurately detect these changes is central to understanding how they impact the biology of the cell. Results: We describe a novel algorithm called CARAT (Copy Number Analysis with Regression And Tree) that uses probe intensity information to infer copy number in an allele-specific manner from high density DNA oligonucleotide arrays designed to genotype over 100,000 SNPs. Total and allele-specific copy number estimations using CARAT are independently evaluated for a subset of SNPs using quantitative PCR and allelic TaqMan reactions with several human breast cancer cell lines. The sensitivity and specificity of the algorithm are characterized using DNA samples containing differing numbers of X chromosomes as well as a test set of normal individuals. Results from the algorithm show a high degree of agreement with results from independent verification methods. Conclusion: Overall, CARAT automatically detects regions with copy number variations and assigns a significance score to each alteration as well as generating allele-specific output. When coupled with SNP genotype calls from the same array, CARAT provides additional detail into the structure of genome wide alterations that can contribute to allelic imbalance.
Affiliation(s)
- Jing Huang
- Affymetrix, Inc. 3420 Central Expressway, Santa Clara CA 95051, USA
- Wen Wei
- Affymetrix, Inc. 3420 Central Expressway, Santa Clara CA 95051, USA
- Joyce Chen
- Affymetrix, Inc. 3420 Central Expressway, Santa Clara CA 95051, USA
- Jane Zhang
- Affymetrix, Inc. 3420 Central Expressway, Santa Clara CA 95051, USA
- Guoying Liu
- Affymetrix, Inc. 3420 Central Expressway, Santa Clara CA 95051, USA
- Xiaojun Di
- Affymetrix, Inc. 3420 Central Expressway, Santa Clara CA 95051, USA
- Rui Mei
- Affymetrix, Inc. 3420 Central Expressway, Santa Clara CA 95051, USA
- Shumpei Ishikawa
- University of Tokyo, Genome Science Division Research Center for Advanced Science and Technology, 4-6-1 Komaba, Meguro, 153-8904, Tokyo
- Hiroyuki Aburatani
- University of Tokyo, Genome Science Division Research Center for Advanced Science and Technology, 4-6-1 Komaba, Meguro, 153-8904, Tokyo
- Keith W Jones
- Affymetrix, Inc. 3420 Central Expressway, Santa Clara CA 95051, USA
|
35
|
Oba S, Tomioka N, Ohira M, Ishii S. Combfit: A Normalization Method for Array CGH Data. IPSJ Digital Courier 2006. [DOI: 10.2197/ipsjdc.2.716] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
|
36
|
Shankar G, Rossi MR, McQuaid DE, Conroy JM, Gaile DG, Cowell JK, Nowak NJ, Liang P. aCGHViewer: a generic visualization tool for aCGH data. Cancer Inform 2006; 2:36-43. [PMID: 17404607 PMCID: PMC1847423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022] Open
Abstract
Array-Comparative Genomic Hybridization (aCGH) is a powerful high throughput technology for detecting chromosomal copy number aberrations (CNAs) in cancer, aiming at identifying related critical genes from the affected genomic regions. However, advancing from a dataset with thousands of tabular lines to a few candidate genes can be an onerous and time-consuming process. To expedite the aCGH data analysis process, we have developed a user-friendly aCGH data viewer (aCGHViewer) as a conduit between the aCGH data tables and a genome browser. The data from a given aCGH analysis are displayed in a genomic view comprised of individual chromosome panels which can be rapidly scanned for interesting features. A chromosome panel containing a feature of interest can be selected to launch a detail window for that single chromosome. Selecting a data point of interest in the detail window launches a query to the UCSC or NCBI genome browser to allow the user to explore the gene content in the chromosomal region. Additionally, aCGHViewer can display aCGH and expression array data concurrently to visually correlate the two. aCGHViewer is a stand-alone Java visualization application that should be used in conjunction with separate statistical programs. It operates on all major computer platforms and is freely available at http://falcon.roswellpark.org/aCGHview/.
Affiliation(s)
- Ganesh Shankar
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263 USA
- Michael R. Rossi
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263 USA
- Devin E. McQuaid
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263 USA
- Jeffrey M. Conroy
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263 USA
- Daniel G. Gaile
- Department of Biostatistics, The State University of New York at Buffalo, Buffalo, NY 14214 USA
- John K. Cowell
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263 USA
- Norma J. Nowak
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263 USA
- Ping Liang
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, NY 14263 USA
|
37
|
Abstract
Altering DNA copy number is one of the many ways that gene expression and function may be modified. Some variations are found among normal individuals (14, 35, 103), others occur in the course of normal processes in some species (33), and still others participate in causing various disease states. For example, many defects in human development are due to gains and losses of chromosomes and chromosomal segments that occur prior to or shortly after fertilization, whereas DNA dosage alterations that occur in somatic cells are frequent contributors to cancer. Detecting these aberrations, and interpreting them within the context of broader knowledge, facilitates identification of critical genes and pathways involved in biological processes and diseases, and provides clinically relevant information. Over the past several years array comparative genomic hybridization (array CGH) has demonstrated its value for analyzing DNA copy number variations. In this review we discuss the state of the art of array CGH and its applications in medical genetics and cancer, emphasizing general concepts rather than specific results.
Affiliation(s)
- Daniel Pinkel
- Comprehensive Cancer Center, Department of Laboratory Medicine, University of California, San Francisco, California 94143, USA.
|
38
|
Willenbrock H, Fridlyand J. A comparison study: applying segmentation to array CGH data for downstream analyses. Bioinformatics 2005; 21:4084-91. [PMID: 16159913 DOI: 10.1093/bioinformatics/bti677] [Citation(s) in RCA: 210] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Array comparative genomic hybridization (CGH) allows detection and mapping of copy number of DNA segments. A challenge is to make inferences about the copy number structure of the genome. Several statistical methods have been proposed to determine genomic segments with different copy number levels. However, to date, no comprehensive comparison of various characteristics of these methods exists. Moreover, the segmentation results have not been utilized in downstream analyses. RESULTS: We describe a comparison of three popular and publicly available methods for the analysis of array CGH data and we demonstrate how segmentation results may be utilized in downstream analyses such as testing and classification, yielding higher power and prediction accuracy. Since the methods operate on individual chromosomes, we also propose a novel procedure for merging segments across the genome, which results in an interpretable set of copy number levels and thus facilitates identification of copy number alterations in each genome. AVAILABILITY: http://www.bioconductor.org
Affiliation(s)
- Hanni Willenbrock
- Center for Biological Sequence Analysis, Department of Biotechnology, Technical University of Denmark, Kgs. Lyngby
|
39
|
Lai WR, Johnson MD, Kucherlapati R, Park PJ. Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics 2005; 21:3763-70. [PMID: 16081473 PMCID: PMC2819184 DOI: 10.1093/bioinformatics/bti611] [Citation(s) in RCA: 269] [Impact Index Per Article: 13.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Array Comparative Genomic Hybridization (CGH) can reveal chromosomal aberrations in the genomic DNA. These amplifications and deletions at the DNA level are important in the pathogenesis of cancer and other diseases. While a large number of approaches have been proposed for analyzing the large array CGH datasets, the relative merits of these methods in practice are not clear. RESULTS: We compare 11 different algorithms for analyzing array CGH data. These include both segment detection methods and smoothing methods, based on diverse techniques such as mixture models, Hidden Markov Models, maximum likelihood, regression, wavelets and genetic algorithms. We compute the Receiver Operating Characteristic (ROC) curves using simulated data to quantify sensitivity and specificity for various levels of signal-to-noise ratio and different sizes of abnormalities. We also characterize their performance on chromosomal regions of interest in a real dataset obtained from patients with Glioblastoma Multiforme. While comparisons of this type are difficult due to possibly sub-optimal choice of parameters in the methods, they nevertheless reveal general characteristics that are helpful to the biological investigator.
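The ROC evaluation described in this abstract can be illustrated with a minimal sketch. It is not the authors' benchmark: the simulated profile (unit-variance Gaussian noise with a single gain whose height equals the signal-to-noise ratio), the moving-average smoother used as a stand-in detector, and the function names `simulate_profile`, `moving_average` and `roc_points` are all assumptions made for illustration.

```python
import random


def simulate_profile(n_probes, start, width, snr, rng):
    """Simulate log-ratios: N(0, 1) noise plus a gain of height `snr`
    on probes [start, start + width). Returns (signal, truth)."""
    truth = [start <= i < start + width for i in range(n_probes)]
    signal = [(snr if t else 0.0) + rng.gauss(0.0, 1.0) for t in truth]
    return signal, truth


def moving_average(values, window):
    """Centered moving average; the window is truncated at the array ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def roc_points(scores, truth, thresholds):
    """Return one (false positive rate, true positive rate) pair per threshold.
    A probe is called aberrant when its score exceeds the threshold.
    Assumes `truth` contains at least one True and one False entry."""
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, truth) if y and s > t)
        fp = sum(1 for s, y in zip(scores, truth) if not y and s > t)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Example: a 20-probe gain at SNR 2 in a 200-probe profile (hypothetical sizes).
rng = random.Random(0)
signal, truth = simulate_profile(200, start=90, width=20, snr=2.0, rng=rng)
smoothed = moving_average(signal, window=5)
curve = roc_points(smoothed, truth, [t / 10 for t in range(-20, 41)])
```

Repeating the simulation over several `snr` and `width` values, and swapping the smoother for each method under comparison, would give one ROC curve per method and setting.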
Affiliation(s)
- Weil R Lai
- Harvard-Partners Center for Genetics and Genomics 77 Avenue Louis Pasteur, Boston, MA 02115, USA
|
40
|
Wang Y, Yu Q, Cho AH, Rondeau G, Welsh J, Adamson E, Mercola D, McClelland M. Survey of differentially methylated promoters in prostate cancer cell lines. Neoplasia 2005; 7:748-60. [PMID: 16207477 PMCID: PMC1501885 DOI: 10.1593/neo.05289] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2005] [Revised: 04/15/2005] [Accepted: 04/22/2005] [Indexed: 12/31/2022]
Abstract
DNA methylation and copy number in the genomes of three immortalized prostate epithelial and five cancer cell lines (LNCaP, PC3, PC3M, PC3M-Pro4, and PC3M-LN4) were compared using a microarray-based technique. Genomic DNA is cut with a methylation-sensitive enzyme HpaII, followed by linker ligation, polymerase chain reaction (PCR) amplification, labeling, and hybridization to an array of promoter sequences. Only those parts of the genomic DNA that have unmethylated restriction sites within a few hundred base pairs generate PCR products detectable on an array. Of 2732 promoter sequences on a test array, 504 (18.5%) showed differential hybridization between immortalized prostate epithelial and cancer cell lines. Among candidate hypermethylated genes in cancer-derived lines, there were eight (CD44, CDKN1A, ESR1, PLAU, RARB, SFN, TNFRSF6, and TSPY) previously observed in prostate cancer and 13 previously known methylation targets in other cancers (ARHI, bcl-2, BRCA1, CDKN2C, GADD45A, MTAP, PGR, SLC26A4, SPARC, SYK, TJP2, UCHL1, and WIT-1). The majority of genes that appear to be both differentially methylated and differentially regulated between prostate epithelial and cancer cell lines are novel methylation targets, including PAK6, RAD50, TLX3, PIR51, MAP2K5, INSR, FBN1, and GG2-1, representing a rich new source of candidate genes used to study the role of DNA methylation in prostate tumors.
Affiliation(s)
- Yipeng Wang
- Sidney Kimmel Cancer Center, 10835 Road to the Cure, San Diego, CA 92121, USA
- Qiuju Yu
- Sidney Kimmel Cancer Center, 10835 Road to the Cure, San Diego, CA 92121, USA
- Ann H Cho
- Sidney Kimmel Cancer Center, 10835 Road to the Cure, San Diego, CA 92121, USA
- Gaelle Rondeau
- Sidney Kimmel Cancer Center, 10835 Road to the Cure, San Diego, CA 92121, USA
- John Welsh
- Sidney Kimmel Cancer Center, 10835 Road to the Cure, San Diego, CA 92121, USA
- Eileen Adamson
- The Burnham Institute, Cancer Research Center, La Jolla, CA, USA
- Dan Mercola
- Department of Pathology, University of California at Irvine, Irvine, CA 92697, USA
- Michael McClelland
- Sidney Kimmel Cancer Center, 10835 Road to the Cure, San Diego, CA 92121, USA
|
41
|
Price TS, Regan R, Mott R, Hedman Å, Honey B, Daniels RJ, Smith L, Greenfield A, Tiganescu A, Buckle V, Ventress N, Ayyub H, Salhan A, Pedraza-Diaz S, Broxholme J, Ragoussis J, Higgs DR, Flint J, Knight SJL. SW-ARRAY: a dynamic programming solution for the identification of copy-number changes in genomic DNA using array comparative genome hybridization data. Nucleic Acids Res 2005; 33:3455-64. [PMID: 15961730 PMCID: PMC1151590 DOI: 10.1093/nar/gki643] [Citation(s) in RCA: 70] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
Comparative genome hybridization (CGH) to DNA microarrays (array CGH) is a technique capable of detecting deletions and duplications in genomes at high resolution. However, array CGH studies of the human genome noting false negative and false positive results using large insert clones as probes have raised important concerns regarding the suitability of this approach for clinical diagnostic applications. Here, we adapt the Smith–Waterman dynamic-programming algorithm to provide a sensitive and robust analytic approach (SW-ARRAY) for detecting copy-number changes in array CGH data. In a blind series of hybridizations to arrays consisting of the entire tiling path for the terminal 2 Mb of human chromosome 16p, the method identified all monosomies between 267 and 1567 kb with a high degree of statistical significance and accurately located the boundaries of deletions in the range 267–1052 kb. The approach is unique in offering both a nonparametric segmentation procedure and a nonparametric test of significance. It is scalable and well-suited to high resolution whole genome array CGH studies that use array probes derived from large insert clones as well as PCR products and oligonucleotides.
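The idea of adapting a Smith–Waterman-style recurrence to a one-dimensional probe series, paired with a nonparametric significance test, can be sketched as follows. This is a simplified illustration and not the published SW-ARRAY implementation: the threshold value, the permutation scheme, and the function names `best_segment` and `permutation_p_value` are assumptions. To look for deletions, negate the log-ratios before calling.

```python
import random


def best_segment(values, threshold):
    """Find the highest-scoring contiguous run of probes.

    Each probe contributes (value - threshold); the running score is reset
    to zero whenever it would drop to zero or below, as in local alignment.
    Returns (score, start, end) with inclusive indices, or (0.0, None, None)
    if no probe exceeds the threshold.
    """
    best, best_start, best_end = 0.0, None, None
    running, start = 0.0, 0
    for i, v in enumerate(values):
        running += v - threshold
        if running <= 0.0:
            running, start = 0.0, i + 1  # restart after this probe
        elif running > best:
            best, best_start, best_end = running, start, i
    return best, best_start, best_end


def permutation_p_value(values, threshold, n_perm=1000, seed=0):
    """Estimate how often a random reordering of the probes yields a
    segment score at least as high as the observed one."""
    observed = best_segment(values, threshold)[0]
    rng = random.Random(seed)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_segment(shuffled, threshold)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical log2 ratios with a three-probe loss; negate to score deletions.
log_ratios = [0.0, -0.1, 0.1, -0.9, -1.1, -1.0, -0.05, 0.05]
score, first, last = best_segment([-x for x in log_ratios], threshold=0.5)
p_value = permutation_p_value([-x for x in log_ratios], threshold=0.5)
```

Shuffling probes treats them as exchangeable under the null hypothesis, which is a simplification; spatially correlated noise would call for a more careful null model.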
Affiliation(s)
- Thomas S. Price
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Churchill Hospital, Headington, Oxford OX3 7BN, UK
- Regina Regan
- Oxford Genetics Knowledge Park, Roosevelt Drive, Churchill Hospital, Headington, Oxford OX3 7BN, UK
- Richard Mott
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Churchill Hospital, Headington, Oxford OX3 7BN, UK
- Åsa Hedman
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Churchill Hospital, Headington, Oxford OX3 7BN, UK
- Ben Honey
- Oxford Genetics Knowledge Park, Roosevelt Drive, Churchill Hospital, Headington, Oxford OX3 7BN, UK
- Rachael J. Daniels
- Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Headley Way, Headington, Oxford OX3 9DS, UK
- Lee Smith
- Mammalian Genetics Unit, Medical Research Council, Harwell, Didcot, OX11 0RD, UK
- Andy Greenfield
- Mammalian Genetics Unit, Medical Research Council, Harwell, Didcot, OX11 0RD, UK
- Ana Tiganescu
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Churchill Hospital, Headington, Oxford OX3 7BN, UK
- Veronica Buckle
- Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Headley Way, Headington, Oxford OX3 9DS, UK
- Nicki Ventress
- Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Headley Way, Headington, Oxford OX3 9DS, UK
- Helena Ayyub
- Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Headley Way, Headington, Oxford OX3 9DS, UK
- Anita Salhan
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Churchill Hospital, Headington, Oxford OX3 7BN, UK
- Susana Pedraza-Diaz
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Churchill Hospital, Headington, Oxford OX3 7BN, UK
- John Broxholme
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Churchill Hospital, Headington, Oxford OX3 7BN, UK
- Jiannis Ragoussis
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Churchill Hospital, Headington, Oxford OX3 7BN, UK
- Douglas R. Higgs
- Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Headley Way, Headington, Oxford OX3 9DS, UK
- Jonathan Flint
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Churchill Hospital, Headington, Oxford OX3 7BN, UK
- Samantha J. L. Knight
- The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Churchill Hospital, Headington, Oxford OX3 7BN, UK
- Oxford Genetics Knowledge Park, Roosevelt Drive, Churchill Hospital, Headington, Oxford OX3 7BN, UK
- To whom correspondence should be addressed at The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Churchill Hospital, Headington, Oxford OX3 7BN, UK. Tel: +44 1865 287511; Fax: +44 1865 287501;
|
42
|
Pinkel D, Albertson DG. Array comparative genomic hybridization and its applications in cancer. Nat Genet 2005; 37 Suppl:S11-7. [PMID: 15920524 DOI: 10.1038/ng1569] [Citation(s) in RCA: 374] [Impact Index Per Article: 18.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
Alteration in DNA copy number is one of the many ways in which gene expression and function may be modified. Some variations are found among normal individuals, others occur in the course of normal processes in some species and still others participate in causing various disease states. For example, many defects in human development are due to gains and losses of chromosomes and chromosomal segments that occur before or shortly after fertilization, and DNA dosage-alteration changes occurring in somatic cells are frequent contributors to cancer. Detecting these aberrations and interpreting them in the context of broader knowledge facilitates the identification of crucial genes and pathways involved in biological processes and disease. Over the past several years, array comparative genomic hybridization has proven its value for analyzing DNA copy-number variations. Here, we discuss the state of the art of array comparative genomic hybridization and its applications in cancer, emphasizing general concepts rather than specific results.
Affiliation(s)
- Daniel Pinkel
- Department of Laboratory Medicine and Comprehensive Cancer Center, University of California San Francisco, Box 0808, San Francisco, California 94143, USA.
|