251
Alibés A, Morrissey ER, Cañada A, Rueda OM, Casado D, Yankilevich P, Díaz-Uriarte R. Asterias: a parallelized web-based suite for the analysis of expression and aCGH data. Cancer Inform 2007; 3:1-9. [PMID: 19455230 PMCID: PMC2675829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
The analysis of expression and CGH arrays plays a central role in the study of complex diseases, especially cancer, including finding markers for early diagnosis and prognosis, choosing an optimal therapy, or increasing our understanding of cancer development and metastasis. Asterias (http://www.asterias.info) is an integrated collection of freely-accessible web tools for the analysis of gene expression and aCGH data. Most of the tools use parallel computing (via MPI) and run on a server with 60 CPUs for computation; compared to a desktop or server-based but not parallelized application, parallelization provides speed ups of factors up to 50. Most of our applications allow the user to obtain additional information for user-selected genes (chromosomal location, PubMed ids, Gene Ontology terms, etc.) by using clickable links in tables and/or figures. Our tools include: normalization of expression and aCGH data (DNMAD); converting between different types of gene/clone and protein identifiers (IDconverter/IDClight); filtering and imputation (preP); finding differentially expressed genes related to patient class and survival data (Pomelo II); searching for models of class prediction (Tnasas); using random forests to search for minimal models for class prediction or for large subsets of genes with predictive capacity (GeneSrF); searching for molecular signatures and predictive genes with survival data (SignS); detecting regions of genomic DNA gain or loss (ADaCGH). The capability to send results between different applications, access to additional functional information, and parallelized computation make our suite unique and exploit features only available to web-based applications.
252
Hu J, Gao JB, Cao Y, Bottinger E, Zhang W. Exploiting noise in array CGH data to improve detection of DNA copy number change. Nucleic Acids Res 2007; 35:e35. [PMID: 17272296 PMCID: PMC1994778 DOI: 10.1093/nar/gkl730] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 11/17/2022]
Abstract
Developing effective methods for analyzing array-CGH data to detect chromosomal aberrations is very important for the diagnosis of pathogenesis of cancer and other diseases. Current analysis methods, being largely based on smoothing and/or segmentation, are not quite capable of detecting both the aberration regions and the boundary break points very accurately. Furthermore, when evaluating the accuracy of an algorithm for analyzing array-CGH data, it is commonly assumed that noise in the data follows a normal distribution. A fundamental question is whether noise in array-CGH is indeed Gaussian, and if not, can one exploit the characteristics of noise to develop novel analysis methods that are capable of accurately detecting the aberration regions as well as the boundary break points simultaneously? By analyzing bacterial artificial chromosome (BAC) arrays with an average 1 Mb resolution, 19 k oligo arrays with an average probe spacing <100 kb and 385 k oligo arrays with an average probe spacing of about 6 kb, we show that when there are aberrations, noise in all three types of arrays is highly non-Gaussian and possesses long-range spatial correlations, and that such noise leads to worse performance of existing methods for detecting aberrations in array-CGH than the Gaussian noise case. We further develop a novel method, which optimally exploits the character of the noise, and is capable of identifying both aberration regions and the boundary break points very accurately. Finally, we propose a new concept, posteriori signal-to-noise ratio (p-SNR), to assign a confidence level to an aberration region and the boundaries detected.
Affiliation(s)
- Jian-Bo Gao
- *Correspondence may also be addressed to Jian-Bo Gao.
- Yinhe Cao
- Department of Electrical and Computer Engineering, University of Florida Gainesville, FL 32611, Biosieve 1026 Springfield Drive, Campbell, CA 95008 and Department of Medicine, Mount Sinai School of Medicine One Gustave L. Levy Place, New York, NY 10029, USA
- Erwin Bottinger
- Department of Electrical and Computer Engineering, University of Florida Gainesville, FL 32611, Biosieve 1026 Springfield Drive, Campbell, CA 95008 and Department of Medicine, Mount Sinai School of Medicine One Gustave L. Levy Place, New York, NY 10029, USA
- Weijia Zhang
- Department of Electrical and Computer Engineering, University of Florida Gainesville, FL 32611, Biosieve 1026 Springfield Drive, Campbell, CA 95008 and Department of Medicine, Mount Sinai School of Medicine One Gustave L. Levy Place, New York, NY 10029, USA
- *To whom correspondence should be addressed. +1 21224128831 2128492643
253
Oosting J, Lips EH, van Eijk R, Eilers PHC, Szuhai K, Wijmenga C, Morreau H, van Wezel T. High-resolution copy number analysis of paraffin-embedded archival tissue using SNP BeadArrays. Genome Res 2007; 17:368-76. [PMID: 17267813 PMCID: PMC1800928 DOI: 10.1101/gr.5686107] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.2] [Indexed: 01/14/2023]
Abstract
High-density SNP microarrays provide insight into the genomic events that occur in diseases like cancer through their capability to measure both LOH and genomic copy numbers. Where currently available methods are restricted to the use of fresh frozen tissue, we now describe the design and validation of copy number measurements using the Illumina BeadArray platform and the application of this technique to formalin-fixed, paraffin-embedded (FFPE) tissue. In fresh frozen tissue from a set of colorectal tumors with numerous chromosomal aberrations, our method measures copy number patterns that are comparable to values from established platforms, like Affymetrix GeneChip and BAC array-CGH. Moreover, paired comparisons of fresh frozen and FFPE tissues showed nearly identical patterns of genomic change. We conclude that this method enables the use of paraffin-embedded material for research into both LOH and numerical chromosomal abnormalities. These findings make the large pathological archives available for genomic analysis, which could be especially relevant for hereditary disease where fresh material from affected relatives is rarely available.
Affiliation(s)
- Jan Oosting
- Department of Pathology, Leiden University Medical Center, 2333 ZA Leiden, The Netherlands.
254
Venkatraman ES, Olshen AB. A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 2007; 23:657-63. [PMID: 17234643 DOI: 10.1093/bioinformatics/btl646] [Citation(s) in RCA: 681] [Impact Index Per Article: 37.8] [Indexed: 11/12/2022]
Abstract
MOTIVATION Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number. The algorithm tests for change-points using a maximal t-statistic with a permutation reference distribution to obtain the corresponding P-value. The number of computations required for the maximal test statistic is O(N^2), where N is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers and highlights the need for a faster algorithm. RESULTS We present a hybrid approach to obtain the P-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present the analyses of array CGH data from breast cancer cell lines to show the impact of the new approaches on the analysis of real data. AVAILABILITY An R version of the CBS algorithm has been implemented in the "DNAcopy" package of the Bioconductor project. The proposed hybrid method for the P-value is available in version 1.2.1 or higher and the stopping rule for declaring a change early is available in version 1.5.1 or higher.
Affiliation(s)
- E S Venkatraman
- Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10021, USA.
255
Alibés A, Morrissey ER, Cañada A, Rueda OM, Casado D, Yankilevich P, Díaz-Uriarte R. Asterias: A Parallelized Web-based Suite for the Analysis of Expression and aCGH Data. Cancer Inform 2007. [DOI: 10.1177/117693510700300007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Affiliation(s)
- Andreu Alibés
- Statistical Computing Team, Structural and Computational Biology Programme, Spanish National Cancer Center (CNIO), Melchor Fernández Almagro 3, Madrid, 28029, Spain
- Edward R. Morrissey
- Statistical Computing Team, Structural and Computational Biology Programme, Spanish National Cancer Center (CNIO), Melchor Fernández Almagro 3, Madrid, 28029, Spain
- Andrés Cañada
- Statistical Computing Team, Structural and Computational Biology Programme, Spanish National Cancer Center (CNIO), Melchor Fernández Almagro 3, Madrid, 28029, Spain
- Oscar M. Rueda
- Statistical Computing Team, Structural and Computational Biology Programme, Spanish National Cancer Center (CNIO), Melchor Fernández Almagro 3, Madrid, 28029, Spain
- David Casado
- Statistical Computing Team, Structural and Computational Biology Programme, Spanish National Cancer Center (CNIO), Melchor Fernández Almagro 3, Madrid, 28029, Spain
- Patricio Yankilevich
- Statistical Computing Team, Structural and Computational Biology Programme, Spanish National Cancer Center (CNIO), Melchor Fernández Almagro 3, Madrid, 28029, Spain
- Ramón Díaz-Uriarte
- Statistical Computing Team, Structural and Computational Biology Programme, Spanish National Cancer Center (CNIO), Melchor Fernández Almagro 3, Madrid, 28029, Spain
256
Juric D, Bredel C, Sikic BI, Bredel M. Integrated high-resolution genome-wide analysis of gene dosage and gene expression in human brain tumors. Methods Mol Biol 2007; 377:187-202. [PMID: 17634618 DOI: 10.1007/978-1-59745-390-5_12] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 05/16/2023]
Abstract
A hallmark genomic feature of human brain tumors is the presence of multiple complex structural and numerical chromosomal aberrations that result in altered gene dosages. These genetic alterations lead to widespread, genome-wide gene expression changes. Both gene expression as well as gene copy number profiles can be assessed on a large scale using microarray methodology. The integration of genetic data with gene expression data provides a particularly effective approach for cancer gene discovery. Utilizing an array of bioinformatics tools, we describe an analysis algorithm that allows for the integration of gene copy number and gene expression profiles as a first-pass means of identifying potential cancer gene targets in human (brain) tumors. This strategy combines circular binary segmentation for the identification of gene copy number alterations, and gene copy number and gene expression data integration with a modification of signal-to-noise ratio computation and random permutation testing. We have evaluated this approach and confirmed its efficacy in the human glioma genome.
Affiliation(s)
- Dejan Juric
- Division of Oncology, Center for Clinical Sciences Research, Stanford University School of Medicine, Stanford, CA, USA
257
Vauhkonen H, Vauhkonen M, Sajantila A, Sipponen P, Knuutila S. Characterizing genetically stable and unstable gastric cancers by microsatellites and array comparative genomic hybridization. Cancer Genet Cytogenet 2006; 170:133-9. [PMID: 17011984 DOI: 10.1016/j.cancergencyto.2006.06.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 03/15/2005] [Revised: 05/24/2006] [Accepted: 06/01/2006] [Indexed: 01/02/2023]
Abstract
Gastric cancer (GCA) displays a variety of genomic aberrations, including DNA copy number alterations, microsatellite instability (MSI), and loss of heterozygosity (LOH). The main aim of the present work was to determine the copy number aberrations in tumors with and without MSI or LOH. Fifteen fresh-frozen GCA samples, 11 of the intestinal and 4 of the diffuse type, were grouped by microsatellite analysis into high-level MSI (MSI-H, n = 2), LOH (n = 5), and microsatellite stable, LOH not detected (MSS/LOH-N, n = 8) tumors. The DNA samples were subsequently analyzed by array comparative genomic hybridization with 16,000 cDNA clones. As expected, the LOH tumors showed more copy number changes; however, the frequency of small-size amplifications was similar across all tumor groups. In addition, the cDNA arrays detected two apparently single-gene amplicons, at 11q13 (CCND1) and 12p12.1 (K-RAS), the presence of which was confirmed using oligonucleotide arrays. A novel amplicon at 5q13.2 was found only in diffuse-type tumors, which were otherwise genetically stable. The results suggest that DNA copy number changes may also occur in gastric cancers that show genomic stability in microsatellite analysis.
Affiliation(s)
- Hanna Vauhkonen
- Department of Pathology, Haartman Institute and HUSLAB, University of Helsinki and Helsinki University Central Hospital, POB 21 (Haartmaninkatu 3), FI-00014, Helsinki, Finland.
258
Liu F, Park PJ, Lai W, Maher E, Chakravarti A, Durso L, Jiang X, Yu Y, Brosius A, Thomas M, Chin L, Brennan C, DePinho RA, Kohane I, Carroll RS, Black PM, Johnson MD. A genome-wide screen reveals functional gene clusters in the cancer genome and identifies EphA2 as a mitogen in glioblastoma. Cancer Res 2006; 66:10815-23. [PMID: 17090523 DOI: 10.1158/0008-5472.can-06-1408] [Citation(s) in RCA: 94] [Impact Index Per Article: 4.9] [Indexed: 11/16/2022]
Abstract
A novel genome-wide screen that combines patient outcome analysis with array comparative genomic hybridization and mRNA expression profiling was developed to identify genes with copy number alterations, aberrant mRNA expression, and relevance to survival in glioblastoma. The method led to the discovery of physical gene clusters within the cancer genome with boundaries defined by physical proximity, correlated mRNA expression patterns, and survival relatedness. These boundaries delineate a novel genomic interval called the functional common region (FCR). Many FCRs contained genes of high biological relevance to cancer and were used to pinpoint functionally significant DNA alterations that were too small or infrequent to be reliably identified using standard algorithms. One such FCR contained the EphA2 receptor tyrosine kinase. Validation experiments showed that EphA2 mRNA overexpression correlated inversely with patient survival in a panel of 21 glioblastomas, and ligand-mediated EphA2 receptor activation increased glioblastoma proliferation and tumor growth via a mitogen-activated protein kinase-dependent pathway. This novel genome-wide approach greatly expanded the list of target genes in glioblastoma and represents a powerful new strategy to identify the upstream determinants of tumor phenotype in a range of human cancers.
Affiliation(s)
- Fenghua Liu
- Department of Neurological Surgery, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
259
Ribeiro FR, Henrique R, Hektoen M, Berg M, Jerónimo C, Teixeira MR, Lothe RA. Comparison of chromosomal and array-based comparative genomic hybridization for the detection of genomic imbalances in primary prostate carcinomas. Mol Cancer 2006; 5:33. [PMID: 16952311 PMCID: PMC1570364 DOI: 10.1186/1476-4598-5-33] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 08/15/2006] [Accepted: 09/04/2006] [Indexed: 12/31/2022]
Abstract
BACKGROUND In order to gain new insights into the molecular mechanisms involved in prostate cancer, we performed array-based comparative genomic hybridization (aCGH) on a series of 46 primary prostate carcinomas using a 1 Mbp whole-genome coverage platform. As chromosomal comparative genomic hybridization (cCGH) data was available for these samples, we compared the sensitivity and overall concordance of the two methodologies, and used the combined information to infer the best of three different aCGH scoring approaches. RESULTS Our data demonstrate that the reliability of aCGH in the analysis of primary prostate carcinomas depends to some extent on the scoring approach used, with the breakpoint estimation method being the most sensitive and reliable. The pattern of copy number changes detected by aCGH was concordant with that of cCGH, but the higher resolution technique detected 2.7 times more aberrations and 15.2% more carcinomas with genomic imbalances. We additionally show that several aberrations were consistently overlooked using cCGH, such as small deletions at 5q, 6q, 12p, and 17p. The latter were validated by fluorescence in situ hybridization targeting TP53, although only one carcinoma harbored a point mutation in this gene. Strikingly, homozygous deletions at 10q23.31, encompassing the PTEN locus, were seen in 58% of the cases with 10q loss. CONCLUSION We conclude that aCGH can significantly improve the detection of genomic aberrations in cancer cells as compared to previously established whole-genome methodologies, although contamination with normal cells may influence the sensitivity and specificity of some scoring approaches. Our work delineated recurrent copy number changes and revealed novel amplified loci and frequent homozygous deletions in primary prostate carcinomas, which may guide future work aimed at identifying the relevant target genes. In particular, biallelic loss seems to be a frequent mechanism of inactivation of the PTEN gene in prostate carcinogenesis.
Affiliation(s)
- Franclim R Ribeiro
- Department of Genetics, Portuguese Oncology Institute – Porto, Porto, Portugal
- Department of Cancer Prevention, Institute for Cancer Research, The Norwegian Radium Hospital, Oslo, Norway
- Rui Henrique
- Department of Pathology, Portuguese Oncology Institute – Porto, Porto, Portugal
- Department of Pathology and Molecular Immunology, Institute of Biomedical Sciences, University of Porto, Porto, Portugal
- Merete Hektoen
- Department of Cancer Prevention, Institute for Cancer Research, The Norwegian Radium Hospital, Oslo, Norway
- Marianne Berg
- Department of Cancer Prevention, Institute for Cancer Research, The Norwegian Radium Hospital, Oslo, Norway
- Carmen Jerónimo
- Department of Genetics, Portuguese Oncology Institute – Porto, Porto, Portugal
- Department of Pathology and Molecular Immunology, Institute of Biomedical Sciences, University of Porto, Porto, Portugal
- Fernando Pessoa University, Porto, Portugal
- Manuel R Teixeira
- Department of Genetics, Portuguese Oncology Institute – Porto, Porto, Portugal
- Department of Pathology and Molecular Immunology, Institute of Biomedical Sciences, University of Porto, Porto, Portugal
- Ragnhild A Lothe
- Department of Cancer Prevention, Institute for Cancer Research, The Norwegian Radium Hospital, Oslo, Norway
- Department of Molecular Biosciences, University of Oslo, Oslo, Norway
260
Peiffer DA, Le JM, Steemers FJ, Chang W, Jenniges T, Garcia F, Haden K, Li J, Shaw CA, Belmont J, Cheung SW, Shen RM, Barker DL, Gunderson KL. High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping. Genome Res 2006; 16:1136-48. [PMID: 16899659 PMCID: PMC1557768 DOI: 10.1101/gr.5402306] [Citation(s) in RCA: 401] [Impact Index Per Article: 21.1] [Indexed: 01/21/2023]
Abstract
Array-CGH is a powerful tool for the detection of chromosomal aberrations. The introduction of high-density SNP genotyping technology to genomic profiling, termed SNP-CGH, represents a further advance, since simultaneous measurement of both signal intensity variations and changes in allelic composition makes it possible to detect both copy number changes and copy-neutral loss-of-heterozygosity (LOH) events. We demonstrate the utility of SNP-CGH with two Infinium whole-genome genotyping BeadChips, assaying 109,000 and 317,000 SNP loci, to detect chromosomal aberrations in samples bearing constitutional aberrations as well as tumor samples at sub-100 kb effective resolution. Detected aberrations include homozygous deletions, hemizygous deletions, copy-neutral LOH, duplications, and amplifications. The statistical ability to detect common aberrations was modeled by analysis of an X chromosome titration model system, and sensitivity was modeled by titration of gDNA from a tumor cell with that of its paired normal cell line. Analysis was facilitated by using a genome browser that plots log ratios of normalized intensities and allelic ratios along the chromosomes. We developed two modes of SNP-CGH analysis, a single sample and a paired sample mode. The single sample mode computes log intensity ratios and allelic ratios by referencing to canonical genotype clusters generated from approximately 120 reference samples, whereas the paired sample mode uses a paired normal reference sample from the same individual. Finally, the two analysis modes are compared and contrasted for their utility in analyzing different types of input gDNA: low input amounts, fragmented gDNA, and Phi29 whole-genome pre-amplified DNA.
261
McGhee SA, McCabe ERB. Genome-wide testing: genomic medicine: commentary on the article by Bar-Shira et al. on page 353. Pediatr Res 2006; 60:243-4. [PMID: 16923947 DOI: 10.1203/01.pdr.0000233116.85413.cd] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/06/2022]
Affiliation(s)
- Sean A McGhee
- Department of Pediatrics, David Geffen School of Medicine, Mattel Children's Hospital, University of California, Los Angeles, CA 90095, USA.
262
Diskin SJ, Eck T, Greshock J, Mosse YP, Naylor T, Stoeckert CJ, Weber BL, Maris JM, Grant GR. STAC: A method for testing the significance of DNA copy number aberrations across multiple array-CGH experiments. Genome Res 2006; 16:1149-58. [PMID: 16899652 PMCID: PMC1557772 DOI: 10.1101/gr.5076506] [Citation(s) in RCA: 124] [Impact Index Per Article: 6.5] [Indexed: 01/01/2023]
Abstract
Regions of gain and loss of genomic DNA occur in many cancers and can drive the genesis and progression of disease. These copy number aberrations (CNAs) can be detected at high resolution by using microarray-based techniques. However, robust statistical approaches are needed to identify nonrandom gains and losses across multiple experiments/samples. We have developed a method called Significance Testing for Aberrant Copy number (STAC) to address this need. STAC utilizes two complementary statistics in combination with a novel search strategy. The significance of both statistics is assessed, and P-values are assigned to each location on the genome by using a multiple testing corrected permutation approach. We validate our method by using two published cancer data sets. STAC identifies genomic alterations known to be of clinical and biological significance and provides statistical support for 85% of previously reported regions. Moreover, STAC identifies numerous additional regions of significant gain/loss in these data that warrant further investigation. The P-values provided by STAC can be used to prioritize regions for follow-up study in an unbiased fashion. We conclude that STAC is a powerful tool for identifying nonrandom genomic amplifications and deletions across multiple experiments. A Java version of STAC is freely available for download at http://cbil.upenn.edu/STAC.
Affiliation(s)
- Sharon J. Diskin
- Division of Oncology, Children's Hospital of Philadelphia and Department of Pediatrics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104, USA
- Penn Center for Bioinformatics (PCBI), University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Thomas Eck
- Penn Center for Bioinformatics (PCBI), University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Joel Greshock
- Abramson Family Cancer Research Institute, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104, USA
- Yael P. Mosse
- Division of Oncology, Children's Hospital of Philadelphia and Department of Pediatrics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104, USA
- Tara Naylor
- Abramson Family Cancer Research Institute, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104, USA
- Christian J. Stoeckert
- Penn Center for Bioinformatics (PCBI), University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Department of Genetics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, 19104, USA
- Barbara L. Weber
- Abramson Family Cancer Research Institute, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104, USA
- John M. Maris
- Division of Oncology, Children's Hospital of Philadelphia and Department of Pediatrics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104, USA
- Abramson Family Cancer Research Institute, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104, USA
- Gregory R. Grant
- Penn Center for Bioinformatics (PCBI), University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
263
Abstract
MOTIVATION Presently available methods that use p-values to estimate or control the false discovery rate (FDR) implicitly assume that p-values are continuously distributed and based on two-sided tests. Therefore, it is difficult to reliably estimate the FDR when p-values are discrete or based on one-sided tests. RESULTS A simple and robust method to estimate the FDR is proposed. The proposed method does not rely on implicit assumptions that tests are two-sided or yield continuously distributed p-values. The proposed method is proven to be conservative and have desirable large-sample properties. In addition, the proposed method was among the best performers across a series of 'real data simulations' comparing the performance of five currently available methods. AVAILABILITY Libraries of S-plus and R routines to implement the method are freely available from www.stjuderesearch.org/depts/biostats.
Affiliation(s)
- Stan Pounds
- Department of Biostatistics, St. Jude Children's Research Hospital 332 N. Lauderdale Street, Memphis, TN 38135, USA
264
Le Caignec C, Spits C, Sermon K, De Rycke M, Thienpont B, Debrock S, Staessen C, Moreau Y, Fryns JP, Van Steirteghem A, Liebaers I, Vermeesch JR. Single-cell chromosomal imbalances detection by array CGH. Nucleic Acids Res 2006; 34:e68. [PMID: 16698960 PMCID: PMC3303179 DOI: 10.1093/nar/gkl336] [Citation(s) in RCA: 131] [Impact Index Per Article: 6.9] [Indexed: 01/26/2023]
Abstract
Genomic imbalances are a major cause of constitutional and acquired disorders. Therefore, aneuploidy screening has become the cornerstone of preimplantation, prenatal and postnatal genetic diagnosis, as well as a routine aspect of the diagnostic workup of many acquired disorders. Recently, array comparative genomic hybridization (array CGH) has been introduced as a rapid and high-resolution method for the detection of both benign and disease-causing genomic copy-number variations. Until now, array CGH has been performed using a significant quantity of DNA derived from a pool of cells. Here, we present an array CGH method that accurately detects chromosomal imbalances from a single lymphoblast, fibroblast and blastomere within a single day. Trisomy 13, 18, 21 and monosomy X, as well as normal ploidy levels of all other chromosomes, were accurately determined from single fibroblasts. Moreover, we showed that a segmental deletion as small as 34 Mb could be detected. Finally, we demonstrated the possibility to detect aneuploidies in single blastomeres derived from preimplantation embryos. This technique offers new possibilities for genetic analysis of single cells in general and opens the route towards aneuploidy screening and detection of unbalanced translocations in preimplantation embryos in particular.
Affiliation(s)
- Cedric Le Caignec
- Center for Human Genetics, University Hospital Gasthuisberg Leuven, Belgium
- Research Centre Reproduction and Genetics, University Hospital and Medical School, Vrije Universiteit Brussel Brussels, Belgium
- Claudia Spits
- Research Centre Reproduction and Genetics, University Hospital and Medical School, Vrije Universiteit Brussel Brussels, Belgium
- Karen Sermon
- Research Centre Reproduction and Genetics, University Hospital and Medical School, Vrije Universiteit Brussel Brussels, Belgium
- Martine De Rycke
- Research Centre Reproduction and Genetics, University Hospital and Medical School, Vrije Universiteit Brussel Brussels, Belgium
- Bernard Thienpont
- Center for Human Genetics, University Hospital Gasthuisberg Leuven, Belgium
- Sophie Debrock
- Leuven University Fertility Center, University Hospital Gasthuisberg Leuven, Belgium
- Catherine Staessen
- Research Centre Reproduction and Genetics, University Hospital and Medical School, Vrije Universiteit Brussel Brussels, Belgium
- Jean-Pierre Fryns
- Center for Human Genetics, University Hospital Gasthuisberg Leuven, Belgium
- Andre Van Steirteghem
- Research Centre Reproduction and Genetics, University Hospital and Medical School, Vrije Universiteit Brussel Brussels, Belgium
- Inge Liebaers
- Research Centre Reproduction and Genetics, University Hospital and Medical School, Vrije Universiteit Brussel Brussels, Belgium
- Joris R. Vermeesch
- Center for Human Genetics, University Hospital Gasthuisberg Leuven, Belgium
- To whom correspondence should be addressed. Tel: +32 1634 5941; Fax: +32 1634 6060;
|
265
|
Han W, Han MR, Kang JJ, Bae JY, Lee JH, Bae YJ, Lee JE, Shin HJ, Hwang KT, Hwang SE, Kim SW, Noh DY. Genomic alterations identified by array comparative genomic hybridization as prognostic markers in tamoxifen-treated estrogen receptor-positive breast cancer. BMC Cancer 2006; 6:92. [PMID: 16608533 PMCID: PMC1459182 DOI: 10.1186/1471-2407-6-92] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Received: 11/10/2005] [Accepted: 04/12/2006] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND: A considerable proportion of estrogen receptor (ER)-positive breast cancer recurs despite tamoxifen treatment, which is a serious problem commonly encountered in clinical practice. We tried to find novel prognostic markers in this subtype of breast cancer. METHODS: We performed array comparative genomic hybridization (CGH) with 1,440 human bacterial artificial chromosome (BAC) clones to assess copy number changes in 28 fresh-frozen ER-positive breast cancer tissues. All of the patients included had received at least 1 year of tamoxifen treatment. Nine patients had distant recurrence within 5 years of diagnosis (Recurrence group) and 19 patients were alive without disease at least 5 years after diagnosis (Non-recurrence group). RESULTS: Potential prognostic variables were comparable between the two groups. In an unsupervised clustering analysis, samples from each group were well separated. The most common regions of gain in all samples were 1q32.1, 17q23.3, 8q24.11, 17q12-q21.1, and 8p11.21, and the most common regions of loss were 6q14.1-q16.3, 11q21-q24.3, and 13q13.2-q14.3, as called by CGH-Explorer software. The average frequency of copy number changes was similar between the two groups. The most significant chromosomal alterations found more often in the Recurrence group using two different statistical methods were loss of 11p15.5-p15.4, 1p36.33, 11q13.1, and 11p11.2 (adjusted p values < 0.001). In subgroup analysis according to lymph node status, loss of 11p15 and 1p36 was found more often in the Recurrence group, with borderline significance, among the lymph node-positive patients (adjusted p = 0.052). CONCLUSION: Our array CGH analysis with BAC clones could detect various genomic alterations in ER-positive breast cancers, and Recurrence group samples showed a significantly different pattern of DNA copy number changes than did Non-recurrence group samples.
MESH Headings
- Adult
- Aged
- Antineoplastic Agents, Hormonal/therapeutic use
- Antineoplastic Combined Chemotherapy Protocols/administration & dosage
- Antineoplastic Combined Chemotherapy Protocols/therapeutic use
- Breast Neoplasms/chemistry
- Breast Neoplasms/drug therapy
- Breast Neoplasms/genetics
- Breast Neoplasms/radiotherapy
- Breast Neoplasms/surgery
- Carcinoma, Ductal, Breast/chemistry
- Carcinoma, Ductal, Breast/drug therapy
- Carcinoma, Ductal, Breast/genetics
- Carcinoma, Ductal, Breast/radiotherapy
- Carcinoma, Ductal, Breast/secondary
- Carcinoma, Ductal, Breast/surgery
- Chemotherapy, Adjuvant
- Chromosomes, Artificial, Bacterial
- Cluster Analysis
- Combined Modality Therapy
- Cyclophosphamide/administration & dosage
- DNA, Neoplasm/genetics
- Disease-Free Survival
- Estrogen Receptor Modulators/therapeutic use
- Estrogens
- Female
- Fluorouracil/administration & dosage
- Humans
- Life Tables
- Mastectomy
- Methotrexate/administration & dosage
- Middle Aged
- Neoplasm Metastasis
- Neoplasm Proteins/analysis
- Neoplasms, Hormone-Dependent/chemistry
- Neoplasms, Hormone-Dependent/drug therapy
- Neoplasms, Hormone-Dependent/genetics
- Neoplasms, Hormone-Dependent/radiotherapy
- Neoplasms, Hormone-Dependent/surgery
- Nucleic Acid Hybridization
- Oligonucleotide Array Sequence Analysis
- Prognosis
- Radiotherapy, Adjuvant
- Receptors, Estrogen/analysis
- Tamoxifen/therapeutic use
Affiliation(s)
- Wonshik Han
- Department of Surgery, Seoul National University College of Medicine, Seoul, Korea
- Mi-Ryung Han
- Cancer Research Institute, Seoul National University College of Medicine, Seoul, Korea
- Ji-Yeon Bae
- Cancer Research Institute, Seoul National University College of Medicine, Seoul, Korea
- Jeong Eon Lee
- Department of Surgery, Seoul National University College of Medicine, Seoul, Korea
- Hyuk-Jae Shin
- Department of Surgery, Seoul National University College of Medicine, Seoul, Korea
- Ki-Tae Hwang
- Department of Surgery, Seoul National University College of Medicine, Seoul, Korea
- Sung-Eun Hwang
- Department of Surgery, Seoul National University College of Medicine, Seoul, Korea
- Sung-Won Kim
- Department of Surgery, Seoul National University College of Medicine, Seoul, Korea
- Dong-Young Noh
- Department of Surgery, Seoul National University College of Medicine, Seoul, Korea
- Cancer Research Institute, Seoul National University College of Medicine, Seoul, Korea
|
266
|
Marioni JC, Thorne NP, Tavaré S. BioHMM: a heterogeneous hidden Markov model for segmenting array CGH data. Bioinformatics 2006; 22:1144-6. [PMID: 16533818 DOI: 10.1093/bioinformatics/btl089] [Citation(s) in RCA: 99] [Impact Index Per Article: 5.2] [Indexed: 11/12/2022] Open
Abstract
SUMMARY: We have developed a new method (BioHMM) for segmenting array comparative genomic hybridization data into states with the same underlying copy number. By utilizing a heterogeneous hidden Markov model, BioHMM incorporates relevant biological factors (e.g. the distance between adjacent clones) in the segmentation process.
Affiliation(s)
- J C Marioni
- Hutchison-MRC Research Centre, Department of Oncology, Computational Biology Group, University of Cambridge, Hills Road, Cambridge.
|
267
|
Huang J, Wei W, Chen J, Zhang J, Liu G, Di X, Mei R, Ishikawa S, Aburatani H, Jones KW, Shapero MH. CARAT: a novel method for allelic detection of DNA copy number changes using high density oligonucleotide arrays. BMC Bioinformatics 2006; 7:83. [PMID: 16504045 PMCID: PMC1402331 DOI: 10.1186/1471-2105-7-83] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.7] [Received: 07/02/2005] [Accepted: 02/21/2006] [Indexed: 12/13/2022] Open
Abstract
Background: DNA copy number alterations are one of the main characteristics of the cancer cell karyotype and can contribute to the complex phenotype of these cells. These alterations can lead to gains in cellular oncogenes as well as losses in tumor suppressor genes and can span small intervals as well as involve entire chromosomes. The ability to accurately detect these changes is central to understanding how they impact the biology of the cell. Results: We describe a novel algorithm called CARAT (Copy Number Analysis with Regression And Tree) that uses probe intensity information to infer copy number in an allele-specific manner from high density DNA oligonucleotide arrays designed to genotype over 100,000 SNPs. Total and allele-specific copy number estimations using CARAT are independently evaluated for a subset of SNPs using quantitative PCR and allelic TaqMan reactions with several human breast cancer cell lines. The sensitivity and specificity of the algorithm are characterized using DNA samples containing differing numbers of X chromosomes as well as a test set of normal individuals. Results from the algorithm show a high degree of agreement with results from independent verification methods. Conclusion: Overall, CARAT automatically detects regions with copy number variations and assigns a significance score to each alteration as well as generating allele-specific output. When coupled with SNP genotype calls from the same array, CARAT provides additional detail into the structure of genome wide alterations that can contribute to allelic imbalance.
Affiliation(s)
- Jing Huang
- Affymetrix, Inc. 3420 Central Expressway, Santa Clara CA 95051, USA
- Wen Wei
- Affymetrix, Inc. 3420 Central Expressway, Santa Clara CA 95051, USA
- Joyce Chen
- Affymetrix, Inc. 3420 Central Expressway, Santa Clara CA 95051, USA
- Jane Zhang
- Affymetrix, Inc. 3420 Central Expressway, Santa Clara CA 95051, USA
- Guoying Liu
- Affymetrix, Inc. 3420 Central Expressway, Santa Clara CA 95051, USA
- Xiaojun Di
- Affymetrix, Inc. 3420 Central Expressway, Santa Clara CA 95051, USA
- Rui Mei
- Affymetrix, Inc. 3420 Central Expressway, Santa Clara CA 95051, USA
- Shumpei Ishikawa
- University of Tokyo, Genome Science Division Research Center for Advanced Science and Technology, 4-6-1 Komaba, Meguro, 153-8904, Tokyo
- Hiroyuki Aburatani
- University of Tokyo, Genome Science Division Research Center for Advanced Science and Technology, 4-6-1 Komaba, Meguro, 153-8904, Tokyo
- Keith W Jones
- Affymetrix, Inc. 3420 Central Expressway, Santa Clara CA 95051, USA
|
268
|
van den IJssel P, Tijssen M, Chin SF, Eijk P, Carvalho B, Hopmans E, Holstege H, Bangarusamy DK, Jonkers J, Meijer GA, Caldas C, Ylstra B. Human and mouse oligonucleotide-based array CGH. Nucleic Acids Res 2005; 33:e192. [PMID: 16361265 PMCID: PMC1316119 DOI: 10.1093/nar/gni191] [Citation(s) in RCA: 143] [Impact Index Per Article: 7.2] [Indexed: 11/13/2022] Open
Abstract
Array-based comparative genomic hybridization is a high-resolution method for measuring chromosomal copy number changes. Here we present a validated protocol using in-house spotted oligonucleotide libraries for array comparative genomic hybridization (CGH). This oligo array CGH platform yields reproducible results and is capable of detecting single copy gains, multi-copy amplifications as well as homozygous and heterozygous deletions as small as 100 kb with high resolution. A human oligonucleotide library was printed on amine binding slides. Arrays were hybridized using a hybstation and analysed using BlueFuse feature extraction software, with >95% of spots passing quality control. The protocol allows as little as 300 ng of input DNA and a 90% reduction of Cot-1 DNA without compromising quality. High quality results have also been obtained with DNA from archival tissue. Finally, in addition to human oligo arrays, we have applied the protocol successfully to mouse oligo arrays. We believe that this oligo-based platform using ‘off-the-shelf’ oligo libraries provides an easily accessible alternative to BAC arrays for CGH, which is cost-effective, available at high resolution and easily implemented for any sequenced organism without compromising the quality of the results.
Affiliation(s)
- Suet-Feung Chin
- Cancer Genomics Program, Department of Oncology, University of Cambridge, Cambridge, UK
- Henne Holstege
- Division of Molecular Biology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Jos Jonkers
- Division of Molecular Biology, The Netherlands Cancer Institute, Amsterdam, The Netherlands
- Carlos Caldas
- Cancer Genomics Program, Department of Oncology, University of Cambridge, Cambridge, UK
- Bauke Ylstra
- To whom correspondence should be addressed at Microarray Core Facility, VU University Medical Center, van der Boechorststraat 7-9, 1081 BT Amsterdam, The Netherlands. Tel: +31 20 444 8299; Fax: +31 20 444 8318;
|
269
|
Willenbrock H, Fridlyand J. A comparison study: applying segmentation to array CGH data for downstream analyses. Bioinformatics 2005; 21:4084-91. [PMID: 16159913 DOI: 10.1093/bioinformatics/bti677] [Citation(s) in RCA: 210] [Impact Index Per Article: 10.5] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Array comparative genomic hybridization (CGH) allows detection and mapping of copy number of DNA segments. A challenge is to make inferences about the copy number structure of the genome. Several statistical methods have been proposed to determine genomic segments with different copy number levels. However, to date, no comprehensive comparison of various characteristics of these methods exists. Moreover, the segmentation results have not been utilized in downstream analyses. RESULTS: We describe a comparison of three popular and publicly available methods for the analysis of array CGH data and we demonstrate how segmentation results may be utilized in downstream analyses such as testing and classification, yielding higher power and prediction accuracy. Since the methods operate on individual chromosomes, we also propose a novel procedure for merging segments across the genome, which results in an interpretable set of copy number levels and thus facilitates identification of copy number alterations in each genome. AVAILABILITY: http://www.bioconductor.org
Affiliation(s)
- Hanni Willenbrock
- Center for Biological Sequence Analysis, Department of Biotechnology, Technical University of Denmark, Kgs. Lyngby
|