51
|
Mayrink VD, Lucas JE. Bayesian factor models for the detection of coherent patterns in gene expression data. Braz J Probab Stat 2015. [DOI: 10.1214/13-bjps226] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]
|
52
|
Seiser EL, Innocenti F. Hidden Markov Model-Based CNV Detection Algorithms for Illumina Genotyping Microarrays. Cancer Inform 2015; 13:77-83. [PMID: 25657572 PMCID: PMC4310714 DOI: 10.4137/cin.s16345] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 08/13/2014] [Revised: 11/18/2014] [Accepted: 11/21/2014] [Indexed: 12/24/2022] Open
Abstract
Somatic alterations in DNA copy number have been well studied in numerous malignancies, yet the role of germline DNA copy number variation in cancer is still emerging. Genotyping microarrays generate allele-specific signal intensities to determine genotype, but may also be used to infer DNA copy number using additional computational approaches. Numerous tools have been developed to analyze Illumina genotype microarray data for copy number variant (CNV) discovery, although commonly utilized algorithms freely available to the public employ approaches based upon the use of hidden Markov models (HMMs). QuantiSNP, PennCNV, and GenoCN utilize HMMs with six copy number states but vary in how transition and emission probabilities are calculated. Performance of these CNV detection algorithms has been shown to be variable between both genotyping platforms and data sets, although HMM approaches generally outperform other current methods. Low sensitivity is prevalent with HMM-based algorithms, suggesting the need for continued improvement in CNV detection methodologies.
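The HMM decoding step that these tools share can be illustrated with a minimal Viterbi sketch. It is not the implementation of QuantiSNP, PennCNV, or GenoCN: it assumes only three states (loss, normal, gain) rather than six, Gaussian emissions on a log R ratio-like signal, and a single hypothetical `stay_prob` for all transitions. The state means and standard deviation are illustrative values, not taken from any of the tools.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(obs, means, sd, stay_prob):
    """Most likely state path for a Gaussian-emission HMM.

    Assumptions (illustrative only): uniform initial distribution, a shared
    emission sd, and the same probability of leaving a state toward any other.
    """
    n_states = len(means)
    switch_prob = (1.0 - stay_prob) / (n_states - 1)
    log_trans = [
        [math.log(stay_prob if i == j else switch_prob) for j in range(n_states)]
        for i in range(n_states)
    ]
    log_init = math.log(1.0 / n_states)

    # score[j] = best log probability of any path ending in state j
    score = [log_init + gaussian_logpdf(obs[0], m, sd) for m in means]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for x in obs[1:]:
        new_score, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best_i)
            new_score.append(
                score[best_i] + log_trans[best_i][j] + gaussian_logpdf(x, means[j], sd)
            )
        score = new_score
        back.append(pointers)

    # Trace back from the best final state
    state = max(range(n_states), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical signal: normal, loss, normal, gain (states 1, 0, 1, 2)
signal = [0.0, 0.02, -0.03, -0.6, -0.62, -0.58, 0.01, 0.0, -0.02, 0.4, 0.42, 0.39]
print(viterbi(signal, means=[-0.6, 0.0, 0.4], sd=0.15, stay_prob=0.99))
```

A high `stay_prob` penalizes state changes, so isolated noisy probes tend to be absorbed into the surrounding segment. This trade-off is one plausible contributor to the low sensitivity for short CNVs noted in the abstract.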
Affiliation(s)
- Eric L Seiser
- Center for Pharmacogenomics and Individualized Therapy, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Federico Innocenti
- Center for Pharmacogenomics and Individualized Therapy, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; UNC Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| |
|
53
|
Chen H, Bell JM, Zavala NA, Ji HP, Zhang NR. Allele-specific copy number profiling by next-generation DNA sequencing. Nucleic Acids Res 2014; 43:e23. [PMID: 25477383 PMCID: PMC4344483 DOI: 10.1093/nar/gku1252] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Indexed: 12/20/2022] Open
Abstract
The progression and clonal development of tumors often involve amplifications and deletions of genomic DNA. Estimation of allele-specific copy number, which quantifies the number of copies of each allele at each variant locus rather than the total number of chromosome copies, is an important step in the characterization of tumor genomes and the inference of their clonal history. We describe a new method, falcon, for finding somatic allele-specific copy number changes by next generation sequencing of tumors with matched normals. falcon is based on a change-point model on a bivariate mixed Binomial process, which explicitly models the copy numbers of the two chromosome haplotypes and corrects for local allele-specific coverage biases. By using the Binomial distribution rather than a normal approximation, falcon more effectively pools evidence from sites with low coverage. A modified Bayesian information criterion is used to guide model selection for determining the number of copy number events. falcon is evaluated on in silico spike-in data and applied to the analysis of a pre-malignant colon tumor sample and late-stage colorectal adenocarcinoma from the same individual. The allele-specific copy number estimates obtained by falcon allow us to draw detailed conclusions regarding the clonal history of the individual's colon cancer.
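The idea of pooling low-coverage sites with a Binomial likelihood can be sketched with a single change-point scan. This is a deliberately simplified, univariate illustration and not the falcon algorithm: it ignores the matched normal, the bivariate mixed Binomial process, and the modified BIC, and the function names and read counts are hypothetical.

```python
import math


def binom_loglik(k, n, p):
    """Binomial log-likelihood up to the (p-independent) binomial coefficient."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # guard against log(0)
    return k * math.log(p) + (n - k) * math.log(1 - p)


def segment_loglik(alt, depth):
    """Maximized log-likelihood of one segment with a shared allele fraction.

    Counts are pooled across sites, so sites with low depth still contribute.
    """
    k, n = sum(alt), sum(depth)
    return binom_loglik(k, n, k / n)


def best_change_point(alt, depth):
    """Scan all split positions and return (position, likelihood-ratio statistic)."""
    null = segment_loglik(alt, depth)
    best_pos, best_stat = None, 0.0
    for t in range(1, len(alt)):
        split = segment_loglik(alt[:t], depth[:t]) + segment_loglik(alt[t:], depth[t:])
        stat = 2 * (split - null)
        if stat > best_stat:
            best_pos, best_stat = t, stat
    return best_pos, best_stat


# Hypothetical reads supporting one allele at eight heterozygous sites, depth 10 each
alt_reads = [5, 6, 4, 5, 9, 8, 9, 10]
depths = [10] * 8
print(best_change_point(alt_reads, depths))
```

In a full method the statistic would be compared against a model-selection criterion before a change is declared. Here the scan only reports the most likely split.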
Affiliation(s)
- Hao Chen
- Department of Statistics, University of California, One Shields Avenue, Davis, CA 95616, USA
| | - John M Bell
- Division of Oncology, School of Medicine, Stanford University, 291 Campus Dr, Stanford, CA 94305, USA
| | - Nicolas A Zavala
- Division of Oncology, School of Medicine, Stanford University, 291 Campus Dr, Stanford, CA 94305, USA
| | - Hanlee P Ji
- Division of Oncology, School of Medicine, Stanford University, 291 Campus Dr, Stanford, CA 94305, USA
| | - Nancy R Zhang
- Department of Statistics, The Wharton School, University of Pennsylvania, 3730 Walnut Street, Philadelphia, PA 19104, USA
| |
|
54
|
Iranmanesh SM, Guo NL. Integrated DNA Copy Number and Gene Expression Regulatory Network Analysis of Non-small Cell Lung Cancer Metastasis. Cancer Inform 2014; 13:13-23. [PMID: 25392690 PMCID: PMC4218678 DOI: 10.4137/cin.s14055] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/29/2014] [Revised: 08/05/2014] [Accepted: 08/08/2014] [Indexed: 11/05/2022] Open
Abstract
Integrative analysis of multi-level molecular profiles can distinguish interactions that cannot be revealed based on one kind of data in the analysis of cancer susceptibility and metastasis. DNA copy number variations (CNVs) are common in cancer cells, and their role in cell behaviors and relationship to gene expression (GE) is poorly understood. An integrative analysis of CNV and genome-wide mRNA expression can discover copy number alterations and their possible regulatory effects on GE. This study presents a novel framework to identify important genes and construct potential regulatory networks based on these genes. Using this approach, DNA copy number aberrations and their effects on GE in lung cancer progression were revealed. Specifically, this approach contains the following steps: (1) select a pool of candidate driver genes, which have significant CNV in lung cancer patient tumors or have a significant association with the clinical outcome at the transcriptional level; (2) rank important driver genes in lung cancer patients with good prognosis and poor prognosis, respectively, and use top-ranked driver genes to construct regulatory networks with the COpy Number and EXpression In Cancer (CONEXIC) method; (3) identify experimentally confirmed molecular interactions in the constructed regulatory networks using Ingenuity Pathway Analysis (IPA); and (4) visualize the refined regulatory networks with the software package Genatomy. The constructed CNV/mRNA regulatory networks provide important insights into potential CNV-regulated transcriptional mechanisms in lung cancer metastasis.
Affiliation(s)
- Seyed M Iranmanesh
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA
| | - Nancy L Guo
- Mary Babb Randolph Cancer Center/School of Public Health, West Virginia University, Morgantown, WV, USA
| |
|
55
|
Salzo S, Masecchia S, Verri A, Barla A. Alternating proximal regularized dictionary learning. Neural Comput 2014; 26:2855-95. [PMID: 25248086 DOI: 10.1162/neco_a_00672] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/04/2022]
Abstract
We present an algorithm for dictionary learning that is based on the alternating proximal algorithm studied by Attouch, Bolte, Redont, and Soubeyran (2010), coupled with a reliable and efficient dual algorithm for computation of the related proximity operators. This algorithm is suitable for a general dictionary learning model composed of a Bregman-type data fit term that accounts for the goodness of the representation and several convex penalization terms on the coefficients and atoms, explaining the prior knowledge at hand. As Attouch et al. recently proved, an alternating proximal scheme ensures better convergence properties than the simpler alternating minimization. We take care of the issue of inexactness in the computation of the involved proximity operators, giving a sound stopping criterion for the dual inner algorithm, which keeps under control the related errors, unavoidable for such complex penalty terms, providing ultimately an overall effective procedure. Thanks to the generality of the proposed framework, we give an application in the context of genome-wide data understanding, revising the model proposed by Nowak, Hastie, Pollack, and Tibshirani (2011). The aim is to extract latent features (atoms) and perform segmentation on array-based comparative genomic hybridization (aCGH) data. We improve several important aspects that increase the quality and interpretability of the results. We show the effectiveness of the proposed model with two experiments on synthetic data, which highlight the enhancements over the original model.
Affiliation(s)
- Saverio Salzo
- DIMA, Università degli Studi di Genova, Via Dodecaneso 35, 16146 Genoa, Italy
| | | | | | | |
|
56
|
Magi A, Tattini L, Cifola I, D'Aurizio R, Benelli M, Mangano E, Battaglia C, Bonora E, Kurg A, Seri M, Magini P, Giusti B, Romeo G, Pippucci T, De Bellis G, Abbate R, Gensini GF. EXCAVATOR: detecting copy number variants from whole-exome sequencing data. Genome Biol 2014; 14:R120. [PMID: 24172663 PMCID: PMC4053953 DOI: 10.1186/gb-2013-14-10-r120] [Citation(s) in RCA: 188] [Impact Index Per Article: 17.1] [Received: 06/15/2013] [Accepted: 10/30/2013] [Indexed: 12/11/2022] Open
Abstract
We developed a novel software tool, EXCAVATOR, for the detection of copy number variants (CNVs) from whole-exome sequencing data. EXCAVATOR combines a three-step normalization procedure with a novel heterogeneous hidden Markov model algorithm and a calling method that classifies genomic regions into five copy number states. We validate EXCAVATOR on three datasets and compare the results with three other methods. These analyses show that EXCAVATOR outperforms the other methods and is therefore a valuable tool for the investigation of CNVs in large-scale projects, as well as in clinical research and diagnostics. EXCAVATOR is freely available at http://sourceforge.net/projects/excavatortool/.
|
57
|
Pierre-Jean M, Rigaill G, Neuvial P. Performance evaluation of DNA copy number segmentation methods. Brief Bioinform 2014; 16:600-15. [PMID: 25202135 PMCID: PMC4501247 DOI: 10.1093/bib/bbu026] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 03/12/2014] [Accepted: 06/10/2014] [Indexed: 11/13/2022] Open
Abstract
A number of bioinformatic or biostatistical methods are available for analyzing DNA copy number profiles measured from microarray or sequencing technologies. In the absence of rich enough gold standard data sets, the performance of these methods is generally assessed using unrealistic simulation studies, or based on small real data analyses. To make an objective and reproducible performance assessment, we have designed and implemented a framework to generate realistic DNA copy number profiles of cancer samples with known truth. These profiles are generated by resampling publicly available SNP microarray data from genomic regions with known copy-number state. The original data have been extracted from dilution series of tumor cell lines with matched blood samples at several concentrations. Therefore, the signal-to-noise ratio of the generated profiles can be controlled through the (known) percentage of tumor cells in the sample. This article describes this framework and its application to a comparison study between methods for segmenting DNA copy number profiles from SNP microarrays. This study indicates that no single method is uniformly better than all others. It also helps identify pros and cons of the compared methods as a function of biologically informative parameters, such as the fraction of tumor cells in the sample and the proportion of heterozygous markers. This comparison study may be reproduced using the open source and cross-platform R package jointseg, which implements the proposed data generation and evaluation framework: http://r-forge.r-project.org/R/?group_id=1562.
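The resampling idea, building a profile with known truth by drawing observed values from regions of known copy-number state, can be sketched as follows. This is not the jointseg API: the function `simulate_profile`, the state labels, and the pooled values are hypothetical. In the described framework each pool would come from a specific tumor-cell dilution, which is how the signal-to-noise ratio would be controlled.

```python
import random


def simulate_profile(pools, segments, seed=0):
    """Build a synthetic profile by resampling real measurements.

    pools:    dict mapping a copy-number state label to observed values
              from regions known to be in that state (hypothetical data here)
    segments: list of (state, length) pairs defining the true segmentation
    Returns the resampled signal and the per-locus true state.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    signal, truth = [], []
    for state, length in segments:
        # Sample with replacement from the pool of the segment's state
        signal.extend(rng.choice(pools[state]) for _ in range(length))
        truth.extend([state] * length)
    return signal, truth


# Hypothetical pools of log-ratio values for two known states
pools = {
    "normal": [0.01, -0.04, 0.03, 0.00, -0.02],
    "gain": [0.28, 0.35, 0.31, 0.40, 0.33],
}
signal, truth = simulate_profile(pools, [("normal", 5), ("gain", 3), ("normal", 4)])
print(truth)
```

Because the true state of every locus is returned alongside the signal, any segmentation method can be scored against `truth` directly.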
|
58
|
Muñoz-Minjares J, Cabal-Aragón J, Shmaliy YS. Confidence masks for genome DNA copy number variations in applications to HR-CGH array measurements. Biomed Signal Process Control 2014. [DOI: 10.1016/j.bspc.2014.06.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 11/26/2022]
|
59
|
Seifert M, Abou-El-Ardat K, Friedrich B, Klink B, Deutsch A. Autoregressive higher-order hidden Markov models: exploiting local chromosomal dependencies in the analysis of tumor expression profiles. PLoS One 2014; 9:e100295. [PMID: 24955771 PMCID: PMC4067306 DOI: 10.1371/journal.pone.0100295] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 04/17/2014] [Accepted: 05/22/2014] [Indexed: 12/21/2022] Open
Abstract
Changes in gene expression programs play a central role in cancer. Chromosomal aberrations such as deletions, duplications and translocations of DNA segments can lead to highly significant positive correlations of gene expression levels of neighboring genes. This should be utilized to improve the analysis of tumor expression profiles. Here, we develop a novel model class of autoregressive higher-order Hidden Markov Models (HMMs) that carefully exploit local data-dependent chromosomal dependencies to improve the identification of differentially expressed genes in tumor. Autoregressive higher-order HMMs overcome generally existing limitations of standard first-order HMMs in the modeling of dependencies between genes in close chromosomal proximity by the simultaneous usage of higher-order state-transitions and autoregressive emissions as novel model features. We apply autoregressive higher-order HMMs to the analysis of breast cancer and glioma gene expression data and perform in-depth model evaluation studies. We find that autoregressive higher-order HMMs clearly improve the identification of overexpressed genes with underlying gene copy number duplications in breast cancer in comparison to mixture models, standard first- and higher-order HMMs, and other related methods. The performance benefit is attributed to the simultaneous usage of higher-order state-transitions in combination with autoregressive emissions. This benefit could not be reached by using each of these two features independently. We also find that autoregressive higher-order HMMs are better able to identify differentially expressed genes in tumors independent of the underlying gene copy number status in comparison to the majority of related methods. 
This is further supported by the identification of well-known and of previously unreported hotspots of differential expression in glioblastomas demonstrating the efficacy of autoregressive higher-order HMMs for the analysis of individual tumor expression profiles. Moreover, we reveal interesting novel details of systematic alterations of gene expression levels in known cancer signaling pathways distinguishing oligodendrogliomas, astrocytomas and glioblastomas. An implementation is available under www.jstacs.de/index.php/ARHMM.
Affiliation(s)
- Michael Seifert
- Center for Information Services and High Performance Computing, Dresden University of Technology, Dresden, Germany
| | - Khalil Abou-El-Ardat
- Institute for Clinical Genetics, Faculty of Medicine Carl Gustav Carus, Dresden University of Technology, Dresden, Germany
| | - Betty Friedrich
- Center for Information Services and High Performance Computing, Dresden University of Technology, Dresden, Germany
| | - Barbara Klink
- Institute for Clinical Genetics, Faculty of Medicine Carl Gustav Carus, Dresden University of Technology, Dresden, Germany
| | - Andreas Deutsch
- Center for Information Services and High Performance Computing, Dresden University of Technology, Dresden, Germany
| |
|
60
|
Du Y, Murani E, Ponsuksili S, Wimmers K. biomvRhsmm: genomic segmentation with hidden semi-Markov model. Biomed Res Int 2014; 2014:910390. [PMID: 24995333 PMCID: PMC4065698 DOI: 10.1155/2014/910390] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/19/2013] [Revised: 03/03/2014] [Accepted: 03/21/2014] [Indexed: 11/25/2022]
Abstract
High-throughput technologies like tiling array and next-generation sequencing (NGS) generate continuous homogeneous segments or signal peaks in the genome that represent transcripts and transcript variants (transcript mapping and quantification), regions of deletion and amplification (copy number variation), or regions characterized by particular common features like chromatin state or DNA methylation ratio (epigenetic modifications). However, the volume and output of data produced by these technologies present challenges in analysis. Here, a hidden semi-Markov model (HSMM) is implemented and tailored to handle multiple genomic profiles, to better facilitate genome annotation by assisting in the detection of transcripts, regulatory regions, and copy number variation by holistic microarray or NGS. With support for various data distributions, instead of limiting itself to one specific application, the proposed hidden semi-Markov model is designed to allow modeling options to accommodate different types of genomic data and to serve as a general segmentation engine. By incorporating genomic positions into the sojourn distribution of HSMM, with optional prior learning using annotation or previous studies, the modeling output is more biologically sensible. The proposed model has been compared with several other state-of-the-art segmentation models through simulation benchmarking, which shows that our efficient implementation achieves comparable or better sensitivity and specificity in genomic segmentation.
Affiliation(s)
- Yang Du
- Institute for Genome Biology, Leibniz Institute for Farm Animal Biology, 18196 Dummerstorf, Germany
| | - Eduard Murani
- Institute for Genome Biology, Leibniz Institute for Farm Animal Biology, 18196 Dummerstorf, Germany
| | - Siriluck Ponsuksili
- Research Group Functional Genomics, Leibniz Institute for Farm Animal Biology, 18196 Dummerstorf, Germany
| | - Klaus Wimmers
- Institute for Genome Biology, Leibniz Institute for Farm Animal Biology, 18196 Dummerstorf, Germany
| |
|
61
|
Affiliation(s)
- Klaus Frick
- Interstate University of Applied Sciences of Technology; Buchs Switzerland
| | - Axel Munk
- University of Göttingen; Göttingen Germany
- Max Planck Institute for Biophysical Chemistry; Göttingen Germany
| | | |
|
62
|
Jonker MJ, de Leeuw WC, Marinković M, Wittink FRA, Rauwerda H, Bruning O, Ensink WA, Fluit AC, Boel CH, Jong MD, Breit TM. Absence/presence calling in microarray-based CGH experiments with non-model organisms. Nucleic Acids Res 2014; 42:e94. [PMID: 24771343 PMCID: PMC4066771 DOI: 10.1093/nar/gku343] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/07/2023] Open
Abstract
Structural variations in genomes are commonly studied by (micro)array-based comparative genomic hybridization. The data analysis methods to infer copy number variation in model organisms (human, mouse) are established. In principle, the procedures are based on signal ratios between test and reference samples and the order of the probe targets in the genome. These procedures are less applicable to experiments with non-model organisms, which frequently comprise non-sequenced genomes with an unknown order of probe targets. We therefore present an additional analysis approach, which does not depend on the structural information of a reference genome, and quantifies the presence or absence of a probe target in an unknown genome. The principle is that intensity values of target probes are compared with the intensities of negative-control probes and positive-control probes from a control hybridization, to determine if a probe target is absent or present. In a test, analyzing the genome content of a known bacterial strain: Staphylococcus aureus MRSA252, this approach proved to be successful, demonstrated by receiver operating characteristic area under the curve values larger than 0.9995. We show its usability in various applications, such as comparing genome content and validating next-generation sequencing reads from eukaryotic non-model organisms.
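The comparison of target-probe intensities against negative and positive controls, and the ROC area under the curve used to evaluate it, can be illustrated with a small sketch. The midpoint-of-medians threshold in `call_present` is an assumption made for illustration only, since the abstract does not state the decision rule, and all intensity values are hypothetical.

```python
from statistics import median


def call_present(intensity, neg_controls, pos_controls):
    """Call a probe target present if its intensity exceeds a control-based cut-off.

    Assumption: the cut-off is the midpoint between the medians of the
    negative-control and positive-control intensities.
    """
    threshold = (median(neg_controls) + median(pos_controls)) / 2
    return intensity > threshold


def roc_auc(absent_scores, present_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a truly present target scores higher than a
    truly absent one, counting ties as one half.
    """
    wins = 0.0
    for p in present_scores:
        for a in absent_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(present_scores) * len(absent_scores))


# Hypothetical log-intensities
neg = [2.0, 3.0, 4.0]
pos = [10.0, 11.0, 12.0]
print(call_present(9.0, neg, pos), call_present(5.0, neg, pos))
print(roc_auc(absent_scores=[2.5, 3.1, 4.2], present_scores=[9.8, 10.4, 11.9]))
```

An AUC close to 1, such as the values above 0.9995 reported in the abstract, means that almost every present target has a higher score than almost every absent one, whatever threshold is finally chosen.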
Affiliation(s)
- Martijs J Jonker
- MicroArray Department & Integrative Bioinformatics Unit (MAD-IBU), Swammerdam Institute for Life Sciences (SILS), Faculty of Science (FNWI), University of Amsterdam (UvA), 1098 XH, Amsterdam, the Netherlands; Netherlands Bioinformatics Centre (NBIC), 6525 GA, Nijmegen, the Netherlands
| | - Wim C de Leeuw
- MicroArray Department & Integrative Bioinformatics Unit (MAD-IBU), Swammerdam Institute for Life Sciences (SILS), Faculty of Science (FNWI), University of Amsterdam (UvA), 1098 XH, Amsterdam, the Netherlands; Netherlands Bioinformatics Centre (NBIC), 6525 GA, Nijmegen, the Netherlands
| | - Marino Marinković
- MicroArray Department & Integrative Bioinformatics Unit (MAD-IBU), Swammerdam Institute for Life Sciences (SILS), Faculty of Science (FNWI), University of Amsterdam (UvA), 1098 XH, Amsterdam, the Netherlands; Department of Aquatic Ecology and Ecotoxicology, Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, Amsterdam, the Netherlands
| | - Floyd R A Wittink
- MicroArray Department & Integrative Bioinformatics Unit (MAD-IBU), Swammerdam Institute for Life Sciences (SILS), Faculty of Science (FNWI), University of Amsterdam (UvA), 1098 XH, Amsterdam, the Netherlands
| | - Han Rauwerda
- MicroArray Department & Integrative Bioinformatics Unit (MAD-IBU), Swammerdam Institute for Life Sciences (SILS), Faculty of Science (FNWI), University of Amsterdam (UvA), 1098 XH, Amsterdam, the Netherlands; Netherlands Bioinformatics Centre (NBIC), 6525 GA, Nijmegen, the Netherlands
| | - Oskar Bruning
- MicroArray Department & Integrative Bioinformatics Unit (MAD-IBU), Swammerdam Institute for Life Sciences (SILS), Faculty of Science (FNWI), University of Amsterdam (UvA), 1098 XH, Amsterdam, the Netherlands; Netherlands Bioinformatics Centre (NBIC), 6525 GA, Nijmegen, the Netherlands
| | - Wim A Ensink
- MicroArray Department & Integrative Bioinformatics Unit (MAD-IBU), Swammerdam Institute for Life Sciences (SILS), Faculty of Science (FNWI), University of Amsterdam (UvA), 1098 XH, Amsterdam, the Netherlands
| | - Ad C Fluit
- Medical Microbiology, University Medical Center Utrecht, Utrecht, the Netherlands
| | - C H Boel
- Medical Microbiology, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Mark de Jong
- MicroArray Department & Integrative Bioinformatics Unit (MAD-IBU), Swammerdam Institute for Life Sciences (SILS), Faculty of Science (FNWI), University of Amsterdam (UvA), 1098 XH, Amsterdam, the Netherlands
| | - Timo M Breit
- MicroArray Department & Integrative Bioinformatics Unit (MAD-IBU), Swammerdam Institute for Life Sciences (SILS), Faculty of Science (FNWI), University of Amsterdam (UvA), 1098 XH, Amsterdam, the Netherlands; Netherlands Bioinformatics Centre (NBIC), 6525 GA, Nijmegen, the Netherlands
| |
|
63
|
Zhao Q, Han MJ, Sun W, Zhang Z. Copy number variations among silkworms. BMC Genomics 2014; 15:251. [PMID: 24684762 PMCID: PMC3997817 DOI: 10.1186/1471-2164-15-251] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 07/31/2013] [Accepted: 03/25/2014] [Indexed: 11/10/2022] Open
Abstract
Background Copy number variations (CNVs), which are an important source of genetic and phenotypic variation, have been shown to be associated with disease as well as important QTLs, especially in domesticated animals. However, little is known about the CNVs in silkworm. Results In this study, we have constructed the first CNV map based on genome-wide analysis of CNVs in domesticated silkworm. Using next-generation sequencing as well as quantitative PCR (qPCR), we identified ~319 CNVs in total and almost half of them (~49%) were distributed on uncharacterized chromosomes. The CNVs covered 10.8 Mb, which is about 2.3% of the entire silkworm genome. Furthermore, approximately 61% of CNVs directly overlapped with SDs in silkworm. The genes in CNVs are mainly related to reproduction, immunity, detoxification and signal recognition, which is consistent with the observations in mammals. Conclusions An initial CNV map for silkworm has been described in this study, and this map provides new information on genetic variation in silkworm. Furthermore, the silkworm CNVs may play important roles in reproduction, immunity, detoxification and signal recognition. This study provides insight into the evolution of the silkworm genome and an invaluable resource for insect genomics research.
Affiliation(s)
| | | | | | - Ze Zhang
- Laboratory of Evolutionary and Functional Genomics, School of Life Sciences, Chongqing University, Chongqing 400044, China.
| |
|
64
|
Shin DH, Lee HJ, Cho S, Kim HJ, Hwang JY, Lee CK, Jeong J, Yoon D, Kim H. Deleted copy number variation of Hanwoo and Holstein using next generation sequencing at the population level. BMC Genomics 2014; 15:240. [PMID: 24673797 PMCID: PMC4051123 DOI: 10.1186/1471-2164-15-240] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 10/13/2013] [Accepted: 03/03/2014] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND Copy number variation (CNV), a source of genetic diversity in mammals, has been shown to underlie biological functions related to production traits. Notwithstanding, there have been few studies conducted on CNVs using next generation sequencing at the population level. RESULTS Illumina NGS data were obtained for ten Holsteins, a dairy cattle breed, and 22 Hanwoo, a beef cattle breed. The sequence data for each of the 32 animals varied from 13.58-fold to almost 20-fold coverage. We detected a total of 6,811 deleted CNVs across the analyzed individuals (average length = 2732.2 bp) corresponding to 0.74% of the cattle genome (18.6 Mbp of variable sequence). By examining the overlap between CNV deletion regions and genes, we selected 30 genes with the highest deletion scores. These genes were found to be related to the nervous system, more specifically with nervous transmission, neuron motion, and neurogenesis. We regarded these genes as having been affected by the domestication process. Further analysis of the CNV genotyping information revealed 94 putative selected CNVs and 954 breed-specific CNVs. CONCLUSIONS This study provides useful information for assessing the impact of CNVs on cattle traits using NGS at the population level.
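The figures reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that the variable sequence is simply the number of deletions times their average length (that is, that the deletion regions do not overlap). The implied genome size is a derived value, not a figure stated in the abstract.

```python
# Values quoted in the abstract
n_deletions = 6811
mean_length_bp = 2732.2
genome_fraction = 0.0074  # 0.74% of the cattle genome

# Total deleted sequence, assuming non-overlapping deletion regions
total_bp = n_deletions * mean_length_bp
print(f"total variable sequence: {total_bp / 1e6:.1f} Mbp")

# Genome size implied by the reported fraction (derived, not reported)
implied_genome_bp = total_bp / genome_fraction
print(f"implied genome size: {implied_genome_bp / 1e9:.2f} Gbp")
```

The product reproduces the reported 18.6 Mbp, so the three quoted numbers are mutually consistent under the stated assumption.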
Affiliation(s)
- Dong-Hyun Shin
- Department of Agricultural Biotechnology, Animal Biotechnology Major, and Research Institute for Agriculture and Life Sciences, Seoul National University, Seoul 151-921, Korea
| | - Hyun-Jeong Lee
- Division of Animal Genomics and Bioinformatics, National Institute of Animal science, Rural Development Administration, #564 Omockchun-dong, Suwon 441-706, Korea
| | - Seoae Cho
- C&K genomics, Seoul National University Mt.4-2, Main Bldg. #514, SNU Research Park, NakSeoungDae, Gwanakgu, Seoul 151-919, Republic of Korea
| | - Hyeon Jeong Kim
- C&K genomics, Seoul National University Mt.4-2, Main Bldg. #514, SNU Research Park, NakSeoungDae, Gwanakgu, Seoul 151-919, Republic of Korea
| | - Jae Yeon Hwang
- Department of Agricultural Biotechnology, Animal Biotechnology Major, and Research Institute for Agriculture and Life Sciences, Seoul National University, Seoul 151-921, Korea
| | - Chang-Kyu Lee
- Department of Agricultural Biotechnology, Animal Biotechnology Major, and Research Institute for Agriculture and Life Sciences, Seoul National University, Seoul 151-921, Korea
| | - JinYoung Jeong
- Division of Animal Genomics and Bioinformatics, National Institute of Animal science, Rural Development Administration, #564 Omockchun-dong, Suwon 441-706, Korea
| | - Duhak Yoon
- Department of Animal Science, Kyungpook National University, Sangju 742-711, Korea
| | - Heebal Kim
- Department of Agricultural Biotechnology, Animal Biotechnology Major, and Research Institute for Agriculture and Life Sciences, Seoul National University, Seoul 151-921, Korea
- C&K genomics, Seoul National University Mt.4-2, Main Bldg. #514, SNU Research Park, NakSeoungDae, Gwanakgu, Seoul 151-919, Republic of Korea
| |
|
65
|
Zhou X, Liu J, Wan X, Yu W. Piecewise-constant and low-rank approximation for identification of recurrent copy number variations. Bioinformatics 2014; 30:1943-9. [PMID: 24642062 DOI: 10.1093/bioinformatics/btu131] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Indexed: 12/23/2022] Open
Abstract
MOTIVATION The post-genome era sees an urgent need for novel approaches to extracting useful information from the huge amount of genetic data. The identification of recurrent copy number variations (CNVs) from array-based comparative genomic hybridization (aCGH) data can help understand complex diseases, such as cancer. Most of the previous computational methods focused on single-sample analysis or statistical testing based on the results of single-sample analysis. Finding recurrent CNVs from multi-sample data remains a challenging topic worth further study. RESULTS We present a general and robust method to identify recurrent CNVs from multi-sample aCGH profiles. We express the raw dataset as a matrix and demonstrate that recurrent CNVs will form a low-rank matrix. Hence, we formulate the problem as a matrix recovery problem, where we aim to find a piecewise-constant and low-rank approximation (PLA) to the input matrix. We propose a convex formulation for matrix recovery and an efficient algorithm to globally solve the problem. We demonstrate the advantages of PLA compared with alternative methods using synthesized datasets and two breast cancer datasets. The experimental results show that PLA can successfully reconstruct the recurrent CNV patterns from raw data and achieve better performance compared with alternative methods under a wide range of scenarios. AVAILABILITY AND IMPLEMENTATION The MATLAB code is available at http://bioinformatics.ust.hk/pla.zip.
Affiliation(s)
- Xiaowei Zhou
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon and Department of Computer Science and Institute of Theoretical and Computational Study, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Jiming Liu
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon and Department of Computer Science and Institute of Theoretical and Computational Study, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Xiang Wan
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon and Department of Computer Science and Institute of Theoretical and Computational Study, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Weichuan Yu
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon and Department of Computer Science and Institute of Theoretical and Computational Study, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
|
66
|
Li M, Wen Y, Fu W. A Single-Array-Based Method for Detecting Copy Number Variants Using Affymetrix High Density SNP Arrays and its Application to Breast Cancer. Cancer Inform 2014; 13:95-103. [PMID: 26279618 PMCID: PMC4519351 DOI: 10.4137/cin.s15203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2014] [Revised: 06/03/2015] [Accepted: 06/04/2015] [Indexed: 11/06/2022]
Abstract
Cumulative evidence has shown that structural variations, due to insertions, deletions, and inversions of DNA, may contribute considerably to the development of complex human diseases, such as breast cancer. High-throughput genotyping technologies, such as Affymetrix high density single-nucleotide polymorphism (SNP) arrays, have produced large amounts of genetic data for genome-wide SNP genotype calling and copy number estimation. Meanwhile, there is a great need for accurate and efficient statistical methods to detect copy number variants. In this article, we introduce a hidden-Markov-model (HMM)-based method, referred to as the PICR-CNV, for copy number inference. The proposed method first estimates copy number abundance for each single SNP on a single array based on the raw fluorescence values, and then standardizes the estimated copy number abundance to achieve equal footing among multiple arrays. This method requires no between-array normalization, and thus, maintains data integrity and independence of samples among individual subjects. In addition to our efforts to apply new statistical technology to raw fluorescence values, the HMM has been applied to the standardized copy number abundance in order to reduce experimental noise. Through simulations, we show our refined method is able to infer copy number variants accurately. Application of the proposed method to a breast cancer dataset helps to identify genomic regions significantly associated with the disease.
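The HMM smoothing step mentioned in this abstract can be illustrated with a generic Viterbi decoder. This is a minimal sketch and not the PICR-CNV implementation: the three states (loss, normal, gain), the Gaussian emission model with a shared standard deviation, the uniform initial distribution, and the names `viterbi_gaussian`, `means`, `sd`, and `stay_prob` are all assumptions made for illustration.

```python
import math


def viterbi_gaussian(obs, means, sd, stay_prob):
    """Most likely state path for a toy HMM (hypothetical helper).

    Assumptions: one Gaussian emission per state with shared standard
    deviation `sd`, probability `stay_prob` of keeping the current state,
    and the remaining probability split evenly over the other states.
    """
    k = len(means)
    log_stay = math.log(stay_prob)
    log_move = math.log((1.0 - stay_prob) / (k - 1))

    def log_emit(x, mean):
        # Gaussian log-density up to a constant shared by all states.
        return -0.5 * ((x - mean) / sd) ** 2

    # Uniform initial distribution over states.
    score = [math.log(1.0 / k) + log_emit(obs[0], m) for m in means]
    back = []
    for x in obs[1:]:
        new_score, ptr = [], []
        for j in range(k):
            # Best predecessor state for state j at this position.
            best_i = max(
                range(k),
                key=lambda i: score[i] + (log_stay if i == j else log_move),
            )
            trans = log_stay if best_i == j else log_move
            ptr.append(best_i)
            new_score.append(score[best_i] + trans + log_emit(x, means[j]))
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(k), key=lambda i: score[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Standardized abundance values with a short gain in the middle.
signal = [0.1, -0.2, 0.0, 0.9, 1.1, 1.0, 0.05, -0.1]
states = viterbi_gaussian(signal, means=[-1.0, 0.0, 1.0], sd=0.3, stay_prob=0.95)
print(states)  # [1, 1, 1, 2, 2, 2, 1, 1]
```

The high `stay_prob` is what suppresses isolated noisy probes: a state change is only accepted when several consecutive values support it.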
Affiliation(s)
- Ming Li
- Division of Biostatistics, Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Yalu Wen
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing MI, USA
- Wenjiang Fu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing MI, USA; Department of Mathematics, University of Houston, Houston, TX, USA
|
67
|
Brito I, Hupé P, Neuvial P, Barillot E. Stability-based comparison of class discovery methods for DNA copy number profiles. PLoS One 2013; 8:e81458. [PMID: 24339933 PMCID: PMC3855312 DOI: 10.1371/journal.pone.0081458] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/15/2010] [Accepted: 10/22/2013] [Indexed: 11/19/2022]
Abstract
MOTIVATION Array-CGH can be used to determine DNA copy number, imbalances in which are a fundamental factor in the genesis and progression of tumors. The discovery of classes with similar patterns of array-CGH profiles therefore adds to our understanding of cancer and the treatment of patients. Various input data representations for array-CGH, dissimilarity measures between tumor samples and clustering algorithms may be used for this purpose. The choice between procedures is often difficult. An evaluation procedure is therefore required to select the best class discovery method (combination of one input data representation, one dissimilarity measure and one clustering algorithm) for array-CGH. Robustness of the resulting classes is a common requirement, but no stability-based comparison of class discovery methods for array-CGH profiles has ever been reported. RESULTS We applied several class discovery methods and evaluated the stability of their solutions, with a modified version of Bertoni's [Formula: see text]-based test [1]. Our version relaxes the assumption of independency required by original Bertoni's [Formula: see text]-based test. We conclude that Minimal Regions of alteration (a concept introduced by [2]) for input data representation, sim [3] or agree [4] for dissimilarity measure and the use of average group distance in the clustering algorithm produce the most robust classes of array-CGH profiles. AVAILABILITY The software is available from http://bioinfo.curie.fr/projects/cgh-clustering. It has also been partly integrated into "Visualization and analysis of array-CGH"(VAMP)[5]. The data sets used are publicly available from ACTuDB [6].
Affiliation(s)
- Isabel Brito
- Institut Curie, Paris, France
- INSERM, U900, Paris, France
- Mines ParisTech, Fontainebleau, France
- Philippe Hupé
- Institut Curie, Paris, France
- INSERM, U900, Paris, France
- Mines ParisTech, Fontainebleau, France
- CNRS UMR144, Paris, France
- Pierre Neuvial
- Laboratoire Statistique & Génome, Université d'Évry Val d'Essonne, UMR CNRS 8071-USC INRA, Évry, France
- Emmanuel Barillot
- Institut Curie, Paris, France
- INSERM, U900, Paris, France
- Mines ParisTech, Fontainebleau, France
|
68
|
Luong TM, Rozenholc Y, Nuel G. Fast estimation of posterior probabilities in change-point analysis through a constrained hidden Markov model. Comput Stat Data Anal 2013. [DOI: 10.1016/j.csda.2013.06.020] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 10/26/2022]
|
69
|
Roy S, Motsinger Reif A. Evaluation of calling algorithms for array-CGH. Front Genet 2013; 4:217. [PMID: 24298279 PMCID: PMC3829466 DOI: 10.3389/fgene.2013.00217] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 07/01/2013] [Accepted: 10/07/2013] [Indexed: 11/22/2022]
Abstract
Copy number variation (CNV) detection has become an integral part of many genetic studies, and new technologies promise to revolutionize our ability to detect CNVs and link them to disease. However, recent studies highlight discrepancies in the genome-wide CNV profile when measured by different technologies and even by the same technology. Furthermore, the change point algorithms used to call CNVs can have substantial disagreement on the same data set. We focus this article on comparative genomic hybridization (CGH) arrays because this platform lends itself well to accurate statistical modeling. We describe some newer methodological developments in local statistics that are well suited for CNV detection and calling on CGH arrays. Then we use both simulation studies and public data to compare these new local methods with the global methods that currently dominate the literature. These results offer suggestions for choosing a particular method and provide insight into the lack of reproducibility that has been seen in the field so far.
Affiliation(s)
- Siddharth Roy
- Department of Statistics, College of Physical and Mathematical Sciences, North Carolina State University Raleigh, NC, USA
|
70
|
Comparing Segmentation Methods for Genome Annotation Based on RNA-Seq Data. JOURNAL OF AGRICULTURAL BIOLOGICAL AND ENVIRONMENTAL STATISTICS 2013. [DOI: 10.1007/s13253-013-0159-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/25/2022]
|
71
|
de Rinaldis E, Gazinska P, Mera A, Modrusan Z, Fedorowicz GM, Burford B, Gillett C, Marra P, Grigoriadis A, Dornan D, Holmberg L, Pinder S, Tutt A. Integrated genomic analysis of triple-negative breast cancers reveals novel microRNAs associated with clinical and molecular phenotypes and sheds light on the pathways they control. BMC Genomics 2013; 14:643. [PMID: 24059244 PMCID: PMC4008358 DOI: 10.1186/1471-2164-14-643] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Received: 02/08/2013] [Accepted: 08/09/2013] [Indexed: 11/20/2022]
Abstract
Background This study focuses on the analysis of miRNAs expression data in a cohort of 181 well characterised breast cancer samples composed primarily of triple-negative (ER/PR/HER2-negative) tumours with associated genome-wide DNA and mRNA data, extensive patient follow-up and pathological information. Results We identified 7 miRNAs associated with prognosis in the triple-negative tumours and an additional 7 when the analysis was extended to the set of all ER-negative cases. miRNAs linked to an unfavourable prognosis were associated with a broad spectrum of motility mechanisms involved in the invasion of stromal tissues, such as cell-adhesion, growth factor-mediated signalling pathways, interaction with the extracellular matrix and cytoskeleton remodelling. When we compared different intrinsic molecular subtypes we found 46 miRNAs that were specifically expressed in one or more intrinsic subtypes. Integrated genomic analyses indicated these miRNAs to be influenced by DNA genomic aberrations and to have an overall influence on the expression levels of their predicted targets. Among others, our analyses highlighted the role of miR-17-92 and miR-106b-25, two polycistronic miRNA clusters with known oncogenic functions. We showed that their basal-like subtype specific up-regulation is influenced by increased DNA copy number and contributes to the transcriptional phenotype as well as the activation of oncogenic pathways in basal-like tumours. Conclusions This study analyses previously unreported miRNA, mRNA and DNA data and integrates these with pathological and clinical information, from a well-annotated cohort of breast cancers enriched for triple-negative subtypes. It provides a conceptual framework, as well as integrative methods and system-level results and contributes to elucidate the role of miRNAs as biomarkers and modulators of oncogenic processes in these types of tumours.
Affiliation(s)
- Emanuele de Rinaldis
- Breakthrough Breast Cancer Research Unit, Division of Cancer Studies, School of Medicine, King's College London, Guy's Hospital, London, UK.
|
72
|
Herrmann S, Schwender H, Ickstadt K, Müller P. A Bayesian changepoint analysis of ChIP-Seq data of Lamin B. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2013; 1844:138-44. [PMID: 24036208 DOI: 10.1016/j.bbapap.2013.09.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/17/2012] [Revised: 07/19/2013] [Accepted: 09/01/2013] [Indexed: 11/17/2022]
Abstract
The spatial organisation of the chromosomes in the nucleus is influenced by chromatin regions binding to the nucleic lamina, i.e., the inner part of the nucleic envelope. To investigate the architecture of chromosomes in the interphase nucleus, it is thus of high interest to detect such chromatin segments. This goal can be achieved by considering the fibrous protein Lamin B as a surrogate, since regions of high abundance of Lamin B can indicate chromatin segments attached to the nucleic lamina. We analyse ChIP-Seq (Chromatin-Immunoprecipitation Sequencing) data from an experiment that is designed to record Lamin B abundance. We introduce a Bayesian segmentation procedure in which a Markov Chain Monte Carlo (MCMC) algorithm is used for inference about the desired segmentation. The procedure is based on a Bayesian hierarchical model. Inference allows the distinction between regions of high versus low levels of Lamin B, and therefore, gives an insight into the binding of the chromatin to the nucleic envelope. An implementation of this approach is available in the statistical software environment R. This article is part of a special issue entitled: Computational proteomics in the post-identification era. Guest Editors: Martin Eisenacher and Christian Stephan.
Affiliation(s)
- S Herrmann
- Faculty of Statistics, TU Dortmund University, Dortmund, Germany
|
73
|
Metzger J, Philipp U, Lopes MS, da Camara Machado A, Felicetti M, Silvestrelli M, Distl O. Analysis of copy number variants by three detection algorithms and their association with body size in horses. BMC Genomics 2013; 14:487. [PMID: 23865711 PMCID: PMC3720552 DOI: 10.1186/1471-2164-14-487] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Received: 12/29/2012] [Accepted: 07/15/2013] [Indexed: 12/14/2022]
Abstract
Background Copy number variants (CNVs) have been shown to play an important role in genetic diversity of mammals and in the development of many complex phenotypic traits. The aim of this study was to perform a standard comparative evaluation of CNVs in horses using three different CNV detection programs and to identify genomic regions associated with body size in horses. Results Analysis was performed using the Illumina Equine SNP50 genotyping beadchip for 854 horses. CNVs were detected by three different algorithms, CNVPartition, PennCNV and QuantiSNP. Comparative analysis revealed 50 CNVs that affected 153 different genes mainly involved in sensory perception, signal transduction and cellular components. Genome-wide association analysis for body size showed highly significant deleted regions on ECA1, ECA8 and ECA9. Homologous regions to the detected CNVs on ECA1 and ECA9 have also been shown to be correlated with human height. Conclusions Comparative analysis of CNV detection algorithms was useful to increase the specificity of CNV detection but had certain limitations dependent on the detection tool. GWAS revealed genome-wide associated CNVs for body size in horses.
Affiliation(s)
- Julia Metzger
- Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Bünteweg 17p, 30559, Hannover, Germany
|
74
|
Comparative Analysis of CNV Calling Algorithms: Literature Survey and a Case Study Using Bovine High-Density SNP Data. MICROARRAYS 2013; 2:171-85. [PMID: 27605188 PMCID: PMC5003459 DOI: 10.3390/microarrays2030171] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 05/02/2013] [Revised: 06/04/2013] [Accepted: 06/05/2013] [Indexed: 11/23/2022]
Abstract
Copy number variations (CNVs) are gains and losses of genomic sequence between two individuals of a species when compared to a reference genome. The data from single nucleotide polymorphism (SNP) microarrays are now routinely used for genotyping, but they also can be utilized for copy number detection. Substantial progress has been made in array design and CNV calling algorithms and at least 10 comparison studies in humans have been published to assess them. In this review, we first survey the literature on existing microarray platforms and CNV calling algorithms. We then examine a number of CNV calling tools to evaluate their impacts using bovine high-density SNP data. Large incongruities in the results from different CNV calling tools highlight the need for standardizing array data collection, quality assessment and experimental validation. Only after careful experimental design and rigorous data filtering can the impacts of CNVs on both normal phenotypic variability and disease susceptibility be fully revealed.
|
75
|
Gandolfi A, Benelli M, Magi A, Chiti S. Moment estimation in discrete shifting level model applied to fast array-CGH segmentation. STAT NEERL 2013. [DOI: 10.1111/stan.12005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
Affiliation(s)
- A. Gandolfi
- Dipartimento di Matematica U. Dini; Università di Firenze; Viale Morgagni 67/A; 50134; Florence; Italy
- S. Chiti
- Dipartimento di Matematica U. Dini; Università di Firenze; Viale Morgagni 67/A; 50134; Florence; Italy
|
76
|
Duan J, Zhang JG, Deng HW, Wang YP. CNV-TV: a robust method to discover copy number variation from short sequencing reads. BMC Bioinformatics 2013; 14:150. [PMID: 23634703 PMCID: PMC3679874 DOI: 10.1186/1471-2105-14-150] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 11/16/2012] [Accepted: 04/19/2013] [Indexed: 11/29/2022]
Abstract
Background Copy number variation (CNV) is an important structural variation (SV) in the human genome. Various studies have shown that CNVs are associated with complex diseases. Traditional CNV detection methods such as fluorescence in situ hybridization (FISH) and array comparative genomic hybridization (aCGH) suffer from low resolution. The next generation sequencing (NGS) technique promises a higher resolution detection of CNVs and several methods were recently proposed for realizing such a promise. However, the performances of these methods are not robust under some conditions, e.g., some of them may fail to detect CNVs of short sizes. There has been a strong demand for reliable detection of CNVs from high resolution NGS data. Results A novel and robust method to detect CNV from short sequencing reads is proposed in this study. The detection of CNV is modeled as a change-point detection from the read depth (RD) signal derived from the NGS, which is fitted with a total variation (TV) penalized least squares model. The performance (e.g., sensitivity and specificity) of the proposed approach is evaluated by comparison with several recently published methods on both simulated and real data from the 1000 Genomes Project. Conclusion The experimental results showed that both the true positive rate and false positive rate of the proposed detection method do not change significantly for CNVs with different copy numbers and lengths, when compared with several existing methods. Therefore, our proposed approach results in a more reliable detection of CNVs than the existing methods.
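The total variation penalized least squares model named in this abstract can be sketched on a toy read-depth signal. This is an illustrative sketch only, not the CNV-TV software: it minimizes 0.5 * sum((y - x)^2) + lam * sum(|x[i+1] - x[i]|) with a generic projected-gradient solver on the dual problem, and the names `tv_denoise_1d`, `change_points`, `lam`, and `n_iter` are hypothetical.

```python
def tv_denoise_1d(y, lam, n_iter=5000):
    """Fit a piecewise-constant signal x to y under a TV penalty.

    Solves min_x 0.5*||y - x||^2 + lam*||Dx||_1, where D takes first
    differences, by projected gradient ascent on the dual variable u
    (one entry per adjacent pair, constrained to [-lam, lam]).
    """
    n = len(y)
    u = [0.0] * (n - 1)
    step = 0.25  # safe step: the largest eigenvalue of D D^T is below 4

    def primal(u):
        # x = y - D^T u, with (D^T u)[j] = u[j-1] - u[j] and zero padding.
        return [
            y[j] - ((u[j - 1] if j > 0 else 0.0) - (u[j] if j < n - 1 else 0.0))
            for j in range(n)
        ]

    for _ in range(n_iter):
        x = primal(u)
        # Gradient step on the dual, then clip back into the box.
        u = [
            min(lam, max(-lam, u[i] + step * (x[i + 1] - x[i])))
            for i in range(n - 1)
        ]
    return primal(u)


def change_points(x, tol=1e-3):
    """Indices where the fitted signal jumps by more than tol."""
    return [i + 1 for i in range(len(x) - 1) if abs(x[i + 1] - x[i]) > tol]


# Toy read-depth ratio: ten bins at 0 followed by ten bins at 1.
depth = [0.0] * 10 + [1.0] * 10
fit = tv_denoise_1d(depth, lam=0.5)
print(change_points(fit))  # [10]
```

Larger values of `lam` merge more neighbouring bins into one level, so the penalty weight controls how short a segment can be before it is smoothed away.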
Affiliation(s)
- Junbo Duan
- Department of Biomedical Engineering, Tulane University, New Orleans, LA, USA
|
77
|
Duan J, Zhang JG, Deng HW, Wang YP. Comparative studies of copy number variation detection methods for next-generation sequencing technologies. PLoS One 2013; 8:e59128. [PMID: 23527109 PMCID: PMC3604020 DOI: 10.1371/journal.pone.0059128] [Citation(s) in RCA: 116] [Impact Index Per Article: 9.7] [Received: 06/20/2012] [Accepted: 02/12/2013] [Indexed: 11/25/2022]
Abstract
Copy number variation (CNV) has played an important role in studies of susceptibility or resistance to complex diseases. Traditional methods such as fluorescence in situ hybridization (FISH) and array comparative genomic hybridization (aCGH) suffer from low resolution of genomic regions. Following the emergence of next generation sequencing (NGS) technologies, CNV detection methods based on the short read data have recently been developed. However, due to the relatively young age of the procedures, their performance is not fully understood. To help investigators choose suitable methods to detect CNVs, comparative studies are needed. We compared six publicly available CNV detection methods: CNV-seq, FREEC, readDepth, CNVnator, SegSeq and event-wise testing (EWT). They are evaluated both on simulated and real data with different experiment settings. The receiver operating characteristic (ROC) curve is employed to demonstrate the detection performance in terms of sensitivity and specificity, box plot is employed to compare their performances in terms of breakpoint and copy number estimation, Venn diagram is employed to show the consistency among these methods, and F-score is employed to show the overlapping quality of detected CNVs. The computational demands are also studied. The results of our work provide a comprehensive evaluation on the performances of the selected CNV detection methods, which will help biological investigators choose the best possible method.
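An F-score for the overlap between detected and reference CNVs, as mentioned in this abstract, might be computed as follows. This is a hedged sketch: the abstract does not give the exact definition used, so the base-pair precision and recall below, the half-open `(start, end)` interval convention, and the function names are assumptions, and each interval list is assumed to contain no internal overlaps.

```python
def total_length(intervals):
    """Total number of bases covered by half-open (start, end) intervals."""
    return sum(end - start for start, end in intervals)


def overlap_length(a, b):
    """Bases shared between two interval lists (each non-overlapping)."""
    return sum(
        max(0, min(e1, e2) - max(s1, s2))
        for s1, e1 in a
        for s2, e2 in b
    )


def overlap_f_score(detected, truth):
    """Harmonic mean of base-pair precision and recall (assumed definition)."""
    shared = overlap_length(detected, truth)
    if shared == 0:
        return 0.0
    precision = shared / total_length(detected)  # detected bases that are real
    recall = shared / total_length(truth)        # real bases that were found
    return 2 * precision * recall / (precision + recall)


detected = [(100, 200), (300, 400)]
truth = [(150, 250), (300, 350)]
print(round(overlap_f_score(detected, truth), 3))  # 0.571
```

In the example, 100 of the 200 detected bases overlap the 150 reference bases, so precision is 1/2, recall is 2/3, and the F-score is 4/7.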
Affiliation(s)
- Junbo Duan
- Department of Biomedical Engineering, Tulane University, New Orleans, Louisiana, United States of America
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, Louisiana, United States of America
- Ji-Gang Zhang
- Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, Louisiana, United States of America
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, Louisiana, United States of America
- Hong-Wen Deng
- Department of Biomedical Engineering, Tulane University, New Orleans, Louisiana, United States of America
- Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, Louisiana, United States of America
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, Louisiana, United States of America
- Yu-Ping Wang
- Department of Biomedical Engineering, Tulane University, New Orleans, Louisiana, United States of America
- Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, Louisiana, United States of America
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, Louisiana, United States of America
|
78
|
Zhou X, Yang C, Wan X, Zhao H, Yu W. Multisample aCGH data analysis via total variation and spectral regularization. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013; 10:230-235. [PMID: 23702561 PMCID: PMC3715577 DOI: 10.1109/tcbb.2012.166] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 06/02/2023]
Abstract
DNA copy number variation (CNV) accounts for a large proportion of genetic variation. One commonly used approach to detecting CNVs is array-based comparative genomic hybridization (aCGH). Although many methods have been proposed to analyze aCGH data, it is not clear how to combine information from multiple samples to improve CNV detection. In this paper, we propose to use a matrix to approximate the multisample aCGH data and minimize the total variation of each sample as well as the nuclear norm of the whole matrix. In this way, we can make use of the smoothness property of each sample and the correlation among multiple samples simultaneously in a convex optimization framework. We also developed an efficient and scalable algorithm to handle large-scale data. Experiments demonstrate that the proposed method outperforms the state-of-the-art techniques under a wide range of scenarios and it is capable of processing large data sets with millions of probes.
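The objective described in this abstract can be written out as a worked formula. This is a sketch under stated assumptions rather than the authors' exact formulation: the squared Frobenius data-fit term, the weights \lambda_1 and \lambda_2, and the layout of the data matrix D (n samples as rows, p probes as columns) are not given in the abstract and are assumed here.

```latex
\min_{X \in \mathbb{R}^{n \times p}}
  \frac{1}{2}\,\lVert D - X \rVert_F^2
  \;+\; \lambda_1 \sum_{i=1}^{n} \sum_{j=1}^{p-1} \bigl| X_{i,j+1} - X_{i,j} \bigr|
  \;+\; \lambda_2 \,\lVert X \rVert_*
```

The middle term is the total variation of each sample (row), which favours piecewise-constant profiles, and \lVert X \rVert_* is the nuclear norm (the sum of the singular values of X), which favours patterns shared across samples. All three terms are convex, which is consistent with the convex optimization framework the abstract mentions.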
Affiliation(s)
- Xiaowei Zhou
- Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China.
|
79
|
Comparative analysis of methods for identifying recurrent copy number alterations in cancer. PLoS One 2012; 7:e52516. [PMID: 23285074 PMCID: PMC3527554 DOI: 10.1371/journal.pone.0052516] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 08/07/2012] [Accepted: 11/14/2012] [Indexed: 11/19/2022]
Abstract
Recurrent copy number alterations (CNAs) play an important role in cancer genesis. While a number of computational methods have been proposed for identifying such CNAs, their relative merits remain largely unknown in practice since very few efforts have been focused on comparative analysis of the methods. To facilitate studies of recurrent CNA identification in cancer genome, it is imperative to conduct a comprehensive comparison of performance and limitations among existing methods. In this paper, six representative methods proposed in the latest six years are compared. These include one-stage and two-stage approaches, working with raw intensity ratio data and discretized data respectively. They are based on various techniques such as kernel regression, correlation matrix diagonal segmentation, semi-parametric permutation and cyclic permutation schemes. We explore multiple criteria including type I error rate, detection power, Receiver Operating Characteristics (ROC) curve and the area under curve (AUC), and computational complexity, to evaluate performance of the methods under multiple simulation scenarios. We also characterize their abilities on applications to two real datasets obtained from cancers with lung adenocarcinoma and glioblastoma. This comparison study reveals general characteristics of the existing methods for identifying recurrent CNAs, and further provides new insights into their strengths and weaknesses. It is believed helpful to accelerate the development of novel and improved methods.
|
80
|
A discovery resource of rare copy number variations in individuals with autism spectrum disorder. G3-GENES GENOMES GENETICS 2012; 2:1665-85. [PMID: 23275889 PMCID: PMC3516488 DOI: 10.1534/g3.112.004689] [Citation(s) in RCA: 145] [Impact Index Per Article: 11.2] [Received: 08/01/2012] [Accepted: 10/24/2012] [Indexed: 12/15/2022]
Abstract
The identification of rare inherited and de novo copy number variations (CNVs) in human subjects has proven a productive approach to highlight risk genes for autism spectrum disorder (ASD). A variety of microarrays are available to detect CNVs, including single-nucleotide polymorphism (SNP) arrays and comparative genomic hybridization (CGH) arrays. Here, we examine a cohort of 696 unrelated ASD cases using a high-resolution one-million feature CGH microarray, the majority of which were previously genotyped with SNP arrays. Our objective was to discover new CNVs in ASD cases that were not detected by SNP microarray analysis and to delineate novel ASD risk loci via combined analysis of CGH and SNP array data sets on the ASD cohort and CGH data on an additional 1000 control samples. Of the 615 ASD cases analyzed on both SNP and CGH arrays, we found that 13,572 of 21,346 (64%) of the CNVs were exclusively detected by the CGH array. Several of the CGH-specific CNVs are rare in population frequency and impact previously reported ASD genes (e.g., NRXN1, GRM8, DPYD), as well as novel ASD candidate genes (e.g., CIB2, DAPP1, SAE1), and all were inherited except for a de novo CNV in the GPHN gene. A functional enrichment test of gene-sets in ASD cases over controls revealed nucleotide metabolism as a potential novel pathway involved in ASD, which includes several candidate genes for follow-up (e.g., DPYD, UPB1, UPP1, TYMP). Finally, this extensively phenotyped and genotyped ASD clinical cohort serves as an invaluable resource for the next step of genome sequencing for complete genetic variation detection.
|
81
|
Tarabichi M, Detours V, Konopka T. Piecewise polynomial representations of genomic tracks. PLoS One 2012; 7:e48941. [PMID: 23166601 PMCID: PMC3499510 DOI: 10.1371/journal.pone.0048941] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/13/2012] [Accepted: 10/01/2012] [Indexed: 01/17/2023]
Abstract
Genomic data from micro-array and sequencing projects consist of associations of measured values to chromosomal coordinates. These associations can be thought of as functions in one dimension and can thus be stored, analyzed, and interpreted as piecewise-polynomial curves. We present a general framework for building piecewise polynomial representations of genome-scale signals and illustrate some of its applications via examples. We show that piecewise constant segmentation, a typical step in copy-number analyses, can be carried out within this framework for both array and (DNA) sequencing data offering advantages over existing methods in each case. Higher-order polynomial curves can be used, for example, to detect trends and/or discontinuities in transcription levels from RNA-seq data. We give a concrete application of piecewise linear functions to diagnose and quantify alignment quality at exon borders (splice sites). Our software (source and object code) for building piecewise polynomial models is available at http://sourceforge.net/projects/locsmoc/.
Affiliation(s)
- Vincent Detours
- IRIBHM, Université Libre de Bruxelles, Brussels, Belgium
- Welbio, Université Libre de Bruxelles, Brussels, Belgium
- Tomasz Konopka
- IRIBHM, Université Libre de Bruxelles, Brussels, Belgium
|
82
|
Grigoriadis A, Mackay A, Noel E, Wu PJ, Natrajan R, Frankum J, Reis-Filho JS, Tutt A. Molecular characterisation of cell line models for triple-negative breast cancers. BMC Genomics 2012; 13:619. [PMID: 23151021 PMCID: PMC3546428 DOI: 10.1186/1471-2164-13-619] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.6] [Received: 06/18/2012] [Accepted: 10/31/2012] [Indexed: 01/05/2023]
Abstract
BACKGROUND Triple-negative breast cancers (BC) represent a heterogeneous subtype of BCs, generally associated with an aggressive clinical course and where targeted therapies are currently limited. Target validation studies for all BC subtypes have largely employed established BC cell lines, which have proven to be effective tools for drug discovery. RESULTS Given the lines of evidence suggesting that BC cell lines are effective tools for drug discovery, we assessed the similarities between triple-negative BCs and cell lines, to identify in vitro representatives, modelling the diversity within this BC subtype. 25 BC cell lines, enriched for those lacking ER, PR and HER2 expression, were subjected to transcriptomic, genomic and epigenomic profiling analyses and comparisons were made to existing knowledge of corresponding perturbations in triple-negative BCs. Transcriptional analysis segregated ER-negative BC cell lines into three groups, displaying distinctive abundances for genes involved in epithelial-mesenchymal transition, apocrine and high-grade carcinomas. DNA copy number aberrations of triple-negative BCs were well represented in cell lines and genes with coordinately altered gene expression showed similar patterns in tumours and cell lines. Methylation events in triple-negative BCs were mostly retained in epigenomes of cell lines. Combined methylation and gene expression analyses revealed a subset of genes characteristic of the Claudin-low BC subtype, exhibiting epigenetic-regulated gene expression in BC cell lines and tumours, suggesting that methylation patterns are likely to underpin subtype-specificity. 
CONCLUSION Here, we provide a comprehensive analysis of triple-negative BC features on several molecular levels in BC cell lines, thereby creating an in-depth resource to assess the suitability of individual lines as experimental models for studying BC tumour biology, biomarkers and possible therapeutic targets in the context of preclinical target validation.
Affiliation(s)
- Anita Grigoriadis
- Breakthrough Breast Cancer Research Unit, Guy's Hospital, King's Health Partners AHSC, King's College London School of Medicine, London SE1 9RT, UK.
|
83
|
Li W, Olivier M. Current analysis platforms and methods for detecting copy number variation. Physiol Genomics 2012; 45:1-16. [PMID: 23132758 DOI: 10.1152/physiolgenomics.00082.2012] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open
Abstract
Copy number variation (CNV), generated through duplication or deletion events that affect one or more loci, is widespread in human genomes and is often associated with functional consequences that may include changes in gene expression levels or fusion of genes. Genome-wide association studies indicate that some disease phenotypes and physiological pathways might be impacted by CNV in a small number of characterized genomic regions. However, the pervasiveness and full impact of such variation remain unclear. Suitable analytic methods are needed to thoroughly mine human genomes for genomic structural variation, and to explore the interplay between observed CNV and disease phenotypes, but many medical researchers are unfamiliar with the features and nuances of recently developed technologies for detecting CNV. In this article, we evaluate a suite of commonly used and recently developed approaches to uncovering genome-wide CNVs and discuss the relative merits of each.
Affiliation(s)
- Wenli Li
- Biotechnology and Bioengineering Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA
|
84
|
Nilsen G, Liestøl K, Van Loo P, Moen Vollan HK, Eide MB, Rueda OM, Chin SF, Russell R, Baumbusch LO, Caldas C, Børresen-Dale AL, Lingjaerde OC. Copynumber: Efficient algorithms for single- and multi-track copy number segmentation. BMC Genomics 2012; 13:591. [PMID: 23442169 PMCID: PMC3582591 DOI: 10.1186/1471-2164-13-591] [Citation(s) in RCA: 212] [Impact Index Per Article: 16.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2012] [Accepted: 10/15/2012] [Indexed: 12/15/2022] Open
Abstract
Background Cancer progression is associated with genomic instability and an accumulation of gains and losses of DNA. The growing variety of tools for measuring genomic copy numbers, including various types of array-CGH, SNP arrays and high-throughput sequencing, calls for a coherent framework offering unified and consistent handling of single- and multi-track segmentation problems. In addition, there is a demand for highly computationally efficient segmentation algorithms, due to the emergence of very high density scans of copy number. Results A comprehensive Bioconductor package for copy number analysis is presented. The package offers a unified framework for single sample, multi-sample and multi-track segmentation and is based on statistically sound penalized least squares principles. Conditional on the number of breakpoints, the estimates are optimal in the least squares sense. A novel and computationally highly efficient algorithm is proposed that utilizes vector-based operations in R. Three case studies are presented. Conclusions The R package copynumber is a software suite for segmentation of single- and multi-track copy number data using algorithms based on coherent least squares principles.
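The penalized least squares idea in this abstract can be illustrated with a small dynamic program. This is a minimal sketch, not the copynumber package's implementation: the function name `segment_penalized_ls`, the fixed per-segment `penalty`, and the O(n²) search are assumptions made for illustration.

```python
from itertools import accumulate

def segment_penalized_ls(values, penalty):
    """Return (start, end) index pairs minimizing
    sum of squared errors + penalty * (number of segments)."""
    n = len(values)
    # Prefix sums give any segment's squared error in O(1).
    s1 = [0.0] + list(accumulate(values))
    s2 = [0.0] + list(accumulate(v * v for v in values))

    def sse(i, j):
        # Squared error of values[i:j] around its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [float("inf")] * n  # best[j]: optimal cost of values[:j]
    prev = [0] * (n + 1)               # prev[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + sse(i, j) + penalty
            if cost < best[j]:
                best[j], prev[j] = cost, i

    # Walk back through prev to recover the segment boundaries.
    segments, j = [], n
    while j > 0:
        segments.append((prev[j], j))
        j = prev[j]
    return segments[::-1]

if __name__ == "__main__":
    log_ratios = [0.0] * 5 + [1.0] * 5
    print(segment_penalized_ls(log_ratios, penalty=0.5))
```

A larger penalty yields fewer, longer segments; for a fixed number of segments the returned fit is the least squares optimum, which mirrors the property stated in the abstract.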
Affiliation(s)
- Gro Nilsen
- Biomedical Informatics, Dept of Informatics, University of Oslo, Oslo, Norway
|
85
|
Kim SY, Kim JH, Chung YJ. Effect of Combining Multiple CNV Defining Algorithms on the Reliability of CNV Calls from SNP Genotyping Data. Genomics Inform 2012; 10:194-9. [PMID: 23166530 PMCID: PMC3492655 DOI: 10.5808/gi.2012.10.3.194] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2012] [Revised: 08/20/2012] [Accepted: 08/23/2012] [Indexed: 01/11/2023] Open
Abstract
In addition to single-nucleotide polymorphisms (SNP), copy number variation (CNV) is a major component of human genetic diversity. Among many whole-genome analysis platforms, SNP arrays have been commonly used for genomewide CNV discovery. Recently, a number of CNV defining algorithms from SNP genotyping data have been developed; however, due to the fundamental limitation of SNP genotyping data for the measurement of signal intensity, there are still concerns regarding the possibility of false discovery or low sensitivity for detecting CNVs. In this study, we aimed to verify the effect of combining multiple CNV calling algorithms and set up the most reliable pipeline for CNV calling with Affymetrix Genomewide SNP 5.0 data. For this purpose, we selected the 3 most commonly used algorithms for CNV segmentation from SNP genotyping data: PennCNV, QuantiSNP, and BirdSuite. After defining the CNV loci using the 3 different algorithms, we assessed how many of them overlapped with each other, and we also validated the CNVs by genomic quantitative PCR. Through this analysis, we proposed that for reliable CNV-based genomewide association study using SNP array data, CNV calls must be performed with at least 3 different algorithms and that the CNVs consistently called from more than 2 algorithms must be used for association analysis, because they are more reliable than the CNVs called from a single algorithm. Our result will be helpful to set up the CNV analysis protocols for Affymetrix Genomewide SNP 5.0 genotyping data.
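The multi-algorithm consensus described in this abstract can be sketched as an interval vote. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the function name `consensus_regions`, the half-open `(chrom, start, end)` tuples, and the `min_callers` threshold are hypothetical, and each caller's own calls are assumed not to overlap one another.

```python
from collections import defaultdict

def consensus_regions(calls_by_caller, min_callers=2):
    """Return regions covered by calls from at least min_callers algorithms.

    calls_by_caller maps a caller name to a list of (chrom, start, end)
    half-open intervals; calls from one caller are assumed non-overlapping.
    """
    events = defaultdict(list)
    for calls in calls_by_caller.values():
        for chrom, start, end in calls:
            events[chrom].append((start, +1))  # a call opens
            events[chrom].append((end, -1))    # a call closes

    regions = []
    for chrom in sorted(events):
        depth, region_start = 0, None
        # Tuples sort so that closes (-1) precede opens (+1) at equal positions.
        for pos, delta in sorted(events[chrom]):
            depth += delta
            if depth >= min_callers and region_start is None:
                region_start = pos
            elif depth < min_callers and region_start is not None:
                regions.append((chrom, region_start, pos))
                region_start = None
    return regions

if __name__ == "__main__":
    calls = {
        "PennCNV": [("chr1", 100, 200)],
        "QuantiSNP": [("chr1", 150, 250)],
        "BirdSuite": [("chr1", 300, 400)],
    }
    print(consensus_regions(calls, min_callers=2))
```

Raising `min_callers` trades sensitivity for reliability, which is the trade-off the abstract evaluates against quantitative PCR validation.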
Affiliation(s)
- Soon-Young Kim
- Integrated Research Center for Genome Polymorphism, The Catholic University of Korea School of Medicine, Seoul 137-701, Korea. ; Department of Microbiology, The Catholic University of Korea School of Medicine, Seoul 137-701, Korea
|
86
|
Abstract
Genomic microarrays are now widely used diagnostically for the molecular karyotyping of patients with intellectual disability, congenital anomalies and autistic spectrum disorder and have more recently been applied for the detection of genomic imbalances in prenatal genetic diagnosis. We present an overview of the different arrays and protocols used and discuss methods of genomic array data analysis.
Affiliation(s)
- Paul D Brady
- Laboratory for Cytogenetics and Genome Research, Centre for Human Genetics, University Hospital Leuven, K.U. Leuven, Leuven, Belgium
|
87
|
Chen AJ, Paik JH, Zhang H, Shukla SA, Mortensen R, Hu J, Ying H, Hu B, Hurt J, Farny N, Dong C, Xiao Y, Wang YA, Silver PA, Chin L, Vasudevan S, Depinho RA. STAR RNA-binding protein Quaking suppresses cancer via stabilization of specific miRNA. Genes Dev 2012; 26:1459-72. [PMID: 22751500 DOI: 10.1101/gad.189001.112] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
Multidimensional cancer genome analysis and validation has defined Quaking (QKI), a member of the signal transduction and activation of RNA (STAR) family of RNA-binding proteins, as a novel glioblastoma multiforme (GBM) tumor suppressor. Here, we establish that p53 directly regulates QKI gene expression, and QKI protein associates with and leads to the stabilization of miR-20a; miR-20a, in turn, regulates TGFβR2 and the TGFβ signaling network. This pathway circuitry is substantiated by in silico epistasis analysis of its components in the human GBM TCGA (The Cancer Genome Atlas Project) collection and by their gain- and loss-of-function interactions in in vitro and in vivo complementation studies. This p53-QKI-miR-20a-TGFβ pathway expands our understanding of the p53 tumor suppression network in cancer and reveals a novel tumor suppression mechanism involving regulation of specific cancer-relevant microRNAs.
Affiliation(s)
- An-Jou Chen
- Belfer Institute for Applied Cancer Science, Harvard Medical School, Boston, Massachusetts 02115, USA
|
88
|
Vermeesch JR, Brady PD, Sanlaville D, Kok K, Hastings RJ. Genome-wide arrays: quality criteria and platforms to be used in routine diagnostics. Hum Mutat 2012; 33:906-15. [PMID: 22415865 DOI: 10.1002/humu.22076] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/14/2023]
Abstract
Whole-genome analysis using genome-wide arrays, also called "genomic arrays," "microarrays," or "arrays," has become the first-tier diagnostic test for patients with developmental abnormalities and/or intellectual disabilities. In addition to constitutional anomalies, genomic arrays are also used to diagnose acquired disorders. Despite the rapid implementation of these technologies in diagnostic laboratories, external quality control schemes (such as CEQA, EMQN, UK NEQAS, and the USA QA scheme CAP) and interlaboratory comparisons show that there are huge differences in quality, interpretation, and reporting among laboratories. We offer guidance to laboratories to help assure the quality of array experiments and to standardize minimum detection resolution, and we also provide guidelines to standardize interpretation and reporting.
Affiliation(s)
- Joris R Vermeesch
- Laboratory for Cytogenetics and Genome Research, Centre for Human Genetics, KU Leuven, University Hospital Leuven, Leuven, Belgium.
|
89
|
Shi J, Li P. An integrative segmentation method for detecting germline copy number variations in SNP arrays. Genet Epidemiol 2012; 36:373-83. [PMID: 22539397 DOI: 10.1002/gepi.21631] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
Germline copy number variations (CNVs) are a major source of genetic variation in humans. In large-scale studies of complex diseases, CNVs are usually detected from data generated by single nucleotide polymorphism (SNP) genotyping arrays. In this paper, we develop an integrative segmentation method, SegCNV, for detecting CNVs integrating both log R ratio (LRR) and B allele frequency (BAF). Based on simulation studies, SegCNV had modestly better power to detect deletions and substantially better power to detect duplications compared with circular binary segmentation (CBS) that relies purely on LRRs; and it had better power to detect deletions and a comparable performance to detect duplications compared with PennCNV and QuantiSNP. In two Hapmap subjects with deep sequence data available as a gold standard, SegCNV detected more true short deletions than PennCNV and QuantiSNP. For 21 short duplications validated experimentally in the AGRE dataset, SegCNV, QuantiSNP, and PennCNV detected all of them while CBS detected only three. SegCNV is much faster than the HMM-based (where HMM is hidden Markov model) methods, taking only several seconds to analyze genome-wide data for one subject.
Affiliation(s)
- Jianxin Shi
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland 20854, USA.
|
90
|
Niu YS, Zhang H. The screening and ranking algorithm to detect DNA copy number variations. Ann Appl Stat 2012. [DOI: 10.1214/12-aoas539] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
91
|
Marked genetic differences between BRAF and NRAS mutated primary melanomas as revealed by array comparative genomic hybridization. Melanoma Res 2012; 22:202-14. [PMID: 22456166 DOI: 10.1097/cmr.0b013e328352dbc8] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
Somatic mutations of BRAF and NRAS oncogenes are thought to be among the first steps in melanoma initiation, but these mutations alone are insufficient to cause tumor progression. Our group studied the distinct genomic imbalances of primary melanomas harboring different BRAF or NRAS genotypes. We also aimed to highlight regions of change commonly seen together in different melanoma subgroups. Array comparative genomic hybridization was performed to assess copy number changes in 47 primary melanomas. BRAF and NRAS were screened for mutations by melting curve analysis. Reverse transcription PCR and fluorescence in-situ hybridization were performed to confirm the array comparative genomic hybridization results. Pairwise comparisons revealed distinct genomic profiles between melanomas harboring different mutations. Primary melanomas with the BRAF mutation exhibited more frequent losses on 10q23-q26 and gains on chromosome 7 and 1q23-q25 compared with melanomas with the NRAS mutation. Loss on the 11q23-q25 sequence was found mainly in conjunction with the NRAS mutation. Primary melanomas without the BRAF or the NRAS mutation showed frequent alterations in chromosomes 17 and 4. Correlation analysis revealed chromosomal alterations that coexist more often in these tumor subgroups. To find classifiers for BRAF mutation, random forest analysis was used. Fifteen candidates emerged with 87% prediction accuracy. Signaling interactions between the EGF/MAPK-JAK pathways were observed to be extensively altered in melanomas with the BRAF mutation. We found marked differences in the genetic pattern of the BRAF and NRAS mutated melanoma subgroups that might suggest that these mutations contribute to malignant melanoma in conjunction with distinct cooperating oncogenic events.
|
92
|
Liu GE, Bickhart DM. Copy number variation in the cattle genome. Funct Integr Genomics 2012; 12:609-24. [DOI: 10.1007/s10142-012-0289-9] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2012] [Revised: 06/13/2012] [Accepted: 06/20/2012] [Indexed: 11/29/2022]
|
93
|
Valsesia A, Stevenson BJ, Waterworth D, Mooser V, Vollenweider P, Waeber G, Jongeneel CV, Beckmann JS, Kutalik Z, Bergmann S. Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort. BMC Genomics 2012; 13:241. [PMID: 22702538 PMCID: PMC3464625 DOI: 10.1186/1471-2164-13-241] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2011] [Accepted: 06/15/2012] [Indexed: 01/11/2023] Open
Abstract
BACKGROUND Genotypes obtained with commercial SNP arrays have been extensively used in many large case-control or population-based cohorts for SNP-based genome-wide association studies for a multitude of traits. Yet, these genotypes capture only a small fraction of the variance of the studied traits. Genomic structural variants (GSV) such as Copy Number Variation (CNV) may account for part of the missing heritability, but their comprehensive detection requires either next-generation arrays or sequencing. Sophisticated algorithms that infer CNVs by combining the intensities from SNP-probes for the two alleles can already be used to extract a partial view of such GSV from existing data sets. RESULTS Here we present several advances to facilitate the latter approach. First, we introduce a novel CNV detection method based on a Gaussian Mixture Model. Second, we propose a new algorithm, PCA merge, for combining copy-number profiles from many individuals into consensus regions. We applied both our new methods as well as existing ones to data from 5612 individuals from the CoLaus study who were genotyped on Affymetrix 500K arrays. We developed a number of procedures in order to evaluate the performance of the different methods. This includes comparison with previously published CNVs as well as using a replication sample of 239 individuals, genotyped with Illumina 550K arrays. We also established a new evaluation procedure that employs the fact that related individuals are expected to share their CNVs more frequently than randomly selected individuals. The ability to detect both rare and common CNVs provides a valuable resource that will facilitate association studies exploring potential phenotypic associations with CNVs. 
CONCLUSION Our new methodologies for CNV detection and their evaluation will help in extracting additional information from the large amount of SNP-genotyping data on various cohorts and use this to explore structural variants and their impact on complex traits.
Affiliation(s)
- Armand Valsesia
- Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland
|
94
|
Shen JJ, Zhang NR. Change-point model on nonhomogeneous Poisson processes with application in copy number profiling by next-generation DNA sequencing. Ann Appl Stat 2012. [DOI: 10.1214/11-aoas517] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
95
|
Breheny P, Chalise P, Batzler A, Wang L, Fridley BL. Genetic association studies of copy-number variation: should assignment of copy number states precede testing? PLoS One 2012; 7:e34262. [PMID: 22493684 PMCID: PMC3320903 DOI: 10.1371/journal.pone.0034262] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2011] [Accepted: 02/24/2012] [Indexed: 11/18/2022] Open
Abstract
Recently, structural variation in the genome has been implicated in many complex diseases. Using genomewide single nucleotide polymorphism (SNP) arrays, researchers are able to investigate the impact not only of SNP variation, but also of copy-number variants (CNVs) on the phenotype. The most common analytic approach involves estimating, at the level of the individual genome, the underlying number of copies present at each location. Once this is completed, tests are performed to determine the association between copy number state and phenotype. An alternative approach is to carry out association testing first, between phenotype and raw intensities from the SNP array at the level of the individual marker, and then aggregate neighboring test results to identify CNVs associated with the phenotype. Here, we explore the strengths and weaknesses of these two approaches using both simulations and real data from a pharmacogenomic study of the chemotherapeutic agent gemcitabine. Our results indicate that pooled marker-level testing is capable of offering a dramatic increase in power (> 12-fold) over CNV-level testing, particularly for small CNVs. However, CNV-level testing is superior when CNVs are large and rare; understanding these tradeoffs is an important consideration in conducting association studies of structural variation.
Affiliation(s)
- Patrick Breheny
- Department of Biostatistics, University of Kentucky, Lexington, Kentucky, United States of America.
|
96
|
Zheng X, Shaffer JR, McHugh CP, Laurie CC, Feenstra B, Melbye M, Murray JC, Marazita ML, Feingold E. Using family data as a verification standard to evaluate copy number variation calling strategies for genetic association studies. Genet Epidemiol 2012; 36:253-62. [PMID: 22714937 PMCID: PMC3696390 DOI: 10.1002/gepi.21618] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023]
Abstract
A major concern for all copy number variation (CNV) detection algorithms is their reliability and repeatability. However, it is difficult to evaluate the reliability of CNV-calling strategies due to the lack of gold-standard data that would tell us which CNVs are real. We propose that if CNVs are called in duplicate samples, or inherited from parent to child, then these can be considered validated CNVs. We used two large family-based genome-wide association study (GWAS) datasets from the GENEVA consortium to look at concordance rates of CNV calls between duplicate samples, parent-child pairs, and unrelated pairs. Our goal was to make recommendations for ways to filter and use CNV calls in GWAS datasets that do not include family data. We used PennCNV as our primary CNV-calling algorithm, and tested CNV calls using different datasets and marker sets, and with various filters on CNVs and samples. Using the Illumina core HumanHap550 single nucleotide polymorphism (SNP) set, we saw duplicate concordance rates of approximately 55% and parent-child transmission rates of approximately 28% in our datasets. GC model adjustment and sample quality filtering had little effect on these reliability measures. Stratification on CNV size and DNA sample type did have some effect. Overall, our results show that it is probably not possible to find a CNV-calling strategy (including filtering and algorithm) that will give us a set of "reliable" CNV calls using current chip technologies. But if we understand the error process, we can still use CNV calls appropriately in genetic association studies.
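A concordance rate between duplicate samples could be computed along the following lines. The abstract does not state how two calls are matched, so the 50% reciprocal overlap rule used here is an assumption, as are the function names `reciprocal_overlap` and `concordance_rate` and the `(chrom, start, end)` tuple layout.

```python
def reciprocal_overlap(a, b):
    """Shared length divided by the longer interval (0.0 if disjoint).

    A value >= t means the overlap covers at least a fraction t of both calls.
    """
    (chrom_a, start_a, end_a), (chrom_b, start_b, end_b) = a, b
    if chrom_a != chrom_b:
        return 0.0
    shared = min(end_a, end_b) - max(start_a, start_b)
    if shared <= 0:
        return 0.0
    return shared / max(end_a - start_a, end_b - start_b)

def concordance_rate(calls, replicate_calls, min_overlap=0.5):
    """Fraction of calls that have a matching call in the replicate sample."""
    if not calls:
        return 0.0
    matched = sum(
        any(reciprocal_overlap(c, r) >= min_overlap for r in replicate_calls)
        for c in calls
    )
    return matched / len(calls)

if __name__ == "__main__":
    sample = [("chr1", 100, 200), ("chr2", 0, 50)]
    duplicate = [("chr1", 120, 210)]
    print(concordance_rate(sample, duplicate))
```

The same function could be applied to parent-child pairs to estimate a transmission rate, again under the assumed matching rule.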
Affiliation(s)
- Xiaojing Zheng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA.
|
97
|
Comparative analysis of algorithms for integration of copy number and expression data. Nat Methods 2012; 9:351-5. [PMID: 22327835 DOI: 10.1038/nmeth.1893] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2011] [Accepted: 01/06/2012] [Indexed: 12/15/2022]
Abstract
Chromosomal instability is a hallmark of cancer, and genes that display abnormal expression in aberrant chromosomal regions are likely to be key players in tumor progression. Identifying such driver genes reliably requires computational methods that can integrate genome-scale data from several sources. We compared the performance of ten algorithms that integrate copy-number and transcriptomics data from 15 head and neck squamous cell carcinoma cell lines, 129 lung squamous cell carcinoma primary tumors and simulated data. Our results revealed clear differences between the methods in terms of sensitivity and specificity as well as in their performance in small and large sample sizes. Results of the comparison are available at http://csbi.ltdk.helsinki.fi/cn2gealgo/.
|
98
|
Bickhart DM, Hou Y, Schroeder SG, Alkan C, Cardone MF, Matukumalli LK, Song J, Schnabel RD, Ventura M, Taylor JF, Garcia JF, Van Tassell CP, Sonstegard TS, Eichler EE, Liu GE. Copy number variation of individual cattle genomes using next-generation sequencing. Genome Res 2012; 22:778-90. [PMID: 22300768 DOI: 10.1101/gr.133967.111] [Citation(s) in RCA: 228] [Impact Index Per Article: 17.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]
Abstract
Copy number variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often intractable. Using a read depth approach based on next-generation sequencing, we examined genome-wide copy number differences among five taurine (three Angus, one Holstein, and one Hereford) and one indicine (Nelore) cattle. Within mapped chromosomal sequence, we identified 1265 CNV regions comprising ~55.6 Mbp of sequence, 476 of which (~38%) have not previously been reported. We validated this sequence-based CNV call set with array comparative genomic hybridization (aCGH), quantitative PCR (qPCR), and fluorescent in situ hybridization (FISH), achieving a validation rate of 82% and a false positive rate of 8%. We further estimated absolute copy numbers for genomic segments and annotated genes in each individual. Surveys of the top 25 most variable genes revealed that the Nelore individual had the lowest copy numbers in 13 cases (~52%, χ² test; P-value <0.05). In contrast, genes related to pathogen- and parasite-resistance, such as CATHL4 and ULBP17, were highly duplicated in the Nelore individual relative to the taurine cattle, while genes involved in lipid transport and metabolism, including APOL3 and FABP2, were highly duplicated in the beef breeds. These CNV regions also harbor genes like BPIFA2A (BSP30A) and WC1, suggesting that some CNVs may be associated with breed-specific differences in adaptation, health, and production traits. By providing the first individualized cattle CNV and segmental duplication maps and genome-wide gene copy number estimates, we enable future CNV studies into highly duplicated regions in the cattle genome.
Affiliation(s)
- Derek M Bickhart
- USDA-ARS, ANRI, Bovine Functional Genomics Laboratory, Beltsville, Maryland 20705, USA
|
99
|
Seifert M, Gohr A, Strickert M, Grosse I. Parsimonious higher-order hidden Markov models for improved array-CGH analysis with applications to Arabidopsis thaliana. PLoS Comput Biol 2012; 8:e1002286. [PMID: 22253580 PMCID: PMC3257270 DOI: 10.1371/journal.pcbi.1002286] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2011] [Accepted: 10/11/2011] [Indexed: 12/19/2022] Open
Abstract
Array-based comparative genomic hybridization (Array-CGH) is an important technology in molecular biology for the detection of DNA copy number polymorphisms between closely related genomes. Hidden Markov Models (HMMs) are popular tools for the analysis of Array-CGH data, but current methods are only based on first-order HMMs having constrained abilities to model spatial dependencies between measurements of closely adjacent chromosomal regions. Here, we develop parsimonious higher-order HMMs enabling the interpolation between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling spatial dependencies. We apply parsimonious higher-order HMMs to the analysis of Array-CGH data of the accessions C24 and Col-0 of the model plant Arabidopsis thaliana. We compare these models against first-order HMMs and other existing methods using a reference of known deletions and sequence deviations. We find that parsimonious higher-order HMMs clearly improve the identification of these polymorphisms. Moreover, we perform a functional analysis of identified polymorphisms revealing novel details of genomic differences between C24 and Col-0. Additional model evaluations are done on widely considered Array-CGH data of human cell lines indicating that parsimonious HMMs are also well-suited for the analysis of non-plant specific data. All these results indicate that parsimonious higher-order HMMs are useful for Array-CGH analyses. An implementation of parsimonious higher-order HMMs is available as part of the open source Java library Jstacs (www.jstacs.de/index.php/PHHMM).
Array-based comparative genomics is a standard approach for the identification of DNA copy number polymorphisms between closely related genomes. The huge amounts of data produced by these experiments require efficient and accurate bioinformatics tools for the identification of copy number polymorphisms. Hidden Markov Models (HMMs) are frequently used for analyzing such data sets, but current models are based on first-order HMMs only having limited capabilities to model spatial dependencies between measurements of closely adjacent chromosomal regions. We develop parsimonious higher-order HMMs enabling the interpolation between a mixture model ignoring spatial dependencies and a higher-order HMM exhaustively modeling these dependencies to overcome this limitation. In an in-depth case study with Arabidopsis thaliana, we find that parsimonious higher-order HMMs clearly improve the identification of copy number polymorphisms in comparison to standard first-order HMMs and other frequently used methods. Functional analysis of identified polymorphisms revealed details of genomic differences between the accessions C24 and Col-0 of Arabidopsis thaliana. An additional study on human cell lines further indicates that parsimonious HMMs are well-suited for the analysis of Array-CGH data.
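For orientation, the first-order HMM baseline that this abstract compares against can be sketched with Viterbi decoding over Gaussian emissions. This is a simplified illustration, not the parsimonious higher-order model or the Jstacs implementation: the function name `viterbi_gaussian`, the shared standard deviation `sd`, the uniform start distribution, and the single `stay_prob` transition parameter are all assumptions.

```python
import math

def viterbi_gaussian(obs, means, sd, stay_prob):
    """Most likely state path for a first-order HMM with Gaussian emissions.

    means[k] is the expected log-ratio of state k; all states share sd.
    A state is kept with probability stay_prob; the remaining mass is
    split evenly across the other states. The start is uniform.
    """
    n_states = len(means)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_states - 1))

    def log_emit(x, k):
        # Gaussian log-density up to a constant shared by all states.
        return -0.5 * ((x - means[k]) / sd) ** 2

    score = [log_emit(obs[0], k) for k in range(n_states)]
    back = []
    for x in obs[1:]:
        new_score, pointers = [], []
        for k in range(n_states):
            cands = [
                score[j] + (log_stay if j == k else log_switch)
                for j in range(n_states)
            ]
            j_best = max(range(n_states), key=cands.__getitem__)
            pointers.append(j_best)
            new_score.append(cands[j_best] + log_emit(x, k))
        score = new_score
        back.append(pointers)

    # Trace the best final state back to the first observation.
    state = max(range(n_states), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

if __name__ == "__main__":
    log_ratios = [0.0, 0.1, -0.1, -1.0, -0.9, -1.1, 0.0, 0.05]
    # States: 0 = loss, 1 = unchanged, 2 = gain (illustrative means).
    print(viterbi_gaussian(log_ratios, means=[-1.0, 0.0, 1.0], sd=0.2, stay_prob=0.99))
```

Because each state depends only on its immediate predecessor, this model captures spatial dependence across a single step, which is the limitation the higher-order models in the abstract are designed to relax.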
Affiliation(s)
- Michael Seifert
- Department of Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany.
|
100
|
Lindgren D, Höglund M, Vallon-Christersson J. Genotyping techniques to address diversity in tumors. Adv Cancer Res 2012; 112:151-82. [PMID: 21925304 DOI: 10.1016/b978-0-12-387688-1.00006-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
Abstract
Array-based genotyping platforms have in recent years been established as a valuable tool for the characterization of genomic alterations in cancer. The analysis of tumor samples, however, presents challenges for data analysis and interpretation. For example, tumor samples are often admixed with nonaberrant cells that define the tumor microenvironment, such as infiltrating lymphocytes and fibroblasts, or vasculature. Furthermore, tumors often comprise subclones harboring divergent aberrations that are acquired subsequent to the tumor-initiating event. The combined analysis of both genotype and copy number status obtained by array-based genotyping platforms provides opportunities to address these challenges. In this chapter, we present the basic principles for current array-based genotyping platforms and how they can be used to infer genotype and copy number for acquired genomic alterations. We describe how these techniques can be used to resolve tumor ploidy, normal cell admixture, and subclonality. We also exemplify how genotyping techniques can be applied in tumor studies to elucidate the hierarchy among tumor clones, and thus, provide means to study clonal expansion and tumor evolution.
Affiliation(s)
- David Lindgren
- Center for Molecular Pathology, Department of Laboratory Medicine, Lund University, SUS Malmö, Malmö, Sweden
|