1
Xi J, Li A, Wang M. HetRCNA: A Novel Method to Identify Recurrent Copy Number Alternations from Heterogeneous Tumor Samples Based on Matrix Decomposition Framework. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:422-434. [PMID: 29994262 DOI: 10.1109/tcbb.2018.2846599] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 06/08/2023]
Abstract
A common strategy to discovering cancer associated copy number aberrations (CNAs) from a cohort of cancer samples is to detect recurrent CNAs (RCNAs). Although the previous methods can successfully identify communal RCNAs shared by nearly all tumor samples, detecting subgroup-specific RCNAs and their related subgroup samples from cancer samples with heterogeneity is still invalid for these existing approaches. In this paper, we introduce a novel integrated method called HetRCNA, which can identify statistically significant subgroup-specific RCNAs and their related subgroup samples. Based on matrix decomposition framework with weight constraint, HetRCNA can successfully measure the subgroup samples by coefficients of left vectors with weight constraint and subgroup-specific RCNAs by coefficients of the right vectors and significance test. When we evaluate HetRCNA on simulated dataset, the results show that HetRCNA gives the best performances among the competing methods and is robust to the noise factors of the simulated data. When HetRCNA is applied on a real breast cancer dataset, our approach successfully identifies a bunch of RCNA regions and the result is highly correlated with the results of the other two investigated approaches. Notably, the genomic regions identified by HetRCNA harbor many breast cancer related genes reported by previous researches.
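The idea of reading subgroup samples off a left vector and aberrant regions off a right vector can be illustrated with a plain rank-1 decomposition. The sketch below is not HetRCNA itself: it omits the weight constraint and the significance test, uses ordinary power iteration, and runs on a made-up 6-sample by 8-probe matrix in which only the first three samples carry a gain at probes 2 to 4. All names and values are assumptions for illustration.

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean length (returned unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def rank1_factor(matrix, iterations=50):
    """Leading left/right singular vectors of a samples-by-probes matrix
    via power iteration (a generic stand-in, not the published algorithm)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = normalize([1.0] * n_cols)
    u = [0.0] * n_rows
    for _ in range(iterations):
        u = normalize([sum(matrix[i][j] * v[j] for j in range(n_cols))
                       for i in range(n_rows)])
        v = normalize([sum(matrix[i][j] * u[i] for i in range(n_rows))
                       for j in range(n_cols)])
    return u, v


# Hypothetical toy data: low background everywhere, a gain at probes 2-4
# that is present only in samples 0-2 (the "subgroup").
background = 0.05
profiles = [[background] * 8 for _ in range(6)]
for sample in range(3):
    for probe in range(2, 5):
        profiles[sample][probe] = 1.0

sample_weights, probe_scores = rank1_factor(profiles)

# Largest left-vector entries point to subgroup samples,
# largest right-vector entries point to the recurrent region.
subgroup = sorted(sorted(range(6), key=lambda i: -sample_weights[i])[:3])
region = sorted(sorted(range(8), key=lambda j: -probe_scores[j])[:3])
print("subgroup samples:", subgroup)
print("candidate region probes:", region)
```

In a real analysis the cut-off for "large" coefficients would come from a statistical test rather than a fixed top-3 rule as used here.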
2
Nguyen N, Vo A, Sun H, Huang H. Heavy-Tailed Noise Suppression and Derivative Wavelet Scalogram for Detecting DNA Copy Number Aberrations. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1625-1635. [PMID: 28692986 DOI: 10.1109/tcbb.2017.2723884] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/07/2023]
Abstract
Most existing array comparative genomic hybridization (array CGH) data processing methods and evaluation models assumed that the probability density function (pdf) of noise in array CGH data is a Gaussian distribution. However, in practice, such noise distribution is peaky and heavy-tailed. Therefore, a Gaussian pdf is not adequate to approximate the noise in array CGH data and hence introduces wrong detections of chromosomal aberrations and leads misunderstanding on disease pathogenesis. A more accurate and sufficient model of noise in array CGH data is necessary and beneficial to the detection of DNA copy number variations. We analyze the real array CGH data from different platforms and show that the distribution of noise in array CGH data is fitted very well by generalized Gaussian distribution (GGD). Based on our new noise model, we propose a novel array CGH processing method combining the advantages of both the smoothing and segmentation approaches. The new method uses generalized Gaussian bivariate shrinkage function and one-directional derivative wavelet scalogram in generalized Gaussian noise. In the smoothing step, with the new generalized Gaussian noise model, we derive the heavy-tailed noise suppression algorithm in stationary wavelet domain. In the segmentation step, the 1D Gaussian derivative wavelet scalogram is employed to detect break points. Both real and simulated array CGH data with different noises (such as Gaussian noise, GGD noise, and real noise) are used in our experiments. We demonstrate that our new method outperforms other state-of-the-art methods, in terms of both root mean squared errors and receiver operating characteristic curves.
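To make the "peaky and heavy-tailed" description concrete, the following minimal sketch evaluates the standard generalized Gaussian density, f(x) = β / (2αΓ(1/β)) · exp(−(|x − μ|/α)^β), at unit variance for two shape values. Shape β = 2 recovers the Gaussian and β = 1 the Laplace distribution. The function names and the choice of β = 1 as the heavy-tailed example are assumptions for illustration; the abstract does not report fitted parameters.

```python
import math


def generalized_gaussian_pdf(x, alpha, beta, mu=0.0):
    """Density of the generalized Gaussian distribution with
    scale alpha > 0, shape beta > 0 and location mu."""
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta))


def alpha_for_unit_variance(beta):
    """Scale that gives variance 1, using Var = alpha^2 * Gamma(3/beta) / Gamma(1/beta)."""
    return math.sqrt(math.gamma(1.0 / beta) / math.gamma(3.0 / beta))


gauss_alpha = alpha_for_unit_variance(2.0)   # beta = 2: ordinary Gaussian
heavy_alpha = alpha_for_unit_variance(1.0)   # beta = 1: Laplace, heavier tails

for x in (0.0, 4.0):
    gauss = generalized_gaussian_pdf(x, gauss_alpha, 2.0)
    heavy = generalized_gaussian_pdf(x, heavy_alpha, 1.0)
    print(f"x={x}: beta=2 density {gauss:.6f}, beta=1 density {heavy:.6f}")
```

At equal variance the β = 1 density is higher both at the center and far out in the tail, which is the pattern a Gaussian model would miss.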
3
Simam J, Rono M, Ngoi J, Nyonda M, Mok S, Marsh K, Bozdech Z, Mackinnon M. Gene copy number variation in natural populations of Plasmodium falciparum in Eastern Africa. BMC Genomics 2018; 19:372. [PMID: 29783949 PMCID: PMC5963192 DOI: 10.1186/s12864-018-4689-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 09/20/2017] [Accepted: 04/17/2018] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND Gene copy number variants (CNVs), which consist of deletions and amplifications of single or sets of contiguous genes, contribute to the great diversity in the Plasmodium falciparum genome. In vitro studies in the laboratory have revealed their important role in parasite fitness phenotypes such as red cell invasion, transmissibility and cytoadherence. Studies of natural parasite populations indicate that CNVs are also common in the field and thus may facilitate adaptation of the parasite to its local environment. RESULTS In a survey of 183 fresh field isolates from three populations in Eastern Africa with different malaria transmission intensities, we identified 94 CNV loci using microarrays. All CNVs had low population frequencies (minor allele frequency < 5%) but each parasite isolate carried an average of 8 CNVs. Nine CNVs showed high levels of population differentiation (FST > 0.3) and nine exhibited significant clines in population frequency across a gradient in transmission intensity. The clearest example of this was a large deletion on chromosome 9 previously reported only in laboratory-adapted isolates. This deletion was present in 33% of isolates from a population with low and highly seasonal malaria transmission, and in < 9% of isolates from populations with higher transmission. Subsets of CNVs were strongly correlated in their population frequencies, implying co-selection. CONCLUSIONS These results support the hypothesis that CNVs are the target of selection in natural populations of P. falciparum. Their environment-specific patterns observed here imply an important role for them in conferring adaptability to the parasite thus enabling it to persist in its highly diverse ecological environment.
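As a rough illustration of the population-differentiation statistic mentioned above, the sketch below computes a simple Wright-style FST for one biallelic locus as (H_T − H_S) / H_T, with equal weight per population. This is a textbook estimator, not necessarily the one used in the study. Of the three frequencies, only 0.33 is taken from the abstract; the other two are hypothetical placeholders chosen to be below 9%.

```python
def expected_heterozygosity(freq):
    """Gene diversity 2p(1-p) for a biallelic locus with allele frequency p."""
    return 2.0 * freq * (1.0 - freq)


def fst(freqs):
    """Simple FST = (H_T - H_S) / H_T across equally weighted populations."""
    mean_freq = sum(freqs) / len(freqs)
    h_total = expected_heterozygosity(mean_freq)
    if h_total == 0.0:
        return 0.0  # locus is fixed everywhere, no differentiation to measure
    h_within = sum(expected_heterozygosity(f) for f in freqs) / len(freqs)
    return (h_total - h_within) / h_total


# 0.33 comes from the abstract; 0.08 and 0.05 are assumed values.
deletion_freqs = [0.33, 0.08, 0.05]
print(f"FST for the example deletion: {fst(deletion_freqs):.3f}")
```

Identical frequencies in every population give an FST of 0, and a variant fixed in one population and absent in another gives 1.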
Affiliation(s)
- Martin Rono
- KEMRI-Wellcome Trust Research Program, Kilifi, Kenya; Centre for Tropical Medicine and Global Health, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK; Pwani University Bioscience Research Centre, Pwani University, Kilifi, Kenya
- Joyce Ngoi
- KEMRI-Wellcome Trust Research Program, Kilifi, Kenya
- Mary Nyonda
- Department of Microbiology and Molecular Medicine, Medical Faculty, University of Geneva, Geneva, Switzerland
- Sachel Mok
- Department of Microbiology and Immunology, Columbia University, New York, USA
- Kevin Marsh
- Centre for Tropical Medicine and Global Health, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK
- Zbynek Bozdech
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
4
Xi J, Li A. Discovering Recurrent Copy Number Aberrations in Complex Patterns via Non-Negative Sparse Singular Value Decomposition. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:656-668. [PMID: 26372614 DOI: 10.1109/tcbb.2015.2474404] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 06/05/2023]
Abstract
Recurrent copy number aberrations (RCNAs) in multiple cancer samples are strongly associated with tumorigenesis, and RCNA discovery is helpful to cancer research and treatment. Despite the emergence of numerous RCNA discovering methods, most of them are unable to detect RCNAs in complex patterns that are influenced by complicating factors including aberration in partial samples, co-existing of gains and losses and normal-like tumor samples. Here, we propose a novel computational method, called non-negative sparse singular value decomposition (NN-SSVD), to address the RCNA discovering problem in complex patterns. In NN-SSVD, the measurement of RCNA is based on the aberration frequency in a part of samples rather than all samples, which can circumvent the complexity of different RCNA patterns. We evaluate NN-SSVD on synthetic dataset by comparison on detection scores and Receiver Operating Characteristics curves, and the results show that NN-SSVD outperforms existing methods in RCNA discovery and demonstrate more robustness to RCNA complicating factors. Applying our approach on a breast cancer dataset, we successfully identify a number of genomic regions that are strongly correlated with previous studies, which harbor a bunch of known breast cancer associated genes.
5
Masecchia S, Coco S, Barla A, Verri A, Tonini GP. Genome instability model of metastatic neuroblastoma tumorigenesis by a dictionary learning algorithm. BMC Med Genomics 2015; 8:57. [PMID: 26358114 PMCID: PMC4566396 DOI: 10.1186/s12920-015-0132-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 01/09/2015] [Accepted: 08/28/2015] [Indexed: 12/21/2022] Open
Abstract
Background Metastatic neuroblastoma (NB) occurs in pediatric patients as stage 4S or stage 4 and it is characterized by heterogeneous clinical behavior associated with diverse genotypes. Tumors of stage 4 contain several structural copy number aberrations (CNAs) rarely found in stage 4S. To date, the NB tumorigenesis is not still elucidated, although it is evident that genomic instability plays a critical role in the genesis of the tumor. Here we propose a mathematical approach to decipher genomic data and we provide a new model of NB metastatic tumorigenesis. Method We elucidate NB tumorigenesis using Enhanced Fused Lasso Latent Feature Model (E-FLLat) modeling the array comparative chromosome hybridization (aCGH) data of 190 metastatic NBs (63 stage 4S and 127 stage 4). This model for aCGH segmentation, based on the minimization of functional dictionary learning (DL), combines several penalties tailored to the specificities of aCGH data. In DL, the original signal is approximated by a linear weighted combination of atoms: the elements of the learned dictionary. Results The hierarchical structures for stage 4S shows at the first level of the oncogenetic tree several whole chromosome gains except to the unbalanced gains of 17q, 2p and 2q. Conversely, the high CNA complexity found in stage 4 tumors, requires two different trees. Both stage 4 oncogenetic trees are marked diverged, up to five sublevels and the 17q gain is the most common event at the first level (2/3 nodes). Moreover the 11q deletion, one of the major unfavorable marker of disease progression, occurs before 3p loss indicating that critical chromosome aberrations appear at early stages of tumorigenesis. Finally, we also observed a significant (p = 0.025) association between patient age and chromosome loss in stage 4 cases. 
Conclusion These results led us to propose a genome instability progressive model in which NB cells initiate with a DNA synthesis uncoupled from cell division, that leads to stage 4S tumors, primarily characterized by numerical aberrations, or stage 4 tumors with high levels of genome instability resulting in complex chromosome rearrangements associated with high tumor aggressiveness and rapid disease progression.
Affiliation(s)
- Simona Coco
- Lung Cancer Unit; IRCCS A.O.U. San Martino - IST, Genova, Italy.
- Annalisa Barla
- DIBRIS, Università degli Studi di Genova, Genova, Italy.
- Gian Paolo Tonini
- Neuroblastoma Laboratory, Onco/Hematology Laboratory, Department of Woman and Child Health, University of Padua, Pediatric Research Institute, Fondazione Città della Speranza, Padua, Corso Stati Uniti, 4, 35127, Padua, Italy.
6
Genome-wide copy number variation study reveals KCNIP1 as a modulator of insulin secretion. Genomics 2014; 104:113-20. [DOI: 10.1016/j.ygeno.2014.05.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 01/07/2014] [Revised: 05/19/2014] [Accepted: 05/23/2014] [Indexed: 01/09/2023]
7
Lin YJ, Chen YT, Hsu SN, Peng CH, Tang CY, Yen TC, Hsieh WP. HaplotypeCN: copy number haplotype inference with Hidden Markov Model and localized haplotype clustering. PLoS One 2014; 9:e96841. [PMID: 24849202 PMCID: PMC4029584 DOI: 10.1371/journal.pone.0096841] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/29/2013] [Accepted: 04/11/2014] [Indexed: 11/18/2022] Open
Abstract
Copy number variation (CNV) has been reported to be associated with disease and various cancers. Hence, identifying the accurate position and the type of CNV is currently a critical issue. There are many tools targeting on detecting CNV regions, constructing haplotype phases on CNV regions, or estimating the numerical copy numbers. However, none of them can do all of the three tasks at the same time. This paper presents a method based on Hidden Markov Model to detect parent specific copy number change on both chromosomes with signals from SNP arrays. A haplotype tree is constructed with dynamic branch merging to model the transition of the copy number status of the two alleles assessed at each SNP locus. The emission models are constructed for the genotypes formed with the two haplotypes. The proposed method can provide the segmentation points of the CNV regions as well as the haplotype phasing for the allelic status on each chromosome. The estimated copy numbers are provided as fractional numbers, which can accommodate the somatic mutation in cancer specimens that usually consist of heterogeneous cell populations. The algorithm is evaluated on simulated data and the previously published regions of CNV of the 270 HapMap individuals. The results were compared with five popular methods: PennCNV, genoCN, COKGEN, QuantiSNP and cnvHap. The application on oral cancer samples demonstrates how the proposed method can facilitate clinical association studies. The proposed algorithm exhibits comparable sensitivity of the CNV regions to the best algorithm in our genome-wide study and demonstrates the highest detection rate in SNP dense regions. In addition, we provide better haplotype phasing accuracy than similar approaches. The clinical association carried out with our fractional estimate of copy numbers in the cancer samples provides better detection power than that with integer copy number states.
Affiliation(s)
- Yen-Jen Lin
- Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
- Yu-Tin Chen
- Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
- Shu-Ni Hsu
- Institute of Statistics, National Tsing Hua University, Hsinchu, Taiwan
- Chien-Hua Peng
- Department of Resource Center for Clinical Research, Chang Gung Memorial Hospital, Taoyuan, Taiwan
- Chuan-Yi Tang
- Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
- Department of Computer Science and Information Engineering, Providence University, Taichung, Taiwan
- Tzu-Chen Yen
- Head and Neck Oncology Group, Chang Gung Memorial Hospital, Taoyuan, Taiwan
- Nuclear Medicine and Molecular Imaging Center, Chang Gung Memorial Hospital, Taoyuan, Taiwan
- Wen-Ping Hsieh
- Institute of Statistics, National Tsing Hua University, Hsinchu, Taiwan
8
Mok S, Liong KY, Lim EH, Huang X, Zhu L, Preiser PR, Bozdech Z. Structural polymorphism in the promoter of pfmrp2 confers Plasmodium falciparum tolerance to quinoline drugs. Mol Microbiol 2014; 91:918-934. [PMID: 24372851 PMCID: PMC4286016 DOI: 10.1111/mmi.12505] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Accepted: 12/23/2013] [Indexed: 12/17/2022]
Abstract
Drug resistance in Plasmodium falciparum remains a challenge for the malaria eradication programmes around the world. With the emergence of artemisinin resistance, the efficacy of the partner drugs in the artemisinin combination therapies (ACT) that include quinoline-based drugs is becoming critical. So far only few resistance markers have been identified from which only two transmembrane transporters namely PfMDR1 (an ATP-binding cassette transporter) and PfCRT (a drug-metabolite transporter) have been experimentally verified. Another P. falciparum transporter, the ATP-binding cassette containing multidrug resistance-associated protein (PfMRP2) represents an additional possible factor of drug resistance in P. falciparum. In this study, we identified a parasite clone that is derived from the 3D7 P. falciparum strain and shows increased resistance to chloroquine, mefloquine and quinine through the trophozoite and schizont stages. We demonstrate that the resistance phenotype is caused by a 4.1 kb deletion in the 5' upstream region of the pfmrp2 gene that leads to an alteration in the pfmrp2 transcription and thus increased level of PfMRP2 protein. These results also suggest the importance of putative promoter elements in regulation of gene expression during the P. falciparum intra-erythrocytic developmental cycle and the potential of genetic polymorphisms within these regions to underlie drug resistance.
Affiliation(s)
- Sachel Mok
- School of Biological Sciences, Nanyang Technological University, Singapore
- Kek-Yee Liong
- School of Biological Sciences, Nanyang Technological University, Singapore
- Eng-How Lim
- School of Biological Sciences, Nanyang Technological University, Singapore
- Ximei Huang
- School of Biological Sciences, Nanyang Technological University, Singapore
- Lei Zhu
- School of Biological Sciences, Nanyang Technological University, Singapore
- Zbynek Bozdech
- School of Biological Sciences, Nanyang Technological University, Singapore
9
Younkin SG, Scharpf RB, Schwender H, Parker MM, Scott AF, Marazita ML, Beaty TH, Ruczinski I. A genome-wide study of de novo deletions identifies a candidate locus for non-syndromic isolated cleft lip/palate risk. BMC Genet 2014; 15:24. [PMID: 24528994 PMCID: PMC3929298 DOI: 10.1186/1471-2156-15-24] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 05/13/2013] [Accepted: 01/31/2014] [Indexed: 01/25/2023] Open
Abstract
Background Copy number variants (CNVs) may play an important part in the development of common birth defects such as oral clefts, and individual patients with multiple birth defects (including clefts) have been shown to carry small and large chromosomal deletions. In this paper we investigate de novo deletions defined as DNA segments missing in an oral cleft proband but present in both unaffected parents. We compare de novo deletion frequencies in children of European ancestry with an isolated, non-syndromic oral cleft to frequencies in children of European ancestry from randomly sampled trios. Results We identified a genome-wide significant 62 kilo base (kb) non-coding region on chromosome 7p14.1 where de novo deletions occur more frequently among oral cleft cases than controls. We also observed wider de novo deletions among cleft lip and palate (CLP) cases than seen among cleft palate (CP) and cleft lip (CL) cases. Conclusions This study presents a region where de novo deletions appear to be involved in the etiology of oral clefts, although the underlying biological mechanisms are still unknown. Larger de novo deletions are more likely to interfere with normal craniofacial development and may result in more severe clefts. Study protocol and sample DNA source can severely affect estimates of de novo deletion frequencies. Follow-up studies are needed to further validate these findings and to potentially identify additional structural variants underlying oral clefts.
Affiliation(s)
- Samuel G Younkin
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore MD, USA.
10
Vandeweyer G, Kooy RF. Detection and interpretation of genomic structural variation in health and disease. Expert Rev Mol Diagn 2014; 13:61-82. [DOI: 10.1586/erm.12.119] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 01/04/2023]
11
Fowler KE, Pong-Wong R, Bauer J, Clemente EJ, Reitter CP, Affara NA, Waite S, Walling GA, Griffin DK. Genome wide analysis reveals single nucleotide polymorphisms associated with fatness and putative novel copy number variants in three pig breeds. BMC Genomics 2013; 14:784. [PMID: 24225222 PMCID: PMC3879217 DOI: 10.1186/1471-2164-14-784] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Received: 03/27/2013] [Accepted: 10/29/2013] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Obesity, excess fat tissue in the body, can underlie a variety of medical complaints including heart disease, stroke and cancer. The pig is an excellent model organism for the study of various human disorders, including obesity, as well as being the foremost agricultural species. In order to identify genetic variants associated with fatness, we used a selective genomic approach sampling DNA from animals at the extreme ends of the fat and lean spectrum using estimated breeding values derived from a total population size of over 70,000 animals. DNA from 3 breeds (Sire Line Large White, Duroc and a white Pietrain composite line (Titan)) was used to interrogate the Illumina Porcine SNP60 Genotyping Beadchip in order to identify significant associations in terms of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs). RESULTS By sampling animals at each end of the fat/lean EBV (estimate breeding value) spectrum the whole population could be assessed using less than 300 animals, without losing statistical power. Indeed, several significant SNPs (at the 5% genome wide significance level) were discovered, 4 of these linked to genes with ontologies that had previously been correlated with fatness (NTS, FABP6, SST and NR3C2). Quantitative analysis of the data identified putative CNV regions containing genes whose ontology suggested fatness related functions (MCHR1, PPARα, SLC5A1 and SLC5A4). CONCLUSIONS Selective genotyping of EBVs at either end of the phenotypic spectrum proved to be a cost effective means of identifying SNPs and CNVs associated with fatness and with estimated major effects in a large population of animals.
Affiliation(s)
- Katie E Fowler
- School of Biosciences, University of Kent, Canterbury, Kent CT2 7NH, UK
- Ricardo Pong-Wong
- Roslin Institute, The University of Edinburgh, Roslin Biocentre, Midlothian, Scotland EH25 9PS, UK
- Julien Bauer
- Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK
- Emily J Clemente
- Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK
- Christopher P Reitter
- Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK
- Nabeel A Affara
- Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK
- Stephen Waite
- JSR Genetics, Southburn, Driffield, East Yorkshire YO25 9ED, UK
- Grant A Walling
- JSR Genetics, Southburn, Driffield, East Yorkshire YO25 9ED, UK
- Darren K Griffin
- School of Biosciences, University of Kent, Canterbury, Kent CT2 7NH, UK
12
Subramanian A, Shackney S, Schwartz R. Novel multisample scheme for inferring phylogenetic markers from whole genome tumor profiles. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013; 10:1422-1431. [PMID: 24407301 PMCID: PMC3830698 DOI: 10.1109/tcbb.2013.33] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
Computational cancer phylogenetics seeks to enumerate the temporal sequences of aberrations in tumor evolution, thereby delineating the evolution of possible tumor progression pathways, molecular subtypes, and mechanisms of action. We previously developed a pipeline for constructing phylogenies describing evolution between major recurring cell types computationally inferred from whole-genome tumor profiles. The accuracy and detail of the phylogenies, however, depend on the identification of accurate, high-resolution molecular markers of progression, i.e., reproducible regions of aberration that robustly differentiate different subtypes and stages of progression. Here, we present a novel hidden Markov model (HMM) scheme for the problem of inferring such phylogenetically significant markers through joint segmentation and calling of multisample tumor data. Our method classifies sets of genome-wide DNA copy number measurements into a partitioning of samples into normal (diploid) or amplified at each probe. It differs from other similar HMM methods in its design specifically for the needs of tumor phylogenetics, by seeking to identify robust markers of progression conserved across a set of copy number profiles. We show an analysis of our method in comparison to other methods on both synthetic and real tumor data, which confirms its effectiveness for tumor phylogeny inference and suggests avenues for future advances.
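The per-probe calling of "normal" versus "amplified" states can be sketched with a two-state hidden Markov model decoded by the Viterbi algorithm. The version below is a single-sample simplification, not the multisample scheme of the paper, and every parameter (state means, noise level, stay probability) and the input profile are hypothetical.

```python
import math

STATES = ("normal", "amplified")
MEANS = {"normal": 0.0, "amplified": 0.6}  # assumed log2-ratio means per state
SIGMA = 0.2                                # assumed Gaussian noise level
STAY = 0.95                                # assumed probability of keeping the state


def log_emission(state, value):
    """Log Gaussian density of an observed log2 ratio under a state."""
    z = (value - MEANS[state]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2.0 * math.pi))


def log_transition(prev, cur):
    """Log probability of moving from state prev to state cur."""
    return math.log(STAY if prev == cur else 1.0 - STAY)


def viterbi(values):
    """Most likely state sequence for a non-empty list of probe values."""
    scores = {s: math.log(0.5) + log_emission(s, values[0]) for s in STATES}
    backpointers = []
    for value in values[1:]:
        new_scores, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES,
                            key=lambda prev: scores[prev] + log_transition(prev, cur))
            new_scores[cur] = (scores[best_prev] + log_transition(best_prev, cur)
                               + log_emission(cur, value))
            pointers[cur] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


log_ratios = [0.05, -0.1, 0.02, 0.55, 0.7, 0.62, 0.58, 0.0, -0.05, 0.1]
calls = viterbi(log_ratios)
print(list(zip(log_ratios, calls)))
```

A high stay probability discourages single-probe state flips, which is what turns noisy per-probe evidence into contiguous segments.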
Affiliation(s)
- Ayshwarya Subramanian
- Graduate student at the Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, 15213.
13
Lopes AM, Aston KI, Thompson E, Carvalho F, Gonçalves J, Huang N, Matthiesen R, Noordam MJ, Quintela I, Ramu A, Seabra C, Wilfert AB, Dai J, Downie JM, Fernandes S, Guo X, Sha J, Amorim A, Barros A, Carracedo A, Hu Z, Hurles ME, Moskovtsev S, Ober C, Paduch DA, Schiffman JD, Schlegel PN, Sousa M, Carrell DT, Conrad DF. Human spermatogenic failure purges deleterious mutation load from the autosomes and both sex chromosomes, including the gene DMRT1. PLoS Genet 2013; 9:e1003349. [PMID: 23555275 PMCID: PMC3605256 DOI: 10.1371/journal.pgen.1003349] [Citation(s) in RCA: 97] [Impact Index Per Article: 8.1] [Received: 06/14/2012] [Accepted: 01/17/2013] [Indexed: 01/17/2023] Open
Abstract
Gonadal failure, along with early pregnancy loss and perinatal death, may be an important filter that limits the propagation of harmful mutations in the human population. We hypothesized that men with spermatogenic impairment, a disease with unknown genetic architecture and a common cause of male infertility, are enriched for rare deleterious mutations compared to men with normal spermatogenesis. After assaying genomewide SNPs and CNVs in 323 Caucasian men with idiopathic spermatogenic impairment and more than 1,100 controls, we estimate that each rare autosomal deletion detected in our study multiplicatively changes a man's risk of disease by 10% (OR 1.10 [1.04–1.16], p<2×10⁻³), rare X-linked CNVs by 29% (OR 1.29 [1.11–1.50], p<1×10⁻³), and rare Y-linked duplications by 88% (OR 1.88 [1.13–3.13], p<0.03). By contrasting the properties of our case-specific CNVs with those of CNV callsets from cases of autism, schizophrenia, bipolar disorder, and intellectual disability, we propose that the CNV burden in spermatogenic impairment is distinct from the burden of large, dominant mutations described for neurodevelopmental disorders. We identified two patients with deletions of DMRT1, a gene on chromosome 9p24.3 orthologous to the putative sex determination locus of the avian ZW chromosome system. In an independent sample of Han Chinese men, we identified 3 more DMRT1 deletions in 979 cases of idiopathic azoospermia and none in 1,734 controls, and found none in an additional 4,519 controls from public databases. The combined results indicate that DMRT1 loss-of-function mutations are a risk factor and potential genetic cause of human spermatogenic failure (frequency of 0.38% in 1306 cases and 0% in 7,754 controls, p = 6.2×10⁻⁵). Our study identifies other recurrent CNVs as potential causes of idiopathic azoospermia and generates hypotheses for directing future studies on the genetic basis of male infertility and IVF outcomes.
Infertility is a disease that prevents the transmission of DNA from one generation to the next, and consequently it has been difficult to study the genetics of infertility using classical human genetics methods. Now, new technologies for screening entire genomes for rare and patient-specific mutations are revolutionizing our understanding of reproductively lethal diseases. Here, we apply techniques for variation discovery to study a condition called azoospermia, the failure to produce sperm. Large deletions of the Y chromosome are the primary known genetic risk factor for azoospermia, and genetic testing for these deletions is part of the standard treatment for this condition. We have screened over 300 men with azoospermia for rare deletions and duplications, and find an enrichment of these mutations throughout the genome compared to unaffected men. Our results indicate that sperm production is affected by mutations beyond the Y chromosome and will motivate whole-genome analyses of larger numbers of men with impaired spermatogenesis. Our finding of an enrichment of rare deleterious mutations in men with poor sperm production also raises the possibility that the slightly increased rate of birth defects reported in children conceived by in vitro fertilization may have a genetic basis.
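The combined DMRT1 figures quoted in the abstract (5 deletions among 1306 cases, none among 7,754 controls) can be checked with a short calculation. The sketch below assumes a one-sided Fisher exact test, which is a guess about the analysis rather than something the abstract states. When every carrier falls in the case group, the observed table is already the most extreme one, so the p-value reduces to a single hypergeometric term, C(cases, k) / C(cases + controls, k).

```python
from math import comb


def carrier_frequency_percent(carriers, group_size):
    """Carrier frequency in a group, expressed as a percentage."""
    return 100.0 * carriers / group_size


def fisher_one_sided_all_in_cases(carriers, cases, controls):
    """One-sided Fisher exact p-value for the special case where all
    carriers are cases and no control carries the variant."""
    return comb(cases, carriers) / comb(cases + controls, carriers)


# Counts taken from the abstract: 2 + 3 deletions, 1306 cases, 7,754 controls.
deletions, n_cases, n_controls = 5, 1306, 7754

frequency = carrier_frequency_percent(deletions, n_cases)
p_value = fisher_one_sided_all_in_cases(deletions, n_cases, n_controls)
print(f"frequency in cases: {frequency:.2f}%")
print(f"one-sided Fisher p-value: {p_value:.2e}")
```

Under this assumption the frequency and p-value come out in line with the values reported in the abstract.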
Affiliation(s)
- Alexandra M. Lopes
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
- Kenneth I. Aston
- Andrology and IVF Laboratories, Department of Surgery, University of Utah School of Medicine, Salt Lake City, Utah, United States of America
- Emma Thompson
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Filipa Carvalho
- Department of Genetics, Faculty of Medicine, University of Porto, Porto, Portugal
- João Gonçalves
- Department of Human Genetics, National Institute of Health Dr. Ricardo Jorge, Lisbon, Portugal
- Ni Huang
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Rune Matthiesen
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
- Michiel J. Noordam
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Inés Quintela
- Genomics Medicine Group, National Genotyping Center, University of Santiago de Compostela, Santiago de Compostela, Spain
- Avinash Ramu
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Catarina Seabra
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
- Amy B. Wilfert
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Juncheng Dai
- Department of Epidemiology and Biostatistics and Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, Nanjing Medical University, Nanjing, China
- Jonathan M. Downie
- Department of Oncological Sciences, Huntsman Cancer Institute, University of Utah School of Medicine, Salt Lake City, Utah, United States of America
- Susana Fernandes
- Department of Genetics, Faculty of Medicine, University of Porto, Porto, Portugal
- Xuejiang Guo
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China
- Department of Histology and Embryology, Nanjing Medical University, Nanjing, China
- Jiahao Sha
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China
- Department of Histology and Embryology, Nanjing Medical University, Nanjing, China
- António Amorim
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
- Faculty of Sciences, University of Porto, Porto, Portugal
- Alberto Barros
- Department of Genetics, Faculty of Medicine, University of Porto, Porto, Portugal
- Centre for Reproductive Genetics Alberto Barros, Porto, Portugal
- Angel Carracedo
- Genomics Medicine Group, National Genotyping Center, University of Santiago de Compostela, Santiago de Compostela, Spain
- Galician Foundation of Genomic Medicine and University of Santiago de Compostela, CIBERER, Santiago de Compostela, Spain
| | - Zhibin Hu
- Department of Epidemiology and Biostatistics and Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, Nanjing Medical University, Nanjing, China
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China
| | - Matthew E. Hurles
- Genome Mutation and Genetic Disease Group, Wellcome Trust Sanger Institute, Cambridge, United Kingdom
| | - Sergey Moskovtsev
- CReATe Fertility Center, University of Toronto, Toronto, Canada
- Department of Obstetrics and Gynaecology, University of Toronto, Toronto, Canada
| | - Carole Ober
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Department of Obstetrics and Gynecology, University of Chicago, Chicago, Illinois, United States of America
| | - Darius A. Paduch
- Department of Urology, Weill Cornell Medical College, New York-Presbyterian Hospital, New York, New York, United States of America
| | - Joshua D. Schiffman
- Department of Oncological Sciences, Huntsman Cancer Institute, University of Utah School of Medicine, Salt Lake City, Utah, United States of America
- Center for Children's Cancer Research (C3R), Huntsman Cancer Institute, University of Utah School of Medicine, Salt Lake City, Utah, United States of America
- Division of Pediatric Hematology/Oncology, Huntsman Cancer Institute, University of Utah School of Medicine, Salt Lake City, Utah, United States of America
| | - Peter N. Schlegel
- Department of Urology, Weill Cornell Medical College, New York-Presbyterian Hospital, New York, New York, United States of America
| | - Mário Sousa
- Laboratory of Cell Biology, UMIB, ICBAS, University of Porto, Porto, Portugal
| | - Douglas T. Carrell
- Andrology and IVF Laboratories, Department of Surgery, University of Utah School of Medicine, Salt Lake City, Utah, United States of America
- Department of Physiology, University of Utah School of Medicine, Salt Lake City, Utah, United States of America
- Department of Obstetrics and Gynecology, University of Utah School of Medicine, Salt Lake City, Utah, United States of America
| | - Donald F. Conrad
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, Missouri, United States of America
- * E-mail: (AML); (DFC)
|
14
|
Zhou X, Yang C, Wan X, Zhao H, Yu W. Multisample aCGH data analysis via total variation and spectral regularization. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013; 10:230-235. [PMID: 23702561 PMCID: PMC3715577 DOI: 10.1109/tcbb.2012.166] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 06/02/2023]
Abstract
DNA copy number variation (CNV) accounts for a large proportion of genetic variation. One commonly used approach to detecting CNVs is array-based comparative genomic hybridization (aCGH). Although many methods have been proposed to analyze aCGH data, it is not clear how to combine information from multiple samples to improve CNV detection. In this paper, we propose to use a matrix to approximate the multisample aCGH data and minimize the total variation of each sample as well as the nuclear norm of the whole matrix. In this way, we can make use of the smoothness property of each sample and the correlation among multiple samples simultaneously in a convex optimization framework. We also developed an efficient and scalable algorithm to handle large-scale data. Experiments demonstrate that the proposed method outperforms the state-of-the-art techniques under a wide range of scenarios and it is capable of processing large data sets with millions of probes.
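As a rough illustration of the kind of convex objective this abstract describes, the following sketch combines a per-sample total variation penalty with a nuclear norm penalty. The symbols are assumptions introduced here, not taken from the paper: Y is the observed samples-by-probes matrix, X its approximation, x_{i,j} the value of sample i at probe j, and λ1, λ2 are regularization weights. The squared-error data-fit term is also an assumption; the published formulation may differ in its terms and weighting.

```latex
\min_{X \in \mathbb{R}^{n \times p}} \;
  \frac{1}{2}\,\lVert Y - X \rVert_F^2
  \;+\; \lambda_1 \sum_{i=1}^{n} \sum_{j=1}^{p-1} \lvert x_{i,j+1} - x_{i,j} \rvert
  \;+\; \lambda_2\, \lVert X \rVert_*
```

The middle term favors piecewise-constant profiles within each sample, while the nuclear norm (the sum of the singular values of X) favors a low-rank X and so encourages structure shared across samples.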
Affiliation(s)
- Xiaowei Zhou
- Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China.
|
15
|
Rueda OM, Rueda C, Diaz-Uriarte R. A Bayesian HMM with random effects and an unknown number of states for DNA copy number analysis. J STAT COMPUT SIM 2013. [DOI: 10.1080/00949655.2011.609818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
|
16
|
Pronold M, Vali M, Pique-Regi R, Asgharzadeh S. Copy number variation signature to predict human ancestry. BMC Bioinformatics 2012; 13:336. [PMID: 23270563 PMCID: PMC3598683 DOI: 10.1186/1471-2105-13-336] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/19/2012] [Accepted: 12/06/2012] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND Copy number variations (CNVs) are genomic structural variants that are found in healthy populations and have been observed to be associated with disease susceptibility. Existing methods for CNV detection are often performed on a sample-by-sample basis, which is not ideal for large datasets where common CNVs must be estimated by comparing the frequency of CNVs in the individual samples. Here we describe a simple and novel approach to locate genome-wide CNVs common to a specific population, using human ancestry as the phenotype. RESULTS We utilized our previously published Genome Alteration Detection Analysis (GADA) algorithm to identify common ancestry CNVs (caCNVs) and built a caCNV model to predict population structure. We identified a 73 caCNV signature using a training set of 225 healthy individuals from European, Asian, and African ancestry. The signature was validated on an independent test set of 300 individuals with similar ancestral background. The error rate in predicting ancestry in this test set was 2% using the 73 caCNV signature. Among the caCNVs identified, several were previously confirmed experimentally to vary by ancestry. Our signature also contains a caCNV region with a single microRNA (MIR270), which represents the first reported variation of microRNA by ancestry. CONCLUSIONS We developed a new methodology to identify common CNVs and demonstrated its performance by building a caCNV signature to predict human ancestry with high accuracy. The utility of our approach could be extended to large case-control studies to identify CNV signatures for other phenotypes such as disease susceptibility and drug response.
Affiliation(s)
- Melissa Pronold
- Department of Pediatrics, Children's Hospital Los Angeles and The Saban Research Institute, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
|
17
|
Scharpf RB, Beaty TH, Schwender H, Younkin SG, Scott AF, Ruczinski I. Fast detection of de novo copy number variants from SNP arrays for case-parent trios. BMC Bioinformatics 2012; 13:330. [PMID: 23234608 PMCID: PMC3576329 DOI: 10.1186/1471-2105-13-330] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 12/15/2011] [Accepted: 12/07/2012] [Indexed: 11/10/2022] Open
Abstract
Background In studies of case-parent trios, we define copy number variants (CNVs) in the offspring that differ from the parental copy numbers as de novo and of interest for their potential functional role in disease. Among the leading array-based methods for discovery of de novo CNVs in case-parent trios is the joint hidden Markov model (HMM) implemented in the PennCNV software. However, the computational demands of the joint HMM are substantial and the extent to which false positive identifications occur in case-parent trios has not been well described. We evaluate these issues in a study of oral cleft case-parent trios. Results Our analysis of the oral cleft trios reveals that genomic waves represent a substantial source of false positive identifications in the joint HMM, despite a wave-correction implementation in PennCNV. In addition, the noise of low-level summaries of relative copy number (log R ratios) is strongly associated with batch and correlated with the frequency of de novo CNV calls. Exploiting the trio design, we propose a univariate statistic for relative copy number referred to as the minimum distance that can reduce technical variation from probe effects and genomic waves. We use circular binary segmentation to segment the minimum distance and maximum a posteriori estimation to infer de novo CNVs from the segmented genome. Compared to PennCNV on simulated data, MinimumDistance identifies fewer false positives on average and is comparable to PennCNV with respect to false negatives. Genomic waves contribute to discordance of PennCNV and MinimumDistance for high coverage de novo calls, while highly concordant calls on chromosome 22 were validated by quantitative PCR. Computationally, MinimumDistance provides a nearly 8-fold increase in speed relative to the joint HMM in a study of oral cleft trios. 
Conclusions Our results indicate that batch effects and genomic waves are important considerations for case-parent studies of de novo CNV, and that the minimum distance is an effective statistic for reducing technical variation contributing to false de novo discoveries. Coupled with segmentation and maximum a posteriori estimation, our algorithm compares favorably to the joint HMM with MinimumDistance being much faster.
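The minimum distance statistic mentioned in this abstract can be sketched as follows. This is a minimal illustration under an assumed definition: at each marker, take the offspring-minus-father and offspring-minus-mother differences in log R ratio and keep whichever has the smaller absolute value, sign included. The function name and the toy values are hypothetical and do not come from the MinimumDistance software.

```python
def minimum_distance(offspring, father, mother):
    """Per-marker signed difference to the closer parent (assumed definition).

    Each argument is a sequence of log R ratios, one value per marker.
    """
    if not (len(offspring) == len(father) == len(mother)):
        raise ValueError("all three profiles must cover the same markers")
    distances = []
    for o, f, m in zip(offspring, father, mother):
        d_father = o - f
        d_mother = o - m
        # Keep the difference with the smaller magnitude, preserving its sign.
        distances.append(d_father if abs(d_father) <= abs(d_mother) else d_mother)
    return distances


if __name__ == "__main__":
    # Marker 1: offspring drops while both parents look normal (candidate de novo loss).
    # Marker 2: offspring and father drop together (consistent with inheritance).
    offspring = [0.0, -0.5, -0.5]
    father = [0.0, 0.0, -0.5]
    mother = [0.0, 0.0, 0.0]
    print(minimum_distance(offspring, father, mother))
```

Under this definition the statistic stays near zero whenever the offspring matches at least one parent, so only a signal present in neither parent survives for the later segmentation step.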
Affiliation(s)
- Robert B Scharpf
- Department of Oncology, Johns Hopkins University, Baltimore, MD, USA.
|
18
|
Valsesia A, Stevenson BJ, Waterworth D, Mooser V, Vollenweider P, Waeber G, Jongeneel CV, Beckmann JS, Kutalik Z, Bergmann S. Identification and validation of copy number variants using SNP genotyping arrays from a large clinical cohort. BMC Genomics 2012; 13:241. [PMID: 22702538 PMCID: PMC3464625 DOI: 10.1186/1471-2164-13-241] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 11/28/2011] [Accepted: 06/15/2012] [Indexed: 01/11/2023] Open
Abstract
BACKGROUND Genotypes obtained with commercial SNP arrays have been extensively used in many large case-control or population-based cohorts for SNP-based genome-wide association studies for a multitude of traits. Yet, these genotypes capture only a small fraction of the variance of the studied traits. Genomic structural variants (GSV) such as Copy Number Variation (CNV) may account for part of the missing heritability, but their comprehensive detection requires either next-generation arrays or sequencing. Sophisticated algorithms that infer CNVs by combining the intensities from SNP-probes for the two alleles can already be used to extract a partial view of such GSV from existing data sets. RESULTS Here we present several advances to facilitate the latter approach. First, we introduce a novel CNV detection method based on a Gaussian Mixture Model. Second, we propose a new algorithm, PCA merge, for combining copy-number profiles from many individuals into consensus regions. We applied both our new methods as well as existing ones to data from 5612 individuals from the CoLaus study who were genotyped on Affymetrix 500K arrays. We developed a number of procedures in order to evaluate the performance of the different methods. This includes comparison with previously published CNVs as well as using a replication sample of 239 individuals, genotyped with Illumina 550K arrays. We also established a new evaluation procedure that employs the fact that related individuals are expected to share their CNVs more frequently than randomly selected individuals. The ability to detect both rare and common CNVs provides a valuable resource that will facilitate association studies exploring potential phenotypic associations with CNVs. 
CONCLUSION Our new methodologies for CNV detection and their evaluation will help in extracting additional information from the large amount of SNP-genotyping data on various cohorts and use this to explore structural variants and their impact on complex traits.
Affiliation(s)
- Armand Valsesia
- Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland
|
19
|
Wang Q, Peng P, Qian M, Wan L, Deng M. Hybridization and amplification rate correction for affymetrix SNP arrays. BMC Med Genomics 2012; 5:24. [PMID: 22691279 PMCID: PMC3428662 DOI: 10.1186/1755-8794-5-24] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 02/21/2012] [Accepted: 06/12/2012] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Copy number variation (CNV) is essential to understanding the pathology of many complex diseases at the DNA level. Affymetrix SNP arrays, which are widely used for CNV studies, depend heavily on accurate copy number (CN) estimation. Nevertheless, CN estimation may be biased by several factors, including cross-hybridization and training sample batch, as well as genomic waves of intensities induced by sequence-dependent hybridization rate and amplification efficiency. Since many available algorithms address only one or two of the three factors, a high false discovery rate (FDR) often results when identifying CNVs. Therefore, we have developed a new CNV detection pipeline based on hybridization and amplification rate correction (CNVhac). METHODS CNVhac first estimates the allelic concentrations (ACs) of target sequences by using sample-independent parameters trained through the physicochemical hybridization law. The raw CN at a given site is then estimated as the ratio of the AC to the corresponding average AC from a reference sample set. Finally, a hidden Markov model (HMM) segmentation process is applied to detect CNV regions. RESULTS Based on public HapMap data, the results show that CNVhac effectively smooths the genomic waves and yields more accurate raw CN estimates than other methods. Moreover, CNVhac alleviates, to a certain extent, the sample dependence of inference and enables CNV calling with appreciably low FDRs. CONCLUSION CNVhac is an effective approach to address the common difficulties in SNP array analysis, and the working principles of CNVhac can be easily extended to other platforms.
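The ratio step in the METHODS paragraph can be illustrated with a few lines of Python. This is only a sketch: the function name is hypothetical, and scaling the ratio by a baseline of two copies (a diploid reference) is an assumption made here, not something the abstract states.

```python
from statistics import mean


def raw_copy_number(sample_ac, reference_acs, baseline_copies=2.0):
    """Raw copy number at one site from allelic concentrations (AC).

    sample_ac: AC estimated for the test sample at this site.
    reference_acs: ACs at the same site across the reference sample set.
    baseline_copies: assumed copy number of the reference set (assumption).
    """
    reference_mean = mean(reference_acs)
    if reference_mean <= 0:
        raise ValueError("reference AC must be positive")
    return baseline_copies * sample_ac / reference_mean


if __name__ == "__main__":
    # A sample with 1.5 times the average reference concentration.
    print(raw_copy_number(1.5, [1.0, 1.0, 1.0]))
```

The per-site values produced this way are noisy on their own, which is why the abstract follows the ratio step with HMM segmentation.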
Affiliation(s)
- Quan Wang
- Center for Theoretical Biology, Peking University, Beijing 100871, People's Republic of China
|
20
|
Ahn J, Yoon Y, Park C, Park S. CNV detection method optimized for high-resolution arrayCGH by normality test. Comput Biol Med 2012; 42:468-73. [DOI: 10.1016/j.compbiomed.2011.12.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2010] [Revised: 12/05/2011] [Accepted: 12/27/2011] [Indexed: 11/24/2022]
|
21
|
Park C, Ahn J, Yoon Y, Park S. A multi-sample based method for identifying common CNVs in normal human genomic structure using high-resolution aCGH data. PLoS One 2011; 6:e26975. [PMID: 22073121 PMCID: PMC3205051 DOI: 10.1371/journal.pone.0026975] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/21/2011] [Accepted: 10/07/2011] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND It is difficult to identify copy number variations (CNV) in normal human genomic data due to noise and non-linear relationships between different genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH) containing 42 million probes, which is very large compared to previous arrays, was recently published. Most existing CNV detection algorithms do not work well because of noise associated with the large amount of input data and because most of the current methods were not designed to analyze normal human samples. Normal human genome analysis often requires a joint approach across multiple samples. However, the majority of existing methods can only identify CNVs from a single sample. METHODOLOGY AND PRINCIPAL FINDINGS We developed a multi-sample-based genomic variations detector (MGVD) that uses segmentation to identify common breakpoints across multiple samples and a k-means-based clustering strategy. Unlike previous methods, MGVD simultaneously considers multiple samples with different genomic intensities and identifies CNVs and CNV zones (CNVZs); CNVZ is a more precise measure of the location of a genomic variant than the CNV region (CNVR). CONCLUSIONS AND SIGNIFICANCE We designed a specialized algorithm to detect common CNVs from extremely high-resolution multi-sample aCGH data. MGVD showed high sensitivity and a low false discovery rate for a simulated data set, and outperformed most current methods when real, high-resolution HapMap datasets were analyzed. MGVD also had the fastest runtime among the algorithms evaluated when actual, high-resolution aCGH data were analyzed. The CNVZs identified by MGVD can be used in association studies for revealing relationships between phenotypes and genomic aberrations. Our algorithm was developed in standard C++ using the STL and is available for Linux and MS Windows. It is freely available at: http://embio.yonsei.ac.kr/~Park/mgvd.php.
Affiliation(s)
- Chihyun Park
- Department of Computer Science, Yonsei University, Seoul, South Korea
| | - Jaegyoon Ahn
- Department of Computer Science, Yonsei University, Seoul, South Korea
| | - Youngmi Yoon
- Division of Information Engineering, Gachon University of Medicine and Science, Incheon, South Korea
| | - Sanghyun Park
- Department of Computer Science, Yonsei University, Seoul, South Korea
|
22
|
Single-cell copy number variation detection. Genome Biol 2011; 12:R80. [PMID: 21854607 PMCID: PMC3245619 DOI: 10.1186/gb-2011-12-8-r80] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 06/06/2011] [Revised: 08/09/2011] [Accepted: 08/19/2011] [Indexed: 12/15/2022] Open
Abstract
Detection of chromosomal aberrations from a single cell by array comparative genomic hybridization (single-cell array CGH), instead of from a population of cells, is an emerging technique. However, such detection is challenging because of the genome artifacts and the DNA amplification process inherent to the single cell approach. Current normalization algorithms result in inaccurate aberration detection for single-cell data. We propose a normalization method based on channel, genome composition and recurrent genome artifact corrections. We demonstrate that the proposed channel clone normalization significantly improves the copy number variation detection in both simulated and real single-cell array CGH data.
|
23
|
Morganella S, Pagnotta SM, Ceccarelli M. Finding recurrent copy number alterations preserving within-sample homogeneity. Bioinformatics 2011; 27:2949-56. [PMID: 21873327 DOI: 10.1093/bioinformatics/btr488] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 01/06/2023]
Abstract
MOTIVATION Copy number alterations (CNAs) represent an important component of genetic variation and play a significant role in many human diseases. The development of array comparative genomic hybridization (aCGH) technology has made it possible to identify CNAs. Identifying recurrent CNAs is the first fundamental step toward a list of genomic regions that form the basis for further biological investigations. The main problem in recurrent CNA discovery is the need to distinguish between functional changes and random events without pathological relevance. Within-sample homogeneity is a common feature of copy number profiles in cancer, so it can be used as an additional source of information to increase the accuracy of the results. Although several algorithms aimed at the identification of recurrent CNAs have been proposed, no comprehensive comparison of the different approaches has yet been published. RESULTS We propose a new approach, called Genomic Analysis of Important Alterations (GAIA), to find recurrent CNAs, in which a statistical hypothesis framework is extended to take into account within-sample homogeneity. Statistical significance and within-sample homogeneity are combined into an iterative procedure to extract the regions that are likely involved in functional changes. Results show that GAIA represents a valid alternative to other proposed approaches. In addition, we perform an accurate comparison by using two real aCGH datasets and a carefully planned simulation study. AVAILABILITY GAIA has been implemented as an R/Bioconductor package. It can be downloaded from the following page: http://bioinformatics.biogem.it/download/gaia. CONTACT ceccarelli@unisannio.it; morganella@unisannio.it. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sandro Morganella
- Department of Science, University of Sannio, 82100, Benevento, Italy.
|
24
|
Mok S, Imwong M, Mackinnon MJ, Sim J, Ramadoss R, Yi P, Mayxay M, Chotivanich K, Liong KY, Russell B, Socheat D, Newton PN, Day NPJ, White NJ, Preiser PR, Nosten F, Dondorp AM, Bozdech Z. Artemisinin resistance in Plasmodium falciparum is associated with an altered temporal pattern of transcription. BMC Genomics 2011; 12:391. [PMID: 21810278 PMCID: PMC3163569 DOI: 10.1186/1471-2164-12-391] [Citation(s) in RCA: 109] [Impact Index Per Article: 7.8] [Received: 03/25/2011] [Accepted: 08/03/2011] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Artemisinin resistance in Plasmodium falciparum malaria has emerged in Western Cambodia. This is a major threat to global plans to control and eliminate malaria as the artemisinins are a key component of antimalarial treatment throughout the world. To identify key features associated with the delayed parasite clearance phenotype, we employed DNA microarrays to profile the physiological gene expression pattern of the resistant isolates. RESULTS In the ring and trophozoite stages, we observed reduced expression of many basic metabolic and cellular pathways which suggests a slower growth and maturation of these parasites during the first half of the asexual intraerythrocytic developmental cycle (IDC). In the schizont stage, there is an increased expression of essentially all functionalities associated with protein metabolism which indicates the prolonged and thus increased capacity of protein synthesis during the second half of the resistant parasite IDC. This modulation of the P. falciparum intraerythrocytic transcriptome may result from differential expression of regulatory proteins such as transcription factors or chromatin remodeling associated proteins. In addition, there is a unique and uniform copy number variation pattern in the Cambodian parasites which may represent an underlying genetic background that contributes to the resistance phenotype. CONCLUSIONS The decreased metabolic activities in the ring stages are consistent with previous suggestions of higher resilience of the early developmental stages to artemisinin. Moreover, the increased capacity of protein synthesis and protein turnover in the schizont stage may contribute to artemisinin resistance by counteracting the protein damage caused by the oxidative stress and/or protein alkylation effect of this drug. 
This study reports the first global transcriptional survey of artemisinin resistant parasites and provides insight to the complexities of the molecular basis of pathogens with drug resistance phenotypes in vivo.
Affiliation(s)
- Sachel Mok
- School of Biological Sciences, Nanyang Technological University, Singapore
| | - Mallika Imwong
- Department of Molecular Tropical Medicine and Genetics, Faculty of Tropical Medicine, Mahidol University, Thailand
- Mahidol-Oxford Research Unit, Faculty of Tropical Medicine, Mahidol University, Thailand
| | | | - Joan Sim
- School of Biological Sciences, Nanyang Technological University, Singapore
| | - Ramya Ramadoss
- School of Biological Sciences, Nanyang Technological University, Singapore
| | - Poravuth Yi
- The National Center for Parasitology, Entomology, and Malaria Control, Phnom Penh, Cambodia
| | - Mayfong Mayxay
- Wellcome Trust-Mahosot Hospital-Oxford University Tropical Medicine Research Collaboration, Mahosot Hospital, Vientiane, Lao People's Democratic Republic
- Faculty of Postgraduate Studies and Research, University of Health Sciences, Vientiane, Lao People's Democratic Republic
| | - Kesinee Chotivanich
- Department of Molecular Tropical Medicine and Genetics, Faculty of Tropical Medicine, Mahidol University, Thailand
| | - Kek-Yee Liong
- School of Biological Sciences, Nanyang Technological University, Singapore
| | - Bruce Russell
- Singapore Immunology Network, Biopolis, Agency for Science Technology and Research (ASTAR), Singapore
| | - Duong Socheat
- The National Center for Parasitology, Entomology, and Malaria Control, Phnom Penh, Cambodia
| | - Paul N Newton
- Wellcome Trust-Mahosot Hospital-Oxford University Tropical Medicine Research Collaboration, Mahosot Hospital, Vientiane, Lao People's Democratic Republic
- Centre for Clinical Vaccinology and Tropical Medicine, Churchill Hospital, Oxford, UK
| | - Nicholas PJ Day
- Mahidol-Oxford Research Unit, Faculty of Tropical Medicine, Mahidol University, Thailand
- Centre for Clinical Vaccinology and Tropical Medicine, Churchill Hospital, Oxford, UK
| | - Nicholas J White
- Mahidol-Oxford Research Unit, Faculty of Tropical Medicine, Mahidol University, Thailand
- Centre for Clinical Vaccinology and Tropical Medicine, Churchill Hospital, Oxford, UK
| | - Peter R Preiser
- School of Biological Sciences, Nanyang Technological University, Singapore
| | - François Nosten
- Centre for Clinical Vaccinology and Tropical Medicine, Churchill Hospital, Oxford, UK
- Shoklo Malaria Research Unit, Mae Sot, Thailand
| | - Arjen M Dondorp
- Mahidol-Oxford Research Unit, Faculty of Tropical Medicine, Mahidol University, Thailand
- Centre for Clinical Vaccinology and Tropical Medicine, Churchill Hospital, Oxford, UK
| | - Zbynek Bozdech
- School of Biological Sciences, Nanyang Technological University, Singapore
|
25
|
Halldórsson BV, Gudbjartsson DF. An algorithm for detecting high frequency copy number polymorphisms using SNP arrays. J Comput Biol 2011; 18:955-66. [PMID: 21728861 DOI: 10.1089/cmb.2010.0317] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 12/29/2022] Open
Abstract
We present a general algorithm for the detection of genomic variants using the Illumina iSelect platform. The Illumina iSelect platform is designed to detect SNPs, but our algorithm allows for the detection of more general forms of variation, including copy number polymorphisms and microsatellites. The algorithm does not rely on a priori information about the type of polymorphism being studied and is designed to call genotypes for a large number of individuals simultaneously. The algorithm proceeds by initially normalizing intensity and correcting for batch effects. Then each marker is clustered using a modified Gaussian mixture model in which we account for variance in the expression of an individual and the variance measured in bead-level intensities of a probe/marker pair. Finally, these clusters are used to determine genotypes. The algorithm was then run on a dataset of 35,000 Icelandic individuals.
|
26
|
Caceres A, Basagaña X, Gonzalez JR. Multiple correspondence discriminant analysis: an application to detect stratification in copy number variation. Stat Med 2011; 29:3284-93. [PMID: 21170921 DOI: 10.1002/sim.3890] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
We illustrate the use of multiple correspondence analysis (MCA) to correct for population stratification of copy number alteration data. In addition, we propose the use of multiple correspondence discriminant analysis (MCDA) to identify an optimal set of copy number variants (CNVs) that correctly infers the population stratification of a CNV map. Within MCDA, we highlight the novel use of correlation with class directions for variable ranking. We found a set of 20 CNVs with 98 per cent predictability in a CNV map of the HapMap populations. On this sample, the selection of variables based on centroid ranking outperformed the most common practice of ranking variables with their correlation to the principal axes.
Affiliation(s)
- Alejandro Caceres
- Center for Research in Environmental Epidemiology (CREAL), Parc de Recerca Biomedica de Barcelona, 88 Doctor Aiguader, Barcelona, Spain
|
27
|
Picard F, Lebarbier E, Hoebeke M, Rigaill G, Thiam B, Robin S. Joint segmentation, calling, and normalization of multiple CGH profiles. Biostatistics 2011; 12:413-28. [PMID: 21209153 DOI: 10.1093/biostatistics/kxq076] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.4] [Indexed: 01/03/2023] Open
Abstract
The statistical analysis of array comparative genomic hybridization (CGH) data has now shifted to the joint assessment of copy number variations at the cohort level. Considering multiple profiles gives the opportunity to correct for systematic biases observed on single profiles, such as probe GC content or the so-called "wave effect." In this article, we extend the segmentation model developed in the univariate case to the joint analysis of multiple CGH profiles. Our contribution is multiple: we propose an integrated model to perform joint segmentation, normalization, and calling for multiple array CGH profiles. This model shows great flexibility, especially in modeling the wave effect, where it gives a likelihood framework to approaches proposed by others. We propose a new dynamic programming algorithm for breakpoint positioning, as well as a model selection criterion based on a modified Bayesian information criterion proposed in the univariate case. The performance of our method is assessed using simulated and real data sets. Our method is implemented in the R package cghseg.
Affiliation(s)
- Franck Picard
- Laboratoire de Biometrie et Biologie Evolutive, UMR CNRS 5558 - Univ. Lyon 1, F-69622, Villeurbanne, France.
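As background for the breakpoint positioning mentioned in the abstract above, the classic single-profile formulation can be sketched as a dynamic program that splits one profile into a fixed number of segments with minimal within-segment squared error. This is the textbook univariate technique, not the new joint algorithm of the paper or the cghseg implementation. The function name `segment_profile` and the fixed segment count are assumptions, and model selection is left out.

```python
def segment_profile(signal, n_segments):
    """Return the start indices of segments 2..K for the least-squares split."""
    n = len(signal)
    # Prefix sums give the squared-error cost of any segment in O(1).
    s1 = [0.0]
    s2 = [0.0]
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):
        """Sum of squared deviations from the mean of signal[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting signal[:j] into k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j] = c
                    back[k][j] = i
    # Trace the breakpoints back from the end of the profile.
    breakpoints = []
    j = n
    for k in range(n_segments, 1, -1):
        j = back[k][j]
        breakpoints.append(j)
    return sorted(breakpoints)
```

The triple loop costs O(K n^2) time, which is acceptable for short toy profiles. Choosing the number of segments is a separate step, which the abstract addresses with a modified Bayesian information criterion.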
|
28
|
Warden M, Pique-Regi R, Ortega A, Asgharzadeh S. Bioinformatics for copy number variation data. Methods Mol Biol 2011; 719:235-49. [PMID: 21370087 DOI: 10.1007/978-1-61779-027-0_11] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/02/2023]
Abstract
Copy number variation is known to be an important component of structural variation in the human genome. Greater than 1 kb in size, these gains and losses of genetic material are known to confer risk to many human diseases, both Mendelian and complex. Therefore, the technologies used to detect copy number variation have been quickly improving in both throughput and cost. From comparative genomic hybridization to synthetic high-density oligonucleotide arrays to next-generation sequencing methods, algorithms used to estimate copy number are plentiful. Here we provide a practical introduction to copy number variation technology and the available analysis methods, and demonstrate the analysis workflow on an example case.
Affiliation(s)
- Melissa Warden
- Department of Pediatrics and Pathology, Keck School of Medicine, Childrens Hospital Los Angeles, University of Southern California, Los Angeles, CA, USA
|
29
|
Zhang ZD, Gerstein MB. Detection of copy number variation from array intensity and sequencing read depth using a stepwise Bayesian model. BMC Bioinformatics 2010; 11:539. [PMID: 21034510 PMCID: PMC2992546 DOI: 10.1186/1471-2105-11-539] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/17/2010] [Accepted: 10/31/2010] [Indexed: 11/17/2022] Open
Abstract
Background: Copy number variants (CNVs) have been demonstrated to occur at a high frequency and are now widely believed to make a significant contribution to phenotypic variation in human populations. Array-based comparative genomic hybridization (array-CGH) and the newly developed read-depth approach based on ultrahigh-throughput genomic sequencing both provide rapid, robust, and comprehensive methods to identify CNVs on a whole-genome scale. Results: We developed a Bayesian statistical analysis algorithm for the detection of CNVs from both types of genomic data. The algorithm can analyze data obtained from PCR-based bacterial artificial chromosome arrays, high-density oligonucleotide arrays, and more recently developed high-throughput DNA sequencing. Treating the parameters that define the underlying data-generating process (e.g., the number of CNVs, the position of each CNV, and the data noise level) as random variables, our approach derives the posterior distribution of the genomic CNV structure given the observed data. Sampling from the posterior distribution using a Markov chain Monte Carlo method, we obtain not only best estimates for these unknown parameters but also Bayesian credible intervals for the estimates. We illustrate the characteristics of our algorithm by applying it to both synthetic and experimental data sets in comparison to other segmentation algorithms. Conclusions: In particular, the synthetic data comparison shows that our method is more sensitive than other approaches at low false positive rates. Furthermore, given its Bayesian origin, our method can also be seen as a technique to refine CNVs identified by fast point-estimate methods and also as a framework to integrate array-CGH and sequencing data with other CNV-related biological knowledge, all through informative priors.
Affiliation(s)
- Zhengdong D Zhang
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA.
|
30
|
Kukita Y, Yahara K, Tahira T, Higasa K, Sonoda M, Yamamoto K, Kato K, Wake N, Hayashi K. A definitive haplotype map as determined by genotyping duplicated haploid genomes finds a predominant haplotype preference at copy-number variation events. Am J Hum Genet 2010; 86:918-28. [PMID: 20537301 DOI: 10.1016/j.ajhg.2010.05.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/10/2010] [Revised: 04/13/2010] [Accepted: 05/07/2010] [Indexed: 10/19/2022] Open
Abstract
The majority of complete hydatidiform moles (CHMs) harbor duplicated haploid genomes that originate from sperm. This makes CHMs more advantageous than conventional diploid cells for determining haplotypes of SNPs and copy-number variations (CNVs), because all of the genetic variants in a CHM genome are homozygous. Here we report SNP and CNV haplotype structures determined by analysis of 100 CHMs from Japanese subjects via high-density DNA arrays. The obtained haplotype map should be useful as a reference for the haplotype structure of Asian populations. We resolved common CNV regions (merged CNV segments across the examined samples) into CNV events (clusters of CNV segments) on the basis of mutual overlap and found that the haplotype backgrounds of different CNV events within the same CNV region were predominantly similar, perhaps because of inherent structural instability.
|
31
|
Mei TS, Salim A, Calza S, Seng KC, Seng CK, Pawitan Y. Identification of recurrent regions of Copy-Number Variants across multiple individuals. BMC Bioinformatics 2010; 11:147. [PMID: 20307285 PMCID: PMC2851607 DOI: 10.1186/1471-2105-11-147] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 10/15/2009] [Accepted: 03/22/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Algorithms and software for CNV detection have been developed, but they detect the CNV regions sample-by-sample with individual-specific breakpoints, while common CNV regions are likely to occur at the same genomic locations across different individuals in a homogeneous population. Current algorithms to detect common CNV regions do not account for the varying reliability of the individual CNVs, typically reported as confidence scores by SNP-based CNV detection algorithms. General methodologies for identifying these recurrent regions, especially those directed at SNP arrays, are still needed. RESULTS: In this paper, we describe two new approaches for identifying common CNV regions based on (i) the frequency of occurrence of reliable CNVs, where reliability is determined by high confidence scores, and (ii) a weighted frequency of occurrence of CNVs, where the weights are determined by the confidence scores. In addition, motivated by the fact that we often observe partially overlapping CNV regions as a mixture of two or more distinct subregions, regions identified using the two approaches can be fine-tuned to smaller subregions using a clustering algorithm. We compared the performance of the methods with sequencing-based results in terms of discordance rates, rates of departure from Hardy-Weinberg equilibrium (HWE), and the average frequency and size of the identified regions. The discordance rates as well as the rates of departure from HWE decrease when we select CNVs with higher confidence scores. We also performed comparisons with two previously published methods, STAC and GISTIC, and showed that the methods we consider are better at identifying low-frequency but high-confidence CNV regions. CONCLUSIONS: The proposed methods for identifying common CNV regions in multiple individuals perform well compared to existing methods. The identified common regions can be used for downstream analyses such as group comparisons in association studies.
Affiliation(s)
- Teo Shu Mei
- Department of Epidemiology and Public Health, National University of Singapore, 16 Medical Drive, Singapore
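The two frequency measures named in the abstract above, (i) the frequency of reliable CNVs and (ii) the confidence-weighted frequency, can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes per-individual calls given as hypothetical `(individual, start, end, confidence)` tuples over probe indices with an exclusive `end`, confidence scores scaled to [0, 1], and at most one call per individual covering any probe. The thresholds are arbitrary.

```python
def reliable_frequency(calls, n_probes, n_individuals, min_conf):
    """Fraction of individuals with a high-confidence CNV at each probe."""
    counts = [0] * n_probes
    for _, start, end, conf in calls:
        if conf >= min_conf:
            for p in range(start, end):
                counts[p] += 1
    return [c / n_individuals for c in counts]


def weighted_frequency(calls, n_probes, n_individuals):
    """Per-probe CNV frequency with each call weighted by its confidence."""
    totals = [0.0] * n_probes
    for _, start, end, conf in calls:
        for p in range(start, end):
            totals[p] += conf
    return [t / n_individuals for t in totals]


def common_regions(freq, threshold):
    """Maximal runs of probes whose frequency reaches the threshold."""
    regions = []
    start = None
    for p, f in enumerate(freq):
        if f >= threshold and start is None:
            start = p
        elif f < threshold and start is not None:
            regions.append((start, p))
            start = None
    if start is not None:
        regions.append((start, len(freq)))
    return regions
```

Passing either frequency vector to `common_regions` yields candidate common regions as half-open probe intervals. The clustering-based refinement into subregions described in the abstract is not covered here.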
|