1
Qin F, Luo X, Cai G, Xiao F. Shall genomic correlation structure be considered in copy number variants detection? Brief Bioinform 2021; 22:6295811. [PMID: 34114005 DOI: 10.1093/bib/bbab215] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/16/2021] [Revised: 04/16/2021] [Accepted: 05/17/2021] [Indexed: 11/14/2022]
Abstract
Copy number variation has been identified as a major source of genomic variation associated with disease susceptibility. With the advent of whole-exome sequencing (WES) technology, massive WES data have been generated, allowing for the identification of copy number variants (CNVs) in the protein-coding regions with direct functional interpretation. We have previously shown evidence of the genomic correlation structure in array data and developed a novel chromosomal breakpoint detection algorithm, LDcnv, which showed significantly improved detection power through integrating the correlation structure in a systematic modeling manner. However, it remains unexplored whether genomic correlation exists in WES data and how integrating such a correlation structure can improve CNV detection accuracy. In this study, we first explored the correlation structure of WES data using the 1000 Genomes Project data. Both raw read depth and median-normalized data showed strong evidence of a correlation structure. Motivated by this finding, we proposed a correlation-based method, CORRseq, as a novel release of the LDcnv algorithm for profiling WES data. The performance of CORRseq was evaluated in extensive simulation studies and in real data analysis of the 1000 Genomes Project. CORRseq outperformed the existing methods in detecting medium and large CNVs. In conclusion, it would be more advantageous to model the genomic correlation structure when detecting relatively long CNVs. This study provides valuable insights for the development of CNV detection methods for NGS data.
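The abstract reports correlation between neighbouring read-depth measurements in raw and median-normalized WES data. As a minimal sketch (not the CORRseq or LDcnv code), assuming per-exon read depths are available for a set of samples, the Python below median-normalizes each sample and computes, across samples, the Pearson correlation between every pair of adjacent exons. The input comes from a hypothetical simulator (`simulate_depths`, with an AR(1) noise term chosen purely for illustration), not from the 1000 Genomes Project.

```python
import math
import random
from statistics import median


def simulate_depths(n_samples, n_exons, phi=0.8, seed=1):
    """Hypothetical toy read depths: per-exon capture efficiency x per-sample
    coverage scale x log-normal AR(1) noise, so neighbouring exons co-vary."""
    rng = random.Random(seed)
    base = [rng.uniform(50, 150) for _ in range(n_exons)]
    samples = []
    for _ in range(n_samples):
        scale = rng.uniform(0.5, 2.0)
        z = rng.gauss(0, 1)
        row = []
        for i in range(n_exons):
            if i > 0:  # AR(1) step that keeps the stationary variance at 1
                z = phi * z + math.sqrt(1 - phi ** 2) * rng.gauss(0, 1)
            row.append(base[i] * scale * math.exp(0.2 * z))
        samples.append(row)
    return samples


def median_normalize(depths):
    """Divide every exon depth in one sample by that sample's median depth."""
    m = median(depths)
    return [d / m for d in depths]


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacent_correlations(samples):
    """Correlation, computed across samples, between exon i and exon i + 1."""
    norm = [median_normalize(row) for row in samples]
    return [
        pearson([row[i] for row in norm], [row[i + 1] for row in norm])
        for i in range(len(norm[0]) - 1)
    ]


depths = simulate_depths(n_samples=50, n_exons=20)
r = adjacent_correlations(depths)
print(f"mean adjacent-exon correlation: {sum(r) / len(r):.2f}")
```

Dividing by each sample's median removes sample-wide coverage differences, so in this toy the remaining adjacent correlation comes from the simulated dependence between neighbouring exons.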
Affiliation(s)
- Fei Qin
- Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina (USC), Discovery 449, 915 Greene St, Columbia, SC 29208, USA
- Xizhi Luo
- Department of Epidemiology and Biostatistics, Arnold School of Public Health, USC, Discovery 449, 915 Greene St, Columbia, SC 29208, USA
- Guoshuai Cai
- Department of Environmental Health Science, Arnold School of Public Health, USC, Discovery 449, 915 Greene St, Columbia, SC 29208, USA
- Feifei Xiao
- Department of Epidemiology and Biostatistics, Arnold School of Public Health, USC, Discovery 449, 915 Greene St, Columbia, SC 29208, USA
2
Perry EB, Makohon-Moore A, Zheng C, Kaufman CK, Cai J, Iacobuzio-Donahue CA, White RM. Tumor diversity and evolution revealed through RADseq. Oncotarget 2017; 8:41792-41805. [PMID: 28611298 PMCID: PMC5522028 DOI: 10.18632/oncotarget.18355] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 04/09/2017] [Accepted: 05/12/2017] [Indexed: 12/30/2022]
Abstract
Cancer is an evolutionary disease, and there is increasing interest in applying tools from evolutionary biology to understand cancer progression. Restriction-site associated DNA sequencing (RADseq) was developed for the field of evolutionary genetics to study adaptation and identify evolutionary relationships among populations. Here we apply RADseq to study tumor evolution, which allows for unbiased sampling of any desired frequency of the genome, overcoming the selection bias and cost limitations inherent to exome or whole-genome sequencing. We apply RADseq to both human pancreatic cancer and zebrafish melanoma samples. Using either a low-frequency (SbfI, 0.4% of the genome) or high-frequency (NsiI, 6-9% of the genome) cutter, we successfully identify single nucleotide substitutions and copy number alterations in tumors, which can be augmented by performing RADseq on sublineages within the tumor. We are able to infer phylogenetic relationships between primary tumors and metastases. These same methods can be used to identify somatic mosaicism in seemingly normal, non-cancerous tissues. Evolutionary studies of cancer that focus on rates of tumor evolution and evolutionary relationships among tumor lineages will benefit from the flexibility and efficiency of restriction-site associated DNA sequencing.
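RADseq sequences only the DNA flanking restriction sites, so the fraction of the genome sampled depends on how often the enzyme cuts. A rough sketch, assuming reads of a hypothetical fixed length (`read_len`) are taken on both sides of every site, counts SbfI (CCTGCAGG) and NsiI (ATGCAT) recognition sites and estimates the covered fraction. It runs on a random toy sequence, so its numbers are not comparable to the 0.4% and 6-9% reported for real genomes.

```python
import random
import re

# Recognition sequences; both are palindromic, so scanning one strand finds every site.
ENZYMES = {"SbfI": "CCTGCAGG", "NsiI": "ATGCAT"}


def find_sites(seq, motif):
    """0-based start positions of every occurrence of `motif` (overlaps allowed)."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq.upper())]


def sampled_fraction(seq, motif, read_len=100):
    """Rough fraction of `seq` covered by reads of `read_len` bp taken on both
    sides of each site; overlapping reads are merged (a simplification)."""
    if not seq:
        return 0.0
    covered = bytearray(len(seq))
    for pos in find_sites(seq, motif):
        lo = max(0, pos - read_len)
        hi = min(len(seq), pos + len(motif) + read_len)
        covered[lo:hi] = b"\x01" * (hi - lo)
    return sum(covered) / len(seq)


# Usage on a random toy sequence (not a real genome).
rng = random.Random(0)
toy = "".join(rng.choice("ACGT") for _ in range(200_000))
for name, motif in ENZYMES.items():
    print(name, len(find_sites(toy, motif)), f"{sampled_fraction(toy, motif):.2%}")
```

The 6-bp NsiI site is expected far more often than the 8-bp SbfI site in random sequence, which is the frequent-cutter versus rare-cutter trade-off the abstract exploits.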
Affiliation(s)
- Elizabeth B Perry
- Cancer Biology & Genetics, Memorial Sloan Kettering Cancer Center, New York, New York, USA; Biostatistics, Yale University, New Haven, Connecticut, USA
- Alvin Makohon-Moore
- The David M. Rubenstein Center for Pancreatic Cancer Research, Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Caihong Zheng
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China
- Jun Cai
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China
- Christine A Iacobuzio-Donahue
- The David M. Rubenstein Center for Pancreatic Cancer Research, Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Richard M White
- Cancer Biology & Genetics, Memorial Sloan Kettering Cancer Center, New York, New York, USA
3
XCAVATOR: accurate detection and genotyping of copy number variants from second and third generation whole-genome sequencing experiments. BMC Genomics 2017; 18:747. [PMID: 28934930 PMCID: PMC5609061 DOI: 10.1186/s12864-017-4137-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 08/21/2017] [Accepted: 09/11/2017] [Indexed: 11/10/2022]
Abstract
Background We developed a novel software package, XCAVATOR, for the identification of genomic regions involved in copy number variants/alterations (CNVs/CNAs) from short- and long-read whole-genome sequencing experiments. Results Using simulated and real datasets, we showed that our tool, based on a read count approach, is capable of predicting the boundaries and the absolute number of DNA copies of CNVs/CNAs at high resolution. To demonstrate the power of our software, we applied it to the analysis of Illumina and Pacific Biosciences data and compared its performance with ten other state-of-the-art tools. Conclusion All the analyses we performed demonstrate that XCAVATOR is capable of detecting germline and somatic CNVs/CNAs, outperforming all the other tools we compared. XCAVATOR is freely available at http://sourceforge.net/projects/xcavator/. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-4137-0) contains supplementary material, which is available to authorized users.
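Read count methods infer copy number from the number of reads that fall into genomic windows. The sketch below is a generic illustration under simplifying assumptions (a matched diploid control, median-centring as the only normalization, a pure sample with no admixture), not XCAVATOR's actual pipeline, which the abstract does not detail; the window size and pseudocount are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def window_counts(read_starts, window, n_windows):
    """Number of read start positions in each consecutive `window`-bp bin."""
    counts = Counter(pos // window for pos in read_starts)
    return [counts[i] for i in range(n_windows)]


def log2_ratios(test, control, pseudo=0.5):
    """log2(test/control) per window, median-centred so that the typical window
    is treated as copy-neutral (an assumption, not XCAVATOR's normalization)."""
    raw = [math.log2((t + pseudo) / (c + pseudo)) for t, c in zip(test, control)]
    mid = median(raw)
    return [x - mid for x in raw]


def copy_number(ratio, ploidy=2):
    """Absolute copy number, assuming a diploid reference and a pure sample."""
    return ploidy * 2 ** ratio


# Toy example: 10 reads per 100-bp window, with 15 reads in window 2 of the test.
control_starts = [w * 100 + k * 10 for w in range(4) for k in range(10)]
test_starts = control_starts + [200 + k * 10 + 5 for k in range(5)]
test = window_counts(test_starts, 100, 4)
control = window_counts(control_starts, 100, 4)
print([round(copy_number(x), 1) for x in log2_ratios(test, control)])  # [2.0, 2.0, 3.0, 2.0]
```

Real tools add corrections (for example GC content and mappability) before segmentation; they are omitted here to keep the counting-and-ratio step visible.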
4
D'Aurizio R, Pippucci T, Tattini L, Giusti B, Pellegrini M, Magi A. Enhanced copy number variants detection from whole-exome sequencing data using EXCAVATOR2. Nucleic Acids Res 2016; 44:e154. [PMID: 27507884 PMCID: PMC5175347 DOI: 10.1093/nar/gkw695] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 03/21/2016] [Revised: 07/25/2016] [Accepted: 07/27/2016] [Indexed: 12/26/2022]
Abstract
Copy Number Variants (CNVs) are structural rearrangements contributing to phenotypic variation that have been shown to be associated with many disease states. Over the last few years, the identification of CNVs from whole-exome sequencing (WES) data has become common practice for research and clinical purposes and, consequently, the demand for more efficient and accurate methods has increased. In this paper, we demonstrate that more than 30% of WES reads map outside the targeted regions and that these reads, usually discarded, can be exploited to enhance the identification of CNVs from WES experiments. Here, we present EXCAVATOR2, the first read count based tool that exploits all the reads produced by WES experiments to detect CNVs with genome-wide resolution. To evaluate the performance of our novel tool, we used it to analyse two WES data sets: a population data set sequenced by the 1000 Genomes Project and a tumor data set of bladder cancer samples. The results of these analyses demonstrate that EXCAVATOR2 outperforms four other state-of-the-art methods and that our combined approach enlarges the spectrum of CNVs detectable from WES data with unprecedented resolution. EXCAVATOR2 is freely available at http://sourceforge.net/projects/excavator2tool/.
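The abstract's central observation is that many WES reads fall outside the capture targets. A minimal sketch, assuming non-overlapping half-open target intervals and one representative coordinate per read (both simplifications), classifies reads as on- or off-target by binary search; the intervals and read positions are made up for illustration and the function names are hypothetical.

```python
from bisect import bisect_right


def build_index(targets):
    """Sort half-open (start, end) target intervals; assumes they do not overlap."""
    targets = sorted(targets)
    return [s for s, _ in targets], [e for _, e in targets]


def is_on_target(pos, starts, ends):
    """True if `pos` falls inside any target interval."""
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


def off_target_fraction(read_positions, targets):
    """Share of reads (one representative coordinate each) outside all targets."""
    starts, ends = build_index(targets)
    off = sum(not is_on_target(p, starts, ends) for p in read_positions)
    return off / len(read_positions)


targets = [(100, 200), (500, 600)]
reads = [50, 100, 150, 199, 200, 550, 700, 599, 600, 10]
print(off_target_fraction(reads, targets))  # 0.5
```

Each lookup costs O(log T) for T targets, which matters when millions of reads are checked against the roughly 200,000 exon intervals of a typical exome kit.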
Affiliation(s)
- Romina D'Aurizio
- Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, National Research Council, Pisa, Italy
- Tommaso Pippucci
- Medical Genetics Unit, Sant'Orsola Malpighi Polyclinic, Bologna, Italy
- Lorenzo Tattini
- Department of Computer Science, University of Pisa, Pisa, Italy
- Betti Giusti
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Marco Pellegrini
- Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, National Research Council, Pisa, Italy
- Alberto Magi
- Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
5
Orrico A, Marseglia G, Pescucci C, Cortesi A, Piomboni P, Giansanti A, Gerundino F, Ponchietti R. Molecular Dissection Using Array Comparative Genomic Hybridization and Clinical Evaluation of An Infertile Male Carrier of An Unbalanced Y;21 Translocation: A Case Report and Review of The Literature. Int J Fertil Steril 2015; 9:581-5. [PMID: 26985348 PMCID: PMC4793181 DOI: 10.22074/ijfs.2015.4619] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/19/2014] [Accepted: 08/09/2014] [Indexed: 11/04/2022]
Abstract
Chromosomal defects are relatively frequent in infertile men; however, translocations between the Y chromosome and autosomes are rare, and fewer than 40 cases of Y-autosome translocation have been reported. In particular, only three individuals with a Y;21 translocation have been described to date. We report an additional case of an infertile man in whom a Y;21 translocation was associated with the deletion of a large part of the Y chromosome long arm. Applying various techniques, including conventional cytogenetic procedures, fluorescence in situ hybridisation (FISH) analysis and array comparative genomic hybridization (array-CGH) studies, we identified a derivative chromosome originating from a fragment of the short arm of chromosome Y translocated onto the short arm of chromosome 21. The structural rearrangement of the Y chromosome left the entire short arm intact, including the sex-determining region Y (SRY) and the short stature homeobox (SHOX) loci, although translocated onto chromosome 21, and caused the loss of a large part of the long arm of the Y chromosome, including the azoospermia factor-a (AZFa), AZFb, AZFc and Yq heterochromatin regions. This is the first case in which a (Yp;21p) translocation has been ascertained using an array-CGH approach, thus reporting details of such a rearrangement at higher resolution.
Affiliation(s)
- Alfredo Orrico
- Molecular Medicine Unit, Azienda Ospedaliera Universitaria Senese, Siena, Italy; Medical Genetics, Misericordia Hospital, Grosseto, Italy
- Giuseppina Marseglia
- Diagnostic Genetic Unit, Department of Laboratory, Careggi University Hospital, Firenze, Italy
- Chiara Pescucci
- Diagnostic Genetic Unit, Department of Laboratory, Careggi University Hospital, Firenze, Italy
- Ambra Cortesi
- Medical Genetics, Misericordia Hospital, Grosseto, Italy
- Paola Piomboni
- Department of Molecular and Developmental Medicine, University of Siena, Azienda Ospedaliera Universitaria Senese, Siena, Italy
- Andrea Giansanti
- Genitourinary Unit, University of Siena, Azienda Ospedaliera Universitaria Senese, Siena, Italy
- Francesca Gerundino
- Diagnostic Genetic Unit, Department of Laboratory, Careggi University Hospital, Firenze, Italy
- Roberto Ponchietti
- Genitourinary Unit, University of Siena, Azienda Ospedaliera Universitaria Senese, Siena, Italy
6
Anjum S, Morganella S, D'Angelo F, Iavarone A, Ceccarelli M. VEGAWES: variational segmentation on whole exome sequencing for copy number detection. BMC Bioinformatics 2015; 16:315. [PMID: 26416038 PMCID: PMC4587906 DOI: 10.1186/s12859-015-0748-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/17/2015] [Accepted: 09/16/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND Copy number variations are important in the detection and progression of significant tumors and diseases. Whole exome sequencing (WES) has recently gained popularity for copy number variation detection owing to its low cost and better efficiency. In this work, we developed VEGAWES for accurate and robust detection of copy number variations in WES data. VEGAWES is an extension of a variational segmentation algorithm, VEGA (Variational Estimator for Genomic Aberrations), which has previously outperformed several algorithms in segmenting array comparative genomic hybridization data. RESULTS We tested this algorithm on synthetic data and on 100 glioblastoma multiforme primary tumor samples. The results on the real data were evaluated against segmentations obtained from single-nucleotide polymorphism data, used as ground truth. We compared our results with two other segmentation algorithms and assessed performance in terms of accuracy and running time. CONCLUSIONS In terms of both accuracy and time, VEGAWES provided better results on the synthetic data and tumor samples, demonstrating its potential for robust detection of aberrant regions in the genome.
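VEGA and VEGAWES are variational segmentation methods. Assuming the piecewise-constant energy they target is of the Mumford-Shah type (squared deviation from segment means plus a penalty per segment), the sketch below minimizes that energy exactly by dynamic programming ("optimal partitioning"). This is a swapped-in optimization scheme shown for illustration, not VEGA's own algorithm, and the `penalty` value is arbitrary.

```python
import math


def segment(x, penalty):
    """Exact minimizer of  sum of squared deviations from segment means
    + penalty * (number of segments), found by dynamic programming."""
    n = len(x)
    s1, s2 = [0.0], [0.0]  # prefix sums of x and x**2
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(i, j):  # squared error of x[i:j] around its own mean
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    best = [0.0] + [math.inf] * n  # best[j]: optimal energy of x[:j]
    last = [0] * (n + 1)           # last[j]: start of the final segment of x[:j]
    for j in range(1, n + 1):
        for i in range(j):
            cand = best[i] + sse(i, j) + penalty
            if cand < best[j]:
                best[j], last[j] = cand, i

    segments, j = [], n
    while j > 0:  # trace the optimal boundaries back from the end
        i = last[j]
        segments.append((i, j, (s1[j] - s1[i]) / (j - i)))
        j = i
    return segments[::-1]


print(segment([0, 0, 0, 1, 1, 1, 0, 0], penalty=0.5))
# [(0, 3, 0.0), (3, 6, 1.0), (6, 8, 0.0)]
```

Each tuple is (start, end, mean) with a half-open range. A larger penalty merges segments; this exact search is O(n^2), which is one reason practical tools use faster or approximate schemes.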
Affiliation(s)
- Samreen Anjum
- Computational Sciences and Engineering, Qatar Computing Research Institute, Doha, P. O. Box 5825, Qatar.
- Sandro Morganella
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK.
- Antonio Iavarone
- Institute for Cancer Genetics, Columbia University, New York, 10027, USA.
- Michele Ceccarelli
- Computational Sciences and Engineering, Qatar Computing Research Institute, Doha, P. O. Box 5825, Qatar; Department of Science and Technology, University of Sannio, Benevento, 82100, Italy.
7
Magi A, Tattini L, Cifola I, D'Aurizio R, Benelli M, Mangano E, Battaglia C, Bonora E, Kurg A, Seri M, Magini P, Giusti B, Romeo G, Pippucci T, De Bellis G, Abbate R, Gensini GF. EXCAVATOR: detecting copy number variants from whole-exome sequencing data. Genome Biol 2013; 14:R120. [PMID: 24172663 PMCID: PMC4053953 DOI: 10.1186/gb-2013-14-10-r120] [Citation(s) in RCA: 188] [Impact Index Per Article: 17.1] [Received: 06/15/2013] [Accepted: 10/30/2013] [Indexed: 12/11/2022]
Abstract
We developed a novel software tool, EXCAVATOR, for the detection of copy number variants (CNVs) from whole-exome sequencing data. EXCAVATOR combines a three-step normalization procedure with a novel heterogeneous hidden Markov model algorithm and a calling method that classifies genomic regions into five copy number states. We validate EXCAVATOR on three datasets and compare the results with three other methods. These analyses show that EXCAVATOR outperforms the other methods and is therefore a valuable tool for the investigation of CNVs in large-scale projects, as well as in clinical research and diagnostics. EXCAVATOR is freely available at http://sourceforge.net/projects/excavatortool/.
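EXCAVATOR assigns genomic regions to five copy number states using a hidden Markov model. The sketch below decodes such states with the Viterbi algorithm but simplifies heavily: it uses a homogeneous HMM (the abstract describes a heterogeneous one), Gaussian emissions on log2 ratios, and hypothetical state means, standard deviation (`sd`) and self-transition probability (`p_stay`).

```python
import math

# Hypothetical emission means (log2 ratio) for copy numbers 0, 1, 2, 3 and 4+.
COPY_STATES = [0, 1, 2, 3, 4]
STATE_MEANS = [-3.0, -1.0, 0.0, math.log2(1.5), 1.0]


def log_normal_pdf(x, mu, sd):
    """Log density of a normal distribution N(mu, sd^2) at x."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def viterbi(ratios, sd=0.3, p_stay=0.99):
    """Most likely copy-number path under a homogeneous 5-state Gaussian HMM."""
    k = len(COPY_STATES)
    log_trans = [[math.log(p_stay) if i == j else math.log((1 - p_stay) / (k - 1))
                  for j in range(k)] for i in range(k)]
    # Uniform initial distribution over states.
    score = [math.log(1 / k) + log_normal_pdf(ratios[0], STATE_MEANS[j], sd)
             for j in range(k)]
    backptr = []
    for x in ratios[1:]:
        ptrs, new_score = [], []
        for j in range(k):
            best = max(range(k), key=lambda i: score[i] + log_trans[i][j])
            ptrs.append(best)
            new_score.append(score[best] + log_trans[best][j]
                             + log_normal_pdf(x, STATE_MEANS[j], sd))
        backptr.append(ptrs)
        score = new_score
    # Trace back from the best final state.
    state = max(range(k), key=lambda j: score[j])
    path = [state]
    for ptrs in reversed(backptr):
        state = ptrs[state]
        path.append(state)
    return [COPY_STATES[s] for s in reversed(path)]


ratios = [0.0] * 5 + [-1.0] * 5 + [0.0] * 5
print(viterbi(ratios))  # [2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
```

In a heterogeneous model, `p_stay` would depend on the genomic distance between consecutive exons, so that widely separated exons switch state more easily.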
8
Zheng C, Miao X, Li Y, Huang Y, Ruan J, Ma X, Wang L, Wu CI, Cai J. Determination of genomic copy number alteration emphasizing a restriction site-based strategy of genome re-sequencing. Bioinformatics 2013; 29:2813-21. [DOI: 10.1093/bioinformatics/btt481] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/29/2022]
9
Gandolfi A, Benelli M, Magi A, Chiti S. Moment estimation in discrete shifting level model applied to fast array-CGH segmentation. Stat Neerl 2013. [DOI: 10.1111/stan.12005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
Affiliation(s)
- A. Gandolfi
- Dipartimento di Matematica U. Dini; Università di Firenze; Viale Morgagni 67/A; 50134; Florence; Italy
- S. Chiti
- Dipartimento di Matematica U. Dini; Università di Firenze; Viale Morgagni 67/A; 50134; Florence; Italy
10
Rueda OM, Diaz-Uriarte R, Caldas C. Finding common regions of alteration in copy number data. Methods Mol Biol 2013; 973:339-53. [PMID: 23412800 DOI: 10.1007/978-1-62703-281-0_21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2023]
Abstract
In this chapter, we review some recent methods designed for detecting recurrent copy number regions, that is, genomic regions that show evidence of being altered in a set of samples. We analyze Affymetrix SNP6 data from 87 Her2-type breast tumors from a recent study using three different methods, showing different definitions and features of common regions: studying heterogeneity in copy number profiles, refining candidates for driver oncogenes, and consolidating broad amplifications.
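One simple way to define a common region is a run of adjacent probes altered in at least a given fraction of samples. The sketch below implements that frequency-threshold definition on hypothetical gain/loss calls; it is not one of the three methods compared in the chapter, and `min_freq` is an arbitrary choice.

```python
def alteration_frequency(calls, state):
    """Per-probe fraction of samples whose call equals `state`.
    `calls` holds one list per sample of probe calls: -1 loss, 0 neutral, 1 gain."""
    n_samples = len(calls)
    return [sum(row[j] == state for row in calls) / n_samples
            for j in range(len(calls[0]))]


def common_regions(calls, state, min_freq=0.5):
    """Runs of consecutive probes altered in at least `min_freq` of samples,
    returned as inclusive (first_probe, last_probe) index pairs."""
    regions, start = [], None
    freq = alteration_frequency(calls, state)
    for j, f in enumerate(freq):
        if f >= min_freq and start is None:
            start = j
        elif f < min_freq and start is not None:
            regions.append((start, j - 1))
            start = None
    if start is not None:  # close a run that reaches the last probe
        regions.append((start, len(freq) - 1))
    return regions


calls = [
    [0, 1, 1, 1, 0, 0, -1],
    [0, 1, 1, 0, 0, 0, -1],
    [0, 0, 1, 1, 0, -1, -1],
    [1, 1, 1, 0, 0, 0, 0],
]
print(common_regions(calls, state=1))                 # [(1, 3)]
print(common_regions(calls, state=1, min_freq=0.75))  # [(1, 2)]
print(common_regions(calls, state=-1))                # [(6, 6)]
```

Raising `min_freq` narrows a region toward its most frequently altered core, loosely analogous to refining candidate driver loci.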
Affiliation(s)
- Oscar M Rueda
- Cancer Research UK Cambridge Research Institute, Li Ka Shing Centre, Cambridge, UK.
11
Marseglia G, Scordo MR, Pescucci C, Nannetti G, Biagini E, Scandurra V, Gerundino F, Magi A, Benelli M, Torricelli F. 372 kb microdeletion in 18q12.3 causing SETBP1 haploinsufficiency associated with mild mental retardation and expressive speech impairment. Eur J Med Genet 2012; 55:216-21. [PMID: 22333924 DOI: 10.1016/j.ejmg.2012.01.005] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Received: 08/05/2011] [Accepted: 01/12/2012] [Indexed: 10/14/2022]
Abstract
Several cases of interstitial deletion encompassing band 18q12.3 have been described in patients with mild dysmorphic features, mental retardation and impairment of expressive language. The critical deleted region contains the SETBP1 gene (SET binding protein 1). Heterozygous missense mutations in this gene cause Schinzel-Giedion syndrome (SGS, MIM#269150), characterized by profound mental retardation and multiple congenital malformations. Recently, an 18q12.3 microdeletion causing SETBP1 haploinsufficiency has been described in two patients showing expressive speech impairment, moderate developmental delay and peculiar facial features. The phenotype of individuals with partial chromosome 18q deletions does not resemble SGS. The deletion defines a critical region in which SETBP1 is the major candidate gene for the expressive speech defect. We describe an additional patient with the smallest 18q12.3 microdeletion ever reported that disrupts SETBP1. The patient shows mild mental retardation and expressive speech impairment, with a striking discrepancy between expressive and receptive language skills. He is able to communicate with surprising efficacy using gestures and facial and body expressions. The significant phenotypic overlap between this patient and the previously reported cases reinforces the hypothesis that SETBP1 haploinsufficiency may play a role in expressive language development.
Affiliation(s)
- Giuseppina Marseglia
- SOD Diagnostica genetica, Azienda Ospedaliero Universitaria Careggi, Florence, Italy
12
Magi A, Tattini L, Pippucci T, Torricelli F, Benelli M. Read count approach for DNA copy number variants detection. Bioinformatics 2011; 28:470-8. [DOI: 10.1093/bioinformatics/btr707] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Indexed: 12/13/2022]
13
Magi A, Benelli M, Yoon S, Roviello F, Torricelli F. Detecting common copy number variants in high-throughput sequencing data by using JointSLM algorithm. Nucleic Acids Res 2011; 39:e65. [PMID: 21321017 PMCID: PMC3105418 DOI: 10.1093/nar/gkr068] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Indexed: 12/02/2022]
Abstract
The discovery of genomic structural variants (SVs), such as copy number variants (CNVs), is essential to understanding the genetic variation of human populations and complex diseases. In recent years, the advent of new high-throughput sequencing (HTS) platforms has opened many opportunities for SV discovery, and a very promising approach consists of measuring the depth of coverage (DOC) of reads aligned to the human reference genome. At present, few computational methods have been developed for the analysis of DOC data, and all of them can analyse only one sample at a time. For these reasons, we developed a novel algorithm (JointSLM) that detects common CNVs among individuals by analysing DOC data from multiple samples simultaneously. We test JointSLM performance on synthetic and real data and show its unprecedented resolution, which enables the detection of recurrent CNV regions as small as 500 bp. When we apply JointSLM to analyse chromosome 1 of eight genomes of different ancestry, we identify 3000 regions with recurrent CNVs of different frequency and size: hierarchical clustering on these regions segregates the eight individuals into two groups that reflect their ancestry, demonstrating the potential utility of JointSLM for population genetics studies.
Affiliation(s)
- Alberto Magi
- Laboratory Department, Diagnostic Genetic Unit, Careggi Hospital, Florence 5014, Italy.