1
Magi A, Mattei G, Mingrino A, Caprioli C, Ronchini C, Frigè G, Semeraro R, Bolognini D, Rambaldi A, Candoni A, Colombo E, Mazzarella L, Pelicci PG. High-resolution Nanopore methylome-maps reveal random hyper-methylation at CpG-poor regions as driver of chemoresistance in leukemias. Commun Biol 2023; 6:382. [PMID: 37031307 PMCID: PMC10082806 DOI: 10.1038/s42003-023-04756-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Accepted: 03/24/2023] [Indexed: 04/10/2023]
Abstract
Aberrant DNA methylation at CpG dinucleotides is a cancer hallmark that is associated with the emergence of resistance to anticancer treatment, though its molecular mechanisms and biological significance remain elusive. Genome-scale methylation maps produced by currently used methods are based on chemical modification of DNA and are best suited for analyses of methylation at CpG-rich regions (CpG islands). We report the first high-coverage whole-genome map in cancer using long-read nanopore technology, which allows simultaneous DNA-sequence and -methylation analyses on native DNA. We analyzed clonal epigenomic/genomic evolution in Acute Myeloid Leukemias (AMLs) at diagnosis and at relapse after chemotherapy. Long-read sequencing coupled to a novel computational method allowed definition of differential methylation at unprecedented resolution, and showed that the relapse methylome is characterized by hypermethylation at both CpG islands and sparse-CpG regions. Most differentially methylated genes, however, were neither differentially expressed nor enriched for chemoresistance genes. A small fraction of under-expressed genes that were hyper-methylated at sparse CpGs in the gene body was significantly enriched in transcription factors (TFs). Remarkably, these few TFs supported large gene-regulatory networks that included 50% of all differentially expressed genes in the relapsed AMLs and were highly enriched in chemoresistance genes. Notably, hypermethylated regions at sparse CpGs were poorly conserved in the relapsed AMLs, under-represented at their genomic positions and showed higher methylation entropy, as compared to CpG islands. Analyses of available datasets confirmed TF binding to their target genes and conservation of the same gene-regulatory networks in large patient cohorts. Relapsed AMLs carried few patient-specific structural variants and DNA mutations, apparently not involved in drug resistance.
Thus, drug resistance in AMLs can be mainly ascribed to the selection of random epigenetic alterations at sparse CpGs of a few transcription factors, which then induce reprogramming of the relapsing phenotype, independently of clonal genomic evolution.
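The abstract above compares regions by their methylation entropy but does not define the measure. As a rough illustration only, the sketch below assumes one common definition: the Shannon entropy of the per-read methylation patterns (epialleles) observed at a locus. The function name and the string encoding of reads are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def methylation_entropy(patterns):
    """Shannon entropy (in bits) of the read-level methylation patterns at one locus.

    `patterns` holds one string per read, e.g. "1011", where 1 marks a
    methylated CpG and 0 an unmethylated CpG. This encoding is an assumption
    made for illustration, not the paper's data format.
    """
    counts = Counter(patterns)
    total = len(patterns)
    # Sum p * log2(1/p) over the distinct patterns.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# Every read carries the same pattern: no disorder.
ordered = ["1111"] * 8
# Every read carries a different pattern: maximal disorder for 8 reads.
disordered = ["1111", "0000", "1010", "0101", "1100", "0011", "1001", "0110"]

print(methylation_entropy(ordered))
print(methylation_entropy(disordered))
```

Under this definition, a locus where all reads agree scores zero, and a locus where each read shows a different pattern scores the logarithm of the number of reads.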
Affiliation(s)
- Alberto Magi
  - Department of Information Engineering, University of Florence, Florence, Italy
  - Institute for Biomedical Technologies, National Research Council, Segrate, Milano, Italy
- Gianluca Mattei
  - Department of Information Engineering, University of Florence, Florence, Italy
- Alessandra Mingrino
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Chiara Caprioli
  - Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milano, Italy
  - Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
- Chiara Ronchini
  - Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milano, Italy
- GianMaria Frigè
  - Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milano, Italy
  - Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
- Roberto Semeraro
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Davide Bolognini
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Alessandro Rambaldi
  - Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
  - Azienda Socio-Sanitaria Territoriale Papa Giovanni XXIII, Bergamo, Italy
- Anna Candoni
  - Clinica Ematologica, Azienda Sanitaria Universitaria Integrata di Udine, Udine, Italy
- Emanuela Colombo
  - Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milano, Italy
- Luca Mazzarella
  - Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milano, Italy
- Pier Giuseppe Pelicci
  - Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milano, Italy
  - Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
2
Qin F, Luo X, Cai G, Xiao F. Shall genomic correlation structure be considered in copy number variants detection? Brief Bioinform 2021; 22:6295811. [PMID: 34114005 DOI: 10.1093/bib/bbab215] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/16/2021] [Revised: 04/16/2021] [Accepted: 05/17/2021] [Indexed: 11/14/2022]
Abstract
Copy number variation has been identified as a major source of genomic variation associated with disease susceptibility. With the advent of whole-exome sequencing (WES) technology, massive WES data have been generated, allowing for the identification of copy number variants (CNVs) in the protein-coding regions with direct functional interpretation. We have previously shown evidence of the genomic correlation structure in array data and developed a novel chromosomal breakpoint detection algorithm, LDcnv, which showed significantly improved detection power through integrating the correlation structure in a systematic modeling manner. However, it remains unexplored whether the genomic correlation exists in WES data and how such correlation structure integration can improve the CNV detection accuracy. In this study, we first explored the correlation structure of the WES data using the 1000 Genomes Project data. Both real raw read depth and median-normalized data showed strong evidence of the correlation structure. Motivated by this fact, we proposed a correlation-based method, CORRseq, as a novel release of the LDcnv algorithm in profiling WES data. The performance of CORRseq was evaluated in extensive simulation studies and real data analysis from the 1000 Genomes Project. CORRseq outperformed the existing methods in detecting medium and large CNVs. In conclusion, it would be more advantageous to model genomic correlation structure in detecting relatively long CNVs. This study provides great insights for methodology development of CNV detection with NGS data.
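The abstract above reports a correlation structure in raw and median-normalized read depth without saying how it was measured. One simple way to look for such structure, shown here purely as an assumption-laden sketch, is the lag-1 autocorrelation of the median-normalized depth along consecutive targets. This is not the CORRseq or LDcnv model, and both function names are hypothetical.

```python
from statistics import mean, median

def median_normalize(depths):
    """Divide each read depth by the sample median (assumes a non-zero median)."""
    m = median(depths)
    return [d / m for d in depths]

def lag_autocorrelation(values, lag=1):
    """Sample autocorrelation of `values` at the given lag."""
    mu = mean(values)
    num = sum((values[i] - mu) * (values[i + lag] - mu)
              for i in range(len(values) - lag))
    den = sum((v - mu) ** 2 for v in values)
    return num / den

# Toy depths: neighbouring targets resemble each other in the first profile
# and alternate in the second.
smooth = median_normalize([10, 20, 30, 40, 50, 60, 70, 80])
jagged = median_normalize([10, 30, 10, 30, 10, 30])

print(lag_autocorrelation(smooth))  # positive
print(lag_autocorrelation(jagged))  # negative
```

A value well above zero means that adjacent targets carry related signal, which is the kind of dependence a correlation-aware caller could exploit.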
Affiliation(s)
- Fei Qin
  - Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina (USC), Discovery 449, 915 Greene St, Columbia, SC 29208, USA
- Xizhi Luo
  - Department of Epidemiology and Biostatistics, Arnold School of Public Health, USC, Discovery 449, 915 Greene St, Columbia, SC 29208, USA
- Guoshuai Cai
  - Department of Environmental Health Science, Arnold School of Public Health, USC, Discovery 449, 915 Greene St, Columbia, SC 29208, USA
- Feifei Xiao
  - Department of Epidemiology and Biostatistics, Arnold School of Public Health, USC, Discovery 449, 915 Greene St, Columbia, SC 29208, USA
3
Statistical Considerations on NGS Data for Inferring Copy Number Variations. Methods Mol Biol 2021; 2243:27-58. [PMID: 33606251 DOI: 10.1007/978-1-0716-1103-6_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/27/2022]
Abstract
Next-generation sequencing (NGS) technology has revolutionized research in genetics and genomics, resulting in massive NGS data and opening more fronts to answer unresolved issues in genetics. NGS data are usually stored at three levels: image files, sequence tags, and alignment reads. The sizes of these types of data usually range from several hundreds of gigabytes to several terabytes. Biostatisticians and bioinformaticians typically work with the aligned NGS read count data (the last level of NGS data) for data modeling and interpretation. Focusing on one use of NGS technology, researchers profile the whole genome to study DNA copy number variations (CNVs) for an individual subject (or patient) as well as groups of subjects (or patients). The resulting aligned NGS read count data are then modeled by proper mathematical and statistical approaches so that the loci of CNVs can be accurately detected. In this book chapter, a summary of the most widely used statistical methods for detecting CNVs using NGS data is given. The goal is to provide readers with a comprehensive resource of available statistical approaches for inferring DNA copy number variations using NGS data.
4
Magi A, Bolognini D, Bartalucci N, Mingrino A, Semeraro R, Giovannini L, Bonifacio S, Parrini D, Pelo E, Mannelli F, Guglielmelli P, Maria Vannucchi A. Nano-GLADIATOR: real-time detection of copy number alterations from nanopore sequencing data. Bioinformatics 2020; 35:4213-4221. [PMID: 30949684 DOI: 10.1093/bioinformatics/btz241] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/21/2018] [Revised: 03/05/2019] [Accepted: 04/03/2019] [Indexed: 12/15/2022]
Abstract
MOTIVATION The past few years have seen the emergence of nanopore-based sequencing technologies, which interrogate single molecules of DNA and generate reads sequentially. RESULTS In this paper, we demonstrate that, thanks to the sequentiality of the nanopore process, the data generated in the first tens of minutes of a typical MinION/GridION run can be exploited to resolve the alterations of a human genome at a karyotype level with a resolution in the order of tens of Mb, while the data produced in the first 6-12 h provide a resolution comparable to currently available array-based technologies and, thanks to a novel probabilistic approach, can be used to predict the allelic fraction of genomic alterations with high accuracy. To exploit the unique characteristics of nanopore sequencing data we developed a novel software tool, Nano-GLADIATOR, that can perform copy number variant/alteration detection and allelic fraction prediction during the sequencing run ('On-line' mode) and after experiment completion ('Off-line' mode). We tested Nano-GLADIATOR on publicly available datasets ('Off-line' mode) and on a novel whole genome sequencing dataset generated with a MinION device ('On-line' mode), showing that our tool can perform real-time copy number alteration detection and obtains good results compared with other state-of-the-art tools. AVAILABILITY AND IMPLEMENTATION Nano-GLADIATOR is freely available at https://sourceforge.net/projects/nanogladiator/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alberto Magi
  - Department of Information Engineering, University of Florence, Florence, Italy
- Davide Bolognini
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Niccoló Bartalucci
  - Department of Experimental and Clinical Medicine, CRIMM, Center Research and Innovation of Myeloproliferative Neoplasms, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy
- Alessandra Mingrino
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Roberto Semeraro
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Luna Giovannini
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Stefania Bonifacio
  - Department of Laboratory Diagnosis, Genetic Diagnosis Service, Careggi Teaching Hospital, Florence, Italy
- Daniela Parrini
  - Department of Laboratory Diagnosis, Genetic Diagnosis Service, Careggi Teaching Hospital, Florence, Italy
- Elisabetta Pelo
  - Department of Laboratory Diagnosis, Genetic Diagnosis Service, Careggi Teaching Hospital, Florence, Italy
- Francesco Mannelli
  - Department of Experimental and Clinical Medicine, CRIMM, Center Research and Innovation of Myeloproliferative Neoplasms, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy
- Paola Guglielmelli
  - Department of Experimental and Clinical Medicine, CRIMM, Center Research and Innovation of Myeloproliferative Neoplasms, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy
- Alessandro Maria Vannucchi
  - Department of Experimental and Clinical Medicine, CRIMM, Center Research and Innovation of Myeloproliferative Neoplasms, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy
5
D'Aurizio R, Semeraro R, Magi A. Using XCAVATOR and EXCAVATOR2 to Identify CNVs from WGS, WES, and TS Data. CURRENT PROTOCOLS IN HUMAN GENETICS 2018; 98:e65. [PMID: 29975818 DOI: 10.1002/cphg.65] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 02/01/2023]
Abstract
Copy Number Variants (CNVs) are structural rearrangements contributing to phenotypic variation but also associated with many disease states. In recent years, the identification of CNVs from high-throughput sequencing experiments has become a common practice for both research and clinical purposes. Several computational methods have been developed so far. In this unit, we describe and give instructions on how to run two read count-based tools, XCAVATOR and EXCAVATOR2, which are tailored for the detection of both germline and somatic CNVs from different sequencing experiments (whole-genome, whole-exome, and targeted) in various disease contexts and population genetic studies. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Romina D'Aurizio
  - Institute of Informatics and Telematics, National Research Council, Pisa, Italy
- Roberto Semeraro
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
- Alberto Magi
  - Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy
6
XCAVATOR: accurate detection and genotyping of copy number variants from second and third generation whole-genome sequencing experiments. BMC Genomics 2017; 18:747. [PMID: 28934930 PMCID: PMC5609061 DOI: 10.1186/s12864-017-4137-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 08/21/2017] [Accepted: 09/11/2017] [Indexed: 11/10/2022]
Abstract
Background We developed a novel software package, XCAVATOR, for the identification of genomic regions involved in copy number variants/alterations (CNVs/CNAs) from short- and long-read whole-genome sequencing experiments. Results By using simulated and real datasets we showed that our tool, based on a read count approach, can predict the boundaries and the absolute number of DNA copies of CNVs/CNAs at high resolution. To demonstrate the power of our software we applied it to the analysis of Illumina and Pacific Biosciences data and compared its performance to ten other state-of-the-art tools. Conclusion All the analyses we performed demonstrate that XCAVATOR can detect germline and somatic CNVs/CNAs, outperforming all the other tools we compared. XCAVATOR is freely available at http://sourceforge.net/projects/xcavator/. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-4137-0) contains supplementary material, which is available to authorized users.
7
SLMSuite: a suite of algorithms for segmenting genomic profiles. BMC Bioinformatics 2017; 18:321. [PMID: 28659129 PMCID: PMC5490196 DOI: 10.1186/s12859-017-1734-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/03/2016] [Accepted: 06/20/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND The identification of copy number variants (CNVs) is essential to study human genetic variation and to understand the genetic basis of mendelian disorders and cancers. At present, genome-wide detection of CNVs can be achieved using microarray or second generation sequencing (SGS) data. Although these technologies are very different, the genomic profiles that they generate are mathematically very similar and consist of noisy signals in which a decrease or increase of consecutive data points represents a deletion or duplication of DNA. In this framework, the most important step of the analysis consists of segmenting genomic profiles to identify the boundaries of genomic regions with increased or decreased signal. RESULTS Here we introduce SLMSuite, a collection of algorithms, based on shifting level models (SLM), to segment genomic profiles from array and SGS experiments. The SLM algorithms take as input the log-transformed genomic profiles from SGS or microarray experiments and output segmentation results. We apply our method to the analysis of synthetic genomic profiles and real whole genome sequencing data and we demonstrate that it outperforms the state-of-the-art circular binary segmentation algorithm in terms of sensitivity, specificity and computational speed. CONCLUSION The SLMSuite contains an R library with the segmentation methods and three wrappers that allow them to be used in Python, Ruby and C++. SLMSuite is freely available at https://sourceforge.net/projects/slmsuite .
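To make the segmentation step concrete, the sketch below finds a single boundary in a log-transformed profile by exhaustive least-squares search. This is a deliberately simple stand-in, not the shifting level model and not the SLMSuite API. The function name and the toy profile are hypothetical.

```python
def best_split(profile):
    """Return (index, cost) of the single split that minimizes the total
    within-segment squared error. The left segment is profile[:index]."""
    def sse(segment):
        mu = sum(segment) / len(segment)
        return sum((x - mu) ** 2 for x in segment)

    best_index, best_cost = None, float("inf")
    for i in range(1, len(profile)):
        cost = sse(profile[:i]) + sse(profile[i:])
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index, best_cost

# Toy log2-ratio profile: four points near 0 (normal), then four near 0.58,
# roughly what a single-copy gain would produce.
profile = [0.0, 0.1, -0.1, 0.05, 0.6, 0.55, 0.65, 0.5]
index, cost = best_split(profile)
print(index)  # boundary position between the two segments
```

Real segmentation methods handle many boundaries and model the noise explicitly. The exhaustive search here is quadratic in profile length and only shows what "finding the boundary of a region with increased signal" means.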
8
Ji T, Chen J. Statistical models for DNA copy number variation detection using read-depth data from next generation sequencing experiments. AUST NZ J STAT 2016. [DOI: 10.1111/anzs.12175] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/24/2023]
Affiliation(s)
- Tieming Ji
  - Department of Statistics; University of Missouri at Columbia; Columbia MO 65211 USA
- Jie Chen
  - Department of Biostatistics and Epidemiology; Medical College of Georgia, Augusta University; Augusta GA 30912 USA
9
Yu Z, Li A, Wang M. CloneCNA: detecting subclonal somatic copy number alterations in heterogeneous tumor samples from whole-exome sequencing data. BMC Bioinformatics 2016; 17:310. [PMID: 27538789 PMCID: PMC4990858 DOI: 10.1186/s12859-016-1174-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 06/18/2016] [Accepted: 08/11/2016] [Indexed: 12/13/2022]
Abstract
Background Copy number alteration is a major type of structural genetic variation that plays an important role in tumor initiation and progression. Accurate detection of copy number alterations is necessary for discovering cancer-causing genes. Whole-exome sequencing has become a widely used technology in the last decade for detecting various types of genomic aberrations in cancer genomes. However, several major issues are encountered in these detection problems, including normal cell contamination, tumor aneuploidy, and intra-tumor heterogeneity. In particular, deciphering intra-tumor heterogeneity is imperative for identifying clonal and subclonal copy number alterations. Results We introduce CloneCNA, a novel bioinformatics tool for efficiently addressing these issues and automatically detecting clonal and subclonal somatic copy number alterations from heterogeneous tumor samples. CloneCNA fully explores the log ratio of read counts between paired tumor-normal samples and the tumor B allele frequency at germline heterozygous SNP positions, and further employs efficient statistical models to quantitatively represent the copy number status of a tumor sample containing multiple clones. We examine CloneCNA on simulated heterogeneous and real tumor samples, and the results demonstrate that CloneCNA has higher power to detect copy number alterations than existing methods. Conclusions CloneCNA is a novel algorithm developed to efficiently and accurately identify somatic copy number alterations from heterogeneous tumor samples. We demonstrate that the statistical framework of CloneCNA represents a remarkable advance for tumor whole-exome sequencing data. We expect that CloneCNA will promote cancer-focused studies investigating the role of clonal evolution and elucidating critical events that promote tumorigenesis and progression.
Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1174-7) contains supplementary material, which is available to authorized users.
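The two signals named in the abstract, the tumor-normal log ratio of read counts and the B allele frequency (BAF) at germline heterozygous SNPs, both shift in a predictable way when tumor cells are mixed with normal cells. The sketch below works through the standard mixture arithmetic for a single tumor clone. It is not CloneCNA's statistical model, it ignores the tumor aneuploidy correction the abstract mentions, and the function and parameter names are hypothetical.

```python
import math

def expected_log_ratio(purity, tumor_cn, normal_cn=2):
    """Expected log2(tumor depth / normal depth) for a segment, assuming the
    sample is a mix of tumor cells (fraction `purity`) and diploid normal cells,
    with depths already scaled to a diploid baseline."""
    mixed_cn = purity * tumor_cn + (1 - purity) * normal_cn
    return math.log2(mixed_cn / normal_cn)

def expected_baf(purity, tumor_cn, b_copies):
    """Expected B allele frequency at a germline heterozygous SNP, where the
    tumor carries `b_copies` of the B allele out of `tumor_cn` total copies."""
    b_signal = purity * b_copies + (1 - purity) * 1
    total_signal = purity * tumor_cn + (1 - purity) * 2
    return b_signal / total_signal

# Pure tumor with four copies: depth doubles, so the log ratio is 1.
print(expected_log_ratio(1.0, 4))
# Hemizygous deletion that lost the B allele, in a 50% pure sample.
print(expected_log_ratio(0.5, 1))
print(expected_baf(0.5, 1, 0))
```

The deletion example shows why contamination matters: at 50% purity the log ratio is log2(0.75) instead of -1, and the BAF is 1/3 instead of 0.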
Affiliation(s)
- Zhenhua Yu
  - School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
- Ao Li
  - School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
  - Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, AH230027, China
- Minghui Wang
  - School of Information Science and Technology, University of Science and Technology of China, Hefei, AH230027, China
  - Centers for Biomedical Engineering, University of Science and Technology of China, Hefei, AH230027, China
10
D'Aurizio R, Pippucci T, Tattini L, Giusti B, Pellegrini M, Magi A. Enhanced copy number variants detection from whole-exome sequencing data using EXCAVATOR2. Nucleic Acids Res 2016; 44:e154. [PMID: 27507884 PMCID: PMC5175347 DOI: 10.1093/nar/gkw695] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 03/21/2016] [Revised: 07/25/2016] [Accepted: 07/27/2016] [Indexed: 12/26/2022]
Abstract
Copy Number Variants (CNVs) are structural rearrangements contributing to phenotypic variation that have been shown to be associated with many disease states. In recent years, the identification of CNVs from whole-exome sequencing (WES) data has become common practice for research and clinical purposes and, consequently, the demand for more efficient and accurate methods has increased. In this paper, we demonstrate that more than 30% of WES data map outside the targeted regions and that these reads, usually discarded, can be exploited to enhance the identification of CNVs from WES experiments. Here, we present EXCAVATOR2, the first read count based tool that exploits all the reads produced by WES experiments to detect CNVs with a genome-wide resolution. To evaluate the performance of our novel tool we use it to analyse two WES data sets, a population data set sequenced by the 1000 Genomes Project and a tumor data set made of bladder cancer samples. The results obtained from these analyses demonstrate that EXCAVATOR2 outperforms four other state-of-the-art methods and that our combined approach enlarges the spectrum of detectable CNVs from WES data with an unprecedented resolution. EXCAVATOR2 is freely available at http://sourceforge.net/projects/excavator2tool/.
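The claim that a sizeable share of WES reads map outside the targeted regions can be checked on any sample by classifying read positions against the capture intervals. The sketch below does this for one chromosome. The interval format (sorted, non-overlapping, half-open), the function name and the toy coordinates are assumptions for illustration and are not part of EXCAVATOR2.

```python
from bisect import bisect_right

def off_target_fraction(read_positions, targets):
    """Fraction of reads whose position falls outside every target interval.

    `targets` is a list of half-open (start, end) intervals on one chromosome,
    sorted by start and non-overlapping (an assumption of this sketch).
    """
    starts = [start for start, _ in targets]
    off = 0
    for pos in read_positions:
        # Index of the last interval starting at or before pos, or -1 if none.
        i = bisect_right(starts, pos) - 1
        inside = i >= 0 and pos < targets[i][1]
        if not inside:
            off += 1
    return off / len(read_positions)

targets = [(100, 200), (500, 600)]
reads = [50, 150, 199, 200, 550, 900, 1000, 120, 130, 140]
print(off_target_fraction(reads, targets))
```

In this toy input, positions 50, 200, 900 and 1000 fall outside both intervals, so four of the ten reads count as off-target.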
Affiliation(s)
- Romina D'Aurizio
  - Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, National Research Council, Pisa, Italy
- Tommaso Pippucci
  - Medical Genetics Unit, Sant'Orsola Malpighi Polyclinic, Bologna, Italy
- Lorenzo Tattini
  - Department of Computer Science, University of Pisa, Pisa, Italy
- Betti Giusti
  - Department of Experimental and Clinical Medicine, University of Florence, Florence
- Marco Pellegrini
  - Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, National Research Council, Pisa, Italy
- Alberto Magi
  - Department of Experimental and Clinical Medicine, University of Florence, Florence
11
Orrico A, Marseglia G, Pescucci C, Cortesi A, Piomboni P, Giansanti A, Gerundino F, Ponchietti R. Molecular Dissection Using Array Comparative Genomic Hybridization and Clinical Evaluation of An Infertile Male Carrier of An Unbalanced Y;21 Translocation: A Case Report and Review of The Literature. INTERNATIONAL JOURNAL OF FERTILITY & STERILITY 2015; 9:581-5. [PMID: 26985348 PMCID: PMC4793181 DOI: 10.22074/ijfs.2015.4619] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/19/2014] [Accepted: 08/09/2014] [Indexed: 11/04/2022]
Abstract
Chromosomal defects are relatively frequent in infertile men; however, translocations between the Y chromosome and autosomes are rare, and fewer than 40 cases of Y-autosome translocation have been reported. In particular, only three individuals have been described with a Y;21 translocation to date. We report an additional case of an infertile man in whom a Y;21 translocation was associated with the deletion of a large part of the Y chromosome long arm. Applying various techniques, including conventional cytogenetic procedures, fluorescence in situ hybridisation (FISH) analysis and array comparative genomic hybridization (array-CGH) studies, we identified a derivative chromosome originating from a fragment of the short arm of chromosome Y translocated onto the short arm of chromosome 21. The Y chromosome structural rearrangement left the entire short arm intact, including the sex-determining region Y (SRY) and the short stature homeobox (SHOX) loci, although translocated onto chromosome 21, and caused the loss of a large part of the long arm of the Y chromosome, including the azoospermia factor-a (AZFa), AZFb, AZFc and Yq heterochromatin regions. This is the first case in which a (Yp;21p) translocation has been ascertained using an array-CGH approach, thus reporting details of such a rearrangement at higher resolution.
Affiliation(s)
- Alfredo Orrico
  - Molecular Medicine Unit, Azienda Ospedaliera Universitaria Senese, Siena, Italy
  - Medical Genetics, Misericordia Hospital, Grosseto, Italy
- Giuseppina Marseglia
  - Diagnostic Genetic Unit, Department of Laboratory, Careggi University Hospital, Firenze, Italy
- Chiara Pescucci
  - Diagnostic Genetic Unit, Department of Laboratory, Careggi University Hospital, Firenze, Italy
- Ambra Cortesi
  - Medical Genetics, Misericordia Hospital, Grosseto, Italy
- Paola Piomboni
  - Department of Molecular and Developmental Medicine, University of Siena, Azienda Ospedaliera Universitaria Senese, Siena, Italy
- Andrea Giansanti
  - Genitourinary Unit, University of Siena, Azienda Ospedaliera Universitaria Senese, Siena, Italy
- Francesca Gerundino
  - Diagnostic Genetic Unit, Department of Laboratory, Careggi University Hospital, Firenze, Italy
- Roberto Ponchietti
  - Genitourinary Unit, University of Siena, Azienda Ospedaliera Universitaria Senese, Siena, Italy
12
Anjum S, Morganella S, D'Angelo F, Iavarone A, Ceccarelli M. VEGAWES: variational segmentation on whole exome sequencing for copy number detection. BMC Bioinformatics 2015; 16:315. [PMID: 26416038 PMCID: PMC4587906 DOI: 10.1186/s12859-015-0748-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/17/2015] [Accepted: 09/16/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND Copy number variations are important in the detection and progression of significant tumors and diseases. Recently, Whole Exome Sequencing is gaining popularity with copy number variations detection due to low cost and better efficiency. In this work, we developed VEGAWES for accurate and robust detection of copy number variations on WES data. VEGAWES is an extension to a variational based segmentation algorithm, VEGA: Variational estimator for genomic aberrations, which has previously outperformed several algorithms on segmenting array comparative genomic hybridization data. RESULTS We tested this algorithm on synthetic data and 100 Glioblastoma Multiforme primary tumor samples. The results on the real data were analyzed with segmentation obtained from Single-nucleotide polymorphism data as ground truth. We compared our results with two other segmentation algorithms and assessed the performance based on accuracy and time. CONCLUSIONS In terms of both accuracy and time, VEGAWES provided better results on the synthetic data and tumor samples demonstrating its potential in robust detection of aberrant regions in the genome.
Affiliation(s)
- Samreen Anjum
  - Computational Sciences and Engineering, Qatar Computing Research Institute, Doha, P. O. Box 5825, Qatar
- Sandro Morganella
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
- Antonio Iavarone
  - Institute for Cancer Genetics, Columbia University, New York, 10027, USA
- Michele Ceccarelli
  - Computational Sciences and Engineering, Qatar Computing Research Institute, Doha, P. O. Box 5825, Qatar
  - Department of Science and Technology, University of Sannio, Benevento, 82100, Italy
13
Tattini L, D'Aurizio R, Magi A. Detection of Genomic Structural Variants from Next-Generation Sequencing Data. Front Bioeng Biotechnol 2015; 3:92. [PMID: 26161383 PMCID: PMC4479793 DOI: 10.3389/fbioe.2015.00092] [Citation(s) in RCA: 169] [Impact Index Per Article: 16.9] [Received: 12/10/2014] [Accepted: 06/10/2015] [Indexed: 01/16/2023]
Abstract
Structural variants are genomic rearrangements larger than 50 bp accounting for around 1% of the variation among human genomes. They impact on phenotypic diversity and play a role in various diseases including neurological/neurocognitive disorders and cancer development and progression. Dissecting structural variants from next-generation sequencing data presents several challenges and a number of approaches have been proposed in the literature. In this mini review, we describe and summarize the latest tools – and their underlying algorithms – designed for the analysis of whole-genome sequencing, whole-exome sequencing, custom captures, and amplicon sequencing data, pointing out the major advantages/drawbacks. We also report a summary of the most recent applications of third-generation sequencing platforms. This assessment provides a guided indication – with particular emphasis on human genetics and copy number variants – for researchers involved in the investigation of these genomic events.
Affiliation(s)
- Lorenzo Tattini
  - Department of Neurosciences, Psychology, Pharmacology and Child Health, University of Florence, Florence, Italy
- Romina D'Aurizio
  - Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, National Research Council, Pisa, Italy
- Alberto Magi
  - Department of Clinical and Experimental Medicine, University of Florence, Florence, Italy
14
|
Magi A, Tattini L, Cifola I, D'Aurizio R, Benelli M, Mangano E, Battaglia C, Bonora E, Kurg A, Seri M, Magini P, Giusti B, Romeo G, Pippucci T, De Bellis G, Abbate R, Gensini GF. EXCAVATOR: detecting copy number variants from whole-exome sequencing data. Genome Biol 2014; 14:R120. [PMID: 24172663 PMCID: PMC4053953 DOI: 10.1186/gb-2013-14-10-r120] [Citation(s) in RCA: 188] [Impact Index Per Article: 17.1] [Received: 06/15/2013] [Accepted: 10/30/2013] [Indexed: 12/11/2022] Open
Abstract
We developed a novel software tool, EXCAVATOR, for the detection of copy number variants (CNVs) from whole-exome sequencing data. EXCAVATOR combines a three-step normalization procedure with a novel heterogeneous hidden Markov model algorithm and a calling method that classifies genomic regions into five copy number states. We validate EXCAVATOR on three datasets and compare the results with three other methods. These analyses show that EXCAVATOR outperforms the other methods and is therefore a valuable tool for the investigation of CNVs in large-scale projects, as well as in clinical research and diagnostics. EXCAVATOR is freely available at http://sourceforge.net/projects/excavatortool/.
|
15
|
Li X, Chen S, Xie W, Vogel I, Choy KW, Chen F, Christensen R, Zhang C, Ge H, Jiang H, Yu C, Huang F, Wang W, Jiang H, Zhang X. PSCC: sensitive and reliable population-scale copy number variation detection method based on low coverage sequencing. PLoS One 2014; 9:e85096. [PMID: 24465483 PMCID: PMC3897425 DOI: 10.1371/journal.pone.0085096] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 08/08/2013] [Accepted: 11/22/2013] [Indexed: 11/28/2022] Open
Abstract
Background: Copy number variations (CNVs) represent an important type of genetic variation that deeply impacts phenotypic polymorphisms and human diseases. The advent of high-throughput sequencing technologies provides an opportunity to revolutionize the discovery of CNVs and to explore their relationship with diseases. However, most of the existing methods depend on sequencing depth and are unstable at low sequence coverage. In this study, using low coverage whole-genome sequencing (LCS), we have developed an effective population-scale CNV calling (PSCC) method. Methodology/Principal Findings: In our novel method, a two-step correction was used to remove biases caused by local GC content and complex genomic characteristics. We chose a binary segmentation method to locate CNV segments and designed combined statistical tests to keep false positive control stable. The simulation data showed that our PSCC method could achieve 99.7%/100% and 98.6%/100% sensitivity and specificity for calling CNVs over 300 kb with LCS (∼2×) and ultra LCS (∼0.2×), respectively. Finally, we applied this novel method to analyze 34 clinical samples with an average of 2× LCS. In the final results, all 31 pathogenic CNVs identified by aCGH were successfully detected. In addition, the performance comparison revealed that our method had significant advantages over existing methods using ultra LCS. Conclusions/Significance: Our study showed that PSCC can sensitively and reliably detect CNVs using low coverage or even ultra-low coverage data through population-scale sequencing.
Affiliation(s)
- Shengpei Chen
- BGI-Shenzhen, Shenzhen, China; State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Ida Vogel
- Department of Clinical Genetics, Aarhus University Hospital, Aarhus, Denmark
- Kwong Wai Choy
- Department of Obstetrics and Gynaecology, The Chinese University of Hong Kong, Shatin, NT, Hong Kong
- Rikke Christensen
- Department of Clinical Genetics, Aarhus University Hospital, Aarhus, Denmark
- Haojun Jiang
- BGI-Shenzhen, Shenzhen, China; State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Fang Huang
- Guangzhou Children's Social Welfare Home, Guangzhou, China
- Wei Wang
- BGI-Shenzhen, Shenzhen, China; Clinical laboratory of BGI Health, Shenzhen, China
- Xiuqing Zhang
- BGI-Shenzhen, Shenzhen, China; The Guangdong Enterprise Key Laboratory of Human Disease Genomics, BGI-Shenzhen, Shenzhen, China
|
16
|
Vandeweyer G, Kooy RF. Detection and interpretation of genomic structural variation in health and disease. Expert Rev Mol Diagn 2014; 13:61-82. [DOI: 10.1586/erm.12.119] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 01/04/2023]
|
17
|
Gandolfi A, Benelli M, Magi A, Chiti S. Moment estimation in discrete shifting level model applied to fast array-CGH segmentation. STAT NEERL 2013. [DOI: 10.1111/stan.12005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
Affiliation(s)
- A. Gandolfi
- Dipartimento di Matematica U. Dini; Università di Firenze; Viale Morgagni 67/A; 50134; Florence; Italy
- S. Chiti
- Dipartimento di Matematica U. Dini; Università di Firenze; Viale Morgagni 67/A; 50134; Florence; Italy
|
18
|
Marseglia G, Scordo MR, Pescucci C, Nannetti G, Biagini E, Scandurra V, Gerundino F, Magi A, Benelli M, Torricelli F. 372 kb microdeletion in 18q12.3 causing SETBP1 haploinsufficiency associated with mild mental retardation and expressive speech impairment. Eur J Med Genet 2012; 55:216-21. [PMID: 22333924 DOI: 10.1016/j.ejmg.2012.01.005] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Received: 08/05/2011] [Accepted: 01/12/2012] [Indexed: 10/14/2022]
Abstract
Several cases of interstitial deletion encompassing band 18q12.3 are described in patients with mild dysmorphic features, mental retardation and impairment of expressive language. The critical deleted region contains the SETBP1 gene (SET binding protein 1). Missense heterozygous mutations in this gene cause Schinzel-Giedion syndrome (SGS, MIM#269150), characterized by profound mental retardation and multiple congenital malformations. Recently, an 18q12.3 microdeletion causing SETBP1 haploinsufficiency has been described in two patients who show expressive speech impairment, moderate developmental delay and peculiar facial features. The phenotype of individuals with partial chromosome 18q deletions does not resemble SGS. The deletion defines a critical region in which SETBP1 is the major candidate gene for expressive speech defect. We describe an additional patient with the smallest 18q12.3 microdeletion ever reported that disrupts SETBP1. The patient shows mild mental retardation and expressive speech impairment, with a striking discrepancy between expressive and receptive language skills. He is able to communicate using gestures and mimic expression of face and body with surprising efficacy. The significant phenotypic overlap between this patient and the previously reported cases reinforces the hypothesis that SETBP1 haploinsufficiency may have a role in expressive language development.
Affiliation(s)
- Giuseppina Marseglia
- SOD Diagnostica genetica, Azienda Ospedaliero Universitaria Careggi, Florence, Italy
|
19
|
Magi A, Tattini L, Pippucci T, Torricelli F, Benelli M. Read count approach for DNA copy number variants detection. Bioinformatics 2011; 28:470-8. [DOI: 10.1093/bioinformatics/btr707] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Indexed: 12/13/2022] Open
|
20
|
Magi A, Benelli M, Yoon S, Roviello F, Torricelli F. Detecting common copy number variants in high-throughput sequencing data by using JointSLM algorithm. Nucleic Acids Res 2011; 39:e65. [PMID: 21321017 PMCID: PMC3105418 DOI: 10.1093/nar/gkr068] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Indexed: 12/02/2022] Open
Abstract
The discovery of genomic structural variants (SVs), such as copy number variants (CNVs), is essential to understand the genetic variation of human populations and complex diseases. Over recent years, the advent of new high-throughput sequencing (HTS) platforms has opened many opportunities for SV discovery, and a very promising approach consists of measuring the depth of coverage (DOC) of reads aligned to the human reference genome. At present, few computational methods have been developed for the analysis of DOC data, and all of them can analyse only one sample at a time. For these reasons, we developed a novel algorithm (JointSLM) that detects common CNVs among individuals by analysing DOC data from multiple samples simultaneously. We test JointSLM performance on synthetic and real data and show its unprecedented resolution, which enables the detection of recurrent CNV regions as small as 500 bp in size. When we apply JointSLM to chromosome one of eight genomes with different ancestry, we identify 3000 regions with recurrent CNVs of different frequency and size: hierarchical clustering on these regions segregates the eight individuals into two groups that reflect their ancestry, demonstrating the potential utility of JointSLM for population genetics studies.
Affiliation(s)
- Alberto Magi
- Laboratory Department, Diagnostic Genetic Unit, Careggi Hospital, Florence 5014, Italy.
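As an illustration of the depth-of-coverage (DOC) signal that the JointSLM abstract above refers to, here is a minimal Python sketch. It is not the JointSLM algorithm: it only shows how aligned read start positions might be binned into fixed-size windows and median-normalised per sample. The function names, the window size, and the pseudocount are assumptions made for illustration.

```python
from math import log2
from statistics import median


def window_counts(read_starts, chrom_length, window_size):
    """Count reads whose 0-based start position falls in each fixed-size window."""
    n_windows = (chrom_length + window_size - 1) // window_size
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < chrom_length:  # ignore reads outside the chromosome
            counts[pos // window_size] += 1
    return counts


def normalised_log2(counts, pseudocount=0.5):
    """Log2 ratio of each window count to the sample median (assumed normalisation)."""
    centre = median(counts)
    return [log2((c + pseudocount) / (centre + pseudocount)) for c in counts]


if __name__ == "__main__":
    counts = window_counts([0, 10, 499, 500, 1500], chrom_length=2000, window_size=500)
    print(counts)
    print(normalised_log2([4, 4, 8, 4]))
```

Windows whose normalised value stays well above or below zero across several samples would be candidates for recurrent gains or losses; the joint multi-sample modelling itself is outside the scope of this sketch.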
|
21
|
Morganella S, Cerulo L, Viglietto G, Ceccarelli M. VEGA: variational segmentation for copy number detection. Bioinformatics 2010; 26:3020-7. [PMID: 20959380 DOI: 10.1093/bioinformatics/btq586] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Genomic copy number (CN) information is useful to study genetic traits of many diseases. Using array comparative genomic hybridization (aCGH), researchers are able to measure the copy number of thousands of DNA loci at the same time. Therefore, a current challenge in bioinformatics is the development of efficient algorithms to detect the map of aberrant chromosomal regions. METHODS: We describe an approach for the segmentation of copy number aCGH data. Variational estimator for genomic aberrations (VEGA) adopts a variational model used in image segmentation. The optimal segmentation is modeled as the minimum of an energy functional encompassing both the quality of interpolation of the data and the complexity of the solution, measured by the length of the boundaries between segmented regions. This solution is obtained by a region growing process in which the stop condition is completely data driven. RESULTS: VEGA is compared with three algorithms that represent the state of the art in CN segmentation. Performance is assessed on both synthetic and real data. Synthetic data simulate different noise conditions. Results on these data show the robustness of variational models to noise and the accuracy of VEGA in terms of recall and precision. Eight mantle cell lymphoma cell lines and two samples of glioblastoma multiforme are used to evaluate the behavior of VEGA on real biological data. Comparison between the results and current biological knowledge shows the ability of the proposed method to detect known chromosomal aberrations. AVAILABILITY: VEGA has been implemented in R and is available at the address http://www.dsba.unisannio.it/Members/ceccarelli/vega in the section Download.
Affiliation(s)
- Sandro Morganella
- Department of Biological and Environmental Studies, University of Sannio, Benevento, Italy
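The energy described in the VEGA abstract above (data fit plus a penalty on the boundaries between regions) can be sketched for one-dimensional data as follows. This is a simplified greedy region-merging illustration in Python, not the published R implementation: the fixed penalty `lam`, the function names, and the merge order are assumptions, whereas VEGA's stop condition is described as data driven.

```python
def merge_cost(a, b):
    """Increase in squared error when two adjacent regions are merged.

    Each region is a (size, mean) pair.
    """
    n1, m1 = a
    n2, m2 = b
    return n1 * n2 / (n1 + n2) * (m1 - m2) ** 2


def segment(values, lam):
    """Greedy minimisation of: squared error + lam * number of boundaries.

    Starts with one region per probe and repeatedly merges the adjacent pair
    with the smallest error increase, while that increase is below lam
    (the assumed, fixed boundary penalty).
    """
    regions = [(1, float(v)) for v in values]
    while len(regions) > 1:
        costs = [merge_cost(regions[i], regions[i + 1])
                 for i in range(len(regions) - 1)]
        i = min(range(len(costs)), key=costs.__getitem__)
        if costs[i] >= lam:  # removing this boundary no longer lowers the energy
            break
        n1, m1 = regions[i]
        n2, m2 = regions[i + 1]
        n = n1 + n2
        regions[i:i + 2] = [(n, (n1 * m1 + n2 * m2) / n)]
    return regions


if __name__ == "__main__":
    log_ratios = [0, 0, 0, 1, 1, 1]
    print(segment(log_ratios, lam=0.5))
```

A small `lam` keeps the step between the two plateaus as a boundary, while a large `lam` merges everything into a single region, which illustrates the trade-off between data fit and solution complexity that the abstract mentions.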
|