1
Chen X, Wei S, Sun C, Yi Z, Wang Z, Wu Y, Xu J, Tao J, Chen H, Zhang M, Jiang Y, Lv H, Huang C. Computational Tools for Studying Genome Structural Variation. OMICS: A Journal of Integrative Biology 2025; 29:36-48. [PMID: 39905890 DOI: 10.1089/omi.2024.0200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2025]
Abstract
Structural variation (SV) typically refers to alterations in DNA fragments at least 50 base pairs long in the human genome. It can alter thousands of DNA nucleotides and thus significantly influence human health, disease, and clinical phenotypes. There is a shared and growing recognition that the emergence of effective computational tools and high-throughput technologies such as short-read sequencing and long-read sequencing offers novel insight into SV and, by extension, diseases affecting planetary health. However, the many available SV tools vary in their strengths and weaknesses, which currently hampers the ability of scholars to select the optimal tools to study SVs. Here, we reviewed 175 tools developed in the past two decades for SV detection, annotation, visualization, and downstream analysis of human genomics. In this expert review, we provide a comprehensive catalog of SV-related tools across different technology platforms and summarize their features, strengths, and limitations with an eye to accelerating systems science and planetary health innovations.
Affiliation(s)
- Xingyu Chen
  - Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Siyu Wei
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Chen Sun
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Zelin Yi
  - Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
- Zihan Wang
  - Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
- Yingyi Wu
  - Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
- Jing Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Junxian Tao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Haiyan Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Mingming Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yongshuai Jiang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Hongchao Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Chen Huang
  - Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine & Faculty of Chinese Medicine, Macau University of Science and Technology, Taipa, China
2
Gordeeva V, Sharova E, Arapidi G. Progress in Methods for Copy Number Variation Profiling. Int J Mol Sci 2022; 23:ijms23042143. [PMID: 35216262 PMCID: PMC8879278 DOI: 10.3390/ijms23042143] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/17/2022] [Revised: 02/09/2022] [Accepted: 02/11/2022] [Indexed: 02/04/2023] [Open access]
Abstract
Copy number variations (CNVs) are the predominant class of structural genomic variations involved in the processes of evolutionary adaptation, genomic disorders, and disease progression. Compared with single-nucleotide variants, there have been challenges associated with the detection of CNVs owing to their diverse sizes. However, the field has seen significant progress in the past 20–30 years. This has been made possible due to the rapid development of molecular diagnostic methods which ensure a more detailed view of the genome structure, further complemented by recent advances in computational methods. Here, we review the major approaches that have been used to routinely detect CNVs, ranging from cytogenetics to the latest sequencing technologies, and then cover their specific features.
Affiliation(s)
- Veronika Gordeeva
  - Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russia
  - Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russia
  - Moscow Institute of Physics and Technology, National Research University, Moscow Oblast, 141701 Moscow, Russia
- Elena Sharova
  - Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russia
- Georgij Arapidi
  - Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russia
  - Moscow Institute of Physics and Technology, National Research University, Moscow Oblast, 141701 Moscow, Russia
  - Shemyakin–Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 117997 Moscow, Russia
3
Zhuang X, Ye R, So MT, Lam WY, Karim A, Yu M, Ngo ND, Cherny SS, Tam PKH, Garcia-Barcelo MM, Tang CSM, Sham PC. A random forest-based framework for genotyping and accuracy assessment of copy number variations. NAR Genom Bioinform 2020; 2:lqaa071. [PMID: 33575619 PMCID: PMC7671382 DOI: 10.1093/nargab/lqaa071] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/25/2020] [Revised: 08/18/2020] [Accepted: 08/26/2020] [Indexed: 12/24/2022] [Open access]
Abstract
Detection of copy number variations (CNVs) is essential for uncovering genetic factors underlying human diseases. However, CNV detection by current methods is prone to error, and precisely identifying CNVs from paired-end whole genome sequencing (WGS) data is still challenging. Here, we present a framework, CNV-JACG, for Judging the Accuracy of CNVs and Genotyping using paired-end WGS data. CNV-JACG is based on a random forest model trained on 21 distinctive features characterizing the CNV region and its breakpoints. Using the data from the 1000 Genomes Project, Genome in a Bottle Consortium, the Human Genome Structural Variation Consortium and in-house technical replicates, we show that CNV-JACG has superior sensitivity over the latest genotyping method, SV2, particularly for the small CNVs (≤1 kb). We also demonstrate that CNV-JACG outperforms SV2 in terms of Mendelian inconsistency in trios and concordance between technical replicates. Our study suggests that CNV-JACG would be a useful tool in assessing the accuracy of CNVs to meet the ever-growing needs for uncovering the missing heritability linked to CNVs.
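The Mendelian-inconsistency comparison mentioned in this abstract can be illustrated with a small sketch. This is not CNV-JACG's code: it assumes an autosomal, biallelic deletion where each genotype is the number of deletion alleles carried (0, 1, or 2), and it treats any de novo event as an inconsistency. The function names are hypothetical.

```python
def transmissible(genotype):
    """Deletion-allele counts (0 or 1) that a parent with this genotype can pass on."""
    # genotype = number of deletion alleles carried by the parent (0, 1, or 2)
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_consistent(child, father, mother):
    """True if the child's genotype can be formed from one allele of each parent."""
    return any(
        f + m == child
        for f in transmissible(father)
        for m in transmissible(mother)
    )


def mendelian_inconsistency_rate(trios):
    """Fraction of (child, father, mother) genotype triples that violate inheritance."""
    trios = list(trios)
    if not trios:
        raise ValueError("at least one trio is required")
    violations = sum(not is_mendelian_consistent(c, f, m) for c, f, m in trios)
    return violations / len(trios)
```

A lower rate across many trios suggests more reliable genotypes, although a real evaluation would also need to handle sex chromosomes, multi-allelic CNVs, and true de novo events.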
Affiliation(s)
- Xuehan Zhuang
  - Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Rui Ye
  - Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Man-Ting So
  - Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Wai-Yee Lam
  - Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Anwarul Karim
  - Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Michelle Yu
  - Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Ngoc Diem Ngo
  - National Hospital of Pediatrics, Ha Noi 100000, Vietnam
- Stacey S Cherny
  - Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Paul Kwong-Hang Tam
  - Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Clara Sze-Man Tang
  - Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Pak Chung Sham
  - Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
4
Dong J, Qi M, Wang S, Yuan X. DINTD: Detection and Inference of Tandem Duplications From Short Sequencing Reads. Front Genet 2020; 11:924. [PMID: 32849857 PMCID: PMC7433346 DOI: 10.3389/fgene.2020.00924] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/29/2020] [Accepted: 07/24/2020] [Indexed: 11/21/2022] [Open access]
Abstract
Tandem duplication (TD) is an important type of structural variation (SV) in the human genome and has biological significance for human cancer evolution and tumorigenesis. Accurate and reliable detection of TDs plays an important role in advancing early detection, diagnosis, and treatment of disease. The advent of next-generation sequencing technologies has made the study of TDs possible. However, detection is still challenging due to the uneven distribution of reads and the uncertain amplitude of TD regions. In this paper, we present a new method, DINTD (Detection and INference of Tandem Duplications), to detect and infer TDs using short sequencing reads. The major principle of the proposed method is that it first extracts read depth and mapping quality signals, then uses the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm to find the possible TD regions. The total variation penalized least squares model is fitted with read depth and mapping quality signals to denoise signals. A 2D binary search tree is used to search the neighbor points effectively. To further identify the exact breakpoints of the TD regions, split-read signals are integrated into DINTD. The experimental results of DINTD on simulated data sets showed that DINTD can outperform other methods for sensitivity, precision, F1-score, and boundary bias. DINTD is further validated on real samples, and the experimental results indicate that it is consistent with other methods. This study indicates that DINTD can be used as an effective tool for detecting TDs.
Affiliation(s)
- Jinxin Dong
  - School of Computer Science and Technology, Xidian University, Xi'an, China
  - School of Computer Science and Technology, Liaocheng University, Liaocheng, China
- Minyong Qi
  - School of Computer Science and Technology, Xidian University, Xi'an, China
  - School of Computer Science and Technology, Liaocheng University, Liaocheng, China
- Shaoqiang Wang
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Xiguo Yuan
  - School of Computer Science and Technology, Xidian University, Xi'an, China
5
Whitford W, Lehnert K, Snell RG, Jacobsen JC. RBV: Read balance validator, a tool for prioritising copy number variations in germline conditions. Sci Rep 2019; 9:16934. [PMID: 31729446 PMCID: PMC6858463 DOI: 10.1038/s41598-019-53181-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2018] [Accepted: 10/25/2019] [Indexed: 11/11/2022] [Open access]
Abstract
The popularisation and decreased cost of genome resequencing have resulted in its increased use in molecular diagnostics. While there are a number of established and high-quality bioinformatic tools for identifying small genetic variants including single nucleotide variants and indels, currently there is no established standard for the detection of copy number variants (CNVs) from sequence data. The requirement for CNV detection from high throughput sequencing has resulted in the development of a large number of software packages. These tools typically utilise the sequence data characteristics: read depth, split reads, read pairs, and assembly-based techniques. However, the additional source of information from read balance (defined as relative proportion of reads of each allele at each position) has been underutilised in the existing applications. Here we present Read Balance Validator (RBV), a bioinformatic tool that uses read balance for prioritisation and validation of putative CNVs. The software simultaneously interrogates nominated regions for the presence of deletions or multiplications, and can differentiate larger CNVs from diploid regions. Additionally, the utility of RBV to test for inheritance of CNVs is demonstrated in this report. RBV is a CNV validation and prioritisation bioinformatic tool for both genome and exome sequencing available as a python package from https://github.com/whitneywhitford/RBV.
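To make the read-balance idea concrete, here is a minimal sketch that does not reproduce RBV's statistical test. It assumes each variant site is given as a hypothetical `(ref_reads, alt_reads)` pair, and it compares the median minor-allele fraction in a region with idealised expectations: 0 for one copy (no heterozygosity), 1/2 for two copies, and 1/3 for three copies. The depth cutoff and all names are illustrative assumptions.

```python
from statistics import median

# Idealised minor-allele read fraction at variant sites for each copy number.
# Assumption: one copy shows no heterozygosity, two copies give 1:1, three give 1:2.
EXPECTED_MINOR_FRACTION = {1: 0.0, 2: 0.5, 3: 1.0 / 3.0}


def minor_allele_fraction(ref_reads, alt_reads):
    """Proportion of reads supporting the less frequent allele at one site."""
    return min(ref_reads, alt_reads) / (ref_reads + alt_reads)


def closest_copy_number(sites, min_depth=10):
    """Return the copy number whose expected read balance best matches the region.

    sites: iterable of (ref_reads, alt_reads) tuples inside the candidate region.
    Returns None when no site passes the (hypothetical) depth filter.
    """
    depth_floor = max(min_depth, 1)  # never divide by a zero read count
    fractions = [
        minor_allele_fraction(ref, alt)
        for ref, alt in sites
        if ref + alt >= depth_floor
    ]
    if not fractions:
        return None
    observed = median(fractions)
    return min(
        EXPECTED_MINOR_FRACTION,
        key=lambda cn: abs(EXPECTED_MINOR_FRACTION[cn] - observed),
    )
```

A nearest-expectation rule like this ignores sampling noise and homozygous sites in diploid regions, which is why a real tool would rely on a proper statistical test rather than a simple median.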
Affiliation(s)
- Whitney Whitford
  - School of Biological Sciences, The University of Auckland, Auckland, New Zealand
  - Centre for Brain Research, The University of Auckland, Auckland, New Zealand
- Klaus Lehnert
  - School of Biological Sciences, The University of Auckland, Auckland, New Zealand
  - Centre for Brain Research, The University of Auckland, Auckland, New Zealand
- Russell G Snell
  - School of Biological Sciences, The University of Auckland, Auckland, New Zealand
  - Centre for Brain Research, The University of Auckland, Auckland, New Zealand
- Jessie C Jacobsen
  - School of Biological Sciences, The University of Auckland, Auckland, New Zealand
  - Centre for Brain Research, The University of Auckland, Auckland, New Zealand
6
Kosugi S, Momozawa Y, Liu X, Terao C, Kubo M, Kamatani Y. Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing. Genome Biol 2019; 20:117. [PMID: 31159850 PMCID: PMC6547561 DOI: 10.1186/s13059-019-1720-5] [Citation(s) in RCA: 283] [Impact Index Per Article: 47.2] [Received: 10/15/2018] [Accepted: 05/20/2019] [Indexed: 01/01/2023] [Open access]
Abstract
BACKGROUND: Structural variations (SVs) or copy number variations (CNVs) greatly impact the functions of the genes encoded in the genome and are responsible for diverse human diseases. Although a number of existing SV detection algorithms can detect many types of SVs using whole genome sequencing (WGS) data, no single algorithm can call every type of SVs with high precision and high recall.
RESULTS: We comprehensively evaluate the performance of 69 existing SV detection algorithms using multiple simulated and real WGS datasets. The results highlight a subset of algorithms that accurately call SVs depending on specific types and size ranges of the SVs and that accurately determine breakpoints, sizes, and genotypes of the SVs. We enumerate potential good algorithms for each SV category, among which GRIDSS, Lumpy, SVseq2, SoftSV, Manta, and Wham are better algorithms in deletion or duplication categories. To improve the accuracy of SV calling, we systematically evaluate the accuracy of overlapping calls between possible combinations of algorithms for every type and size range of SVs. The results demonstrate that both the precision and recall for overlapping calls vary depending on the combinations of specific algorithms rather than the combinations of methods used in the algorithms.
CONCLUSION: These results suggest that careful selection of the algorithms for each type and size range of SVs is required for accurate calling of SVs. The selection of specific pairs of algorithms for overlapping calls promises to effectively improve the SV detection accuracy.
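The idea of scoring overlapping calls from a pair of algorithms can be sketched as follows. This is only an illustration, not the evaluation pipeline of the study: it assumes calls are `(chrom, start, end, svtype)` tuples with half-open coordinates and uses a 50% reciprocal-overlap threshold, which is a common convention but is not stated in the abstract. The example calls are made up.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for same-type calls on one chromosome."""
    if a[0] != b[0] or a[3] != b[3]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def overlapping_calls(calls_a, calls_b, min_ro=0.5):
    """Calls from caller A that are supported by at least one call from caller B."""
    return [
        a for a in calls_a
        if any(reciprocal_overlap(a, b) >= min_ro for b in calls_b)
    ]


def precision_recall(calls, truth, min_ro=0.5):
    """Precision and recall of a call set against a reference (truth) set."""
    matched_calls = sum(
        any(reciprocal_overlap(c, t) >= min_ro for t in truth) for c in calls
    )
    matched_truth = sum(
        any(reciprocal_overlap(c, t) >= min_ro for c in calls) for t in truth
    )
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall


# Made-up example data for two hypothetical callers and a reference set.
caller_a = [("chr1", 1000, 2000, "DEL"), ("chr1", 5000, 6000, "DEL"), ("chr2", 100, 400, "DUP")]
caller_b = [("chr1", 1100, 2100, "DEL"), ("chr2", 100, 400, "DEL"), ("chr1", 9000, 9500, "DEL")]
truth = [("chr1", 1000, 2000, "DEL"), ("chr1", 5000, 6000, "DEL")]

consensus = overlapping_calls(caller_a, caller_b)
print(precision_recall(caller_a, truth))
print(precision_recall(consensus, truth))
```

In this toy case the consensus set trades recall for precision, which is the kind of trade-off that makes the choice of algorithm pair matter.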
Affiliation(s)
- Shunichi Kosugi
  - Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
- Yukihide Momozawa
  - Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
- Xiaoxi Liu
  - Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
- Chikashi Terao
  - Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
- Michiaki Kubo
  - RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
- Yoichiro Kamatani
  - Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
  - Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045 Japan
7
Liu Y, He Q, Sun W. Association analysis using somatic mutations. PLoS Genet 2018; 14:e1007746. [PMID: 30388102 PMCID: PMC6235399 DOI: 10.1371/journal.pgen.1007746] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 02/07/2018] [Revised: 11/14/2018] [Accepted: 10/07/2018] [Indexed: 11/18/2022] [Open access]
Abstract
Somatic mutations drive the growth of tumor cells and are pivotal biomarkers for many cancer treatments. Genetic association analysis using somatic mutations is an effective approach to study the functional impact of somatic mutations. However, standard regression methods are not appropriate for somatic mutation association studies because somatic mutation calls often have non-ignorable false positive and/or false negative rates. Large-scale association analysis using somatic mutations has recently become feasible, thanks to improved sequencing techniques and reduced sequencing cost, so there is an urgent need for a new statistical method designed for somatic mutation association analysis. We propose such a method with a computationally efficient software implementation: Somatic mutation Association test with Measurement Errors (SAME). SAME accounts for somatic mutation calling uncertainty using a likelihood-based approach. It can be used to assess the associations between continuous/dichotomous outcomes and individual mutations or gene-level mutations. Through simulation studies across a wide range of realistic scenarios, we show that SAME can significantly improve statistical power over the naive generalized linear model that ignores mutation calling uncertainty. Finally, using the data collected from The Cancer Genome Atlas (TCGA) project, we apply SAME to study the associations between somatic mutations and gene expression in 12 cancer types, as well as the associations between somatic mutations and colon cancer subtype defined by DNA methylation data. SAME recovered some interesting findings that were missed by the generalized linear model. In addition, we demonstrated that mutation-level and gene-level analyses are often more appropriate for oncogenes and tumor-suppressor genes, respectively.
Cancer is a genetic disease that is driven by the accumulation of somatic mutations. Association studies using somatic mutations are a powerful approach to identify the potential impact of somatic mutations on molecular or clinical features. One challenge for such tasks is non-ignorable somatic mutation calling errors. We have developed a statistical method to address this challenge and applied our method to study the gene expression traits associated with somatic mutations in 12 cancer types. Our results show that some somatic mutations affect gene expression in several cancer types. In particular, we show that the associations between gene expression traits and TP53 gene-level mutation reveal some similarities across a few cancer types.
Affiliation(s)
- Yang Liu
  - Department of Mathematics and Statistics, Wright State University, Dayton, Ohio, United States of America
- Qianchan He
  - Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Wei Sun
  - Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
8
Wang W, Sun W, Wang W, Szatkiewicz J. A randomized approach to speed up the analysis of large-scale read-count data in the application of CNV detection. BMC Bioinformatics 2018; 19:74. [PMID: 29490610 PMCID: PMC5831535 DOI: 10.1186/s12859-018-2077-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/09/2017] [Accepted: 02/20/2018] [Indexed: 11/10/2022] [Open access]
Abstract
BACKGROUND: The application of high-throughput sequencing in a broad range of quantitative genomic assays (e.g., DNA-seq, ChIP-seq) has created a high demand for the analysis of large-scale read-count data. Typically, the genome is divided into tiling windows and windowed read-count data is generated for the entire genome from which genomic signals are detected (e.g. copy number changes in DNA-seq, enrichment peaks in ChIP-seq). For accurate analysis of read-count data, many state-of-the-art statistical methods use generalized linear models (GLM) coupled with the negative-binomial (NB) distribution by leveraging its ability for simultaneous bias correction and signal detection. However, although statistically powerful, the GLM+NB method has a quadratic computational complexity and therefore suffers from slow running time when applied to large-scale windowed read-count data. In this study, we aimed to speed up substantially the GLM+NB method by using a randomized algorithm and we demonstrate here the utility of our approach in the application of detecting copy number variants (CNVs) using a real example.
RESULTS: We propose an efficient estimator, the randomized GLM+NB coefficients estimator (RGE), for speeding up the GLM+NB method. RGE samples the read-count data and solves the estimation problem on a smaller scale. We first theoretically validated the consistency and the variance properties of RGE. We then applied RGE to GENSENG, a GLM+NB based method for detecting CNVs. We named the resulting method as "R-GENSENG". Based on extensive evaluation using both simulated and empirical data, we concluded that R-GENSENG is ten times faster than the original GENSENG while maintaining GENSENG's accuracy in CNV detection.
CONCLUSIONS: Our results suggest that RGE strategy developed here could be applied to other GLM+NB based read-count analyses, i.e. ChIP-seq data analysis, to substantially improve their computational efficiency while preserving the analytic power.
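The principle of estimating regression coefficients on a sample of the windows can be sketched in a few lines. This is a simplified stand-in, not RGE or GENSENG: it swaps the negative-binomial GLM for a Poisson log-linear model with a single covariate, fits it by Newton-Raphson, and draws a uniform random subsample, since the abstract does not describe the actual sampling scheme. All function names and parameters are hypothetical.

```python
import math
import random


def fit_poisson_loglinear(xs, ys, iterations=25):
    """Fit log E[y] = b0 + b1 * x by Newton-Raphson and return (b0, b1).

    Assumes the mean count is positive and the covariate is not constant.
    """
    b0 = math.log(sum(ys) / len(ys))
    b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            # Score vector (gradient of the log-likelihood).
            g0 += y - mu
            g1 += (y - mu) * x
            # Fisher information matrix entries.
            h00 += mu
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g for the 2x2 system.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def randomized_fit(xs, ys, sample_size, seed=0):
    """Fit the same model on a uniform random subsample of the windows."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(xs)), sample_size)
    return fit_poisson_loglinear(
        [xs[i] for i in chosen], [ys[i] for i in chosen]
    )
```

Fitting on, say, a few hundred of ten thousand windows reduces the per-iteration work proportionally. Whether the estimate stays close to the full-data fit depends on the sample size and the noise in the counts.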
Affiliation(s)
- WeiBo Wang
  - Department of Computer Science, University of North Carolina at Chapel Hill, 201 S. Columbia St., Chapel Hill, 27599-3175 USA
- Wei Sun
  - Biostatistics Program, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, 19024 USA
- Wei Wang
  - Department of Computer Science, University of California, Los Angeles, 580 Portola Plaza, Los Angeles, 90095-1596 USA
- Jin Szatkiewicz
  - Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, 27599-7264 USA
9
Hong CS, Singh LN, Mullikin JC, Biesecker LG. Assessing the reproducibility of exome copy number variations predictions. Genome Med 2016; 8:82. [PMID: 27503473 PMCID: PMC4976506 DOI: 10.1186/s13073-016-0336-6] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 03/31/2016] [Accepted: 07/13/2016] [Indexed: 11/28/2022] [Open access]
Abstract
Background: Reproducibility is receiving increased attention across many domains of science, and genomics is no exception. Efforts to identify copy number variations (CNVs) from exome sequence (ES) data have been increasing. Many algorithms have been published to discover CNVs from exomes, and a major challenge is the reproducibility in other datasets. Here we test exome CNV calling reproducibility under three conditions: data generated by different sequencing centers; varying sample sizes; and varying capture methodology.
Methods: Four CNV tools were tested: eXome Hidden Markov Model (XHMM), Copy Number Inference From Exome Reads (CoNIFER), EXCAVATOR, and Copy Number Analysis for Targeted Resequencing (CONTRA). To examine the reproducibility, we ran the callers on four datasets, varying sample sizes of N = 10, 30, 75, 100, 300, and data with different capture methodology. We examined the false negative (FN) calls and false positive (FP) calls for potential limitations of the CNV callers. The positive predictive value (PPV) was measured by checking the CNV call concordance against single nucleotide polymorphism array.
Results: Using independently generated datasets, we examined the PPV for each dataset and observed a wide range of PPVs. The PPV values were highly data dependent (p < 0.001). For the sample sizes and capture method analyses, we tested the callers in triplicates. Both analyses resulted in wide ranges of PPVs, even for the same test. Interestingly, negative correlations between the PPV and the sample sizes were observed for CoNIFER (ρ = −0.80). Further examination of FN calls showed that 44% of these were missed by all callers and were attributed to the CNV size (46% spanned ≤3 exons). Overlap of the FP calls showed that FPs were unique to each caller, indicative of algorithm dependency.
Conclusions: Our results demonstrate that further improvements in CNV callers are necessary to improve reproducibility and to include a wider spectrum of CNVs (including the small CNVs). These CNV callers should be evaluated on multiple independent, heterogeneously generated datasets of varying size to increase robustness and utility. These approaches to the evaluation of exome CNV are essential to support wide utility and applicability of CNV discovery in exome studies.
Affiliation(s)
- Celine S Hong
  - Medical Genomics and Metabolic Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Larry N Singh
  - Medical Genomics and Metabolic Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- James C Mullikin
  - NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20852, USA
  - Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20852, USA
- Leslie G Biesecker
  - Medical Genomics and Metabolic Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
  - NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20852, USA
10
Li Y, Zhou S, Schwartz DC, Ma J. Allele-Specific Quantification of Structural Variations in Cancer Genomes. Cell Syst 2016; 3:21-34. [PMID: 27453446 PMCID: PMC4965314 DOI: 10.1016/j.cels.2016.05.007] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 02/24/2016] [Revised: 05/13/2016] [Accepted: 05/24/2016] [Indexed: 12/21/2022]
Abstract
Aneuploidy and structural variations (SVs) generate cancer genomes containing a mixture of rearranged genomic segments with extensive somatic copy number alterations. However, existing methods can identify either SVs or allele-specific copy number alterations, but not both simultaneously, which provides a limited view of cancer genome structure. Here we introduce Weaver, an algorithm for the quantification and analysis of allele-specific copy numbers of SVs. Weaver uses a Markov Random Field to estimate joint probabilities of allele-specific copy number of SVs and their inter-connectivity based on paired-end whole-genome sequencing data. Weaver also predicts the timing of SVs relative to chromosome amplifications. We demonstrate the accuracy of Weaver using simulations and findings from whole-genome Optical Mapping. We apply Weaver to generate allele-specific copy numbers of SVs for MCF-7 and HeLa cell lines, and identify recurrent SV patterns in 44 TCGA ovarian cancer whole-genome sequencing datasets. Our approach provides a more complete assessment of the complex genomic architectures inherent to many cancer genomes.
Affiliation(s)
- Yang Li
  - Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Shiguo Zhou
  - Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI 53706, USA
- David C Schwartz
  - Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Jian Ma
  - Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
11
Li N, Ding YU, Yu T, Li J, Shen Y, Wang X, Fu Q, Shen Y, Huang X, Wang J. Causal variants screened by whole exome sequencing in a patient with maternal uniparental isodisomy of chromosome 10 and a complicated phenotype. Exp Ther Med 2016; 11:2247-2253. [PMID: 27284308 PMCID: PMC4887894 DOI: 10.3892/etm.2016.3241] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 06/07/2015] [Accepted: 02/11/2016] [Indexed: 11/18/2022] [Open access]
Abstract
Uniparental disomy (UPD), the abnormal situation in which both copies of a chromosomal pair are inherited from one parent, may cause clinical abnormalities by affecting genomic imprinting or by making autosomal recessive variants homozygous. Whole exome sequencing (WES) and chromosomal microarray analysis (CMA) are powerful technologies used to search for underlying causal variants. In the present study, WES was used to screen for candidate causal variants in the genome of a Chinese pediatric patient who had been shown by CMA to have maternal uniparental isodisomy of chromosome 10. This was associated with numerous severe medical problems, including bilateral deafness, binocular blindness, stunted growth and leukoderma. A total of 13 rare homozygous variants were identified in genes on chromosome 10. These included a classical splice variant in the HPS1 gene (c.398+5G>A), which causes Hermansky-Pudlak syndrome type 1 and may explain the patient's ocular and dermal disorders. In addition, six likely pathogenic genes on other chromosomes were found to be associated with the subject's ocular and aural disorders by phenotypic analysis. The results of the present study demonstrate that WES and CMA may be successfully combined to identify candidate causal genes. Furthermore, a connection between phenotype and genotype was established in this patient.
Affiliation(s)
- Niu Li
- Institute of Pediatric Translational Medicine, Shanghai Children's Medical Center, Shanghai Jiaotong University School of Medicine, Shanghai 200127, P.R. China
| | - Yu Ding
- Department of Endocrinology and Metabolism, Shanghai Children's Medical Center, Shanghai Jiaotong University School of Medicine, Shanghai 200127, P.R. China
| | - Tingting Yu
- Institute of Pediatric Translational Medicine, Shanghai Children's Medical Center, Shanghai Jiaotong University School of Medicine, Shanghai 200127, P.R. China
| | - Juan Li
- Department of Endocrinology and Metabolism, Shanghai Children's Medical Center, Shanghai Jiaotong University School of Medicine, Shanghai 200127, P.R. China
| | - Yongnian Shen
- Department of Endocrinology and Metabolism, Shanghai Children's Medical Center, Shanghai Jiaotong University School of Medicine, Shanghai 200127, P.R. China
| | - Xiumin Wang
- Department of Endocrinology and Metabolism, Shanghai Children's Medical Center, Shanghai Jiaotong University School of Medicine, Shanghai 200127, P.R. China
| | - Qihua Fu
- Institute of Pediatric Translational Medicine, Shanghai Children's Medical Center, Shanghai Jiaotong University School of Medicine, Shanghai 200127, P.R. China; Department of Laboratory Medicine, Shanghai Children's Medical Center, Shanghai Jiaotong University School of Medicine, Shanghai 200127, P.R. China
| | - Yiping Shen
- Institute of Pediatric Translational Medicine, Shanghai Children's Medical Center, Shanghai Jiaotong University School of Medicine, Shanghai 200127, P.R. China; Boston Children's Hospital, Boston, MA 02115, USA
| | - Xiaodong Huang
- Department of Endocrinology and Metabolism, Shanghai Children's Medical Center, Shanghai Jiaotong University School of Medicine, Shanghai 200127, P.R. China
| | - Jian Wang
- Institute of Pediatric Translational Medicine, Shanghai Children's Medical Center, Shanghai Jiaotong University School of Medicine, Shanghai 200127, P.R. China; Department of Laboratory Medicine, Shanghai Children's Medical Center, Shanghai Jiaotong University School of Medicine, Shanghai 200127, P.R. China
| |
|
12
|
Chan LF, Campbell DC, Novoselova TV, Clark AJL, Metherell LA. Whole-Exome Sequencing in the Differential Diagnosis of Primary Adrenal Insufficiency in Children. Front Endocrinol (Lausanne) 2015; 6:113. [PMID: 26300845 PMCID: PMC4525066 DOI: 10.3389/fendo.2015.00113] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 03/25/2015] [Accepted: 07/10/2015] [Indexed: 12/02/2022]
Abstract
Adrenal insufficiency is a rare but potentially fatal medical condition. In children, the cause is most commonly congenital, and in recent years a growing number of causative gene mutations have been identified, resulting in a myriad of syndromes that share adrenal insufficiency as one of their main characteristics. The evolution of adrenal insufficiency depends on the variant and the particular gene affected, meaning that rapid and accurate diagnosis is imperative for effective treatment of the patient. Common practice is for candidate genes to be sequenced individually, which is a time-consuming process complicated by overlapping clinical phenotypes. However, with the availability and increasing cost-effectiveness of whole-exome sequencing (WES), there is the potential for this to become a powerful diagnostic tool. Here, we report the results of whole-exome sequencing of 43 patients referred to us with a diagnosis of familial glucocorticoid deficiency (FGD) who were mutation negative for MC2R, MRAP, and STAR, the most commonly mutated genes in FGD. WES provided a rapid genetic diagnosis in 17/43 sequenced patients; for the remaining 60%, the gene defect may be within intronic/regulatory regions not covered by WES or may be in gene(s) representing novel etiologies. The diagnosis of isolated or familial glucocorticoid deficiency was confirmed in only 3 of the 17 patients; other genetic diagnoses were adrenal hypo- and hyperplasia, Triple A, and autoimmune polyendocrinopathy syndrome type I, emphasizing both the difficulty of phenotypically distinguishing between disorders of primary adrenal insufficiency (PAI) and the utility of WES as a tool to achieve this.
Affiliation(s)
- Li F. Chan
- Centre for Endocrinology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - Daniel C. Campbell
- Centre for Endocrinology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - Tatiana V. Novoselova
- Centre for Endocrinology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - Adrian J. L. Clark
- Centre for Endocrinology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - Louise A. Metherell
- Centre for Endocrinology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- *Correspondence: Louise A. Metherell, Centre for Endocrinology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
| |
|