1
|
Genomic Variations Explorer (GenVarX): a toolset for annotating promoter and CNV regions using genotypic and phenotypic differences. Front Genet 2023; 14:1251382. [PMID: 37928239 PMCID: PMC10623549 DOI: 10.3389/fgene.2023.1251382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Accepted: 09/27/2023] [Indexed: 11/07/2023] Open
Abstract
The rapid growth of sequencing technology and its increasing popularity in biology-related research have made whole genome re-sequencing (WGRS) data widely available. Large amounts of WGRS data can help close the knowledge gap between genomics and phenomics by revealing the genomic variations that lead to phenotype changes. These genomic variations usually comprise allelic and structural changes in DNA, which can affect regulatory mechanisms, change gene expression, and alter the phenotypes of organisms. In this work, we created the GenVarX toolset, which is backed by transcription factor binding sequence data in promoter regions, copy number variation data, SNP and Indel data, and phenotype data, and which can potentially provide insights into phenotypic differences and help answer compelling questions in plant research. On the analytics side, we developed strategies to better utilize and mine the WGRS data using efficient data processing scripts, libraries, tools, and frameworks, creating an interactive, visualization-enhanced GenVarX toolset that encompasses both promoter region and copy number variation analysis components. The main capability of the GenVarX toolset is to provide easy-to-use interfaces for users to perform queries, visualize data, and interact with the data. Using the input fields on the user interface, users provide a value for each field and submit the information as a query. The data returned on the results page are usually displayed in tabular form. In addition, interactive figures are included in the toolset to facilitate the visualization of statistical results or tool outputs. Currently, the GenVarX toolset supports soybean, rice, and Arabidopsis. Researchers can access the soybean GenVarX toolset from SoyKB at https://soykb.org/SoybeanGenVarX/, and the rice and Arabidopsis GenVarX toolsets from the KBCommons web portal at https://kbcommons.org/system/tools/GenVarX/Osativa and https://kbcommons.org/system/tools/GenVarX/Athaliana, respectively.
2
|
Copy-number analysis by base-level normalization: An intuitive visualization tool for evaluating copy number variations. Clin Genet 2023; 103:35-44. [PMID: 36152294 DOI: 10.1111/cge.14236] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/08/2022] [Revised: 09/19/2022] [Accepted: 09/20/2022] [Indexed: 12/13/2022]
Abstract
Next-generation sequencing (NGS) facilitates comprehensive molecular analyses that help with diagnosing unsolved disorders. In addition to detecting single-nucleotide variations and small insertions/deletions, bioinformatics tools can identify copy number variations (CNVs) in NGS data, which improves the diagnostic yield. However, due to the possibility of false positives, subsequent confirmation tests are generally performed. Here, we introduce Copy-number Analysis by BAse-level NormAlization (CABANA), a visualization tool that allows users to intuitively identify candidate CNVs using the normalized single-base-level read depth calculated from NGS data. To demonstrate how CABANA works, NGS data were obtained from 474 patients with neuromuscular disorders. CNVs were screened using a conventional bioinformatics tool, ExomeDepth, and then we normalized and visualized those data at the single-base level using CABANA, followed by manual inspection by geneticists to filter out false positives and determine candidate CNVs. In doing so, we identified 31 candidate CNVs (7%) in 474 patients and subsequently confirmed all of them to be true using multiplex ligation-dependent probe amplification. The performance of CABANA was deemed acceptable by comparing its diagnostic yield with previous data about neuromuscular disorders. Despite some limitations, we expect CABANA to help researchers accurately identify CNVs and reduce the need for subsequent confirmation testing.
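The idea of a normalized single-base-level read depth can be sketched as follows. This is a minimal illustration and not CABANA's actual implementation: the function names, the choice of the per-sample median as the scaling factor, and the use of a reference-sample median at each base are all assumptions made for the example.

```python
from statistics import median


def normalize_depth(depths):
    """Scale one sample's per-base read depths by that sample's median depth.

    Assumption: the median is used as the scaling factor so that a CNV
    covering a minority of the bases does not distort the normalization.
    """
    sample_median = median(depths)
    if sample_median == 0:
        raise ValueError("sample has no usable coverage")
    return [d / sample_median for d in depths]


def depth_ratio(test_depths, reference_depths):
    """Per-base ratio of a test sample to the median of reference samples.

    A ratio near 1.0 suggests two copies, near 0.5 a heterozygous deletion,
    and near 1.5 a heterozygous duplication (for a diploid autosomal region).
    """
    test_norm = normalize_depth(test_depths)
    ref_norm = [normalize_depth(r) for r in reference_depths]
    ratios = []
    for i, value in enumerate(test_norm):
        ref_median = median(r[i] for r in ref_norm)
        # Bases with no reference coverage cannot be evaluated.
        ratios.append(value / ref_median if ref_median > 0 else float("nan"))
    return ratios


# Hypothetical depths: the test sample loses half its coverage at bases 2-3.
references = [[100] * 6, [80] * 6, [120] * 6]
test = [100, 100, 50, 50, 100, 100]
ratios = depth_ratio(test, references)
```

Plotting such ratios along the genomic coordinate is what would let a reviewer inspect a candidate CNV visually at base resolution.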
3
|
Annotation of structural variants with reported allele frequencies and related metrics from multiple datasets using SVAFotate. BMC Bioinformatics 2022; 23:490. [PMID: 36384437 PMCID: PMC9670370 DOI: 10.1186/s12859-022-05008-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Accepted: 10/25/2022] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Identification of deleterious genetic variants using DNA sequencing data relies on increasingly detailed filtering strategies to isolate the small subset of variants that are more likely to underlie a disease phenotype. Datasets reflecting population allele frequencies of different types of variants serve as powerful filtering tools, especially in the context of rare disease analysis. While such population-scale allele frequency datasets now exist for structural variants (SVs), it remains a challenge to match SV calls between multiple datasets, thereby complicating estimates of a putative SV's population allele frequency. RESULTS We introduce SVAFotate, a software tool that enables the annotation of SVs with variant allele frequency and related information from existing SV datasets. As a result, VCF files annotated by SVAFotate offer a variety of metrics to aid in the stratification of SVs as common or rare in the broader human population. CONCLUSIONS Here we demonstrate the use of SVAFotate in the classification of SVs with regards to their population frequency and illustrate how SVAFotate's annotations can be used to filter and prioritize SVs. Lastly, we detail how best to utilize these SV annotations in the analysis of genetic variation in studies of rare disease.
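Matching an SV call against population datasets is commonly done by reciprocal overlap. The sketch below illustrates that general technique only; it does not reproduce SVAFotate's code or options. The tuple layouts, the 0.8 threshold, and the function names are assumptions, and coordinates are treated as half-open intervals.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Return the smaller of the two overlap fractions between two intervals."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def max_population_af(query, dataset, min_overlap=0.8):
    """Highest allele frequency among dataset SVs that match the query SV.

    query:   (chrom, start, end, svtype)
    dataset: iterable of (chrom, start, end, svtype, allele_frequency)
    Returns 0.0 when nothing matches, i.e. the SV is unobserved in the dataset.
    """
    chrom, start, end, svtype = query
    matches = [
        af
        for c, s, e, t, af in dataset
        if c == chrom
        and t == svtype
        and reciprocal_overlap(start, end, s, e) >= min_overlap
    ]
    return max(matches, default=0.0)


# Hypothetical population records, for illustration only.
population = [
    ("chr1", 1000, 2000, "DEL", 0.12),
    ("chr1", 1050, 2100, "DEL", 0.30),
    ("chr1", 1000, 2000, "DUP", 0.50),
    ("chr1", 5000, 9000, "DEL", 0.90),
]
af = max_population_af(("chr1", 1000, 2000, "DEL"), population)
```

Taking the maximum matching frequency is a conservative choice for rare disease filtering: an SV is kept as "rare" only if no sufficiently similar population SV is common.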
4
|
Biallelic KIF24 Variants Are Responsible for a Spectrum of Skeletal Disorders Ranging From Lethal Skeletal Ciliopathy to Severe Acromesomelic Dysplasia. J Bone Miner Res 2022; 37:1642-1652. [PMID: 35748595 PMCID: PMC9545074 DOI: 10.1002/jbmr.4639] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/04/2021] [Revised: 06/01/2022] [Accepted: 06/14/2022] [Indexed: 11/14/2022]
Abstract
Skeletal dysplasias comprise a large spectrum of mostly monogenic disorders affecting bone growth, patterning, and homeostasis, and ranging in severity from lethal to mild phenotypes. This study aimed to underpin the genetic cause of skeletal dysplasia in three unrelated families with variable skeletal manifestations. The six affected individuals from three families had severe short stature with extreme shortening of forelimbs, short long-bones, and metatarsals, and brachydactyly (family 1); mild short stature, platyspondyly, and metaphyseal irregularities (family 2); or a prenatally lethal skeletal dysplasia with kidney features suggestive of a ciliopathy (family 3). Genetic studies by whole genome, whole exome, and ciliome panel sequencing identified in all affected individuals biallelic missense variants in KIF24, which encodes a kinesin family member controlling ciliogenesis. In families 1 and 3, with the more severe phenotype, the affected subjects harbored homozygous variants (c.1457A>G; p.(Ile486Val) and c.1565A>G; p.(Asn522Ser), respectively) in the motor domain which plays a crucial role in KIF24 function. In family 2, compound heterozygous variants (c.1697C>T; p.(Ser566Phe)/c.1811C>T; p.(Thr604Met)) were found C-terminal to the motor domain, in agreement with a genotype-phenotype correlation. In vitro experiments performed on amnioblasts of one affected fetus from family 3 showed that primary cilia assembly was severely impaired, and that cytokinesis was also affected. In conclusion, our study describes novel forms of skeletal dysplasia associated with biallelic variants in KIF24. To our knowledge this is the first report implicating KIF24 variants as the cause of a skeletal dysplasia, thereby extending the genetic heterogeneity and the phenotypic spectrum of rare bone disorders and underscoring the wide range of monogenetic skeletal ciliopathies. © 2022 The Authors. Journal of Bone and Mineral Research published by Wiley Periodicals LLC on behalf of American Society for Bone and Mineral Research (ASBMR).
5
|
Automated prediction of the clinical impact of structural copy number variations. Sci Rep 2022; 12:555. [PMID: 35017614 PMCID: PMC8752772 DOI: 10.1038/s41598-021-04505-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2021] [Accepted: 11/01/2021] [Indexed: 11/09/2022] Open
Abstract
Copy number variants (CNVs) play an important role in many biological processes, including the development of genetic diseases, making them attractive targets for genetic analyses. Interpreting the effect of these structural variants is challenging because the number of genes, regulatory elements, and other genomic elements affected by a CNV is highly variable. This has created demand for interpretation tools that relieve researchers, laboratory diagnosticians, genetic counselors, and clinical geneticists of the laborious process of annotating and classifying CNVs. We designed and validated a prediction method (ISV; Interpretation of Structural Variants) based on boosted trees that takes into account annotations of CNVs from several publicly available databases. The presented approach achieved more than 98% prediction accuracy on both copy number loss and copy number gain variants while also allowing CNVs to be assigned "uncertain" significance in predictions. We believe that ISV's prediction capability and explainability have great potential to guide users to more precise interpretations and classifications of CNVs.
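One simple way a classifier can leave room for an "uncertain" call is to threshold its predicted probability on both sides. The sketch below shows that generic idea only; the thresholds, labels, and function name are hypothetical and are not taken from ISV.

```python
def classify_cnv(p_pathogenic, lower=0.05, upper=0.95):
    """Map a predicted probability of pathogenicity to a three-way label.

    Assumption: probabilities between `lower` and `upper` are reported as
    "uncertain" instead of being forced into one of the two classes.
    """
    if not 0.0 <= p_pathogenic <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if not lower < upper:
        raise ValueError("lower threshold must be below upper threshold")
    if p_pathogenic >= upper:
        return "pathogenic"
    if p_pathogenic <= lower:
        return "benign"
    return "uncertain"


labels = [classify_cnv(p) for p in (0.99, 0.50, 0.01)]
```

Widening the gap between the two thresholds trades coverage for accuracy: fewer CNVs receive a definite label, but those that do are labeled with higher confidence.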
6
|
VarGenius-HZD Allows Accurate Detection of Rare Homozygous or Hemizygous Deletions in Targeted Sequencing Leveraging Breadth of Coverage. Genes (Basel) 2021; 12:genes12121979. [PMID: 34946927 PMCID: PMC8701221 DOI: 10.3390/genes12121979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Revised: 12/06/2021] [Accepted: 12/08/2021] [Indexed: 11/17/2022] Open
Abstract
Homozygous deletions (HDs) may be the cause of rare diseases and cancer, and their discovery in targeted sequencing is a challenging task. Different tools have been developed to disentangle HD discovery but a sensitive caller is still lacking. We present VarGenius-HZD, a sensitive and scalable algorithm that leverages breadth-of-coverage for the detection of rare homozygous and hemizygous single-exon deletions (HDs). To assess its effectiveness, we detected both real and synthetic rare HDs in fifty exomes from the 1000 Genomes Project obtaining higher sensitivity in comparison with state-of-the-art algorithms that each missed at least one event. We then applied our tool on targeted sequencing data from patients with Inherited Retinal Dystrophies and solved five cases that still lacked a genetic diagnosis. We provide VarGenius-HZD either stand-alone or integrated within our recently developed software, enabling the automated selection of samples using the internal database. Hence, it could be extremely useful for both diagnostic and research purposes.
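Breadth of coverage is the fraction of bases in a target that reach a minimum read depth. A minimal sketch of how it could be used to flag candidate homozygous or hemizygous exon deletions is given below. It is not the VarGenius-HZD algorithm: the thresholds, data layout, and function names are assumptions for illustration.

```python
def breadth_of_coverage(depths, min_depth=1):
    """Fraction of bases in an interval covered by at least `min_depth` reads."""
    if not depths:
        raise ValueError("empty interval")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def candidate_hd_exons(sample_breadth, cohort_breadth, low=0.2, high=0.9):
    """Flag exons that are nearly uncovered in one sample but well covered
    in every other sample of the cohort.

    sample_breadth: {exon_id: breadth in the sample of interest}
    cohort_breadth: {exon_id: [breadth in each other sample]}
    Exons that are poorly covered across the cohort are skipped, since low
    breadth there more likely reflects capture failure than a deletion.
    """
    flagged = []
    for exon, breadth in sample_breadth.items():
        others = cohort_breadth.get(exon, [])
        if breadth <= low and others and min(others) >= high:
            flagged.append(exon)
    return flagged


# Hypothetical values for three exons.
sample = {"E1": 0.0, "E2": 0.95, "E3": 0.1}
cohort = {"E1": [1.0, 0.98], "E2": [1.0, 1.0], "E3": [0.1, 0.2]}
flagged = candidate_hd_exons(sample, cohort)
```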
7
|
Dominant Distal Myopathy 3 (MPD3) Caused by a Deletion in the HNRNPA1 Gene. NEUROLOGY-GENETICS 2021; 7:e632. [PMID: 34722876 PMCID: PMC8552285 DOI: 10.1212/nxg.0000000000000632] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/21/2021] [Revised: 08/27/2021] [Accepted: 09/08/2021] [Indexed: 12/15/2022]
Abstract
Background and Objectives To determine the genetic cause of the disease in the previously reported family with adult-onset autosomal dominant distal myopathy (myopathy, distal, 3; MPD3). Methods Continued clinical evaluation including muscle MRI and muscle pathology. A linkage analysis with single nucleotide polymorphism arrays and genome sequencing were used to identify the genetic defect, which was verified by Sanger sequencing. RNA sequencing was used to investigate the transcriptional effects of the identified genetic defect. Results Small hand muscles (intrinsic, thenar, and hypothenar) were first involved with spread to the lower legs and later proximal muscles. Dystrophic changes with rimmed vacuoles and cytoplasmic inclusions were observed in muscle biopsies at advanced stage. A single nucleotide polymorphism array confirmed the previous microsatellite-based linkage to 8p22-q11 and 12q13-q22. Genome sequencing of three affected family members combined with structural variant calling revealed a small heterozygous deletion of 160 base pairs spanning the second last exon 10 of the heterogeneous nuclear ribonucleoprotein A1 (HNRNPA1) gene, which is in the linked region on chromosome 12. Segregation of the mutation with the disease was confirmed by Sanger sequencing. RNA sequencing showed that the mutant allele produces a shorter mutant mRNA transcript compared with the wild-type allele. Immunofluorescence studies on muscle biopsies revealed small p62 and larger TDP-43 inclusions. Discussion A small exon 10 deletion in the gene HNRNPA1 was identified as the cause of MPD3 in this family. The new HNRNPA1-related phenotype, upper limb presenting distal myopathy, was thus confirmed, and the family displays the complexities of gene identification.
8
|
Uncovering potential single nucleotide polymorphisms, copy number variations and related signaling pathways in primary Sjogren's syndrome. Bioengineered 2021; 12:9313-9331. [PMID: 34723755 PMCID: PMC8809958 DOI: 10.1080/21655979.2021.2000245] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022] Open
Abstract
Primary Sjogren’s syndrome (pSS) is a complex systemic autoimmune disease that is difficult to diagnose accurately because of symptom diversity among patients, especially at earlier stages. We sought to find potential single nucleotide polymorphisms (SNPs), copy number variations (CNVs) and related signaling pathways. Genomic DNA was extracted from peripheral blood of 12 individuals (7 individuals from 3 pSS pedigrees and 5 sporadic cases) for whole-exome sequencing (WES) analysis. SNPs and CNVs were identified, followed by functional annotation of genes with SNPs and CNVs. A gene expression profile (involving 64 normal controls and 166 cases) was downloaded from the Gene Expression Omnibus (GEO) database for differential expression analysis. Sanger sequencing and in vitro validation were used to validate the identified SNPs and differentially expressed genes, respectively. A total of 5 SNPs were identified in both pedigrees and sporadic cases, in genes such as FES, PPM1J, and TRAPPC9. A total of 3402 and 19 CNVs were identified in pedigrees and sporadic cases, respectively. Fifty-one differentially expressed genes were associated with immunity, such as BATF3, LAP3, BATF2, PARP9, and IL15RA. The AMPK signaling pathway and cell adhesion molecules (CAMs) were the most significantly enriched signaling pathways of identified SNPs. Identified CNVs were associated with systemic lupus erythematosus, mineral absorption, and HTLV-I infection. IL2-STAT5 signaling, interferon-gamma response, and interferon-alpha response were significantly enriched immune-related signaling pathways of identified differentially expressed genes. In conclusion, our study found some potential SNPs, CNVs, and related signaling pathways, which could be useful in understanding the pathological mechanism of pSS.
9
|
CNVxplorer: a web tool to assist clinical interpretation of CNVs in rare disease patients. Nucleic Acids Res 2021; 49:W93-W103. [PMID: 34019647 PMCID: PMC8262689 DOI: 10.1093/nar/gkab347] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/07/2021] [Revised: 04/12/2021] [Accepted: 05/20/2021] [Indexed: 12/20/2022] Open
Abstract
Copy Number Variants (CNVs) are an important cause of rare diseases. Array-based Comparative Genomic Hybridization tests yield a ∼12% diagnostic rate, with ∼8% of patients presenting CNVs of unknown significance. CNVs interpretation is particularly challenging on genomic regions outside of those overlapping with previously reported structural variants or disease-associated genes. Recent studies showed that a more comprehensive evaluation of CNV features, leveraging both coding and non-coding impacts, can significantly improve diagnostic rates. However, currently available CNV interpretation tools are mostly gene-centric or provide only non-interactive annotations difficult to assess in the clinical practice. Here, we present CNVxplorer, a web server suited for the functional assessment of CNVs in a clinical diagnostic setting. CNVxplorer mines a comprehensive set of clinical, genomic, and epigenomic features associated with CNVs. It provides sequence constraint metrics, impact on regulatory elements and topologically associating domains, as well as expression patterns. Analyses offered cover (a) agreement with patient phenotypes; (b) visualizations of associations among genes, regulatory elements and transcription factors; (c) enrichment on functional and pathway annotations and (d) co-occurrence of terms across PubMed publications related to the query CNVs. A flexible evaluation workflow allows dynamic re-interrogation in clinical sessions. CNVxplorer is publicly available at http://cnvxplorer.com.
10
|
Abstract
Gains and losses of large segments of genomic DNA, known as copy number variants (CNVs), have gained considerable interest in clinical diagnostics lately, as particular forms may lead to inherited genetic diseases. In recent decades, researchers have developed a wide variety of cytogenetic and molecular methods with different capabilities for detecting clinically relevant CNVs. In this review, we summarize methodological progress from conventional approaches to current state-of-the-art techniques capable of detecting CNVs from a few bases up to several megabases. Although the recent rapid progress of sequencing methods has enabled precise detection of CNVs, determining their functional effect on cellular and whole-body physiology remains a challenge. Here, we provide a comprehensive list of databases and bioinformatics tools that may serve as useful assets for researchers, laboratory diagnosticians, and clinical geneticists facing the challenge of CNV detection and interpretation.
11
|
AnnotSV: an integrated tool for structural variations annotation. Bioinformatics 2018; 34:3572-3574. [PMID: 29669011 DOI: 10.1093/bioinformatics/bty304] [Citation(s) in RCA: 179] [Impact Index Per Article: 35.8] [Received: 12/22/2017] [Accepted: 04/13/2018] [Indexed: 01/27/2023] Open
Abstract
Summary Structural Variations (SV) are a major source of variability in the human genome and shaped its current structure during evolution. Moreover, many human diseases are caused by SV, highlighting the need not only to detect those genomic events accurately but also to annotate them and assist their biological interpretation. Therefore, we developed AnnotSV, which compiles functional, regulatory and clinically relevant information and aims to provide annotations useful to (i) interpret the potential pathogenicity of SV and (ii) filter out potential false-positive SV. In particular, AnnotSV reports heterozygous and homozygous counts of single nucleotide variations (SNVs) and small insertions/deletions called within each SV for the analyzed patients; this genomic information is extremely useful to support or question the existence of an SV. We also report the computed allelic frequency relative to overlapping variants from DGV (MacDonald et al., 2014), which is especially powerful for filtering out common SV. To demonstrate the strength of AnnotSV, we annotated the 4751 SV from one sample of the 1000 Genomes Project, integrating the sample information of four million SNVs/indels, in less than 60 s. Availability and implementation AnnotSV is implemented in Tcl and runs in command line on all platforms. The source code is available under the GNU GPL license. Source code, README and Supplementary data are available at http://lbgi.fr/AnnotSV/. Supplementary information Supplementary data are available at Bioinformatics online.
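Counting heterozygous and homozygous small variants inside an SV helps because a true heterozygous deletion leaves only one allele, so heterozygous calls inside it argue against the deletion. The sketch below illustrates the counting step only. It is written in Python for illustration (AnnotSV itself is implemented in Tcl), and the function name and input layout are assumptions; all variants are assumed to lie on the same chromosome as the SV.

```python
def count_genotypes_in_sv(sv_start, sv_end, variants):
    """Count heterozygous and homozygous-alternate calls inside an SV.

    variants: iterable of (position, genotype), where genotype is a
    VCF-style diploid string such as "0/1", "1/1", or "1|1".
    Returns (het_count, hom_alt_count). Missing and homozygous-reference
    genotypes are ignored.
    """
    het = hom = 0
    for pos, genotype in variants:
        if not sv_start <= pos <= sv_end:
            continue
        alleles = genotype.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # missing or non-diploid call
        if alleles[0] == alleles[1]:
            if alleles[0] != "0":
                hom += 1
        else:
            het += 1
    return het, hom


# Hypothetical calls around a candidate deletion spanning 120-200.
calls = [(100, "0/1"), (150, "1/1"), (160, "1|1"),
         (170, "0/0"), (180, "./."), (500, "0/1")]
inside = count_genotypes_in_sv(120, 200, calls)
```

Here the candidate region contains no heterozygous calls, which is compatible with a deletion; several heterozygous calls inside it would instead cast doubt on the SV.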
12
|
Increasing the diagnostic yield of exome sequencing by copy number variant analysis. PLoS One 2018; 13:e0209185. [PMID: 30557390 PMCID: PMC6296659 DOI: 10.1371/journal.pone.0209185] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 05/11/2018] [Accepted: 12/01/2018] [Indexed: 01/17/2023] Open
Abstract
As whole exome sequencing (WES) becomes more widely used in the clinical realm, a wealth of unanalyzed information will be routinely generated. Using WES read depth data to predict copy number variation (CNV) could extend the diagnostic utility of this previously underutilized data by providing clinically important information such as previously unsuspected deletions or duplications. We evaluated ExomeDepth, a free R package, in addition to an aneuploidy prediction method, to detect CNVs in WES data. First, in a blinded pilot study, five out of five genomic alterations were correctly identified from clinical samples with previously defined chromosomal gains or losses, including submicroscopic deletions, duplications, and chromosomal trisomy. We then examined CNV calls among 53 patients participating in the NCGENES research study and undergoing WES, who had existing clinical chromosomal microarray (CMA) data that could be used for validation. For unique CNVs that overlap well with WES coverage regions, sensitivity was 89% for deletions and 65% for duplications. While specificity of the algorithm calls remains a concern, this is less of an issue at high threshold filtering levels. When applied to all 672 patients from the exome sequencing study, ExomeDepth identified eleven diagnostically relevant CNVs ranging in size from a two exon deletion to whole chromosome duplications, as well as numerous other CNVs with varying clinical significance. This opportunistic analysis of WES data yields an additional 1.6% of patients in this study with pathogenic or likely pathogenic CNVs that are clinically relevant to their phenotype as well as clinically relevant secondary findings. Finally, we demonstrate the potential value of copy number analysis in cases where a single heterozygous likely or known pathogenic single nucleotide alteration is identified in a gene associated with an autosomal recessive condition.
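The 1.6% figure is consistent with eleven out of 672 patients. The short check below assumes that each of the eleven diagnostically relevant CNVs was found in a different patient, which the abstract does not state explicitly; the function name is illustrative.

```python
def additional_yield_percent(patients_with_finding, total_patients):
    """Percentage of a cohort that gains a finding from an added analysis."""
    if total_patients <= 0:
        raise ValueError("cohort size must be positive")
    return 100.0 * patients_with_finding / total_patients


# Eleven CNV findings among 672 exome patients, as reported in the abstract.
yield_pct = additional_yield_percent(11, 672)
rounded = round(yield_pct, 1)
```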
13
|
Clinical analysis of germline copy number variation in DMD using a non-conjugate hierarchical Bayesian model. BMC Med Genomics 2018; 11:91. [PMID: 30342520 PMCID: PMC6195989 DOI: 10.1186/s12920-018-0404-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/29/2018] [Accepted: 09/18/2018] [Indexed: 12/31/2022] Open
Abstract
Background Detection of copy number variants (CNVs) is an important aspect of clinical testing for several disorders, including Duchenne muscular dystrophy, and is often performed using multiplex ligation-dependent probe amplification (MLPA). However, since many genetic carrier screens depend instead on next-generation sequencing (NGS) for wider discovery of small variants, they often do not include CNV analysis. Moreover, most computational techniques developed to detect CNVs from exome sequencing data are not suitable for carrier screening, as they require matched normals, very large cohorts, or extensive gene panels. Methods We present a computational software package, geneCNV (http://github.com/vkozareva/geneCNV), which can identify exon-level CNVs using exome sequencing data from only a few genes. The tool relies on a hierarchical parametric model trained on a small cohort of reference samples. Results Using geneCNV, we accurately inferred heterozygous CNVs in the DMD gene across a cohort of 15 test subjects. These results were validated against MLPA, the current standard for clinical CNV analysis in DMD. We also benchmarked the tool’s performance against other computational techniques and found comparable or improved CNV detection in DMD using data from panels ranging from 4,000 genes to as few as 8 genes. Conclusions geneCNV allows for the creation of cost-effective screening panels by allowing NGS sequencing approaches to generate results equivalent to bespoke genotyping assays like MLPA. By using a parametric model to detect CNVs, it also fulfills regulatory requirements to define a reference range for a genetic test. It is freely available and can be incorporated into any Illumina sequencing pipeline to create clinical assays for detection of exon duplications and deletions. Electronic supplementary material The online version of this article (10.1186/s12920-018-0404-4) contains supplementary material, which is available to authorized users.
14
|
Ximmer: a system for improving accuracy and consistency of CNV calling from exome data. Gigascience 2018; 7:5091801. [PMID: 30192941 PMCID: PMC6177737 DOI: 10.1093/gigascience/giy112] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 05/06/2018] [Accepted: 08/23/2018] [Indexed: 01/13/2023] Open
Abstract
Background While exome and targeted next-generation DNA sequencing are primarily used for detecting single nucleotide changes and small indels, detection of copy number variants (CNVs) can provide highly valuable additional information from the data. Although there are dozens of exome CNV detection methods available, these are often difficult to use, and accuracy varies unpredictably between and within datasets. Findings We present Ximmer, a tool that supports an end-to-end process for evaluating, tuning, and running analysis methods for detection of CNVs in germline samples. Ximmer includes a simulation framework, implementations of several commonly used CNV detection methods, and a visualization and curation tool that together enable interactive exploration and quality control of CNV results. Using Ximmer, we comprehensively evaluate CNV detection on four datasets using five different detection methods. We show that application of Ximmer can improve accuracy and aid in quality control of CNV detection results. In addition, Ximmer can be used to run analyses and explore CNV results in exome data. Conclusions Ximmer offers a comprehensive tool and method for applying and improving accuracy of CNV detection methods for exome data.
15
|
iCopyDAV: Integrated platform for copy number variations-Detection, annotation and visualization. PLoS One 2018; 13:e0195334. [PMID: 29621297 PMCID: PMC5886540 DOI: 10.1371/journal.pone.0195334] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 12/29/2017] [Accepted: 03/20/2018] [Indexed: 12/14/2022] Open
Abstract
Discovery of copy number variations (CNVs), a major category of structural variations, has dramatically changed our understanding of differences between individuals and provides an alternate paradigm for the genetic basis of human diseases. CNVs include both copy gain and copy loss events, and their genome-wide detection is now possible using high-throughput, low-cost next generation sequencing (NGS) methods. However, accurate detection of CNVs from NGS data is not straightforward due to non-uniform coverage of reads resulting from various systemic biases. We have developed an integrated platform, iCopyDAV, to handle some of these issues in CNV detection in whole genome NGS data. It has a modular framework comprising five major modules: data pre-treatment, segmentation, variant calling, annotation and visualization. An important feature of iCopyDAV is the functional annotation module, which enables the user to identify and prioritize CNVs encompassing various functional elements, genomic features and disease associations. Parallelization of the segmentation algorithms makes the iCopyDAV platform accessible even on a desktop. Here we show the effect of sequencing coverage, read length, bin size, data pre-treatment and segmentation approaches on accurate detection of the complete spectrum of CNVs. Performance of iCopyDAV is evaluated on both simulated data and real data for different sequencing depths. It is an open-source integrated pipeline available at https://github.com/vogetihrsh/icopydav and as a Docker image at http://bioinf.iiit.ac.in/icopydav/.
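Bin size matters because read-depth methods typically average coverage over fixed-width windows before segmentation. The sketch below shows that generic binning step only and is not code from iCopyDAV; the function name and the handling of the final partial bin are assumptions.

```python
def bin_mean_depth(depths, bin_size):
    """Average per-base read depth over consecutive fixed-size bins.

    The last bin may be shorter than `bin_size`; it is averaged over the
    bases it actually contains.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return [
        sum(chunk) / len(chunk)
        for chunk in (depths[i:i + bin_size]
                      for i in range(0, len(depths), bin_size))
    ]


# Hypothetical per-base depths.
binned = bin_mean_depth([10, 10, 20, 20, 30], 2)
```

Larger bins smooth out random coverage noise but blur the breakpoints of small CNVs, while smaller bins do the opposite, which is why the appropriate size depends on sequencing depth.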
16
|
Polygenic Versus Monogenic Causes of Hypercholesterolemia Ascertained Clinically. Arterioscler Thromb Vasc Biol 2016; 36:2439-2445. [DOI: 10.1161/atvbaha.116.308027] [Citation(s) in RCA: 144] [Impact Index Per Article: 18.0] [Received: 06/15/2016] [Accepted: 10/10/2016] [Indexed: 11/16/2022]
Abstract
Objective—
Next-generation sequencing technology is transforming our understanding of heterozygous familial hypercholesterolemia, including revision of prevalence estimates and attribution of polygenic effects. Here, we examined the contributions of monogenic and polygenic factors in patients with severe hypercholesterolemia referred to a specialty clinic.
Approach and Results—
We applied targeted next-generation sequencing with custom annotation, coupled with evaluation of large-scale copy number variation and polygenic scores for raised low-density lipoprotein cholesterol in a cohort of 313 individuals with severe hypercholesterolemia, defined as low-density lipoprotein cholesterol >5.0 mmol/L (>194 mg/dL). We found that (1) monogenic familial hypercholesterolemia–causing mutations detected by targeted next-generation sequencing were present in 47.3% of individuals; (2) the percentage of individuals with monogenic mutations increased to 53.7% when copy number variations were included; (3) the percentage further increased to 67.1% when individuals with extreme polygenic scores were included; and (4) the percentage of individuals with an identified genetic component increased from 57.0% to 92.0% as low-density lipoprotein cholesterol level increased from 5.0 to >8.0 mmol/L (194 to >310 mg/dL).
Conclusions—
In a clinically ascertained sample with severe hypercholesterolemia, we found that most patients had a discrete genetic basis detected using a comprehensive screening approach that includes targeted next-generation sequencing, an assay for copy number variations, and polygenic trait scores.
|
17
|
Detection of genomic rearrangements from targeted resequencing data in Parkinson's disease patients. Mov Disord 2016; 32:165-169. [PMID: 28124432 PMCID: PMC5297984 DOI: 10.1002/mds.26845] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 06/29/2016] [Revised: 09/12/2016] [Accepted: 09/27/2016] [Indexed: 12/18/2022] Open
Abstract
Background: The analysis of coverage depth in next‐generation sequencing data allows the detection of gene dose alterations. We explore the frequency of such structural events in a Spanish cohort of sporadic PD cases. Methods: Gene dose alterations were detected with the eXome‐Hidden Markov Model (XHMM) software from depth of coverage in resequencing data available for 38 Mendelian and other risk PD loci in 394 individuals (249 cases and 145 controls) and subsequently validated by quantitative PCR. Results: We identified 10 PD patients with exon dosage alterations in PARK2, GBA‐GBAP1, and DJ1. Additional functional variants, including 2 novel nonsense mutations (p.Arg1552Ter in LRRK2 and p.Trp90Ter in PINK1), were confirmed by Sanger sequencing. This combined approach disclosed the genetic cause of 12 PD cases. Conclusions: Gene dose alterations related to PD can be correctly identified from targeted resequencing data. This approach substantially improves the detection rate of cases with causal genetic alterations. © 2016 The Authors. Movement Disorders published by Wiley Periodicals, Inc. on behalf of International Parkinson and Movement Disorder Society.
|
18
|
A potential founder variant in CARMIL2/RLTPR in three Norwegian families with warts, molluscum contagiosum, and T-cell dysfunction. Mol Genet Genomic Med 2016; 4:604-616. [PMID: 27896283 PMCID: PMC5118205 DOI: 10.1002/mgg3.237] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 06/14/2016] [Accepted: 07/22/2016] [Indexed: 12/30/2022] Open
Abstract
Background: Four patients from three Norwegian families presented with a common skin phenotype of warts, molluscum contagiosum, and dermatitis since early childhood, along with various other immunological features. Warts are a common manifestation of human papilloma virus (HPV), but when they are overwhelming, disseminated and/or persistent, and present together with other immunological features, a primary immunodeficiency disease (PIDD) may be suspected. Methods and results: The four patients were exome sequenced as part of a larger study for detecting genetic causes of primary immunodeficiencies. No disease‐causing variants were identified in known primary immunodeficiency genes or in other disease‐related OMIM genes. However, the same homozygous missense variant in CARMIL2 (also known as RLTPR) was identified in all four patients. In each family, the variant was located within a narrow region of homozygosity, representing a potential region of autozygosity. CARMIL2 is a protein of undetermined function. A role in T‐cell activation has been suggested, and the mouse protein homolog (Rltpr) is essential for costimulation of T‐cell activation via CD28 and for the development of regulatory T cells. Immunophenotyping demonstrated reduced regulatory, CD4+ memory, and CD4+ follicular T cells in all four patients. In addition, they all seem to have a deficiency in IFNγ synthesis in CD4+ T cells and NK cells. Conclusions: We report a novel primary immunodeficiency and a differential molecular diagnosis to CXCR4‐, DOCK8‐, GATA2‐, MAGT1‐, MCM4‐, STK4‐, RHOH‐, TMC6‐, and TMC8‐related diseases. The specific variant may represent a Norwegian founder variant segregating on a population‐specific haplotype.
|
19
|
Integrating Epigenomic Elements and GWASs Identifies BDNF Gene Affecting Bone Mineral Density and Osteoporotic Fracture Risk. Sci Rep 2016; 6:30558. [PMID: 27465306 PMCID: PMC4964617 DOI: 10.1038/srep30558] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 03/29/2016] [Accepted: 07/04/2016] [Indexed: 01/20/2023] Open
Abstract
To identify susceptibility genes for osteoporosis, we conducted an integrative analysis that combined epigenomic elements with data from previous genome-wide association studies (GWASs), followed by validation at the population and functional levels. This approach can identify common regulatory elements and predict new susceptibility genes that are biologically meaningful for osteoporosis. We found a set of distinct epigenomic elements significantly enriched or depleted in the promoters of osteoporosis-associated genes, including 4 transcription factor binding sites, 27 histone marks, and 21 chromatin state segmentation types. Using these epigenomic marks, we performed reverse prediction analysis to prioritize the discovery of new candidate genes. Functional enrichment analysis of all the prioritized genes revealed several key osteoporosis-related pathways, including Wnt signaling. Genes with high priority were further validated using available GWAS datasets. Three genes, BDNF, PDE4D, and SATB2, were significantly associated with spine bone mineral density, and all three are closely related to bone metabolism. The most significant gene, BDNF, was also associated with osteoporotic fractures. RNA interference revealed that BDNF knockdown can suppress osteoblast differentiation. Our results demonstrate that epigenomic data can be used to identify common epigenomic marks and to discover additional loci with biological functions in osteoporosis.
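One common way to test whether an epigenomic mark is enriched in the promoters of a gene set is a one-sided hypergeometric test. The sketch below is an assumption about how such an enrichment could be computed, not the authors' stated method; the function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Return P(X >= k) for a hypergeometric draw.

    N: genes in the background, K: background genes whose promoter carries
    the mark, n: genes in the set of interest, k: of those, genes with the mark.
    """
    total = comb(N, n)
    upper = min(n, K)  # X cannot exceed either the set size or the marked count
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 10 background genes, 5 marked; a 5-gene set, all marked.
p_value = hypergeom_enrichment_p(k=5, n=5, K=5, N=10)
```

Depletion can be tested with the lower tail in the same way, and in practice the p-values across many marks would need multiple-testing correction.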
|