1
Liu X, Gu L, Hao C, Xu W, Leng F, Zhang P, Li W. Systematic assessment of structural variant annotation tools for genomic interpretation. Life Sci Alliance 2025; 8:e202402949. [PMID: 39658089 PMCID: PMC11632063 DOI: 10.26508/lsa.202402949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Revised: 11/30/2024] [Accepted: 12/02/2024] [Indexed: 12/12/2024] Open
Abstract
Structural variants (SVs) over 50 base pairs play a significant role in phenotypic diversity and are associated with various diseases, but their analysis is complex and resource-intensive. Numerous computational tools have been developed for SV prioritization, yet their effectiveness in biomedicine remains unclear. Here we benchmarked eight widely used SV prioritization tools, categorized into knowledge-driven (AnnotSV, ClassifyCNV) and data-driven (CADD-SV, dbCNV, StrVCTVRE, SVScore, TADA, XCNV) groups in accordance with the ACMG guidelines. We assessed their accuracy, robustness, and usability across diverse genomic contexts, biological mechanisms and computational efficiency using seven carefully curated independent datasets. Our results revealed that both groups of methods exhibit comparable effectiveness in predicting SV pathogenicity, although performance varies among tools, emphasizing the importance of selecting the appropriate tool based on specific research purposes. Furthermore, we pinpointed the potential improvement of expanding these tools for future applications. Our benchmarking framework provides a crucial evaluation method for SV analysis tools, offering practical guidance for biomedical research and facilitating the advancement of better genomic research tools.
Affiliation(s)
- Xuanshi Liu
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Lei Gu
- Epigenetics Laboratory, Max-Planck Institute for Heart and Lung Research, Cardiopulmonary Institute, Bad Nauheim, Germany
- Chanjuan Hao
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Wenjian Xu
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Fei Leng
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Peng Zhang
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
- Wei Li
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Genetics and Birth Defects Control Center, National Center for Children's Health; Beijing Children's Hospital, Capital Medical University, Beijing, China
2
Chen XR, Cui YZ, Li BZ, Yuan YJ. Genome engineering on size reduction and complexity simplification: A review. J Adv Res 2024; 60:159-171. [PMID: 37442424 PMCID: PMC11156615 DOI: 10.1016/j.jare.2023.07.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/14/2023] [Revised: 06/25/2023] [Accepted: 07/10/2023] [Indexed: 07/15/2023] Open
Abstract
BACKGROUND Genome simplification is an important topic in the field of life sciences that has attracted attention from its conception to the present day. It can help uncover the essential components of the genome and, in turn, shed light on the underlying operating principles of complex biological systems. This has made it a central focus of both basic and applied research in the life sciences. With the recent advancements in related technologies and our increasing knowledge of the genome, now is an opportune time to delve into this topic. AIM OF REVIEW Our review investigates the progress of genome simplification from two perspectives: genome size reduction and complexity simplification. In addition, we provide insights into the future development trends of genome simplification. KEY SCIENTIFIC CONCEPTS OF REVIEW Reducing genome size requires eliminating non-essential elements as much as possible. This process has been facilitated by advances in genome manipulation and synthesis techniques. However, we still need a better and clearer understanding of living systems to reduce genome complexity. As there is a lack of quantitative and clearly defined standards for this task, we have opted to approach the topic from various perspectives and present our findings accordingly.
Affiliation(s)
- Xiang-Rong Chen
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin, China; Frontiers Research Institute for Synthetic Biology, Tianjin University, Tianjin, China
- You-Zhi Cui
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin, China; Frontiers Research Institute for Synthetic Biology, Tianjin University, Tianjin, China
- Bing-Zhi Li
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin, China; Frontiers Research Institute for Synthetic Biology, Tianjin University, Tianjin, China.
- Ying-Jin Yuan
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin, China; Frontiers Research Institute for Synthetic Biology, Tianjin University, Tianjin, China
3
Nijboer TCW, Hessel EVS, van Haaften GW, van Zandvoort MJ, van der Spek PJ, Troelstra C, de Kovel CGF, Koeleman BPC, van der Zwaag B, Brilstra EH, Burbach JPH. Identification of candidate genes for developmental colour agnosia in a single unique family. PLoS One 2023; 18:e0290013. [PMID: 37672513 PMCID: PMC10482254 DOI: 10.1371/journal.pone.0290013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 07/31/2023] [Indexed: 09/08/2023] Open
Abstract
Colour agnosia is a disorder that impairs colour knowledge (naming, recognition) despite intact colour perception. Previously, we identified the first and only known family with hereditary developmental colour agnosia. The aim of the current study was to explore genomic regions and candidate genes that potentially cause this trait in this family. For three family members with developmental colour agnosia and three unaffected family members, CGH-array analysis and exome sequencing were performed, and linkage analysis was carried out using DominantMapper, resulting in the identification of 19 cosegregating chromosomal regions. Whole exome sequencing resulted in 11 rare coding variants present in all affected family members with developmental colour agnosia and absent in unaffected members. These variants affected genes that have been implicated in neural processes and functions (CACNA2D4, DDX25, GRINA, MYO15A) or that have an indirect link to brain function, development or disease (MAML2, STAU1, TMED3, RABEPK), and a remaining group lacking brain expression or involved in non-neural traits (DEPDC7, OR1J1, OR8D4). Although this is an explorative study, the small set of candidate genes could serve as a starting point for unravelling mechanisms of higher-level cognitive functions and cortical specialization, and disorders therein such as developmental colour agnosia.
Affiliation(s)
- Tanja C. W. Nijboer
- UMCU Brain Center and Center of Excellence for Rehabilitation Medicine, University Medical Center Utrecht and De Hoogstraat Rehabilitation, Utrecht, The Netherlands
- Department of Experimental Psychology and Helmholtz Institute, Utrecht University, Utrecht, The Netherlands
- Ellen V. S. Hessel
- UMCU Brain Center and Center of Excellence for Rehabilitation Medicine, University Medical Center Utrecht and De Hoogstraat Rehabilitation, Utrecht, The Netherlands
- Department of Biomedical Genetics, University Medical Center Utrecht, Utrecht, The Netherlands
- Gijs W. van Haaften
- Department of Biomedical Genetics, University Medical Center Utrecht, Utrecht, The Netherlands
- Martine J. van Zandvoort
- Department of Experimental Psychology and Helmholtz Institute, Utrecht University, Utrecht, The Netherlands
- Peter J. van der Spek
- Department of Pathology, Erasmus Medical Center Rotterdam, Rotterdam, The Netherlands
- Christine Troelstra
- Department of Pathology, Erasmus Medical Center Rotterdam, Rotterdam, The Netherlands
- Carolien G. F. de Kovel
- Department of Biomedical Genetics, University Medical Center Utrecht, Utrecht, The Netherlands
- Bobby P. C. Koeleman
- Department of Biomedical Genetics, University Medical Center Utrecht, Utrecht, The Netherlands
- Bert van der Zwaag
- Department of Biomedical Genetics, University Medical Center Utrecht, Utrecht, The Netherlands
- Eva H. Brilstra
- Department of Biomedical Genetics, University Medical Center Utrecht, Utrecht, The Netherlands
- J. Peter H. Burbach
- UMCU Brain Center, Department of Translational Neuroscience, University Medical Center Utrecht, Utrecht, The Netherlands
4
Glessner JT, Li J, Liu Y, Khan M, Chang X, Sleiman PMA, Hakonarson H. ParseCNV2: efficient sequencing tool for copy number variation genome-wide association studies. Eur J Hum Genet 2023; 31:304-312. [PMID: 36316489 PMCID: PMC9995309 DOI: 10.1038/s41431-022-01222-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/07/2021] [Revised: 10/01/2022] [Accepted: 10/14/2022] [Indexed: 11/06/2022] Open
Abstract
Improved copy number variation (CNV) detection remains an area of heavy emphasis for algorithm development; however, both CNV curation and disease association approaches remain in their infancy. The current practice of focusing on candidate CNVs, where researchers study specific CNVs they believe to be pathological while discarding others, fails to consider the full spectrum of CNVs in a hypothesis-free GWAS. To address this, we present a next-generation approach to CNV association by natively supporting the popular VCF specification for sequencing-derived variants as well as SNP array calls using a PennCNV format. The code is fast and efficient, allowing for the analysis of large (>100,000 sample) cohorts without dividing up the data on a compute cluster. The scripts are condensed into a single tool to promote simplicity and best practices. CNV curation pre- and post-association is rigorously supported and emphasized to yield reliable results of the highest quality. We benchmarked two large datasets, including the UK Biobank (n > 450,000) and CAG Biobank (n > 350,000), both of which are genotyped at >0.5 M probes, for our input files. ParseCNV has been actively supported and developed since 2008. ParseCNV2 presents a critical addition to formalizing CNV association for inclusion with SNP associations in GWAS Catalog. Clinical CNV prioritization, interactive quality control (QC), and adjustment for covariates are revolutionary new features of ParseCNV2 vs. ParseCNV. The software is freely available at: https://github.com/CAG-CNV/ParseCNV2
Affiliation(s)
- Joseph T Glessner
- Department of Pediatrics, Children's Hospital of Philadelphia, 3401 Civic Center Blvd, Philadelphia, PA, 19104, USA.
- Department of Pediatrics, Perelman School of Medicine, 3400 Civic Center Blvd, Philadelphia, PA, 19104, USA.
- Jin Li
- Department of Cell Biology, the Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China
- Yichuan Liu
- Department of Pediatrics, Children's Hospital of Philadelphia, 3401 Civic Center Blvd, Philadelphia, PA, 19104, USA
- Department of Pediatrics, Perelman School of Medicine, 3400 Civic Center Blvd, Philadelphia, PA, 19104, USA
- Munir Khan
- Department of Pediatrics, Children's Hospital of Philadelphia, 3401 Civic Center Blvd, Philadelphia, PA, 19104, USA
- Department of Pediatrics, Perelman School of Medicine, 3400 Civic Center Blvd, Philadelphia, PA, 19104, USA
- Xiao Chang
- Department of Pediatrics, Children's Hospital of Philadelphia, 3401 Civic Center Blvd, Philadelphia, PA, 19104, USA
- Department of Pediatrics, Perelman School of Medicine, 3400 Civic Center Blvd, Philadelphia, PA, 19104, USA
- Patrick M A Sleiman
- Department of Pediatrics, Children's Hospital of Philadelphia, 3401 Civic Center Blvd, Philadelphia, PA, 19104, USA
- Department of Pediatrics, Perelman School of Medicine, 3400 Civic Center Blvd, Philadelphia, PA, 19104, USA
- Hakon Hakonarson
- Department of Pediatrics, Children's Hospital of Philadelphia, 3401 Civic Center Blvd, Philadelphia, PA, 19104, USA
- Department of Pediatrics, Perelman School of Medicine, 3400 Civic Center Blvd, Philadelphia, PA, 19104, USA
5
Qiu S, Qiu Y, Li Y, Zhu X, Liu Y, Qiao Y, Cheng Y, Liu Y. Nexus between genome-wide copy number variations and autism spectrum disorder in Northeast Han Chinese population. BMC Psychiatry 2023; 23:96. [PMID: 36750796 PMCID: PMC9906952 DOI: 10.1186/s12888-023-04565-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Accepted: 01/23/2023] [Indexed: 02/09/2023] Open
Abstract
BACKGROUND Autism spectrum disorder (ASD) is a common neurodevelopmental disorder, with an increasing prevalence worldwide. Copy number variation (CNV), one of the genetic factors, is involved in ASD etiology. However, there exist substantial differences in terms of location and frequency of some CNVs in the general Asian population. Whole-genome studies of CNVs in Northeast Han Chinese samples are still lacking, necessitating our ongoing work to investigate the characteristics of CNVs in a Northeast Han Chinese population with clinically diagnosed ASD. METHODS We performed a genome-wide CNV screening in Northeast Han Chinese individuals with ASD using array-based comparative genomic hybridization. RESULTS We found that 22 kinds of CNVs (6 deletions and 16 duplications) were potentially pathogenic. These CNVs were distributed in chromosome 1p36.33, 1p36.31, 1q42.13, 2p23.1-p22.3, 5p15.33, 5p15.33-p15.2, 7p22.3, 7p22.3-p22.2, 7q22.1-q22.2, 10q23.2-q23.31, 10q26.2-q26.3, 11p15.5, 11q25, 12p12.1-p11.23, 14q11.2, 15q13.3, 16p13.3, 16q21, 22q13.31-q13.33, and Xq12-q13.1. Additionally, we found 20 potentially pathogenic genes of ASD in our population, including eight protein-coding genes (six duplications [DRD4, HRAS, OPHN1, SHANK3, SLC6A3, and TSC2] and two deletions [CHRNA7 and PTEN]) and 12 microRNA-coding genes (ten duplications [MIR202, MIR210, MIR3178, MIR339, MIR4516, MIR4717, MIR483, MIR675, MIR6821, and MIR940] and two deletions [MIR107 and MIR558]). CONCLUSION We identified CNVs and genes implicated in ASD risk, providing insight for further study of ASD etiology.
Affiliation(s)
- Shuang Qiu
- Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, 130021, Jilin, China; Department of Laboratory Medicine, Jilin University Hospital, Changchun, 130000, Jilin, China
- Yingjia Qiu
- China-Japan Union Hospital, Jilin University, Changchun, 130033, Jilin, China
- Yong Li
- Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, 130021, Jilin, China
- Xiaojuan Zhu
- The Key Laboratory of Molecular Epigenetics of Ministry of Education, Institute of Cytology and Genetics, Northeast Normal University, Changchun, 130021, Jilin, China
- Yunkai Liu
- Department of Cardiovascular Diseases, the First Hospital of Jilin University, Changchun, 130021, Jilin, China; Key Laboratory for Cardiovascular Mechanism of Traditional Chinese Medicine, Changchun, 130021, Jilin, China; Institute of Translational Medicine, the First Hospital of Jilin University, Changchun, 130021, Jilin, China
- Yichun Qiao
- Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, 130021, Jilin, China
- Yi Cheng
- Department of Cardiovascular Diseases, the First Hospital of Jilin University, Changchun, 130021, Jilin, China; Key Laboratory for Cardiovascular Mechanism of Traditional Chinese Medicine, Changchun, 130021, Jilin, China; Institute of Translational Medicine, the First Hospital of Jilin University, Changchun, 130021, Jilin, China
- Yawen Liu
- Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, 130021, Jilin, China
6
Hassan S, Bahar R, Johan MF, Mohamed Hashim EK, Abdullah WZ, Esa E, Abdul Hamid FS, Zulkafli Z. Next-Generation Sequencing (NGS) and Third-Generation Sequencing (TGS) for the Diagnosis of Thalassemia. Diagnostics (Basel) 2023; 13:373. [PMID: 36766477 PMCID: PMC9914462 DOI: 10.3390/diagnostics13030373] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 12/07/2022] [Revised: 01/11/2023] [Accepted: 01/16/2023] [Indexed: 01/20/2023] Open
Abstract
Thalassemia is one of the most heterogeneous diseases, with more than a thousand mutation types recorded worldwide. Molecular diagnosis of thalassemia by conventional PCR-based DNA analysis is time- and resource-consuming owing to the phenotype variability, disease complexity, and molecular diagnostic test limitations. Moreover, genetic counseling must be backed up by an extensive diagnosis of the thalassemia-causing phenotype and the possible genetic modifiers. Data coming from advanced molecular techniques such as targeted sequencing by next-generation sequencing (NGS) and third-generation sequencing (TGS) are more appropriate and valuable for DNA analysis of thalassemia. While NGS is superior to TGS at variant calling thanks to its lower error rates, the longer reads of TGS permit haplotype phasing, which is superior for variant discovery in homologous genes and for CNV calling. The emergence of many cutting-edge machine learning-based bioinformatics tools has improved the accuracy of variant and CNV calling. Constant improvement of these sequencing and bioinformatics methods will enable precise thalassemia detection, especially for CNVs and the homologous HBA and HBG genes. In conclusion, laboratories transitioning from conventional DNA analysis to NGS or TGS and following the guidelines towards a single assay will contribute to a better diagnostic approach to thalassemia.
Affiliation(s)
- Syahzuwan Hassan
- Department of Hematology, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, Kubang Kerian 16150, Malaysia
- Institute for Medical Research, Shah Alam 40170, Malaysia
- Rosnah Bahar
- Department of Hematology, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, Kubang Kerian 16150, Malaysia
- Muhammad Farid Johan
- Department of Hematology, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, Kubang Kerian 16150, Malaysia
- Wan Zaidah Abdullah
- Department of Hematology, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, Kubang Kerian 16150, Malaysia
- Ezalia Esa
- Institute for Medical Research, Shah Alam 40170, Malaysia
- Zefarina Zulkafli
- Department of Hematology, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, Kubang Kerian 16150, Malaysia
7
Özer L, Aktuna S, Unsal E, Ünal MA, Sahin G, Baltaci V. A novel SLC35D1 variant causing milder phenotype of Schneckenbecken dysplasia in a large pedigree. Am J Med Genet A 2022; 188:3078-3083. [PMID: 35934917 DOI: 10.1002/ajmg.a.62939] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/01/2022] [Revised: 06/07/2022] [Accepted: 06/10/2022] [Indexed: 01/31/2023]
Abstract
The SLC35D1 gene encodes a UDP-glucuronic acid/UDP-N-acetylgalactosamine dual transporter protein that transports organic or inorganic molecules across cellular membranes. Pathogenic variants in SLC35D1 cause Schneckenbecken dysplasia (SHNKND), a rare lethal autosomal recessive disorder characterized by a snail-like pelvis, flattening of vertebral bodies, short and broad long bones with a dumbbell-like appearance, and thoracic hypoplasia. Only six cases with homozygous SLC35D1 variants have been reported to date, and all of these cases were lost in the perinatal period. Here we report different family members with a novel SLC35D1 variant who presented a milder phenotype of SHNKND. The affected patients have common clinical features such as short stature, mild mesomelia, shortening of the lower extremity, genu valgum, and narrow thorax. Exome sequencing of the proband revealed a homozygous missense variant of the SLC35D1 gene, c.401T>C (p.Met134Thr). The affected siblings, their two cousins, and their paternal uncle with a similar phenotype were also homozygous for the variant. This is the first case report of a family with a novel likely pathogenic variant (p.Met134Thr) and mild phenotypic features. It describes the largest family reported to date, with patients of different ages (4-31 years old). The present report supports the evidence that the p.Met134Thr variant is responsible for a milder phenotype than previously reported cases with SLC35D1 pathogenic variants.
Affiliation(s)
- Leyla Özer
- Department of Medical Genetics, Yuksek İhtisas University Medical School, Ankara, Turkey; Mikrogen Genetic Diagnosis Center, Ankara, Turkey
- Suleyman Aktuna
- Department of Medical Genetics, Yuksek İhtisas University Medical School, Ankara, Turkey; Mikrogen Genetic Diagnosis Center, Ankara, Turkey
- Evrim Unsal
- Department of Medical Genetics, Yuksek İhtisas University Medical School, Ankara, Turkey; Mikrogen Genetic Diagnosis Center, Ankara, Turkey
- Mehmet Altay Ünal
- Ankara University Stem Cell Institute, Ankara University, Ankara, Turkey
- Volkan Baltaci
- Department of Medical Genetics, Yuksek İhtisas University Medical School, Ankara, Turkey; Mikrogen Genetic Diagnosis Center, Ankara, Turkey
8
Attique H, Shah S, Jabeen S, Khan FG, Khan A, ELAffendi M. Multiclass Cancer Prediction Based on Copy Number Variation Using Deep Learning. Comput Intell Neurosci 2022; 2022:4742986. [PMID: 35720914 PMCID: PMC9203194 DOI: 10.1155/2022/4742986] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/31/2022] [Accepted: 05/21/2022] [Indexed: 12/02/2022]
Abstract
DNA copy number variation (CNV) is a type of DNA variation that is associated with various human diseases. CNV ranges in size from 1 kilobase to several megabases on a chromosome. Most of the computational research for cancer classification is traditional machine learning based, which relies on handcrafted extraction and selection of features. To the best of our knowledge, the deep learning-based research also uses the step of feature extraction and selection. To understand the difference between multiple human cancers, we developed three end-to-end deep learning models, i.e., DNN (fully connected), CNN (convolutional neural network), and RNN (recurrent neural network), to classify six cancer types using the CNV data of 24,174 genes. The strength of an end-to-end deep learning model lies in representation learning (automatic feature extraction). The purpose of proposing more than one model is to find which architecture among them performs better for CNV data. Our best model achieved 92% accuracy with an area under the ROC curve of 0.99, and we compared the performances of our proposed models with state-of-the-art techniques. Our models have outperformed the state-of-the-art techniques in terms of accuracy, precision, and ROC. In the future, we aim to work on other types of cancers as well.
Affiliation(s)
- Haleema Attique
- Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Islamabad, Pakistan
- Sajid Shah
- Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Islamabad, Pakistan
- EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia
- Saima Jabeen
- Department of IT and Computer Science, Pak-Austria Fachhochschule: Institute of Applied Sciences and Technology, Mang, Haripur, KPK, Pakistan
- Fiaz Gul Khan
- Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Islamabad, Pakistan
- Ahmad Khan
- Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Islamabad, Pakistan
- Mohammed ELAffendi
- EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia
9
Requena F, Abdallah HH, García A, Nitschké P, Romana S, Malan V, Rausell A. CNVxplorer: a web tool to assist clinical interpretation of CNVs in rare disease patients. Nucleic Acids Res 2021; 49:W93-W103. [PMID: 34019647 PMCID: PMC8262689 DOI: 10.1093/nar/gkab347] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/07/2021] [Revised: 04/12/2021] [Accepted: 05/20/2021] [Indexed: 12/20/2022] Open
Abstract
Copy Number Variants (CNVs) are an important cause of rare diseases. Array-based Comparative Genomic Hybridization tests yield a ∼12% diagnostic rate, with ∼8% of patients presenting CNVs of unknown significance. CNVs interpretation is particularly challenging on genomic regions outside of those overlapping with previously reported structural variants or disease-associated genes. Recent studies showed that a more comprehensive evaluation of CNV features, leveraging both coding and non-coding impacts, can significantly improve diagnostic rates. However, currently available CNV interpretation tools are mostly gene-centric or provide only non-interactive annotations difficult to assess in the clinical practice. Here, we present CNVxplorer, a web server suited for the functional assessment of CNVs in a clinical diagnostic setting. CNVxplorer mines a comprehensive set of clinical, genomic, and epigenomic features associated with CNVs. It provides sequence constraint metrics, impact on regulatory elements and topologically associating domains, as well as expression patterns. Analyses offered cover (a) agreement with patient phenotypes; (b) visualizations of associations among genes, regulatory elements and transcription factors; (c) enrichment on functional and pathway annotations and (d) co-occurrence of terms across PubMed publications related to the query CNVs. A flexible evaluation workflow allows dynamic re-interrogation in clinical sessions. CNVxplorer is publicly available at http://cnvxplorer.com.
Affiliation(s)
- Francisco Requena
- Université de Paris, Institut Imagine, F-75006 Paris, France
- Clinical Bioinformatics Laboratory, Imagine Institute, INSERM UMR1163, F-75015 Paris, France
- Hamza Hadj Abdallah
- Université de Paris, Institut Imagine, F-75006 Paris, France
- Service de Cytogénétique, Hôpital Necker-Enfants Malades, APHP, F-75015 Paris, France
- Alejandro García
- Université de Paris, Institut Imagine, F-75006 Paris, France
- Clinical Bioinformatics Laboratory, Imagine Institute, INSERM UMR1163, F-75015 Paris, France
- Patrick Nitschké
- Université de Paris, Institut Imagine, F-75006 Paris, France
- Plateforme de Bioinformatique, Université Paris Descartes, F-75015 Paris, France
- Sergi Romana
- Université de Paris, Institut Imagine, F-75006 Paris, France
- Service de Cytogénétique, Hôpital Necker-Enfants Malades, APHP, F-75015 Paris, France
- Valérie Malan
- Université de Paris, Institut Imagine, F-75006 Paris, France
- Service de Cytogénétique, Hôpital Necker-Enfants Malades, APHP, F-75015 Paris, France
- Antonio Rausell
- Université de Paris, Institut Imagine, F-75006 Paris, France
- Clinical Bioinformatics Laboratory, Imagine Institute, INSERM UMR1163, F-75015 Paris, France
- Service de Génétique Moleculaire, Hôpital Necker-Enfants Malades, APHP, F-75015, Paris, France
10
Abstract
Gains and losses of large segments of genomic DNA, known as copy number variants (CNVs), have recently gained considerable interest in clinical diagnostics, as particular forms may lead to inherited genetic diseases. In recent decades, researchers have developed a wide variety of cytogenetic and molecular methods with different detection capabilities to detect clinically relevant CNVs. In this review, we summarize methodological progress from conventional approaches to current state-of-the-art techniques capable of detecting CNVs from a few bases up to several megabases. Although the recent rapid progress of sequencing methods has enabled precise detection of CNVs, determining their functional effect on cellular and whole-body physiology remains a challenge. Here, we provide a comprehensive list of databases and bioinformatics tools that may serve as useful assets for researchers, laboratory diagnosticians, and clinical geneticists facing the challenge of CNV detection and interpretation.
11
Gurbich TA, Ilinsky VV. ClassifyCNV: a tool for clinical annotation of copy-number variants. Sci Rep 2020; 10:20375. [PMID: 33230148 PMCID: PMC7683568 DOI: 10.1038/s41598-020-76425-3] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 08/05/2020] [Accepted: 10/27/2020] [Indexed: 12/14/2022] Open
Abstract
Copy-number variants (CNVs) are an important part of human genetic variation. They can be benign or can play a role in human disease by creating dosage imbalances and disrupting genes and regulatory elements. Accurate identification and clinical annotation of CNVs is essential; however, manual evaluation of individual CNVs by clinicians is challenging on a large scale. Here, we present ClassifyCNV, an easy-to-use tool that implements the 2019 ACMG classification guidelines to assess CNV pathogenicity. ClassifyCNV uses genomic coordinates and CNV type as input and reports a clinical classification for each variant, a classification score breakdown, and a list of genes of potential importance for variant interpretation. We validate ClassifyCNV’s performance using a set of known clinical CNVs and a set of manually evaluated variants. ClassifyCNV matches the pathogenicity category for 81% of manually evaluated variants, with the significance of the remaining pathogenic and benign variants automatically determined as uncertain, requiring further evaluation by a clinician. ClassifyCNV facilitates the implementation of the latest ACMG guidelines in high-throughput CNV analysis, is suitable for integration into NGS analysis pipelines, and can decrease time to diagnosis. The tool is available at https://github.com/Genotek/ClassifyCNV.
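As a rough illustration of the final step in score-based classification under the 2019 ACMG/ClinGen CNV guidelines, the sketch below maps a total evidence score to one of the five clinical categories. The cut-offs follow the published point thresholds; the function name `classify_cnv_score` is hypothetical, and this is not ClassifyCNV's own code or interface.

```python
def classify_cnv_score(score: float) -> str:
    """Map a total ACMG/ClinGen 2019 CNV evidence score to a clinical category.

    Hypothetical helper for illustration only; it is not part of ClassifyCNV.
    """
    if score >= 0.99:
        return "Pathogenic"
    if score >= 0.90:
        return "Likely pathogenic"
    if score > -0.90:
        return "Uncertain significance"
    if score > -0.99:
        return "Likely benign"
    return "Benign"


if __name__ == "__main__":
    # Made-up scores, one per category.
    for total in (1.0, 0.95, 0.0, -0.95, -1.0):
        print(total, classify_cnv_score(total))
```

The scores in the usage loop are invented; in practice the total would come from summing the per-section evidence points that a tool reports in its score breakdown.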
Affiliation(s)
- Tatiana A Gurbich
- Genotek Ltd., Nastavnicheskii pereulok 17/1, 105120, Moscow, Russia.
|
12
|
Fan L, Wu J, Wu Y, Shi X, Xin X, Li S, Zeng W, Deng D, Feng L, Chen S, Xiao J. Analysis of Chromosomal Copy Number in First-Trimester Pregnancy Loss Using Next-Generation Sequencing. Front Genet 2020; 11:545856. [PMID: 33193619 PMCID: PMC7606984 DOI: 10.3389/fgene.2020.545856] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 03/26/2020] [Accepted: 09/22/2020] [Indexed: 01/01/2023] Open
Abstract
Embryonic chromosomal abnormality is one of the significant causative factors of early pregnancy loss. Our goal was to evaluate the clinical utility of next-generation sequencing (NGS) technology in identifying chromosomal anomalies associated with first-trimester pregnancy loss. In addition, we attempted to provide fertility guidance to couples anticipating a successful pregnancy. A total of 1,010 miscarriage specimens were collected between March 2016 and January 2019 from women who suffered first-trimester pregnancy loss. Total DNA was isolated from products of conception, and NGS analysis was carried out. We detected a total of 634 cases of chromosomal variants. Among the 634 cases, 462 (72.9%) displayed numerical variants including 383 (60.4%) aneuploidies, 44 (6.9%) polyploidies, and 34 (5.5%) mosaicisms. The other 172 (27.1%) cases showed structural variants including 19 (3.0%) benign copy number variations (CNVs), 52 (8.2%) pathogenic CNVs, and 101 (16%) variants of unknown significance (VOUS) CNVs. When maternal age was ≥ 35 years, the sporadic abortion (SA) group showed an increased frequency of chromosomal variants in comparison with the recurrent miscarriage (RM) group (90/121 vs. 64/104). It was evident that the groups with advanced maternal age had a sharply increased frequency of aneuploidy, whatever the frequency of pregnancy loss (71/121 vs. 155/432, 49/104 vs. 108/349). Our data suggest that NGS could be used for the successful detection of genetic anomalies in pregnancy loss. We recommend that fetal chromosome analysis be offered routinely for all pregnancy losses, regardless of their frequency.
Affiliation(s)
- Lei Fan
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Jianli Wu
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Yuanyuan Wu
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Xinwei Shi
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Xing Xin
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Shufang Li
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Wanjiang Zeng
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Dongrui Deng
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Ling Feng
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Suhua Chen
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Juan Xiao
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
|
13
|
Liu M, Zhong Y, Liu H, Liang D, Liu E, Zhang Y, Tian F, Liang Q, Cram DS, Wang H, Wu L, Yu F. REDBot: Natural language process methods for clinical copy number variation reporting in prenatal and products of conception diagnosis. Mol Genet Genomic Med 2020; 8:e1488. [PMID: 32961042 PMCID: PMC7667294 DOI: 10.1002/mgg3.1488] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/07/2020] [Revised: 08/07/2020] [Accepted: 08/10/2020] [Indexed: 12/13/2022] Open
Abstract
Background: Current copy number variation (CNV) identification methods have rapidly become mature. However, the postdetection processes such as variant interpretation or reporting are inefficient. To overcome this situation, we developed REDBot as an automated software package for accurate and direct generation of clinical diagnostic reports for prenatal and products of conception (POC) samples. Methods: We applied natural language process (NLP) methods for analyzing 30,235 in‐house historical clinical reports through active learning, and then developed clinical knowledge bases, evidence‐based interpretation methods, and reporting criteria to support the whole postdetection pipeline. Results: Of the 30,235 reports, we obtained 37,175 CNV‐paragraph pairs. For these pairs, the active learning approaches achieved a 0.9466 average F1‐score in sentence classification. The overall accuracy for variant classification was 95.7%, 95.2%, and 100.0% in retrospective, prospective, and clinical utility experiments, respectively. Conclusion: By integrating NLP methods into the CNV postdetection pipeline, REDBot is a robust and rapid tool with clinical utility for prenatal and POC diagnosis.
Affiliation(s)
| | | | - Hongqian Liu
- Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu
| | - Desheng Liang
- Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China.,Hunan Jiahui Genetics Hospital, Changsha, China
| | - Erhong Liu
- Berry Genomics Corporation, Beijing, China
| | - Yu Zhang
- Berry Genomics Corporation, Beijing, China
| | - Feng Tian
- Berry Genomics Corporation, Beijing, China
| | | | | | - Hua Wang
- Hunan Provincial Maternal and Child Health Care Hospital, Changsha, China
| | - Lingqian Wu
- Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China
| | - Fuli Yu
- Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
|
14
|
FunVar: A systematic pipeline to unravel the convergence patterns of genetic variants in ASD, a paradigmatic complex disease. J Biomed Inform 2019; 98:103273. [PMID: 31454647 DOI: 10.1016/j.jbi.2019.103273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2019] [Revised: 07/28/2019] [Accepted: 08/24/2019] [Indexed: 11/22/2022]
Abstract
In recent years, technological advances for capturing genetic variation in large populations have led to the identification of large numbers of putative or disease-causing variants. However, their mechanistic understanding is lagging far behind and has posed new challenges regarding their relevance for disease phenotypes, particularly for common complex disorders. In this study, we propose a systematic pipeline to infer biological meaning from genetic variants, namely rare Copy Number Variants (CNVs). The pipeline consists of three modules that seek to (1) improve genetic data quality by excluding low-confidence CNVs, (2) identify disrupted biological processes, and (3) aggregate similar enriched biological process terms using semantic similarity. The proposed pipeline was applied to CNVs from individuals diagnosed with Autism Spectrum Disorder (ASD). We found that rare CNVs disrupting brain-expressed genes dysregulated a wide range of biological processes, such as nervous system development and protein polyubiquitination. The disrupted biological processes identified in ASD patients were in accordance with previous findings. This coherence with the literature indicates the feasibility of the proposed pipeline in interpreting the biological role of genetic variants in complex disease development. The suggested pipeline is easily adjustable at each step, and its independence from any specific dataset and software makes it an effective tool for analyzing existing genetic resources. The FunVar pipeline is available at https://github.com/lasigeBioTM/FunVar and includes pre- and post-processing steps to effectively interpret biological mechanisms of putative disease-causing genetic variants.
|
15
|
Coelho Molck M, Simioni M, Paiva Vieira T, Paoli Monteiro F, Gil-da-Silva-Lopes VL. A Pure 2-Mb 3q26.2 Duplication Proximal to the Critical Region of 3q Duplication Syndrome. Mol Syndromol 2018; 9:197-204. [PMID: 30140197 DOI: 10.1159/000489870] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Accepted: 04/09/2018] [Indexed: 12/23/2022] Open
Abstract
Partial duplication of chromosome 3q - dup(3q) - is a recognizable syndrome with dysmorphic facial features, microcephaly, digital anomalies, and genitourinary and cardiac defects, as well as growth retardation and developmental delay. Most cases of dup(3q) result from unbalanced translocations or inversions and are accompanied by additional chromosomal imbalances. Pure dup(3q) is rare, and only 31 cases have been reported so far. We report a new case of a girl with a pure 2-Mb duplication at 3q26.2 not encompassing the known critical region 3q26.3q27. After an extensive review, to the best of our knowledge, the case herein presented harbors the shortest 3q duplication of this region. The clinical phenotype of this patient resembles previously reported cases of pure dup(3q) syndrome, including intellectual disability, synophrys, a wide nasal bridge, dysmorphic ears, clinodactyly, and cardiac defects. We suggest that the 3q26.2 duplication is a candidate copy number alteration explaining our patient's clinical phenotype.
Affiliation(s)
- Miriam Coelho Molck
- Department of Medical Genetics, University of Campinas (UNICAMP), Campinas, Brazil
| | - Milena Simioni
- Department of Medical Genetics, University of Campinas (UNICAMP), Campinas, Brazil
| | - Társis Paiva Vieira
- Department of Medical Genetics, University of Campinas (UNICAMP), Campinas, Brazil
|
16
|
Dharanipragada P, Vogeti S, Parekh N. iCopyDAV: Integrated platform for copy number variations-Detection, annotation and visualization. PLoS One 2018; 13:e0195334. [PMID: 29621297 PMCID: PMC5886540 DOI: 10.1371/journal.pone.0195334] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 12/29/2017] [Accepted: 03/20/2018] [Indexed: 12/14/2022] Open
Abstract
Discovery of copy number variations (CNVs), a major category of structural variations, has dramatically changed our understanding of differences between individuals and provides an alternate paradigm for the genetic basis of human diseases. CNVs include both copy gain and copy loss events, and their detection genome-wide is now possible using high-throughput, low-cost next generation sequencing (NGS) methods. However, accurate detection of CNVs from NGS data is not straightforward due to non-uniform coverage of reads resulting from various systemic biases. We have developed an integrated platform, iCopyDAV, to handle some of these issues in CNV detection in whole genome NGS data. It has a modular framework comprising five major modules: data pre-treatment, segmentation, variant calling, annotation and visualization. An important feature of iCopyDAV is the functional annotation module that enables the user to identify and prioritize CNVs encompassing various functional elements, genomic features and disease-associations. Parallelization of the segmentation algorithms makes the iCopyDAV platform accessible even on a desktop. Here we show the effect of sequencing coverage, read length, bin size, data pre-treatment and segmentation approaches on accurate detection of the complete spectrum of CNVs. Performance of iCopyDAV is evaluated on both simulated data and real data for different sequencing depths. It is an open-source integrated pipeline available at https://github.com/vogetihrsh/icopydav and as a Docker image at http://bioinf.iiit.ac.in/icopydav/.
Affiliation(s)
- Prashanthi Dharanipragada
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
| | - Sriharsha Vogeti
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
| | - Nita Parekh
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
|
17
|
Characterization of brain tumor initiating cells isolated from an animal model of CNS primitive neuroectodermal tumors. Oncotarget 2018; 9:13733-13747. [PMID: 29568390 PMCID: PMC5862611 DOI: 10.18632/oncotarget.24460] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 12/10/2017] [Accepted: 01/30/2018] [Indexed: 01/17/2023] Open
Abstract
CNS Primitive Neuroectodermal tumors (CNS-PNETs) are members of the embryonal family of malignant childhood brain tumors, which remain refractory to current therapeutic treatments. The current paradigm of brain tumorigenesis implicates brain tumor-initiating cells (BTIC) in the onset of tumorigenesis and tumor maintenance. However, despite their significance, there is currently no comprehensive characterization of CNS-PNET BTICs. Recently, we described an animal model of CNS-PNET generated by orthotopic transplantation of human Radial Glial (RG) cells - the progenitor cells for adult neural stem cells (NSC) - into the brains of NOD-SCID mice and proposed that BTICs may play a role in the maintenance of these tumors. Here we report the characterization of BTIC lines derived from this CNS-PNET animal model. Orthotopic transplantation of the BTICs generated highly aggressive tumors also characterized as CNS-PNETs. The BTICs have the hallmarks of NSCs as they demonstrate self-renewing capacity and have the ability to differentiate into astrocytes and early migrating neurons. Moreover, the cells demonstrate aberrant accumulation of wild type tumor-suppressor protein p53, indicating its functional inactivation, highly up-regulated levels of onco-protein cMYC and the BTIC marker OCT3/4, along with a metabolic switch to glycolysis - suggesting that these changes occurred in the early stages of tumorigenesis. Furthermore, based on RNA- and DNA-seq data, the BTICs did not acquire any transcriptome-changing genomic alterations, indicating that the onset of tumorigenesis may be epigenetically driven. The study of these BTIC self-renewing cells in our model may enable uncovering the molecular alterations that are responsible for the onset and maintenance of the malignant PNET phenotype.
|
18
|
Genome-wide common and rare variant analysis provides novel insights into clozapine-associated neutropenia. Mol Psychiatry 2017; 22:1502-1508. [PMID: 27400856 PMCID: PMC5065090 DOI: 10.1038/mp.2016.97] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Received: 01/21/2016] [Revised: 05/10/2016] [Accepted: 05/17/2016] [Indexed: 01/31/2023]
Abstract
The antipsychotic clozapine is uniquely effective in the management of schizophrenia; however, its use is limited by its potential to induce agranulocytosis. The causes of this, and of its precursor neutropenia, are largely unknown, although genetic factors have an important role. We sought risk alleles for clozapine-associated neutropenia in a sample of 66 cases and 5583 clozapine-treated controls, through a genome-wide association study (GWAS), imputed human leukocyte antigen (HLA) alleles, exome array and copy-number variation (CNV) analyses. We then combined associated variants in a meta-analysis with data from the Clozapine-Induced Agranulocytosis Consortium (up to 163 cases and 7970 controls). In the largest combined sample to date, we identified a novel association with rs149104283 (odds ratio (OR)=4.32, P=1.79 × 10⁻⁸), intronic to transcripts of SLCO1B3 and SLCO1B7, members of a family of hepatic transporter genes previously implicated in adverse drug reactions including simvastatin-induced myopathy and docetaxel-induced neutropenia. Exome array analysis identified gene-wide associations of uncommon non-synonymous variants within UBAP2 and STARD9. We additionally provide independent replication of a previously identified variant in HLA-DQB1 (OR=15.6, P=0.015, positive predictive value=35.1%). These results implicate biological pathways through which clozapine may act to cause this serious adverse effect.
|
19
|
Molparia B, Nichani E, Torkamani A. Assessment of circulating copy number variant detection for cancer screening. PLoS One 2017; 12:e0180647. [PMID: 28686671 PMCID: PMC5501586 DOI: 10.1371/journal.pone.0180647] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 12/21/2016] [Accepted: 06/19/2017] [Indexed: 12/21/2022] Open
Abstract
Current high-sensitivity cancer screening methods, largely utilizing correlative biomarkers, suffer from false positive rates that lead to unnecessary medical procedures and debatable public health benefit overall. Detection of circulating tumor DNA (ctDNA), a causal biomarker, has the potential to revolutionize cancer screening. Thus far, the majority of ctDNA studies have focused on detection of tumor-specific point mutations after cancer diagnosis for the purpose of post-treatment surveillance. However, ctDNA point mutation detection methods developed to date likely lack either the scope or analytical sensitivity necessary to be useful for cancer screening, due to the low (<1%) ctDNA fraction derived from early stage tumors. On the other hand, tumor-derived copy number variant (CNV) detection is hypothetically a superior means of ctDNA-based cancer screening for many tumor types, given that, relative to point mutations, each individual tumor CNV contributes a much larger number of ctDNA fragments to the overall pool of circulating free DNA (cfDNA). A small number of studies have demonstrated the potential of ctDNA CNV-based screening in select cancer types. Here we perform an in silico assessment of the potential for ctDNA CNV-based cancer screening across many common cancers, and suggest ctDNA CNV detection shows promise as a broad cancer screening methodology.
Affiliation(s)
- Bhuvan Molparia
- The Scripps Translational Science Institute, La Jolla, CA, United States of America
- The Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, United States of America
| | - Eshaan Nichani
- Massachusetts Institute of Technology, Cambridge, MA, United States of America
| | - Ali Torkamani
- The Scripps Translational Science Institute, La Jolla, CA, United States of America
- The Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, United States of America
- The Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, United States of America
- Scripps Health, La Jolla, CA, United States of America
|
20
|
Mason-Suares H, Landry L, Lebo MS. Detecting Copy Number Variation via Next Generation Technology. Curr Genet Med Rep 2016. [DOI: 10.1007/s40142-016-0091-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 02/07/2023]
|
21
|
Urnikyte A, Domarkiene I, Stoma S, Ambrozaityte L, Uktveryte I, Meskiene R, Kasiulevičius V, Burokiene N, Kučinskas V. CNV analysis in the Lithuanian population. BMC Genet 2016; 17:64. [PMID: 27142071 PMCID: PMC4855864 DOI: 10.1186/s12863-016-0373-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/02/2015] [Accepted: 04/22/2016] [Indexed: 12/13/2022] Open
Abstract
Background: Although copy number variation (CNV) has received much attention, knowledge about the characteristics of CNVs such as occurrence rate and distribution in the genome between populations and within the same population is still insufficient. In this study, Illumina 770 K HumanOmniExpress-12 v1.0 (and v1.1) arrays were used to examine the diversity and distribution of CNVs in 286 unrelated individuals from the two main ethnolinguistic groups of the Lithuanian population (Aukštaičiai and Žemaičiai) (see Additional file 3). For primary data analysis, the Illumina GenomeStudio™ Genotyping Module v1.9 and two algorithms, cnvPartition 3.2.0 and QuantiSNP 2.0, were used to identify high-confidence CNVs. Results: A total of 478 autosomal CNVs were detected by both algorithms, and those were clustered in 87 copy number variation regions (CNVRs), spanning ~12.5 Mb of the genome (see Table 1). At least 8.6% of the CNVRs were unique and had not been reported in the Database of Genomic Variants. Most CNVRs (57.5%) were rare, with a frequency of <1%, whereas common CNVRs with at least 5% frequency made up only 1.1% of all CNVRs identified. About 49% of non-singleton CNVRs were shared between Aukštaičiai and Žemaičiai, and the remaining CNVRs were specific to each group. Many of the CNVs detected (66%) overlapped with known UCSC gene regions. Conclusions: The ethnolinguistic groups of the Lithuanian population could not be differentiated based on CNV profiles, which may reflect their geographical proximity and suggest the homogeneity of the Lithuanian population. In addition, putative novel CNVs unique to the Lithuanian population were identified. The results of our study enhance the CNV map of the Lithuanian population.
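One common way to cluster individual CNV calls into copy number variation regions (CNVRs) is to merge calls that overlap on the same chromosome. The sketch below illustrates that idea only; the abstract does not state the exact clustering criterion the authors used, and the function name `merge_into_cnvrs` and the example coordinates are hypothetical.

```python
from collections import defaultdict


def merge_into_cnvrs(cnvs):
    """Merge overlapping CNV calls into CNVRs.

    cnvs: iterable of (chrom, start, end) tuples.
    Returns a list of (chrom, start, end, n_calls) tuples, one per region.
    Assumption: any positional overlap joins two calls into one region.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in cnvs:
        by_chrom[chrom].append((start, end))

    regions = []
    for chrom in sorted(by_chrom):
        region_start = region_end = None
        n_calls = 0
        for start, end in sorted(by_chrom[chrom]):
            if region_end is not None and start <= region_end:
                # The call overlaps the open region, so extend it.
                region_end = max(region_end, end)
                n_calls += 1
            else:
                # Close the previous region (if any) and open a new one.
                if region_end is not None:
                    regions.append((chrom, region_start, region_end, n_calls))
                region_start, region_end, n_calls = start, end, 1
        regions.append((chrom, region_start, region_end, n_calls))
    return regions


if __name__ == "__main__":
    # Made-up calls for illustration.
    calls = [("chr1", 100, 200), ("chr1", 150, 300),
             ("chr1", 400, 500), ("chr2", 10, 20)]
    for region in merge_into_cnvrs(calls):
        print(region)
```

Stricter definitions, such as requiring a minimum reciprocal overlap between calls, would produce more and smaller regions than this any-overlap rule.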
Affiliation(s)
- A Urnikyte
- Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu St. 2, LT-08661, Vilnius, Lithuania.
| | - I Domarkiene
- Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu St. 2, LT-08661, Vilnius, Lithuania
| | - S Stoma
- Master of Science (MSc), Bioinformatics student, VU University Amsterdam, Amsterdam, Netherlands
| | - L Ambrozaityte
- Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu St. 2, LT-08661, Vilnius, Lithuania
| | - I Uktveryte
- Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu St. 2, LT-08661, Vilnius, Lithuania
| | - R Meskiene
- Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu St. 2, LT-08661, Vilnius, Lithuania
| | - V Kasiulevičius
- Clinics of Internal Diseases, Family Medicine and Oncology, Faculty of Medicine, Vilnius University, Santariskiu St. 2, LT-08661, Vilnius, Lithuania
| | - N Burokiene
- Clinics of Internal Diseases, Family Medicine and Oncology, Faculty of Medicine, Vilnius University, Santariskiu St. 2, LT-08661, Vilnius, Lithuania
| | - V Kučinskas
- Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu St. 2, LT-08661, Vilnius, Lithuania
|
22
|
Whole-genome mutational burden analysis of three pluripotency induction methods. Nat Commun 2016; 7:10536. [PMID: 26892726 PMCID: PMC4762882 DOI: 10.1038/ncomms10536] [Citation(s) in RCA: 91] [Impact Index Per Article: 10.1] [Received: 09/27/2015] [Accepted: 12/23/2015] [Indexed: 01/12/2023] Open
Abstract
There is concern that the stresses of inducing pluripotency may lead to deleterious DNA mutations in induced pluripotent stem cell (iPSC) lines, which would compromise their use for cell therapies. Here we report comparative genomic analysis of nine isogenic iPSC lines generated using three reprogramming methods: integrating retroviral vectors, non-integrating Sendai virus and synthetic mRNAs. We used whole-genome sequencing and de novo genome mapping to identify single-nucleotide variants, insertions and deletions, and structural variants. Our results show a moderate number of variants in the iPSCs that were not evident in the parental fibroblasts, which may result from reprogramming. There were only small differences in the total numbers and types of variants among different reprogramming methods. Most importantly, a thorough genomic analysis showed that the variants were generally benign. We conclude that the process of reprogramming is unlikely to introduce variants that would make the cells inappropriate for therapy. It is feared that reprogramming may introduce DNA mutations. Here Bhutani et al. take three different reprogramming methods and using comparative whole genome analyses do identify nucleotide variations that are different in reprogrammed cells from the original fibroblasts, but none convey oncogenic potential.
|
23
|
Hakenberg J, Cheng WY, Thomas P, Wang YC, Uzilov AV, Chen R. Integrating 400 million variants from 80,000 human samples with extensive annotations: towards a knowledge base to analyze disease cohorts. BMC Bioinformatics 2016; 17:24. [PMID: 26746786 PMCID: PMC4706706 DOI: 10.1186/s12859-015-0865-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 07/11/2015] [Accepted: 12/17/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Data from a plethora of high-throughput sequencing studies is readily available to researchers, providing genetic variants detected in a variety of healthy and disease populations. While each individual cohort helps gain insights into polymorphic and disease-associated variants, a joint perspective can be more powerful in identifying polymorphisms, rare variants, disease-associations, genetic burden, somatic variants, and disease mechanisms. DESCRIPTION: We have set up a Reference Variant Store (RVS) containing variants observed in a number of large-scale sequencing efforts, such as 1000 Genomes, ExAC, Scripps Wellderly, UK10K; various genotyping studies; and disease association databases. RVS holds extensive annotations pertaining to affected genes, functional impacts, disease associations, and population frequencies. RVS currently stores 400 million distinct variants observed in more than 80,000 human samples. CONCLUSIONS: RVS facilitates cross-study analysis to discover novel genetic risk factors, gene-disease associations, potential disease mechanisms, and actionable variants. Due to its large reference populations, RVS can also be employed for variant filtration and gene prioritization. AVAILABILITY: A web interface to public datasets and annotations in RVS is available at https://rvs.u.hpc.mssm.edu/.
Affiliation(s)
- Jörg Hakenberg
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, Box 1498, New York, 10029, USA.
| | - Wei-Yi Cheng
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, Box 1498, New York, 10029, USA.
- Current affiliation: Illumina, Inc., 451 El Camino Real, Suite 210, Santa Clara, 95050, USA.
| | - Philippe Thomas
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, Box 1498, New York, 10029, USA.
- Current affiliation: Roche Pharma Research and Early Development, Informatics, Roche Innovation Center New York, 430 East 29th St, New York, 10016, USA.
| | - Ying-Chih Wang
- Department of Computer Science, Humboldt-Universität zu Berlin, Unter den Linden 6, Berlin, 10099, Germany.
- Current affiliation: German Research Centre for Artificial Intelligence (DFKI), Alt Moabit 91c, Berlin, 10559, Germany.
| | - Andrew V Uzilov
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, Box 1498, New York, 10029, USA.
| | - Rong Chen
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, Box 1498, New York, 10029, USA.
|
24
|
Walker LC, Wiggins GAR, Pearson JF. The Role of Constitutional Copy Number Variants in Breast Cancer. Microarrays (Basel) 2015; 4:407-23. [PMID: 27600231 PMCID: PMC4996380 DOI: 10.3390/microarrays4030407] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 07/30/2015] [Revised: 08/26/2015] [Accepted: 09/01/2015] [Indexed: 01/16/2023]
Abstract
Constitutional copy number variants (CNVs) include inherited and de novo deviations from a diploid state at a defined genomic region. These variants contribute significantly to genetic variation and disease in humans, including breast cancer susceptibility. Identification of genetic risk factors for breast cancer in recent years has been dominated by the use of genome-wide technologies, such as single nucleotide polymorphism (SNP)-arrays, with a significant focus on single nucleotide variants. To date, these large datasets have been underutilised for generating genome-wide CNV profiles despite offering a massive resource for assessing the contribution of these structural variants to breast cancer risk. Technical challenges remain in determining the location and distribution of CNVs across the human genome due to the accuracy of computational prediction algorithms and resolution of the array data. Moreover, better methods are required for interpreting the functional effect of newly discovered CNVs. In this review, we explore current and future application of SNP array technology to assess rare and common CNVs in association with breast cancer risk in humans.
Affiliation(s)
- Logan C Walker
- Mackenzie Cancer Research Group, Department of Pathology, University of Otago, Christchurch 8140, New Zealand.
- George A R Wiggins
- Mackenzie Cancer Research Group, Department of Pathology, University of Otago, Christchurch 8140, New Zealand.
- John F Pearson
- Biostatistics and Computational Biology Unit, University of Otago, Christchurch 8140, New Zealand.
25
Wang J, Liao J, Zhang J, Cheng WY, Hakenberg J, Ma M, Webb BD, Ramasamudram-Chakravarthi R, Karger L, Mehta L, Kornreich R, Diaz GA, Li S, Edelmann L, Chen R. ClinLabGeneticist: a tool for clinical management of genetic variants from whole exome sequencing in clinical genetic laboratories. Genome Med 2015; 7:77. [PMID: 26338694 PMCID: PMC4558641 DOI: 10.1186/s13073-015-0207-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/13/2015] [Accepted: 07/16/2015] [Indexed: 01/09/2023] Open
Abstract
Routine clinical application of whole exome sequencing remains challenging due to difficulties in variant interpretation, large dataset management, and workflow integration. We describe a tool named ClinLabGeneticist to implement a workflow in clinical laboratories for management of variant assessment in genetic testing and disease diagnosis. We established an extensive variant annotation data source for the identification of pathogenic variants. A dashboard was deployed to aid a multi-step, hierarchical review process leading to final clinical decisions on genetic variant assessment. In addition, a central database was built to archive all of the genetic testing data, notes, and comments throughout the review process, variant validation data by Sanger sequencing as well as the final clinical reports for future reference. The entire workflow including data entry, distribution of work assignments, variant evaluation and review, selection of variants for validation, report generation, and communications between various personnel is integrated into a single data management platform. Three case studies are presented to illustrate the utility of ClinLabGeneticist. ClinLabGeneticist is freely available to academia at http://rongchenlab.org/software/clinlabgeneticist .
Affiliation(s)
- Jinlian Wang
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Jun Liao
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Jinglan Zhang
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Wei-Yi Cheng
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Jörg Hakenberg
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Meng Ma
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Bryn D Webb
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Rajasekar Ramasamudram-Chakravarthi
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Lisa Karger
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Lakshmi Mehta
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Ruth Kornreich
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- George A Diaz
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Shuyu Li
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Lisa Edelmann
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Rong Chen
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
26
Zhang Y, Yu Z, Ban R, Zhang H, Iqbal F, Zhao A, Li A, Shi Q. DeAnnCNV: a tool for online detection and annotation of copy number variations from whole-exome sequencing data. Nucleic Acids Res 2015; 43:W289-94. [PMID: 26013811 PMCID: PMC4489280 DOI: 10.1093/nar/gkv556] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 02/07/2015] [Revised: 04/30/2015] [Accepted: 05/15/2015] [Indexed: 01/08/2023] Open
Abstract
With the decrease in costs, whole-exome sequencing (WES) has become a very popular and powerful tool for the identification of genetic variants underlying human diseases. However, integrated tools to precisely detect and systematically annotate copy number variations (CNVs) from WES data are still in great demand. Here, we present an online tool, DeAnnCNV (Detection and Annotation of Copy Number Variations from WES data), to meet the current demands of WES users. Upon submitting the file generated from WES data by an in-house tool that can be downloaded from our server, DeAnnCNV can detect CNVs in each sample and extract the shared CNVs among multiple samples. DeAnnCNV also provides additional useful supporting information for the detected CNVs and associated genes to help users to find the potential candidates for further experimental study. The web server is implemented in PHP + Perl + MATLAB and is online available to all users for free at http://mcg.ustc.edu.cn/db/cnv/.
Affiliation(s)
- Yuanwei Zhang
- Molecular and Cell Genetics Laboratory, The CAS Key Laboratory of Innate Immunity and Chronic Disease, Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science and Technology of China, Hefei 230027, China
- Zhenhua Yu
- School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China
- Rongjun Ban
- School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China
- Huan Zhang
- Molecular and Cell Genetics Laboratory, The CAS Key Laboratory of Innate Immunity and Chronic Disease, Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science and Technology of China, Hefei 230027, China
- Furhan Iqbal
- Molecular and Cell Genetics Laboratory, The CAS Key Laboratory of Innate Immunity and Chronic Disease, Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science and Technology of China, Hefei 230027, China; Institute of Pure and Applied Biology, Bahauddin Zakariya University Multan, 60800, Pakistan
- Aiwu Zhao
- Hefei Institute of Physical Science, China Academy of Science, Hefei 230027, China
- Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China; Research Centers for Biomedical Engineering, University of Science and Technology of China, Hefei 230027, China
- Qinghua Shi
- Molecular and Cell Genetics Laboratory, The CAS Key Laboratory of Innate Immunity and Chronic Disease, Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science and Technology of China, Hefei 230027, China