1. Ha TT, Burgess R, Newman M, Moey C, Mandelstam SA, Gardner AE, Ivancevic AM, Pham D, Kumar R, Smith N, Patel C, Malone S, Ryan MM, Calvert S, van Eyk CL, Lardelli M, Berkovic SF, Leventer RJ, Richards LJ, Scheffer IE, Gecz J, Corbett MA. Aicardi Syndrome Is a Genetically Heterogeneous Disorder. Genes (Basel) 2023; 14:1565. [PMID: 37628618 PMCID: PMC10454071 DOI: 10.3390/genes14081565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 07/25/2023] [Accepted: 07/26/2023] [Indexed: 08/27/2023] Open
Abstract
Aicardi Syndrome (AIC) is a rare neurodevelopmental disorder recognized by the classical triad of agenesis of the corpus callosum, chorioretinal lacunae and infantile epileptic spasms syndrome. The diagnostic criteria of AIC were revised in 2005 to include additional phenotypes that are frequently observed in this patient group. AIC has been traditionally considered as X-linked and male lethal because it almost exclusively affects females. Despite numerous genetic and genomic investigations on AIC, a unifying X-linked cause has not been identified. Here, we performed exome and genome sequencing of 10 females with AIC or suspected AIC based on current criteria. We identified a unique de novo variant, each in different genes: KMT2B, SLF1, SMARCB1, SZT2 and WNT8B, in five of these females. Notably, genomic analyses of coding and non-coding single nucleotide variants, short tandem repeats and structural variation highlighted a distinct lack of X-linked candidate genes. We assessed the likely pathogenicity of our candidate autosomal variants using the TOPflash assay for WNT8B and morpholino knockdown in zebrafish (Danio rerio) embryos for other candidates. We show expression of Wnt8b and Slf1 are restricted to clinically relevant cortical tissues during mouse development. Our findings suggest that AIC is genetically heterogeneous with implicated genes converging on molecular pathways central to cortical development.
Affiliation(s)
- Thuong T. Ha
  - School of Biological Sciences, Faculty of Science, University of Adelaide, Adelaide, SA 5005, Australia
  - Department of Genetics and Molecular Pathology, Centre for Cancer Biology, An Alliance between SA Pathology and the University of South Australia, Adelaide, SA 5000, Australia
- Rosemary Burgess
  - Epilepsy Research Centre, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Austin Health, Heidelberg, VIC 3084, Australia
- Morgan Newman
  - Alzheimer’s Disease Genetics Laboratory, School of Biological Sciences, Faculty of Science, University of Adelaide, Adelaide, SA 5005, Australia
- Ching Moey
  - The Queensland Brain Institute, The School of Biomedical Sciences, Faculty of Medicine, The University of Queensland, Brisbane, QLD 4000, Australia
- Simone A. Mandelstam
  - Department of Paediatrics, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Parkville, VIC 3052, Australia
  - Department of Medical Imaging, The Royal Children’s Hospital, Melbourne, VIC 3052, Australia
- Alison E. Gardner
  - Adelaide Medical School and Robinson Research Institute, Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia
- Atma M. Ivancevic
  - Department of Molecular, Cellular, and Developmental Biology, College of Arts and Sciences, University of Colorado, Boulder, CO 80309, USA
- Duyen Pham
  - Adelaide Medical School and Robinson Research Institute, Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia
- Raman Kumar
  - Adelaide Medical School and Robinson Research Institute, Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia
- Nicholas Smith
  - Adelaide Medical School and Robinson Research Institute, Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia
  - Department of Neurology, Women’s and Children’s Hospital, North Adelaide, SA 5006, Australia
- Chirag Patel
  - Genetic Health Queensland, Royal Brisbane and Women’s Hospital, Herston, QLD 4029, Australia
- Stephen Malone
  - Queensland Children’s Hospital, South Brisbane, QLD 4101, Australia
- Monique M. Ryan
  - Department of Paediatrics, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Parkville, VIC 3052, Australia
  - Department of Neurology, The Royal Children’s Hospital, Parkville, VIC 3052, Australia
  - Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia
- Sophie Calvert
  - Department of Neurosciences, Queensland Children’s Hospital, South Brisbane, QLD 4101, Australia
- Clare L. van Eyk
  - Adelaide Medical School and Robinson Research Institute, Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia
- Michael Lardelli
  - Alzheimer’s Disease Genetics Laboratory, School of Biological Sciences, Faculty of Science, University of Adelaide, Adelaide, SA 5005, Australia
- Samuel F. Berkovic
  - Epilepsy Research Centre, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Austin Health, Heidelberg, VIC 3084, Australia
- Richard J. Leventer
  - Department of Paediatrics, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Parkville, VIC 3052, Australia
  - Department of Neurology, The Royal Children’s Hospital, Parkville, VIC 3052, Australia
  - Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia
- Linda J. Richards
  - The Queensland Brain Institute, The School of Biomedical Sciences, Faculty of Medicine, The University of Queensland, Brisbane, QLD 4000, Australia
  - Department of Neuroscience, School of Medicine, Washington University, St Louis, MO 63110, USA
- Ingrid E. Scheffer
  - Epilepsy Research Centre, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Austin Health, Heidelberg, VIC 3084, Australia
  - Department of Paediatrics, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Parkville, VIC 3052, Australia
  - Department of Neurology, The Royal Children’s Hospital, Parkville, VIC 3052, Australia
  - Murdoch Children’s Research Institute, Parkville, VIC 3052, Australia
  - Florey Institute of Neuroscience and Mental Health, Parkville, VIC 3052, Australia
- Jozef Gecz
  - School of Biological Sciences, Faculty of Science, University of Adelaide, Adelaide, SA 5005, Australia
  - Adelaide Medical School and Robinson Research Institute, Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia
  - South Australian Health and Medical Research Institute, Adelaide, SA 5000, Australia
- Mark A. Corbett
  - Adelaide Medical School and Robinson Research Institute, Faculty of Health and Medical Sciences, University of Adelaide, Adelaide, SA 5005, Australia
2. Baltoumas FA, Karatzas E, Paez-Espino D, Venetsianou NK, Aplakidou E, Oulas A, Finn RD, Ovchinnikov S, Pafilis E, Kyrpides NC, Pavlopoulos GA. Exploring microbial functional biodiversity at the protein family level-From metagenomic sequence reads to annotated protein clusters. FRONTIERS IN BIOINFORMATICS 2023; 3:1157956. [PMID: 36959975 PMCID: PMC10029925 DOI: 10.3389/fbinf.2023.1157956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 02/21/2023] [Indexed: 03/06/2023] Open
Abstract
Metagenomics has enabled accessing the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such as predicted protein sequences or families. In this article, we provide a thorough review to simplify such processes and discuss the alternative methodologies that can be followed in order to explore biodiversity at the protein family level. We provide details for analysis tools and we comment on their scalability as well as their advantages and disadvantages. Finally, we report the available data repositories and recommend various approaches for protein family annotation related to phylogenetic distribution, structure prediction and metadata enrichment.
Affiliation(s)
- Fotis A. Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
  - *Correspondence: Fotis A. Baltoumas; Nikos C. Kyrpides; Georgios A. Pavlopoulos
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- David Paez-Espino
  - Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
- Nefeli K. Venetsianou
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Eleni Aplakidou
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Anastasis Oulas
  - The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Robert D. Finn
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Sergey Ovchinnikov
  - John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, United States
- Evangelos Pafilis
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
- Nikos C. Kyrpides
  - Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
- Georgios A. Pavlopoulos
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
  - Center of New Biotechnologies and Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Athens, Greece
  - Hellenic Army Academy, Vari, Greece
3. Detection of repeat expansions in large next generation DNA and RNA sequencing data without alignment. Sci Rep 2022; 12:13124. [PMID: 35907931 PMCID: PMC9338934 DOI: 10.1038/s41598-022-17267-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/25/2022] [Accepted: 07/22/2022] [Indexed: 11/10/2022] Open
Abstract
Bioinformatic methods for detecting short tandem repeat expansions in short-read sequencing have identified new repeat expansions in humans, but require alignment information to identify repetitive motif enrichment at genomic locations. We present superSTR, an ultrafast method that does not require alignment. superSTR is used to process whole-genome and whole-exome sequencing data, and perform the first STR analysis of the UK Biobank, efficiently screening and identifying known and potential disease-associated STRs in the exomes of 49,953 biobank participants. We demonstrate the first bioinformatic screening of RNA sequencing data to detect repeat expansions in humans and mouse models of ataxia and dystrophy.
4. Schröder C, Horsthemke B, Depienne C. GC-rich repeat expansions: associated disorders and mechanisms. MED GENET-BERLIN 2022. [DOI: 10.1515/medgen-2021-2099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Noncoding repeat expansions are a well-known cause of genetic disorders mainly affecting the central nervous system. Missed by most standard technologies used in routine diagnosis, pathogenic noncoding repeat expansions have to be searched for using specific techniques such as repeat-primed PCR or specific bioinformatics tools applied to genome data, such as ExpansionHunter. In this review, we focus on GC-rich repeat expansions, which represent at least one third of all noncoding repeat expansions described so far. GC-rich expansions are mainly located in regulatory regions (promoter, 5′ untranslated region, first intron) of genes and can lead to either a toxic gain-of-function mediated by RNA toxicity and/or repeat-associated non-AUG (RAN) translation, or a loss-of-function of the associated gene, depending on their size and their methylation status. We herein review the clinical and molecular characteristics of disorders associated with these difficult-to-detect expansions.
Affiliation(s)
- Christopher Schröder
  - Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Bernhard Horsthemke
  - Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Christel Depienne
  - Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
5. Neurodegenerative diseases associated with non-coding CGG tandem repeat expansions. Nat Rev Neurol 2022; 18:145-157. [PMID: 35022573 DOI: 10.1038/s41582-021-00612-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Accepted: 12/15/2021] [Indexed: 02/07/2023]
Abstract
Non-coding CGG repeat expansions cause multiple neurodegenerative disorders, including fragile X-associated tremor/ataxia syndrome, neuronal intranuclear inclusion disease, oculopharyngeal myopathy with leukodystrophy, and oculopharyngodistal myopathy. The underlying genetic causes of several of these diseases have been identified only in the past 2-3 years. These expansion disorders have substantial overlapping clinical, neuroimaging and histopathological features. The shared features suggest common mechanisms that could have implications for the development of therapies for this group of diseases - similar therapeutic strategies or drugs may be effective for various neurodegenerative disorders induced by non-coding CGG expansions. In this Review, we provide an overview of clinical and pathological features of these CGG repeat expansion diseases and consider the likely pathological mechanisms, including RNA toxicity, CGG repeat-associated non-AUG-initiated translation, protein aggregation and mitochondrial impairment. We then discuss future research needed to improve the identification and diagnosis of CGG repeat expansion diseases, to improve modelling of these diseases and to understand their pathogenesis. We also consider possible therapeutic strategies. Finally, we propose that CGG repeat expansion diseases may represent manifestations of a single underlying neuromyodegenerative syndrome in which different organs are affected to different extents depending on the gene location of the repeat expansion.
6. Loureiro JR, Castro AF, Figueiredo AS, Silveira I. Molecular Mechanisms in Pentanucleotide Repeat Diseases. Cells 2022; 11:205. [PMID: 35053321 PMCID: PMC8773600 DOI: 10.3390/cells11020205] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/10/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 02/01/2023] Open
Abstract
The number of neurodegenerative diseases resulting from repeat expansion has increased extraordinarily in recent years. In several of these pathologies, the repeat can be transcribed in RNA from both DNA strands producing, at least, one toxic RNA repeat that causes neurodegeneration by a complex mechanism. Recently, seven diseases have been found caused by a novel intronic pentanucleotide repeat in distinct genes encoding proteins highly expressed in the cerebellum. These disorders are clinically heterogeneous being characterized by impaired motor function, resulting from ataxia or epilepsy. The role that apparently normal proteins from these mutant genes play in these pathologies is not known. However, recent advances in previously known spinocerebellar ataxias originated by abnormal non-coding pentanucleotide repeats point to a gain of a toxic function by the pathogenic repeat-containing RNA that abnormally forms nuclear foci with RNA-binding proteins. In cells, RNA foci have been shown to be formed by phase separation. Moreover, the field of repeat expansions has lately achieved an extraordinary progress with the discovery that RNA repeats, polyglutamine, and polyalanine proteins are crucial for the formation of nuclear membraneless organelles by phase separation, which is perturbed when they are expanded. This review will cover the amazing advances on repeat diseases.
Affiliation(s)
- Joana R. Loureiro
  - Genetics of Cognitive Dysfunction Laboratory, i3S- Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135 Porto, Portugal
  - Institute for Molecular and Cell Biology, Universidade do Porto, 4200-135 Porto, Portugal
- Ana F. Castro
  - Genetics of Cognitive Dysfunction Laboratory, i3S- Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135 Porto, Portugal
  - Institute for Molecular and Cell Biology, Universidade do Porto, 4200-135 Porto, Portugal
  - Instituto de Ciências Biomédicas Abel Salazar, Universidade do Porto, 4050-313 Porto, Portugal
- Ana S. Figueiredo
  - Genetics of Cognitive Dysfunction Laboratory, i3S- Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135 Porto, Portugal
  - Institute for Molecular and Cell Biology, Universidade do Porto, 4200-135 Porto, Portugal
  - Instituto de Ciências Biomédicas Abel Salazar, Universidade do Porto, 4050-313 Porto, Portugal
- Isabel Silveira
  - Genetics of Cognitive Dysfunction Laboratory, i3S- Instituto de Investigação e Inovação em Saúde, Universidade do Porto, 4200-135 Porto, Portugal
  - Institute for Molecular and Cell Biology, Universidade do Porto, 4200-135 Porto, Portugal
  - Correspondence: Tel.: +351-2240-8800
7. Gall-Duncan T, Sato N, Yuen RKC, Pearson CE. Advancing genomic technologies and clinical awareness accelerates discovery of disease-associated tandem repeat sequences. Genome Res 2022; 32:1-27. [PMID: 34965938 PMCID: PMC8744678 DOI: 10.1101/gr.269530.120] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 07/29/2020] [Accepted: 11/29/2021] [Indexed: 11/25/2022]
Abstract
Expansions of gene-specific DNA tandem repeats (TRs), first described in 1991 as a disease-causing mutation in humans, are now known to cause >60 phenotypes, not just disease, and not only in humans. TRs are a common form of genetic variation with biological consequences, observed, so far, in humans, dogs, plants, oysters, and yeast. Repeat diseases show atypical clinical features, genetic anticipation, and multiple and partially penetrant phenotypes among family members. Discovery of disease-causing repeat expansion loci accelerated through technological advances in DNA sequencing and computational analyses. Between 2019 and 2021, 17 new disease-causing TR expansions were reported, totaling 63 TR loci (>69 diseases), with a likelihood of more discoveries, and in more organisms. Recent and historical lessons reveal that properly assessed clinical presentations, coupled with genetic and biological awareness, can guide discovery of disease-causing unstable TRs. We highlight critical but underrecognized aspects of TR mutations. Repeat motifs may not be present in current reference genomes but will be in forthcoming gapless long-read references. Repeat motif size can be a single nucleotide to kilobases/unit. At a given locus, repeat motif sequence purity can vary with consequence. Pathogenic repeats can be "insertions" within nonpathogenic TRs. Expansions, contractions, and somatic length variations of TRs can have clinical/biological consequences. TR instabilities occur in humans and other organisms. TRs can be epigenetically modified and/or chromosomal fragile sites. We discuss the expanding field of disease-associated TR instabilities, highlighting prospects, clinical and genetic clues, tools, and challenges for further discoveries of disease-causing TR instabilities and understanding their biological and pathological impacts-a vista that is about to expand.
Affiliation(s)
- Terence Gall-Duncan
  - Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Nozomu Sato
  - Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
- Ryan K C Yuen
  - Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Christopher E Pearson
  - Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
8. Zou D, Wang L, Liao J, Xiao H, Duan J, Zhang T, Li J, Yin Z, Zhou J, Yan H, Huang Y, Zhan N, Yang Y, Ye J, Chen F, Zhu S, Wen F, Guo J. Genome sequencing of 320 Chinese children with epilepsy: a clinical and molecular study. Brain 2021; 144:3623-3634. [PMID: 34145886 PMCID: PMC8719847 DOI: 10.1093/brain/awab233] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 10/27/2020] [Revised: 05/25/2021] [Accepted: 06/05/2021] [Indexed: 02/05/2023] Open
Abstract
The aim of this study is to evaluate the diagnostic value of genome sequencing in children with epilepsy, and to provide genome sequencing-based insights into the molecular genetic mechanisms of epilepsy to help establish accurate diagnoses, design appropriate treatments and assist in genetic counselling. We performed genome sequencing on 320 Chinese children with epilepsy, and interpreted single-nucleotide variants and copy number variants of all samples. The complete pedigree and clinical data of the probands were established and followed up. The clinical phenotypes, treatments, prognoses and genotypes of the patients were analysed. Age at seizure onset ranged from 1 day to 17 years, with a median of 4.3 years. Pathogenic/likely pathogenic variants were found in 117 of the 320 children (36.6%), of whom 93 (29.1%) had single-nucleotide variants, 22 (6.9%) had copy number variants and two had both single-nucleotide variants and copy number variants. Single-nucleotide variants were most frequently found in SCN1A (10/95, 10.5%), which is associated with Dravet syndrome, followed by PRRT2 (8/95, 8.4%), which is associated with benign familial infantile epilepsy, and TSC2 (7/95, 7.4%), which is associated with tuberous sclerosis. Among the copy number variants, there were three with a length <25 kilobases. The most common recurrent copy number variants were 17p13.3 deletions (5/24, 20.8%), 16p11.2 deletions (4/24, 16.7%), and 7q11.23 duplications (2/24, 8.3%), which are associated with epilepsy, developmental retardation and congenital abnormalities. Four particular 16p11.2 deletions and two 15q11.2 deletions were considered to be susceptibility factors contributing to neurodevelopmental disorders associated with epilepsy. The diagnostic yield was 75.0% in patients with seizure onset during the first postnatal month, and gradually decreased in patients with seizure onset at a later age. 
Forty-two patients (13.1%) were found to be specifically treatable for the underlying genetic cause identified by genome sequencing. Three of them received corresponding targeted therapies and demonstrated favourable prognoses. Genome sequencing provides complete genetic diagnosis, thus enabling individualized treatment and genetic counselling for the parents of the patients. Genome sequencing is expected to become the first choice of methods for genetic testing of patients with epilepsy.
Affiliation(s)
- Dongfang Zou
  - Department of Neurology, Shenzhen Children’s Hospital, Shenzhen, China
- Lin Wang
  - BGI-Shenzhen, Shenzhen 518083, China
- Jianxiang Liao
  - Department of Neurology, Shenzhen Children’s Hospital, Shenzhen, China
- Jing Duan
  - Department of Neurology, Shenzhen Children’s Hospital, Shenzhen, China
- Jing Zhou
  - BGI-Shenzhen, Shenzhen 518083, China
- Ying Yang
  - BGI-Shenzhen, Shenzhen 518083, China
- Jingyu Ye
  - BGI-Shenzhen, Shenzhen 518083, China
- Fang Chen
  - BGI-Shenzhen, Shenzhen 518083, China
- Shida Zhu
  - BGI-Shenzhen, Shenzhen 518083, China
- Feiqiu Wen
  - Department of Hematology and Oncology, Shenzhen Children’s Hospital, Shenzhen, China
  - Correspondence may also be addressed to: Feiqiu Wen, Shenzhen Children’s Hospital, No. 7019 Yitian Road, Shenzhen 518038, Guangdong, China
- Jian Guo
  - BGI-Shenzhen, Shenzhen 518083, China
  - Correspondence to: Jian Guo, BGI-Shenzhen, Beishan Industry Zone, Shenzhen 518083, Guangdong, China
9. Morishita S, Ichikawa K, Myers EW. Finding long tandem repeats in long noisy reads. Bioinformatics 2021; 37:612-621. [PMID: 33031558 PMCID: PMC8097686 DOI: 10.1093/bioinformatics/btaa865] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/24/2020] [Revised: 09/07/2020] [Accepted: 09/23/2020] [Indexed: 11/13/2022] Open
Abstract
Motivation: Long tandem repeat expansions of more than 1000 nt have been suggested to be associated with diseases, but remain largely unexplored in individual human genomes because read lengths have been too short. However, new long-read sequencing technologies can produce single reads of 10 000 nt or more that can span such repeat expansions, although these long reads have high error rates, of 10–20%, which complicates the detection of repetitive elements. Moreover, most traditional algorithms for finding tandem repeats are designed to find short tandem repeats (<1000 nt) and cannot effectively handle the high error rate of long reads in a reasonable amount of time.
Results: Here, we report an efficient algorithm for solving this problem that takes advantage of the length of the repeat. Namely, a long tandem repeat has hundreds or thousands of approximate copies of the repeated unit, so despite the error rate, many short k-mers will be error-free in many copies of the unit. We exploited this characteristic to develop a method for first estimating regions that could contain a tandem repeat, by analyzing the k-mer frequency distributions of fixed-size windows across the target read, followed by an algorithm that assembles the k-mers of a putative region into the consensus repeat unit by greedily traversing a de Bruijn graph. Experimental results indicated that the proposed algorithm largely outperformed Tandem Repeats Finder, a widely used program for finding tandem repeats, in terms of sensitivity.
Availability and implementation: https://github.com/morisUtokyo/mTR.
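The first step described in this abstract, screening fixed-size windows of a read by their k-mer frequency distribution, can be illustrated with a minimal Python sketch. This is not the mTR implementation: the function names `repeat_fraction` and `candidate_windows`, the scoring rule (the share of k-mer occurrences whose k-mer appears more than once in the window), and the default window size, k and threshold are all assumptions chosen for illustration, and the de Bruijn graph consensus step is not covered.

```python
from collections import Counter


def repeat_fraction(window: str, k: int) -> float:
    """Share of k-mer occurrences in `window` whose k-mer occurs more than once."""
    total = len(window) - k + 1
    if total <= 0:
        return 0.0
    counts = Counter(window[i:i + k] for i in range(total))
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / total


def candidate_windows(read: str, window_size: int = 200, k: int = 5,
                      threshold: float = 0.8):
    """Return (start, end, score) for non-overlapping windows that look repetitive.

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    hits = []
    last_start = max(len(read) - window_size + 1, 1)
    for start in range(0, last_start, window_size):
        window = read[start:start + window_size]
        score = repeat_fraction(window, k)
        if score >= threshold:
            hits.append((start, start + len(window), score))
    return hits
```

In a window made of many copies of one unit, only a handful of distinct k-mers occur and each occurs many times, so the score approaches 1; in non-repetitive sequence most k-mers occur once and the score stays low. A real tool working on noisy long reads would need an error-tolerant threshold and overlapping windows.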
Affiliation(s)
- Shinichi Morishita
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8562, Japan
- Kazuki Ichikawa
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8562, Japan
- Eugene W Myers
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Saxony 01307, Germany
  - Center for Systems Biology Dresden, Dresden, Saxony 01307, Germany
10. Depienne C, Mandel JL. 30 years of repeat expansion disorders: What have we learned and what are the remaining challenges? Am J Hum Genet 2021; 108:764-785. [PMID: 33811808 DOI: 10.1016/j.ajhg.2021.03.011] [Citation(s) in RCA: 138] [Impact Index Per Article: 46.0] [Received: 12/15/2020] [Accepted: 03/05/2021] [Indexed: 12/13/2022] Open
Abstract
Tandem repeats represent one of the most abundant class of variations in human genomes, which are polymorphic by nature and become highly unstable in a length-dependent manner. The expansion of repeat length across generations is a well-established process that results in human disorders mainly affecting the central nervous system. At least 50 disorders associated with expansion loci have been described to date, with half recognized only in the last ten years, as prior methodological difficulties limited their identification. These limitations still apply to the current widely used molecular diagnostic methods (exome or gene panels) and thus result in missed diagnosis detrimental to affected individuals and their families, especially for disorders that are very rare and/or clinically not recognizable. Most of these disorders have been identified through family-driven approaches and many others likely remain to be identified. The recent development of long-read technologies provides a unique opportunity to systematically investigate the contribution of tandem repeats and repeat expansions to the genetic architecture of human disorders. In this review, we summarize the current and most recent knowledge about the genetics of repeat expansion disorders and the diversity of their pathophysiological mechanisms and outline the perspectives of developing personalized treatments in the future.
11. Field MJ, Kumar R, Hackett A, Kayumi S, Shoubridge CA, Ewans LJ, Ivancevic AM, Dudding-Byth T, Carroll R, Kroes T, Gardner AE, Sullivan P, Ha TT, Schwartz CE, Cowley MJ, Dinger ME, Palmer EE, Christie L, Shaw M, Roscioli T, Gecz J, Corbett MA. Different types of disease-causing noncoding variants revealed by genomic and gene expression analyses in families with X-linked intellectual disability. Hum Mutat 2021; 42:835-847. [PMID: 33847015 DOI: 10.1002/humu.24207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2020] [Revised: 03/19/2021] [Accepted: 04/08/2021] [Indexed: 11/06/2022]
Abstract
The pioneering discovery research of X-linked intellectual disability (XLID) genes has benefitted thousands of individuals worldwide; however, approximately 30% of XLID families still remain unresolved. We postulated that noncoding variants that affect gene regulation or splicing may account for the lack of a genetic diagnosis in some cases. Detecting pathogenic, gene-regulatory variants with the same sensitivity and specificity as structural and coding variants is a major challenge for Mendelian disorders. Here, we describe three pedigrees with suggestive XLID where distinctive phenotypes associated with known genes guided the identification of three different noncoding variants. We used comprehensive structural, single-nucleotide, and repeat expansion analyses of genome sequencing. RNA-Seq from patient-derived cell lines, reverse-transcription polymerase chain reactions, Western blots, and reporter gene assays were used to confirm the functional effect of three fundamentally different classes of pathogenic noncoding variants: a retrotransposon insertion, a novel intronic splice donor, and a canonical splice variant of an untranslated exon. In one family, we excluded a rare coding variant in ARX, a known XLID gene, in favor of a regulatory noncoding variant in OFD1 that correlated with the clinical phenotype. Our results underscore the value of genomic research on unresolved XLID families to aid novel, pathogenic noncoding variant discovery.
Affiliation(s)
- Michael J Field
- NSW Genetics of Learning Disability Service, Newcastle, New South Wales, Australia
| | - Raman Kumar
- Adelaide Medical School and Robinson Research Institute, University of Adelaide, Adelaide, South Australia, Australia
| | - Anna Hackett
- NSW Genetics of Learning Disability Service, Newcastle, New South Wales, Australia.,School of Biomedical Sciences and Pharmacy, University of Newcastle, Newcastle, New South Wales, Australia
| | - Sayaka Kayumi
- Adelaide Medical School and Robinson Research Institute, University of Adelaide, Adelaide, South Australia, Australia
| | - Cheryl A Shoubridge
- Adelaide Medical School and Robinson Research Institute, University of Adelaide, Adelaide, South Australia, Australia
| | - Lisa J Ewans
- St Vincent's Clinical School, University of New South Wales, Darlinghurst, Australia.,Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, New South Wales, Australia
| | - Atma M Ivancevic
- Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, Colorado, USA
| | - Tracy Dudding-Byth
- NSW Genetics of Learning Disability Service, Newcastle, New South Wales, Australia.,School of Biomedical Sciences and Pharmacy, University of Newcastle, Newcastle, New South Wales, Australia
| | - Renée Carroll
- Adelaide Medical School and Robinson Research Institute, University of Adelaide, Adelaide, South Australia, Australia
| | - Thessa Kroes
- Adelaide Medical School and Robinson Research Institute, University of Adelaide, Adelaide, South Australia, Australia
| | - Alison E Gardner
- Adelaide Medical School and Robinson Research Institute, University of Adelaide, Adelaide, South Australia, Australia
| | - Patricia Sullivan
- Children's Cancer Institute, University of New South Wales, Kensington, New South Wales, Australia
| | - Thuong T Ha
- Molecular Pathology Department, Centre for Cancer Biology, SA Pathology, Adelaide, South Australia, Australia
| | | | - Mark J Cowley
- NSW Genetics of Learning Disability Service, Newcastle, New South Wales, Australia.,Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, New South Wales, Australia.,Children's Cancer Institute, University of New South Wales, Kensington, New South Wales, Australia
| | - Marcel E Dinger
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Kensington, New South Wales, Australia
| | - Elizabeth E Palmer
- NSW Genetics of Learning Disability Service, Newcastle, New South Wales, Australia.,School of Women's and Children's Health, University of New South Wales, Kensington, Sydney, New South Wales, Australia
| | - Louise Christie
- NSW Genetics of Learning Disability Service, Newcastle, New South Wales, Australia
| | - Marie Shaw
- Adelaide Medical School and Robinson Research Institute, University of Adelaide, Adelaide, South Australia, Australia
| | - Tony Roscioli
- NeuRA, University of New South Wales, Sydney, New South Wales, Australia.,Centre for Clinical Genetics, Sydney Children's Hospital, Randwick, Sydney, New South Wales, Australia
| | - Jozef Gecz
- Adelaide Medical School and Robinson Research Institute, University of Adelaide, Adelaide, South Australia, Australia.,South Australian Health and Medical Research Institute, Adelaide, South Australia, Australia
| | - Mark A Corbett
- Adelaide Medical School and Robinson Research Institute, University of Adelaide, Adelaide, South Australia, Australia
|
12
|
Chen H, Lu Y, Lu D, Xu S. Y-LineageTracker: a high-throughput analysis framework for Y-chromosomal next-generation sequencing data. BMC Bioinformatics 2021; 22:114. [PMID: 33750289 PMCID: PMC7941695 DOI: 10.1186/s12859-021-04057-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 12/02/2020] [Accepted: 02/28/2021] [Indexed: 01/25/2023]
Abstract
BACKGROUND Y-chromosome DNA (Y-DNA) has been used for tracing paternal lineages and offers a clear path from an individual to a known, or likely, direct paternal ancestor. Advances in next-generation sequencing (NGS) technologies continue to improve the resolution of the non-recombining region of the Y-chromosome (NRY). However, a lack of suitable computational tools hinders the use of NGS data in Y-DNA studies. RESULTS We developed Y-LineageTracker, a high-throughput analysis framework that not only utilizes state-of-the-art methodologies to automatically determine NRY haplogroups and identify microsatellite variants of the Y-chromosome on a fine scale, but also optimizes comprehensive Y-DNA analysis methods for NGS data. Notably, Y-LineageTracker integrates the NRY haplogroup and Y-STR analysis modules with recognized strategies to robustly suggest an interpretation for paternal genetics and evolution. The NRY haplogroup module mainly covers haplogroup classification, clustering analysis, phylogeny construction, and divergence time estimation of NRY haplogroups, and the Y-STR module mainly includes Y-STR genotyping, statistical calculation, network analysis, and estimation of time to the most recent common ancestor (TMRCA) based on Y-STR haplotypes. Performance comparison indicated that Y-LineageTracker outperformed existing Y-DNA analysis tools, offering high performance and satisfactory visualization. CONCLUSIONS Y-LineageTracker is an open-source and user-friendly command-line tool that provides multiple functions to efficiently analyze Y-DNA from NGS data at both the Y-SNP and Y-STR levels. Additionally, Y-LineageTracker supports various formats of input data and produces high-quality figures suitable for publication. Y-LineageTracker is coded in Python3, supports Windows, Linux, and macOS platforms, and can be installed manually or via the Python Package Index (PyPI).
The source code, examples, and manual of Y-LineageTracker are freely available at https://www.picb.ac.cn/PGG/resource.php or CodeOcean ( https://codeocean.com/capsule/7424381/tree ).
Affiliation(s)
- Hao Chen
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Yan Lu
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- School of Life Sciences, Fudan University, Shanghai, 200433, China
| | - Dongsheng Lu
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Shuhua Xu
- Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China.
- School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China.
- Henan Institute of Medical and Pharmaceutical Sciences, Zhengzhou University, Zhengzhou, 450052, China.
- Collaborative Innovation Center of Genetics and Development, Fudan University, Shanghai, 200438, China.
|
13
|
Bolognini D, Magi A, Benes V, Korbel JO, Rausch T. TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data. Gigascience 2020; 9:giaa101. [PMID: 33034633 PMCID: PMC7539535 DOI: 10.1093/gigascience/giaa101] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/04/2020] [Revised: 08/07/2020] [Accepted: 09/07/2020] [Indexed: 12/30/2022]
Abstract
BACKGROUND Tandem repeat sequences are widespread in the human genome, and their expansions cause multiple repeat-mediated disorders. Genome-wide discovery approaches are needed to fully elucidate their roles in health and disease, but resolving tandem repeat variation accurately remains a challenging task. While traditional mapping-based approaches using short-read data have severe limitations in the size and type of tandem repeats they can resolve, recent third-generation sequencing technologies exhibit substantially higher sequencing error rates, which complicates repeat resolution. RESULTS We developed TRiCoLOR, a freely available tool for tandem repeat profiling using error-prone long reads from third-generation sequencing technologies. The method can identify repetitive regions in sequencing data without a prior knowledge of their motifs or locations and resolve repeat multiplicity and period size in a haplotype-specific manner. The tool includes methods to interactively visualize the identified repeats and to trace their Mendelian consistency in pedigrees. CONCLUSIONS TRiCoLOR demonstrates excellent performance and improved sensitivity and specificity compared with alternative tools on synthetic data. For real human whole-genome sequencing data, TRiCoLOR achieves high validation rates, suggesting its suitability to identify tandem repeat variation in personal genomes.
Affiliation(s)
- Davide Bolognini
- Department of Experimental and Clinical Medicine, University of Florence, Viale Pieraccini 6, Florence 50134, Italy
- European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, Heidelberg 69117, Germany
| | - Alberto Magi
- Department of Information Engineering, University of Florence, Via di S. Marta 3, Florence 50134, Italy
| | - Vladimir Benes
- European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, Heidelberg 69117, Germany
| | - Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, Heidelberg 69117, Germany
| | - Tobias Rausch
- European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, Heidelberg 69117, Germany
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, Heidelberg 69117, Germany
|
14
|
Advances in repeat expansion diseases and a new concept of repeat motif-phenotype correlation. Curr Opin Genet Dev 2020; 65:176-185. [PMID: 32777681 DOI: 10.1016/j.gde.2020.05.029] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/01/2020] [Accepted: 05/22/2020] [Indexed: 12/19/2022]
Abstract
Repeat expansions have been found in more than 10 diseases in the past two years. Because the same repeat motifs are found in similar diseases (as exemplified by benign adult familial myoclonic epilepsy) or in diseases with overlapping phenotypes (as exemplified by fragile X tremor/ataxia syndrome, neuronal intranuclear inclusion disease, oculopharyngeal myopathy with leukoencephalopathy, and oculopharyngodistal myopathy), we propose a new concept of 'repeat motif-phenotype correlation', which argues for a toxic gain-of-function mechanism caused by expanded repeats, rather than altered functions of the genes harboring expanded repeats. The concept is expected to help identify repeat expansions by using similar or overlapping clinical presentations as clues. Although repeat expansions have been identified predominantly in autosomal dominant diseases, recent progress has demonstrated that they are also observed in autosomal recessive diseases. Furthermore, repeat expansions are not infrequently observed in patients without family histories, which urges us to pay attention to sporadic diseases. We should broaden our view of repeat expansion diseases to accelerate the discovery of diseases caused by repeat expansions, improve understanding of disease mechanisms, and develop therapeutic measures.
|
15
|
Paulson H. Repeat expansions in leukoencephalopathy. Ann Neurol 2019; 86:809-811. [DOI: 10.1002/ana.25613] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/30/2019] [Accepted: 09/30/2019] [Indexed: 01/03/2023]
Affiliation(s)
- Henry Paulson
- Department of Neurology, University of Michigan, Ann Arbor, MI
|
16
|
Florian RT, Kraft F, Leitão E, Kaya S, Klebe S, Magnin E, van Rootselaar AF, Buratti J, Kühnel T, Schröder C, Giesselmann S, Tschernoster N, Altmueller J, Lamiral A, Keren B, Nava C, Bouteiller D, Forlani S, Jornea L, Kubica R, Ye T, Plassard D, Jost B, Meyer V, Deleuze JF, Delpu Y, Avarello MDM, Vijfhuizen LS, Rudolf G, Hirsch E, Kroes T, Reif PS, Rosenow F, Ganos C, Vidailhet M, Thivard L, Mathieu A, Bourgeron T, Kurth I, Rafehi H, Steenpass L, Horsthemke B, LeGuern E, Klein KM, Labauge P, Bennett MF, Bahlo M, Gecz J, Corbett MA, Tijssen MAJ, van den Maagdenberg AMJM, Depienne C. Unstable TTTTA/TTTCA expansions in MARCH6 are associated with Familial Adult Myoclonic Epilepsy type 3. Nat Commun 2019; 10:4919. [PMID: 31664039 PMCID: PMC6820781 DOI: 10.1038/s41467-019-12763-9] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Received: 03/19/2019] [Accepted: 09/23/2019] [Indexed: 12/30/2022]
Abstract
Familial Adult Myoclonic Epilepsy (FAME) is a genetically heterogeneous disorder characterized by cortical tremor and seizures. Intronic TTTTA/TTTCA repeat expansions in SAMD12 (FAME1) are the main cause of FAME in Asia. Using genome sequencing and repeat-primed PCR, we identify another site of this repeat expansion, in MARCH6 (FAME3), in four European families. Analysis of single DNA molecules with nanopore sequencing and molecular combing shows that expansions range from 3.3 to 14 kb on average. However, we observe considerable variability in expansion length and structure, supporting the existence of multiple expansion configurations in blood cells and fibroblasts of the same individual. Moreover, the largest expansions are associated with micro-rearrangements occurring near the expansion in 20% of cells. This study provides further evidence that FAME is caused by intronic TTTTA/TTTCA expansions in distinct genes and reveals that expansions exhibit an unexpectedly high somatic instability that can ultimately result in genomic rearrangements.
Affiliation(s)
- Rahel T Florian
- Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstraße 55, 45147, Essen, Germany
| | - Florian Kraft
- Institute of Human Genetics, Medical Faculty, RWTH Aachen University, 52062, Aachen, Germany
| | - Elsa Leitão
- Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstraße 55, 45147, Essen, Germany
| | - Sabine Kaya
- Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstraße 55, 45147, Essen, Germany
| | - Stephan Klebe
- Department of Neurology, Universitätsklinikum Essen, Universität Duisburg-Essen, Hufelandstraße 55, 45147, Essen, Germany
| | - Eloi Magnin
- Department of Neurology, CHU Jean Minjoz, 25000, Besançon, France
| | - Anne-Fleur van Rootselaar
- Departments of Neurology and Clinical Neurophysiology, Amsterdam UMC, University of Amsterdam, Amsterdam Neuroscience, Meibergdreef 9, 1105, AZ, Amsterdam, The Netherlands
| | - Julien Buratti
- AP-HP, Hôpital Pitié-Salpêtrière, Département de Génétique, 75013, Paris, France
| | - Theresa Kühnel
- Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstraße 55, 45147, Essen, Germany
| | - Christopher Schröder
- Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstraße 55, 45147, Essen, Germany
| | - Sebastian Giesselmann
- Institute of Human Genetics, Medical Faculty, RWTH Aachen University, 52062, Aachen, Germany
| | - Nikolai Tschernoster
- Cologne Center for Genomics, Center for Molecular Medicine Cologne (CMMC), University of Cologne, Weyertal 115b, 50931, Cologne, Germany
| | - Janine Altmueller
- Cologne Center for Genomics, Center for Molecular Medicine Cologne (CMMC), University of Cologne, Weyertal 115b, 50931, Cologne, Germany
| | - Anaide Lamiral
- Department of Neurology, CHU Jean Minjoz, 25000, Besançon, France
| | - Boris Keren
- AP-HP, Hôpital Pitié-Salpêtrière, Département de Génétique, 75013, Paris, France
| | - Caroline Nava
- AP-HP, Hôpital Pitié-Salpêtrière, Département de Génétique, 75013, Paris, France
- Institut du Cerveau et de la Moelle épinière (ICM), Sorbonne Université, UMR S 1127, Inserm U1127, CNRS UMR 7225, F-75013, Paris, France
| | - Delphine Bouteiller
- Institut du Cerveau et de la Moelle épinière (ICM), Sorbonne Université, UMR S 1127, Inserm U1127, CNRS UMR 7225, F-75013, Paris, France
| | - Sylvie Forlani
- Institut du Cerveau et de la Moelle épinière (ICM), Sorbonne Université, UMR S 1127, Inserm U1127, CNRS UMR 7225, F-75013, Paris, France
| | - Ludmila Jornea
- Institut du Cerveau et de la Moelle épinière (ICM), Sorbonne Université, UMR S 1127, Inserm U1127, CNRS UMR 7225, F-75013, Paris, France
| | - Regina Kubica
- Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstraße 55, 45147, Essen, Germany
| | - Tao Ye
- IGBMC, CNRS UMR 7104/INSERM U1258/Université de Strasbourg, 1 Rue Laurent Fries, 67400, Illkirch-Graffenstaden, France
| | - Damien Plassard
- IGBMC, CNRS UMR 7104/INSERM U1258/Université de Strasbourg, 1 Rue Laurent Fries, 67400, Illkirch-Graffenstaden, France
| | - Bernard Jost
- IGBMC, CNRS UMR 7104/INSERM U1258/Université de Strasbourg, 1 Rue Laurent Fries, 67400, Illkirch-Graffenstaden, France
| | - Vincent Meyer
- Centre National de Recherche en Génomique Humaine (CNRGH), Institut de Biologie François Jacob, CEA, Université Paris-Saclay, F-91057, Evry, France
| | - Jean-François Deleuze
- Centre National de Recherche en Génomique Humaine (CNRGH), Institut de Biologie François Jacob, CEA, Université Paris-Saclay, F-91057, Evry, France
| | - Yannick Delpu
- Genomic Vision, 80 Rue des Meuniers, 92220, Bagneux, France
| | | | - Lisanne S Vijfhuizen
- Department of Human Genetics, Leiden University Medical Center, Albinusdreef 2, 2333, ZA, Leiden, The Netherlands
| | - Gabrielle Rudolf
- IGBMC, CNRS UMR 7104/INSERM U1258/Université de Strasbourg, 1 Rue Laurent Fries, 67400, Illkirch-Graffenstaden, France
- Department of Neurology-centre de référence des epilepsies rares, University Hospital of Strasbourg, 1 Avenue Molière, 67200, Strasbourg, France
| | - Edouard Hirsch
- Department of Neurology-centre de référence des epilepsies rares, University Hospital of Strasbourg, 1 Avenue Molière, 67200, Strasbourg, France
| | - Thessa Kroes
- School of Biological Sciences, School of Medicine and Robinson Research Institute, The University of Adelaide, Adelaide, 5005, SA, Australia
| | - Philipp S Reif
- Epilepsy Center Frankfurt Rhine-Main, Department of Neurology, Goethe University and LOEWE Center for Personalized Translational Epilepsy Research (CePTER), 60323, Frankfurt am Main, Germany
- Department of Neurology, Epilepsy Center Hessen, Philipps University, 35037, Marburg, Germany
| | - Felix Rosenow
- Epilepsy Center Frankfurt Rhine-Main, Department of Neurology, Goethe University and LOEWE Center for Personalized Translational Epilepsy Research (CePTER), 60323, Frankfurt am Main, Germany
- Department of Neurology, Epilepsy Center Hessen, Philipps University, 35037, Marburg, Germany
| | - Christos Ganos
- Department of Neurology, Charité University Medicine Berlin, 10117, Berlin, Germany
| | - Marie Vidailhet
- Institut du Cerveau et de la Moelle épinière (ICM), Sorbonne Université, UMR S 1127, Inserm U1127, CNRS UMR 7225, F-75013, Paris, France
- APHP, Hôpital Pitié-Salpêtrière, Département de Neurologie, 75013, Paris, France
| | - Lionel Thivard
- APHP, Hôpital Pitié-Salpêtrière, Département de Neurologie, 75013, Paris, France
| | - Alexandre Mathieu
- Human Genetics and Cognitive Functions, Pasteur Institute, UMR3571 CNRS, Université de Paris, 75015, Paris, France
| | - Thomas Bourgeron
- Human Genetics and Cognitive Functions, Pasteur Institute, UMR3571 CNRS, Université de Paris, 75015, Paris, France
| | - Ingo Kurth
- Institute of Human Genetics, Medical Faculty, RWTH Aachen University, 52062, Aachen, Germany
| | - Haloom Rafehi
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052, VIC, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, 3010, VIC, Australia
- Epilepsy Research Centre, Department of Medicine, University of Melbourne, Austin Health, Heidelberg, 3084, VIC, Australia
| | - Laura Steenpass
- Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstraße 55, 45147, Essen, Germany
| | - Bernhard Horsthemke
- Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstraße 55, 45147, Essen, Germany
| | - Eric LeGuern
- AP-HP, Hôpital Pitié-Salpêtrière, Département de Génétique, 75013, Paris, France
- Institut du Cerveau et de la Moelle épinière (ICM), Sorbonne Université, UMR S 1127, Inserm U1127, CNRS UMR 7225, F-75013, Paris, France
| | - Karl Martin Klein
- Epilepsy Center Frankfurt Rhine-Main, Department of Neurology, Goethe University and LOEWE Center for Personalized Translational Epilepsy Research (CePTER), 60323, Frankfurt am Main, Germany
- Department of Neurology, Epilepsy Center Hessen, Philipps University, 35037, Marburg, Germany
- Departments of Clinical Neurosciences, Medical Genetics and Community Health Sciences, Hotchkiss Brain Institute & Alberta Children's Hospital Research Institute, Cumming School of Medicine, University of Calgary, 2500 University Dr NW, Calgary, AB, T2N 1N4, Canada
| | - Pierre Labauge
- Department of Neurology, Gui de Chauliac University Hospital, 34295, Montpellier, France
| | - Mark F Bennett
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052, VIC, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, 3010, VIC, Australia
- Epilepsy Research Centre, Department of Medicine, University of Melbourne, Austin Health, Heidelberg, 3084, VIC, Australia
| | - Melanie Bahlo
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052, VIC, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, 3010, VIC, Australia
| | - Jozef Gecz
- School of Biological Sciences, School of Medicine and Robinson Research Institute, The University of Adelaide, Adelaide, 5005, SA, Australia
- South Australian Health and Medical Research Institute, The University of Adelaide, Adelaide, 5005, SA, Australia
| | - Mark A Corbett
- School of Biological Sciences, School of Medicine and Robinson Research Institute, The University of Adelaide, Adelaide, 5005, SA, Australia
| | - Marina A J Tijssen
- Department of Neurology, University Medical Center Groningen, University of Groningen, 9700, AB, Groningen, the Netherlands
| | - Arn M J M van den Maagdenberg
- Department of Human Genetics, Leiden University Medical Center, Albinusdreef 2, 2333, ZA, Leiden, The Netherlands
- Department of Neurology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA, Leiden, The Netherlands
| | - Christel Depienne
- Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Hufelandstraße 55, 45147, Essen, Germany.
- Institut du Cerveau et de la Moelle épinière (ICM), Sorbonne Université, UMR S 1127, Inserm U1127, CNRS UMR 7225, F-75013, Paris, France.
- IGBMC, CNRS UMR 7104/INSERM U1258/Université de Strasbourg, 1 Rue Laurent Fries, 67400, Illkirch-Graffenstaden, France.
|
17
|
Haghshenas E, Sahinalp SC, Hach F. lordFAST: sensitive and Fast Alignment Search Tool for LOng noisy Read sequencing Data. Bioinformatics 2019; 35:20-27. [PMID: 30561550 DOI: 10.1093/bioinformatics/bty544] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/16/2017] [Accepted: 06/28/2018] [Indexed: 02/01/2023]
Abstract
Motivation Recent advances in genomics and precision medicine have been made possible through the application of high throughput sequencing (HTS) to large collections of human genomes. Although HTS technologies have proven their use in cataloging human genome variation, computational analysis of the data they generate is still far from being perfect. The main limitation of Illumina and other popular sequencing technologies is their short read length relative to the lengths of (common) genomic repeats. Newer (single molecule sequencing - SMS) technologies such as Pacific Biosciences and Oxford Nanopore are producing longer reads, making it theoretically possible to overcome the difficulties imposed by repeat regions. Unfortunately, because of their high sequencing error rate, reads generated by these technologies are very difficult to work with and cannot be used in many of the standard downstream analysis pipelines. Note that it is not only difficult to find the correct mapping locations of such reads in a reference genome, but also to establish their correct alignment so as to differentiate sequencing errors from real genomic variants. Furthermore, especially since newer SMS instruments provide higher throughput, mapping and alignment need to be performed much faster than before, maintaining high sensitivity. Results We introduce lordFAST, a novel long-read mapper that is specifically designed to align reads generated by PacBio and potentially other SMS technologies to a reference. lordFAST not only has higher sensitivity than the available alternatives, it is also among the fastest and has a very low memory footprint. Availability and implementation lordFAST is implemented in C++ and supports multi-threading. The source code of lordFAST is available at https://github.com/vpc-ccg/lordfast. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ehsan Haghshenas
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
| | - S Cenk Sahinalp
- School of Computing Science, Simon Fraser University, Burnaby, BC, Canada.,School of Informatics and Computing, Indiana University, Bloomington, IN, USA
| | - Faraz Hach
- Vancouver Prostate Centre, Vancouver, BC, Canada.,Department of Urologic Sciences, University of British Columbia, Vancouver, BC, Canada
|
18
|
Demir G, Alkan C. Characterizing microsatellite polymorphisms using assembly-based and mapping-based tools. Turk J Biol 2019; 43:264-273. [PMID: 31496881 PMCID: PMC6710001 DOI: 10.3906/biy-1903-16] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/09/2023]
Abstract
Microsatellite polymorphism has always been a challenge for genome assembly and sequence alignment due to sequencing errors, short read lengths, and high incidence of polymerase slippage in microsatellite regions. Despite the information they carry being very valuable, microsatellite variations have not gained enough attention to be a routine step in genome sequence analysis pipelines. After the completion of the 1000 Genomes Project, which aimed to establish the most detailed genetic variation catalog for humans, the consortium released only two microsatellite prediction sets generated by two tools. Many other large research efforts have failed to shed light on microsatellite variations. We evaluated the performance of three different local assembly methods on three different experimental settings, focusing on genotype-based performance, coverage impact, and preprocessing including flanking regions. All these experiments supported our initial expectations on assembly. We also demonstrate that overlap-layout-consensus (OLC)-based assembly methods show higher sensitivity to microsatellite variant calling when compared to a de Bruijn graph-based approach. We conclude that assembly with OLC is the better method for genotyping microsatellites. Our pipeline is available at https://github.com/gulfemd/STRAssembly.
Affiliation(s)
- Gülfem Demir
- Department of Computer Engineering, Faculty of Engineering, Bilkent University, Bilkent, Ankara, Turkey
| | - Can Alkan
- Department of Computer Engineering, Faculty of Engineering, Bilkent University, Bilkent, Ankara, Turkey
|
19
|
Ishiura H, Shibata S, Yoshimura J, Suzuki Y, Qu W, Doi K, Almansour MA, Kikuchi JK, Taira M, Mitsui J, Takahashi Y, Ichikawa Y, Mano T, Iwata A, Harigaya Y, Matsukawa MK, Matsukawa T, Tanaka M, Shirota Y, Ohtomo R, Kowa H, Date H, Mitsue A, Hatsuta H, Morimoto S, Murayama S, Shiio Y, Saito Y, Mitsutake A, Kawai M, Sasaki T, Sugiyama Y, Hamada M, Ohtomo G, Terao Y, Nakazato Y, Takeda A, Sakiyama Y, Umeda-Kameyama Y, Shinmi J, Ogata K, Kohno Y, Lim SY, Tan AH, Shimizu J, Goto J, Nishino I, Toda T, Morishita S, Tsuji S. Noncoding CGG repeat expansions in neuronal intranuclear inclusion disease, oculopharyngodistal myopathy and an overlapping disease. Nat Genet 2019; 51:1222-1232. [DOI: 10.1038/s41588-019-0458-z] [Citation(s) in RCA: 178] [Impact Index Per Article: 35.6] [Received: 08/30/2018] [Accepted: 05/29/2019] [Indexed: 11/09/2022]
|
20
|
Minio A, Massonnet M, Figueroa-Balderas R, Vondras AM, Blanco-Ulate B, Cantu D. Iso-Seq Allows Genome-Independent Transcriptome Profiling of Grape Berry Development. G3 (BETHESDA, MD.) 2019; 9:755-767. [PMID: 30642874 PMCID: PMC6404599 DOI: 10.1534/g3.118.201008] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 10/31/2018] [Accepted: 01/09/2019] [Indexed: 01/13/2023]
Abstract
Transcriptomics has been widely applied to study grape berry development. With few exceptions, transcriptomic studies in grape are performed using the available genome sequence, PN40024, as reference. However, differences in gene content among grape accessions, which contribute to phenotypic differences among cultivars, suggest that a single reference genome does not represent the species' entire gene space. Though whole genome assembly and annotation can reveal the relatively unique or "private" gene space of any particular cultivar, transcriptome reconstruction is a more rapid, less costly, and less computationally intensive strategy to accomplish the same goal. In this study, we used single molecule-real time sequencing (SMRT) to sequence full-length cDNA (Iso-Seq) and reconstruct the transcriptome of Cabernet Sauvignon berries during berry ripening. In addition, short reads from ripening berries were used to error-correct low-expression isoforms and to profile isoform expression. By comparing the annotated gene space of Cabernet Sauvignon to other grape cultivars, we demonstrate that the transcriptome reference built with Iso-Seq data represents most of the expressed genes in the grape berries and includes 1,501 cultivar-specific genes. Iso-Seq produced transcriptome profiles similar to those obtained after mapping on a complete genome reference. Together, these results justify the application of Iso-Seq to identify cultivar-specific genes and build a comprehensive reference for transcriptional profiling that circumvents the necessity of a genome reference with its associated costs and computational weight.
Affiliation(s)
- Andrea Minio
- Department of Viticulture and Enology, University of California Davis, Davis, CA
- Mélanie Massonnet
- Department of Viticulture and Enology, University of California Davis, Davis, CA
- Amanda M Vondras
- Department of Viticulture and Enology, University of California Davis, Davis, CA
- Dario Cantu
- Department of Viticulture and Enology, University of California Davis, Davis, CA
21
Detecting a long insertion variant in SAMD12 by SMRT sequencing: implications of long-read whole-genome sequencing for repeat expansion diseases. J Hum Genet 2018; 64:191-197. [DOI: 10.1038/s10038-018-0551-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 09/05/2018] [Revised: 11/12/2018] [Accepted: 11/27/2018] [Indexed: 01/21/2023]
22
Cumming SA, Hamilton MJ, Robb Y, Gregory H, McWilliam C, Cooper A, Adam B, McGhie J, Hamilton G, Herzyk P, Tschannen MR, Worthey E, Petty R, Ballantyne B, Warner J, Farrugia ME, Longman C, Monckton DG. De novo repeat interruptions are associated with reduced somatic instability and mild or absent clinical features in myotonic dystrophy type 1. Eur J Hum Genet 2018; 26:1635-1647. [PMID: 29967337 PMCID: PMC6189127 DOI: 10.1038/s41431-018-0156-9] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Received: 01/18/2018] [Revised: 03/23/2018] [Accepted: 03/30/2018] [Indexed: 01/10/2023] Open
Abstract
Myotonic dystrophy type 1 (DM1) is a multisystem disorder, caused by expansion of a CTG trinucleotide repeat in the 3'-untranslated region of the DMPK gene. The repeat expansion is somatically unstable and tends to increase in length with time, contributing to disease progression. In some individuals, the repeat array is interrupted by variant repeats such as CCG and CGG, stabilising the expansion and often leading to milder symptoms. We have characterised three families, each including one person with variant repeats that had arisen de novo on paternal transmission of the repeat expansion. Two individuals were identified for screening due to an unusual result in the laboratory diagnostic test, and the third due to exceptionally mild symptoms. The presence of variant repeats in all three expanded alleles was confirmed by restriction digestion of small pool PCR products, and allele structures were determined by PacBio sequencing. Each was different, but all contained CCG repeats close to the 3'-end of the repeat expansion. All other family members had inherited pure CTG repeats. The variant repeat-containing alleles were more stable in the blood than pure alleles of similar length, which may in part account for the mild symptoms observed in all three individuals. This emphasises the importance of somatic instability as a disease mechanism in DM1. Further, since patients with variant repeats may have unusually mild symptoms, identification of these individuals has important implications for genetic counselling and for patient stratification in DM1 clinical trials.
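As a rough illustration of what "variant repeats" interrupting a CTG array means at the sequence level, the following Python sketch splits a repeat tract into in-frame triplets and reports every unit that is not the expected motif. The function name `find_variant_repeats` and the example allele are hypothetical and are not taken from the study. The in-frame scan also assumes an error-free, indel-free tract, which real sequencing reads rarely provide.

```python
from collections import Counter


def find_variant_repeats(tract: str, motif: str = "CTG"):
    """Return (unit_index, unit) for each repeat unit that differs from `motif`.

    Assumes the tract is already trimmed to the repeat array and is in frame.
    """
    tract = tract.upper()
    size = len(motif)
    if len(tract) % size != 0:
        raise ValueError("tract length must be a multiple of the motif length")
    # Cut the tract into consecutive, non-overlapping units of motif length.
    units = [tract[i:i + size] for i in range(0, len(tract), size)]
    return [(i, unit) for i, unit in enumerate(units) if unit != motif]


# Hypothetical allele: ten units, with two CCG interruptions near the 3' end.
allele = "CTG" * 6 + "CCG" + "CTG" + "CCG" + "CTG"
variants = find_variant_repeats(allele)
summary = Counter(unit for _, unit in variants)
print(variants)  # positions and identities of the interrupting units
print(summary)   # how often each variant unit occurs
```

A pure CTG tract yields an empty list, which is how the other family members' alleles in the abstract would look under this simplified view.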
Affiliation(s)
- Sarah A Cumming
- Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, UK
- Mark J Hamilton
- Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, UK
- West of Scotland Clinical Genetics Service, Queen Elizabeth University Hospital, Glasgow, G51 4TF, UK
- Yvonne Robb
- Clinical Genetics Service, Western General Hospital, Edinburgh, EH4 2XU, UK
- Helen Gregory
- Department of Clinical Genetics, Aberdeen Royal Hospital, Aberdeen, AB25 2ZA, UK
- Anneli Cooper
- Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, UK
- Berit Adam
- Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, UK
- Josephine McGhie
- Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, UK
- Graham Hamilton
- Glasgow Polyomics, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G61 1QH, UK
- Pawel Herzyk
- Glasgow Polyomics, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G61 1QH, UK
- Michael R Tschannen
- Human and Molecular Genetics Center, Medical College Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
- Elizabeth Worthey
- Human and Molecular Genetics Center, Medical College Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA
- Hudson Alpha Institute for Biotechnology, 601 Genome Way, NW, Huntsville, AL, 35806, USA
- Richard Petty
- Department of Neurology, Institute of Neurological Sciences, Queen Elizabeth University Hospital, Glasgow, G51 4TF, UK
- Bob Ballantyne
- West of Scotland Clinical Genetics Service, Queen Elizabeth University Hospital, Glasgow, G51 4TF, UK
- Jon Warner
- Molecular Genetics Service, Molecular Medicine Centre, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
- Maria Elena Farrugia
- Department of Neurology, Institute of Neurological Sciences, Queen Elizabeth University Hospital, Glasgow, G51 4TF, UK
- Cheryl Longman
- West of Scotland Clinical Genetics Service, Queen Elizabeth University Hospital, Glasgow, G51 4TF, UK
- Darren G Monckton
- Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, UK
23
Abstract
This review explores the presence and functions of polyglutamine (polyQ) in viral proteins. In mammals, mutations in polyQ segments (and CAG repeats at the nucleotide level) have been linked to neural disorders and ataxias. PolyQ regions in normal human proteins have documented functional roles, in transcription factors and, more recently, in regulating autophagy. Despite the high frequency of polyQ repeats in eukaryotic genomes, little attention has been given to the presence or possible role of polyQ sequences in virus genomes. A survey described here revealed that polyQ repeats occur rarely in RNA viruses, suggesting that they have detrimental effects on virus replication at the nucleotide or protein level. However, there have been sporadic reports of polyQ segments in potyviruses and in reptilian nidoviruses (among the largest RNA viruses known). Conserved polyQ segments are found in the regulatory control proteins of many DNA viruses. Variable length polyQ tracts are found in proteins that contribute to transmissibility (cowpox A-type inclusion protein (ATI)) and control of latency (herpes viruses). New longer-read sequencing methods, using original biological samples, should reveal more details on the presence and functional role of polyQ in viruses, as well as the nucleotide regions that encode them. Given the known toxic effects of polyQ repeats, the role of these segments in neurovirulent and tumorigenic viruses should be further explored.
24
Expansions of intronic TTTCA and TTTTA repeats in benign adult familial myoclonic epilepsy. Nat Genet 2018; 50:581-590. [PMID: 29507423 DOI: 10.1038/s41588-018-0067-2] [Citation(s) in RCA: 184] [Impact Index Per Article: 30.7] [Received: 03/27/2017] [Accepted: 01/09/2018] [Indexed: 11/09/2022]
Abstract
Epilepsy is a common neurological disorder, and mutations in genes encoding ion channels or neurotransmitter receptors are frequent causes of monogenic forms of epilepsy. Here we show that abnormal expansions of TTTCA and TTTTA repeats in intron 4 of SAMD12 cause benign adult familial myoclonic epilepsy (BAFME). Single-molecule, real-time sequencing of BAC clones and nanopore sequencing of genomic DNA identified two repeat configurations in SAMD12. Intriguingly, in two families with a clinical diagnosis of BAFME in which no repeat expansions in SAMD12 were observed, we identified similar expansions of TTTCA and TTTTA repeats in introns of TNRC6A and RAPGEF2, indicating that expansions of the same repeat motifs are involved in the pathogenesis of BAFME regardless of the genes in which the expanded repeats are located. This discovery that expansions of noncoding repeats lead to neuronal dysfunction responsible for myoclonic tremor and epilepsy extends the understanding of diseases with such repeat expansion.
25
Abstract
Huntington's disease (HD) is caused by a CAG repeat expansion in the HTT gene. Repeat length can change over time, both in individual cells and between generations, and longer repeats may drive pathology. Cellular DNA repair systems have long been implicated in CAG repeat instability but recent genetic evidence from humans linking DNA repair variants to HD onset and progression has reignited interest in this area. The DNA damage response plays an essential role in maintaining genome stability, but may also license repeat expansions in the context of HD. In this chapter we summarize the methods developed to assay CAG repeat expansion/contraction in vitro and in cells, and review the DNA repair genes tested in mouse models of HD. While none of these systems is currently ideal, new technologies, such as long-read DNA sequencing, should improve the sensitivity of assays to assess the effects of DNA repair pathways in HD. Improved assays will be essential precursors to high-throughput testing of small molecules that can alter specific steps in DNA repair pathways and perhaps ameliorate expansion or enhance contraction of the HTT CAG repeat.
26
Tang H, Kirkness EF, Lippert C, Biggs WH, Fabani M, Guzman E, Ramakrishnan S, Lavrenko V, Kakaradov B, Hou C, Hicks B, Heckerman D, Och FJ, Caskey CT, Venter JC, Telenti A. Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes. Am J Hum Genet 2017; 101:700-715. [PMID: 29100084 PMCID: PMC5673627 DOI: 10.1016/j.ajhg.2017.09.013] [Citation(s) in RCA: 98] [Impact Index Per Article: 14.0] [Received: 05/10/2017] [Accepted: 09/15/2017] [Indexed: 12/30/2022] Open
Abstract
Short tandem repeats (STRs) are hyper-mutable sequences in the human genome. They are often used in forensics and population genetics and are also the underlying cause of many genetic diseases. There are challenges associated with accurately determining the length polymorphism of STR loci in the genome by next-generation sequencing (NGS). In particular, accurate detection of pathological STR expansion is limited by the sequence read length during whole-genome analysis. We developed TREDPARSE, a software package that incorporates various cues from read alignment and paired-end distance distribution, as well as a sequence stutter model, in a probabilistic framework to infer repeat sizes for genetic loci, and we used this software to infer repeat sizes for 30 known disease loci. Using simulated data, we show that TREDPARSE outperforms other available software. We sampled the full genome sequences of 12,632 individuals to an average read depth of approximately 30× to 40× with Illumina HiSeq X. We identified 138 individuals with risk alleles at 15 STR disease loci. We validated a representative subset of the samples (n = 19) by Sanger and by Oxford Nanopore sequencing. Additionally, we validated the STR calls against known allele sizes in a set of GeT-RM reference cell-line materials (n = 6). Several STR loci composed entirely of guanines or cytosines (G or C) have insufficient read evidence for inference and therefore could not be assayed precisely by TREDPARSE. TREDPARSE extends the limit of STR size detection beyond the physical sequence read length. This extension is critical because many of the disease risk cutoffs are close to or beyond the short sequence read length of 100 to 150 bases.
Affiliation(s)
- Haibao Tang
- Human Longevity, Mountain View, CA 94041, USA
- Claire Hou
- Human Longevity, San Diego, CA 92121, USA
- Barry Hicks
- Human Longevity, Mountain View, CA 94041, USA
- Franz J Och
- Human Longevity, Mountain View, CA 94041, USA
27
Downing NR, Lourens S, De Soriano I, Long JD, Paulsen JS. Phenotype Characterization of HD Intermediate Alleles in PREDICT-HD. J Huntingtons Dis 2017; 5:357-368. [PMID: 27983559 DOI: 10.3233/jhd-160185] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 02/06/2023]
Abstract
BACKGROUND: Huntington disease (HD) is a neurodegenerative disease caused by a CAG repeat expansion on chromosome 4. Pathology is associated with CAG repeat length. Prior studies examining people in the intermediate allele (IA) range found subtle differences in motor, cognitive, and behavioral domains compared to controls. OBJECTIVE: The purpose of this study was to examine baseline and longitudinal differences in motor, cognitive, behavioral, functional, and imaging outcomes between persons with CAG repeats in three ranges: normal (≤26), intermediate (27-35), and reduced penetrance (36-39). METHODS: We examined longitudinal data from 389 participants in three allele groups: 280 normal controls (NC), 21 intermediate allele (IA), and 88 reduced penetrance (RP). We used linear mixed models to identify differences in baseline and longitudinal outcomes between groups. Three models were tested: 1) no baseline or longitudinal differences; 2) baseline differences but no longitudinal differences; and 3) baseline and longitudinal differences. RESULTS: Model 1 was the best fitting model for most outcome variables. Models 2 and 3 were best fitting for some of the variables. We found baseline and longitudinal trends of declining performance across increasing CAG repeat length groups, but no significant differences between the NC and IA groups. CONCLUSION: We did not find evidence to support differences in the IA group compared to the NC group. These findings are limited by a small IA sample size.
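The three allele groups in this abstract are defined purely by CAG repeat count, so the grouping can be written down as a small lookup. The sketch below is illustrative only: the function name `classify_cag` is hypothetical, and lengths above 39 are simply reported as falling outside the three ranges the abstract names, since the abstract assigns them no label.

```python
def classify_cag(repeats: int) -> str:
    """Map a CAG repeat count to the allele group used in the abstract above."""
    if repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if repeats <= 26:
        return "normal"              # NC: 26 repeats or fewer
    if repeats <= 35:
        return "intermediate"        # IA: 27-35 repeats
    if repeats <= 39:
        return "reduced penetrance"  # RP: 36-39 repeats
    return "outside the three study ranges"


# Hypothetical repeat counts, one per range boundary.
for n in (26, 27, 35, 36, 39, 40):
    print(n, classify_cag(n))
```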
Affiliation(s)
- Spencer Lourens
- Department of Biostatistics, Indiana University School of Medicine, Indianapolis, IN, USA
- Isabella De Soriano
- Department of Psychiatry, Carver College of Medicine, University of Iowa, Iowa City, IA, USA
- Jeffrey D Long
- Department of Psychiatry, Carver College of Medicine, University of Iowa, Iowa City, IA, USA; Department of Biostatistics, College of Public Health, The University of Iowa, Iowa City, IA, USA
- Jane S Paulsen
- Department of Psychiatry, Carver College of Medicine, University of Iowa, Iowa City, IA, USA; Department of Neurology, Carver College of Medicine, The University of Iowa, Iowa City, IA, USA; Department of Psychological and Brain Sciences, The University of Iowa, Iowa City, IA, USA
28
Garrido-Ramos MA. Satellite DNA: An Evolving Topic. Genes (Basel) 2017; 8:230. [PMID: 28926993 PMCID: PMC5615363 DOI: 10.3390/genes8090230] [Citation(s) in RCA: 217] [Impact Index Per Article: 31.0] [Received: 08/03/2017] [Revised: 09/12/2017] [Accepted: 09/13/2017] [Indexed: 12/22/2022] Open
Abstract
Satellite DNA represents one of the most fascinating parts of the repetitive fraction of the eukaryotic genome. Since the discovery of highly repetitive tandem DNA in the 1960s, an extensive literature has covered various topics related to the structure, organization, function, and evolution of such sequences. Today, with the advent of genomic tools, the study of satellite DNA has regained great interest. Thus, Next-Generation Sequencing (NGS), together with high-throughput in silico analysis of the information contained in NGS reads, has revolutionized the analysis of the repetitive fraction of eukaryotic genomes. Taken together, the historical and current approaches to the topic give us a broad view of the function and evolution of satellite DNA and its role in chromosomal evolution. Currently, we have extensive information on the molecular, chromosomal, biological, and population factors that affect the evolutionary fate of satellite DNA, knowledge that gives rise to a series of mutually compatible hypotheses about the origin, spreading, and evolution of satellite DNA. In this paper, I review these hypotheses from a methodological, conceptual, and historical perspective and frame them in the context of chromosomal organization and evolution.
Affiliation(s)
- Manuel A Garrido-Ramos
- Departamento de Genética, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain.
29
Haghshenas E, Hach F, Sahinalp SC, Chauve C. CoLoRMap: Correcting Long Reads by Mapping short reads. Bioinformatics 2017; 32:i545-i551. [PMID: 27587673 DOI: 10.1093/bioinformatics/btw463] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Indexed: 01/27/2023] Open
Abstract
MOTIVATION: Second generation sequencing technologies paved the way to an exceptional increase in the number of sequenced genomes, both prokaryotic and eukaryotic. However, short reads are difficult to assemble and often lead to highly fragmented assemblies. The recent developments in long-read sequencing methods offer a promising way to address this issue. However, so far long reads are characterized by a high error rate, and assembling from long reads requires a high depth of coverage. This motivates the development of hybrid approaches that leverage the high quality of short reads to correct errors in long reads. RESULTS: We introduce CoLoRMap, a hybrid method for correcting noisy long reads, such as the ones produced by PacBio sequencing technology, using high-quality Illumina paired-end reads mapped onto the long reads. Our algorithm is based on two novel ideas: using a classical shortest path algorithm to find a sequence of overlapping short reads that minimizes the edit score to a long read, and extending corrected regions by local assembly of unmapped mates of mapped short reads. Our results on bacterial, fungal and insect data sets show that CoLoRMap compares well with existing hybrid correction methods. AVAILABILITY AND IMPLEMENTATION: The source code of CoLoRMap is freely available for non-commercial use at https://github.com/sfu-compbio/colormap. CONTACT: ehaghshe@sfu.ca or cedric.chauve@sfu.ca. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ehsan Haghshenas
- School of Computing Sciences MADD-Gen Graduate Program, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Faraz Hach
- School of Computing Sciences Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
- S Cenk Sahinalp
- School of Computing Sciences Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada; School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
- Cedric Chauve
- Department of Mathematics, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
30
Liu Q, Zhang P, Wang D, Gu W, Wang K. Interrogating the "unsequenceable" genomic trinucleotide repeat disorders by long-read sequencing. Genome Med 2017; 9:65. [PMID: 28720120 PMCID: PMC5514472 DOI: 10.1186/s13073-017-0456-7] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 12/07/2016] [Accepted: 06/30/2017] [Indexed: 12/26/2022] Open
Abstract
Microsatellite expansion, such as trinucleotide repeat expansion (TRE), is known to cause a number of genetic diseases. Sanger sequencing and next-generation short-read sequencing are unable to interrogate TRE reliably. We developed a novel algorithm called RepeatHMM to estimate repeat counts from long-read sequencing data. Evaluation on simulation data, real amplicon sequencing data on two repeat expansion disorders, and whole-genome sequencing data generated by PacBio and Oxford Nanopore technologies showed superior performance over competing approaches. We concluded that long-read sequencing coupled with RepeatHMM can estimate repeat counts on microsatellites and can interrogate the “unsequenceable” genomic trinucleotide repeat disorders.
Affiliation(s)
- Qian Liu
- Institute for Genomic Medicine, Columbia University, New York, NY, 10032, USA
- Peng Zhang
- Nextomics Biosciences, Wuhan, Hubei, 430000, China
- Depeng Wang
- Nextomics Biosciences, Wuhan, Hubei, 430000, China
- Weihong Gu
- China-Japan Friendship Hospital, Beijing, 100029, China
- Kai Wang
- Institute for Genomic Medicine, Columbia University, New York, NY, 10032, USA; Department of Biomedical Informatics, Columbia University, New York, NY, 10032, USA
31
Gymrek M. A genomic view of short tandem repeats. Curr Opin Genet Dev 2017; 44:9-16. [PMID: 28213161 DOI: 10.1016/j.gde.2017.01.012] [Citation(s) in RCA: 67] [Impact Index Per Article: 9.6] [Received: 10/31/2016] [Accepted: 01/30/2017] [Indexed: 12/31/2022]
Abstract
Short tandem repeats (STRs) are some of the fastest mutating loci in the genome. Tools for accurately profiling STRs from high-throughput sequencing data have enabled genome-wide interrogation of more than a million STRs across hundreds of individuals. These catalogs have revealed that STRs are highly multiallelic and may contribute more de novo mutations than any other variant class. Recent studies have leveraged these catalogs to show that STRs play a widespread role in regulating gene expression and other molecular phenotypes. These analyses suggest that STRs are an underappreciated but rich reservoir of variation that likely make significant contributions to Mendelian diseases, complex traits, and cancer.
Affiliation(s)
- Melissa Gymrek
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA.
32
Minio A, Lin J, Gaut BS, Cantu D. How Single Molecule Real-Time Sequencing and Haplotype Phasing Have Enabled Reference-Grade Diploid Genome Assembly of Wine Grapes. Front Plant Sci 2017; 8:826. [PMID: 28567052 PMCID: PMC5434136 DOI: 10.3389/fpls.2017.00826] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 04/04/2017] [Accepted: 05/02/2017] [Indexed: 05/23/2023]
Affiliation(s)
- Andrea Minio
- Department of Viticulture and Enology, University of California, Davis, Davis, CA, United States
- Jerry Lin
- Department of Viticulture and Enology, University of California, Davis, Davis, CA, United States
- Brandon S. Gaut
- Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, CA, United States
- Dario Cantu
- Department of Viticulture and Enology, University of California, Davis, Davis, CA, United States
- *Correspondence: Dario Cantu
33
Artyomenko A, Wu NC, Mangul S, Eskin E, Sun R, Zelikovsky A. Long Single-Molecule Reads Can Resolve the Complexity of the Influenza Virus Composed of Rare, Closely Related Mutant Variants. J Comput Biol 2016; 24:558-570. [PMID: 27901586 DOI: 10.1089/cmb.2016.0146] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 12/14/2022] Open
Abstract
As a result of a high rate of mutations and recombination events, an RNA virus exists as a heterogeneous "swarm" of mutant variants. The long read length offered by single-molecule sequencing technologies allows each mutant variant to be sequenced in a single pass. However, the high error rate limits the ability to reconstruct a heterogeneous viral population composed of rare, related mutant variants. In this article, we present two single-nucleotide variants (2SNV), a method able to tolerate the high error rate of the single-molecule protocol and reconstruct mutant variants. 2SNV uses linkage between single-nucleotide variations to efficiently distinguish them from read errors. To benchmark the sensitivity of 2SNV, we performed a single-molecule sequencing experiment on a sample containing a titrated level of known viral mutant variants. Our method is able to accurately reconstruct clones with a frequency of 0.2% and distinguish clones that differ in only two nucleotides distantly located on the genome. 2SNV outperforms existing methods for full-length viral mutant reconstruction.
Affiliation(s)
- Nicholas C Wu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, California
- Serghei Mangul
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California; Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, California
- Eleazar Eskin
- Department of Computer Science, University of California, Los Angeles, Los Angeles, California
- Ren Sun
- Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, California
- Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, Georgia
34
Ishiura H, Tsuji S. Epidemiology and molecular mechanism of frontotemporal lobar degeneration/amyotrophic lateral sclerosis with repeat expansion mutation in C9orf72. J Neurogenet 2016; 29:85-94. [PMID: 26540641 DOI: 10.3109/01677063.2015.1085980] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 12/12/2022]
Abstract
GGGGCC hexanucleotide repeat expansions in C9orf72 were identified in 2011 as the genetic cause of frontotemporal lobar degeneration (FTLD)/amyotrophic lateral sclerosis (ALS) linked to chromosome 9. Since then, a number of studies have been conducted to delineate the molecular epidemiology of the repeat expansions and the molecular pathophysiology of the disease. The frequency of the repeat expansions varied considerably among countries. It was high in European populations and populations of European descent, where a substantial proportion of sporadic FTLD or ALS patients also have the mutations. On the other hand, the frequency was extremely low in Asia or Oceania except for limited regions including the Kii Peninsula of Japan. A founder effect seems to strongly influence the regional differences in the frequency, but there is no definitive evidence that supports the notion that the repeat expansions arose in a single founder or multiple founders. As a disease-causing mechanism, several molecular mechanisms have been proposed, including conformational changes of DNA (G-quadruplex formation and hypermethylation) or RNA (G-quadruplex formation) molecules, altered transcriptional levels of C9orf72, sequestration of RNA-binding proteins, bidirectional transcription, formation of RNA foci, and neurotoxicity of dipeptide repeat proteins generated by repeat-associated non-ATG-initiated translation. Further investigations on the molecular mechanisms of neurodegeneration are expected to lead to the development of therapeutic interventions for this disease as well as for other diseases associated with non-coding repeat expansions.
Affiliation(s)
- Hiroyuki Ishiura
- Department of Neurology, The University of Tokyo, Tokyo, Japan
- Shoji Tsuji
- Department of Neurology, The University of Tokyo, Tokyo, Japan
35
Xiao W, Wu L, Yavas G, Simonyan V, Ning B, Hong H. Challenges, Solutions, and Quality Metrics of Personal Genome Assembly in Advancing Precision Medicine. Pharmaceutics 2016; 8:E15. [PMID: 27110816 PMCID: PMC4932478 DOI: 10.3390/pharmaceutics8020015] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 12/17/2015] [Revised: 03/11/2016] [Accepted: 04/06/2016] [Indexed: 01/15/2023] Open
Abstract
Even though each of us shares more than 99% of the DNA sequences in our genome, millions of small regions differ in sequence or structure between individuals, giving us different characteristics of appearance or responsiveness to medical treatments. Currently, genetic variants in diseased tissues, such as tumors, are uncovered by exploring the differences between the reference genome and the sequences detected in the diseased tissue. However, the public reference genome was derived from the DNA of multiple individuals. As a result, the reference genome is incomplete and may misrepresent the sequence variants of the general population. A more reliable solution is to compare sequences of diseased tissue with the individual's own genome sequence derived from tissue in a normal state. As the price of sequencing the human genome has dropped dramatically to around $1000, there is a promising future for documenting the personal genome of every individual. However, de novo assembly of individual genomes at an affordable cost is still challenging. Thus, to date, only a few human genomes have been fully assembled. In this review, we introduce the history of human genome sequencing and the evolution of sequencing platforms, from Sanger sequencing to emerging "third generation sequencing" technologies. We present the currently available de novo assembly and post-assembly software packages for human genome assembly and their requirements for computational infrastructures. We suggest that a hybrid assembly combining long and short reads is a promising way to generate good quality human genome assemblies, and we specify parameters for the quality assessment of assembly outcomes. We provide a perspective on the benefit of using personal genomes as references and suggestions for obtaining a quality personal genome. Finally, we discuss the usage of the personal genome in aiding vaccine design and development, monitoring host immune-response, tailoring drug therapy and detecting tumors. We believe precision medicine would benefit greatly from bioinformatics solutions, particularly for personal genome assembly.
Affiliation(s)
- Wenming Xiao
- National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
- Leihong Wu
- National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
- Gokhan Yavas
- National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
- Vahan Simonyan
- Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Ave, Silver Spring, MD 20993, USA.
- Baitang Ning
- National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
- Huixiao Hong
- National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
36
Quilez J, Guilmatre A, Garg P, Highnam G, Gymrek M, Erlich Y, Joshi RS, Mittelman D, Sharp AJ. Polymorphic tandem repeats within gene promoters act as modifiers of gene expression and DNA methylation in humans. Nucleic Acids Res 2016; 44:3750-62. [PMID: 27060133 PMCID: PMC4857002 DOI: 10.1093/nar/gkw219] [Citation(s) in RCA: 92] [Impact Index Per Article: 11.5] [Received: 11/16/2015] [Accepted: 03/22/2016] [Indexed: 01/23/2023]
Abstract
Despite representing an important source of genetic variation, tandem repeats (TRs) remain poorly studied due to technical difficulties. We hypothesized that TRs can operate as expression (eQTLs) and methylation (mQTLs) quantitative trait loci. To test this, we analyzed the effect of variation at 4849 promoter-associated TRs, genotyped in 120 individuals, on neighboring gene expression and DNA methylation. Polymorphic promoter TRs were associated with increased variance in local gene expression and DNA methylation, suggesting functional consequences related to TR variation. We identified >100 TRs associated with expression/methylation levels of adjacent genes. These potential eQTL/mQTL TRs were enriched for overlaps with transcription factor binding and DNaseI hypersensitivity sites, providing a rationale for their effects. Moreover, we showed that most TR variants are poorly tagged by nearby single nucleotide polymorphism (SNP) markers, indicating that many functional TR variants are not effectively assayed by SNP-based approaches. Our study assigns biological significance to TR variations in the human genome, and suggests that a significant fraction of TR variations exert functional effects via alterations of local gene expression or epigenetics. We conclude that targeted studies that focus on genotyping TR variants are required to fully ascertain functional variation in the genome.
Affiliation(s)
- Javier Quilez
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Audrey Guilmatre
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Paras Garg
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Gareth Highnam
- Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Melissa Gymrek
- Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, MA 02139, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; New York Genome Center, New York, NY 10038, USA
- Yaniv Erlich
- Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, MA 02139, USA; Department of Computer Science, Fu Foundation School of Engineering, Columbia University, New York, NY 10027, USA
- Ricky S Joshi
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- David Mittelman
- Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Andrew J Sharp
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA

37
Long Single-Molecule Reads Can Resolve the Complexity of the Influenza Virus Composed of Rare, Closely Related Mutant Variants. LECTURE NOTES IN COMPUTER SCIENCE 2016. [DOI: 10.1007/978-3-319-31957-5_12] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/17/2022]

38
Rhoads A, Au KF. PacBio Sequencing and Its Applications. GENOMICS PROTEOMICS & BIOINFORMATICS 2015; 13:278-89. [PMID: 26542840 PMCID: PMC4678779 DOI: 10.1016/j.gpb.2015.08.002] [Citation(s) in RCA: 1140] [Impact Index Per Article: 126.7] [Received: 05/28/2015] [Revised: 08/06/2015] [Accepted: 08/11/2015] [Indexed: 12/15/2022]
Abstract
Single-molecule, real-time sequencing developed by Pacific BioSciences offers longer read lengths than the second-generation sequencing (SGS) technologies, making it well-suited for unsolved problems in genome, transcriptome, and epigenetics research. The highly-contiguous de novo assemblies using PacBio sequencing can close gaps in current reference assemblies and characterize structural variation (SV) in personal genomes. With longer reads, we can sequence through extended repetitive regions and detect mutations, many of which are associated with diseases. Moreover, PacBio transcriptome sequencing is advantageous for the identification of gene isoforms and facilitates reliable discoveries of novel genes and novel isoforms of annotated genes, due to its ability to sequence full-length transcripts or fragments with significant lengths. Additionally, PacBio’s sequencing technique provides information that is useful for the direct detection of base modifications, such as methylation. In addition to using PacBio sequencing alone, many hybrid sequencing strategies have been developed to make use of more accurate short reads in conjunction with PacBio long reads. In general, hybrid sequencing strategies are more affordable and scalable especially for small-size laboratories than using PacBio Sequencing alone. The advent of PacBio sequencing has made available much information that could not be obtained via SGS alone.
Affiliation(s)
- Anthony Rhoads
- Department of Biostatistics, University of Iowa, Iowa City, IA 52242, USA
- Kin Fai Au
- Department of Biostatistics, University of Iowa, Iowa City, IA 52242, USA; Department of Internal Medicine, University of Iowa, Iowa City, IA 52242, USA.

39
Next-Generation Sequencing Approaches in Cancer: Where Have They Brought Us and Where Will They Take Us? Cancers (Basel) 2015; 7:1925-58. [PMID: 26404381 PMCID: PMC4586802 DOI: 10.3390/cancers7030869] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 07/08/2015] [Accepted: 09/15/2015] [Indexed: 12/20/2022]
Abstract
Next-generation sequencing (NGS) technologies and data have revolutionized cancer research and are increasingly being deployed to guide clinicians in treatment decision-making. NGS technologies have allowed us to take an “omics” approach to cancer in order to reveal genomic, transcriptomic, and epigenomic landscapes of individual malignancies. Integrative multi-platform analyses are increasingly used in large-scale projects that aim to fully characterize individual tumours as well as general cancer types and subtypes. In this review, we examine how NGS technologies in particular have contributed to “omics” approaches in cancer research, allowing for large-scale integrative analyses that consider hundreds of tumour samples. These types of studies have provided us with an unprecedented wealth of information, providing the background knowledge needed to make small-scale (including “N of 1”) studies informative and relevant. We also take a look at emerging opportunities provided by NGS and state-of-the-art third-generation sequencing technologies, particularly in the context of translational research. Cancer research and care are currently poised to experience significant progress catalyzed by accessible sequencing technologies that will benefit both clinical- and research-based efforts.

40
McFarland KN, Liu J, Landrian I, Godiska R, Shanker S, Yu F, Farmerie WG, Ashizawa T. SMRT Sequencing of Long Tandem Nucleotide Repeats in SCA10 Reveals Unique Insight of Repeat Expansion Structure. PLoS One 2015; 10:e0135906. [PMID: 26295943 PMCID: PMC4546671 DOI: 10.1371/journal.pone.0135906] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Received: 11/06/2014] [Accepted: 07/28/2015] [Indexed: 12/02/2022]
Abstract
A large, non-coding ATTCT repeat expansion causes the neurodegenerative disorder, spinocerebellar ataxia type 10 (SCA10). In a subset of SCA10 patients, interruption motifs are present at the 5’ end of the expansion and strongly correlate with epileptic seizures. Thus, interruption motifs are a predictor of the epileptic phenotype and are hypothesized to act as a phenotypic modifier in SCA10. Yet, the exact internal sequence structure of SCA10 expansions remains unknown due to limitations in current technologies for sequencing across long extended tracts of tandem nucleotide repeats. We used the third generation sequencing technology, Single Molecule Real Time (SMRT) sequencing, to obtain full-length contiguous expansion sequences, ranging from 2.5 to 4.4 kb in length, from three SCA10 patients with different clinical presentations. We obtained sequence spanning the entire length of the expansion and identified the structure of known and novel interruption motifs within the SCA10 expansion. The exact interruption patterns in expanded SCA10 alleles will allow us to further investigate the potential contributions of these interrupting sequences to the pathogenic modification leading to the epilepsy phenotype in SCA10. Our results also demonstrate that SMRT sequencing is useful for deciphering long tandem repeats that pose as “gaps” in the human genome sequence.
Affiliation(s)
- Karen N. McFarland
- Department of Neurology and The McKnight Brain Institute, University of Florida, Gainesville, Florida, 32610, United States of America
- Jilin Liu
- Department of Neurology and The McKnight Brain Institute, University of Florida, Gainesville, Florida, 32610, United States of America
- Ivette Landrian
- Department of Neurology and The McKnight Brain Institute, University of Florida, Gainesville, Florida, 32610, United States of America
- Ronald Godiska
- Lucigen Corporation, Middleton, Wisconsin, 53562, United States of America
- Savita Shanker
- Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, Florida, 32610, United States of America
- Fahong Yu
- Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, Florida, 32610, United States of America
- William G. Farmerie
- Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, Florida, 32610, United States of America
- Tetsuo Ashizawa
- Department of Neurology and The McKnight Brain Institute, University of Florida, Gainesville, Florida, 32610, United States of America

41
Meltz Steinberg K, Nicholas TJ, Koboldt DC, Yu B, Mardis E, Pamphlett R. Whole genome analyses reveal no pathogenetic single nucleotide or structural differences between monozygotic twins discordant for amyotrophic lateral sclerosis. Amyotroph Lateral Scler Frontotemporal Degener 2015; 16:385-92. [PMID: 25960086 DOI: 10.3109/21678421.2015.1040029] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 12/12/2022]
Abstract
The contribution of genetic and environmental factors to the pathogenesis of sporadic amyotrophic lateral sclerosis (ALS) remains unclear. To investigate the genetic component of the disease, we performed whole genome sequencing on ALS-discordant monozygotic twins. Illumina whole genome sequencing on white blood cell DNA of five ALS-discordant monozygotic twin pairs (10 samples in total) yielded ∼30x coverage per individual. All single nucleotide variants, indels, and structural variants (copy number variants, inversions and translocations) were called and evaluated for functional consequence, evolutionary conservation, population frequency and overlap with known ALS-associated variants and genes. No validated discordant coding or regulatory single nucleotide variants or indels were found, nor were any genome-wide discordant structural variants detected. Concordant variants of particular interest were: 1) two rare, highly-conserved heterozygous non-synonymous variants in SYT9 and EWSR1, genes previously associated with ALS (out of 2044 rare heterozygous variants detected); 2) three rare homozygous missense variants; and 3) three novel copy number deletions that overlapped genes. In conclusion, no convincing coding or regulatory nucleotide or genome-wide structural differences were found between ALS-discordant monozygotic twins. The results suggest that more work is needed to elucidate possible environmental, epigenetic, oligogenic and somatic genetic factors that could underlie susceptibility to sporadic ALS.
Affiliation(s)
- Karyn Meltz Steinberg
- The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, USA
- Thomas J Nicholas
- The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, USA
- Daniel C Koboldt
- The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, USA
- Bing Yu
- Department of Medical Genomics, Royal Prince Alfred Hospital and Sydney Medical School, The University of Sydney, New South Wales, Australia
- Elaine Mardis
- The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, USA
- Roger Pamphlett
- The Stacey MND Laboratory, Discipline of Pathology, Sydney Medical School, The Brain & Mind Research Institute, The University of Sydney, New South Wales, Australia

42
Kodama Y, Mashima J, Kosuge T, Katayama T, Fujisawa T, Kaminuma E, Ogasawara O, Okubo K, Takagi T, Nakamura Y. The DDBJ Japanese Genotype-phenotype Archive for genetic and phenotypic human data. Nucleic Acids Res 2014; 43:D18-22. [PMID: 25477381 PMCID: PMC4383935 DOI: 10.1093/nar/gku1120] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Indexed: 01/08/2023]
Abstract
The DNA Data Bank of Japan Center (DDBJ Center; http://www.ddbj.nig.ac.jp) maintains and provides public archival, retrieval and analytical services for biological information. Since October 2013, DDBJ Center has operated the Japanese Genotype-phenotype Archive (JGA) in collaboration with our partner institute, the National Bioscience Database Center (NBDC) of the Japan Science and Technology Agency. DDBJ Center provides the JGA database system which securely stores genotype and phenotype data collected from individuals whose consent agreements authorize data release only for specific research use. NBDC has established guidelines and policies for sharing human-derived data and reviews data submission and usage requests from researchers. In addition to the JGA project, DDBJ Center develops Semantic Web technologies for data integration and sharing in collaboration with the Database Center for Life Science. This paper describes the overview of the JGA project, updates to the DDBJ databases, and services for data retrieval, analysis and integration.
Affiliation(s)
- Yuichi Kodama
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Jun Mashima
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Takehide Kosuge
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Toshiaki Katayama
- National Bioscience Database Center, Japan Science and Technology Agency, Tokyo 102-8666, Japan
- Takatomo Fujisawa
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Eli Kaminuma
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Osamu Ogasawara
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Kousaku Okubo
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
- Toshihisa Takagi
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan; Database Center for Life Science, Chiba 277-0871, Japan
- Yasukazu Nakamura
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan

43
Santillan BA, Moye C, Mittelman D, Wilson JH. GFP-based fluorescence assay for CAG repeat instability in cultured human cells. PLoS One 2014; 9:e113952. [PMID: 25423602 PMCID: PMC4244167 DOI: 10.1371/journal.pone.0113952] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 06/17/2014] [Accepted: 11/03/2014] [Indexed: 12/13/2022]
Abstract
Trinucleotide repeats can be highly unstable, mutating far more frequently than point mutations. Repeats typically mutate by addition or loss of units of the repeat. CAG repeat expansions in humans trigger neurological diseases that include myotonic dystrophy, Huntington disease, and several spinocerebellar ataxias. In human cells, diverse mechanisms promote CAG repeat instability, and in mice, the mechanisms of instability are varied and tissue-dependent. Dissection of mechanistic complexity and discovery of potential therapeutics necessitates quantitative and scalable screens for repeat mutation. We describe a GFP-based assay for screening modifiers of CAG repeat instability in human cells. The assay exploits an engineered intronic CAG repeat tract that interferes with expression of an inducible GFP minigene. Like the phenotypes of many trinucleotide repeat disorders, we find that GFP function is impaired by repeat expansion, in a length-dependent manner. The intensity of fluorescence varies inversely with repeat length, allowing estimates of repeat tract changes in live cells. We validate the assay using transcription through the repeat and engineered CAG-specific nucleases, which have previously been reported to induce CAG repeat instability. The assay is relatively fast and should be adaptable to large-scale screens of chemical and shRNA libraries.
Affiliation(s)
- Beatriz A. Santillan
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Christopher Moye
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, United States of America
- David Mittelman
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
- Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
- John H. Wilson
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America

44
Abstract
MOTIVATION: Resolving tandemly repeated genomic sequences is a necessary step in improving our understanding of the human genome. Short tandem repeats (TRs), or microsatellites, are often used as molecular markers in genetics, and clinically, variation in microsatellites can lead to genetic disorders like Huntington's disease. Accurately resolving repeats, and in particular TRs, remains a challenging task in genome alignment, assembly and variation calling. Though tools have been developed for detecting microsatellites in short-read sequencing data, these are limited in the size and types of events they can resolve. Single-molecule sequencing technologies may potentially resolve a broader spectrum of TRs given their increased read length, but require new approaches given their significantly higher raw error rates. Because of these inherent error profiles, single-molecule reads present a unique challenge for accurately identifying TRs and estimating their copy number. RESULTS: Here we present PacmonSTR, a reference-based probabilistic approach to identify the TR region and estimate the number of TR elements in long DNA reads. We present a multistep approach that requires as input a reference region and the reference TR element. Initially, the TR region is identified from the long DNA reads via a 3-stage modified Smith-Waterman approach, and then the expected number of TR elements is calculated using a pair hidden Markov model-based method. Finally, TR-based genotype selection (or clustering: homozygous/heterozygous) is performed with Gaussian mixture models, using the Akaike information criterion and coverage expectations.
Affiliation(s)
- Ajay Ummat
- Department of Genetics and Genomic Science and Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ali Bashir
- Department of Genetics and Genomic Science and Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA

45
Abstract
Genomic data have become commonplace in most branches of the biological sciences and have fundamentally altered the way research is conducted. However, the predominance of short-read sequence data from second-generation sequencing technologies has commonly resulted in fragmented and partial genomic data characteristics. In this opinion, I will highlight how long, unbiased reads from single molecule, real-time (SMRT) sequencing now allow for a return to more contiguous and comprehensive views of genomes.
Affiliation(s)
- Jonas Korlach
- Pacific Biosciences, 1380 Willow Road, Menlo Park, CA 94025, United States