1
Maestri S, Scalzo D, Damaggio G, Zobel M, Besusso D, Cattaneo E. Navigating triplet repeats sequencing: concepts, methodological challenges and perspective for Huntington's disease. Nucleic Acids Res 2025; 53:gkae1155. [PMID: 39676657 PMCID: PMC11724279 DOI: 10.1093/nar/gkae1155] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/24/2024] [Revised: 10/16/2024] [Accepted: 12/02/2024] [Indexed: 12/17/2024]
Abstract
The accurate characterization of triplet repeats, especially the overrepresented CAG repeats, is increasingly relevant for several reasons. First, germline expansion of CAG repeats above a gene-specific threshold causes multiple neurodegenerative disorders; for instance, Huntington's disease (HD) is triggered by >36 CAG repeats in the huntingtin (HTT) gene. Second, extreme expansions up to 800 CAG repeats have been found in specific cell types affected by the disease. Third, synonymous single nucleotide variants within the CAG repeat stretch influence the age of disease onset. Thus, new sequencing-based protocols that profile both the length and the exact nucleotide sequence of triplet repeats are crucial. Various strategies to enrich the target gene over the background, along with sequencing platforms and bioinformatic pipelines, are under development. This review discusses the concepts, challenges, and methodological opportunities for analyzing triplet repeats, using HD as a case study. Starting with traditional approaches, we will explore how sequencing-based methods have evolved to meet increasing scientific demands. We will also highlight experimental and bioinformatic challenges, aiming to provide a guide for accurate triplet repeat characterization for diagnostic and therapeutic purposes.
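As a rough illustration of the length criterion mentioned in this abstract, the sketch below counts the longest uninterrupted run of CAG units in a sequence and compares it with a threshold of 36. The function names and the default threshold argument are illustrative assumptions, not part of the reviewed methods; real pipelines also have to handle sequencing errors and interrupting codons such as CAA, which this exact-match approach simply treats as the end of a run.

```python
import re


def longest_cag_run(seq: str) -> int:
    """Return the number of units in the longest uninterrupted CAG run."""
    # Find every maximal stretch of back-to-back CAG units (case-insensitive).
    runs = re.findall(r"(?:CAG)+", seq.upper())
    # Each unit is 3 bases long; an input with no CAG at all yields 0.
    return max((len(run) // 3 for run in runs), default=0)


def exceeds_threshold(seq: str, threshold: int = 36) -> bool:
    """True if the longest pure CAG run is strictly above the threshold."""
    return longest_cag_run(seq) > threshold


# Toy read: 40 pure CAG units, then a CAA interruption and one more CAG.
read = "TT" + "CAG" * 40 + "CAACAG" + "CCG"
print(longest_cag_run(read))    # 40
print(exceeds_threshold(read))  # True
```

Counting only exact matches keeps the example short; it deliberately ignores the sequence-level variation within the repeat that the abstract highlights as clinically relevant.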
Affiliation(s)
- Simone Maestri
  - Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
  - INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
- Davide Scalzo
  - Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
  - INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
- Gianluca Damaggio
  - Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
  - INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
- Martina Zobel
  - Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
  - INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
- Dario Besusso
  - Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
  - INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
- Elena Cattaneo
  - Department of Biosciences, University of Milan, Street Giovanni Celoria, 26, 20133, Milan, Italy
  - INGM, Istituto Nazionale Genetica Molecolare ‘Romeo ed Enrica Invernizzi’, Street Francesco Sforza, 35, 20122, Milan, Italy
2
Laß J, Lüth T, Schlüter K, Schaake S, Laabs BH, Much C, Jamora RD, Rosales RL, Saranza G, Diesta CCE, Pearson CE, König IR, Brüggemann N, Klein C, Westenberger A, Trinh J. Stability of Mosaic Divergent Repeat Interruptions in X-Linked Dystonia-Parkinsonism. Mov Disord 2024; 39:1145-1153. [PMID: 38616406 DOI: 10.1002/mds.29809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Revised: 02/27/2024] [Accepted: 03/25/2024] [Indexed: 04/16/2024]
Abstract
BACKGROUND: X-Linked dystonia-parkinsonism (XDP) is an adult-onset neurodegenerative disorder characterized by rapidly progressive dystonia and parkinsonism. Mosaic Divergent Repeat Interruptions affecting motif Length and Sequence (mDRILS) were recently found within the TAF1 SVA repeat tract and were shown to associate with repeat stability and age at onset in XDP, specifically the AGGG [5'-SINE-VNTR-Alu(AGAGGG)2AGGG(AGAGGG)n] mDRILS.
OBJECTIVE: This study aimed to investigate the stability of mDRILS frequencies and stability of (AGAGGG)n repeat length during transmission in parent-offspring pairs.
METHODS: Fifty-six families (n = 130) were investigated for generational transmission of repeat length and mDRILS. The mDRILS stability of 16 individuals was assessed at two sampling points 1 year apart. DNA was sequenced with long-read technologies after long-range polymerase chain reaction amplification of the TAF1 SVA. Repeat number and mDRILS were detected with Noise-Cancelling Repeat Finder (NCRF).
RESULTS: When comparing the repeat domain, 51 of 65 children had either contractions or expansions of the repeat length. The AGGG frequency remained stable across generations at 0.074 (IQR: 0.069-0.078) (z = -0.526; P = 0.599). However, the median AGGG frequency in children with an expansion (0.072 [IQR: 0.066-0.076]) was lower compared with children with retention or contraction (0.080 [IQR: 0.073-0.083]) (z = -0.007; P = 0.003). In a logistic regression model, the AGGG frequency predicted the outcome of either expansion or retention/contraction when including repeat number and sex as covariates (β = 80.7; z-score = 2.63; P = 0.0085). The AGGG frequency varied slightly over 1 year (0.070 [IQR: 0.063-0.080] to 0.073 [IQR: 0.069-0.078]).
CONCLUSIONS: Our results show that a higher AGGG frequency may stabilize repeats across generations. This highlights the importance of further investigating mDRILS as a disease-modifying factor with generational differences.
© 2024 The Authors. Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society.
Affiliation(s)
- Joshua Laß
  - Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
- Theresa Lüth
  - Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
- Susen Schaake
  - Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
- Björn-Hergen Laabs
  - Institute of Medical Biometry and Statistics, University of Lübeck, Lübeck, Germany
- Christoph Much
  - Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
- Roland Dominic Jamora
  - Department of Neurosciences, College of Medicine-Philippine General Hospital, University of the Philippines Manila, Manila, Philippines
- Raymond L Rosales
  - Department of Neurology and Psychiatry, University of Santo Tomas and the CNS-Metropolitan Medical Center, Manila, Philippines; Section of Neurology, Manila, Philippines
- Gerard Saranza
  - Department of Internal Medicine, Chong Hua Hospital, Cebu, Philippines
- Cid Czarina E Diesta
  - Department of Neurosciences, Movement Disorders Clinic, Makati Medical Center, Makati City, Philippines
- Inke R König
  - Institute of Medical Biometry and Statistics, University of Lübeck, Lübeck, Germany
- Norbert Brüggemann
  - Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
  - Department of Neurology, University of Lübeck, Lübeck, Germany
- Christine Klein
  - Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
- Ana Westenberger
  - Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
- Joanne Trinh
  - Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
3
Schmidt TT, Tyer C, Rughani P, Haggblom C, Jones JR, Dai X, Frazer KA, Gage FH, Juul S, Hickey S, Karlseder J. High resolution long-read telomere sequencing reveals dynamic mechanisms in aging and cancer. Nat Commun 2024; 15:5149. [PMID: 38890299 PMCID: PMC11189484 DOI: 10.1038/s41467-024-48917-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 05/15/2024] [Indexed: 06/20/2024]
Abstract
Telomeres are the protective nucleoprotein structures at the end of linear eukaryotic chromosomes. Telomeres' repetitive nature and length have traditionally challenged the precise assessment of the composition and length of individual human telomeres. Here, we present Telo-seq to resolve bulk, chromosome arm-specific and allele-specific human telomere lengths using Oxford Nanopore Technologies' native long-read sequencing. Telo-seq resolves telomere shortening in five population doubling increments and reveals intrasample, chromosome arm-specific, allele-specific telomere length heterogeneity. Telo-seq can reliably discriminate between telomerase- and ALT-positive cancer cell lines. Thus, Telo-seq is a tool to study telomere biology during development, aging, and cancer at unprecedented resolution.
Affiliation(s)
- Carly Tyer
  - Oxford Nanopore Technologies, Inc., New York, NY, USA
- Candy Haggblom
  - Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
- Jeffrey R Jones
  - Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
- Xiaoguang Dai
  - Oxford Nanopore Technologies, Inc., New York, NY, USA
- Kelly A Frazer
  - Institute of Genomic Medicine, University of California, San Diego, La Jolla, CA, 92093-0761, USA
- Fred H Gage
  - Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
- Sissel Juul
  - Oxford Nanopore Technologies, Inc., New York, NY, USA
- Scott Hickey
  - Oxford Nanopore Technologies, Inc., New York, NY, USA
- Jan Karlseder
  - Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
4
Sales-Oliveira VC, Dos Santos RZ, Goes CAG, Calegari RM, Garrido-Ramos MA, Altmanová M, Ezaz T, Liehr T, Porto-Foresti F, Utsunomia R, Cioffi MB. Evolution of ancient satellite DNAs in extant alligators and caimans (Crocodylia, Reptilia). BMC Biol 2024; 22:47. [PMID: 38413947 PMCID: PMC10900743 DOI: 10.1186/s12915-024-01847-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 02/15/2024] [Indexed: 02/29/2024]
Abstract
BACKGROUND: Crocodilians are one of the oldest extant vertebrate lineages, exhibiting a combination of evolutionary success and morphological resilience that has persisted throughout the history of life on Earth. This ability to endure over such a long geological time span is of great evolutionary importance. Here, we have utilized the combination of genomic and chromosomal data to identify and compare the full catalogs of satellite DNA families (satDNAs, i.e., the satellitomes) of 5 out of the 8 extant Alligatoridae species. As crocodilian genomes reveal ancestral patterns of evolution, by employing this multispecies data collection, we can investigate and assess how satDNA families evolve over time.
RESULTS: Alligators and caimans displayed a small number of satDNA families, ranging from 3 to 13 satDNAs in A. sinensis and C. latirostris, respectively. Together with the little variation observed both within and between species, this highlighted the long-term conservation of satDNA elements throughout evolution. Furthermore, we traced the origin of the ancestral forms of all satDNAs belonging to the common ancestor of Caimaninae and Alligatorinae. Fluorescence in situ experiments showed distinct hybridization patterns for identical orthologous satDNAs, indicating their dynamic genomic placement.
CONCLUSIONS: Alligators and caimans possess one of the smallest satDNA libraries ever reported, comprising only four sets of satDNAs that are shared by all species. In addition, our findings indicated limited intraspecific variation in satellite DNA, suggesting that the majority of new satellite sequences likely evolved from pre-existing ones.
Affiliation(s)
- Vanessa C Sales-Oliveira
  - Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Carlos, São Paulo, Brazil
- Manuel A Garrido-Ramos
  - Departamento de Genética, Facultad de Ciencias, Universidad de Granada, 18071, Granada, Spain
- Marie Altmanová
  - Institute of Animal Physiology and Genetics, Czech Academy of Sciences, 27721, Liběchov, Czech Republic
  - Department of Ecology, Faculty of Science, Charles University, 12844, Prague, Czech Republic
- Tariq Ezaz
  - Institute for Applied Ecology, University of Canberra, Canberra, Australia
- Thomas Liehr
  - Institute of Human Genetics, Jena University Hospital, Friedrich Schiller University, Jena, Germany
- Marcelo B Cioffi
  - Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Carlos, São Paulo, Brazil
  - Institute of Human Genetics, Jena University Hospital, Friedrich Schiller University, Jena, Germany
5
Souza-Borges CH, Utsunomia R, Varani AM, Uliano-Silva M, Lira LVG, Butzge AJ, Gomez Agudelo JF, Manso S, Freitas MV, Ariede RB, Mastrochirico-Filho VA, Penaloza C, Barria A, Porto-Foresti F, Foresti F, Hattori R, Guiguen Y, Houston RD, Hashimoto DT. De novo assembly and characterization of a highly degenerated ZW sex chromosome in the fish Megaleporinus macrocephalus. Gigascience 2024; 13:giae085. [PMID: 39589439 PMCID: PMC11590113 DOI: 10.1093/gigascience/giae085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Revised: 07/31/2024] [Accepted: 10/14/2024] [Indexed: 11/27/2024]
Abstract
BACKGROUND: Megaleporinus macrocephalus (piauçu) is a Neotropical fish within Characoidei that presents a well-established heteromorphic ZZ/ZW sex determination system and thus constitutes a good model for studying W and Z chromosomes in fishes. We used PacBio reads and Hi-C to assemble a chromosome-level reference genome for M. macrocephalus. We generated family segregation information to construct a genetic map, pool sequencing of males and females to characterize its sex system, and RNA sequencing to highlight candidate genes of M. macrocephalus sex determination.
RESULTS: The reference genome of M. macrocephalus is 1,282,030,339 bp in length and has a contig and scaffold N50 of 5.0 Mb and 45.03 Mb, respectively. In the sex chromosome, based on patterns of recombination suppression, coverage, FST, and sex-specific SNPs, we distinguished a putative W-specific region that is highly differentiated, a region where Z and W still share some similarities and that is undergoing degeneration, and the PAR. The sex chromosome gene repertoire includes genes from the TGF-β family (amhr2, bmp7) and the Wnt/β-catenin pathway (wnt4, wnt7a), some of which are differentially expressed.
CONCLUSIONS: The chromosome-level genome of piauçu exhibits high quality, establishing a valuable resource for advancing research within the group. Our discoveries offer insights into the evolutionary dynamics of Z and W sex chromosomes in fish, emphasizing ongoing degenerative processes and indicating complex interactions between Z and W sequences in specific genomic regions. Notably, amhr2 and bmp7 are potential candidate genes for sex determination in M. macrocephalus.
Affiliation(s)
- Ricardo Utsunomia
  - School of Sciences, São Paulo State University (Unesp), Bauru, SP, 17033-360, Brazil
- Alessandro M Varani
  - School of Agricultural and Veterinary Sciences, São Paulo State University (Unesp), Jaboticabal, SP, 14884-900, Brazil
- Lieschen Valeria G Lira
  - Aquaculture Center of Unesp, São Paulo State University (Unesp), Jaboticabal, SP, 14884-900, Brazil
- Arno J Butzge
  - Aquaculture Center of Unesp, São Paulo State University (Unesp), Jaboticabal, SP, 14884-900, Brazil
- John F Gomez Agudelo
  - Aquaculture Center of Unesp, São Paulo State University (Unesp), Jaboticabal, SP, 14884-900, Brazil
- Shisley Manso
  - Aquaculture Center of Unesp, São Paulo State University (Unesp), Jaboticabal, SP, 14884-900, Brazil
- Milena V Freitas
  - Aquaculture Center of Unesp, São Paulo State University (Unesp), Jaboticabal, SP, 14884-900, Brazil
- Raquel B Ariede
  - Aquaculture Center of Unesp, São Paulo State University (Unesp), Jaboticabal, SP, 14884-900, Brazil
- Carolina Penaloza
  - The Roslin Institute, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, United Kingdom
- Agustín Barria
  - The Roslin Institute, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, United Kingdom
- Fábio Porto-Foresti
  - School of Sciences, São Paulo State University (Unesp), Bauru, SP, 17033-360, Brazil
- Fausto Foresti
  - Institute of Biosciences, São Paulo State University (Unesp), Botucatu, SP, 18618-689, Brazil
- Ricardo Hattori
  - São Paulo Agency of Agribusiness and Technology (APTA), São Paulo, SP, 01037-010, Brazil
- Ross D Houston
  - The Roslin Institute, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, United Kingdom
- Diogo Teruo Hashimoto
  - Aquaculture Center of Unesp, São Paulo State University (Unesp), Jaboticabal, SP, 14884-900, Brazil
6
Li R, Wu J, Li G, Liu J, Xuan J, Zhu Q. Mdwgan-gp: data augmentation for gene expression data based on multiple discriminator WGAN-GP. BMC Bioinformatics 2023; 24:427. [PMID: 37957576 PMCID: PMC10644641 DOI: 10.1186/s12859-023-05558-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 11/06/2023] [Indexed: 11/15/2023]
Abstract
BACKGROUND: Although gene expression data play significant roles in biological and medical studies, their applications are hampered by the difficulty and high expense of gathering them through biological experiments. Generating high-quality gene expression data with computational methods is therefore an urgent problem. WGAN-GP, a generative adversarial network-based method, has been successfully applied in augmenting gene expression data. However, mode collapse or over-fitting may occur with small training samples because the method adopts only one discriminator.
RESULTS: In this study, an improved data augmentation approach, MDWGAN-GP, a generative adversarial network model with multiple discriminators, is proposed. In addition, a novel method is devised for enriching training samples based on a linear graph convolutional network. Extensive experiments were performed on real biological data.
CONCLUSIONS: The experimental results demonstrate that, compared with other state-of-the-art methods, the MDWGAN-GP method can produce higher-quality generated gene expression data in most cases.
Affiliation(s)
- Rongyuan Li
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, China
- Jingli Wu
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, China
- Gaoshi Li
  - Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, China
- Jiafei Liu
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, China
- Junbo Xuan
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, China
- Qi Zhu
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, China
7
Sauvage T, Cormier A, Delphine P. A comparison of Oxford nanopore library strategies for bacterial genomics. BMC Genomics 2023; 24:627. [PMID: 37864145 PMCID: PMC10589936 DOI: 10.1186/s12864-023-09729-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/12/2023] [Accepted: 10/11/2023] [Indexed: 10/22/2023]
Abstract
BACKGROUND: Oxford Nanopore Technologies (ONT) provides three main library preparation strategies to sequence bacterial genomes. These include tagmentation (TAG), ligation (LIG) and amplification (PCR). Despite ONT's recommendations, making an informed decision for preparation choice remains difficult without a side-by-side comparison. Here, we sequenced 12 bacterial strains to examine the overall output of these strategies, including sequencing noise, barcoding efficiency and assembly quality based on mapping to curated genomes established herein.
RESULTS: Average read lengths were similar for TAG and LIG (>5,000 bp), while being drastically smaller for PCR (<1,100 bp). LIG produced the largest output with 33.62 Gbp vs. 11.72 Gbp for TAG and 4.79 Gbp for PCR. PCR produced the most sequencing noise with only 22.7% of reads mappable to the curated genomes, vs. 92.9% for LIG and 87.3% for TAG. Output per channel was most homogeneous in LIG and most variable in PCR, while intermediate in TAG. Artifactual tandem content was most abundant in PCR (22.5%) and least in LIG and TAG (0.9% and 2.2%). Basecalling and demultiplexing of barcoded libraries resulted in ~20% data loss as unclassified reads and 1.5% read leakage.
CONCLUSION: The output of LIG was best (low noise, high read numbers of long lengths), intermediate in TAG (some noise, moderate read numbers of long lengths) and less desirable in PCR (high noise, high read numbers of short lengths). Overall, users should not accept assembly results at face value without careful replicon verification, including the detection of plasmids assembled from leaked reads.
Affiliation(s)
- Thomas Sauvage
  - Ifremer, MASAE Microbiologie Aliment Santé Environnement, F-44000, Nantes, France
- Passerini Delphine
  - Ifremer, MASAE Microbiologie Aliment Santé Environnement, F-44000, Nantes, France
8
Xu C, Ji J, Zhu X, Huangfu N, Xue H, Wang L, Zhang K, Li D, Niu L, Chen R, Gao X, Luo J, Cui J. Chromosome level genome assembly of oriental armyworm Mythimna separata. Sci Data 2023; 10:597. [PMID: 37684242 PMCID: PMC10491670 DOI: 10.1038/s41597-023-02506-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Accepted: 08/29/2023] [Indexed: 09/10/2023]
Abstract
The oriental armyworm, Mythimna separata, is an extremely destructive polyphagous pest with a broad host range that seriously threatens the safety of agricultural production. Here, a high-quality chromosome-level genome was assembled using Illumina sequencing, PacBio HiFi long-read sequencing, and Hi-C scaffolding technologies. The genome size was 706.30 Mb with a contig N50 of 22.08 Mb, and 99.2% of the assembled sequences were anchored to 31 chromosomes. In addition, 20,375 protein-coding genes and 258.68 Mb of transposable elements were identified. The chromosome-level genome assembly of M. separata provides a significant genetic resource for future studies of this insect and contributes to the development of management strategies.
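For readers unfamiliar with the contig N50 statistic quoted in this abstract, the sketch below follows its usual definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The function name `n50` and the toy lengths are illustrative assumptions, not code or values from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings coverage to half the total.
        if running * 2 >= total:
            return length
    return 0


# Toy assembly of eight contigs totalling 32 units.
contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(contigs))  # 8
```

Comparing `running * 2` with `total` avoids floating-point division, so the result stays exact for integer lengths of any size.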
Affiliation(s)
- Chao Xu
  - National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, Henan, China
  - Hubei Insect Resources Utilization and Sustainable Pest Management Key Laboratory, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, 430070, Hubei, China
- Jichao Ji
  - National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, Henan, China
  - Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, 450001, Henan, China
  - Western Agricultural Research Center, Chinese Academy of Agricultural Sciences, Changji, 831100, China
- Xiangzhen Zhu
  - National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, Henan, China
  - Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, 450001, Henan, China
  - Western Agricultural Research Center, Chinese Academy of Agricultural Sciences, Changji, 831100, China
- Ningbo Huangfu
  - National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, Henan, China
  - Hubei Insect Resources Utilization and Sustainable Pest Management Key Laboratory, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, 430070, Hubei, China
- Hui Xue
  - National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, Henan, China
  - Hubei Insect Resources Utilization and Sustainable Pest Management Key Laboratory, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, 430070, Hubei, China
- Li Wang
  - National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, Henan, China
  - Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, 450001, Henan, China
  - Western Agricultural Research Center, Chinese Academy of Agricultural Sciences, Changji, 831100, China
- Kaixin Zhang
  - National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, Henan, China
  - Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, 450001, Henan, China
  - Western Agricultural Research Center, Chinese Academy of Agricultural Sciences, Changji, 831100, China
- Dongyang Li
  - National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, Henan, China
  - Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, 450001, Henan, China
  - Western Agricultural Research Center, Chinese Academy of Agricultural Sciences, Changji, 831100, China
- Lin Niu
  - National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, Henan, China
  - Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, 450001, Henan, China
  - Western Agricultural Research Center, Chinese Academy of Agricultural Sciences, Changji, 831100, China
- Ran Chen
  - National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, Henan, China
  - College of Agronomy, Xinjiang Agricultural University, Urumqi, 830052, China
- Xueke Gao
  - National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, Henan, China
  - Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, 450001, Henan, China
  - Western Agricultural Research Center, Chinese Academy of Agricultural Sciences, Changji, 831100, China
- Junyu Luo
  - National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, Henan, China
  - Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, 450001, Henan, China
  - Western Agricultural Research Center, Chinese Academy of Agricultural Sciences, Changji, 831100, China
- Jinjie Cui
  - National Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, Henan, China
  - Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, 450001, Henan, China
  - Western Agricultural Research Center, Chinese Academy of Agricultural Sciences, Changji, 831100, China
9
Lang J, Xu Z, Wang Y, Sun J, Yang Z. NanoSTR: A method for detection of target short tandem repeats based on nanopore sequencing data. Front Mol Biosci 2023; 10:1093519. [PMID: 36743210 PMCID: PMC9889824 DOI: 10.3389/fmolb.2023.1093519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Accepted: 01/06/2023] [Indexed: 01/19/2023]
Abstract
Short tandem repeats (STRs) are widely present in the human genome. Studies have confirmed that STRs are associated with more than 30 diseases, and they have also been used in forensic identification and paternity testing. However, there are few methods for STR detection based on nanopore sequencing due to the challenges posed by the sequencing principles and the data characteristics of nanopore sequencing. We developed NanoSTR for detection of target STR loci based on the length-number-rank (LNR) information of reads. NanoSTR can be used for STR detection and genotyping based on long-read data from nanopore sequencing with improved accuracy and efficiency compared with other existing methods, such as Tandem-Genotypes and TRiCoLOR. NanoSTR showed 100% concordance with the expected genotypes on error-free simulated data, and achieved >85% concordance on standard samples (containing autosomal and Y-chromosomal loci) sequenced on the MinION platform. NanoSTR showed high performance for detection of target STR markers. Although NanoSTR needs further optimization and development, it is useful as an analytical method for the detection of STR loci by nanopore sequencing. This method adds to the toolbox for nanopore-based STR analysis and expands the applications of nanopore sequencing in scientific research and clinical scenarios. The main code and the data are available at https://github.com/langjidong/NanoSTR.
10
Rosenbohm A, Pott H, Thomsen M, Rafehi H, Kaya S, Szymczak S, Volk AE, Mueller K, Silveira I, Weishaupt JH, Tönnies H, Seibler P, Zschiedrich K, Schaake S, Westenberger A, Zühlke C, Depienne C, Trinh J, Ludolph AC, Klein C, Bahlo M, Lohmann K. Familial Cerebellar Ataxia and Amyotrophic Lateral Sclerosis/Frontotemporal Dementia with DAB1 and C9ORF72 Repeat Expansions: An 18-Year Study. Mov Disord 2022; 37:2427-2439. [PMID: 36148898 PMCID: PMC10900262 DOI: 10.1002/mds.29221] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/24/2022] [Revised: 07/27/2022] [Accepted: 08/10/2022] [Indexed: 01/13/2023]
Abstract
BACKGROUND: Coding and noncoding repeat expansions are an important cause of neurodegenerative diseases.
OBJECTIVE: This study determined the clinical and genetic features of a large German family that has been followed for almost 2 decades with an autosomal dominantly inherited spinocerebellar ataxia (SCA) and independent co-occurrence of amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD).
METHODS: We carried out clinical examinations and telephone interviews, reviewed medical records, and performed magnetic resonance imaging and positron emission tomography scans of all available family members. Comprehensive genetic investigations included linkage analysis, short-read genome sequencing, long-read sequencing, repeat-primed polymerase chain reaction, and Southern blotting.
RESULTS: The family comprises 118 members across seven generations, 30 of whom were definitely and five possibly affected. In this family, two different pathogenic mutations were found, a heterozygous repeat expansion in C9ORF72 in four patients with ALS/FTD and a heterozygous repeat expansion in DAB1 in at least nine patients with SCA, leading to a diagnosis of DAB1-related ataxia (ATX-DAB1; SCA37). One patient was affected by ALS and SCA and carried both repeat expansions. The repeat in DAB1 had the same configuration but was larger than those previously described ([ATTTT]≈75 [ATTTC]≈40-100 [ATTTT]≈415). Clinical features in patients with SCA included spinocerebellar symptoms, sometimes accompanied by additional ophthalmoplegia, vertical nystagmus, tremor, sensory deficits, and dystonia. After several decades, some of these patients suffered from cognitive decline and one from additional nonprogressive lower motor neuron affection.
CONCLUSION: We demonstrate genetic and clinical findings during an 18-year period in a unique family carrying two different pathogenic repeat expansions, providing novel insights into their genotypic and phenotypic spectrums.
© 2022 The Authors. Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society.
Affiliation(s)
- Hendrik Pott
- Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
- Mirja Thomsen
- Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
- Haloom Rafehi
- Division of Population Health and Immunity, The Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Sabine Kaya
- Institute of Human Genetics, University Hospital Essen, Essen, Germany
- Silke Szymczak
- Institute of Medical Biometry and Statistics, University of Lübeck, Lübeck, Germany
- Alexander E. Volk
- Institute of Human Genetics, University Medical Center Hamburg‐Eppendorf, Hamburg, Germany
- Isabel Silveira
- i3S‐Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
- Jochen H. Weishaupt
- Division of Neurodegeneration, Neurology Department, University Medicine Mannheim, Heidelberg University, Mannheim, Germany
- Holger Tönnies
- Institute of Human Genetics, Christian‐Albrechts‐University, Kiel, Germany
- Philip Seibler
- Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
- Susen Schaake
- Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
- Joanne Trinh
- Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
- Albert C. Ludolph
- Department of Neurology, University of Ulm, Ulm, Germany
- German Center for Neurodegenerative Diseases, Site Ulm, Ulm, Germany
- Melanie Bahlo
- Division of Population Health and Immunity, The Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
- Katja Lohmann
- Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
11
Kirov I, Kolganova E, Dudnikov M, Yurkevich OY, Amosova AV, Muravenko OV. A Pipeline NanoTRF as a New Tool for De Novo Satellite DNA Identification in the Raw Nanopore Sequencing Reads of Plant Genomes. Plants (Basel) 2022; 11:2103. [PMID: 36015406 PMCID: PMC9413040 DOI: 10.3390/plants11162103] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/30/2022] [Revised: 08/08/2022] [Accepted: 08/11/2022] [Indexed: 06/15/2023]
Abstract
High-copy tandemly organized repeats (TRs), or satellite DNA, are an important but still enigmatic component of eukaryotic genomes. TRs comprise arrays of multi-copy and highly similar tandem repeats, which makes the elucidation of TRs a very challenging task. Oxford Nanopore sequencing data provide a valuable source of information on TR organization at the single-molecule level. However, bioinformatics tools for de novo identification of TRs in raw Nanopore data have not been reported so far. We developed NanoTRF, a new Python pipeline for TR identification, characterization and consensus monomer sequence assembly. This new pipeline requires only a raw Nanopore read file from low-depth (<1×) genome sequencing. The program generates an informative HTML report and figures on TR genome abundance, monomer sequence and monomer length. In addition, NanoTRF annotates transposable element (TE) sequences within or near satDNA arrays, and this information can be used to elucidate how TRs and TEs co-evolve in the genome. Moreover, we validated by FISH that the NanoTRF report is useful for evaluating whether TR chromosome organization is clustered or dispersed. Our findings showed that NanoTRF is a robust method for the de novo identification of satellite repeats in raw Nanopore data without prior read assembly. The obtained sequences can be used in many downstream analyses including genome assembly assistance and gap estimation, chromosome mapping and cytogenetic marker development.
Affiliation(s)
- Ilya Kirov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, Moscow 127550, Russia
- Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Elizaveta Kolganova
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, Moscow 127550, Russia
- Maxim Dudnikov
- All-Russia Research Institute of Agricultural Biotechnology, Timiryazevskaya Str. 42, Moscow 127550, Russia
- Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Olga Yu. Yurkevich
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
- Alexandra V. Amosova
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
- Olga V. Muravenko
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
12
Navrátilová P, Toegelová H, Tulpová Z, Kuo Y, Stein N, Doležel J, Houben A, Šimková H, Mascher M. Prospects of telomere-to-telomere assembly in barley: Analysis of sequence gaps in the MorexV3 reference genome. Plant Biotechnol J 2022; 20:1373-1386. [PMID: 35338551 PMCID: PMC9241371 DOI: 10.1111/pbi.13816] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 11/22/2021] [Revised: 02/11/2022] [Accepted: 03/20/2022] [Indexed: 05/06/2023]
Abstract
The first gapless, telomere-to-telomere (T2T) sequence assemblies of plant chromosomes were reported recently. However, sequence assemblies of most plant genomes remain fragmented. Only recent breakthroughs in accurate long-read sequencing have made it possible to achieve highly contiguous sequence assemblies with a few tens of contigs per chromosome, that is a number small enough to allow for a systematic inquiry into the causes of the remaining sequence gaps and the approaches and resources needed to close them. Here, we analyse sequence gaps in the current reference genome sequence of barley cv. Morex (MorexV3). Optical map and sequence raw data, complemented by ChIP-seq data for centromeric histone variant CENH3, were used to estimate the abundance of centromeric, ribosomal DNA, and subtelomeric repeats in the barley genome. These estimates were compared with copy numbers in the MorexV3 pseudomolecule sequence. We found that almost all centromeric sequences and 45S ribosomal DNA repeat arrays were absent from the MorexV3 pseudomolecules and that the majority of sequence gaps can be attributed to assembly breakdown in long stretches of satellite repeats. However, missing sequences cannot fully account for the difference between assembly size and flow cytometric genome size estimates. We discuss the prospects of gap closure with ultra-long sequence reads.
Affiliation(s)
- Pavla Navrátilová
- Institute of Experimental Botany of the Czech Academy of Sciences, Olomouc, Czech Republic
- Helena Toegelová
- Institute of Experimental Botany of the Czech Academy of Sciences, Olomouc, Czech Republic
- Zuzana Tulpová
- Institute of Experimental Botany of the Czech Academy of Sciences, Olomouc, Czech Republic
- Yi‐Tzu Kuo
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Center for Integrated Breeding Research (CiBreed), Georg‐August‐University Göttingen, Göttingen, Germany
- Jaroslav Doležel
- Institute of Experimental Botany of the Czech Academy of Sciences, Olomouc, Czech Republic
- Andreas Houben
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Hana Šimková
- Institute of Experimental Botany of the Czech Academy of Sciences, Olomouc, Czech Republic
- Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle‐Jena‐Leipzig, Leipzig, Germany
13
Lüth T, Schaake S, Grünewald A, May P, Trinh J, Weissensteiner H. Benchmarking Low-Frequency Variant Calling With Long-Read Data on Mitochondrial DNA. Front Genet 2022; 13:887644. [PMID: 35664331 PMCID: PMC9161029 DOI: 10.3389/fgene.2022.887644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2022] [Accepted: 04/18/2022] [Indexed: 11/13/2022] Open
Abstract
Background: Sequencing quality has improved over the last decade for long-reads, allowing for more accurate detection of somatic low-frequency variants. In this study, we used mixtures of mitochondrial samples with different haplogroups (i.e., a specific set of mitochondrial variants) to investigate the applicability of nanopore sequencing for low-frequency single nucleotide variant detection. Methods: We investigated the impact of base-calling, alignment/mapping, quality control steps, and variant calling by comparing the results to a previously derived short-read gold standard generated on the Illumina NextSeq. For nanopore sequencing, six mixtures of four different haplotypes were prepared, allowing us to reliably check for expected variants at the predefined 5%, 2%, and 1% mixture levels. We used two different versions of Guppy for base-calling, two aligners (i.e., Minimap2 and Ngmlr), and three variant callers (i.e., Mutserve2, Freebayes, and Nanopanel2) to compare low-frequency variants. We used F1 score measurements to assess the performance of variant calling. Results: We observed a mean read length of 11 kb and a mean overall read quality of 15. Ngmlr showed not only higher F1 scores but also higher allele frequencies (AF) of false-positive calls across the mixtures (mean F1 score = 0.83; false-positive allele frequencies < 0.17) compared to Minimap2 (mean F1 score = 0.82; false-positive AF < 0.06). Mutserve2 had the highest F1 scores (5% level: F1 score >0.99, 2% level: F1 score >0.54, and 1% level: F1 score >0.70) across all callers and mixture levels. Conclusion: We here present the benchmarking for low-frequency variant calling with nanopore sequencing by identifying current limitations.
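For readers unfamiliar with the metric, the F1 score named in this abstract is the harmonic mean of precision and recall. The following is a minimal Python sketch of that standard definition only; the function name and the example counts are hypothetical and are not taken from the study or from any of the tools it benchmarks.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, computed from raw call counts.

    tp: expected variants that were called (true positives)
    fp: called variants that were not expected (false positives)
    fn: expected variants that were missed (false negatives)
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration: 8 expected variants found,
# 2 spurious calls, 2 expected variants missed.
example = f1_score(tp=8, fp=2, fn=2)
```

Because the harmonic mean is dominated by the smaller of its two inputs, a caller that misses many low-frequency variants scores poorly even if everything it does report is correct.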
Affiliation(s)
- Theresa Lüth
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Lübeck, Germany
- Susen Schaake
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Lübeck, Germany
- Anne Grünewald
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Lübeck, Germany
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Patrick May
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Joanne Trinh
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Lübeck, Germany
- *Correspondence: Joanne Trinh; Hansi Weissensteiner
- Hansi Weissensteiner
- Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
- *Correspondence: Joanne Trinh; Hansi Weissensteiner
14
Trinh J, Lüth T, Schaake S, Laabs BH, Schlüter K, Laß J, Pozojevic J, Tse R, König I, Jamora RD, Rosales RL, Brüggemann N, Saranza G, Diesta CCE, Kaiser FJ, Depienne C, Pearson CE, Westenberger A, Klein C. Mosaic divergent repeat interruptions in XDP influence repeat stability and disease onset. Brain 2022; 146:1075-1082. [PMID: 35481544 PMCID: PMC9976955 DOI: 10.1093/brain/awac160] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/11/2022] [Revised: 03/14/2022] [Accepted: 04/14/2022] [Indexed: 11/14/2022] Open
Abstract
While many genetic causes of movement disorders have been identified, modifiers of disease expression are largely unknown. X-linked dystonia-parkinsonism (XDP) is a neurodegenerative disease caused by a SINE-VNTR-Alu(AGAGGG)n retrotransposon insertion in TAF1, with a polymorphic (AGAGGG)n repeat. Repeat length and variants in MSH3 and PMS2 explain ∼65% of the variance in age at onset (AAO) in XDP. However, additional genetic modifiers are conceivably at play in XDP, such as repeat interruptions. Long-read nanopore sequencing of PCR amplicons from XDP patients (n = 202) was performed to assess potential repeat interruption and instability. Repeat-primed PCR and Cas9-mediated targeted enrichment confirmed the presence of identified divergent repeat motifs. In addition to the canonical pure SINE-VNTR-Alu-5'-(AGAGGG)n, we observed a mosaic of divergent repeat motifs that polarized at the beginning of the tract, where the divergent repeat interruptions varied in motif length by having one, two, or three nucleotides fewer than the hexameric motif, distinct from interruptions in other disease-associated repeats, which match the lengths of the canonical motifs. All divergent configurations occurred mosaically and in two investigated brain regions (basal ganglia, cerebellum) and in blood-derived DNA from the same patient. The most common divergent interruption was AGG [5'-SINE-VNTR-Alu(AGAGGG)2AGG(AGAGGG)n], similar to the pure tract, followed by AGGG [5'-SINE-VNTR-Alu(AGAGGG)2AGGG(AGAGGG)n], at median frequencies of 0.425 (IQR: 0.42-0.43) and 0.128 (IQR: 0.12-0.13), respectively. The mosaic AGG motif was not associated with repeat number (estimate = -3.8342, P = 0.869). The mosaic pure tract frequency was associated with repeat number (estimate = 45.32, P = 0.0441) but not AAO (estimate = -41.486, P = 0.378). Importantly, the mosaic frequency of the AGGG negatively correlated with repeat number after adjusting for age at sampling (estimate = -161.09, P = 3.44 × 10⁻⁵). When including the XDP-relevant MSH3/PMS2 modifier single nucleotide polymorphisms into the model, the mosaic AGGG frequency was associated with AAO (estimate = 155.1063, P = 0.047); however, the association dissipated after including the repeat number (estimate = -92.46430, P = 0.079). We reveal novel mosaic divergent repeat interruptions affecting both motif length and sequence (DRILS) of the canonical motif polarized within the SINE-VNTR-Alu(AGAGGG)n repeat. Our study illustrates: (i) the importance of somatic mosaic genotypes; (ii) the biological plausibility of multiple modifiers (both germline and somatic) that can have additive effects on repeat instability; and (iii) that these variations may remain undetected without assessment of single molecules.
Affiliation(s)
- Joanne Trinh
- Correspondence to: Joanne Trinh, PhD, University of Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany
- Theresa Lüth
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Lübeck, Germany
- Susen Schaake
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Lübeck, Germany
- Björn-Hergen Laabs
- Institute of Medical Biometry and Statistics, University of Lübeck, Lübeck, Germany
- Kathleen Schlüter
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Lübeck, Germany
- Joshua Laß
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Lübeck, Germany
- Jelena Pozojevic
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Lübeck, Germany
- Ronnie Tse
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Lübeck, Germany
- Inke König
- Institute of Medical Biometry and Statistics, University of Lübeck, Lübeck, Germany
- Roland Dominic Jamora
- Department of Neurosciences, College of Medicine, Philippine General Hospital, University of the Philippines Manila, Manila, Philippines
- Raymond L Rosales
- Department of Neurology and Psychiatry, University of Santo Tomas and the CNS-Metropolitan Medical Center, Manila, Philippines
- Norbert Brüggemann
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Lübeck, Germany; Department of Neurology, University of Lübeck, Lübeck, Germany
- Gerard Saranza
- Section of Neurology, Department of Internal Medicine, Chong Hua Hospital, Cebu, Philippines
- Cid Czarina E Diesta
- Department of Neurosciences, Movement Disorders Clinic, Makati Medical Center, Makati City, Philippines
- Frank J Kaiser
- Institute for Human Genetics at the University Hospital Essen, Essen, Germany; Center for Rare Diseases (Essener Zentrum für Seltene Erkrankungen, EZSE) at the University Hospital Essen, Essen, Germany
- Christel Depienne
- Institute for Human Genetics at the University Hospital Essen, Essen, Germany
- Christopher E Pearson
- Program of Genetics and Genome Biology, The Hospital for Sick Children, The Peter Gilgan Centre for Research and Learning, Toronto, Canada; University of Toronto, Program of Molecular Genetics, Toronto, Canada
- Ana Westenberger
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Lübeck, Germany
- Christine Klein
- Institute of Neurogenetics, University of Lübeck and University Hospital Schleswig-Holstein, Lübeck, Germany
15
Storer JM, Hubley R, Rosen J, Smit AFA. Methodologies for the De novo Discovery of Transposable Element Families. Genes (Basel) 2022; 13:709. [PMID: 35456515 PMCID: PMC9025800 DOI: 10.3390/genes13040709] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/23/2022] [Revised: 04/14/2022] [Accepted: 04/15/2022] [Indexed: 02/07/2023] Open
Abstract
The discovery and characterization of transposable element (TE) families are crucial tasks in the process of genome annotation. Careful curation of TE libraries for each organism is necessary as each has been exposed to a unique and often complex set of TE families. De novo methods have been developed; however, a fully automated and accurate approach to the development of complete libraries remains elusive. In this review, we cover established methods and recent developments in de novo TE analysis. We also present various methodologies used to assess these tools and discuss opportunities for further advancement of the field.
Affiliation(s)
- Arian F. A. Smit
- Institute for Systems Biology, Seattle, WA 98109, USA (J.M.S., R.H., J.R.)
16
Zhou Y, Wang Y, Xiong X, Appel AG, Zhang C, Wang X. Profiles of telomeric repeats in Insecta reveal diverse forms of telomeric motifs in Hymenopterans. Life Sci Alliance 2022; 5:e202101163. [PMID: 35365574 PMCID: PMC8977481 DOI: 10.26508/lsa.202101163] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/19/2021] [Revised: 03/04/2022] [Accepted: 03/04/2022] [Indexed: 12/23/2022] Open
Abstract
Telomeres consist of highly conserved simple tandem telomeric repeat motif (TRM): (TTAGG)n in arthropods, (TTAGGG)n in vertebrates, and (TTTAGGG)n in most plants. TRM can be detected from chromosome-level assembly, which typically requires long-read sequencing data. To take advantage of short-read data, we developed an ultra-fast Telomeric Repeats Identification Pipeline and evaluated its performance on 91 species. With proven accuracy, we applied Telomeric Repeats Identification Pipeline in 129 insect species, using 7 Tbp of short-read sequences. We confirmed (TTAGG)n as the TRM in 19 orders, suggesting it is the ancestral form in insects. Systematic profiling in Hymenopterans revealed a diverse range of TRMs, including the canonical 5-bp TTAGG (bees, ants, and basal sawflies), three independent losses of tandem repeat form TRM (Ichneumonoids, hunting wasps, and gall-forming wasps), and most interestingly, a common 8-bp (TTATTGGG)n in Chalcid wasps with two 9-bp variants in the miniature wasp (TTACTTGGG) and fig wasps (TTATTGGGG). Our results identified extraordinary evolutionary fluidity of Hymenopteran TRMs, and rapid evolution of TRM and repeat abundance at all evolutionary scales, providing novel insights into telomere evolution.
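To make concrete what it means for a motif such as TTAGG to occur as a tandem repeat, here is a toy Python sketch that finds the longest uninterrupted run of a given motif in a sequence. It is an illustration under stated assumptions and is not the algorithm of the Telomeric Repeats Identification Pipeline: the function name and example read are hypothetical, matching is exact, and reverse-complement strands and sequencing errors are ignored.

```python
import re


def longest_tandem_run(seq: str, motif: str) -> int:
    """Return the largest number of consecutive exact copies of `motif` in `seq`."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    # One or more back-to-back copies of the motif, e.g. (TTAGG)+
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(seq)]
    return max(runs, default=0)


# Hypothetical read: four tandem TTAGG copies, a 2-bp interruption, then two more.
read = "ACGT" + "TTAGG" * 4 + "CC" + "TTAGG" * 2
copies = longest_tandem_run(read, "TTAGG")
```

Exact matching keeps the sketch short; real reads would need tolerance for mismatches and indels, as well as a search on both strands.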
Affiliation(s)
- Yihang Zhou
- Fundamental Research Center, Shanghai YangZhi Rehabilitation Hospital (Shanghai Sunshine Rehabilitation Center), School of Life Sciences and Technology, Tongji University, Shanghai, China; Department of Pathobiology, College of Veterinary Medicine, Auburn University, Auburn, AL, USA; Auburn University Center for Advanced Science, Innovation, and Commerce, Alabama Agricultural Experiment Station, Auburn, AL, USA
- Yi Wang
- Ministry of Education Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai, China; Human Phenome Institute, Fudan University, Shanghai, China
- Xiao Xiong
- Fundamental Research Center, Shanghai YangZhi Rehabilitation Hospital (Shanghai Sunshine Rehabilitation Center), School of Life Sciences and Technology, Tongji University, Shanghai, China; Department of Pathobiology, College of Veterinary Medicine, Auburn University, Auburn, AL, USA; Auburn University Center for Advanced Science, Innovation, and Commerce, Alabama Agricultural Experiment Station, Auburn, AL, USA
- Arthur G Appel
- Auburn University Center for Advanced Science, Innovation, and Commerce, Alabama Agricultural Experiment Station, Auburn, AL, USA; Department of Entomology and Plant Pathology, Auburn University, AL, USA
- Chao Zhang
- Fundamental Research Center, Shanghai YangZhi Rehabilitation Hospital (Shanghai Sunshine Rehabilitation Center), School of Life Sciences and Technology, Tongji University, Shanghai, China
- Xu Wang
- Department of Pathobiology, College of Veterinary Medicine, Auburn University, Auburn, AL, USA; Auburn University Center for Advanced Science, Innovation, and Commerce, Alabama Agricultural Experiment Station, Auburn, AL, USA; Department of Entomology and Plant Pathology, Auburn University, AL, USA; HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
17
Finding and Characterizing Repeats in Plant Genomes. Methods Mol Biol 2022; 2443:327-385. [PMID: 35037215 DOI: 10.1007/978-1-0716-2067-0_18] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/01/2023]
Abstract
Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter proposes a guided tour of the available software that can help biologists to scan automatically for these repeats in sequence data or check hypothetical models intended to characterize their structures. Since transposable elements (TEs) are a major source of repeats in plants, many methods have been used or developed for this broad class of sequences. They are representative of the range of tools available for other classes of repeats and we have provided two sections on this topic (for the analysis of genomes or directly of sequenced reads), as well as a selection of the main existing software. It may be hard to keep up with the profusion of proposals in this dynamic field and the rest of the chapter is devoted to the foundations of an efficient search for repeats and more complex patterns. We first introduce the key concepts of the art of indexing and mapping or querying sequences. We end the chapter with the more prospective issue of building models of repeat families. We first present the machine learning approach, which seeks to build predictors automatically for some TE families from a set of sequences known to belong to each family. A second approach, the linguistic (or syntactic) approach, allows biologists themselves to describe models of their favorite repeat family and check their validity.
18
Lüth T, Laß J, Schaake S, Wohlers I, Pozojevic J, Jamora RDG, Rosales RL, Brüggemann N, Saranza G, Diesta CCE, Schlüter K, Tse R, Reyes CJ, Brand M, Busch H, Klein C, Westenberger A, Trinh J. Elucidating Hexanucleotide Repeat Number and Methylation within the X-Linked Dystonia-Parkinsonism (XDP)-Related SVA Retrotransposon in TAF1 with Nanopore Sequencing. Genes (Basel) 2022; 13:126. [PMID: 35052466 PMCID: PMC8775018 DOI: 10.3390/genes13010126] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 12/17/2021] [Revised: 01/05/2022] [Accepted: 01/07/2022] [Indexed: 12/13/2022] Open
Abstract
Background: X-linked dystonia-parkinsonism (XDP) is an adult-onset neurodegenerative disorder characterized by progressive dystonia and parkinsonism. It is caused by a SINE-VNTR-Alu (SVA) retrotransposon insertion in the TAF1 gene with a polymorphic (CCCTCT)n domain that acts as a genetic modifier of disease onset and expressivity. Methods: Herein, we used Nanopore sequencing to investigate SVA genetic variability and methylation. We used blood-derived DNA from 96 XDP patients for amplicon-based deep Nanopore sequencing and validated it with fragment analysis which was performed using fluorescence-based PCR. To detect methylation from blood- and brain-derived DNA, we used a Cas9-targeted approach. Results: High concordance was observed for hexanucleotide repeat numbers detected with Nanopore sequencing and fragment analysis. Within the SVA locus, there was no difference in genetic variability other than variations of the repeat motif between patients. We detected high CpG methylation frequency (MF) of the SVA and flanking regions (mean MF = 0.94, SD = ±0.12). Our preliminary results suggest only subtle differences between the XDP patient and the control in predicted enhancer sites directly flanking the SVA locus. Conclusions: Nanopore sequencing can reliably detect SVA hexanucleotide repeat numbers, methylation and, lastly, variation in the repeat motif.
Affiliation(s)
- Theresa Lüth
- Institute of Neurogenetics, University of Luebeck, 23538 Luebeck, Germany
- Joshua Laß
- Institute of Neurogenetics, University of Luebeck, 23538 Luebeck, Germany
- Susen Schaake
- Institute of Neurogenetics, University of Luebeck, 23538 Luebeck, Germany
- Inken Wohlers
- Medical Systems Biology Division, Luebeck Institute of Experimental Dermatology, University of Luebeck, 23538 Luebeck, Germany
- Institute for Cardiogenetics, University of Luebeck, 23538 Luebeck, Germany
- Jelena Pozojevic
- Institute of Neurogenetics, University of Luebeck, 23538 Luebeck, Germany
- Roland Dominic G. Jamora
- Department of Neurosciences, College of Medicine, Philippine General Hospital, University of the Philippines Manila, Manila 1000, Philippines
- Raymond L. Rosales
- Department of Neurology and Psychiatry, The Hospital Neuroscience Institute, University of Santo Tomas, Manila 1008, Philippines
- Norbert Brüggemann
- Institute of Neurogenetics, University of Luebeck, 23538 Luebeck, Germany
- Department of Neurology, University of Luebeck, 23538 Luebeck, Germany
- Gerard Saranza
- Section of Neurology, Department of Internal Medicine, Chong Hua Hospital, Cebu City 6000, Philippines
- Cid Czarina E. Diesta
- Department of Neurosciences, Movement Disorders Clinic, Makati Medical Center, Makati 1229, Philippines
- Kathleen Schlüter
- Institute of Neurogenetics, University of Luebeck, 23538 Luebeck, Germany
- Ronnie Tse
- Institute of Neurogenetics, University of Luebeck, 23538 Luebeck, Germany
- Charles Jourdan Reyes
- Institute of Neurogenetics, University of Luebeck, 23538 Luebeck, Germany
- Max Brand
- Institute of Neurogenetics, University of Luebeck, 23538 Luebeck, Germany
- Hauke Busch
- Medical Systems Biology Division, Luebeck Institute of Experimental Dermatology, University of Luebeck, 23538 Luebeck, Germany
- Institute for Cardiogenetics, University of Luebeck, 23538 Luebeck, Germany
- Christine Klein
- Institute of Neurogenetics, University of Luebeck, 23538 Luebeck, Germany
- Ana Westenberger
- Institute of Neurogenetics, University of Luebeck, 23538 Luebeck, Germany
- Joanne Trinh
- Institute of Neurogenetics, University of Luebeck, 23538 Luebeck, Germany
19
Xiao X, Zhang CY, Zhang Z, Hu Z, Li M, Li T. Revisiting tandem repeats in psychiatric disorders from perspectives of genetics, physiology, and brain evolution. Mol Psychiatry 2022; 27:466-475. [PMID: 34650204 DOI: 10.1038/s41380-021-01329-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/22/2021] [Revised: 09/16/2021] [Accepted: 09/28/2021] [Indexed: 01/28/2023]
Abstract
Genome-wide association studies (GWASs) have revealed substantial genetic components comprised of single nucleotide polymorphisms (SNPs) in the heritable risk of psychiatric disorders. However, genetic risk factors not covered by GWAS also play pivotal roles in these illnesses. Tandem repeats, which are likely functional but frequently overlooked by GWAS, may account for an important proportion of the "missing heritability" of psychiatric disorders. Despite difficulties in characterizing and quantifying tandem repeats in the genome, studies have been carried out in an attempt to describe the impact of tandem repeats on gene regulation and human phenotypes. In this review, we introduce recent research progress regarding the genomic distribution and regulatory mechanisms of tandem repeats. We also summarize the current knowledge of the genetic architecture and biological underpinnings of psychiatric disorders brought by studies of tandem repeats. These findings suggest that tandem repeats, in candidate psychiatric risk genes or in different levels of linkage disequilibrium (LD) with psychiatric GWAS SNPs and haplotypes, may modulate biological phenotypes related to psychiatric disorders (e.g., cognitive function and brain physiology) through regulating alternative splicing, promoter activity, enhancer activity and so on. In addition, many tandem repeats undergo tight natural selection in the human lineage, and likely exert crucial roles in human brain evolution. Taken together, the putative roles of tandem repeats in the pathogenesis of psychiatric disorders are strongly implicated, and using examples from the previous literature, we wish to call for further attention to tandem repeats in the post-GWAS era of psychiatric disorders.
Affiliation(s)
- Xiao Xiao
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Chu-Yi Zhang
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China; Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan, China
- Zhuohua Zhang
- Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China; Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Zhonghua Hu
- Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China; Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China; Department of Critical Care Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, China; Hunan Key Laboratory of Animal Models for Human Diseases, School of Life Sciences, Central South University, Changsha, Hunan, China; Eye Center of Xiangya Hospital and Hunan Key Laboratory of Ophthalmology, Central South University, Changsha, Hunan, China; National Clinical Research Center on Mental Disorders, Changsha, Hunan, China
- Ming Li
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China; CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China; KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Tao Li
- Affiliated Mental Health Center & Hangzhou Seventh People's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China; Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Guangzhou, China
20
Grosso V, Marcolungo L, Maestri S, Alfano M, Lavezzari D, Iadarola B, Salviati A, Mariotti B, Botta A, D’Apice MR, Novelli G, Delledonne M, Rossato M. Characterization of FMR1 Repeat Expansion and Intragenic Variants by Indirect Sequence Capture. Front Genet 2021; 12:743230. [PMID: 34646309 PMCID: PMC8504923 DOI: 10.3389/fgene.2021.743230] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/17/2021] [Accepted: 08/26/2021] [Indexed: 11/30/2022] Open
Abstract
Traditional methods for the analysis of repeat expansions, which underlie genetic disorders such as fragile X syndrome (FXS), lack single-nucleotide resolution in repeat analysis and the ability to characterize causative variants outside the repeat array. These drawbacks can be overcome by long-read and short-read sequencing, respectively. However, the routine application of next-generation sequencing in the clinic requires target enrichment, and none of the available methods allows parallel analysis of long DNA fragments using both sequencing technologies. In this study, we investigated the use of indirect sequence capture (Xdrop technology) coupled to Nanopore and Illumina sequencing to characterize FMR1, the gene responsible for FXS. We achieved efficient enrichment (> 200×) of large target DNA fragments (~60-80 kbp) encompassing the entire FMR1 gene. The analysis of Xdrop-enriched samples by Nanopore long-read sequencing allowed the complete characterization of repeat lengths in samples with normal, pre-mutation, and full mutation status (> 1 kbp), and correctly identified repeat interruptions relevant for disease prognosis and transmission. Single-nucleotide variants (SNVs) and small insertions/deletions (indels) could be detected in the same samples by Illumina short-read sequencing, completing the mutational testing through the identification of pathogenic variants within the FMR1 gene when no typical CGG repeat expansion is detected. The study demonstrated the parallel analysis of repeat expansions and SNVs/indels in the FMR1 gene at single-nucleotide resolution by combining Xdrop enrichment with two next-generation sequencing approaches. With the optimization necessary for clinical settings, the system could both facilitate the study of genotype-phenotype correlation in FXS and enable more efficient diagnosis and genetic counseling for patients and their relatives.
Affiliation(s)
- Valentina Grosso
- Department of Biotechnology, University of Verona, Verona, Italy
- Luca Marcolungo
- Department of Biotechnology, University of Verona, Verona, Italy
- Simone Maestri
- Department of Biotechnology, University of Verona, Verona, Italy
- Denise Lavezzari
- Department of Biotechnology, University of Verona, Verona, Italy
- Barbara Iadarola
- Department of Biotechnology, University of Verona, Verona, Italy
- Alessandro Salviati
- Department of Biotechnology, University of Verona, Verona, Italy
- GENARTIS srl, Verona, Italy
- Barbara Mariotti
- Department of Medicine, Section of General Pathology, University of Verona, Verona, Italy
- Annalisa Botta
- Department of Biomedicine and Prevention, Medical Genetics Section, University of Rome "Tor Vergata", Rome, Italy
- Giuseppe Novelli
- Department of Biomedicine and Prevention, Medical Genetics Section, University of Rome "Tor Vergata", Rome, Italy
- IRCCS Neuromed Mediterranean Neurological Institute, Pozzilli, Italy
- Department of Pharmacology, School of Medicine, University of Nevada, Reno, NV, United States
- Massimo Delledonne
- Department of Biotechnology, University of Verona, Verona, Italy
- GENARTIS srl, Verona, Italy
- Marzia Rossato
- Department of Biotechnology, University of Verona, Verona, Italy
- GENARTIS srl, Verona, Italy
21
Reyes CJ, Laabs BH, Schaake S, Lüth T, Ardicoglu R, Rakovic A, Grütz K, Alvarez-Fischer D, Jamora RD, Rosales RL, Weyers I, König IR, Brüggemann N, Klein C, Dobricic V, Westenberger A, Trinh J. Brain Regional Differences in Hexanucleotide Repeat Length in X-Linked Dystonia-Parkinsonism Using Nanopore Sequencing. NEUROLOGY-GENETICS 2021; 7:e608. [PMID: 34250228 PMCID: PMC8265576 DOI: 10.1212/nxg.0000000000000608] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/08/2021] [Accepted: 06/03/2021] [Indexed: 12/14/2022]
Abstract
Objective: Our study investigated the presence of regional differences in hexanucleotide repeat number in postmortem brain tissues of 2 patients with X-linked dystonia-parkinsonism (XDP), a combined dystonia-parkinsonism syndrome modified by a (CCCTCT)n repeat within the causal SINE-VNTR-Alu retrotransposon insertion in the TAF1 gene. Methods: Genomic DNA was extracted from blood and postmortem brain samples, including the basal ganglia and cortex from both patients and from the cerebellum, midbrain, and pituitary gland from 1 patient. Repeat sizing was performed using fragment analysis, small-pool PCR-based Southern blotting, and Oxford nanopore sequencing. Results: The basal ganglia (p < 0.001) and cerebellum (p < 0.001) showed higher median repeat numbers and higher degrees of repeat instability compared with blood. Conclusions: Somatic repeat instability may predominate in brain regions selectively affected in XDP, thereby hinting at its potential role in disease manifestation and modification.
Affiliation(s)
- Charles Jourdan Reyes
- Björn-Hergen Laabs
- Susen Schaake
- Theresa Lüth
- Raphaela Ardicoglu
- Aleksandar Rakovic
- Karen Grütz
- Daniel Alvarez-Fischer
- Roland Dominic Jamora
- Raymond L Rosales
- Imke Weyers
- Inke R König
- Norbert Brüggemann
- Christine Klein
- Valerija Dobricic
- Ana Westenberger
- Joanne Trinh
- (all authors) Institute of Neurogenetics (C.J.R., S.S., T.L., R.A., A.R., K.G., D.A.-F., N.B., C.K., V.D., A.W., J.T.), University of Lübeck, and Institute of Medical Biometry and Statistics (B.-H.L., I.R.K.), University of Lübeck, Germany; Department of Neurosciences (R.D.J.), College of Medicine-Philippine General Hospital, University of the Philippines Manila; Department of Neurology and Psychiatry (R.L.R.), University of Santo Tomas Hospital, Manila, Philippines; Institute of Anatomy (I.W.), Department of Neurology (N.B.), and Lübeck Interdisciplinary Platform for Genome Analytics (V.D.), University of Lübeck, Germany
22
The B Chromosomes of Prochilodus lineatus (Teleostei, Characiformes) Are Highly Enriched in Satellite DNAs. Cells 2021; 10:cells10061527. [PMID: 34204462 PMCID: PMC8235050 DOI: 10.3390/cells10061527] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/17/2021] [Revised: 06/07/2021] [Accepted: 06/11/2021] [Indexed: 12/18/2022] Open
Abstract
B or supernumerary chromosomes are dispensable elements that are widely present in numerous eukaryotes. Due to their non-recombining nature, there is an evident tendency for repetitive DNA to accumulate in these elements. Thus, satellite DNA plays an important role in the evolution and diversification of B chromosomes and can provide clues regarding their origin. The characiform Prochilodus lineatus was one of the first fish species discovered to bear B chromosomes, with all populations analyzed so far showing one to nine micro-B chromosomes and exhibiting at least three morphological variants (Ba, Bsm, and Bm). To date, a single satellite DNA is known to be located on the B chromosomes of this species, but no information regarding the differentiation of the proposed B-types is available. Here, we characterized the satellitome of P. lineatus and mapped 35 satellite DNAs against the chromosomes of P. lineatus, six of which were equally located on all B-types, indicating a similar genomic content. In addition, we describe, for the first time, an entire population without B chromosomes.
23
Lopes M, Louzada S, Gama-Carvalho M, Chaves R. Genomic Tackling of Human Satellite DNA: Breaking Barriers through Time. Int J Mol Sci 2021; 22:4707. [PMID: 33946766 PMCID: PMC8125562 DOI: 10.3390/ijms22094707] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/31/2021] [Revised: 04/24/2021] [Accepted: 04/27/2021] [Indexed: 12/12/2022] Open
Abstract
(Peri)centromeric repetitive sequences and, more specifically, satellite DNA (satDNA) sequences, constitute a major human genomic component. SatDNA sequences can vary on a large number of features, including nucleotide composition, complexity, and abundance. Several satDNA families have been identified and characterized in the human genome through time, albeit at different speeds. Human satDNA families present a high degree of sub-variability, leading to the definition of various subfamilies with different organization and clustered localization. Evolution of satDNA analysis has enabled the progressive characterization of satDNA features. Despite recent advances in the sequencing of centromeric arrays, comprehensive genomic studies to assess their variability are still required to provide accurate and proportional representation of satDNA (peri)centromeric/acrocentric short arm sequences. Approaches combining multiple techniques have been successfully applied and seem to be the path to follow for generating integrated knowledge in the promising field of human satDNA biology.
Affiliation(s)
- Mariana Lopes
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
- Sandra Louzada
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
- Margarida Gama-Carvalho
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
- Raquel Chaves
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
24
Flynn JM, Long M, Wing RA, Clark AG. Evolutionary Dynamics of Abundant 7-bp Satellites in the Genome of Drosophila virilis. Mol Biol Evol 2021; 37:1362-1375. [PMID: 31960929 DOI: 10.1093/molbev/msaa010] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 12/16/2022] Open
Abstract
The factors that drive the rapid changes in abundance of tandem arrays of highly repetitive sequences, known as satellite DNA, are not well understood. Drosophila virilis has one of the highest relative amounts of simple satellites of any organism that has been studied, with an estimated >40% of its genome composed of a few related 7-bp satellites. Here, we use D. virilis as a model to understand technical biases affecting satellite sequencing and the evolutionary processes that drive satellite composition. By analyzing sequencing data from Illumina, PacBio, and Nanopore platforms, we identify platform-specific biases and suggest best practices for accurate characterization of satellites by sequencing. We use comparative genomics and cytogenetics to demonstrate that the highly abundant AAACTAC satellite family arose from a related satellite in the branch leading to the virilis phylad 4.5-11 Ma before exploding in abundance in some species of the clade. The most abundant satellite is conserved in sequence and location in the pericentromeric region but has diverged widely in abundance among species, whereas the satellites nearest the centromere are rapidly turning over in sequence composition. By analyzing multiple strains of D. virilis, we saw that the abundances of two centromere-proximal satellites are anticorrelated along a geographical gradient, which we suggest could be caused by ongoing conflicts at the centromere. In conclusion, we illuminate several key attributes of satellite evolutionary dynamics that we hypothesize to be driven by processes including selection, meiotic drive, and constraints on satellite sequence and abundance.
Affiliation(s)
- Jullien M Flynn
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY
- Manyuan Long
- Department of Ecology and Evolution, University of Chicago, Chicago, IL
- Rod A Wing
- School of Plant Sciences, Arizona Genomics Institute, University of Arizona, Tucson, AZ
- Andrew G Clark
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY
25
Dvorkina T, Bzikadze AV, Pevzner PA. The string decomposition problem and its applications to centromere analysis and assembly. Bioinformatics 2021; 36:i93-i101. [PMID: 32657390 PMCID: PMC7428072 DOI: 10.1093/bioinformatics/btaa454] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/28/2022] Open
Abstract
Motivation: Recent attempts to assemble extra-long tandem repeats (such as centromeres) faced the challenge of translating long error-prone reads from the nucleotide alphabet into the alphabet of repeat units. Human centromeres represent a particularly complex type of high-order repeats (HORs) formed by chromosome-specific monomers. Given a set of all human monomers, translating a read from a centromere into the monomer alphabet is modeled as the String Decomposition Problem. The accurate translation of reads into the monomer alphabet turns the notoriously difficult problem of assembling centromeres from reads (in the nucleotide alphabet) into a more tractable problem of assembling centromeres from translated reads. Results: We describe a StringDecomposer (SD) algorithm for solving this problem, benchmark it on the set of long error-prone Oxford Nanopore reads generated by the Telomere-to-Telomere consortium, and identify a novel (rare) monomer that extends the set of known X-chromosome-specific monomers. Our identification of a novel monomer emphasizes the importance of identifying all (even rare) monomers for future centromere assembly efforts and evolutionary studies. To further analyze novel monomers, we applied SD to the set of recently generated long accurate Pacific Biosciences HiFi reads. This analysis revealed that the set of known human monomers and HORs remains incomplete. SD opens the possibility of generating a complete set of human monomers and HORs for use in the ongoing efforts to generate the complete assembly of the human genome. Availability and implementation: StringDecomposer is publicly available at https://github.com/ablab/stringdecomposer. Supplementary information: Supplementary data are available at Bioinformatics online.
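The idea of translating a read into the monomer alphabet can be illustrated with a small dynamic program. The sketch below is not the StringDecomposer implementation: it is a simplified stand-in that assumes plain edit distance as the score, and the names `edit_distance`, `decompose`, and the `max_slack` parameter (how far a read segment's length may deviate from the monomer's length) are hypothetical choices made for this illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def decompose(read: str, monomers: dict, max_slack: int = 2):
    """Split `read` into consecutive segments, each labelled with one monomer,
    so that the summed edit distance between segments and monomers is minimal.

    best[i] holds the minimal cost of decomposing read[:i]; back[i] stores
    where the last segment starts and which monomer labels it.
    """
    n = len(read)
    inf = float("inf")
    best = [inf] * (n + 1)
    back = [None] * (n + 1)
    best[0] = 0
    for i in range(1, n + 1):
        for name, seq in monomers.items():
            lo = max(0, i - len(seq) - max_slack)
            hi = min(i - 1, i - len(seq) + max_slack)
            for j in range(lo, hi + 1):
                if best[j] == inf:
                    continue
                cost = best[j] + edit_distance(seq, read[j:i])
                if cost < best[i]:
                    best[i] = cost
                    back[i] = (j, name)
    if best[n] == inf:
        raise ValueError("read cannot be decomposed with the given slack")
    labels = []
    i = n
    while i > 0:
        j, name = back[i]
        labels.append(name)
        i = j
    return labels[::-1], best[n]


# Toy monomer set (illustrative sequences, not real alpha-satellite monomers).
toy_monomers = {"A": "ACGT", "B": "TTGA"}
# The last unit carries one substitution (G -> C), mimicking a read error.
print(decompose("ACGTTTGAACCT", toy_monomers))  # (['A', 'B', 'A'], 1)
```

This brute-force version rescans every monomer at every read position and is only practical for toy inputs; it is meant to show the shape of the problem rather than a scalable solution.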
Affiliation(s)
- Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
- Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA 92093, USA
- Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, CA 92093, USA
26
dos Santos RZ, Calegari RM, Silva DMZDA, Ruiz-Ruano FJ, Melo S, Oliveira C, Foresti F, Uliano-Silva M, Porto-Foresti F, Utsunomia R. A Long-Term Conserved Satellite DNA That Remains Unexpanded in Several Genomes of Characiformes Fish Is Actively Transcribed. Genome Biol Evol 2021; 13:evab002. [PMID: 33502491 PMCID: PMC8210747 DOI: 10.1093/gbe/evab002] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Accepted: 01/03/2021] [Indexed: 12/12/2022] Open
Abstract
Eukaryotic genomes contain large amounts of repetitive DNA sequences, such as tandemly repeated satellite DNAs (satDNAs). These sequences are highly dynamic and tend to be genus- or species-specific due to their particular evolutionary pathways, although there are a few unusual cases of satDNAs conserved over long periods of time. Here, we used multiple approaches to reveal that a satDNA named CharSat01-52 originated in the last common ancestor of Characoidei fish, a superfamily within the Characiformes order, ∼140-78 Ma, whereas its nucleotide composition has remained considerably conserved in several taxa. We show that 14 distantly related species within Characoidei share the presence of this satDNA, which is highly amplified and clustered in subtelomeric regions in a single species (Characidium gomesi), while remaining organized as small clusters in all the other species. Defying predictions of the molecular drive of satellite evolution, CharSat01-52 shows similar values of intra- and interspecific divergence. Although we did not provide evidence for a specific functional role of CharSat01-52, its transcriptional activity was demonstrated in different species. In addition, we identified short tandem arrays of CharSat01-52 embedded within single-molecule real-time long reads of Astyanax paranae (536 bp-3.1 kb) and A. mexicanus (501 bp-3.9 kb). Such arrays consisted of head-to-tail repeats and could be found interspersed with other sequences, inverted sequences, or neighbored by other satellites. Our results provide a detailed characterization of an old and conserved satDNA, challenging general predictions of satDNA evolution.
Collapse
Affiliation(s)
- Rodrigo Zeni dos Santos
- Departamento de Ciências Biológicas, Faculdade de Ciências, Universidade Estadual Paulista, UNESP, Campus de Bauru, Bauru, Sao Paulo, Brazil
- Rodrigo Milan Calegari
- Departamento de Ciências Biológicas, Faculdade de Ciências, Universidade Estadual Paulista, UNESP, Campus de Bauru, Bauru, Sao Paulo, Brazil
- Francisco J Ruiz-Ruano
- Department of Organismal Biology, Systematic Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Silvana Melo
- Departamento de Biologia Estrutural e Funcional, Instituto de Biociências de Botucatu, Universidade Estadual Paulista, UNESP, Botucatu, Sao Paulo, Brazil
- Claudio Oliveira
- Departamento de Biologia Estrutural e Funcional, Instituto de Biociências de Botucatu, Universidade Estadual Paulista, UNESP, Botucatu, Sao Paulo, Brazil
- Fausto Foresti
- Departamento de Biologia Estrutural e Funcional, Instituto de Biociências de Botucatu, Universidade Estadual Paulista, UNESP, Botucatu, Sao Paulo, Brazil
- Fábio Porto-Foresti
- Departamento de Ciências Biológicas, Faculdade de Ciências, Universidade Estadual Paulista, UNESP, Campus de Bauru, Bauru, Sao Paulo, Brazil
- Ricardo Utsunomia
- Departamento de Ciências Biológicas, Faculdade de Ciências, Universidade Estadual Paulista, UNESP, Campus de Bauru, Bauru, Sao Paulo, Brazil
- Departamento de Genética, Instituto de Ciências Biológicas e da Saúde, ICBS, Universidade Federal Rural do Rio de Janeiro, Seropédica, Rio de Janeiro, Brazil
|
27
|
Cechova M. Probably Correct: Rescuing Repeats with Short and Long Reads. Genes (Basel) 2020; 12:48. [PMID: 33396198 PMCID: PMC7823596 DOI: 10.3390/genes12010048] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/09/2020] [Revised: 12/23/2020] [Accepted: 12/24/2020] [Indexed: 02/07/2023] Open
Abstract
Ever since the introduction of high-throughput sequencing following the Human Genome Project, assembling short reads into a reference of sufficient quality has posed a significant problem, as a large portion of the human genome (an estimated 50-69%) is repetitive. As a result, a sizable proportion of sequencing reads is multi-mapping, i.e., without a unique placement in the genome. The two key parameters for whether or not a read is multi-mapping are the read length and genome complexity. Long reads are now able to span difficult, heterochromatic regions, including full centromeres, and characterize chromosomes from "telomere to telomere". Moreover, identical reads or repeat arrays can be differentiated based on their epigenetic marks, such as methylation patterns, aiding in the assembly process. This is despite the fact that long reads still contain a modest percentage of sequencing errors, disorienting the aligners and assemblers both in accuracy and speed. Here, I review the proposed and implemented solutions to the repeat resolution and the multi-mapping read problem, as well as the downstream consequences of reference choice, repeat masking, and proper representation of sex chromosomes. I also consider the forthcoming challenges and solutions with regard to long reads, where we expect the shift from the problem of repeat localization within a single individual to the problem of repeat positioning within pangenomes.
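The dependence of multi-mapping on read length can be illustrated with a minimal sketch. It assumes error-free reads, a single strand, and a toy sequence; the function name `multimapping_fraction` and the toy genome are hypothetical and are not taken from the reviewed article.

```python
from collections import Counter


def multimapping_fraction(genome: str, read_len: int) -> float:
    """Fraction of start positions whose error-free read occurs more than once."""
    n = len(genome) - read_len + 1
    if n <= 0:
        raise ValueError("read_len must not exceed the genome length")
    # Count every substring of length read_len (forward strand only).
    counts = Counter(genome[i:i + read_len] for i in range(n))
    # A read is multi-mapping if its sequence appears at two or more positions.
    multi = sum(c for c in counts.values() if c > 1)
    return multi / n


# Toy genome (assumption): two copies of a 12 bp repeat between distinct flanks.
repeat = "ACGTTGCAAGCT"
toy = "GGATCCTTAGCA" + repeat + "TTGACCGATAGG" + repeat + "CATGGTACCTAA"

for read_len in (4, 8, 12, 16):
    print(read_len, round(multimapping_fraction(toy, read_len), 3))
```

Reads that fit entirely inside the repeat copy cannot be placed uniquely, whereas reads long enough to reach into a distinct flank usually can; real aligners additionally have to handle sequencing errors and both strands.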
Affiliation(s)
- Monika Cechova
- Genetics and Reproductive Biotechnologies, Veterinary Research Institute, Central European Institute of Technology (CEITEC), 621 00 Brno, Czech Republic
|
28
|
Bzikadze AV, Pevzner PA. Automated assembly of centromeres from ultra-long error-prone reads. Nat Biotechnol 2020; 38:1309-1316. [PMID: 32665660 PMCID: PMC10718184 DOI: 10.1038/s41587-020-0582-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 08/17/2019] [Accepted: 05/29/2020] [Indexed: 12/12/2022]
Abstract
Centromeric variation has been linked to cancer and infertility, but centromere sequences contain multiple tandem repeats and can only be assembled manually from long error-prone reads. Here we describe the centroFlye algorithm for centromere assembly using long error-prone reads, and apply it to assemble human centromeres on chromosomes 6 and X. Our analyses reveal putative breakpoints in the manual reconstruction of the human X centromere, demonstrate that the human X chromosome is partitioned into repeat subfamilies and provide initial insights into centromere evolution. We anticipate that centroFlye could be applied to automatically close remaining multimegabase gaps in the reference human genome.
Affiliation(s)
- Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
- Pavel A Pevzner
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
|
29
|
Bolognini D, Magi A, Benes V, Korbel JO, Rausch T. TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data. Gigascience 2020; 9:giaa101. [PMID: 33034633 PMCID: PMC7539535 DOI: 10.1093/gigascience/giaa101] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 06/04/2020] [Revised: 08/07/2020] [Accepted: 09/07/2020] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Tandem repeat sequences are widespread in the human genome, and their expansions cause multiple repeat-mediated disorders. Genome-wide discovery approaches are needed to fully elucidate their roles in health and disease, but resolving tandem repeat variation accurately remains a challenging task. While traditional mapping-based approaches using short-read data have severe limitations in the size and type of tandem repeats they can resolve, recent third-generation sequencing technologies exhibit substantially higher sequencing error rates, which complicates repeat resolution. RESULTS We developed TRiCoLOR, a freely available tool for tandem repeat profiling using error-prone long reads from third-generation sequencing technologies. The method can identify repetitive regions in sequencing data without prior knowledge of their motifs or locations and resolve repeat multiplicity and period size in a haplotype-specific manner. The tool includes methods to interactively visualize the identified repeats and to trace their Mendelian consistency in pedigrees. CONCLUSIONS TRiCoLOR demonstrates excellent performance and improved sensitivity and specificity compared with alternative tools on synthetic data. For real human whole-genome sequencing data, TRiCoLOR achieves high validation rates, suggesting its suitability to identify tandem repeat variation in personal genomes.
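To make the terms "period size" and "repeat multiplicity" concrete, here is a minimal sketch for a perfect, error-free tandem repeat. It is not TRiCoLOR's algorithm, which has to cope with error-prone reads and haplotypes; the function name `period_and_copies` and the `max_period` parameter are hypothetical.

```python
from typing import Optional, Tuple


def period_and_copies(seq: str, max_period: int = 50) -> Optional[Tuple[int, float]]:
    """Return (period size, copy number) of a perfect tandem repeat, else None.

    The period is the smallest p such that every base equals the base p
    positions before it; at least two full copies are required.
    """
    for p in range(1, min(max_period, len(seq) // 2) + 1):
        if all(seq[i] == seq[i - p] for i in range(p, len(seq))):
            return p, len(seq) / p
    return None


print(period_and_copies("CAGCAGCAGCAG"))  # period 3, four copies
print(period_and_copies("ACGT"))          # no tandem repeat found
```

A trailing partial copy yields a fractional multiplicity, for example 3.5 copies for `ATATATA`. Tolerating mismatches and indels would require an alignment-based or probabilistic approach rather than this exact comparison.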
Affiliation(s)
- Davide Bolognini
- Department of Experimental and Clinical Medicine, University of Florence, Viale Pieraccini 6, Florence 50134, Italy
- European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, Heidelberg 69117, Germany
- Alberto Magi
- Department of Information Engineering, University of Florence, Via di S. Marta 3, Florence 50134, Italy
- Vladimir Benes
- European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, Heidelberg 69117, Germany
- Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, Heidelberg 69117, Germany
- Tobias Rausch
- European Molecular Biology Laboratory (EMBL), GeneCore, Meyerhofstraße 1, Heidelberg 69117, Germany
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, Heidelberg 69117, Germany
|
30
|
Cechova M, Harris RS, Tomaszkiewicz M, Arbeithuber B, Chiaromonte F, Makova KD. High Satellite Repeat Turnover in Great Apes Studied with Short- and Long-Read Technologies. Mol Biol Evol 2019; 36:2415-2431. [PMID: 31273383 PMCID: PMC6805231 DOI: 10.1093/molbev/msz156] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 03/12/2019] [Revised: 06/12/2019] [Accepted: 06/13/2019] [Indexed: 12/23/2022] Open
Abstract
Satellite repeats are a structural component of centromeres and telomeres, and in some instances, their divergence is known to drive speciation. Due to their highly repetitive nature, satellite sequences have been understudied and underrepresented in genome assemblies. To investigate their turnover in great apes, we studied satellite repeats of unit sizes up to 50 bp in human, chimpanzee, bonobo, gorilla, and Sumatran and Bornean orangutans, using unassembled short and long sequencing reads. The density of satellite repeats, as identified from accurate short reads (Illumina), varied greatly among great ape genomes. These were dominated by a handful of abundant repeated motifs, frequently shared among species, which formed two groups: 1) the (AATGG)n repeat (critical for heat shock response) and its derivatives; and 2) subtelomeric 32-mers involved in telomeric metabolism. Using the densities of abundant repeats, individuals could be classified into species. However, clustering did not reproduce the accepted species phylogeny, suggesting rapid repeat evolution. Several abundant repeats were enriched in males versus females; using Y chromosome assemblies or Fluorescent In Situ Hybridization, we validated their location on the Y. Finally, applying a novel computational tool, we identified many satellite repeats completely embedded within long Oxford Nanopore and Pacific Biosciences reads. Such repeats were up to 59 kb in length and consisted of perfect repeats interspersed with other similar sequences. Our results based on sequencing reads generated with three different technologies provide the first detailed characterization of great ape satellite repeats, and open new avenues for exploring their functions.
Affiliation(s)
- Monika Cechova
- Department of Biology, Pennsylvania State University, University Park, PA
- Robert S Harris
- Department of Biology, Pennsylvania State University, University Park, PA
- Francesca Chiaromonte
- Department of Statistics, Pennsylvania State University, University Park, PA
- EMbeDS, Sant’Anna School of Advanced Studies, Pisa, Italy
- Center for Medical Genomics, Penn State, University Park, PA
- Kateryna D Makova
- Department of Biology, Pennsylvania State University, University Park, PA
- Center for Medical Genomics, Penn State, University Park, PA
|