1
Glunčić M, Vlahović I, Rosandić M, Paar V. Novel Concept of Alpha Satellite Cascading Higher-Order Repeats (HORs) and Precise Identification of 15mer and 20mer Cascading HORs in Complete T2T-CHM13 Assembly of Human Chromosome 15. Int J Mol Sci 2024; 25:4395. [PMID: 38673983 PMCID: PMC11050224 DOI: 10.3390/ijms25084395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 04/08/2024] [Accepted: 04/11/2024] [Indexed: 04/28/2024] Open
Abstract
Unraveling the intricate centromere structure of human chromosomes holds profound implications, illuminating fundamental genetic mechanisms and potentially advancing our comprehension of genetic disorders and therapeutic interventions. This study rigorously identified and structurally analyzed alpha satellite higher-order repeats (HORs) within the centromere of human chromosome 15 in the complete T2T-CHM13 assembly using the high-precision GRM2023 algorithm. The most extensive alpha satellite HOR array in chromosome 15 reveals a novel cascading HOR, housing 429 15mer HOR copies, containing 4-, 7- and 11-monomer subfragments. Within each row of cascading HORs, all alpha satellite monomers are of distinct types, as in regular Willard's HORs. However, different HOR copies within the same cascading 15mer HOR contain more than one monomer of the same type. Each canonical 15mer HOR copy comprises 15 monomers belonging to only 9 different monomer types. Notably, 65% of the 429 15mer cascading HOR copies exhibit canonical structures, while 35% display variant configurations. Identified as the second most extensive alpha satellite HOR, another novel cascading HOR within human chromosome 15 encompasses 164 20mer HOR copies, each featuring two subfragments. Moreover, a distinct pattern emerges as interspersed 25mer/26mer structures differing from regular Willard's HORs and giving rise to a 34-monomer subfragment. Only a minor 18mer HOR array of 12 HOR copies is of the regular Willard's type. These revelations highlight the complexity within the chromosome 15 centromeric region, accentuating deviations from anticipated highly regular patterns and hinting at profound information encoding and functional potential within the human centromere.
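The distinction drawn in this abstract between a regular Willard-type HOR (every monomer in a copy is of a distinct type) and a cascading HOR (each row has distinct types, but the full copy repeats some types) can be illustrated with a small sketch. The row layout and the type labels `t1`..`t9` below are hypothetical and chosen only to reproduce the stated counts (15 monomers, 9 types); they are not the subfragment structure reported in the paper.

```python
from collections import Counter

def is_willard_type(monomer_types):
    """True if every monomer in the sequence has a distinct type."""
    return len(set(monomer_types)) == len(monomer_types)

def repeated_types(monomer_types):
    """Return the types that occur more than once, with their counts."""
    return {t: c for t, c in Counter(monomer_types).items() if c > 1}

# Hypothetical rows of one cascading 15mer copy (labels are illustrative only).
rows = [
    ["t1", "t2", "t3", "t4"],
    ["t1", "t2", "t5", "t6"],
    ["t3", "t4", "t5", "t6", "t7", "t8", "t9"],
]
copy_15mer = [t for row in rows for t in row]

print(all(is_willard_type(r) for r in rows))   # each row alone is Willard-like
print(is_willard_type(copy_15mer))             # the full copy is not
print(len(copy_15mer), len(set(copy_15mer)))   # monomer count and type count

# Approximate split of the 429 copies stated in the abstract (65% / 35%).
canonical = round(0.65 * 429)
variant = 429 - canonical
print(canonical, variant)
```

Under these assumptions each row passes the Willard check while the concatenated copy does not, which is the property the abstract uses to separate cascading HORs from regular ones.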
Affiliation(s)
- Matko Glunčić
  - Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
- Ines Vlahović
  - Algebra LAB, Algebra University College, 10000 Zagreb, Croatia
- Marija Rosandić
  - Department of Internal Medicine, University Hospital Centre Zagreb, 10000 Zagreb, Croatia
  - Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
- Vladimir Paar
  - Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
  - Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
2
Chaisson MJP, Sulovari A, Valdmanis PN, Miller DE, Eichler EE. Advances in the discovery and analyses of human tandem repeats. Emerg Top Life Sci 2023; 7:361-381. [PMID: 37905568 PMCID: PMC10806765 DOI: 10.1042/etls20230074] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/12/2023] [Revised: 10/18/2023] [Accepted: 10/18/2023] [Indexed: 11/02/2023]
Abstract
Long-read sequencing platforms provide unparalleled access to the structure and composition of all classes of tandemly repeated DNA from STRs to satellite arrays. This review summarizes our current understanding of their organization within the human genome, their importance with respect to disease, as well as the advances and challenges in understanding their genetic diversity and functional effects. Novel computational methods are being developed to visualize and associate these complex patterns of human variation with disease, expression, and epigenetic differences. We predict accurate characterization of this repeat-rich form of human variation will become increasingly relevant to both basic and clinical human genetics.
Affiliation(s)
- Mark J P Chaisson
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, U.S.A
  - The Genomic and Epigenomic Regulation Program, USC Norris Cancer Center, University of Southern California, Los Angeles, CA 90089, U.S.A
- Arvis Sulovari
  - Computational Biology, Cajal Neuroscience Inc, Seattle, WA 98102, U.S.A
- Paul N Valdmanis
  - Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
- Danny E Miller
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, U.S.A
  - Department of Pediatrics, University of Washington, Seattle, WA 98195, U.S.A
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, U.S.A
3
Zhang Y, Chu J, Cheng H, Li H. De novo reconstruction of satellite repeat units from sequence data. Genome Res 2023; 33:gr.278005.123. [PMID: 37918962 PMCID: PMC10760446 DOI: 10.1101/gr.278005.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 10/18/2023] [Indexed: 11/04/2023]
Abstract
Satellite DNA are long tandemly repeating sequences in a genome and may be organized as high-order repeats (HORs). They are enriched in centromeres and are challenging to assemble. Existing algorithms for identifying satellite repeats either require the complete assembly of satellites or only work for simple repeat structures without HORs. Here we describe Satellite Repeat Finder (SRF), a new algorithm for reconstructing satellite repeat units and HORs from accurate reads or assemblies without prior knowledge on repeat structures. Applying SRF to real sequence data, we show that SRF could reconstruct known satellites in human and well-studied model organisms. We also find satellite repeats are pervasive in various other species, accounting for up to 12% of their genome contents but are often underrepresented in assemblies. With the rapid progress in genome sequencing, SRF will help the annotation of new genomes and the study of satellite DNA evolution even if such repeats are not fully assembled.
Affiliation(s)
- Yujie Zhang
  - Harvard School of Public Health, Boston, Massachusetts 02115, USA
- Justin Chu
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA
- Haoyu Cheng
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA
- Heng Li
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA
4
Scelfo A, Fachinetti D. Centromere: A Trojan horse for genome stability. DNA Repair (Amst) 2023; 130:103569. [PMID: 37708591 DOI: 10.1016/j.dnarep.2023.103569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 08/22/2023] [Accepted: 09/05/2023] [Indexed: 09/16/2023]
Abstract
Centromeres play a key role in the maintenance of genome stability to prevent carcinogenesis and diseases. They are specialized chromosome loci essential to ensure faithful transmission of genomic information across cell generations by mediating the interaction with spindle microtubules. Nonetheless, while fulfilling these essential roles, their distinct repetitive composition and susceptibility to mechanical stresses during cell division render them susceptible to breakage events. In this review, we delve into the present understanding of the underlying causes of centromere fragility, from the mechanisms governing its DNA replication and repair, to the pathways acting to counteract potential challenges. We propose that the centromere represents a "Trojan horse" exerting vital functions that, at the same time, potentially threatens whole genome stability.
Affiliation(s)
- Andrea Scelfo
  - Institut Curie, CNRS, UMR 144, Sorbonne University, 26 rue d'Ulm, 75005 Paris, France
- Daniele Fachinetti
  - Institut Curie, CNRS, UMR 144, Sorbonne University, 26 rue d'Ulm, 75005 Paris, France
5
Gao S, Yang X, Guo H, Zhao X, Wang B, Ye K. HiCAT: a tool for automatic annotation of centromere structure. Genome Biol 2023; 24:58. [PMID: 36978122 PMCID: PMC10053651 DOI: 10.1186/s13059-023-02900-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/12/2022] [Accepted: 03/17/2023] [Indexed: 03/30/2023] Open
Abstract
Significant improvements in long-read sequencing technologies have unlocked complex genomic areas, such as centromeres, in the genome and introduced the centromere annotation problem. Currently, centromeres are annotated in a semi-manual way. Here, we propose HiCAT, a generalizable automatic centromere annotation tool, based on hierarchical tandem repeat mining to facilitate decoding of centromere architecture. We apply HiCAT to simulated datasets, human CHM13-T2T and gapless Arabidopsis thaliana genomes. Our results are generally consistent with previous inferences but also greatly improve annotation continuity and reveal additional fine structures, demonstrating HiCAT's performance and general applicability.
Affiliation(s)
- Shenghan Gao
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Xiaofei Yang
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
  - School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
  - Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Hongtao Guo
  - School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Xixi Zhao
  - Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Bo Wang
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
- Kai Ye
  - School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
  - MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, China
  - Genome Institute, the First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, Shaanxi, China
  - School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, China
  - Faculty of Science, Leiden University, Leiden, The Netherlands
6
Kunyavskaya O, Dvorkina T, Bzikadze AV, Alexandrov I, Pevzner PA. Automated annotation of human centromeres with HORmon. Genome Res 2022; 32:1137-1151. [PMID: 35545449 DOI: 10.1101/gr.276362.121] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/03/2021] [Accepted: 05/06/2022] [Indexed: 11/24/2022]
Abstract
Recent advances in long-read sequencing opened a possibility to address the long-standing questions about the architecture and evolution of human centromeres. They also emphasized the need for centromere annotation (partitioning human centromeres into monomers and higher-order repeats (HORs)). Even though there was a half-century-long series of semi-manual studies of centromere architecture, a rigorous centromere annotation algorithm is still lacking. Moreover, an automated centromere annotation is a prerequisite for studies of genetic diseases associated with centromeres, and evolutionary studies of centromeres across multiple species. Although the monomer decomposition (transforming a centromere into a monocentromere written in the monomer alphabet) and the HOR decomposition (representing a monocentromere in the alphabet of HORs) are currently viewed as two separate problems, we demonstrate that they should be integrated into a single framework in such a way that HOR (monomer) inference affects monomer (HOR) inference. We thus developed the HORmon algorithm that integrates the monomer/HOR inference and automatically generates the human monomers/HORs that are largely consistent with the previous semi-manual inference.
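The second step named in this abstract, HOR decomposition (rewriting a monomer-alphabet sequence in the alphabet of HORs), can be sketched in a toy form. This is only an illustration under strong assumptions: the canonical HOR is given in advance and matched exactly, whereas the abstract describes HORmon as inferring monomers and HORs jointly. The function name `hor_decompose` and the single-letter monomer labels are hypothetical.

```python
def hor_decompose(monomers, hor):
    """Collapse exact occurrences of a known HOR into ("HOR", run_length) blocks.

    Monomers that do not start a full HOR copy are emitted one by one as
    (monomer_label, 1), so partial or variant copies stay visible.
    """
    hor = tuple(hor)
    k = len(hor)
    out = []
    i = 0
    while i < len(monomers):
        if tuple(monomers[i:i + k]) == hor:
            if out and out[-1][0] == "HOR":
                # Extend the current run of consecutive canonical copies.
                out[-1] = ("HOR", out[-1][1] + 1)
            else:
                out.append(("HOR", 1))
            i += k
        else:
            out.append((monomers[i], 1))
            i += 1
    return out

# Toy monocentromere: two canonical ABC copies, a broken copy (A, X), one more copy.
print(hor_decompose(list("ABCABCAXABC"), "ABC"))
```

Keeping the non-matching monomers as separate blocks is a deliberate choice here: it makes variant or truncated copies easy to spot instead of silently dropping them.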
Affiliation(s)
- Olga Kunyavskaya
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University
- Tatiana Dvorkina
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University
- Ivan Alexandrov
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University
7
Altemose N, Logsdon GA, Bzikadze AV, Sidhwani P, Langley SA, Caldas GV, Hoyt SJ, Uralsky L, Ryabov FD, Shew CJ, Sauria MEG, Borchers M, Gershman A, Mikheenko A, Shepelev VA, Dvorkina T, Kunyavskaya O, Vollger MR, Rhie A, McCartney AM, Asri M, Lorig-Roach R, Shafin K, Aganezov S, Olson D, de Lima LG, Potapova T, Hartley GA, Haukness M, Kerpedjiev P, Gusev F, Tigyi K, Brooks S, Young A, Nurk S, Koren S, Salama SR, Paten B, Rogaev EI, Streets A, Karpen GH, Dernburg AF, Sullivan BA, Straight AF, Wheeler TJ, Gerton JL, Eichler EE, Phillippy AM, Timp W, Dennis MY, O'Neill RJ, Zook JM, Schatz MC, Pevzner PA, Diekhans M, Langley CH, Alexandrov IA, Miga KH. Complete genomic and epigenetic maps of human centromeres. Science 2022; 376:eabl4178. [PMID: 35357911 PMCID: PMC9233505 DOI: 10.1126/science.abl4178] [Citation(s) in RCA: 167] [Impact Index Per Article: 83.5] [Indexed: 12/20/2022]
Abstract
Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.
Affiliation(s)
- Nicolas Altemose
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Glennis A. Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Andrey V. Bzikadze
  - Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
- Pragya Sidhwani
  - Department of Biochemistry, Stanford University, Stanford, CA, USA
- Sasha A. Langley
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Gina V. Caldas
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Savannah J. Hoyt
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Lev Uralsky
  - Sirius University of Science and Technology, Sochi, Russia
  - Vavilov Institute of General Genetics, Moscow, Russia
- Colin J. Shew
  - Genome Center, MIND Institute, and Department of Biochemistry and Molecular Medicine, School of Medicine, University of California, Davis, Davis, CA, USA
- Ariel Gershman
  - Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- Alla Mikheenko
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Tatiana Dvorkina
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Olga Kunyavskaya
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Mitchell R. Vollger
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Arang Rhie
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Ann M. McCartney
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Mobin Asri
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Ryan Lorig-Roach
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Kishwar Shafin
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Sergey Aganezov
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Daniel Olson
  - Department of Computer Science, University of Montana, Missoula, MT, USA
- Tamara Potapova
  - Stowers Institute for Medical Research, Kansas City, MO, USA
- Gabrielle A. Hartley
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Marina Haukness
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Fedor Gusev
  - Vavilov Institute of General Genetics, Moscow, Russia
- Kristof Tigyi
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Shelise Brooks
  - NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Alice Young
  - NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Nurk
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sofie R. Salama
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
  - Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
- Evgeny I. Rogaev
  - Sirius University of Science and Technology, Sochi, Russia
  - Vavilov Institute of General Genetics, Moscow, Russia
  - Department of Psychiatry, University of Massachusetts Medical School, Worcester, MA, USA
  - Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
- Aaron Streets
  - Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
- Gary H. Karpen
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
  - BioEngineering and BioMedical Sciences Department, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Abby F. Dernburg
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
  - Institute for Quantitative Biosciences (QB3), University of California, Berkeley, Berkeley, CA, USA
- Beth A. Sullivan
  - Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC, USA
- Travis J. Wheeler
  - Department of Computer Science, University of Montana, Missoula, MT, USA
- Jennifer L. Gerton
  - Stowers Institute for Medical Research, Kansas City, MO, USA
  - University of Kansas Medical School, Department of Biochemistry and Molecular Biology and Cancer Center, University of Kansas, Kansas City, KS, USA
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Adam M. Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Winston Timp
  - Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Megan Y. Dennis
  - Genome Center, MIND Institute, and Department of Biochemistry and Molecular Medicine, School of Medicine, University of California, Davis, Davis, CA, USA
- Rachel J. O'Neill
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Justin M. Zook
  - Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Michael C. Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Pavel A. Pevzner
  - Department of Computer Science and Engineering, University of California at San Diego, San Diego, CA, USA
- Mark Diekhans
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Charles H. Langley
  - Department of Evolution and Ecology, University of California Davis, Davis, CA, USA
- Ivan A. Alexandrov
  - Vavilov Institute of General Genetics, Moscow, Russia
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
  - Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
- Karen H. Miga
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
  - Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
8
Abstract
We are entering a new era in genomics where entire centromeric regions are accurately represented in human reference assemblies. Access to these high-resolution maps will enable new surveys of sequence and epigenetic variation in the population and offer new insight into satellite array genomics and centromere function. Here, we focus on the sequence organization and evolution of alpha satellites, which are credited as the genetic and genomic definition of human centromeres due to their interaction with inner kinetochore proteins and their importance in the development of human artificial chromosome assays. We provide an overview of alpha satellite repeat structure and array organization in the context of these high-quality reference data sets; discuss the emergence of variation-based surveys; and provide perspective on the role of this new source of genetic and epigenetic variation in the context of chromosome biology, genome instability, and human disease.
Affiliation(s)
- Karen H Miga
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California 95064, USA
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Ivan A Alexandrov
  - Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199004, Russia
  - Research Center of Biotechnology of the Russian Academy of Sciences, Moscow 119071, Russia
9
Suzuki Y, Morishita S. The time is ripe to investigate human centromeres by long-read sequencing. DNA Res 2021; 28:6381569. [PMID: 34609504 PMCID: PMC8502840 DOI: 10.1093/dnares/dsab021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/31/2021] [Accepted: 09/28/2021] [Indexed: 01/05/2023] Open
Abstract
The complete sequencing of human centromeres, which are filled with highly repetitive elements, has long been challenging. In human centromeres, α-satellite monomers of about 171 bp in length are the basic repeating units, but α-satellite monomers constitute the higher-order repeat (HOR) units, and thousands of copies of highly homologous HOR units form large arrays, which have hampered sequence assembly of human centromeres. Because most HOR unit occurrences are covered by long reads of about 10 kb, the recent availability of much longer reads is expected to enable observation of individual HOR occurrences in terms of their single-nucleotide or structural variants. The time has come to examine the complete sequence of human centromeres.
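The length argument in this abstract can be made concrete with simple arithmetic: an n-monomer HOR unit spans roughly n × 171 bp, so a read of about 10 kb can contain only a few complete units. The sketch below assumes exactly 171 bp per monomer and a hypothetical 12-monomer HOR; real monomer lengths vary slightly, and the helper names are illustrative.

```python
MONOMER_BP = 171  # approximate alpha-satellite monomer length given in the abstract

def hor_unit_length_bp(n_monomers: int, monomer_bp: int = MONOMER_BP) -> int:
    """Approximate length of one HOR unit made of n_monomers monomers."""
    return n_monomers * monomer_bp

def full_units_per_read(read_bp: int, n_monomers: int) -> int:
    """Upper bound on complete HOR units that fit in a read of read_bp bases."""
    return read_bp // hor_unit_length_bp(n_monomers)

print(hor_unit_length_bp(12))           # length of a hypothetical 12mer HOR unit
print(full_units_per_read(10_000, 12))  # complete 12mer units in a 10 kb read
```

The second value is an upper bound: a read that starts in the middle of a unit may contain one fewer complete copy.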
Affiliation(s)
- Yuta Suzuki
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8568, Japan
- Shinichi Morishita
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8568, Japan
10
Dvorkina T, Kunyavskaya O, Bzikadze AV, Alexandrov I, Pevzner PA. CentromereArchitect: inference and analysis of the architecture of centromeres. Bioinformatics 2021; 37:i196-i204. [PMID: 34252949 PMCID: PMC8336445 DOI: 10.1093/bioinformatics/btab265] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/16/2022] Open
Abstract
Motivation: Recent advances in long-read sequencing technologies led to rapid progress in centromere assembly in the last year and, for the first time, opened a possibility to address the long-standing questions about the architecture and evolution of human centromeres. However, since these advances have not been yet accompanied by the development of the centromere-specific bioinformatics algorithms, even the fundamental questions (e.g. centromere annotation by deriving the complete set of human monomers and high-order repeats), let alone more complex questions (e.g. explaining how monomers and high-order repeats evolved) about human centromeres remain open. Moreover, even though there was a four-decade-long series of studies aimed at cataloging all human monomers and high-order repeats, the rigorous algorithmic definitions of these concepts are still lacking. Thus, the development of a centromere annotation tool is a prerequisite for follow-up personalized biomedical studies of centromeres across the human population and evolutionary studies of centromeres across various species.
Results: We describe the CentromereArchitect, the first tool for the centromere annotation in a newly sequenced genome, apply it to the recently generated complete assembly of a human genome by the Telomere-to-Telomere consortium, generate the complete set of human monomers and high-order repeats for ‘live’ centromeres, and reveal a vast set of hybrid monomers that may represent the focal points of centromere evolution.
Availability and implementation: CentromereArchitect is publicly available on https://github.com/ablab/stringdecomposer/tree/ismb2021
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tatiana Dvorkina
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
- Olga Kunyavskaya
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
- Andrey V Bzikadze
  - Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA 92093, USA
- Ivan Alexandrov
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
- Pavel A Pevzner
  - Department of Computer Science and Engineering, University of California, San Diego, CA 92093, USA
11
Lopes M, Louzada S, Gama-Carvalho M, Chaves R. Genomic Tackling of Human Satellite DNA: Breaking Barriers through Time. Int J Mol Sci 2021; 22:4707. [PMID: 33946766 PMCID: PMC8125562 DOI: 10.3390/ijms22094707] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/31/2021] [Revised: 04/24/2021] [Accepted: 04/27/2021] [Indexed: 12/12/2022] Open
Abstract
(Peri)centromeric repetitive sequences and, more specifically, satellite DNA (satDNA) sequences, constitute a major human genomic component. SatDNA sequences can vary on a large number of features, including nucleotide composition, complexity, and abundance. Several satDNA families have been identified and characterized in the human genome through time, albeit at different speeds. Human satDNA families present a high degree of sub-variability, leading to the definition of various subfamilies with different organization and clustered localization. Evolution of satDNA analysis has enabled the progressive characterization of satDNA features. Despite recent advances in the sequencing of centromeric arrays, comprehensive genomic studies to assess their variability are still required to provide accurate and proportional representation of satDNA (peri)centromeric/acrocentric short arm sequences. Approaches combining multiple techniques have been successfully applied and seem to be the path to follow for generating integrated knowledge in the promising field of human satDNA biology.
Affiliation(s)
- Mariana Lopes
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
| | - Sandra Louzada
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
| | - Margarida Gama-Carvalho
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
| | - Raquel Chaves
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- Biosystems and Integrative Sciences Institute (BioISI), Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
|
12
|
Dvorkina T, Bzikadze AV, Pevzner PA. The string decomposition problem and its applications to centromere analysis and assembly. Bioinformatics 2021; 36:i93-i101. [PMID: 32657390 PMCID: PMC7428072 DOI: 10.1093/bioinformatics/btaa454] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 12/28/2022] Open
Abstract
Motivation: Recent attempts to assemble extra-long tandem repeats (such as centromeres) faced the challenge of translating long error-prone reads from the nucleotide alphabet into the alphabet of repeat units. Human centromeres represent a particularly complex type of high-order repeats (HORs) formed by chromosome-specific monomers. Given a set of all human monomers, translating a read from a centromere into the monomer alphabet is modeled as the String Decomposition Problem. The accurate translation of reads into the monomer alphabet turns the notoriously difficult problem of assembling centromeres from reads (in the nucleotide alphabet) into a more tractable problem of assembling centromeres from translated reads. Results: We describe a StringDecomposer (SD) algorithm for solving this problem, benchmark it on the set of long error-prone Oxford Nanopore reads generated by the Telomere-to-Telomere consortium and identify a novel (rare) monomer that extends the set of known X-chromosome specific monomers. Our identification of a novel monomer emphasizes the importance of identifying all (even rare) monomers for future centromere assembly efforts and evolutionary studies. To further analyze novel monomers, we applied SD to the set of recently generated long accurate Pacific Biosciences HiFi reads. This analysis revealed that the set of known human monomers and HORs remains incomplete. SD opens a possibility to generate a complete set of human monomers and HORs for use in the ongoing efforts to generate the complete assembly of the human genome. Availability and implementation: StringDecomposer is publicly available at https://github.com/ablab/stringdecomposer. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
| | - Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA 92093, USA
| | - Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, CA 92093, USA
|
13
|
Suzuki Y, Myers EW, Morishita S. Rapid and ongoing evolution of repetitive sequence structures in human centromeres. Sci Adv 2020; 6:eabd9230. [PMID: 33310858 PMCID: PMC7732198 DOI: 10.1126/sciadv.abd9230] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/20/2020] [Accepted: 10/30/2020] [Indexed: 06/12/2023]
Abstract
Our understanding of centromere sequence variation across human populations is limited by its extremely long nested repeat structures called higher-order repeats that are challenging to sequence. Here, we analyzed chromosomes 11, 17, and X using long-read sequencing data for 36 individuals from diverse populations including a Han Chinese trio and 21 Japanese. We revealed substantial structural diversity with many previously unidentified variant higher-order repeats specific to individuals characterizing rapid, haplotype-specific evolution of human centromeric arrays, while frequent single-nucleotide variants are largely conserved. We found a characteristic pattern shared among prevalent variants in human and chimpanzee. Our findings pave the way for studying sequence evolution in human and primate centromeres.
Affiliation(s)
- Yuta Suzuki
- The University of Tokyo, Graduate School of Frontier Sciences, Department of Computational Biology and Medical Sciences, Kashiwa, Chiba 277-8568, Japan.
| | - Eugene W Myers
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
| | - Shinichi Morishita
- The University of Tokyo, Graduate School of Frontier Sciences, Department of Computational Biology and Medical Sciences, Kashiwa, Chiba 277-8568, Japan.
|
14
|
Bzikadze AV, Pevzner PA. Automated assembly of centromeres from ultra-long error-prone reads. Nat Biotechnol 2020; 38:1309-1316. [PMID: 32665660 PMCID: PMC10718184 DOI: 10.1038/s41587-020-0582-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 08/17/2019] [Accepted: 05/29/2020] [Indexed: 12/12/2022]
Abstract
Centromeric variation has been linked to cancer and infertility, but centromere sequences contain multiple tandem repeats and can only be assembled manually from long error-prone reads. Here we describe the centroFlye algorithm for centromere assembly using long error-prone reads, and apply it to assemble human centromeres on chromosomes 6 and X. Our analyses reveal putative breakpoints in the manual reconstruction of the human X centromere, demonstrate that human X chromosome is partitioned into repeat subfamilies and provide initial insights into centromere evolution. We anticipate that centroFlye could be applied to automatically close remaining multimegabase gaps in the reference human genome.
Affiliation(s)
- Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
| | - Pavel A Pevzner
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
|
15
|
Miga KH, Koren S, Rhie A, Vollger MR, Gershman A, Bzikadze A, Brooks S, Howe E, Porubsky D, Logsdon GA, Schneider VA, Potapova T, Wood J, Chow W, Armstrong J, Fredrickson J, Pak E, Tigyi K, Kremitzki M, Markovic C, Maduro V, Dutra A, Bouffard GG, Chang AM, Hansen NF, Wilfert AB, Thibaud-Nissen F, Schmitt AD, Belton JM, Selvaraj S, Dennis MY, Soto DC, Sahasrabudhe R, Kaya G, Quick J, Loman NJ, Holmes N, Loose M, Surti U, Risques RA, Graves Lindsay TA, Fulton R, Hall I, Paten B, Howe K, Timp W, Young A, Mullikin JC, Pevzner PA, Gerton JL, Sullivan BA, Eichler EE, Phillippy AM. Telomere-to-telomere assembly of a complete human X chromosome. Nature 2020; 585:79-84. [PMID: 32663838 PMCID: PMC7484160 DOI: 10.1038/s41586-020-2547-7] [Citation(s) in RCA: 390] [Impact Index Per Article: 97.5] [Received: 07/30/2019] [Accepted: 05/29/2020] [Indexed: 12/15/2022]
Abstract
After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist. Here we present a human genome assembly that surpasses the continuity of GRCh38, along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome, we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.
Affiliation(s)
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA.
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Ariel Gershman
- Department of Molecular Biology and Genetics, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Andrey Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, San Diego, CA, USA
| | - Shelise Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Rockville, MD, USA
| | - Edmund Howe
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Valerie A Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | | | | | - Joel Armstrong
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Evgenia Pak
- Cytogenetic and Microscopy Core, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Kristof Tigyi
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Milinn Kremitzki
- McDonnell Genome Institute at Washington University, St Louis, MO, USA
| | | | - Valerie Maduro
- Undiagnosed Diseases Program, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Amalia Dutra
- Cytogenetic and Microscopy Core, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Gerard G Bouffard
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Rockville, MD, USA
| | - Alexander M Chang
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Nancy F Hansen
- Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Amy B Wilfert
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | | | | | - Megan Y Dennis
- Department of Biochemistry and Molecular Medicine, Genome Center, MIND Institute, University of California Davis, Davis, CA, USA
| | - Daniela C Soto
- Department of Biochemistry and Molecular Medicine, Genome Center, MIND Institute, University of California Davis, Davis, CA, USA
| | - Ruta Sahasrabudhe
- DNA Technologies Core, Genome Center, University of California Davis, Davis, CA, USA
| | - Gulhan Kaya
- Department of Biochemistry and Molecular Medicine, Genome Center, MIND Institute, University of California Davis, Davis, CA, USA
| | - Josh Quick
- Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK
| | - Nicholas J Loman
- Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK
| | - Nadine Holmes
- DeepSeq, School of Life Sciences, University of Nottingham, Nottingham, UK
| | - Matthew Loose
- DeepSeq, School of Life Sciences, University of Nottingham, Nottingham, UK
| | - Urvashi Surti
- Department of Pathology, University of Pittsburgh, Pittsburgh, PA, USA
| | - Rosa Ana Risques
- Department of Pathology, University of Washington, Seattle, WA, USA
| | | | - Robert Fulton
- McDonnell Genome Institute at Washington University, St Louis, MO, USA
| | - Ira Hall
- McDonnell Genome Institute at Washington University, St Louis, MO, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Winston Timp
- Department of Molecular Biology and Genetics, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Alice Young
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Rockville, MD, USA
| | - James C Mullikin
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Rockville, MD, USA
| | - Pavel A Pevzner
- Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
| | | | - Beth A Sullivan
- Department of Molecular Genetics and Microbiology, Division of Human Genetics, Duke University Medical Center, Durham, NC, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
|
16
|
Heitkam T, Weber B, Walter I, Liedtke S, Ost C, Schmidt T. Satellite DNA landscapes after allotetraploidization of quinoa (Chenopodium quinoa) reveal unique A and B subgenomes. Plant J 2020; 103:32-52. [PMID: 31981259 DOI: 10.1111/tpj.14705] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 10/04/2019] [Revised: 12/10/2019] [Accepted: 01/17/2020] [Indexed: 06/10/2023]
Abstract
If two related plant species hybridize, their genomes may be combined and duplicated within a single nucleus, thereby forming an allotetraploid. How the emerging plant balances two co-evolved genomes is still a matter of ongoing research. Here, we focus on satellite DNA (satDNA), the fastest turn-over sequence class in eukaryotes, aiming to trace its emergence, amplification, and loss during plant speciation and allopolyploidization. As a model, we used Chenopodium quinoa Willd. (quinoa), an allopolyploid crop with 2n = 4x = 36 chromosomes. Quinoa originated by hybridization of an unknown female American Chenopodium diploid (AA genome) with an unknown male Old World diploid species (BB genome), dating back 3.3-6.3 million years. Applying short read clustering to quinoa (AABB), C. pallidicaule (AA), and C. suecicum (BB) whole genome shotgun sequences, we classified their repetitive fractions, and identified and characterized seven satDNA families, together with the 5S rDNA model repeat. We show unequal satDNA amplification (two families) and exclusive occurrence (four families) in the AA and BB diploids by read mapping as well as Southern, genomic, and fluorescent in situ hybridization. Whereas the satDNA distributions support C. suecicum as possible parental species, we were able to exclude C. pallidicaule as progenitor due to unique repeat profiles. Using quinoa long reads and scaffolds, we detected only limited evidence of intergenomic homogenization of satDNA after allopolyploidization, but were able to exclude dispersal of 5S rRNA genes between subgenomes. Our results exemplify the complex route of tandem repeat evolution through Chenopodium speciation and allopolyploidization, and may provide sequence targets for the identification of quinoa's progenitors.
Affiliation(s)
- Tony Heitkam
- Institute of Botany, Technische Universität Dresden, 01069, Dresden, Germany
| | - Beatrice Weber
- Institute of Botany, Technische Universität Dresden, 01069, Dresden, Germany
| | - Ines Walter
- Institute of Botany, Technische Universität Dresden, 01069, Dresden, Germany
| | - Susan Liedtke
- Institute of Botany, Technische Universität Dresden, 01069, Dresden, Germany
| | - Charlotte Ost
- Institute of Botany, Technische Universität Dresden, 01069, Dresden, Germany
- Institute of Biology, Martin-Luther-Universität Halle-Wittenberg, 06120, Halle (Saale), Germany
| | - Thomas Schmidt
- Institute of Botany, Technische Universität Dresden, 01069, Dresden, Germany
|
17
|
Easterling KA, Pitra NJ, Morcol TB, Aquino JR, Lopes LG, Bussey KC, Matthews PD, Bass HW. Identification of tandem repeat families from long-read sequences of Humulus lupulus. PLoS One 2020; 15:e0233971. [PMID: 32502183 PMCID: PMC7274563 DOI: 10.1371/journal.pone.0233971] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/18/2020] [Accepted: 05/16/2020] [Indexed: 11/28/2022] Open
Abstract
Hop (Humulus lupulus L.) is known for its use as a bittering agent in beer and has a rich history of cultivation, beginning in Europe and now spanning the globe. There are five wild varieties worldwide, which may have been introgressed with cultivated varieties. As a dioecious species, its obligate outcrossing, non-Mendelian inheritance, and genomic structural variability have confounded directed breeding efforts. Consequently, understanding the hop genome represents a considerable challenge, requiring additional resources. In order to facilitate investigations into the transmission genetics of hop, we report here a tandem repeat discovery pipeline developed using k-mer filtering and dot plot analysis of PacBio long-read sequences from the hop cultivar Apollo. From this we identified 17 new and distinct tandem repeat sequence families, which represent candidates for FISH probe development. For two of these candidates, HuluTR120 and HuluTR225, we produced oligonucleotide FISH probes from conserved regions and demonstrated their utility by staining meiotic chromosomes from wild hop, var. neomexicanus, to address, for example, questions about hop transmission genetics. Collectively, these tandem repeat sequence families represent new resources suitable for development of additional cytogenomic tools for hop research.
Affiliation(s)
- Katherine A. Easterling
- Department of Biological Science, Florida State University, Tallahassee, FL, United States of America
- Hopsteiner, S.S. Steiner, Inc., New York, New York, United States of America
| | - Nicholi J. Pitra
- Hopsteiner, S.S. Steiner, Inc., New York, New York, United States of America
| | - Taylan B. Morcol
- Hopsteiner, S.S. Steiner, Inc., New York, New York, United States of America
- Department of Biological Sciences, Lehman College, City University of New York, Bronx, New York, United States of America
- The Graduate Center, City University of New York, New York, New York, United States of America
| | - Jenna R. Aquino
- Department of Biological Science, Florida State University, Tallahassee, FL, United States of America
| | - Lauren G. Lopes
- Department of Biological Science, Florida State University, Tallahassee, FL, United States of America
| | - Kristin C. Bussey
- Department of Biological Science, Florida State University, Tallahassee, FL, United States of America
| | - Paul D. Matthews
- Hopsteiner, S.S. Steiner, Inc., New York, New York, United States of America
| | - Hank W. Bass
- Department of Biological Science, Florida State University, Tallahassee, FL, United States of America
|
18
|
Discovery of 33mer in chromosome 21 - the largest alpha satellite higher order repeat unit among all human somatic chromosomes. Sci Rep 2019; 9:12629. [PMID: 31477765 PMCID: PMC6718397 DOI: 10.1038/s41598-019-49022-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/25/2019] [Accepted: 08/13/2019] [Indexed: 11/10/2022] Open
Abstract
The centromere is important for segregation of chromosomes during cell division in eukaryotes. Its destabilization results in chromosomal missegregation and aneuploidy, hallmarks of cancers and birth defects. In primate genomes, centromeres contain tandem repeats of ~171 bp alpha satellite DNA, commonly organized into higher order repeats (HORs). In spite of their crucial importance, satellites have been understudied because of gaps in sequencing - genomic “black holes”. Bioinformatic studies of genomic sequences open possibilities to revolutionize understanding of repetitive DNA datasets. Here, using the robust Global Repeat Map algorithm, we identified in the hg38 sequence of human chromosome 21 the complete ensemble of alpha satellite HORs with six long repeat units (≥20 mers), five of them novel. The novel 33mer HOR has the longest HOR unit identified so far among all somatic chromosomes, and the novel 23mer reverse HOR is located far from the centromere. We also discovered that in the hg38 assembly the 33mer sequences in chromosomes 21, 13, 14, and 22 are 100% identical but nearby gaps are present, which seems to require additional, more precise sequencing. Chromosome 21 is of significant interest for deciphering the molecular basis of Down syndrome and of aneuploidies in general. Since chromosome identifier probes are largely based on the detection of higher order alpha satellite repeats, the distinctions identified here between alpha satellite HORs in chromosomes 21 and 13 might lead to a unique chromosome 21 probe in molecular cytogenetics, which would find utility in diagnostics. It is expected that its complete sequence analysis will have profound implications for understanding pathogenesis of diseases and development of new therapeutic approaches.
|
19
|
Rodrigues PHDM, Dos Santos RZ, Silva DMZDA, Goes CAG, Oliveira C, Foresti F, Porto-Foresti F, Utsunomia R. Chromosomal and Genomic Dynamics of Satellite DNAs in Characidae (Characiformes, Teleostei) Species. Zebrafish 2019; 16:408-414. [PMID: 31145041 DOI: 10.1089/zeb.2019.1738] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022] Open
Abstract
Satellite DNAs (satDNAs) are tandemly repeated DNA sequences with great abundance in eukaryotic genomes. A single species may carry up to hundreds of satDNA families, collectively called the "satellitome," each showing its own dynamics and evolution rates. In this context, all living species contain a satDNA library that may be partially or totally shared with other related species/populations. In the last few years, next-generation sequencing (NGS) and novel bioinformatic tools have facilitated the massive characterization of these sequences at low cost and, consequently, the comparison of satDNAs between species. In this study, we characterized two novel satDNAs (MsaSat03-80 and MsaSat04-142) in three characid fish (Astyanax paranae, Astyanax fasciatus, and two populations of Moenkhausia sanctaefilomenae) and mapped their chromosomal location to unveil the evolutionary dynamics of satDNA repeats in those species. Our results showed that MsaSat03 is present in the genomes of all analyzed species, but is clustered only in the chromosomes of M. sanctaefilomenae, exhibiting a conserved number and location of sites. Conversely, MsaSat04 sequences are restricted to M. sanctaefilomenae and show a differential distribution between the two analyzed populations. Altogether, our analyses point to a complex history of satDNA families in characid fish and the utility of NGS data for comparative satDNA analysis.
Affiliation(s)
| | - Rodrigo Zeni Dos Santos
- Departamento de Ciências Biológicas, Faculdade de Ciências, Universidade Estadual Paulista-UNESP, Bauru, Brazil
| | | | - Caio Augusto Gomes Goes
- Departamento de Ciências Biológicas, Faculdade de Ciências, Universidade Estadual Paulista-UNESP, Bauru, Brazil
| | - Claudio Oliveira
- Departamento de Morfologia, Instituto de Biociências, Universidade Estadual Paulista-UNESP, Botucatu, Brazil
| | - Fausto Foresti
- Departamento de Morfologia, Instituto de Biociências, Universidade Estadual Paulista-UNESP, Botucatu, Brazil
| | - Fábio Porto-Foresti
- Departamento de Ciências Biológicas, Faculdade de Ciências, Universidade Estadual Paulista-UNESP, Bauru, Brazil
| | - Ricardo Utsunomia
- Departamento de Ciências Biológicas, Faculdade de Ciências, Universidade Estadual Paulista-UNESP, Bauru, Brazil
- Departamento de Morfologia, Instituto de Biociências, Universidade Estadual Paulista-UNESP, Botucatu, Brazil
|
20
|
Miga KH. Centromeric Satellite DNAs: Hidden Sequence Variation in the Human Population. Genes (Basel) 2019; 10:E352. [PMID: 31072070 PMCID: PMC6562703 DOI: 10.3390/genes10050352] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 04/02/2019] [Revised: 05/03/2019] [Accepted: 05/03/2019] [Indexed: 12/30/2022] Open
Abstract
The central goal of medical genomics is to understand the inherited basis of sequence variation that underlies human physiology, evolution, and disease. Functional association studies currently ignore millions of bases that span each centromeric region and acrocentric short arm. These regions are enriched in long arrays of tandem repeats, or satellite DNAs, that are known to vary extensively in copy number and repeat structure in the human population. Satellite sequence variation in the human genome is often so large that it is detected cytogenetically, yet due to the lack of a reference assembly and informatics tools to measure this variability, contemporary high-resolution disease association studies are unable to detect causal variants in these regions. Nevertheless, recently uncovered associations between satellite DNA variation and human disease support that these regions present a substantial and biologically important fraction of human sequence variation. Therefore, there is a pressing and unmet need to detect and incorporate this uncharacterized sequence variation into broad studies of human evolution and medical genomics. Here I discuss the current knowledge of satellite DNA variation in the human genome, focusing on centromeric satellites and their potential implications for disease.
Affiliation(s)
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California, CA 95064, USA.
|
21
|
Abstract
Over 100 whole-genome sequences from algae are published or soon to be published. The rapidly increasing availability of these fundamental resources is changing how we understand one of the most diverse, complex, and understudied groups of photosynthetic eukaryotes. Genome sequences provide a window into the functional potential of individual algae, with phylogenomics and functional genomics as tools for contextualizing and transferring knowledge from reference organisms into less well-characterized systems. Remarkably, over half of the proteins encoded by algal genomes are of unknown function, highlighting the volume of functional capabilities yet to be discovered. In this review, we provide an overview of publicly available algal genomes, their associated protein inventories, and their quality, with a summary of the statuses of protein function understanding and predictions.
Affiliation(s)
| | - Sabeeha S Merchant
- Departments of Plant and Microbial Biology and Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Institute for Genomics and Proteomics, University of California, Los Angeles, California 90095, USA
|
22
|
Conte MA, Joshi R, Moore EC, Nandamuri SP, Gammerdinger WJ, Roberts RB, Carleton KL, Lien S, Kocher TD. Chromosome-scale assemblies reveal the structural evolution of African cichlid genomes. Gigascience 2019; 8:giz030. [PMID: 30942871 PMCID: PMC6447674 DOI: 10.1093/gigascience/giz030] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 08/02/2018] [Revised: 01/11/2019] [Accepted: 03/07/2019] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND African cichlid fishes are well known for their rapid radiations and are a model system for studying evolutionary processes. Here we compare multiple, high-quality, chromosome-scale genome assemblies to elucidate the genetic mechanisms underlying cichlid diversification and study how genome structure evolves in rapidly radiating lineages. RESULTS We re-anchored our recent assembly of the Nile tilapia (Oreochromis niloticus) genome using a new high-density genetic map. We also developed a new de novo genome assembly of the Lake Malawi cichlid, Metriaclima zebra, using high-coverage Pacific Biosciences sequencing, and anchored contigs to linkage groups (LGs) using 4 different genetic maps. These new anchored assemblies allow the first chromosome-scale comparisons of African cichlid genomes. Large intra-chromosomal structural differences (∼2-28 megabase pairs) among species are common, while inter-chromosomal differences are rare (<10 megabase pairs total). Placement of the centromeres within the chromosome-scale assemblies identifies large structural differences that explain many of the karyotype differences among species. Structural differences are also associated with unique patterns of recombination on sex chromosomes. Structural differences on LG9, LG11, and LG20 are associated with reduced recombination, indicative of inversions between the rock- and sand-dwelling clades of Lake Malawi cichlids. M. zebra has a larger number of recent transposable element insertions compared with O. niloticus, suggesting that several transposable element families have a higher rate of insertion in the haplochromine cichlid lineage. CONCLUSION This study identifies novel structural variation among East African cichlid genomes and provides a new set of genomic resources to support research on the mechanisms driving cichlid adaptation and speciation.
Affiliation(s)
- Matthew A Conte
  - Department of Biology, University of Maryland, College Park, MD 20742, USA
- Rajesh Joshi
  - Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, PO Box 5003, Ås, Norway
- Emily C Moore
  - Department of Biological Sciences and W. M. Keck Center for Behavioral Biology, North Carolina State University, Raleigh, NC 27695, USA
- Reade B Roberts
  - Department of Biological Sciences and W. M. Keck Center for Behavioral Biology, North Carolina State University, Raleigh, NC 27695, USA
- Karen L Carleton
  - Department of Biology, University of Maryland, College Park, MD 20742, USA
- Sigbjørn Lien
  - Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, PO Box 5003, Ås, Norway
- Thomas D Kocher
  - Department of Biology, University of Maryland, College Park, MD 20742, USA
23
|
Uralsky L, Shepelev V, Alexandrov A, Yurov Y, Rogaev E, Alexandrov I. Classification and monomer-by-monomer annotation dataset of suprachromosomal family 1 alpha satellite higher-order repeats in hg38 human genome assembly. Data Brief 2019; 24:103708. [PMID: 30989093 PMCID: PMC6447721 DOI: 10.1016/j.dib.2019.103708] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 06/03/2018] [Revised: 01/16/2019] [Accepted: 01/22/2019] [Indexed: 01/27/2023] Open
Abstract
In the latest hg38 human genome assembly, centromeric gaps have been filled in by alpha satellite (AS) reference models (RMs), which are statistical representations of homogeneous higher-order repeat (HOR) arrays that make up the bulk of the centromeric regions. We analyzed these models to compose an atlas of human AS HORs where each monomer of a HOR was represented by a number of its polymorphic sequence variants. We combined these data with the HMMER sequence analysis platform to annotate AS HORs in the assembly. This led to the discovery of a new type of low copy number, highly divergent HORs which were not represented by RMs. These were included in the dataset. The annotation can be viewed as a UCSC Genome Browser custom track (the HOR-track) and used together with our previous annotation of AS suprachromosomal families (SFs) in the same assembly, where each AS monomer can be viewed in its genomic context together with its classification into one of the 5 major SFs (the SF-track). To catalog the diversity of AS HORs in the human genome we introduced a new naming system. Each HOR received a name which showed its SF, chromosomal location and index number. Here we present the first installment of the HOR-track covering only the 17 HORs that belong to SF1, which forms live functional centromeres in chromosomes 1, 3, 5, 6, 7, 10, 12, 16 and 19 and also a large number of minor dead HOR domains, both homogeneous and divergent. The monomer-by-monomer HOR annotation used for this dataset, as opposed to annotation of whole HOR repeats, provides for mapping and quantification of various structural variants of AS HORs, which can be used to collect data on inter-individual polymorphism of AS.
Affiliation(s)
- L.I. Uralsky
  - Institute of Molecular Genetics, Russian Academy of Sciences, Kurchatov Sq. 2, Moscow 123182, Russia
  - Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia
- V.A. Shepelev
  - Institute of Molecular Genetics, Russian Academy of Sciences, Kurchatov Sq. 2, Moscow 123182, Russia
  - Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia
- A.A. Alexandrov
  - Institute of Molecular Genetics, Russian Academy of Sciences, Kurchatov Sq. 2, Moscow 123182, Russia
- Y.B. Yurov
  - Research Center of Mental Health, Zagorodnoe Sh. 2, Moscow 113152, Russia
- E.I. Rogaev
  - Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia
  - Department of Psychiatry, Brudnick Neuropsychiatric Research Institute, University of Massachusetts Medical School, Worcester, MA 01604, USA
  - Lomonosov Moscow State University, Biological Department, Center for Genetics and Genetic Technologies, Moscow, 119192, Russia
  - Corresponding authors. Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia.
- I.A. Alexandrov
  - Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia
  - Research Center of Mental Health, Zagorodnoe Sh. 2, Moscow 113152, Russia
  - Corresponding authors. Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia.
24
|
Lower SS, McGurk MP, Clark AG, Barbash DA. Satellite DNA evolution: old ideas, new approaches. Curr Opin Genet Dev 2018; 49:70-78. [PMID: 29579574 PMCID: PMC5975084 DOI: 10.1016/j.gde.2018.03.003] [Citation(s) in RCA: 93] [Impact Index Per Article: 15.5] [Received: 11/15/2017] [Revised: 02/02/2018] [Accepted: 03/08/2018] [Indexed: 12/22/2022]
Abstract
A substantial portion of the genomes of most multicellular eukaryotes consists of large arrays of tandemly repeated sequence, collectively called satellite DNA. The processes generating and maintaining different satellite DNA abundances across lineages are important to understand as satellites have been linked to chromosome mis-segregation, disease phenotypes, and reproductive isolation between species. While much theory has been developed to describe satellite evolution, empirical tests of these models have fallen short because of the challenges in assessing satellite repeat regions of the genome. Advances in computational tools and sequencing technologies now enable identification and quantification of satellite sequences genome-wide. Here, we describe some of these tools and how their applications are furthering our knowledge of satellite evolution and function.
Affiliation(s)
- Sarah Sander Lower
  - Department of Molecular Biology and Genetics, Cornell University, 526 Campus Rd, Ithaca, NY 14853, United States
- Michael P McGurk
  - Department of Molecular Biology and Genetics, Cornell University, 526 Campus Rd, Ithaca, NY 14853, United States
- Andrew G Clark
  - Department of Molecular Biology and Genetics, Cornell University, 526 Campus Rd, Ithaca, NY 14853, United States
- Daniel A Barbash
  - Department of Molecular Biology and Genetics, Cornell University, 526 Campus Rd, Ithaca, NY 14853, United States
25
|
Ichikawa K, Tomioka S, Suzuki Y, Nakamura R, Doi K, Yoshimura J, Kumagai M, Inoue Y, Uchida Y, Irie N, Takeda H, Morishita S. Centromere evolution and CpG methylation during vertebrate speciation. Nat Commun 2017. [PMID: 29184138 DOI: 10.1038/s41467-017-01982-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022] Open
Abstract
Centromeres and large-scale structural variants evolve and contribute to genome diversity during vertebrate speciation. Here, we perform de novo long-read genome assembly of three inbred medaka strains that are derived from geographically isolated subpopulations and undergo speciation. Using single-molecule real-time (SMRT) sequencing, we obtain three chromosome-mapped genomes of length ~734, ~678, and ~744Mbp with a resource of twenty-two centromeric regions of length 20-345kbp. Centromeres are positionally conserved among the three strains and even between four pairs of chromosomes that were duplicated by the teleost-specific whole-genome duplication 320-350 million years ago. The centromeres do not all evolve at a similar pace; rather, centromeric monomers in non-acrocentric chromosomes evolve significantly faster than those in acrocentric chromosomes. Using methylation sensitive SMRT reads, we uncover centromeres are mostly hypermethylated but have hypomethylated sub-regions that acquire unique sequence compositions independently. These findings reveal the potential of non-acrocentric centromere evolution to contribute to speciation.
Affiliation(s)
- Kazuki Ichikawa
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8583, Japan
- Shingo Tomioka
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8583, Japan
- Yuta Suzuki
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8583, Japan
- Ryohei Nakamura
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Koichiro Doi
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8583, Japan
- Jun Yoshimura
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8583, Japan
- Masahiko Kumagai
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Yusuke Inoue
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Yui Uchida
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Naoki Irie
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Hiroyuki Takeda
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Shinichi Morishita
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8583, Japan
26
|
Centromere evolution and CpG methylation during vertebrate speciation. Nat Commun 2017; 8:1833. [PMID: 29184138 PMCID: PMC5705604 DOI: 10.1038/s41467-017-01982-7] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Received: 06/16/2017] [Accepted: 10/31/2017] [Indexed: 11/10/2022] Open
Abstract
Centromeres and large-scale structural variants evolve and contribute to genome diversity during vertebrate speciation. Here, we perform de novo long-read genome assembly of three inbred medaka strains that are derived from geographically isolated subpopulations and undergo speciation. Using single-molecule real-time (SMRT) sequencing, we obtain three chromosome-mapped genomes of length ~734, ~678, and ~744Mbp with a resource of twenty-two centromeric regions of length 20–345kbp. Centromeres are positionally conserved among the three strains and even between four pairs of chromosomes that were duplicated by the teleost-specific whole-genome duplication 320–350 million years ago. The centromeres do not all evolve at a similar pace; rather, centromeric monomers in non-acrocentric chromosomes evolve significantly faster than those in acrocentric chromosomes. Using methylation sensitive SMRT reads, we uncover centromeres are mostly hypermethylated but have hypomethylated sub-regions that acquire unique sequence compositions independently. These findings reveal the potential of non-acrocentric centromere evolution to contribute to speciation. Centromeres and large-scale structural variants evolve and contribute to genome diversity during vertebrate speciation. Here Ichikawa et al perform de novo long-read genome assembly of three inbred medaka strains, and report long-range structure of centromeres and their methylation as well as correlation of structural variants with differential gene expression.
27
|
Larsen PA, Harris RA, Liu Y, Murali SC, Campbell CR, Brown AD, Sullivan BA, Shelton J, Brown SJ, Raveendran M, Dudchenko O, Machol I, Durand NC, Shamim MS, Aiden EL, Muzny DM, Gibbs RA, Yoder AD, Rogers J, Worley KC. Hybrid de novo genome assembly and centromere characterization of the gray mouse lemur (Microcebus murinus). BMC Biol 2017; 15:110. [PMID: 29145861 PMCID: PMC5689209 DOI: 10.1186/s12915-017-0439-6] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 06/20/2017] [Accepted: 10/10/2017] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND: The de novo assembly of repeat-rich mammalian genomes using only high-throughput short read sequencing data typically results in highly fragmented genome assemblies that limit downstream applications. Here, we present an iterative approach to hybrid de novo genome assembly that incorporates datasets stemming from multiple genomic technologies and methods. We used this approach to improve the gray mouse lemur (Microcebus murinus) genome from early draft status to a near chromosome-scale assembly.
METHODS: We used a combination of advanced genomic technologies to iteratively resolve conflicts and super-scaffold the M. murinus genome.
RESULTS: We improved the M. murinus genome assembly to a scaffold N50 of 93.32 Mb. Whole genome alignments between our primary super-scaffolds and 23 human chromosomes revealed patterns that are congruent with historical comparative cytogenetic data, thus demonstrating the accuracy of our de novo scaffolding approach and allowing assignment of scaffolds to M. murinus chromosomes. Moreover, we utilized our independent datasets to discover and characterize sequences associated with centromeres across the mouse lemur genome. Quality assessment of the final assembly found 96% of mouse lemur canonical transcripts nearly complete, comparable to other published high-quality reference genome assemblies.
CONCLUSIONS: We describe a new assembly of the gray mouse lemur (Microcebus murinus) genome with chromosome-scale scaffolds produced using a hybrid bioinformatic and sequencing approach. The approach is cost effective and produces superior results based on metrics of contiguity and completeness. Our results show that emerging genomic technologies can be used in combination to characterize centromeres of non-model species and to produce accurate de novo chromosome-scale genome assemblies of complex mammalian genomes.
Affiliation(s)
- Peter A. Larsen
  - Department of Biology, Duke University, Durham, NC 27708 USA
- R. Alan Harris
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- Yue Liu
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Shwetha C. Murali
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
  - Present address: Department of Genome Sciences, University of Washington, Seattle, WA 98195 USA
- Adam D. Brown
  - Department of Pharmacology and Cancer Biology, Duke University, Durham, NC 27710 USA
  - Present address: Bristol Myers-Squibb, 420 W Round Grove Rd, Lewisville, TX 75067 USA
- Beth A. Sullivan
  - Department of Molecular Genetics and Microbiology, Duke University, Durham, NC 27710 USA
- Jennifer Shelton
  - Kansas State University Bioinformatics Center, Division of Biology, Kansas State University, Manhattan, KS 66506 USA
  - Present address: New York Genome Center, 101 Avenue of the Americas, New York, NY 10013 USA
- Susan J. Brown
  - Kansas State University Bioinformatics Center, Division of Biology, Kansas State University, Manhattan, KS 66506 USA
- Olga Dudchenko
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
  - The Center for Theoretical Biological Physics, Rice University, Houston, TX 77005 USA
  - Department of Computer Science, Rice University, Houston, TX 77005 USA
- Ido Machol
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
  - The Center for Theoretical Biological Physics, Rice University, Houston, TX 77005 USA
  - Department of Computer Science, Rice University, Houston, TX 77005 USA
- Neva C. Durand
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
  - The Center for Theoretical Biological Physics, Rice University, Houston, TX 77005 USA
  - Department of Computer Science, Rice University, Houston, TX 77005 USA
- Muhammad S. Shamim
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
  - The Center for Theoretical Biological Physics, Rice University, Houston, TX 77005 USA
  - Department of Computer Science, Rice University, Houston, TX 77005 USA
- Erez Lieberman Aiden
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
  - The Center for Theoretical Biological Physics, Rice University, Houston, TX 77005 USA
  - Department of Computer Science, Rice University, Houston, TX 77005 USA
- Donna M. Muzny
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- Richard A. Gibbs
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- Anne D. Yoder
  - Department of Biology, Duke University, Durham, NC 27708 USA
- Jeffrey Rogers
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- Kim C. Worley
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
28
|
Abstract
The genetic material, contained on chromosomes, is often described as the "blueprint for life." During nuclear division, the chromosomes are pulled into each of the two daughter nuclei by the coordination of spindle microtubules, kinetochores, centromeres, and chromatin. These four functional units must link the chromosomes to the microtubules, signal to the cell when the attachment is made so that division can proceed, and withstand the force generated by pulling the chromosomes to either daughter cell. To perform each of these functions, kinetochores are large protein complexes, approximately 5 MDa in size, and they contain at least 45 unique proteins. Many of the central components in the kinetochore are well conserved, yielding a common core of proteins forming consistent structures. However, many of the peripheral subcomplexes vary between different taxonomic groups, including changes in primary sequence and gain or loss of whole proteins. It is still unclear how significant these changes are, and answers to this question may provide insights into adaptation to specific lifestyles or progression of diseases that involve chromosome instability.
29
|
Higher-order organisation of extremely amplified, potentially functional and massively methylated 5S rDNA in European pikes (Esox sp.). BMC Genomics 2017; 18:391. [PMID: 28521734 PMCID: PMC5437419 DOI: 10.1186/s12864-017-3774-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 11/04/2016] [Accepted: 05/09/2017] [Indexed: 12/16/2022] Open
Abstract
Background: Pikes represent an important genus (Esox) harbouring a pre-duplication karyotype (2n = 2x = 50) of economically important salmonid pseudopolyploids. Here, we have characterized the 5S ribosomal RNA genes (rDNA) in Esox lucius and its closely related E. cisalpinus using cytogenetic, molecular and genomic approaches. Intragenomic homogeneity and copy number estimation was carried out using Illumina reads. The higher-order structure of rDNA arrays was investigated by the analysis of long PacBio reads. Position of loci on chromosomes was determined by FISH. DNA methylation was analysed by methylation-sensitive restriction enzymes.
Results: The 5S rDNA loci occupy exclusively (peri)centromeric regions on 30–38 acrocentric chromosomes in both E. lucius and E. cisalpinus. The large number of loci is accompanied by extreme amplification of genes (>20,000 copies), which is to the best of our knowledge one of the highest copy numbers of rRNA genes in animals ever reported. Conserved secondary structures of predicted 5S rRNAs indicate that most of the amplified genes are potentially functional. Only a few SNPs were found in genic regions, indicating their high homogeneity, while intergenic spacers were more heterogeneous and several families were identified. Analysis of 10–30 kb-long molecules sequenced by the PacBio technology (containing about 40% of total 5S rDNA) revealed that the vast majority (96%) of genes are organised in large several kilobase-long blocks. Dispersed genes or short tandems were less common (4%). The adjacent 5S blocks were directly linked, separated by intervening DNA and even inverted. The 5S units differing in the intergenic spacers formed both homogeneous and heterogeneous (mixed) blocks, indicating a variable degree of homogenisation between the loci. Both E. lucius and E. cisalpinus 5S rDNA was heavily methylated at CG dinucleotides.
Conclusions: Extreme amplification of 5S rRNA genes in the Esox genome occurred in the absence of significant pseudogenisation, suggesting its recent origin and/or intensive homogenisation processes. The dense methylation of units indicates that powerful epigenetic mechanisms have evolved in this group of fish to silence amplified genes. We discuss how the higher-order repeat structures impact on homogenisation of 5S rDNA in the genome.
30
|
Weissensteiner MH, Pang AWC, Bunikis I, Höijer I, Vinnere-Petterson O, Suh A, Wolf JBW. Combination of short-read, long-read, and optical mapping assemblies reveals large-scale tandem repeat arrays with population genetic implications. Genome Res 2017; 27:697-708. [PMID: 28360231 PMCID: PMC5411765 DOI: 10.1101/gr.215095.116] [Citation(s) in RCA: 67] [Impact Index Per Article: 9.6] [Received: 08/26/2016] [Accepted: 03/10/2017] [Indexed: 12/27/2022]
Abstract
Accurate and contiguous genome assembly is key to a comprehensive understanding of the processes shaping genomic diversity and evolution. Yet, it is frequently constrained by constitutive heterochromatin, usually characterized by highly repetitive DNA. As a key feature of genome architecture associated with centromeric and subtelomeric regions, it locally influences meiotic recombination. In this study, we assess the impact of large tandem repeat arrays on the recombination rate landscape in an avian speciation model, the Eurasian crow. We assembled two high-quality genome references using single-molecule real-time sequencing (long-read assembly [LR]) and single-molecule optical maps (optical map assembly [OM]). A three-way comparison including the published short-read assembly (SR) constructed for the same individual allowed assessing assembly properties and pinpointing misassemblies. By combining information from all three assemblies, we characterized 36 previously unidentified large repetitive regions in the proximity of sequence assembly breakpoints, the majority of which contained complex arrays of a 14-kb satellite repeat or its 1.2-kb subunit. Using whole-genome population resequencing data, we estimated the population-scaled recombination rate (ρ) and found it to be significantly reduced in these regions. These findings are consistent with an effect of low recombination in regions adjacent to centromeric or subtelomeric heterochromatin and add to our understanding of the processes generating widespread heterogeneity in genetic diversity and differentiation along the genome. By combining three different technologies, our results highlight the importance of adding a layer of information on genome structure that is inaccessible to each approach independently.
Affiliation(s)
- Matthias H Weissensteiner
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, SE-752 36 Uppsala, Sweden
  - Division of Evolutionary Biology, Faculty of Biology, Ludwig-Maximilian University of Munich, 82152 Planegg-Martinsried, Germany
- Ignas Bunikis
  - SciLife Lab Uppsala, Uppsala University, SE-751 85 Uppsala, Sweden
- Ida Höijer
  - SciLife Lab Uppsala, Uppsala University, SE-751 85 Uppsala, Sweden
- Alexander Suh
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, SE-752 36 Uppsala, Sweden
- Jochen B W Wolf
  - Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, SE-752 36 Uppsala, Sweden
  - Division of Evolutionary Biology, Faculty of Biology, Ludwig-Maximilian University of Munich, 82152 Planegg-Martinsried, Germany
31
|
Abstract
Centromeric chromatin undergoes major changes in composition and architecture during each cell cycle. These changes in specialized chromatin facilitate kinetochore formation in mitosis to ensure proper chromosome segregation. Thus, proper orchestration of centromeric chromatin dynamics during interphase, including replication in S phase, is crucial. We provide the current view concerning the centromeric architecture associated with satellite repeat sequences in mammals and its dynamics during the cell cycle. We summarize the contributions of deposited histone variants and their chaperones, other centromeric components - including proteins and their post-translational modifications, and RNAs - and we link the expression and deposition timing of each component during the cell cycle. Because neocentromeres occur at ectopic sites, we highlight how cell cycle processes can go wrong, leading to neocentromere formation and potentially disease.
Affiliation(s)
- Sebastian Müller
  - Institut Curie, PSL Research University, CNRS, UMR3664, Equipe Labellisée Ligue contre le Cancer, F-75005 Paris, France
  - Sorbonne Universités, UPMC Université Paris 06, CNRS, UMR3664, F-75005 Paris, France
- Geneviève Almouzni
  - Institut Curie, PSL Research University, CNRS, UMR3664, Equipe Labellisée Ligue contre le Cancer, F-75005 Paris, France
  - Sorbonne Universités, UPMC Université Paris 06, CNRS, UMR3664, F-75005 Paris, France