1
Glunčić M, Vlahović I, Rosandić M, Paar V. Novel Cascade Alpha Satellite HORs in Orangutan Chromosome 13 Assembly: Discovery of the 59mer HOR - The largest Unit in Primates - And the Missing Triplet 45/27/18 HOR in Human T2T-CHM13v2.0 Assembly. Int J Mol Sci 2024; 25:7596. [PMID: 39062839 PMCID: PMC11276891 DOI: 10.3390/ijms25147596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Revised: 07/05/2024] [Accepted: 07/09/2024] [Indexed: 07/28/2024]
Abstract
From the recent genome assembly NHGRI_mPonAbe1-v2.0_NCBI (GCF_028885655.2) of orangutan chromosome 13, we computed the precise alpha satellite higher-order repeat (HOR) structure using the novel high-precision GRM2023 algorithm with Global Repeat Map (GRM) and Monomer Distance (MD) diagrams. This study rigorously identified alpha satellite HORs in the centromere of orangutan chromosome 13, discovering a novel 59mer HOR, the longest HOR unit identified in any primate to date. Additionally, it revealed the first intertwined sequence of three HORs, 18mer/27mer/45mer HORs, with a common aligned "backbone" across all HOR copies. The major 7mer HOR exhibits a Willard's-type canonical copy, although some segments of the array display significant irregularities. In contrast, the 14mer HOR forms a regular Willard's-type HOR array. Surprisingly, the GRM2023 high-precision analysis of chromosome 13 of human genome assembly T2T-CHM13v2.0 reveals the presence of only a 7mer HOR, despite both the orangutan and human genome assemblies being derived from whole genome shotgun sequences.
Affiliation(s)
- Matko Glunčić
  - Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
- Ines Vlahović
  - Department of Interdisciplinary Sciences, Algebra University College, 10000 Zagreb, Croatia
- Marija Rosandić
  - University Hospital Centre Zagreb (Ret.), 10000 Zagreb, Croatia
  - Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
- Vladimir Paar
  - Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
  - Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
2
Glunčić M, Vlahović I, Rosandić M, Paar V. Precise identification of cascading alpha satellite higher order repeats in T2T-CHM13 assembly of human chromosome 3. Croat Med J 2024; 65:209-219. [PMID: 38868967 PMCID: PMC11157248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Accepted: 05/28/2024] [Indexed: 06/14/2024]
Abstract
AIM: To precisely identify and analyze alpha-satellite higher-order repeats (HORs) in T2T-CHM13 assembly of human chromosome 3. METHODS: From the recently sequenced complete T2T-CHM13 assembly of human chromosome 3, the precise alpha satellite HOR structure was computed by using the novel high-precision GRM2023 algorithm with global repeat map (GRM) and monomer distance (MD) diagrams. RESULTS: The major alpha satellite HOR array in chromosome 3 revealed a novel cascading HOR, housing 17mer HOR copies with subfragments of periods 15 and 2. Within each row in the cascading HOR, the monomers were of different types, but different rows within the same cascading 17mer HOR contained more than one monomer of the same type. Each canonical 17mer HOR copy comprised 17 monomers belonging to 16 different monomer types. Another pronounced 10mer HOR array was of the regular Willard's type. CONCLUSION: Our findings emphasize the complexity within the chromosome 3 centromere as well as deviations from expected highly regular patterns.
Affiliation(s)
- Matko Glunčić
  - Department of Physics, Faculty of Science, University of Zagreb, Bijenička cesta 32, 10000 Zagreb, Croatia
3
Logsdon GA, Rozanski AN, Ryabov F, Potapova T, Shepelev VA, Catacchio CR, Porubsky D, Mao Y, Yoo D, Rautiainen M, Koren S, Nurk S, Lucas JK, Hoekzema K, Munson KM, Gerton JL, Phillippy AM, Ventura M, Alexandrov IA, Eichler EE. The variation and evolution of complete human centromeres. Nature 2024; 629:136-145. [PMID: 38570684 PMCID: PMC11062924 DOI: 10.1038/s41586-024-07278-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 03/07/2024] [Indexed: 04/05/2024]
Abstract
Human centromeres have been traditionally very difficult to sequence and assemble owing to their repetitive nature and large size [1]. As a result, patterns of human centromeric variation and models for their evolution and function remain incomplete, despite centromeres being among the most rapidly mutating regions [2,3]. Here, using long-read sequencing, we completely sequenced and assembled all centromeres from a second human genome and compared it to the finished reference genome [4,5]. We find that the two sets of centromeres show at least a 4.1-fold increase in single-nucleotide variation when compared with their unique flanks and vary up to 3-fold in size. Moreover, we find that 45.8% of centromeric sequence cannot be reliably aligned using standard methods owing to the emergence of new α-satellite higher-order repeats (HORs). DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by >500 kb. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan and macaque genomes. Comparative analyses reveal a nearly complete turnover of α-satellite HORs, with characteristic idiosyncratic changes in α-satellite HORs for each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the short (p) and long (q) arms across centromeres and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.
Affiliation(s)
- Glennis A Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Allison N Rozanski
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Fedor Ryabov
  - Masters Program in National Research University Higher School of Economics, Moscow, Russia
- Tamara Potapova
  - Stowers Institute for Medical Research, Kansas City, MO, USA
- Claudia R Catacchio
  - Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Bari, Italy
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Yafei Mao
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- DongAhn Yoo
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Mikko Rautiainen
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Nurk
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - Oxford Nanopore Technologies, Oxford, United Kingdom
- Julian K Lucas
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Adam M Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Mario Ventura
  - Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, Bari, Italy
- Ivan A Alexandrov
  - Department of Human Molecular Genetics and Biochemistry, Tel Aviv University, Tel Aviv, Israel
  - Department of Anatomy and Anthropology, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Dan David Center for Human Evolution and Biohistory Research, Tel Aviv University, Tel Aviv, Israel
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
4
Glunčić M, Vlahović I, Rosandić M, Paar V. Novel Concept of Alpha Satellite Cascading Higher-Order Repeats (HORs) and Precise Identification of 15mer and 20mer Cascading HORs in Complete T2T-CHM13 Assembly of Human Chromosome 15. Int J Mol Sci 2024; 25:4395. [PMID: 38673983 PMCID: PMC11050224 DOI: 10.3390/ijms25084395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 04/08/2024] [Accepted: 04/11/2024] [Indexed: 04/28/2024]
Abstract
Unraveling the intricate centromere structure of human chromosomes holds profound implications, illuminating fundamental genetic mechanisms and potentially advancing our comprehension of genetic disorders and therapeutic interventions. This study rigorously identified and structurally analyzed alpha satellite higher-order repeats (HORs) within the centromere of human chromosome 15 in the complete T2T-CHM13 assembly using the high-precision GRM2023 algorithm. The most extensive alpha satellite HOR array in chromosome 15 reveals a novel cascading HOR, housing 429 15mer HOR copies, containing 4-, 7- and 11-monomer subfragments. Within each row of cascading HORs, all alpha satellite monomers are of distinct types, as in regular Willard's HORs. However, different HOR copies within the same cascading 15mer HOR contain more than one monomer of the same type. Each canonical 15mer HOR copy comprises 15 monomers belonging to only 9 different monomer types. Notably, 65% of the 429 15mer cascading HOR copies exhibit canonical structures, while 35% display variant configurations. Identified as the second most extensive alpha satellite HOR, another novel cascading HOR within human chromosome 15 encompasses 164 20mer HOR copies, each featuring two subfragments. Moreover, a distinct pattern emerges as interspersed 25mer/26mer structures differing from regular Willard's HORs and giving rise to a 34-monomer subfragment. Only a minor 18mer HOR array of 12 HOR copies is of the regular Willard's type. These revelations highlight the complexity within the chromosome 15 centromeric region, accentuating deviations from anticipated highly regular patterns and hinting at profound information encoding and functional potential within the human centromere.
Affiliation(s)
- Matko Glunčić
  - Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
- Ines Vlahović
  - Algebra LAB, Algebra University College, 10000 Zagreb, Croatia
- Marija Rosandić
  - Department of Internal Medicine, University Hospital Centre Zagreb, 10000 Zagreb, Croatia
  - Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
- Vladimir Paar
  - Faculty of Science, University of Zagreb, 10000 Zagreb, Croatia
  - Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
5
Gambogi CW, Pandey N, Dawicki-McKenna JM, Arora UP, Liskovykh MA, Ma J, Lamelza P, Larionov V, Lampson MA, Logsdon GA, Dumont BL, Black BE. Centromere innovations within a mouse species. Sci Adv 2023; 9:eadi5764. [PMID: 37967185 PMCID: PMC10651114 DOI: 10.1126/sciadv.adi5764] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/04/2023] [Accepted: 10/13/2023] [Indexed: 11/17/2023]
Abstract
Mammalian centromeres direct faithful genetic inheritance and are typically characterized by regions of highly repetitive and rapidly evolving DNA. We focused on a mouse species, Mus pahari, that we found has evolved to house centromere-specifying centromere protein-A (CENP-A) nucleosomes at the nexus of a satellite repeat that we identified and termed π-satellite (π-sat), a small number of recruitment sites for CENP-B, and short stretches of perfect telomere repeats. One M. pahari chromosome, however, houses a radically divergent centromere harboring ~6 mega-base pairs of a homogenized π-sat-related repeat, π-satB, that contains >20,000 functional CENP-B boxes. There, CENP-B abundance promotes accumulation of microtubule-binding components of the kinetochore and a microtubule-destabilizing kinesin of the inner centromere. We propose that the balance of pro- and anti-microtubule binding by the new centromere is what permits it to segregate during cell division with high fidelity alongside the older ones whose sequence creates a markedly different molecular composition.
Affiliation(s)
- Craig W. Gambogi
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
  - Penn Center for Genome Integrity, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Epigenetics Institute, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Biochemistry and Molecular Biophysics Graduate Group, University of Pennsylvania, Philadelphia, PA 19104, USA
- Nootan Pandey
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
  - Penn Center for Genome Integrity, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Epigenetics Institute, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jennine M. Dawicki-McKenna
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
  - Penn Center for Genome Integrity, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Epigenetics Institute, University of Pennsylvania, Philadelphia, PA 19104, USA
- Uma P. Arora
  - The Jackson Laboratory, Bar Harbor, ME 04609, USA
  - Graduate School of Biomedical Sciences, Tufts University, Boston, MA 02111, USA
- Mikhail A. Liskovykh
  - Developmental Therapeutics Branch, National Cancer Institute, Bethesda, MD 20892, USA
- Jun Ma
  - Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Piero Lamelza
  - Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Vladimir Larionov
  - Developmental Therapeutics Branch, National Cancer Institute, Bethesda, MD 20892, USA
- Michael A. Lampson
  - Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Glennis A. Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Beth L. Dumont
  - The Jackson Laboratory, Bar Harbor, ME 04609, USA
  - Graduate School of Biomedical Sciences, Tufts University, Boston, MA 02111, USA
  - Graduate School of Biomedical Science and Engineering, University of Maine, Orono, ME 04469, USA
- Ben E. Black
  - Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
  - Penn Center for Genome Integrity, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Epigenetics Institute, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Biochemistry and Molecular Biophysics Graduate Group, University of Pennsylvania, Philadelphia, PA 19104, USA
6
Bzikadze AV, Pevzner PA. UniAligner: a parameter-free framework for fast sequence alignment. Nat Methods 2023; 20:1346-1354. [PMID: 37580559 DOI: 10.1038/s41592-023-01970-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 07/05/2023] [Indexed: 08/16/2023]
Abstract
Even though the recent advances in 'complete genomics' revealed the previously inaccessible genomic regions, analysis of variations in centromeres and other extra-long tandem repeats (ETRs) faces an algorithmic challenge since there are currently no tools for accurate sequence comparison of ETRs. Counterintuitively, the classical alignment approaches, such as the Smith-Waterman algorithm, fail to construct biologically adequate alignments of ETRs. We present UniAligner, the parameter-free sequence alignment algorithm with sequence-dependent alignment scoring that automatically changes for any pair of compared sequences. UniAligner prioritizes matches of rare substrings that are more likely to be relevant to the evolutionary relationship between two sequences. We apply UniAligner to estimate the mutation rates in human centromeres, and quantify the extremely high rate of large duplications and deletions in centromeres. This high rate suggests that centromeres may represent some of the most rapidly evolving regions of the human genome with respect to their structural organization.
Affiliation(s)
- Andrey V Bzikadze
  - Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, La Jolla, CA, USA
- Pavel A Pevzner
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
7
Logsdon GA, Rozanski AN, Ryabov F, Potapova T, Shepelev VA, Mao Y, Rautiainen M, Koren S, Nurk S, Porubsky D, Lucas JK, Hoekzema K, Munson KM, Gerton JL, Phillippy AM, Alexandrov IA, Eichler EE. The variation and evolution of complete human centromeres. bioRxiv 2023:2023.05.30.542849. [PMID: 37398417 PMCID: PMC10312506 DOI: 10.1101/2023.05.30.542849] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 07/04/2023]
Abstract
We completely sequenced and assembled all centromeres from a second human genome and used two reference sets to benchmark genetic, epigenetic, and evolutionary variation within centromeres from a diversity panel of humans and apes. We find that centromere single-nucleotide variation can increase by up to 4.1-fold relative to other genomic regions, with the caveat that up to 45.8% of centromeric sequence, on average, cannot be reliably aligned with current methods due to the emergence of new α-satellite higher-order repeat (HOR) structures and two to threefold differences in the length of the centromeres. The extent to which this occurs differs depending on the chromosome and haplotype. Comparing the two sets of complete human centromeres, we find that eight harbor distinctly different α-satellite HOR array structures and four contain novel α-satellite HOR variants in high abundance. DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by at least 500 kbp, a property not readily associated with novel α-satellite HORs. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan, and macaque genomes. Comparative analyses reveal nearly complete turnover of α-satellite HORs, but with idiosyncratic changes in structure characteristic to each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the p- and q-arms of human chromosomes and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.
Affiliation(s)
- Glennis A. Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Allison N. Rozanski
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Fedor Ryabov
  - Masters Program in National Research University Higher School of Economics, Moscow, Russia
- Tamara Potapova
  - Stowers Institute for Medical Research, Kansas City, MO, USA
- Yafei Mao
  - Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
- Mikko Rautiainen
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Nurk
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Julian K. Lucas
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Katherine M. Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Adam M. Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Ivan A. Alexandrov
  - Department of Human Molecular Genetics and Biochemistry, Tel Aviv University, Tel Aviv, Israel
  - Department of Anatomy and Anthropology, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Dan David Center for Human Evolution and Biohistory Research, Tel Aviv University, Tel Aviv, Israel
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
8
Logsdon GA, Eichler EE. The Dynamic Structure and Rapid Evolution of Human Centromeric Satellite DNA. Genes (Basel) 2022; 14:92. [PMID: 36672831 PMCID: PMC9859433 DOI: 10.3390/genes14010092] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 12/15/2022] [Revised: 12/22/2022] [Accepted: 12/24/2022] [Indexed: 12/31/2022]
Abstract
The complete sequence of a human genome provided our first comprehensive view of the organization of satellite DNA associated with heterochromatin. We review how our understanding of the genetic architecture and epigenetic properties of human centromeric DNA has advanced as a result. Preliminary studies of human and nonhuman ape centromeres reveal complex, saltatory mutational changes organized around distinct evolutionary layers. Pockets of regional hypomethylation within higher-order α-satellite DNA, termed centromere dip regions, appear to define the site of kinetochore attachment in all human chromosomes, although such epigenetic features can vary even within the same chromosome. Sequence resolution of satellite DNA is providing new insights into centromeric function with potential implications for improving our understanding of human biology and health.
Affiliation(s)
- Glennis A. Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
9
Kunyavskaya O, Dvorkina T, Bzikadze AV, Alexandrov I, Pevzner PA. Automated annotation of human centromeres with HORmon. Genome Res 2022; 32:1137-1151. [PMID: 35545449 DOI: 10.1101/gr.276362.121] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/03/2021] [Accepted: 05/06/2022] [Indexed: 11/24/2022]
Abstract
Recent advances in long-read sequencing opened a possibility to address the long-standing questions about the architecture and evolution of human centromeres. They also emphasized the need for centromere annotation (partitioning human centromeres into monomers and higher-order repeats (HORs)). Even though there was a half-century-long series of semi-manual studies of centromere architecture, a rigorous centromere annotation algorithm is still lacking. Moreover, an automated centromere annotation is a prerequisite for studies of genetic diseases associated with centromeres, and evolutionary studies of centromeres across multiple species. Although the monomer decomposition (transforming a centromere into a monocentromere written in the monomer alphabet) and the HOR decomposition (representing a monocentromere in the alphabet of HORs) are currently viewed as two separate problems, we demonstrate that they should be integrated into a single framework in such a way that HOR (monomer) inference affects monomer (HOR) inference. We thus developed the HORmon algorithm that integrates the monomer/HOR inference and automatically generates the human monomers/HORs that are largely consistent with the previous semi-manual inference.
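The two decompositions named in this abstract can be illustrated with a toy sketch. This is not the HORmon algorithm: it assumes the centromere has already been written in a monomer alphabet (single-letter labels invented here), that one canonical HOR is known in advance, and that copies match exactly. The function name `hor_decompose` and the `"HOR"` token are hypothetical.

```python
from typing import List, Tuple


def hor_decompose(monomers: List[str], hor: List[str]) -> List[Tuple[str, int]]:
    """Greedily rewrite a monomer-alphabet sequence in a HOR alphabet.

    Consecutive exact copies of the canonical HOR are merged into one
    ("HOR", n) token; monomers outside a full copy are kept as (label, 1).
    """
    if not hor:
        raise ValueError("canonical HOR must contain at least one monomer")
    out: List[Tuple[str, int]] = []
    k = len(hor)
    i = 0
    while i < len(monomers):
        if monomers[i:i + k] == hor:
            # Extend the current run of HOR copies, or start a new run.
            if out and out[-1][0] == "HOR":
                out[-1] = ("HOR", out[-1][1] + 1)
            else:
                out.append(("HOR", 1))
            i += k
        else:
            # Monomer not starting a full canonical copy: emit it unchanged.
            out.append((monomers[i], 1))
            i += 1
    return out


# Hypothetical monocentromere: a 3mer HOR "ABC" with one variant copy "ABD".
monocentromere = list("ABCABCABDABC")
decomposition = hor_decompose(monocentromere, list("ABC"))
print(decomposition)
```

In this toy input the variant copy is left as three separate monomers, which hints at why the abstract argues that monomer and HOR inference need to inform each other rather than run as independent steps.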
Affiliation(s)
- Olga Kunyavskaya
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University
- Tatiana Dvorkina
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University
- Ivan Alexandrov
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University
10
Kermi C, Lau L, Asadi Shahmirzadi A, Classon M. Disrupting Mechanisms that Regulate Genomic Repeat Elements to Combat Cancer and Drug Resistance. Front Cell Dev Biol 2022; 10:826461. [PMID: 35602594 PMCID: PMC9114874 DOI: 10.3389/fcell.2022.826461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2021] [Accepted: 03/30/2022] [Indexed: 11/13/2022]
Abstract
Despite advancements in understanding cancer pathogenesis and the development of many effective therapeutic agents, resistance to drug treatment remains a widespread challenge that substantially limits curative outcomes. The historical focus on genetic evolution under drug "pressure" as a key driver of resistance has uncovered numerous mechanisms of therapeutic value, especially with respect to acquired resistance. However, recent discoveries have also revealed a potential role for an ancient evolutionary balance between endogenous "viral" elements in the human genome and diverse factors involved in their restriction in tumor evolution and drug resistance. It has long been appreciated that the stability of genomic repeats such as telomeres and centromeres affects tumor fitness, but recent findings suggest that de-regulation of other repetitive genome elements, including retrotransposons, might also be exploited as cancer therapy. This review aims to present an overview of these recent findings.
11
Altemose N, Logsdon GA, Bzikadze AV, Sidhwani P, Langley SA, Caldas GV, Hoyt SJ, Uralsky L, Ryabov FD, Shew CJ, Sauria MEG, Borchers M, Gershman A, Mikheenko A, Shepelev VA, Dvorkina T, Kunyavskaya O, Vollger MR, Rhie A, McCartney AM, Asri M, Lorig-Roach R, Shafin K, Aganezov S, Olson D, de Lima LG, Potapova T, Hartley GA, Haukness M, Kerpedjiev P, Gusev F, Tigyi K, Brooks S, Young A, Nurk S, Koren S, Salama SR, Paten B, Rogaev EI, Streets A, Karpen GH, Dernburg AF, Sullivan BA, Straight AF, Wheeler TJ, Gerton JL, Eichler EE, Phillippy AM, Timp W, Dennis MY, O'Neill RJ, Zook JM, Schatz MC, Pevzner PA, Diekhans M, Langley CH, Alexandrov IA, Miga KH. Complete genomic and epigenetic maps of human centromeres. Science 2022; 376:eabl4178. [PMID: 35357911 PMCID: PMC9233505 DOI: 10.1126/science.abl4178] [Citation(s) in RCA: 270] [Impact Index Per Article: 90.0] [Indexed: 12/20/2022]
Abstract
Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.
Collapse
Affiliation(s)
- Nicolas Altemose
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Glennis A. Logsdon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Andrey V. Bzikadze
  - Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
- Pragya Sidhwani
  - Department of Biochemistry, Stanford University, Stanford, CA, USA
- Sasha A. Langley
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Gina V. Caldas
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Savannah J. Hoyt
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Lev Uralsky
  - Sirius University of Science and Technology, Sochi, Russia
  - Vavilov Institute of General Genetics, Moscow, Russia
- Colin J. Shew
  - Genome Center, MIND Institute, and Department of Biochemistry and Molecular Medicine, School of Medicine, University of California, Davis, Davis, CA, USA
- Ariel Gershman
  - Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- Alla Mikheenko
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Tatiana Dvorkina
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Olga Kunyavskaya
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Mitchell R. Vollger
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Arang Rhie
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Ann M. McCartney
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Mobin Asri
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Ryan Lorig-Roach
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Kishwar Shafin
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Sergey Aganezov
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Daniel Olson
  - Department of Computer Science, University of Montana, Missoula, MT, USA
- Tamara Potapova
  - Stowers Institute for Medical Research, Kansas City, MO, USA
- Gabrielle A. Hartley
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Marina Haukness
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Fedor Gusev
  - Vavilov Institute of General Genetics, Moscow, Russia
- Kristof Tigyi
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Shelise Brooks
  - NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Alice Young
  - NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Nurk
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sofie R. Salama
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
  - Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
- Evgeny I. Rogaev
  - Sirius University of Science and Technology, Sochi, Russia
  - Vavilov Institute of General Genetics, Moscow, Russia
  - Department of Psychiatry, University of Massachusetts Medical School, Worcester, MA, USA
  - Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
- Aaron Streets
  - Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
- Gary H. Karpen
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
  - BioEngineering and BioMedical Sciences Department, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Abby F. Dernburg
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
  - Institute for Quantitative Biosciences (QB3), University of California, Berkeley, Berkeley, CA, USA
- Beth A. Sullivan
  - Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC, USA
- Travis J. Wheeler
  - Department of Computer Science, University of Montana, Missoula, MT, USA
- Jennifer L. Gerton
  - Stowers Institute for Medical Research, Kansas City, MO, USA
  - University of Kansas Medical School, Department of Biochemistry and Molecular Biology and Cancer Center, University of Kansas, Kansas City, KS, USA
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Adam M. Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Winston Timp
  - Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Megan Y. Dennis
  - Genome Center, MIND Institute, and Department of Biochemistry and Molecular Medicine, School of Medicine, University of California, Davis, Davis, CA, USA
- Rachel J. O'Neill
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Justin M. Zook
  - Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Michael C. Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Pavel A. Pevzner
  - Department of Computer Science and Engineering, University of California at San Diego, San Diego, CA, USA
- Mark Diekhans
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Charles H. Langley
  - Department of Evolution and Ecology, University of California Davis, Davis, CA, USA
- Ivan A. Alexandrov
  - Vavilov Institute of General Genetics, Moscow, Russia
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
  - Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
- Karen H. Miga
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
  - Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
12
Abstract
We are entering a new era in genomics where entire centromeric regions are accurately represented in human reference assemblies. Access to these high-resolution maps will enable new surveys of sequence and epigenetic variation in the population and offer new insight into satellite array genomics and centromere function. Here, we focus on the sequence organization and evolution of alpha satellites, which are credited as the genetic and genomic definition of human centromeres due to their interaction with inner kinetochore proteins and their importance in the development of human artificial chromosome assays. We provide an overview of alpha satellite repeat structure and array organization in the context of these high-quality reference data sets; discuss the emergence of variation-based surveys; and provide perspective on the role of this new source of genetic and epigenetic variation in the context of chromosome biology, genome instability, and human disease.
Affiliation(s)
- Karen H Miga
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California 95064, USA
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Ivan A Alexandrov
  - Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199004, Russia
  - Research Center of Biotechnology of the Russian Academy of Sciences, Moscow 119071, Russia
13
Valeri MP, Dias GB, do Espírito Santo AA, Moreira CN, Yonenaga-Yassuda Y, Sommer IB, Kuhn GCS, Svartman M. First Description of a Satellite DNA in Manatees' Centromeric Regions. Front Genet 2021; 12:694866. [PMID: 34504514 PMCID: PMC8421680 DOI: 10.3389/fgene.2021.694866] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/13/2021] [Accepted: 07/30/2021] [Indexed: 11/18/2022] Open
Abstract
Trichechus manatus and Trichechus inunguis are the two Sirenia species that occur in the Americas. Despite their increasing extinction risk, many aspects of their biology remain understudied, including the repetitive DNA fraction of their genomes. Here we used the sequenced genome of T. manatus and TAREAN to identify satellite DNAs (satDNAs) in this species. We report the first description of TMAsat, a satDNA comprising ~0.87% of the genome, with ~684bp monomers and centromeric localization. In T. inunguis, TMAsat showed similar monomer length, chromosome localization and conserved CENP-B box-like motifs as in T. manatus. We also detected this satDNA in the Dugong dugon and in the now extinct Hydrodamalis gigas genomes. The neighbor-joining tree shows that TMAsat sequences from T. manatus, T. inunguis, D. dugon, and H. gigas lack species-specific clusters, which disagrees with the predictions of concerted evolution. We detected a divergent TMAsat-like homologous sequence in elephants and hyraxes, but not in other mammals, suggesting this sequence was already present in the common ancestor of Paenungulata, and later became a satDNA in the Sirenians. This is the first description of a centromeric satDNA in manatees and will facilitate the inclusion of Sirenia in future studies of centromeres and satDNA biology.
Affiliation(s)
- Mirela Pelizaro Valeri
  - Laboratório de Citogenômica Evolutiva, Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Guilherme Borges Dias
  - Department of Genetics and Institute of Bioinformatics, University of Georgia, Athens, GA, United States
- Alice Alves do Espírito Santo
  - Laboratório de Citogenômica Evolutiva, Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Camila Nascimento Moreira
  - Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil
- Yatiyo Yonenaga-Yassuda
  - Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil
- Iara Braga Sommer
  - Centro Nacional de Pesquisa e Conservação da Biodiversidade Marinha do Nordeste, Instituto Chico Mendes de Conservação da Biodiversidade, Brasília, Brazil
- Gustavo C. S. Kuhn
  - Laboratório de Citogenômica Evolutiva, Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Marta Svartman
  - Laboratório de Citogenômica Evolutiva, Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
14
Dvorkina T, Kunyavskaya O, Bzikadze AV, Alexandrov I, Pevzner PA. CentromereArchitect: inference and analysis of the architecture of centromeres. Bioinformatics 2021; 37:i196-i204. [PMID: 34252949 PMCID: PMC8336445 DOI: 10.1093/bioinformatics/btab265] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/16/2022] Open
Abstract
Motivation: Recent advances in long-read sequencing technologies led to rapid progress in centromere assembly in the last year and, for the first time, opened a possibility to address the long-standing questions about the architecture and evolution of human centromeres. However, since these advances have not been yet accompanied by the development of the centromere-specific bioinformatics algorithms, even the fundamental questions (e.g. centromere annotation by deriving the complete set of human monomers and high-order repeats), let alone more complex questions (e.g. explaining how monomers and high-order repeats evolved) about human centromeres remain open. Moreover, even though there was a four-decade-long series of studies aimed at cataloging all human monomers and high-order repeats, the rigorous algorithmic definitions of these concepts are still lacking. Thus, the development of a centromere annotation tool is a prerequisite for follow-up personalized biomedical studies of centromeres across the human population and evolutionary studies of centromeres across various species.
Results: We describe the CentromereArchitect, the first tool for the centromere annotation in a newly sequenced genome, apply it to the recently generated complete assembly of a human genome by the Telomere-to-Telomere consortium, generate the complete set of human monomers and high-order repeats for ‘live’ centromeres, and reveal a vast set of hybrid monomers that may represent the focal points of centromere evolution.
Availability and implementation: CentromereArchitect is publicly available on https://github.com/ablab/stringdecomposer/tree/ismb2021
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tatiana Dvorkina
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
- Olga Kunyavskaya
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
- Andrey V Bzikadze
  - Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA 92093, USA
- Ivan Alexandrov
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
- Pavel A Pevzner
  - Department of Computer Science and Engineering, University of California, San Diego, CA 92093, USA
15
Tunjić-Cvitanić M, Pasantes JJ, García-Souto D, Cvitanić T, Plohl M, Šatović-Vukšić E. Satellitome Analysis of the Pacific Oyster Crassostrea gigas Reveals New Pattern of Satellite DNA Organization, Highly Scattered across the Genome. Int J Mol Sci 2021; 22:6798. [PMID: 34202698 PMCID: PMC8268682 DOI: 10.3390/ijms22136798] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/09/2021] [Revised: 06/18/2021] [Accepted: 06/19/2021] [Indexed: 12/22/2022] Open
Abstract
Several features already qualified the invasive bivalve species Crassostrea gigas as a valuable non-standard model organism in genome research. C. gigas is characterized by the low contribution of satellite DNAs (satDNAs) vs. mobile elements and has an extremely low amount of heterochromatin, predominantly built of DNA transposons. In this work, we have identified 52 satDNAs composing the satellitome of C. gigas and constituting about 6.33% of the genome. Satellitome analysis reveals unusual, highly scattered organization of relatively short satDNA arrays across the whole genome. However, peculiar chromosomal distribution and densities are specific for each satDNA. The inspection of the organizational forms of the 11 most abundant satDNAs shows association with constitutive parts of Helitron mobile elements. Nine of the inspected satDNAs are dominantly found in mobile element-associated form, two mostly appear standalone, and only one is present exclusively as Helitron-associated sequence. The Helitron-related satDNAs appear in more chromosomes than other satDNAs, indicating that these mobile elements could be leading satDNA propagation in C. gigas. No significant accumulation of satDNAs on certain chromosomal positions was detected in C. gigas, thus establishing a novel pattern of satDNA organization on the genome level.
Affiliation(s)
- Monika Tunjić-Cvitanić
  - Division of Molecular Biology, Ruđer Bošković Institute, 10000 Zagreb, Croatia
- Juan J. Pasantes
  - Centro de Investigación Mariña, Universidade de Vigo, Dpto de Bioquímica, Xenética e Inmunoloxía, 36310 Vigo, Spain
- Daniel García-Souto
  - Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, 15706 Santiago de Compostela, Spain
  - Department of Zoology, Genetics and Physical Anthropology, Universidade de Santiago de Compostela, 15706 Santiago de Compostela, Spain
- Tonči Cvitanić
  - Rimac Automobili d.o.o., Ljubljanska ulica 7, 10431 Sveta Nedelja, Croatia
- Miroslav Plohl
  - Division of Molecular Biology, Ruđer Bošković Institute, 10000 Zagreb, Croatia
- Eva Šatović-Vukšić
  - Division of Molecular Biology, Ruđer Bošković Institute, 10000 Zagreb, Croatia
16
Hartley GA, Okhovat M, O'Neill RJ, Carbone L. Comparative analyses of gibbon centromeres reveal dynamic genus specific shifts in repeat composition. Mol Biol Evol 2021; 38:3972-3992. [PMID: 33983366 PMCID: PMC8382927 DOI: 10.1093/molbev/msab148] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/12/2022] Open
Abstract
Centromeres are functionally conserved chromosomal loci essential for proper chromosome segregation during cell division, yet they show high sequence diversity across species. Despite their variation, a near universal feature of centromeres is the presence of repetitive sequences, such as DNA satellites and transposable elements (TEs). Because of their rapidly evolving karyotypes, gibbons represent a compelling model to investigate divergence of functional centromere sequences across short evolutionary timescales. In this study, we use ChIP-seq, RNA-seq, and fluorescence in situ hybridization to comprehensively investigate the centromeric repeat content of the four extant gibbon genera (Hoolock, Hylobates, Nomascus, and Siamang). In all gibbon genera, we find that CENP-A nucleosomes and the DNA-proteins that interface with the inner kinetochore preferentially bind retroelements of broad classes rather than satellite DNA. A previously identified gibbon-specific composite retrotransposon, LAVA, known to be expanded within the centromere regions of one gibbon genus (Hoolock), displays centromere- and species-specific sequence differences, potentially as a result of its co-option to a centromeric function. When dissecting centromere satellite composition, we discovered the presence of the retroelement-derived macrosatellite SST1 in multiple centromeres of Hoolock, whereas alpha-satellites represent the predominate satellite in the other genera, further suggesting an independent evolutionary trajectory for Hoolock centromeres. Finally, using de novo assembly of centromere sequences, we determined that transcripts originating from gibbon centromeres recapitulate the species-specific TE composition. Combined, our data reveal dynamic shifts in the repeat content that define gibbon centromeres and coincide with the extensive karyotypic diversity within this lineage.
Affiliation(s)
- Gabrielle A Hartley
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, 06269
- Mariam Okhovat
  - Department of Medicine, Knight Cardiovascular Institute, Oregon Health and Science University, Portland, OR, 97239
- Rachel J O'Neill
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, 06269
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT, 06269
  - Department of Genomics and Genome Sciences, UConn Health, Farmington, CT, 06030
- Lucia Carbone
  - Department of Medicine, Knight Cardiovascular Institute, Oregon Health and Science University, Portland, OR, 97239
  - Division of Genetics, Oregon National Primate Research Center, Beaverton, OR, 97006
  - Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, 97239
  - Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR, 97239
17
Dvorkina T, Bzikadze AV, Pevzner PA. The string decomposition problem and its applications to centromere analysis and assembly. Bioinformatics 2020; 36:i93-i101. [PMID: 32657390 PMCID: PMC7428072 DOI: 10.1093/bioinformatics/btaa454] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/28/2022] Open
Abstract
Motivation: Recent attempts to assemble extra-long tandem repeats (such as centromeres) faced the challenge of translating long error-prone reads from the nucleotide alphabet into the alphabet of repeat units. Human centromeres represent a particularly complex type of high-order repeats (HORs) formed by chromosome-specific monomers. Given a set of all human monomers, translating a read from a centromere into the monomer alphabet is modeled as the String Decomposition Problem. The accurate translation of reads into the monomer alphabet turns the notoriously difficult problem of assembling centromeres from reads (in the nucleotide alphabet) into a more tractable problem of assembling centromeres from translated reads.
Results: We describe a StringDecomposer (SD) algorithm for solving this problem, benchmark it on the set of long error-prone Oxford Nanopore reads generated by the Telomere-to-Telomere consortium and identify a novel (rare) monomer that extends the set of known X-chromosome specific monomers. Our identification of a novel monomer emphasizes the importance of identification of all (even rare) monomers for future centromere assembly efforts and evolutionary studies. To further analyze novel monomers, we applied SD to the set of recently generated long accurate Pacific Biosciences HiFi reads. This analysis revealed that the set of known human monomers and HORs remains incomplete. SD opens a possibility to generate a complete set of human monomers and HORs for using in the ongoing efforts to generate the complete assembly of the human genome.
Availability and implementation: StringDecomposer is publicly available on https://github.com/ablab/stringdecomposer.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tatiana Dvorkina
  - Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg 199034, Russia
- Andrey V Bzikadze
  - Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA 92093, USA
- Pavel A Pevzner
  - Department of Computer Science and Engineering, University of California, San Diego, CA 92093, USA
18
The structure, function and evolution of a complete human chromosome 8. Nature 2021; 593:101-107. [PMID: 33828295 PMCID: PMC8099727 DOI: 10.1038/s41586-021-03420-7] [Citation(s) in RCA: 204] [Impact Index Per Article: 51.0] [Received: 09/04/2020] [Accepted: 03/04/2021] [Indexed: 02/07/2023]
Abstract
The complete assembly of each human chromosome is essential for understanding human biology and evolution [1,2]. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.
19
Ahmad SF, Singchat W, Jehangir M, Suntronpong A, Panthum T, Malaivijitnond S, Srikulnath K. Dark Matter of Primate Genomes: Satellite DNA Repeats and Their Evolutionary Dynamics. Cells 2020; 9:E2714. [PMID: 33352976 PMCID: PMC7767330 DOI: 10.3390/cells9122714] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 10/27/2020] [Revised: 12/15/2020] [Accepted: 12/16/2020] [Indexed: 12/12/2022] Open
Abstract
A substantial portion of the primate genome is composed of non-coding regions, so-called "dark matter", which includes an abundance of tandemly repeated sequences called satellite DNA. Collectively known as the satellitome, this genomic component offers exciting evolutionary insights into aspects of primate genome biology that raise new questions and challenge existing paradigms. A complete human reference genome was recently reported with telomere-to-telomere human X chromosome assembly that resolved hundreds of dark regions, encompassing a 3.1 Mb centromeric satellite array that had not been identified previously. With the recent exponential increase in the availability of primate genomes, and the development of modern genomic and bioinformatics tools, extensive growth in our knowledge concerning the structure, function, and evolution of satellite elements is expected. The current state of knowledge on this topic is summarized, highlighting various types of primate-specific satellite repeats to compare their proportions across diverse lineages. Inter- and intraspecific variation of satellite repeats in the primate genome are reviewed. The functional significance of these sequences is discussed by describing how the transcriptional activity of satellite repeats can affect gene expression during different cellular processes. Sex-linked satellites are outlined, together with their respective genomic organization. Mechanisms are proposed whereby satellite repeats might have emerged as novel sequences during different evolutionary phases. Finally, the main challenges that hinder the detection of satellite DNA are outlined and an overview of the latest methodologies to address technological limitations is presented.
Affiliation(s)
- Syed Farhan Ahmad
  - Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
  - Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
- Worapong Singchat
  - Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
  - Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
- Maryam Jehangir
  - Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
  - Department of Structural and Functional Biology, Institute of Bioscience at Botucatu, São Paulo State University (UNESP), Botucatu, São Paulo 18618-689, Brazil
- Aorarat Suntronpong
  - Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
  - Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
- Thitipong Panthum
  - Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
  - Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
- Suchinda Malaivijitnond
  - National Primate Research Center of Thailand, Chulalongkorn University, Saraburi 18110, Thailand
  - Department of Biology, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
- Kornsorn Srikulnath
  - Laboratory of Animal Cytogenetics and Comparative Genomics (ACCG), Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
  - Special Research Unit for Wildlife Genomics (SRUWG), Department of Forest Biology, Faculty of Forestry, Kasetsart University, Bangkok 10900, Thailand
  - National Primate Research Center of Thailand, Chulalongkorn University, Saraburi 18110, Thailand
  - Center of Excellence on Agricultural Biotechnology (AG-BIO/PERDO-CHE), Bangkok 10900, Thailand
  - Omics Center for Agriculture, Bioresources, Food and Health, Kasetsart University (OmiKU), Bangkok 10900, Thailand
20
de Lima LG, Hanlon SL, Gerton JL. Origins and Evolutionary Patterns of the 1.688 Satellite DNA Family in Drosophila Phylogeny. G3 (Bethesda) 2020; 10:4129-4146. [PMID: 32934018 PMCID: PMC7642928 DOI: 10.1534/g3.120.401727] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 06/06/2020] [Accepted: 09/09/2020] [Indexed: 12/11/2022]
Abstract
Satellite DNAs (satDNAs) are a ubiquitous feature of eukaryotic genomes and are usually the major components of constitutive heterochromatin. The 1.688 satDNA, also known as the 359 bp satellite, is one of the most abundant repetitive sequences in Drosophila melanogaster and has been linked to several different biological functions. We investigated the presence and evolution of the 1.688 satDNA in 16 Drosophila genomes. We find that the 1.688 satDNA family is much more ancient than previously appreciated, being shared among part of the melanogaster group that diverged from a common ancestor ∼27 Mya. We found that the 1.688 satDNA family has two major subfamilies spread throughout Drosophila phylogeny (∼360 bp and ∼190 bp). Phylogenetic analysis of ∼10,000 repeats extracted from 14 of the species revealed that the 1.688 satDNA family is present within heterochromatin and euchromatin. A high number of euchromatic repeats are gene proximal, suggesting the potential for local gene regulation. Notably, heterochromatic copies display concerted evolution and a species-specific pattern, whereas euchromatic repeats display a more typical evolutionary pattern, suggesting that chromatin domains may influence the evolution of these sequences. Overall, our data indicate the 1.688 satDNA as the most perduring satDNA family described in Drosophila phylogeny to date. Our study provides a strong foundation for future work on the functional roles of 1.688 satDNA across many Drosophila species.
Affiliation(s)
| | - Stacey L Hanlon
- Stowers Institute for Medical Research, Kansas City, Missouri 64110
| | | |
|
21
|
Bzikadze AV, Pevzner PA. Automated assembly of centromeres from ultra-long error-prone reads. Nat Biotechnol 2020; 38:1309-1316. [PMID: 32665660 PMCID: PMC10718184 DOI: 10.1038/s41587-020-0582-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 08/17/2019] [Accepted: 05/29/2020] [Indexed: 12/12/2022]
Abstract
Centromeric variation has been linked to cancer and infertility, but centromere sequences contain multiple tandem repeats and can only be assembled manually from long error-prone reads. Here we describe the centroFlye algorithm for centromere assembly using long error-prone reads, and apply it to assemble human centromeres on chromosomes 6 and X. Our analyses reveal putative breakpoints in the manual reconstruction of the human X centromere, demonstrate that human X chromosome is partitioned into repeat subfamilies and provide initial insights into centromere evolution. We anticipate that centroFlye could be applied to automatically close remaining multimegabase gaps in the reference human genome.
Affiliation(s)
- Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
| | - Pavel A Pevzner
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
| |
|
22
|
Valeri MP, Dias GB, Moreira CN, Yonenaga-Yassuda Y, Stanyon R, Kuhn GCES, Svartman M. Characterization of Satellite DNAs in Squirrel Monkeys genus Saimiri (Cebidae, Platyrrhini). Sci Rep 2020; 10:7783. [PMID: 32385398 PMCID: PMC7210261 DOI: 10.1038/s41598-020-64620-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/03/2019] [Accepted: 04/15/2020] [Indexed: 02/01/2023] Open
Abstract
The genus Saimiri is a decades-long taxonomic and phylogenetic puzzle to which cytogenetics has contributed crucial data. All Saimiri species apparently have a diploid number of 2n = 44 but vary in the number of chromosome arms. Repetitive sequences such as satellite DNAs are potentially informative cytogenetic markers because they display high evolutionary rates. Our goal is to increase the pertinent karyological data by more fully characterizing satellite DNA sequences in the Saimiri genus. We were able to identify two abundant satellite DNAs, alpha (~340 bp) and CapA (~1,500 bp), from short-read clustering of sequencing datasets from S. boliviensis. The alpha sequences comprise about 1% and the CapA 2.2% of the S. boliviensis genome. We also mapped both satellite DNAs in S. boliviensis, S. sciureus, S. vanzolinii, and S. ustus. The alpha has high interspecific repeat homogeneity and was mapped to the centromeres of all analyzed species. CapA is associated with non-pericentromeric heterochromatin and its distribution varies among Saimiri species. We conclude that CapA genomic distribution and its pervasiveness across Platyrrhini makes it an attractive cytogenetic marker for Saimiri and other New World monkeys.
Affiliation(s)
- Mirela Pelizaro Valeri
- Laboratório de Citogenômica Evolutiva, Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
| | - Guilherme Borges Dias
- Department of Genetics and Institute of Bioinformatics, University of Georgia, Athens, GA, United States of America
| | - Camila Nascimento Moreira
- Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, SP, Brazil
| | - Yatiyo Yonenaga-Yassuda
- Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, SP, Brazil
| | - Roscoe Stanyon
- Department of Biology, University of Florence, Florence, Italy
| | - Gustavo Campos E Silva Kuhn
- Laboratório de Citogenômica Evolutiva, Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
| | - Marta Svartman
- Laboratório de Citogenômica Evolutiva, Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil.
| |
|
23
|
Schotanus K, Heitman J. Centromere deletion in Cryptococcus deuterogattii leads to neocentromere formation and chromosome fusions. eLife 2020; 9:56026. [PMID: 32310085 PMCID: PMC7188483 DOI: 10.7554/elife.56026] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 02/14/2020] [Accepted: 04/16/2020] [Indexed: 02/06/2023] Open
Abstract
The human fungal pathogen Cryptococcus deuterogattii is RNAi-deficient and lacks active transposons in its genome. C. deuterogattii has regional centromeres that contain only transposon relics. To investigate the impact of centromere loss on the C. deuterogattii genome, either centromere 9 or 10 was deleted. Deletion of either centromere resulted in neocentromere formation and interestingly, the genes covered by these neocentromeres maintained wild-type expression levels. In contrast to cen9∆ mutants, cen10∆ mutant strains exhibited growth defects and were aneuploid for chromosome 10. At an elevated growth temperature (37°C), the cen10∆ chromosome was found to have undergone fusion with another native chromosome in some isolates and this fusion restored wild-type growth. Following chromosomal fusion, the neocentromere was inactivated, and the native centromere of the fused chromosome served as the active centromere. The neocentromere formation and chromosomal fusion events observed in this study in C. deuterogattii may be similar to events that triggered genomic changes within the Cryptococcus/Kwoniella species complex and may contribute to speciation throughout the eukaryotic domain.
Affiliation(s)
- Klaas Schotanus
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, United States
| | - Joseph Heitman
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, United States
| |
|
24
|
Puppo IL, Saifitdinova AF, Tonyan ZN. The Role of Satellite DNA in Causing Structural Rearrangements in Human Karyotype. Russ J Genet 2020. [DOI: 10.1134/s1022795419080155] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 02/05/2023]
|
25
|
Achrem M, Szućko I, Kalinka A. The epigenetic regulation of centromeres and telomeres in plants and animals. Comp Cytogenet 2020; 14:265-311. [PMID: 32733650 PMCID: PMC7360632 DOI: 10.3897/compcytogen.v14i2.51895] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 03/09/2020] [Accepted: 05/18/2020] [Indexed: 05/10/2023]
Abstract
The centromere is a chromosomal region where the kinetochore is formed, which is the attachment point of spindle fibers. Thus, it is responsible for the correct chromosome segregation during cell division. Telomeres protect chromosome ends against enzymatic degradation and fusions, and localize chromosomes in the cell nucleus. For this reason, centromeres and telomeres are parts of each linear chromosome that are necessary for their proper functioning. More and more research results show that the identity and functions of these chromosomal regions are epigenetically determined. Telomeres and centromeres are both usually described as highly condensed heterochromatin regions. However, the epigenetic nature of centromeres and telomeres is unique, as epigenetic modifications characteristic of both eu- and heterochromatin have been found in these areas. This specificity allows for the proper functioning of both regions, thereby affecting chromosome homeostasis. This review focuses on demonstrating the role of epigenetic mechanisms in the functioning of centromeres and telomeres in plants and animals.
Affiliation(s)
- Magdalena Achrem
- Institute of Biology, University of Szczecin, Szczecin, Poland
- Molecular Biology and Biotechnology Center, University of Szczecin, Szczecin, Poland
| | - Izabela Szućko
- Institute of Biology, University of Szczecin, Szczecin, Poland
- Molecular Biology and Biotechnology Center, University of Szczecin, Szczecin, Poland
| | - Anna Kalinka
- Institute of Biology, University of Szczecin, Szczecin, Poland
- Molecular Biology and Biotechnology Center, University of Szczecin, Szczecin, Poland
| |
|
26
|
Discovery of 33mer in chromosome 21 - the largest alpha satellite higher order repeat unit among all human somatic chromosomes. Sci Rep 2019; 9:12629. [PMID: 31477765 PMCID: PMC6718397 DOI: 10.1038/s41598-019-49022-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/25/2019] [Accepted: 08/13/2019] [Indexed: 11/10/2022] Open
Abstract
The centromere is important for segregation of chromosomes during cell division in eukaryotes. Its destabilization results in chromosomal missegregation, aneuploidy, hallmarks of cancers and birth defects. In primate genomes centromeres contain tandem repeats of ~171 bp alpha satellite DNA, commonly organized into higher order repeats (HORs). In spite of crucial importance, satellites have been understudied because of gaps in sequencing - genomic “black holes”. Bioinformatical studies of genomic sequences open possibilities to revolutionize understanding of repetitive DNA datasets. Here, using robust (Global Repeat Map) algorithm we identified in hg38 sequence of human chromosome 21 complete ensemble of alpha satellite HORs with six long repeat units (≥20 mers), five of them novel. Novel 33mer HOR has the longest HOR unit identified so far among all somatic chromosomes and novel 23mer reverse HOR is distant far from the centromere. Also, we discovered that for hg38 assembly the 33mer sequences in chromosomes 21, 13, 14, and 22 are 100% identical but nearby gaps are present; that seems to require an additional more precise sequencing. Chromosome 21 is of significant interest for deciphering the molecular base of Down syndrome and of aneuploidies in general. Since the chromosome identifier probes are largely based on the detection of higher order alpha satellite repeats, distinctions between alpha satellite HORs in chromosomes 21 and 13 here identified might lead to a unique chromosome 21 probe in molecular cytogenetics, which would find utility in diagnostics. It is expected that its complete sequence analysis will have profound implications for understanding pathogenesis of diseases and development of new therapeutic approaches.
|
27
|
Smalec BM, Heider TN, Flynn BL, O'Neill RJ. A centromere satellite concomitant with extensive karyotypic diversity across the Peromyscus genus defies predictions of molecular drive. Chromosome Res 2019; 27:237-252. [PMID: 30771198 PMCID: PMC6733818 DOI: 10.1007/s10577-019-09605-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 12/12/2018] [Revised: 01/26/2019] [Accepted: 01/29/2019] [Indexed: 12/17/2022]
Abstract
A common feature of eukaryotic centromeres is the presence of large tracts of tandemly arranged repeats, known as satellite DNA. However, these centromeric repeats appear to experience rapid evolution under forces such as molecular drive and centromere drive, seemingly without consequence to the integrity of the centromere. Moreover, blocks of heterochromatin within the karyotype, including the centromere, are hotspots for chromosome rearrangements that may drive speciation events by contributing to reproductive isolation. However, the relationship between the evolution of heterochromatic sequences and the karyotypic dynamics of these regions remains largely unknown. Here, we show that a single conserved satellite DNA sequence in the order Rodentia of the genus Peromyscus localizes to recurrent sites of chromosome rearrangements and heterochromatic amplifications. Peromyscine species display several unique features of chromosome evolution compared to other Rodentia, including stable maintenance of a strict chromosome number of 48 among all known species in the absence of any detectable interchromosomal rearrangements. Rather, the diverse karyotypes of Peromyscine species are due to intrachromosomal variation in blocks of repeated DNA content. Despite wide variation in the copy number and location of repeat blocks among different species, we find that a single satellite monomer maintains a conserved sequence and homogenized tandem repeat structure, defying predictions of molecular drive. The conservation of this satellite monomer results in common, abundant, and large blocks of chromatin that are homologous among chromosomes within one species and among diverged species. Thus, such a conserved repeat may have facilitated the retention of polymorphic chromosome variants within individuals and intrachromosomal rearrangements between species, both factors that have previously been hypothesized to contribute towards the extremely wide range of ecological adaptations that this genus exhibits.
Affiliation(s)
- Brendan M Smalec
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut, 67 North Eagleville Road, Unit 3127, Storrs, CT, 06269, USA
| | - Thomas N Heider
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut, 67 North Eagleville Road, Unit 3127, Storrs, CT, 06269, USA
| | - Brianna L Flynn
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut, 67 North Eagleville Road, Unit 3127, Storrs, CT, 06269, USA
| | - Rachel J O'Neill
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut, 67 North Eagleville Road, Unit 3127, Storrs, CT, 06269, USA.
| |
|
28
|
Hirai H, Hirai Y, Udono T, Matsubayashi K, Tosi AJ, Koga A. Structural variations of subterminal satellite blocks and their source mechanisms as inferred from the meiotic configurations of chimpanzee chromosome termini. Chromosome Res 2019; 27:321-332. [PMID: 31418128 DOI: 10.1007/s10577-019-09615-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/21/2019] [Revised: 07/09/2019] [Accepted: 07/29/2019] [Indexed: 10/26/2022]
Abstract
African great apes have large constitutive heterochromatin (C-band) blocks in subtelomeric regions of the majority of their chromosomes, but humans lack these. Additionally, the chimpanzee meiotic cell division process demonstrates unique partial terminal associations in the first meiotic prophase (pachytene). These are likely formed as a result of interaction among subtelomeric C-band blocks. We thus conducted an extensive study to define the features in the subtelomeric heterochromatic regions of chimpanzee chromosomes undergoing mitotic metaphase and meiotic cell division. Molecular cytogenetic analyses with probes of both subterminal satellite DNA (a main component of C-band) and rDNA demonstrated principles of interaction among DNA arrays. The results suggest that homologous and ectopic recombination through persistent subtelomeric associations (post-bouquet association observed in 32% of spermatocytes in the pachytene stage) appears to create variability in heterochromatin patterns and simultaneously restrain subtelomeric genome polymorphisms. That is, the meeting of non-homologous chromosome termini sets the stage for ectopic pairing which, in turn, is the mechanism for generating variability and genomic dispersion of subtelomeric C-band blocks through a system of concerted evolution. Comparison between the present study and previous reports indicated that the chromosomal distribution rate of subtelomeric regions seems to have antagonistic correlation with arm numbers holding subterminal satellite blocks in humans, chimpanzees, and gorillas. That is, the increase of subterminal satellite blocks probably reduces genomic diversity in the subtelomeric regions. The acquisition vs. loss of the subtelomeric C-band blocks is postulated as the underlying engine of this chromosomal differentiation yielded by meiotic chromosomal interaction.
Affiliation(s)
- Hirohisa Hirai
- Primate Research Institute, Kyoto University, Inuyama, Aichi, 484-8506, Japan
- The Unit of Human-Nature Interlaced Life Science, Kyoto University Research Coordination Alliance, Kyoto, Japan
| | - Yuriko Hirai
- Primate Research Institute, Kyoto University, Inuyama, Aichi, 484-8506, Japan
| | - Toshifumi Udono
- Kumamoto Sanctuary, Wildlife Research Center, Kyoto University, Uto, Kumamoto, Japan
| | | | - Anthony J Tosi
- Department of Anthropology and School of Biomedical Science, Kent State University, Kent, OH, 44242, USA
| | - Akihiko Koga
- Primate Research Institute, Kyoto University, Inuyama, Aichi, 484-8506, Japan
| |
|
29
|
Black EM, Giunta S. Repetitive Fragile Sites: Centromere Satellite DNA As a Source of Genome Instability in Human Diseases. Genes (Basel) 2018; 9:E615. [PMID: 30544645 PMCID: PMC6315641 DOI: 10.3390/genes9120615] [Citation(s) in RCA: 66] [Impact Index Per Article: 9.4] [Received: 11/05/2018] [Revised: 12/03/2018] [Accepted: 12/03/2018] [Indexed: 12/31/2022] Open
Abstract
Maintenance of an intact genome is essential for cellular and organismal homeostasis. The centromere is a specialized chromosomal locus required for faithful genome inheritance at each round of cell division. Human centromeres are composed of large tandem arrays of repetitive alpha-satellite DNA, which are often sites of aberrant rearrangements that may lead to chromosome fusions and genetic abnormalities. While the centromere has an essential role in chromosome segregation during mitosis, the long and repetitive nature of the highly identical repeats has greatly hindered in-depth genetic studies, and complete annotation of all human centromeres is still lacking. Here, we review our current understanding of human centromere genetics and epigenetics as well as recent investigations into the role of centromere DNA in disease, with a special focus on cancer, aging, and human immunodeficiency-centromeric instability-facial anomalies (ICF) syndrome. We also highlight the causes and consequences of genomic instability at these large repetitive arrays and describe the possible sources of centromere fragility. The novel connection between alpha-satellite DNA instability and human pathological conditions emphasizes the importance of obtaining a truly complete human genome assembly and accelerating our understanding of centromere repeats' role in physiology and beyond.
Affiliation(s)
- Elizabeth M Black
- Laboratory of Chromosome and Cell Biology, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA.
| | - Simona Giunta
- Laboratory of Chromosome and Cell Biology, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA.
| |
|
30
|
Cacheux L, Ponger L, Gerbault-Seureau M, Loll F, Gey D, Richard FA, Escudé C. The Targeted Sequencing of Alpha Satellite DNA in Cercopithecus pogonias Provides New Insight Into the Diversity and Dynamics of Centromeric Repeats in Old World Monkeys. Genome Biol Evol 2018; 10:1837-1851. [PMID: 29860303 PMCID: PMC6061836 DOI: 10.1093/gbe/evy109] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Accepted: 05/29/2018] [Indexed: 02/06/2023] Open
Abstract
Alpha satellite is the major repeated DNA element of primate centromeres. Specific evolutionary mechanisms have led to a great diversity of sequence families with peculiar genomic organization and distribution, which have till now been studied mostly in great apes. Using high throughput sequencing of alpha satellite monomers obtained by enzymatic digestion followed by computational and cytogenetic analysis, we compare here the diversity and genomic distribution of alpha satellite DNA in two related Old World monkey species, Cercopithecus pogonias and Cercopithecus solatus, which are known to have diverged about 7 Ma. Two main families of monomers, called C1 and C2, are found in both species. A detailed analysis of our data sets revealed the existence of numerous subfamilies within the centromeric C1 family. Although the most abundant subfamily is conserved between both species, our fluorescence in situ hybridization (FISH) experiments clearly show that some subfamilies are specific for each species and that their distribution is restricted to a subset of chromosomes, thereby pointing to the existence of recurrent amplification/homogenization events. The pericentromeric C2 family is very abundant on the short arm of all acrocentric chromosomes in both species, pointing to specific mechanisms that lead to this distribution. Results obtained using two different restriction enzymes are fully consistent with a predominant monomeric organization of alpha satellite DNA that coexists with higher order organization patterns in the C. pogonias genome. Our study suggests a high dynamics of alpha satellite DNA in Cercopithecini, with recurrent apparition of new sequence variants and interchromosomal sequence transfer.
Affiliation(s)
- Lauriane Cacheux
- Département Adaptations du Vivant, Structure et Instabilité des Génomes, INSERM U1154, CNRS UMR7196, Sorbonne Universités, Muséum National d’Histoire Naturelle, Paris, France
- Département Origines et Evolution, Institut de Systématique, Evolution, Biodiversité, UMR 7205 MNHN, CNRS, UPMC, EPHE, Sorbonne Universités, Muséum National d’Histoire Naturelle, Paris, France
| | - Loïc Ponger
- Département Adaptations du Vivant, Structure et Instabilité des Génomes, INSERM U1154, CNRS UMR7196, Sorbonne Universités, Muséum National d’Histoire Naturelle, Paris, France
| | - Michèle Gerbault-Seureau
- Département Origines et Evolution, Institut de Systématique, Evolution, Biodiversité, UMR 7205 MNHN, CNRS, UPMC, EPHE, Sorbonne Universités, Muséum National d’Histoire Naturelle, Paris, France
| | - François Loll
- Département Adaptations du Vivant, Structure et Instabilité des Génomes, INSERM U1154, CNRS UMR7196, Sorbonne Universités, Muséum National d’Histoire Naturelle, Paris, France
| | - Delphine Gey
- Service de Systématique Moléculaire, UMS 2700 CNRS, Sorbonne Universités, Muséum National d’Histoire Naturelle, Paris, France
| | - Florence Anne Richard
- Département Origines et Evolution, Institut de Systématique, Evolution, Biodiversité, UMR 7205 MNHN, CNRS, UPMC, EPHE, Sorbonne Universités, Muséum National d’Histoire Naturelle, Paris, France
- Université Versailles St-Quentin, Montigny-le-Bretonneux, France
| | - Christophe Escudé
- Département Adaptations du Vivant, Structure et Instabilité des Génomes, INSERM U1154, CNRS UMR7196, Sorbonne Universités, Muséum National d’Histoire Naturelle, Paris, France
| |
|
31
|
Larsen PA, Harris RA, Liu Y, Murali SC, Campbell CR, Brown AD, Sullivan BA, Shelton J, Brown SJ, Raveendran M, Dudchenko O, Machol I, Durand NC, Shamim MS, Aiden EL, Muzny DM, Gibbs RA, Yoder AD, Rogers J, Worley KC. Hybrid de novo genome assembly and centromere characterization of the gray mouse lemur (Microcebus murinus). BMC Biol 2017; 15:110. [PMID: 29145861 PMCID: PMC5689209 DOI: 10.1186/s12915-017-0439-6] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 06/20/2017] [Accepted: 10/10/2017] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND: The de novo assembly of repeat-rich mammalian genomes using only high-throughput short read sequencing data typically results in highly fragmented genome assemblies that limit downstream applications. Here, we present an iterative approach to hybrid de novo genome assembly that incorporates datasets stemming from multiple genomic technologies and methods. We used this approach to improve the gray mouse lemur (Microcebus murinus) genome from early draft status to a near chromosome-scale assembly. METHODS: We used a combination of advanced genomic technologies to iteratively resolve conflicts and super-scaffold the M. murinus genome. RESULTS: We improved the M. murinus genome assembly to a scaffold N50 of 93.32 Mb. Whole genome alignments between our primary super-scaffolds and 23 human chromosomes revealed patterns that are congruent with historical comparative cytogenetic data, thus demonstrating the accuracy of our de novo scaffolding approach and allowing assignment of scaffolds to M. murinus chromosomes. Moreover, we utilized our independent datasets to discover and characterize sequences associated with centromeres across the mouse lemur genome. Quality assessment of the final assembly found 96% of mouse lemur canonical transcripts nearly complete, comparable to other published high-quality reference genome assemblies. CONCLUSIONS: We describe a new assembly of the gray mouse lemur (Microcebus murinus) genome with chromosome-scale scaffolds produced using a hybrid bioinformatic and sequencing approach. The approach is cost effective and produces superior results based on metrics of contiguity and completeness. Our results show that emerging genomic technologies can be used in combination to characterize centromeres of non-model species and to produce accurate de novo chromosome-scale genome assemblies of complex mammalian genomes.
Affiliation(s)
- Peter A. Larsen
- Department of Biology, Duke University, Durham, NC 27708 USA
| | - R. Alan Harris
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
| | - Yue Liu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
| | - Shwetha C. Murali
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Present address: Department of Genome Sciences, University of Washington, Seattle, WA 98195 USA
| | | | - Adam D. Brown
- Department of Pharmacology and Cancer Biology, Duke University, Durham, NC 27710 USA
- Present address: Bristol Myers-Squibb, 420 W Round Grove Rd, Lewisville, TX 75067 USA
| | - Beth A. Sullivan
- Department of Molecular Genetics and Microbiology, Duke University, Durham, NC 27710 USA
| | - Jennifer Shelton
- Kansas State University Bioinformatics Center, Division of Biology, Kansas State University, Manhattan, KS 66506 USA
- Present address: New York Genome Center, 101 Avenue of the Americas, New York, NY 10013 USA
| | - Susan J. Brown
- Kansas State University Bioinformatics Center, Division of Biology, Kansas State University, Manhattan, KS 66506 USA
| | | | - Olga Dudchenko
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- The Center for Theoretical Biological Physics, Rice University, Houston, TX 77005 USA
- Department of Computer Science, Rice University, Houston, TX 77005 USA
| | - Ido Machol
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- The Center for Theoretical Biological Physics, Rice University, Houston, TX 77005 USA
- Department of Computer Science, Rice University, Houston, TX 77005 USA
| | - Neva C. Durand
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- The Center for Theoretical Biological Physics, Rice University, Houston, TX 77005 USA
- Department of Computer Science, Rice University, Houston, TX 77005 USA
| | - Muhammad S. Shamim
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- The Center for Theoretical Biological Physics, Rice University, Houston, TX 77005 USA
- Department of Computer Science, Rice University, Houston, TX 77005 USA
| | - Erez Lieberman Aiden
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- The Center for Theoretical Biological Physics, Rice University, Houston, TX 77005 USA
- Department of Computer Science, Rice University, Houston, TX 77005 USA
| | - Donna M. Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
| | - Richard A. Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
| | - Anne D. Yoder
- Department of Biology, Duke University, Durham, NC 27708 USA
| | - Jeffrey Rogers
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
| | - Kim C. Worley
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030 USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
| |
|
32
|
Araújo NP, de Lima LG, Dias GB, Kuhn GCS, de Melo AL, Yonenaga-Yassuda Y, Stanyon R, Svartman M. Identification and characterization of a subtelomeric satellite DNA in Callitrichini monkeys. DNA Res 2017; 24:377-385. [PMID: 28854689 PMCID: PMC5737874 DOI: 10.1093/dnares/dsx010] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 09/16/2016] [Accepted: 03/02/2017] [Indexed: 02/01/2023] Open
Abstract
Repetitive DNAs are abundant fast-evolving components of eukaryotic genomes, which often possess important structural and functional roles. Despite their ubiquity, repetitive DNAs are poorly studied when compared with the genic fraction of genomes. Here, we took advantage of the availability of the sequenced genome of the common marmoset Callithrix jacchus to assess its satellite DNAs (satDNAs) and their distribution in Callitrichini. After clustering analysis of all reads and comparisons by similarity, we identified a satDNA composed by 171 bp motifs, named MarmoSAT, which composes 1.09% of the C. jacchus genome. Fluorescent in situ hybridization on chromosomes of species from the genera Callithrix, Mico and Callimico showed that MarmoSAT had a subtelomeric location. In addition to the common monomeric, we found that MarmoSAT was also organized in higher-order repeats of 338 bp in Callimico goeldii. Our phylogenetic analyses showed that MarmoSAT repeats from C. jacchus lack chromosome-specific features, suggesting exchange events among subterminal regions of non-homologous chromosomes. MarmoSAT is transcribed in several tissues of C. jacchus, with the highest transcription levels in spleen, thymus and heart. The transcription profile and subtelomeric location suggest that MarmoSAT may be involved in the regulation of telomerase and modulation of telomeric chromatin.
Affiliation(s)
- Naiara Pereira Araújo
- Universidade Federal de Minas Gerais, Laboratório de Citogenômica Evolutiva, Departamento de Biologia Geral, Instituto de Ciências Biológicas, Avenida Presidente Antônio Carlos, 6627 - Pampulha, 31270-901, Belo Horizonte, Brazil
| | - Leonardo Gomes de Lima
- Universidade Federal de Minas Gerais, Laboratório de Citogenômica Evolutiva, Departamento de Biologia Geral, Instituto de Ciências Biológicas, Avenida Presidente Antônio Carlos, 6627 - Pampulha, 31270-901, Belo Horizonte, Brazil
| | - Guilherme Borges Dias
- Universidade Federal de Minas Gerais, Laboratório de Citogenômica Evolutiva, Departamento de Biologia Geral, Instituto de Ciências Biológicas, Avenida Presidente Antônio Carlos, 6627 - Pampulha, 31270-901, Belo Horizonte, Brazil
| | - Gustavo Campos Silva Kuhn
- Universidade Federal de Minas Gerais, Laboratório de Citogenômica Evolutiva, Departamento de Biologia Geral, Instituto de Ciências Biológicas, Avenida Presidente Antônio Carlos, 6627 - Pampulha, 31270-901, Belo Horizonte, Brazil
| | - Alan Lane de Melo
- Universidade Federal de Minas Gerais, Laboratório de Taxonomia e Biologia de Invertebrados, Departamento de Parasitologia, Instituto de Ciências Biológicas, Belo Horizonte, Brazil
| | - Yatiyo Yonenaga-Yassuda
- Universidade de São Paulo, Laboratório de Citogenética de Vertebrados, Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, São Paulo, Brazil
| | - Roscoe Stanyon
- University of Florence, Department of Biology, Florence, Italy
| | - Marta Svartman
- Universidade Federal de Minas Gerais, Laboratório de Citogenômica Evolutiva, Departamento de Biologia Geral, Instituto de Ciências Biológicas, Avenida Presidente Antônio Carlos, 6627 - Pampulha, 31270-901, Belo Horizonte, Brazil
| |
|
33
|
Garrido-Ramos MA. Satellite DNA: An Evolving Topic. Genes (Basel) 2017; 8:genes8090230. [PMID: 28926993 PMCID: PMC5615363 DOI: 10.3390/genes8090230] [Citation(s) in RCA: 260] [Impact Index Per Article: 32.5] [Received: 08/03/2017] [Revised: 09/12/2017] [Accepted: 09/13/2017] [Indexed: 12/22/2022]
Abstract
Satellite DNA represents one of the most fascinating parts of the repetitive fraction of the eukaryotic genome. Since the discovery of highly repetitive tandem DNA in the 1960s, an extensive literature has covered various topics related to the structure, organization, function, and evolution of such sequences. Today, with the advent of genomic tools, the study of satellite DNA has regained great interest. Thus, Next-Generation Sequencing (NGS), together with high-throughput in silico analysis of the information contained in NGS reads, has revolutionized the analysis of the repetitive fraction of eukaryotic genomes. Taken together, the historical and current approaches to the topic give us a broad view of the function and evolution of satellite DNA and its role in chromosomal evolution. Currently, we have extensive information on the molecular, chromosomal, biological, and population factors that affect the evolutionary fate of satellite DNA, knowledge that gives rise to a series of mutually compatible hypotheses about the origin, spreading, and evolution of satellite DNA. In this paper, I review these hypotheses from a methodological, conceptual, and historical perspective and frame them in the context of chromosomal organization and evolution.
Affiliation(s)
- Manuel A Garrido-Ramos
- Departamento de Genética, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain.
| |
|
34
|
Chen Q, Tan B, He JL, Liu XQ, Chen XM, Gao RF, Zhu J, Wang YX, Qi HB. Mutational spectrum of CENP-B box and α-satellite DNA on chromosome 21 in Down syndrome children. Mol Med Rep 2017; 15:2313-2317. [PMID: 28259924 DOI: 10.3892/mmr.2017.6247] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/16/2015] [Accepted: 01/13/2017] [Indexed: 11/06/2022]
Abstract
The centromere is responsible for the correct inheritance of eukaryotic chromosomes during cell division. Centromere protein B (CENP‑B) and its 17 base pair binding site (CENP‑B box), which appears at regular intervals in centromeric α-satellite DNA (α-satDNA), are important for the assembly of the centromere components. Therefore, it is conceivable that CENP-B box mutations may induce errors in cell division. However, the association between the deoxynucleotide alterations of the CENP‑B box and the extra chromosome 21 (Chr21) present in patients with Down syndrome (DS) remains to be elucidated. The mutational spectrum of the α‑satDNA, including 4 functional CENP‑B boxes, in Chr21 from 127 DS and 100 healthy children was analyzed by direct sequencing. The de novo occurrences of mutations within CENP‑B boxes in patients with DS were excluded. The prevalences of 6 novel mutations (g.661delC, g.1035_1036insA, g.1076_1077insC, g.670T>G, g.1239A>T, g.1343T>C) and 3 single nucleotide polymorphisms (g.727C/T, g.863A/C, g.1264C/G) were not significantly different between DS and controls (P>0.05). However, g.525C/G (P=0.01), g.601T/C (P=0.00000002), g.1279A/G (P=0.002), g.1294C/T (P=0.0006) and g.1302G/T (P=0.004) were significantly associated with the prevalence of DS (P<0.05). The results indicated that CENP‑B boxes are highly conserved in DS patients and may not be responsible for Chr21 nondisjunction events. However, α‑satDNA in Chr21 is variable, and deoxynucleotide deletions, mutations and polymorphisms may act as potential molecular diagnostic markers of DS.
Affiliation(s)
- Qian Chen
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, P.R. China
| | - Bin Tan
- Pediatrics Research Institute, Children's Hospital of Chongqing Medical University, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing 400014, P.R. China
| | - Jun-Lin He
- Laboratory of Reproductive Biology, Public Health College, Chongqing Medical University, Chongqing 400016, P.R. China
| | - Xue-Qing Liu
- Laboratory of Reproductive Biology, Public Health College, Chongqing Medical University, Chongqing 400016, P.R. China
| | - Xue-Mei Chen
- Laboratory of Reproductive Biology, Public Health College, Chongqing Medical University, Chongqing 400016, P.R. China
| | - Ru-Fei Gao
- Laboratory of Reproductive Biology, Public Health College, Chongqing Medical University, Chongqing 400016, P.R. China
| | - Jing Zhu
- Pediatrics Research Institute, Children's Hospital of Chongqing Medical University, Ministry of Education Key Laboratory of Child Development and Disorders, Chongqing 400014, P.R. China
| | - Ying-Xiong Wang
- Laboratory of Reproductive Biology, Public Health College, Chongqing Medical University, Chongqing 400016, P.R. China
| | - Hong-Bo Qi
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, P.R. China
| |
|
35
|
Miga KH. The Promises and Challenges of Genomic Studies of Human Centromeres. Progress in Molecular and Subcellular Biology 2017; 56:285-304. [PMID: 28840242 DOI: 10.1007/978-3-319-58592-5_12] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 12/22/2022]
Abstract
Human centromeres are genomic regions that act as sites of kinetochore assembly to ensure proper chromosome segregation during mitosis and meiosis. Although the biological importance of centromeres in genome stability, and ultimately cell viability, is well understood, the complete sequence content and organization of these multi-megabase-sized regions remain unknown. The lack of a high-resolution reference assembly inhibits standard bioinformatics protocols, and as a result, sequence-based studies involving human centromeres lag far behind the advances made for the non-repetitive sequences in the human genome. In this chapter, I introduce what is known about the genomic organization in the highly repetitive regions spanning human centromeres, and discuss the challenges these sequences pose for assembly, alignment, and data interpretation. Overcoming these obstacles is expected to usher in a new era for centromere genomics, which will offer new discoveries in basic cell biology and human biomedical research.
Affiliation(s)
- Karen H Miga
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA, USA.
| |
|
36
|
Cacheux L, Ponger L, Gerbault-Seureau M, Richard FA, Escudé C. Diversity and distribution of alpha satellite DNA in the genome of an Old World monkey: Cercopithecus solatus. BMC Genomics 2016; 17:916. [PMID: 27842493 PMCID: PMC5109768 DOI: 10.1186/s12864-016-3246-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 04/14/2016] [Accepted: 11/02/2016] [Indexed: 11/10/2022]
Abstract
Background: Alpha satellite is the major repeated DNA element of primate centromeres. Evolution of these tandemly repeated sequences has led to the existence of numerous families of monomers exhibiting specific organizational patterns. The limited amount of information available in non-human primates is a restriction to the understanding of the evolutionary dynamics of alpha satellite DNA. Results: We carried out the targeted high-throughput sequencing of alpha satellite monomers and dimers from the Cercopithecus solatus genome, an Old World monkey from the Cercopithecini tribe. Computational approaches were used to infer the existence of sequence families and to study how these families are organized with respect to each other. While previous studies had suggested that alpha satellites in Old World monkeys were poorly diversified, our analysis provides evidence for the existence of at least four distinct families of sequences within the studied species and of higher order organizational patterns. Fluorescence in situ hybridization using oligonucleotide probes that are able to target each family in a specific way showed that the different families had distinct distributions on chromosomes and were not homogeneously distributed between chromosomes. Conclusions: Our new approach provides an unprecedented and comprehensive view of the diversity and organization of alpha satellites in a species outside the hominoid group. We consider these data with respect to previously known alpha satellite families and to potential mechanisms for satellite DNA evolution. Applying this approach to other species will open new perspectives regarding the integration of satellite DNA into comparative genomic and cytogenetic studies.
Affiliation(s)
- Lauriane Cacheux
- Département Régulations, Développement et Diversité Moléculaire, Structure et Instabilité des Génomes, INSERM U1154, CNRS UMR7196, Sorbonne Universités, Muséum national d'Histoire naturelle, Paris, France; Département Systématique et Evolution, Institut de Systématique, Evolution, Biodiversité, UMR 7205 MNHN, CNRS, UPMC, EPHE, Sorbonne Universités, Muséum national d'Histoire naturelle, Paris, France
| | - Loïc Ponger
- Département Régulations, Développement et Diversité Moléculaire, Structure et Instabilité des Génomes, INSERM U1154, CNRS UMR7196, Sorbonne Universités, Muséum national d'Histoire naturelle, Paris, France
| | - Michèle Gerbault-Seureau
- Département Systématique et Evolution, Institut de Systématique, Evolution, Biodiversité, UMR 7205 MNHN, CNRS, UPMC, EPHE, Sorbonne Universités, Muséum national d'Histoire naturelle, Paris, France
| | - Florence Anne Richard
- Département Systématique et Evolution, Institut de Systématique, Evolution, Biodiversité, UMR 7205 MNHN, CNRS, UPMC, EPHE, Sorbonne Universités, Muséum national d'Histoire naturelle, Paris, France; Université Versailles St-Quentin, Montigny-le-Bretonneux, France
| | - Christophe Escudé
- Département Régulations, Développement et Diversité Moléculaire, Structure et Instabilité des Génomes, INSERM U1154, CNRS UMR7196, Sorbonne Universités, Muséum national d'Histoire naturelle, Paris, France.
| |
|
37
|
Abstract
Genomic studies rely on accurate chromosome assemblies to explore sequence-based models of cell biology, evolution and biomedical disease. However, even the extensively studied human genome has not yet reached a complete, 'telomere-to-telomere', chromosome assembly. The largest assembly gaps remain in centromeric regions and acrocentric short arms, sites known to contain megabase-sized arrays of tandem repeats, or satellite DNAs. This review aims to briefly address the progress and challenges of generating correct assemblies of satellite DNA arrays. Although the focus is placed on the human genome, many concepts presented here are applicable to other genomes.
Affiliation(s)
- Karen H Miga
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA, 95064, USA.
| |
|
38
|
Pugacheva EM, Teplyakov E, Wu Q, Li J, Chen C, Meng C, Liu J, Robinson S, Loukinov D, Boukaba A, Hutchins AP, Lobanenkov V, Strunnikov A. The cancer-associated CTCFL/BORIS protein targets multiple classes of genomic repeats, with a distinct binding and functional preference for humanoid-specific SVA transposable elements. Epigenetics Chromatin 2016; 9:35. [PMID: 27588042 PMCID: PMC5007689 DOI: 10.1186/s13072-016-0084-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 05/31/2016] [Accepted: 08/18/2016] [Indexed: 12/20/2022]
Abstract
Background: A common aberration in cancer is the activation of germline-specific proteins. The DNA-binding proteins among them could generate novel chromatin states, not found in normal cells. The germline-specific transcription factor BORIS/CTCFL, a paralog of chromatin architecture protein CTCF, is often erroneously activated in cancers and rewires the epigenome for the germline-like transcription program. Another common feature of malignancies is the changed expression and epigenetic states of genomic repeats, which could alter the transcription of neighboring genes and cause somatic mutations upon transposition. The role of BORIS in transposable elements and other repeats has never been assessed. Results: The investigation of BORIS and CTCF binding to DNA repeats in the K562 cancer cells dependent on BORIS for self-renewal by ChIP-chip and ChIP-seq revealed three classes of occupancy by these proteins: elements cohabited by BORIS and CTCF, CTCF-only bound, or BORIS-only bound. The CTCF-only enrichment is characteristic for evolutionary old and inactive repeat classes, while BORIS and CTCF co-binding predominately occurs at uncharacterized tandem repeats. These repeats form staggered cluster binding sites, which are a prerequisite for CTCF and BORIS co-binding. At the same time, BORIS preferentially occupies a specific subset of the evolutionary young, transcribed, and mobile genomic repeat family, SVA. Unlike CTCF, BORIS prominently binds to the VNTR region of the SVA repeats in vivo. This suggests a role of BORIS in SVA expression regulation. RNA-seq analysis indicates that BORIS largely serves as a repressor of SVA expression, alongside DNA and histone methylation, with the exception of promoter capture by SVA. Conclusions: Thus, BORIS directly binds to, and regulates SVA repeats, which are essentially movable CpG islands, via clusters of BORIS binding sites. This finding uncovers a new function of the global germline-specific transcriptional regulator BORIS in regulating and repressing the newest class of transposable elements that are actively transposed in human genome when activated. This function of BORIS in cancer cells is likely a reflection of its roles in the germline.
Affiliation(s)
| | - Evgeny Teplyakov
- Molecular Epigenetics Laboratory, Guangzhou Institutes of Biomedicine and Health, Guangzhou, 510530 Guangdong China
| | - Qiongfang Wu
- Molecular Epigenetics Laboratory, Guangzhou Institutes of Biomedicine and Health, Guangzhou, 510530 Guangdong China
| | - Jingjing Li
- Molecular Epigenetics Laboratory, Guangzhou Institutes of Biomedicine and Health, Guangzhou, 510530 Guangdong China
| | - Cheng Chen
- Molecular Epigenetics Laboratory, Guangzhou Institutes of Biomedicine and Health, Guangzhou, 510530 Guangdong China
| | - Chengcheng Meng
- Molecular Epigenetics Laboratory, Guangzhou Institutes of Biomedicine and Health, Guangzhou, 510530 Guangdong China
| | - Jian Liu
- Molecular Epigenetics Laboratory, Guangzhou Institutes of Biomedicine and Health, Guangzhou, 510530 Guangdong China
| | - Susan Robinson
- Laboratory of Immunogenetics, NIH, NIAID, Rockville, MD 20852 USA
| | - Dmitry Loukinov
- Laboratory of Immunogenetics, NIH, NIAID, Rockville, MD 20852 USA
| | - Abdelhalim Boukaba
- Molecular Epigenetics Laboratory, Guangzhou Institutes of Biomedicine and Health, Guangzhou, 510530 Guangdong China
| | - Andrew Paul Hutchins
- Department of Biology, Southern University of Science and Technology of China, Shenzhen, 518055 Guangdong China
| | | | - Alexander Strunnikov
- Molecular Epigenetics Laboratory, Guangzhou Institutes of Biomedicine and Health, Guangzhou, 510530 Guangdong China
| |
|
39
|
Evtushenko EV, Levitsky VG, Elisafenko EA, Gunbin KV, Belousov AI, Šafář J, Doležel J, Vershinin AV. The expansion of heterochromatin blocks in rye reflects the co-amplification of tandem repeats and adjacent transposable elements. BMC Genomics 2016; 17:337. [PMID: 27146967 PMCID: PMC4857426 DOI: 10.1186/s12864-016-2667-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 12/17/2015] [Accepted: 04/25/2016] [Indexed: 11/10/2022]
Abstract
Background: A prominent and distinctive feature of the rye (Secale cereale) chromosomes is the presence of massive blocks of subtelomeric heterochromatin, the size of which is correlated with the copy number of tandem arrays. The rapidity with which these regions have formed over the period of speciation remains unexplained. Results: Using a BAC library created from the short arm telosome of rye chromosome 1R we uncovered numerous arrays of the pSc200 and pSc250 tandem repeat families which are concentrated in subtelomeric heterochromatin and identified the adjacent DNA sequences. The arrays show significant heterogeneity in monomer organization. 454 reads were used to gain a representation of the expansion of these tandem repeats across the whole rye genome. The presence of multiple, relatively short monomer arrays, coupled with the mainly star-like topology of the monomer phylogenetic trees, was taken as indicative of a rapid expansion of the pSc200 and pSc250 arrays. The evolution of subtelomeric heterochromatin appears to have included a significant contribution of illegitimate recombination. The composition of transposable elements (TEs) within the regions flanking the pSc200 and pSc250 arrays differed markedly from that in the genome as a whole. Solo-LTRs were strongly enriched, suggestive of a history of active ectopic exchange. Several DNA motifs were over-represented within the LTR sequences. Conclusion: The large blocks of subtelomeric heterochromatin have arisen from the combined activity of TEs and the expansion of the tandem repeats. The expansion was likely based on a highly complex network of recombination mechanisms.
Affiliation(s)
- E V Evtushenko
- Institute of Molecular and Cellular Biology, Siberian Branch of the RAS, Novosibirsk, Russia
| | - V G Levitsky
- Institute of Cytology and Genetics, Siberian Branch of the RAS, Novosibirsk, Russia
- Novosibirsk State University, Novosibirsk, Russia
| | - E A Elisafenko
- Institute of Cytology and Genetics, Siberian Branch of the RAS, Novosibirsk, Russia
| | - K V Gunbin
- Institute of Cytology and Genetics, Siberian Branch of the RAS, Novosibirsk, Russia
- Novosibirsk State University, Novosibirsk, Russia
| | - A I Belousov
- Institute of Molecular and Cellular Biology, Siberian Branch of the RAS, Novosibirsk, Russia
| | - J Šafář
- Institute of Experimental Botany, Centre of the Region Haná for Biotechnological and Agricultural Research, Olomouc, Czech Republic
| | - J Doležel
- Institute of Experimental Botany, Centre of the Region Haná for Biotechnological and Agricultural Research, Olomouc, Czech Republic
| | - A V Vershinin
- Institute of Molecular and Cellular Biology, Siberian Branch of the RAS, Novosibirsk, Russia.
| |
|
40
|
Sevim V, Bashir A, Chin CS, Miga KH. Alpha-CENTAURI: assessing novel centromeric repeat sequence variation with long read sequencing. Bioinformatics 2016; 32:1921-1924. [PMID: 27153570 PMCID: PMC4920115 DOI: 10.1093/bioinformatics/btw101] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 10/05/2015] [Accepted: 02/17/2016] [Indexed: 11/13/2022]
Abstract
Motivation: Long arrays of near-identical tandem repeats are a common feature of centromeric and subtelomeric regions in complex genomes. These sequences present a source of repeat structure diversity that is commonly ignored by standard genomic tools. Unlike reads shorter than the underlying repeat structure, which rely on indirect inference methods, e.g. assembly, long reads allow direct inference of satellite higher order repeat structure. To automate characterization of local centromeric tandem repeat sequence variation we have designed Alpha-CENTAURI (ALPHA satellite CENTromeric AUtomated Repeat Identification), which takes advantage of Pacific Biosciences long reads from whole-genome sequencing datasets. By operating on reads prior to assembly, our approach provides a more comprehensive set of repeat-structure variants and is not impacted by rearrangements or sequence underrepresentation due to misassembly. Results: We demonstrate the utility of Alpha-CENTAURI in characterizing repeat structure for alpha satellite containing reads in the hydatidiform mole (CHM1, haploid-like) genome. The pipeline is designed to report local repeat organization summaries for each read, thereby monitoring rearrangements in repeat units, shifts in repeat orientation and sites of array transition into non-satellite DNA, typically defined by transposable element insertion. We validate the method by showing consistency with existing centromere high order repeat references. Alpha-CENTAURI can, in principle, run on any sequence data, offering a method to generate a sequence repeat resolution that could be readily performed using consensus sequences available for other satellite families in genomes without high-quality reference assemblies. Availability and implementation: Documentation and source code for Alpha-CENTAURI are freely available at http://github.com/volkansevim/alpha-CENTAURI. Contact: ali.bashir@mssm.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Volkan Sevim
- Pacific Biosciences, Inc., Menlo Park, CA 94025, USA
| | - Ali Bashir
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, New York, NY 10029, USA; Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, New York, NY 10029, USA
| | | | - Karen H Miga
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA 95064, USA
| |
|
41
|
Catacchio CR, Ragone R, Chiatante G, Ventura M. Organization and evolution of Gorilla centromeric DNA from old strategies to new approaches. Sci Rep 2015; 5:14189. [PMID: 26387916 PMCID: PMC4585704 DOI: 10.1038/srep14189] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 04/23/2015] [Accepted: 08/18/2015] [Indexed: 11/09/2022]
Abstract
The centromere/kinetochore interaction is responsible for the pairing and segregation of replicated chromosomes in eukaryotes. Centromere DNA is portrayed as scarcely conserved, repetitive in nature, quickly evolving and protein-binding competent. Among primates, the major class of centromeric DNA is the pancentromeric α-satellite, made of arrays of 171 bp monomers, repeated in a head-to-tail pattern. α-satellite sequences can either form tandem heterogeneous monomeric arrays or assemble in higher-order repeats (HORs). Gorilla centromere DNA has barely been characterized, and data are mainly based on hybridizations of human alphoid sequences. We isolated and finely characterized gorilla α-satellite sequences and revealed relevant structure and chromosomal distribution similarities with other great apes as well as gorilla-specific features, such as the uniquely octameric structure of the suprachromosomal family-2 (SF2). We demonstrated for the first time the orthologous localization of alphoid suprachromosomal families-1 and −2 (SF1 and SF2) between human and gorilla in contrast to chimpanzee centromeres. Finally, the discovery of a new 189 bp monomer type in gorilla centromeres unravels clues to the role of the centromere protein B, paving the way to solve the significance of the centromere DNA’s essential repetitive nature in association with its function and the peculiar evolution of the α-satellite sequence.
Affiliation(s)
- C R Catacchio
- University of Bari Aldo Moro, Department of Biology, Via Orabona 4, Bari, 70125, Italy
| | - R Ragone
- University of Bari Aldo Moro, Department of Biology, Via Orabona 4, Bari, 70125, Italy
| | - G Chiatante
- University of Bari Aldo Moro, Department of Biology, Via Orabona 4, Bari, 70125, Italy
| | - M Ventura
- University of Bari Aldo Moro, Department of Biology, Via Orabona 4, Bari, 70125, Italy
| |
|
42
|
Shepelev VA, Uralsky LI, Alexandrov AA, Yurov YB, Rogaev EI, Alexandrov IA. Annotation of suprachromosomal families reveals uncommon types of alpha satellite organization in pericentromeric regions of hg38 human genome assembly. Genomics Data 2015; 5:139-146. [PMID: 26167452 PMCID: PMC4496801 DOI: 10.1016/j.gdata.2015.05.035] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 11/07/2022]
Affiliation(s)
- V A Shepelev
- Institute of Molecular Genetics, Russian Academy of Sciences, Kurchatov sq. 2, Moscow 123182, Russia ; Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia ; Center for Brain Neurobiology and Neurogenetics, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk 630090, Russia
| | - L I Uralsky
- Institute of Molecular Genetics, Russian Academy of Sciences, Kurchatov sq. 2, Moscow 123182, Russia ; Center for Brain Neurobiology and Neurogenetics, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk 630090, Russia
| | - A A Alexandrov
- Institute of Molecular Genetics, Russian Academy of Sciences, Kurchatov sq. 2, Moscow 123182, Russia
| | - Y B Yurov
- Research Center of Mental Health, Russian Academy of Medical Sciences, Zagorodnoe sh. 2, Moscow 113152, Russia
| | - E I Rogaev
- Department of Genomics and Human Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia ; Center for Brain Neurobiology and Neurogenetics, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk 630090, Russia ; Department of Psychiatry, Brudnick Neuropsychiatric Research Institute, University of Massachusetts Medical School, Worcester, MA 01604, USA ; Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow 119234, Russia
| | - I A Alexandrov
- Research Center of Mental Health, Russian Academy of Medical Sciences, Zagorodnoe sh. 2, Moscow 113152, Russia
| |
|
43
|
Sujiwattanarat P, Thapana W, Srikulnath K, Hirai Y, Hirai H, Koga A. Higher-order repeat structure in alpha satellite DNA occurs in New World monkeys and is not confined to hominoids. Sci Rep 2015; 5:10315. [PMID: 25974220 PMCID: PMC4431391 DOI: 10.1038/srep10315] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 10/29/2014] [Accepted: 03/25/2015] [Indexed: 11/17/2022]
Abstract
Centromeres usually contain large amounts of tandem repeat DNA. Alpha satellite DNA (AS) is the most abundant tandem repeat DNA found in the centromeres of simian primates. The AS of humans contains sequences organized into higher-order repeat (HOR) structures, which are tandem arrays of larger repeat units consisting of multiple basic repeat units. HOR-carrying AS also occurs in other hominoids, but results reported to date for phylogenetically more remote taxa have been negative. Here we show direct evidence for clear HOR structures in AS of the owl monkey and common marmoset. These monkeys are New World monkey species that are located phylogenetically outside of hominoids. It is currently postulated that the presence of HOR structures in AS is unique to hominoids. Our results suggest that this view must be modified. A plausible explanation is that generation of HOR structures is a general event that occurs occasionally or frequently in primate centromeres, and that, in humans, HOR-carrying AS became predominant in the central region of the centromere. It is often difficult to assemble sequence reads of tandem repeat DNAs into accurate contig sequences; our careful sequencing strategy allowed us to overcome this problem.
Affiliation(s)
- Penporn Sujiwattanarat
- [1] Primate Research Institute, Kyoto University, Inuyama City 484-8506, Japan [2] Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
| | - Watcharaporn Thapana
- [1] Primate Research Institute, Kyoto University, Inuyama City 484-8506, Japan [2] Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
| | | | - Yuriko Hirai
- Primate Research Institute, Kyoto University, Inuyama City 484-8506, Japan
| | - Hirohisa Hirai
- Primate Research Institute, Kyoto University, Inuyama City 484-8506, Japan
| | - Akihiko Koga
- Primate Research Institute, Kyoto University, Inuyama City 484-8506, Japan
| |
|
44
|
Altemose N, Miga KH, Maggioni M, Willard HF. Genomic characterization of large heterochromatic gaps in the human genome assembly. PLoS Comput Biol 2014; 10:e1003628. [PMID: 24831296 PMCID: PMC4022460 DOI: 10.1371/journal.pcbi.1003628] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.2] [Received: 08/02/2013] [Accepted: 03/26/2014] [Indexed: 01/24/2023]
Abstract
The largest gaps in the human genome assembly correspond to multi-megabase heterochromatic regions composed primarily of two related families of tandem repeats, Human Satellites 2 and 3 (HSat2,3). The abundance of repetitive DNA in these regions challenges standard mapping and assembly algorithms, and as a result, the sequence composition and potential biological functions of these regions remain largely unexplored. Furthermore, existing genomic tools designed to predict consensus-based descriptions of repeat families cannot be readily applied to complex satellite repeats such as HSat2,3, which lack a consistent repeat unit reference sequence. Here we present an alignment-free method to characterize complex satellites using whole-genome shotgun read datasets. Utilizing this approach, we classify HSat2,3 sequences into fourteen subfamilies and predict their chromosomal distributions, resulting in a comprehensive satellite reference database to further enable genomic studies of heterochromatic regions. We also identify 1.3 Mb of non-repetitive sequence interspersed with HSat2,3 across 17 unmapped assembly scaffolds, including eight annotated gene predictions. Finally, we apply our satellite reference database to high-throughput sequence data from 396 males to estimate array size variation of the predominant HSat3 array on the Y chromosome, confirming that satellite array sizes can vary between individuals over an order of magnitude (7 to 98 Mb) and further demonstrating that array sizes are distributed differently within distinct Y haplogroups. In summary, we present a novel framework for generating initial reference databases for unassembled genomic regions enriched with complex satellite DNA, and we further demonstrate the utility of these reference databases for studying patterns of sequence variation within human populations.
At least 5–10% of the human genome remains unassembled, unmapped, and poorly characterized. The reference assembly annotates these missing regions as multi-megabase heterochromatic gaps, found primarily near centromeres and on the short arms of the acrocentric chromosomes. This missing fraction of the genome consists predominantly of long arrays of near-identical tandem repeats called satellite DNA. Due to the repetitive nature of satellite DNA, sequence assembly algorithms cannot uniquely align overlapping sequence reads, and thus satellite-rich domains have been omitted from the reference assembly and from most genome-wide studies of variation and function. Existing methods for analyzing some satellite DNAs cannot be easily extended to a large portion of satellites whose repeat structures are complex and largely uncharacterized, such as Human Satellites 2 and 3 (HSat2,3). Here we characterize HSat2,3 using a novel approach that does not depend on having a well-defined repeat structure. By classifying genome-wide HSat2,3 sequences into subfamilies and localizing them to chromosomes, we have generated an initial HSat2,3 genomic reference, which serves as a critical foundation for future studies of variation and function in these regions. This approach should be generally applicable to other classes of satellite DNA, in both the human genome and other complex genomes.
Affiliation(s)
- Nicolas Altemose
- Genome Biology Group, Duke Institute for Genome Sciences & Policy, Duke University, Durham, North Carolina, United States of America
- Karen H. Miga
- Genome Biology Group, Duke Institute for Genome Sciences & Policy, Duke University, Durham, North Carolina, United States of America
- * E-mail:
- Mauro Maggioni
- Department of Mathematics, Duke University, Durham, North Carolina, United States of America
- Huntington F. Willard
- Genome Biology Group, Duke Institute for Genome Sciences & Policy, Duke University, Durham, North Carolina, United States of America
|
45
|
Abstract
The centromere is the chromosomal locus essential for chromosome inheritance and genome stability. Human centromeres are located at repetitive alpha satellite DNA arrays that compose approximately 5% of the genome. Contiguous alpha satellite DNA sequence is absent from the assembled reference genome, limiting current understanding of centromere organization and function. Here, we review the progress in centromere genomics spanning the discovery of the sequence to its molecular characterization and the work done during the Human Genome Project era to elucidate alpha satellite structure and sequence variation. We discuss exciting recent advances in alpha satellite sequence assembly that have provided important insight into the abundance and complex organization of this sequence on human chromosomes. In light of these new findings, we offer perspectives for future studies of human centromere assembly and function.
Affiliation(s)
- Megan E. Aldrup-MacDonald
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA; E-Mail:
- Division of Human Genetics, Duke University, Durham, NC 27710, USA
- Beth A. Sullivan
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA; E-Mail:
- Division of Human Genetics, Duke University, Durham, NC 27710, USA
- Author to whom correspondence should be addressed; E-Mail: ; Tel.: +1-919-684-9038
|
46
|
Higher-order repeat structure in alpha satellite DNA is an attribute of hominoids rather than hominids. J Hum Genet 2013; 58:752-4. [PMID: 23945983 DOI: 10.1038/jhg.2013.87] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 05/21/2013] [Revised: 07/16/2013] [Accepted: 07/24/2013] [Indexed: 11/08/2022]
Abstract
Alpha satellite DNA (AS), a major DNA component of primate centromeres, is composed of a tandem array of repeat units of approximately 170 bp. The AS of hominids (family Hominidae; humans and great apes) includes sequences organized into higher-order repeat (HOR) structures, with a periodic appearance of multiple copies of the basic repeat units. Here, we identified an HOR in AS of the siamang, a small ape phylogenetically distinct from hominids but included in hominoids (superfamily Hominoidea). We sequenced long stretches of genomic DNA, and found a repetition of blocks consisting of six and four basic repeat units. Thus, AS organization into HOR is an attribute of hominoids, rather than, as currently postulated, hominids. In addition to centromeres, siamangs carry AS in terminal heterochromatin blocks, and it cannot be determined at present whether these HOR-containing AS sequences originate from the centromere or from the terminal heterochromatin. Even if the latter is the case, these sequences might affect the composition of centromeric AS by being transferred to the centromere.
|
47
|
Meštrović N, Pavlek M, Car A, Castagnone-Sereno P, Abad P, Plohl M. Conserved DNA Motifs, Including the CENP-B Box-like, Are Possible Promoters of Satellite DNA Array Rearrangements in Nematodes. PLoS One 2013; 8:e67328. [PMID: 23826269 PMCID: PMC3694981 DOI: 10.1371/journal.pone.0067328] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 02/08/2013] [Accepted: 05/17/2013] [Indexed: 12/27/2022] Open
Abstract
Tandemly arrayed non-coding sequences or satellite DNAs (satDNAs) are rapidly evolving segments of eukaryotic genomes, including the centromere, and may raise a genetic barrier that leads to speciation. However, determinants and mechanisms of satDNA sequence dynamics are only partially understood. Sequence analyses of a library of five satDNAs common to the root-knot nematodes Meloidogyne chitwoodi and M. fallax, together with a satDNA specific to M. chitwoodi, revealed low sequence identity (32-64%) among them. However, despite sequence differences, two conserved motifs were recovered. One of them turned out to be highly similar to the CENP-B box of human alpha satDNA, identical in 10-12 out of 17 nucleotides. In addition, organization of nematode satDNAs was comparable to that found in alpha satDNA of human and primates, characterized by monomers concurrently arranged in simple and higher-order repeat (HOR) arrays. In contrast to alpha satDNA, phylogenetic clustering of nematode satDNA monomers extracted either from simple or from HOR arrays indicated frequent shuffling between these two organizational forms. Comparison of homogeneous simple arrays and complex HORs composed of different satDNAs enabled, for the first time, the identification of conserved motifs as obligatory components of monomer junctions. This observation highlights the role of short motifs in rearrangements, even among highly divergent sequences. Two mechanisms are proposed to be involved in this process, i.e., putative transposition-related cut-and-paste insertions and/or illegitimate recombination. The possible involvement of the nematode CENP-B box-like sequence in the transposition-related mechanism, together with the previously established similarity of the human CENP-B protein and pogo-like transposases, implicates a novel role for the CENP-B box and related sequence motifs in addition to their known function in centromere protein binding.
Affiliation(s)
- Nevenka Meštrović
- Department of Molecular Biology, Rudjer Bošković Institute, Zagreb, Croatia
|
48
|
Prakhongcheep O, Hirai Y, Hara T, Srikulnath K, Hirai H, Koga A. Two types of alpha satellite DNA in distinct chromosomal locations in Azara's owl monkey. DNA Res 2013; 20:235-40. [PMID: 23477842 PMCID: PMC3686428 DOI: 10.1093/dnares/dst004] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 11/14/2012] [Accepted: 02/15/2013] [Indexed: 11/13/2022] Open
Abstract
Alpha satellite DNA is a repetitive sequence known to be a major DNA component of centromeres in primates (order Primates). New World monkeys form one major taxon (parvorder Platyrrhini) of primates, and their alpha satellite DNA is known to comprise repeat units of around 340 bp. In one species (Azara's owl monkey Aotus azarae) of this taxon, we identified two types of alpha satellite DNA consisting of 185- and 344-bp repeat units that we designated as OwlAlp1 and OwlAlp2, respectively. OwlAlp2 exhibits similarity throughout its entire sequence to the alpha satellite DNA of other New World monkeys. The chromosomal locations of the two types of sequence are markedly distinct: OwlAlp1 was observed at the centromeric constrictions, whereas OwlAlp2 was found in the pericentric regions. From these results, we inferred that OwlAlp1 was derived from OwlAlp2 and rapidly replaced OwlAlp2 as the principal alpha satellite DNA on a short time scale at the speciation level. A less likely alternative explanation is also discussed.
Affiliation(s)
- Ornjira Prakhongcheep
- Primate Research Institute, Kyoto University, Inuyama City 484-8506, Japan
- Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
- Yuriko Hirai
- Primate Research Institute, Kyoto University, Inuyama City 484-8506, Japan
- Toru Hara
- Primate Research Institute, Kyoto University, Inuyama City 484-8506, Japan
- Kornsorn Srikulnath
- Department of Genetics, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
- Hirohisa Hirai
- Primate Research Institute, Kyoto University, Inuyama City 484-8506, Japan
- Akihiko Koga
- Primate Research Institute, Kyoto University, Inuyama City 484-8506, Japan
|
49
|
Podgornaya O, Gavrilova E, Stephanova V, Demin S, Komissarov A. Large tandem repeats make up the chromosome bar code: a hypothesis. Advances in Protein Chemistry and Structural Biology 2013; 90:1-30. [PMID: 23582200 DOI: 10.1016/b978-0-12-410523-2.00001-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 01/21/2023]
Abstract
Much of the functional nature of tandem repeats in any genome remains enigmatic because there are only a few tools available for dissecting and elucidating the functions of repeated DNA. The large tandem repeat arrays (satellite DNA) found in two mouse whole-genome shotgun assemblies were classified into 4 superfamilies, 8 families, and 62 subfamilies. With a simplified variant of chromosome positioning of different tandem repeats, we noticed a nonuniform distribution instead of the positions reported for mouse major and minor satellites. Each chromosome visibly possesses a kind of unique code made up of different large tandem repeats. The reference genomes allow marking only internal tandem repeats, and even with such limited data, the colored "bar code" made up of tandem repeats is visible. We suppose that tandem repeats bear the mechanism for chromosomes to recognize the regions to be associated. The associations, initially established via RNA, become fixed by histone modifications (the histone or chromatin code) and specific proteins. In such a way, associations that are initially flexible and regulated, that is, adjustable, become irreversible and heritable across cell generations. Tandem repeat multiformity tunes the 3D pattern of developed nuclei by sequential steps of associations. A tandem repeat-based chromosome bar code could be the carrier of the genome structural information; that is, the order of precise tandem repeat association is the DNA morphogenetic program. Tandem repeats are the cores of the distinct 3D structures postulated in the "gene gating" hypothesis.
|
50
|
Abstract
Two distinct classes of repetitive sequences, interspersed mobile elements and satellite DNAs, shape eukaryotic genomes and drive their evolution. Short arrays of tandem repeats can also be present within nonautonomous miniature inverted repeat transposable elements (MITEs). In the clam Donax trunculus, we characterized a composite, high copy number MITE, named DTC84. It is composed of a central region built of up to five core repeats linked to a microsatellite segment at one array end and flanked by sequences holding short inverted repeats. The modular composition and the conserved putative target site duplication sequence AA at the element termini are equivalent to the composition of several elements found in the cupped oyster Crassostrea virginica and in some insects. A unique feature of the D. trunculus element is an ordered array of core repeat variants, distinguished by diagnostic changes. The position of variants in the array is fixed, regardless of alterations in the core repeat copy number. Each repeat harbors a palindrome near the junction with the following unit, a potential hotspot responsible for array length variations. As a consequence, variations in the number of tandem repeats and variations in flanking sequences make every sequenced element unique. Core repeats may thus be considered as individual units within the MITE, with flanking sequences representing a "cassette" for internal repeats. Our results demonstrate that the onset and spread of tandem repeats can be more intimately linked to processes of transposition than previously thought and suggest that genomes are shaped by interplays within a complex network of repetitive sequences.
Affiliation(s)
- Eva Šatović
- Division of Molecular Biology, Ruđer Bošković Institute, Zagreb, Croatia
- Miroslav Plohl
- Division of Molecular Biology, Ruđer Bošković Institute, Zagreb, Croatia
|