1
Macas J, Ávila Robledillo L, Kreplak J, Novák P, Koblížková A, Vrbová I, Burstin J, Neumann P. Assembly of the 81.6 Mb centromere of pea chromosome 6 elucidates the structure and evolution of metapolycentric chromosomes. PLoS Genet 2023; 19:e1010633. [PMID: 36735726 PMCID: PMC10027222 DOI: 10.1371/journal.pgen.1010633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 03/20/2023] [Accepted: 01/23/2023] [Indexed: 02/04/2023]
Abstract
Centromeres in the legume genera Pisum and Lathyrus exhibit unique morphological characteristics, including extended primary constrictions and multiple separate domains of centromeric chromatin. These so-called metapolycentromeres resemble an intermediate form between monocentric and holocentric types, and therefore provide a great opportunity for studying the transitions between different types of centromere organizations. However, because of the exceedingly large and highly repetitive nature of metapolycentromeres, highly contiguous assemblies needed for these studies are lacking. Here, we report on the assembly and analysis of a 177.6 Mb region of pea (Pisum sativum) chromosome 6, including the 81.6 Mb centromere region (CEN6) and adjacent chromosome arms. Genes, DNA methylation profiles, and most of the repeats were uniformly distributed within the centromere, and their densities in CEN6 and chromosome arms were similar. The exception was an accumulation of satellite DNA in CEN6, where it formed multiple arrays up to 2 Mb in length. Centromeric chromatin, characterized by the presence of the CENH3 protein, was predominantly associated with arrays of three different satellite repeats; however, five other satellites present in CEN6 lacked CENH3. The presence of CENH3 chromatin was found to determine the spatial distribution of the respective satellites during the cell cycle. Finally, oligo-FISH painting experiments, performed using probes specifically designed to label the genomic regions corresponding to CEN6 in Pisum, Lathyrus, and Vicia species, revealed that metapolycentromeres evolved via the expansion of centromeric chromatin into neighboring chromosomal regions and the accumulation of novel satellite repeats. However, in some of these species, centromere evolution also involved chromosomal translocations and centromere repositioning.
Affiliation(s)
- Jiří Macas
- Biology Centre, Czech Academy of Sciences, Institute of Plant Molecular Biology, Branišovská 31, České Budějovice, Czech Republic
- Laura Ávila Robledillo
- Biology Centre, Czech Academy of Sciences, Institute of Plant Molecular Biology, Branišovská 31, České Budějovice, Czech Republic
- Jonathan Kreplak
- Agroécologie, AgroSup Dijon, INRA, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France
- Petr Novák
- Biology Centre, Czech Academy of Sciences, Institute of Plant Molecular Biology, Branišovská 31, České Budějovice, Czech Republic
- Andrea Koblížková
- Biology Centre, Czech Academy of Sciences, Institute of Plant Molecular Biology, Branišovská 31, České Budějovice, Czech Republic
- Iva Vrbová
- Biology Centre, Czech Academy of Sciences, Institute of Plant Molecular Biology, Branišovská 31, České Budějovice, Czech Republic
- Judith Burstin
- Agroécologie, AgroSup Dijon, INRA, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France
- Pavel Neumann
- Biology Centre, Czech Academy of Sciences, Institute of Plant Molecular Biology, Branišovská 31, České Budějovice, Czech Republic
2
Mola LM, Vrbová I, Tosto DS, Zrzavá M, Marec F. On the Origin of Neo-Sex Chromosomes in the Neotropical Dragonflies Rhionaeschna bonariensis and R. planaltica (Aeshnidae, Odonata). Insects 2022; 13:1159. [PMID: 36555069 PMCID: PMC9784284 DOI: 10.3390/insects13121159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 12/06/2022] [Accepted: 12/11/2022] [Indexed: 06/17/2023]
Abstract
Odonata have holokinetic chromosomes. About 95% of species have an XX/X0 sex chromosome system, with heterogametic males. There are species with neo-XX/neo-XY sex chromosomes resulting from an X chromosome/autosome fusion. The genus Rhionaeschna includes 42 species found in the Americas. We analyzed the distribution of the nucleolar organizer region (NOR) using FISH with rDNA probes in Rhionaeschna bonariensis (n = 12 + neo-XY), R. planaltica (n = 7 + neo-XY), and Aeshna cyanea (n = 13 + X0). In R. bonariensis and A. cyanea, the NOR is located on a large pair of autosomes, which have a secondary constriction in the latter species. In R. planaltica, the NOR is located on the ancestral part of the neo-X chromosome. Meiotic analysis and FISH results in R. planaltica led to the conclusion that the neo-XY system arose by insertion of the ancestral X chromosome into an autosome. Genomic in situ hybridization, performed for the first time in Odonata, highlighted the entire neo-Y chromosome in meiosis of R. bonariensis, suggesting that it consists mainly of repetitive DNA. This feature and the terminal chiasma localization suggest an ancient origin of the neo-XY system. Our study provides new information on the origin and evolution of neo-sex chromosomes in Odonata, including new types of chromosomal rearrangements, NOR transposition, and heterochromatin accumulation.
Affiliation(s)
- Liliana M. Mola
- Laboratory of Cytogenetics and Evolution, Faculty of Exact and Natural Sciences, University of Buenos Aires, Buenos Aires C1428EGA, Argentina
- Institute of Ecology, Genetics and Evolution of Buenos Aires, National Council of Scientific and Technical Research, Buenos Aires C1428EGA, Argentina
- Iva Vrbová
- Biology Centre CAS, Institute of Entomology, Branišovská 31, 370 05 České Budějovice, Czech Republic
- Biology Centre CAS, Institute of Plant Molecular Biology, Branišovská 31, 370 05 České Budějovice, Czech Republic
- Daniela S. Tosto
- Instituto de Agrobiotecnología y Biología Molecular (IABIMO), Instituto Nacional de Tecnología Agropecuaria (INTA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Hurlingham, Buenos Aires 1686, Argentina
- Magda Zrzavá
- Biology Centre CAS, Institute of Entomology, Branišovská 31, 370 05 České Budějovice, Czech Republic
- Department of Molecular Biology and Genetics, Faculty of Science, University of South Bohemia, Branišovská 1760, 370 05 České Budějovice, Czech Republic
- František Marec
- Biology Centre CAS, Institute of Entomology, Branišovská 31, 370 05 České Budějovice, Czech Republic
3
Ávila Robledillo L, Neumann P, Koblížková A, Novák P, Vrbová I, Macas J. Extraordinary Sequence Diversity and Promiscuity of Centromeric Satellites in the Legume Tribe Fabeae. Mol Biol Evol 2020; 37:2341-2356. [PMID: 32259249 PMCID: PMC7403623 DOI: 10.1093/molbev/msaa090] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 12/22/2022]
Abstract
Satellite repeats are major sequence constituents of centromeres in many plant and animal species. Within a species, a single family of satellite sequences typically occupies centromeres of all chromosomes and is absent from other parts of the genome. Due to their common origin, sequence similarities exist among the centromere-specific satellites in related species. Here, we report a remarkably different pattern of centromere evolution in the plant tribe Fabeae, which includes genera Pisum, Lathyrus, Vicia, and Lens. By immunoprecipitation of centromeric chromatin with CENH3 antibodies, we identified and characterized a large and diverse set of 64 families of centromeric satellites in 14 species. These families differed in their nucleotide sequence, monomer length (33-2,979 bp), and abundance in individual species. Most families were species-specific, and most species possessed multiple (2-12) satellites in their centromeres. Some of the repeats that were shared by several species exhibited promiscuous patterns of centromere association, being located within CENH3 chromatin in some species, but apart from the centromeres in others. Moreover, FISH experiments revealed that the same family could assume centromeric and noncentromeric positions even within a single species. Taken together, these findings suggest that Fabeae centromeres are not shaped by the coevolution of a single centromeric satellite with its interacting CENH3 proteins, as proposed by the centromere drive model. This conclusion is also supported by the absence of pervasive adaptive evolution of CENH3 sequences retrieved from Fabeae species.
Affiliation(s)
- Laura Ávila Robledillo
- Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Faculty of Science, University of South Bohemia, České Budějovice, Czech Republic
- Pavel Neumann
- Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Andrea Koblížková
- Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Petr Novák
- Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Iva Vrbová
- Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Jiří Macas
- Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
4
Kreplak J, Madoui MA, Cápal P, Novák P, Labadie K, Aubert G, Bayer PE, Gali KK, Syme RA, Main D, Klein A, Bérard A, Vrbová I, Fournier C, d'Agata L, Belser C, Berrabah W, Toegelová H, Milec Z, Vrána J, Lee H, Kougbeadjo A, Térézol M, Huneau C, Turo CJ, Mohellibi N, Neumann P, Falque M, Gallardo K, McGee R, Tar'an B, Bendahmane A, Aury JM, Batley J, Le Paslier MC, Ellis N, Warkentin TD, Coyne CJ, Salse J, Edwards D, Lichtenzveig J, Macas J, Doležel J, Wincker P, Burstin J. A reference genome for pea provides insight into legume genome evolution. Nat Genet 2019; 51:1411-1422. [PMID: 31477930 DOI: 10.1038/s41588-019-0480-1] [Citation(s) in RCA: 230] [Impact Index Per Article: 46.0] [Received: 12/28/2018] [Accepted: 07/10/2019] [Indexed: 02/03/2023]
Abstract
We report the first annotated chromosome-level reference genome assembly for pea, Gregor Mendel's original genetic model. Phylogenetics and paleogenomics show genomic rearrangements across legumes and suggest a major role for repetitive elements in pea genome evolution. Compared to other sequenced Leguminosae genomes, the pea genome shows intense gene dynamics, most likely associated with genome size expansion when the Fabeae diverged from its sister tribes. During Pisum evolution, translocation and transposition differentially occurred across lineages. This reference sequence will accelerate our understanding of the molecular basis of agronomically important traits and support crop improvement.
Affiliation(s)
- Jonathan Kreplak
- Agroécologie, AgroSup Dijon, INRA, Université Bourgogne, Université Bourgogne Franche-Comté, Dijon, France
- Mohammed-Amin Madoui
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Université Evry, Université Paris-Saclay, Evry, France
- Petr Cápal
- Institute of Experimental Botany, Centre of the Region Haná for Biotechnological and Agricultural Research, Olomouc, Czech Republic
- Petr Novák
- Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Karine Labadie
- Genoscope, Institut François Jacob, CEA, Université Paris-Saclay, Evry, France
- Grégoire Aubert
- Agroécologie, AgroSup Dijon, INRA, Université Bourgogne, Université Bourgogne Franche-Comté, Dijon, France
- Philipp E Bayer
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, Western Australia, Australia
- Krishna K Gali
- Crop Development Centre/Department of Plant Sciences, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Robert A Syme
- Centre for Crop and Disease Management, Curtin University, Bentley, Western Australia, Australia
- Dorrie Main
- Department of Horticulture, Washington State University, Pullman, WA, USA
- Anthony Klein
- Agroécologie, AgroSup Dijon, INRA, Université Bourgogne, Université Bourgogne Franche-Comté, Dijon, France
- Aurélie Bérard
- Etude du Polymorphisme des Génomes Végétaux, INRA, Université Paris-Saclay, Evry, France
- Iva Vrbová
- Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Cyril Fournier
- Agroécologie, AgroSup Dijon, INRA, Université Bourgogne, Université Bourgogne Franche-Comté, Dijon, France
- Leo d'Agata
- Genoscope, Institut François Jacob, CEA, Université Paris-Saclay, Evry, France
- Caroline Belser
- Genoscope, Institut François Jacob, CEA, Université Paris-Saclay, Evry, France
- Wahiba Berrabah
- Genoscope, Institut François Jacob, CEA, Université Paris-Saclay, Evry, France
- Helena Toegelová
- Institute of Experimental Botany, Centre of the Region Haná for Biotechnological and Agricultural Research, Olomouc, Czech Republic
- Zbyněk Milec
- Institute of Experimental Botany, Centre of the Region Haná for Biotechnological and Agricultural Research, Olomouc, Czech Republic
- Jan Vrána
- Institute of Experimental Botany, Centre of the Region Haná for Biotechnological and Agricultural Research, Olomouc, Czech Republic
- HueyTyng Lee
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, Western Australia, Australia
- Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University, Giessen, Germany
- Ayité Kougbeadjo
- Agroécologie, AgroSup Dijon, INRA, Université Bourgogne, Université Bourgogne Franche-Comté, Dijon, France
- Morgane Térézol
- Agroécologie, AgroSup Dijon, INRA, Université Bourgogne, Université Bourgogne Franche-Comté, Dijon, France
- Cécile Huneau
- UMR 1095 Génétique, Diversité, Ecophysiologie des Céréales, INRA, Université Clermont Auvergne, Clermont-Ferrand, France
- Chala J Turo
- Centre for Crop and Disease Management, School of Molecular and Life Science, Curtin University, Bentley, Western Australia, Australia
- Pavel Neumann
- Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Matthieu Falque
- GQE-Le Moulon, INRA, University of Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
- Karine Gallardo
- Agroécologie, AgroSup Dijon, INRA, Université Bourgogne, Université Bourgogne Franche-Comté, Dijon, France
- Rebecca McGee
- USDA Agricultural Research Service, Pullman, WA, USA
- Bunyamin Tar'an
- Crop Development Centre/Department of Plant Sciences, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Abdelhafid Bendahmane
- Institute of Plant Sciences Paris-Saclay, INRA, CNRS, University of Paris-Sud, University of Evry, University Paris-Diderot, Sorbonne Paris-Cite, University of Paris-Saclay, Orsay, France
- Jean-Marc Aury
- Genoscope, Institut François Jacob, CEA, Université Paris-Saclay, Evry, France
- Jacqueline Batley
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, Western Australia, Australia
- Noel Ellis
- School of Biological Sciences, University of Auckland, Auckland, New Zealand
- Thomas D Warkentin
- Crop Development Centre/Department of Plant Sciences, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Jérôme Salse
- UMR 1095 Génétique, Diversité, Ecophysiologie des Céréales, INRA, Université Clermont Auvergne, Clermont-Ferrand, France
- David Edwards
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, Western Australia, Australia
- Judith Lichtenzveig
- School of Agriculture and Environment, University of Western Australia, Perth, Western Australia, Australia
- Jiří Macas
- Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Jaroslav Doležel
- Institute of Experimental Botany, Centre of the Region Haná for Biotechnological and Agricultural Research, Olomouc, Czech Republic
- Patrick Wincker
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Université Evry, Université Paris-Saclay, Evry, France
- Judith Burstin
- Agroécologie, AgroSup Dijon, INRA, Université Bourgogne, Université Bourgogne Franche-Comté, Dijon, France
5
Ávila Robledillo L, Koblížková A, Novák P, Böttinger K, Vrbová I, Neumann P, Schubert I, Macas J. Satellite DNA in Vicia faba is characterized by remarkable diversity in its sequence composition, association with centromeres, and replication timing. Sci Rep 2018; 8:5838. [PMID: 29643436 PMCID: PMC5895790 DOI: 10.1038/s41598-018-24196-3] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 11/24/2017] [Accepted: 03/28/2018] [Indexed: 11/17/2022]
Abstract
Satellite DNA, a class of repetitive sequences forming long arrays of tandemly repeated units, represents substantial portions of many plant genomes yet remains poorly characterized due to various methodological obstacles. Here we show that the genome of the field bean (Vicia faba, 2n = 12), a long-established model for cytogenetic studies in plants, contains a diverse set of satellite repeats, most of which remained concealed until their present investigation. Using next-generation sequencing combined with novel bioinformatics tools, we reconstructed consensus sequences of 23 novel satellite repeats representing 0.008–2.700% of the genome and mapped their distribution on chromosomes. We found that in addition to typical satellites with monomers hundreds of nucleotides long, V. faba contains a large number of satellite repeats with unusually long monomers (687–2033 bp), which are predominantly localized in pericentromeric regions. Using chromatin immunoprecipitation with CenH3 antibody, we revealed an extraordinary diversity of centromeric satellites, consisting of seven repeats with chromosome-specific distribution. We also found that in spite of their different nucleotide sequences, all centromeric repeats are replicated during mid-S phase, while most other satellites are replicated in the first part of late S phase, followed by a single family of FokI repeats representing the latest replicating chromatin.
Affiliation(s)
- Laura Ávila Robledillo
- Biology Centre of the Czech Academy of Sciences, Institute of Plant Molecular Biology, České Budějovice, 37005, Czech Republic
- University of South Bohemia, Faculty of Science, České Budějovice, 37005, Czech Republic
- Andrea Koblížková
- Biology Centre of the Czech Academy of Sciences, Institute of Plant Molecular Biology, České Budějovice, 37005, Czech Republic
- Petr Novák
- Biology Centre of the Czech Academy of Sciences, Institute of Plant Molecular Biology, České Budějovice, 37005, Czech Republic
- Katharina Böttinger
- Biology Centre of the Czech Academy of Sciences, Institute of Plant Molecular Biology, České Budějovice, 37005, Czech Republic
- University of South Bohemia, Faculty of Science, České Budějovice, 37005, Czech Republic
- Iva Vrbová
- Biology Centre of the Czech Academy of Sciences, Institute of Plant Molecular Biology, České Budějovice, 37005, Czech Republic
- Pavel Neumann
- Biology Centre of the Czech Academy of Sciences, Institute of Plant Molecular Biology, České Budějovice, 37005, Czech Republic
- Ingo Schubert
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466, Gatersleben, Stadt Seeland, Germany
- Jiří Macas
- Biology Centre of the Czech Academy of Sciences, Institute of Plant Molecular Biology, České Budějovice, 37005, Czech Republic
6
Novák P, Ávila Robledillo L, Koblížková A, Vrbová I, Neumann P, Macas J. TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads. Nucleic Acids Res 2017; 45:e111. [PMID: 28402514 PMCID: PMC5499541 DOI: 10.1093/nar/gkx257] [Citation(s) in RCA: 164] [Impact Index Per Article: 23.4] [Received: 01/25/2017] [Revised: 03/23/2017] [Accepted: 04/04/2017] [Indexed: 12/21/2022]
Abstract
Satellite DNA is one of the major classes of repetitive DNA, characterized by tandemly arranged repeat copies that form contiguous arrays up to megabases in length. This type of genomic organization makes satellite DNA difficult to assemble, which hampers characterization of satellite sequences by computational analysis of genomic contigs. Here, we present tandem repeat analyzer (TAREAN), a novel computational pipeline that circumvents this problem by detecting satellite repeats directly from unassembled short reads. The pipeline first employs graph-based sequence clustering to identify groups of reads that represent repetitive elements. Putative satellite repeats are subsequently detected by the presence of circular structures in their cluster graphs. Consensus sequences of repeat monomers are then reconstructed from the most frequent k-mers obtained by decomposing read sequences from corresponding clusters. The pipeline performance was successfully validated by analyzing low-pass genome sequencing data from five plant species where satellite DNA was previously experimentally characterized. Moreover, novel satellite repeats were predicted for the genome of Vicia faba and three of these repeats were verified by detecting their sequences on metaphase chromosomes using fluorescence in situ hybridization.
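The last two steps of the pipeline, detecting a circular structure and reconstructing the monomer from the most frequent k-mers, can be illustrated with a toy Python sketch. This is not TAREAN's implementation. It skips the graph-based read clustering, assumes the input reads already belong to a single satellite cluster, and checks circularity with a greedy walk over k-mers rather than on the read-similarity cluster graph the pipeline uses. The helper names `count_kmers` and `reconstruct_monomer`, the value k = 6, and the 16-bp toy monomer are illustrative assumptions.

```python
from collections import Counter


def count_kmers(reads, k):
    """Hypothetical helper: decompose every read into overlapping k-mers and count them."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def reconstruct_monomer(reads, k, max_steps=10_000):
    """Toy consensus reconstruction (hypothetical helper, not TAREAN's code).

    Starts from the most frequent k-mer and repeatedly extends by the base
    whose successor k-mer is most frequent. Returns one rotation of the
    monomer if the walk closes back on the start k-mer, otherwise None.
    """
    counts = count_kmers(reads, k)
    if not counts:
        return None  # no read is at least k bases long
    start = counts.most_common(1)[0][0]
    current, monomer, visited = start, [], set()
    for _ in range(max_steps):
        visited.add(current)
        suffix = current[1:]
        # Most frequent successor wins; rare (e.g. error-containing) k-mers lose.
        best_count, best_base = max((counts[suffix + b], b) for b in "ACGT")
        if best_count == 0:
            return None  # dead end: no circular structure from this start
        monomer.append(current[0])
        current = suffix + best_base
        if current == start:
            return "".join(monomer)  # walk closed: tandem-repeat signature
        if current in visited:
            return None  # looped without returning to the start k-mer
    return None


# Toy data (assumption): error-free reads tiled across a 10-copy tandem array
# of a 16-bp monomer, plus one extra read carrying a single substitution.
MONOMER = "AACGTGCTTAGCCATG"
array = MONOMER * 10
reads = [array[i:i + 40] for i in range(0, len(array) - 40 + 1, 5)]
reads.append(reads[0][:10] + "T" + reads[0][11:])

consensus = reconstruct_monomer(reads, k=6)
print(consensus)
```

A sequencing error only creates rare k-mers, so always extending with the most frequent successor steps over the mutated read. In this simplified form, the walk closing on its starting k-mer stands in for the circular graph structure that marks a tandem repeat. The result is one rotation of the monomer, because a circular repeat has no natural start; a unique, non-repetitive sequence instead runs into a dead end and yields None.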
Affiliation(s)
- Petr Novák
- Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic
- Laura Ávila Robledillo
- Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic
- Andrea Koblížková
- Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic
- Iva Vrbová
- Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic
- Pavel Neumann
- Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic
- Jiří Macas
- Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic