1
Geethanjali S, Kadirvel P, Anumalla M, Hemanth Sadhana N, Annamalai A, Ali J. Streamlining of Simple Sequence Repeat Data Mining Methodologies and Pipelines for Crop Scanning. PLANTS (BASEL, SWITZERLAND) 2024; 13:2619. [PMID: 39339594 PMCID: PMC11435353 DOI: 10.3390/plants13182619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2024] [Revised: 08/18/2024] [Accepted: 08/29/2024] [Indexed: 09/30/2024]
Abstract
Genetic markers are powerful tools for understanding genetic diversity and the molecular basis of traits, ushering in a new era of molecular breeding in crops. Over the past 50 years, DNA markers have changed rapidly, moving from hybridization-based and second-generation-based to sequence-based markers. Simple sequence repeats (SSRs) are ideal markers in plant breeding, with numerous desirable properties, including repeatability, codominance, multi-allelic nature, and locus specificity. They can be generated for any species, although this requires prior sequence knowledge. SSRs may serve as evolutionary tuning knobs, allowing for rapid identification and adaptation to new circumstances. The evaluations published thus far have mostly ignored SSR polymorphism and gene evolution due to a lack of data regarding the precise placements of SSRs on chromosomes. However, NGS technologies have made it possible to produce high-throughput SSRs for any species using massive volumes of genomic sequence data that can be generated quickly and at minimal cost. Though SNP markers are gradually replacing the erstwhile DNA marker systems, SSRs remain the markers of choice in orphan crops due to the lack of genomic resources at the reference level and their adaptability to resource-limited laboratories. Several bioinformatic approaches and tools have evolved to handle genomic sequences to identify SSRs and generate primers for genotyping applications in plant breeding projects. This paper covers the currently available methodologies for producing SSR markers, genomic resource databases, and computational tools/pipelines for SSR data mining and primer generation. This review aims to provide a 'one-stop shop' of information to help each new user carefully select tools for identifying and utilizing SSRs in genetic research and breeding programs.
Affiliation(s)
- Subramaniam Geethanjali
- Department of Plant Biotechnology, Centre for Plant Molecular Biology and Biotechnology, Tamil Nadu Agricultural University, Coimbatore 641003, India
- Palchamy Kadirvel
- Crop Improvement Section, ICAR-Indian Institute of Oilseeds Research, Rajendranagar, Hyderabad 500030, India
- Mahender Anumalla
- Rice Breeding Innovation Platform, International Rice Research Institute (IRRI), Los Baños 4031, Laguna, Philippines
- IRRI South Asia Hub, Patancheru, Hyderabad 502324, India
- Nithyananth Hemanth Sadhana
- Department of Plant Biotechnology, Centre for Plant Molecular Biology and Biotechnology, Tamil Nadu Agricultural University, Coimbatore 641003, India
- Anandan Annamalai
- Indian Council of Agricultural Research (ICAR), Indian Institute of Seed Science, Bengaluru 560065, India
- Jauhar Ali
- Rice Breeding Innovation Platform, International Rice Research Institute (IRRI), Los Baños 4031, Laguna, Philippines
2
Aidley J, Wanford JJ, Green LR, Sheppard SK, Bayliss CD. PhasomeIt: an 'omics' approach to cataloguing the potential breadth of phase variation in the genus Campylobacter. Microb Genom 2018; 4:e000228. [PMID: 30351264 PMCID: PMC6321876 DOI: 10.1099/mgen.0.000228] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 07/24/2018] [Accepted: 09/13/2018] [Indexed: 11/18/2022] Open
Abstract
Hypermutable simple sequence repeats (SSRs) are drivers of phase variation (PV) whose stochastic, high-frequency, reversible switches in gene expression are a common feature of several pathogenic bacterial species, including the human pathogen Campylobacter jejuni. Here we examine the distribution and conservation of known and putative SSR-driven phase variable genes - the phasome - in the genus Campylobacter. PhasomeIt, a new program, was specifically designed for rapid identification of SSR-mediated PV. This program detects the location, type and repeat number of every SSR. Each SSR is linked to a specific gene and its putative expression state. Other outputs include conservation of SSR-driven phase-variable genes and the 'core phasome' - the minimal set of PV genes in a phylogenetic grouping. Analysis of 77 complete Campylobacter genome sequences detected a 'core phasome' of conserved PV genes in each species and a large number of rare PV genes with few, or no, homologues in other genome sequences. Analysis of a set of partial genome sequences, with food-chain-associated metadata, detected evidence of a weak link between phasome and source host for disease-causing isolates of sequence type (ST)-828 but not the ST-21 or ST-45 complexes. Investigation of the phasomes in the genus Campylobacter provided evidence of overlapping but distinctive mechanisms of PV-mediated adaptation to specific niches. This suggests that the phasome could be involved in host adaptation and spread of campylobacters. Finally, this tool is malleable and will have utility for studying the distribution and genic effects of other repetitive elements in diverse bacterial species.
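To make concrete what reporting the location, type and repeat number of an SSR involves, here is a minimal Python sketch. It is not PhasomeIt's algorithm: the function name `find_ssrs`, the default of five repeats, and the restriction to perfect repeats of 1 to 6 bp units are all assumptions chosen for illustration, and linking each SSR to a gene is not attempted.

```python
import re

def find_ssrs(seq, min_unit=1, max_unit=6, min_repeats=5):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats.

    Hypothetical helper for illustration; thresholds are assumptions.
    """
    seq = seq.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # A unit of fixed length followed by at least (min_repeats - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. "AA", "ATAT").
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)

example = "TTG" + "AC" * 6 + "GGT"
print(find_ssrs(example))  # [(3, 'AC', 6)]
```

Positions are zero-based. A real phase-variation pipeline would also need annotation data to decide whether a repeat lies in a coding or promoter region.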
Affiliation(s)
- Jack Aidley
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Joseph J. Wanford
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Luke R. Green
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Samuel K. Sheppard
- Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, UK
3
Complete Chloroplast Genome of the Wollemi Pine (Wollemia nobilis): Structure and Evolution. PLoS One 2015; 10:e0128126. [PMID: 26061691 PMCID: PMC4464890 DOI: 10.1371/journal.pone.0128126] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 02/14/2015] [Accepted: 04/23/2015] [Indexed: 11/19/2022] Open
Abstract
The Wollemi pine (Wollemia nobilis) is a rare Southern conifer with striking morphological similarity to fossil pines. A small population of W. nobilis was discovered in 1994 in a remote canyon system in the Wollemi National Park (near Sydney, Australia). This population contains fewer than 100 individuals and is critically endangered. Previous genetic studies of the Wollemi pine have investigated its evolutionary relationship with other pines in the family Araucariaceae, and have suggested that the Wollemi pine genome contains little or no variation. However, these studies were performed prior to the widespread use of genome sequencing, and their conclusions were based on a limited fraction of the Wollemi pine genome. In this study, we address this problem by determining the entire sequence of the W. nobilis chloroplast genome. A detailed analysis of the structure of the genome is presented, and the evolution of the genome is inferred by comparison with the chloroplast sequences of other members of the Araucariaceae and the related family Podocarpaceae. Pairwise alignments of whole genome sequences, and the presence of unique pseudogenes, gene duplications and insertions in W. nobilis and Araucariaceae, indicate that the W. nobilis chloroplast genome is most similar to that of its sister taxon Agathis. However, the W. nobilis genome contains an unusually high number of repetitive sequences, and these could be used in future studies to investigate and conserve any remnant genetic diversity in the Wollemi pine.
4
Anisimova M, Pečerska J, Schaper E. Statistical approaches to detecting and analyzing tandem repeats in genomic sequences. Front Bioeng Biotechnol 2015; 3:31. [PMID: 25853125 PMCID: PMC4362331 DOI: 10.3389/fbioe.2015.00031] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 12/10/2014] [Accepted: 02/26/2015] [Indexed: 11/13/2022] Open
Abstract
Tandem repeats (TRs) are frequently observed in genomes across all domains of life. Evidence suggests that some TRs are crucial for proteins with fundamental biological functions and can be associated with virulence, resistance, and infectious/neurodegenerative diseases. Genome-scale systematic studies of TRs have the potential to unveil core mechanisms governing TR evolution and TR roles in shaping genomes. However, TR-related studies are often non-trivial due to heterogeneous and sometimes fast evolving TR regions. In this review, we discuss these intricacies and their consequences. We present our recent contributions to computational and statistical approaches for TR significance testing, sequence profile-based TR annotation, TR-aware sequence alignment, phylogenetic analyses of TR unit number and order, and TR benchmarks. Importantly, all these methods explicitly rely on the evolutionary definition of a tandem repeat as a sequence of adjacent repeat units stemming from a common ancestor. The discussed work has a focus on protein TRs, yet is generally applicable to nucleic acid TRs, sharing similar features.
Affiliation(s)
- Maria Anisimova
- Institute of Applied Simulation, School of Life Sciences and Facility Management, Zürich University of Applied Sciences (ZHAW), Wädenswil, Switzerland
- Julija Pečerska
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland; Department of Computer Science, ETH Zürich, Zürich, Switzerland
- Elke Schaper
- Department of Computer Science, ETH Zürich, Zürich, Switzerland; Vital-IT Competency Center, Swiss Institute for Bioinformatics, Lausanne, Switzerland
5
Simpson MC, Wilken PM, Coetzee MPA, Wingfield MJ, Wingfield BD. Analysis of microsatellite markers in the genome of the plant pathogen Ceratocystis fimbriata. Fungal Biol 2013; 117:545-55. [PMID: 23931120 DOI: 10.1016/j.funbio.2013.06.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 10/17/2012] [Revised: 06/13/2013] [Accepted: 06/17/2013] [Indexed: 01/13/2023]
Abstract
Ceratocystis fimbriata sensu lato represents a complex of cryptic and commonly plant pathogenic species that are morphologically similar. Species in this complex have been described using morphological characteristics, intersterility tests and phylogenetics. Microsatellite markers have been useful to study the population structure and origin of some species in the complex. In this study we sequenced the genome of C. fimbriata. This provided an opportunity to mine the genome for microsatellites, to develop new microsatellite markers, and to map previously developed markers onto the genome. Over 6000 microsatellites were identified in the genome, and their abundance and distribution were determined. Ceratocystis fimbriata has a medium level of microsatellite density and a slightly smaller genome when compared with other fungi for which similar microsatellite analyses have been performed. This is the first report of a microsatellite analysis conducted on a genome sequence of a fungal species in the order Microascales. Forty-seven microsatellite markers have been published for population genetic studies, of which 35 could be mapped onto the C. fimbriata genome sequence. We developed an additional ten microsatellite markers within putative genes to differentiate between species in the C. fimbriata s.l. complex. These markers were used to distinguish between 12 species in the complex.
Affiliation(s)
- Melissa C Simpson
- Department of Genetics, Forestry and Agricultural Biotechnology Institute FABI, University of Pretoria, Private Bag X20, Hatfield, Pretoria 0028, South Africa.
6
Meglécz E, Nève G, Biffin E, Gardner MG. Breakdown of phylogenetic signal: a survey of microsatellite densities in 454 shotgun sequences from 154 non model eukaryote species. PLoS One 2012; 7:e40861. [PMID: 22815847 PMCID: PMC3397955 DOI: 10.1371/journal.pone.0040861] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 03/19/2012] [Accepted: 06/14/2012] [Indexed: 11/19/2022] Open
Abstract
Microsatellites are ubiquitous in Eukaryotic genomes. A more complete understanding of their origin and spread can be gained from a comparison of their distribution within a phylogenetic context. Although information for model species is accumulating rapidly, it is insufficient due to a lack of species depth, thus intragroup variation is necessarily ignored. As such, apparent differences between groups may be overinflated and generalizations cannot be inferred until an analysis of the variation that exists within groups has been conducted. In this study, we examined microsatellite coverage and motif patterns from 454 shotgun sequences of 154 Eukaryote species from eight distantly related phyla (Cnidaria, Arthropoda, Onychophora, Bryozoa, Mollusca, Echinodermata, Chordata and Streptophyta) to test if a consistent phylogenetic pattern emerges from the microsatellite composition of these species. It is clear from our results that data from model species provide incomplete information regarding the existing microsatellite variability within the Eukaryotes. A very strong heterogeneity of microsatellite composition was found within most phyla, classes and even orders. Autocorrelation analyses indicated that while microsatellite contents of species within clades more recent than 200 Mya tend to be similar, the autocorrelation breaks down and becomes negative or non-significant with increasing divergence time. Therefore, the age of the taxon seems to be a primary factor in degrading the phylogenetic pattern present among related groups. The most recent classes or orders of Chordates still retain the pattern of their common ancestor. However, within older groups, such as classes of Arthropods, the phylogenetic pattern has been scrambled by the long independent evolution of the lineages.
Affiliation(s)
- Emese Meglécz
- IMBE UMR 7263 CNRS IRD, Aix-Marseille University, Marseille, France.
8
Hamarsheh O, Amro A. Characterization of simple sequence repeats (SSRs) from Phlebotomus papatasi (Diptera: Psychodidae) expressed sequence tags (ESTs). Parasit Vectors 2011; 4:189. [PMID: 21958493 PMCID: PMC3191335 DOI: 10.1186/1756-3305-4-189] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 09/01/2011] [Accepted: 09/29/2011] [Indexed: 10/31/2022] Open
Abstract
BACKGROUND: Phlebotomus papatasi is a natural vector of Leishmania major, which causes cutaneous leishmaniasis in many countries. Simple sequence repeats (SSRs), or microsatellites, are common in eukaryotic genomes and are short, repeated nucleotide sequence elements arrayed in tandem and flanked by non-repetitive regions. The enrichment methods used previously for finding new microsatellite loci in sand flies remain laborious and time-consuming; in silico mining, which involves retrieving and screening microsatellites from large amounts of data in sequence databases using microsatellite search tools, can yield many new candidate markers. RESULTS: SSRs were characterized in P. papatasi expressed sequence tags (ESTs) derived from a public database of the National Center for Biotechnology Information (NCBI). A total of 42,784 sequences were mined, and 1,499 SSRs were identified with a frequency of 3.5% and an average density of 15.55 kb per SSR. Dinucleotide motifs were the most common SSRs, accounting for 67%, followed by tri-, tetra-, and penta-nucleotide repeats, accounting for 31.1%, 1.5%, and 0.1%, respectively. The length of microsatellites varied from 5 to 16 repeats. Among the dinucleotide types, AG and CT had the highest frequency. Dinucleotide SSR-ESTs are relatively biased toward an excess of (AX)n repeats and a low GC base content. Forty primer pairs were designed based on motif lengths for further experimental validation. CONCLUSION: The first large-scale survey of SSRs derived from P. papatasi is presented; the dinucleotide SSRs identified are more frequent than other types. EST data mining is an effective strategy to identify functional microsatellites in P. papatasi.
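The summary statistics in this abstract can be checked with a few lines of arithmetic. The sketch below assumes that "frequency" means SSRs per sequence mined and that "density" means total sequence length divided by SSR count; neither definition is spelled out in the abstract, and the variable names are illustrative.

```python
# Figures taken from the abstract above.
n_sequences = 42_784   # ESTs mined
n_ssrs = 1_499         # SSRs identified
kb_per_ssr = 15.55     # reported average density

# Assumed definition: SSRs found per 100 sequences screened.
frequency_pct = 100 * n_ssrs / n_sequences

# Assumed definition: density = total length / SSR count, so the
# implied total length of screened sequence is count * density.
implied_total_kb = n_ssrs * kb_per_ssr

print(f"frequency: {frequency_pct:.1f}%")
print(f"implied total sequence screened: {implied_total_kb:,.0f} kb")
```

Under these assumptions the reported 3.5% is reproduced, and the density figure implies roughly 23 Mb of EST sequence in total.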
Affiliation(s)
- Omar Hamarsheh
- Department of Biological Sciences, Faculty of Science and Technology, Al-Quds University, PO Box 51000, Jerusalem, Palestine.
9
Freschi V, Bogliolo A. A monte carlo method for assessing the quality of duplication-aware alignment algorithms. Evol Bioinform Online 2011; 7:31-40. [PMID: 21698090 PMCID: PMC3118696 DOI: 10.4137/ebo.s6662] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/06/2022] Open
Abstract
The increasing availability of high-throughput sequencing technologies poses several challenges concerning the analysis of genomic data. Within this context, duplication-aware sequence alignment that takes complex mutation events into account is regarded as an important problem, particularly in light of recent evolutionary bioinformatics research that has highlighted the role of tandem duplications as one of the most important mutation events. Traditional sequence comparison algorithms do not take these events into account, resulting in poor alignments in terms of biological significance, mainly because of their assumption of statistical independence among contiguous residues. Several duplication-aware algorithms have been proposed in recent years, which differ either in the type of duplications they consider or in the methods adopted to identify and compare them. However, there is no solution that clearly outperforms the others, and no methods exist for assessing the reliability of the resulting alignments. This paper proposes a Monte Carlo method for assessing the quality of duplication-aware alignment algorithms and for driving the choice of the most appropriate alignment technique to be used in a specific context. The applicability and usefulness of the proposed approach are demonstrated on a case study, namely, the comparison of alignments based on edit distance with or without repeat masking.
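The case study mentioned above compares alignments based on edit distance. As background, the following Python sketch implements the classic (Levenshtein) edit distance by dynamic programming. It is the textbook algorithm, not the authors' Monte Carlo procedure or any duplication-aware variant, and the function name `edit_distance` is illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

# One extra copy of a 2-bp unit is charged as two independent edits.
print(edit_distance("ACACAC", "ACAC"))  # 2
```

The example hints at why plain edit distance treats a single tandem duplication as several unrelated events, which is the limitation that duplication-aware methods aim to address.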
Affiliation(s)
- Valerio Freschi
- DiSBeF-Department of Base Sciences and Fundamentals, University of Urbino, Italy
10
Survey and analysis of simple sequence repeats in the Laccaria bicolor genome, with development of microsatellite markers. Curr Genet 2010; 57:75-88. [DOI: 10.1007/s00294-010-0328-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 06/04/2010] [Revised: 11/02/2010] [Accepted: 11/16/2010] [Indexed: 10/18/2022]
11
Ueno S, Le Provost G, Léger V, Klopp C, Noirot C, Frigerio JM, Salin F, Salse J, Abrouk M, Murat F, Brendel O, Derory J, Abadie P, Léger P, Cabane C, Barré A, de Daruvar A, Couloux A, Wincker P, Reviron MP, Kremer A, Plomion C. Bioinformatic analysis of ESTs collected by Sanger and pyrosequencing methods for a keystone forest tree species: oak. BMC Genomics 2010; 11:650. [PMID: 21092232 PMCID: PMC3017864 DOI: 10.1186/1471-2164-11-650] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.3] [Received: 05/28/2010] [Accepted: 11/23/2010] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND: The Fagaceae family comprises about 1,000 woody species worldwide. About half belong to the genus Quercus. These oaks are often a source of raw material for biomass wood and fiber. Pedunculate and sessile oaks are among the most important deciduous forest tree species in Europe. Despite their ecological and economic importance, very few genomic resources have yet been generated for these species. Here, we describe the development of an EST catalogue that will support ecosystem genomics studies, where geneticists, ecophysiologists, molecular biologists and ecologists join their efforts for understanding, monitoring and predicting functional genetic diversity. RESULTS: We generated 145,827 sequence reads from 20 cDNA libraries using the Sanger method. Unexploitable chromatograms and quality checking led us to eliminate 19,941 sequences. Finally, a total of 125,925 ESTs were retained from 111,361 cDNA clones. Pyrosequencing was also conducted for 14 libraries, generating 1,948,579 reads, from which 370,566 sequences (19.0%) were eliminated, resulting in 1,578,192 sequences. Following clustering and assembly using the TGICL pipeline, 1,704,117 EST sequences collapsed into 69,154 tentative contigs and 153,517 singletons, providing 222,671 non-redundant sequences (including alternative transcripts). We also assembled the sequences using MIRA and PartiGene software and compared the three unigene sets. Gene ontology annotation was then assigned to 29,303 unigene elements. A BLAST search against the SWISS-PROT database revealed putative homologs for 32,810 (14.7%) unigene elements, but a more extensive search with Pfam, Refseq_protein, Refseq_RNA and eight gene indices revealed homology for 67.4% of them. The EST catalogue was examined for putative homologs of candidate genes involved in bud phenology, cuticle formation, phenylpropanoid biosynthesis and cell wall formation. Our results suggest a good coverage of genes involved in these traits. Comparative orthologous sequences (COS) with other plant gene models were identified, allowing the oak paleo-history to be unravelled. Simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) were searched, resulting in 52,834 SSRs and 36,411 SNPs. All of these are available through the Oak Contig Browser http://genotoul-contigbrowser.toulouse.inra.fr:9092/Quercus_robur/index.html. CONCLUSIONS: This genomic resource provides a unique tool to discover genes of interest, study the oak transcriptome, and develop new markers to investigate functional diversity in natural populations.
Affiliation(s)
- Saneyoshi Ueno
- INRA, UMR 1202 BIOGECO, 69 route d'Arcachon, F-33612 Cestas, France
- Forestry and Forest Products Research Institute, Department of Forest Genetics, Tree Genetics Laboratory, 1 Matsunosato, Tsukuba, Ibaraki, 305-8687, Japan
- Valérie Léger
- INRA, UMR 1202 BIOGECO, 69 route d'Arcachon, F-33612 Cestas, France
- Christophe Klopp
- Plateforme bioinformatique Genotoul, UR875 Biométrie et Intelligence Artificielle, INRA, 31326 Castanet-Tolosan, France
- Céline Noirot
- Plateforme bioinformatique Genotoul, UR875 Biométrie et Intelligence Artificielle, INRA, 31326 Castanet-Tolosan, France
- Franck Salin
- INRA, UMR 1202 BIOGECO, 69 route d'Arcachon, F-33612 Cestas, France
- Jérôme Salse
- INRA/UBP UMR 1095, Laboratoire Génétique, Diversité et Ecophysiologie des Céréales, 234 avenue du Brézet, 63100 Clermont Ferrand, France
- Michael Abrouk
- INRA/UBP UMR 1095, Laboratoire Génétique, Diversité et Ecophysiologie des Céréales, 234 avenue du Brézet, 63100 Clermont Ferrand, France
- Florent Murat
- INRA/UBP UMR 1095, Laboratoire Génétique, Diversité et Ecophysiologie des Céréales, 234 avenue du Brézet, 63100 Clermont Ferrand, France
- Oliver Brendel
- INRA, UMR1137 EEF "Ecologie et Ecophysiologie Forestières", F 54280 Champenoux, France
- Jérémy Derory
- INRA, UMR 1202 BIOGECO, 69 route d'Arcachon, F-33612 Cestas, France
- Pierre Abadie
- INRA, UMR 1202 BIOGECO, 69 route d'Arcachon, F-33612 Cestas, France
- Patrick Léger
- INRA, UMR 1202 BIOGECO, 69 route d'Arcachon, F-33612 Cestas, France
- Cyril Cabane
- Université de Bordeaux, Centre de Bioinformatique de Bordeaux, Bordeaux, France
- CNRS, UMR 5800, Laboratoire Bordelais de Recherche en Informatique, Talence, France
- Aurélien Barré
- Université de Bordeaux, Centre de Bioinformatique de Bordeaux, Bordeaux, France
- Antoine de Daruvar
- Université de Bordeaux, Centre de Bioinformatique de Bordeaux, Bordeaux, France
- CNRS, UMR 5800, Laboratoire Bordelais de Recherche en Informatique, Talence, France
- Arnaud Couloux
- CEA, DSV, Genoscope, Centre National de Séquençage, 2 rue Gaston Crémieux CP5706 91057 Evry cedex, France
- Patrick Wincker
- CEA, DSV, Genoscope, Centre National de Séquençage, 2 rue Gaston Crémieux CP5706 91057 Evry cedex, France
- Antoine Kremer
- INRA, UMR 1202 BIOGECO, 69 route d'Arcachon, F-33612 Cestas, France
12
Murat C, Riccioni C, Belfiori B, Cichocki N, Labbé J, Morin E, Tisserant E, Paolocci F, Rubini A, Martin F. Distribution and localization of microsatellites in the Perigord black truffle genome and identification of new molecular markers. Fungal Genet Biol 2010; 48:592-601. [PMID: 20965267 DOI: 10.1016/j.fgb.2010.10.007] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.3] [Received: 07/26/2010] [Revised: 09/29/2010] [Accepted: 10/13/2010] [Indexed: 10/18/2022]
Abstract
The level of genetic diversity and genetic structure in the Perigord black truffle (Tuber melanosporum Vittad.) has been debated for several years, mainly due to the lack of appropriate genetic markers. Microsatellites or simple sequence repeats (SSRs) are important for genome organisation and phenotypic diversity and are among the most popular molecular markers. In this study, we surveyed the T. melanosporum genome (1) to characterise its SSR pattern; (2) to compare it with SSR patterns found in 48 other fungal and three oomycete genomes; and (3) to identify new polymorphic SSR markers for population genetics. The T. melanosporum genome is rich in SSRs, with 22,425 SSRs, mono-nucleotides being the most frequent motifs. SSRs were found in all genomic regions, although they are more frequent in non-coding regions (introns and intergenic regions). Sixty out of 135 PCR-amplified mono-, di-, tri-, tetra-, penta-, and hexa-nucleotides were polymorphic (44%) within black truffle populations, and 27 were randomly selected and analysed on 139 T. melanosporum isolates from France, Italy and Spain. The number of alleles varied from 2 to 18 and the expected heterozygosity from 0.124 to 0.815. One hundred and thirty-two different multilocus genotypes were identified among the 139 T. melanosporum isolates, and the genotypic diversity was high (0.999). Polymorphic SSRs were found in UTR regulatory regions of fruiting body and ectomycorrhiza regulated genes, suggesting that they may play a role in phenotypic variation. In conclusion, the SSRs developed in this study were highly polymorphic, and our results showed that T. melanosporum is a species with substantial genetic diversity, which is in agreement with its recently uncovered heterothallic mating system.
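For readers unfamiliar with the diversity statistics quoted here, the sketch below shows how expected heterozygosity is commonly computed from allele calls at one locus, using the textbook definition He = 1 - sum of squared allele frequencies. Whether the authors used this form or a sample-size-corrected one is not stated in the abstract, so treat the formula choice, the function name and the toy allele sizes as assumptions.

```python
from collections import Counter

def expected_heterozygosity(alleles):
    """He = 1 - sum(p_i^2), with p_i the frequency of allele i at one locus."""
    n = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# Toy data: fragment sizes (bp) scored at one hypothetical SSR locus.
toy_locus = [150, 150, 152, 152]
print(expected_heterozygosity(toy_locus))  # 0.5

# Share of polymorphic loci reported in the abstract: 60 of 135.
polymorphic_pct = 100 * 60 / 135
print(f"{polymorphic_pct:.0f}%")  # 44%
```

A locus with a single allele gives He = 0, and the value approaches 1 as the number of equally frequent alleles grows, which is why loci with up to 18 alleles reach values such as 0.815.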
Affiliation(s)
- C Murat
- UMR INRA-UHP Interactions Arbres/Micro-Organismes, INRA-Nancy, 54280 Champenoux, France.
13
Kelkar YD, Strubczewski N, Hile SE, Chiaromonte F, Eckert KA, Makova KD. What is a microsatellite: a computational and experimental definition based upon repeat mutational behavior at A/T and GT/AC repeats. Genome Biol Evol 2010; 2:620-35. [PMID: 20668018 PMCID: PMC2940325 DOI: 10.1093/gbe/evq046] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.7] [Indexed: 12/12/2022] Open
Abstract
Microsatellites are abundant in eukaryotic genomes and have high rates of strand slippage-induced repeat number alterations. They are popular genetic markers, and their mutations are associated with numerous neurological diseases. However, the minimal number of repeats required to constitute a microsatellite has been debated, and a definition of a microsatellite that considers its mutational behavior has been lacking. To define a microsatellite, we investigated slippage dynamics for a range of repeat sizes, utilizing two approaches. Computationally, we assessed length polymorphism at repeat loci in ten ENCODE regions resequenced in four human populations, assuming that the occurrence of polymorphism reflects strand slippage rates. Experimentally, we determined the in vitro DNA polymerase-mediated strand slippage error rates as a function of repeat number. In both approaches, we compared strand slippage rates at tandem repeats with the background slippage rates. We observed two distinct modes of mutational behavior. At small repeat numbers, slippage rates were low and indistinguishable from background measurements. A marked transition in mutability was observed as the repeat array lengthened, such that slippage rates at large repeat numbers were significantly higher than the background rates. For both mononucleotide and dinucleotide microsatellites studied, the transition length corresponded to a similar number of nucleotides (approximately 10). Thus, microsatellite threshold is determined not by the presence/absence of strand slippage at repeats but by an abrupt alteration in slippage rates relative to background. These findings have implications for understanding microsatellite mutagenesis, standardization of genome-wide microsatellite analyses, and predicting polymorphism levels of individual microsatellite loci.
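The reported transition at roughly 10 nucleotides translates into different minimum repeat counts for different motif lengths. The sketch below works this out; the helper names are hypothetical, and applying the same 10-nucleotide threshold to motifs longer than two bases is an extrapolation, since the abstract reports it only for mononucleotide and dinucleotide repeats.

```python
import math

def min_repeats(unit_len, threshold_nt=10):
    """Smallest repeat count whose total length reaches the threshold."""
    return math.ceil(threshold_nt / unit_len)

def passes_threshold(unit_len, repeat_count, threshold_nt=10):
    """True if the repeat array is at least threshold_nt nucleotides long."""
    return unit_len * repeat_count >= threshold_nt

print(min_repeats(1))  # 10 (mononucleotide, e.g. A x 10)
print(min_repeats(2))  # 5  (dinucleotide, e.g. GT x 5)
print(passes_threshold(2, 4))  # False: 8 nt is below the threshold
```

Expressing the cutoff in nucleotides rather than repeat units is one way to standardize genome-wide scans across motif classes, in line with the abstract's remark on standardization.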
|
14
|
Ellison CK, Shaw KL. Mining non-model genomic libraries for microsatellites: BAC versus EST libraries and the generation of allelic richness. BMC Genomics 2010; 11:428. [PMID: 20624300 PMCID: PMC2996956 DOI: 10.1186/1471-2164-11-428] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 01/25/2010] [Accepted: 07/12/2010] [Indexed: 11/24/2022]
Abstract
BACKGROUND: Simple sequence repeats (SSRs) are tandemly repeated sequence motifs common in genomic nucleotide sequence that often harbor significant variation in repeat number. Frequently used as molecular markers, SSRs are increasingly identified via in silico approaches. Two common classes of genomic resources that can be mined are bacterial artificial chromosome (BAC) libraries and expressed sequence tag (EST) libraries. RESULTS: 288 SSR loci were screened in the rapidly radiating Hawaiian swordtail cricket genus Laupala. SSRs were more densely distributed and contained longer repeat structures in BAC library-derived sequence than in EST library-derived sequence, although neither repeat density nor length was exceptionally elevated despite the relatively large genome size of Laupala. A non-random distribution favoring AT-rich SSRs was observed. Allelic diversity of SSRs was positively correlated with repeat length and was generally higher in AT-rich repeat motifs. CONCLUSION: The first large-scale survey of Orthopteran SSR allelic diversity is presented. Selection contributes more strongly to the size and density distributions of SSR loci derived from EST library sequence than from BAC library sequence, although all SSRs likely are subject to similar physical and structural constraints, such as slippage of DNA replication machinery, that may generate increased allelic diversity in AT-rich sequence motifs. Although in silico approaches work well for SSR locus identification in both EST and BAC libraries, BAC library sequence and AT-rich repeat motifs are generally superior SSR development resources for most applications.
Affiliation(s)
- Kerry L Shaw
- Department of Neurobiology and Behavior, Cornell University, Ithaca, NY 14850, USA
|
15
|
Mayer C, Leese F, Tollrian R. Genome-wide analysis of tandem repeats in Daphnia pulex--a comparative approach. BMC Genomics 2010; 11:277. [PMID: 20433735 PMCID: PMC3152781 DOI: 10.1186/1471-2164-11-277] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.8] [Received: 09/29/2009] [Accepted: 04/30/2010] [Indexed: 11/10/2022]
Abstract
Background: DNA tandem repeats (TRs) are not just popular molecular markers, but are also important genomic elements from an evolutionary and functional perspective. For various genomes, the densities of short TR types were shown to differ strongly among different taxa and genomic regions. In this study we analysed the TR characteristics in the genomes of Daphnia pulex and 11 other eukaryotic species. Characteristics of TRs in different genomic regions and among different strands are compared in detail for D. pulex and the two model insects Apis mellifera and Drosophila melanogaster. Results: Profound differences in TR characteristics were found among all 12 genomes compared in this study. In D. pulex, the genomic density of TRs was low compared to the arthropod species D. melanogaster and A. mellifera. For these three species, very few common features in repeat type usage, density distribution, and length characteristics were observed in the genomes and in different genomic regions. In introns and coding regions an unexpectedly high strandedness was observed for several repeat motifs. In D. pulex, the density of TRs was highest in introns, a rare feature in animals. In coding regions, the density of TRs with unit sizes of 7-50 bp was more than three times as high as that of 1-6 bp repeats. Conclusions: TRs in the genome of D. pulex show several notable features, which distinguish it from the other genomes. Altogether, the highly non-random distribution of TRs among genomes, genomic regions and even among different DNA strands raises many questions concerning their functional and evolutionary importance. The high density of TRs with a unit size longer than 6 bp found in non-coding and coding regions underscores the importance of including longer TR units in comparative analyses.
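The comparison of TR densities between unit-size classes can be sketched as simple arithmetic. The example assumes density is expressed as repeat base pairs per megabase of sequence, which is one common convention and may differ from the exact definition used in the cited study; the function name `tr_density` and the input records are hypothetical illustration data, not results from the paper.

```python
def tr_density(repeats, seq_len_bp, classes=((1, 6), (7, 50))):
    """Sum repeat array lengths per unit-size class and scale to bp per Mbp.

    repeats: iterable of (unit_size_bp, array_length_bp) tuples.
    classes: inclusive (low, high) unit-size ranges; the defaults mirror the
             1-6 bp and 7-50 bp classes mentioned in the abstract.
    """
    totals = {c: 0 for c in classes}
    for unit, length in repeats:
        for lo, hi in classes:
            if lo <= unit <= hi:
                totals[(lo, hi)] += length
                break  # each repeat is counted in at most one class
    return {c: bp * 1_000_000 / seq_len_bp for c, bp in totals.items()}

# Hypothetical repeat records for a 100 kb region; the 60 bp unit falls
# outside both classes and is ignored.
records = [(2, 20), (3, 30), (12, 120), (60, 600)]
density = tr_density(records, seq_len_bp=100_000)
ratio = density[(7, 50)] / density[(1, 6)]
```

Keeping the classes as a parameter makes it easy to include or exclude longer unit sizes when comparing genomes or genomic regions.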
Affiliation(s)
- Christoph Mayer
- Department of Animal Ecology, Evolution and Biodiversity, Ruhr University Bochum, Bochum, Germany.
|