1
Alves SIA, Dantas CWD, Macedo DB, Ramos RTJ. What are microsatellites and how to choose the best tool: a user-friendly review of SSR and 74 SSR mining tools. Front Genet 2024; 15:1474611. [PMID: 39606018 PMCID: PMC11599195 DOI: 10.3389/fgene.2024.1474611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Accepted: 10/30/2024] [Indexed: 11/29/2024]
Abstract
Microsatellites, also known as simple sequence repeats (SSRs) or short tandem repeats (STRs), are essential molecular markers in genomic research, playing crucial roles in genetic mapping, population genetics, and evolutionary studies. Their applications range from plant breeding to forensics, highlighting their diverse utility across disciplines. Despite their widespread use, traditional methods for SSR analysis are often laborious and time-consuming, requiring significant resources and expertise. To address these challenges, a variety of computational tools for SSR analysis have been developed, offering faster and more efficient alternatives to traditional methods. However, selecting the most appropriate tool can be daunting given rapid technological advances and the sheer number of options available. This study presents a comprehensive review and analysis of 74 SSR tools, aiming to provide researchers with a resource for selecting SSR analysis tools. The methodology includes a thorough literature review, detailed tool comparisons, and in-depth analyses of tool functionality. By compiling and analyzing these tools, this study aims to support informed decision-making in the selection of SSR analysis tools. Researchers seeking to understand SSRs and select the most appropriate tools for their projects will benefit from this comprehensive guide, which is intended to make SSR research more efficient and effective across fields of study.
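The basic operation that SSR mining tools automate is scanning a sequence for short motifs repeated in tandem. The Python sketch below illustrates perfect-SSR detection only; it is not the algorithm of any tool covered by the review, and the minimum repeat counts per motif length (`MIN_REPEATS`) are assumed, MISA-style values chosen for illustration.

```python
# Assumed minimum number of repeat units per motif length (1-6 bp).
# Illustrative values only, not taken from the review.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect SSRs in seq (0-based start)."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        # Try the shortest motif first so each SSR is reported under its minimal unit.
        for k in sorted(min_repeats):
            motif = seq[i:i + k]
            if len(motif) < k:
                break  # too close to the end for this or any longer motif
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT" = "AT" * 2).
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            # Count consecutive, exact copies of the motif starting at i.
            count = 1
            while seq[i + count * k:i + (count + 1) * k] == motif:
                count += 1
            if count >= min_repeats[k]:
                best = (i, motif, count)
                break
        if best:
            hits.append(best)
            i += len(best[1]) * best[2]  # jump past the whole repeat array
        else:
            i += 1
    return hits
```

Trying motif lengths from shortest to longest and skipping motifs that are themselves repeats keeps each SSR reported once under its shortest unit; real tools also handle imperfect and compound repeats, which this sketch ignores.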
Affiliation(s)
- Sandy Ingrid Aguiar Alves
- Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Laboratory of Simulation and Computational Biology — SIMBIC, High Performance Computing Center — CCAD, Federal University of Pará, Belém, Pará, Brazil
- Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Carlos Willian Dias Dantas
- Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Laboratory of Simulation and Computational Biology — SIMBIC, High Performance Computing Center — CCAD, Federal University of Pará, Belém, Pará, Brazil
- Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Daralyns Borges Macedo
- Laboratory of Simulation and Computational Biology — SIMBIC, High Performance Computing Center — CCAD, Federal University of Pará, Belém, Pará, Brazil
- Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Rommel Thiago Jucá Ramos
- Laboratory of Simulation and Computational Biology — SIMBIC, High Performance Computing Center — CCAD, Federal University of Pará, Belém, Pará, Brazil
- Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
2
Geethanjali S, Kadirvel P, Anumalla M, Hemanth Sadhana N, Annamalai A, Ali J. Streamlining of Simple Sequence Repeat Data Mining Methodologies and Pipelines for Crop Scanning. Plants (Basel) 2024; 13:2619. [PMID: 39339594 PMCID: PMC11435353 DOI: 10.3390/plants13182619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2024] [Revised: 08/18/2024] [Accepted: 08/29/2024] [Indexed: 09/30/2024]
Abstract
Genetic markers are powerful tools for understanding genetic diversity and the molecular basis of traits, and they have ushered in a new era of molecular breeding in crops. Over the past 50 years, DNA markers have changed rapidly, moving from hybridization-based and second-generation markers to sequence-based markers. Simple sequence repeats (SSRs) are ideal markers for plant breeding because of numerous desirable properties, including their repeatability, codominance, multi-allelic nature, and locus specificity. They can be developed for any species, although doing so requires prior sequence knowledge. SSRs may serve as evolutionary tuning knobs, allowing rapid adaptation to new circumstances. Evaluations published thus far have mostly ignored SSR polymorphism and gene evolution because of a lack of data on the precise positions of SSRs on chromosomes. However, NGS technologies now make it possible to identify SSRs at high throughput for any species, using massive volumes of genomic sequence data that can be generated quickly and at minimal cost. Although SNP markers are gradually replacing earlier DNA marker systems, SSRs remain the markers of choice in orphan crops because reference-level genomic resources are lacking and SSRs are well suited to resource-limited laboratories. Several bioinformatic approaches and tools have been developed to process genomic sequences, identify SSRs, and design primers for genotyping in plant breeding projects. This paper covers the currently available methodologies for producing SSR markers, genomic resource databases, and computational tools and pipelines for SSR data mining and primer generation. The review aims to provide a 'one-stop shop' of information to help new users select tools for identifying and using SSRs in genetic research and breeding programs.
Affiliation(s)
- Subramaniam Geethanjali
- Department of Plant Biotechnology, Centre for Plant Molecular Biology and Biotechnology, Tamil Nadu Agricultural University, Coimbatore 641003, India
- Palchamy Kadirvel
- Crop Improvement Section, ICAR-Indian Institute of Oilseeds Research, Rajendranagar, Hyderabad 500030, India
- Mahender Anumalla
- Rice Breeding Innovation Platform, International Rice Research Institute (IRRI), Los Baños 4031, Laguna, Philippines
- IRRI South Asia Hub, Patancheru, Hyderabad 502324, India
- Nithyananth Hemanth Sadhana
- Department of Plant Biotechnology, Centre for Plant Molecular Biology and Biotechnology, Tamil Nadu Agricultural University, Coimbatore 641003, India
- Anandan Annamalai
- Indian Council of Agricultural Research (ICAR), Indian Institute of Seed Science, Bengaluru 560065, India
- Jauhar Ali
- Rice Breeding Innovation Platform, International Rice Research Institute (IRRI), Los Baños 4031, Laguna, Philippines
3
Chaudhari JK, Pant S, Jha R, Pathak RK, Singh DB. Biological big-data sources, problems of storage, computational issues, and applications: a comprehensive review. Knowl Inf Syst 2024; 66:3159-3209. [DOI: 10.1007/s10115-023-02049-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 09/12/2023] [Accepted: 12/11/2023] [Indexed: 01/03/2025]
4
Hodel RGJ, Segovia-Salcedo MC, Landis JB, Crowl AA, Sun M, Liu X, Gitzendanner MA, Douglas NA, Germain-Aubrey CC, Chen S, Soltis DE, Soltis PS. The report of my death was an exaggeration: A review for researchers using microsatellites in the 21st century. Appl Plant Sci 2016; 4:apps.1600025. [PMID: 27347456 PMCID: PMC4915923 DOI: 10.3732/apps.1600025] [Citation(s) in RCA: 95] [Impact Index Per Article: 10.6] [Received: 03/04/2016] [Accepted: 05/25/2016] [Indexed: 05/19/2023]
Abstract
Microsatellites, or simple sequence repeats (SSRs), have long played a major role in genetic studies due to their typically high polymorphism. They have diverse applications, including genome mapping, forensics, ascertaining parentage, population and conservation genetics, identification of the parentage of polyploids, and phylogeography. We compare SSRs and newer methods, such as genotyping by sequencing (GBS) and restriction site associated DNA sequencing (RAD-Seq), and offer recommendations for researchers considering which genetic markers to use. We also review the variety of techniques currently used for identifying microsatellite loci and developing primers, with a particular focus on those that make use of next-generation sequencing (NGS). Additionally, we review software for microsatellite development and report on an experiment to assess the utility of currently available software for SSR development. Finally, we discuss the future of microsatellites and make recommendations for researchers preparing to use microsatellites. We argue that microsatellites still have an important place in the genomic age as they remain effective and cost-efficient markers.
Affiliation(s)
- Richard G. J. Hodel
- Department of Biology, University of Florida, Gainesville, Florida 32611 USA
- Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611 USA
- Author for correspondence:
- Jacob B. Landis
- Department of Biology, University of Florida, Gainesville, Florida 32611 USA
- Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611 USA
- Andrew A. Crowl
- Department of Biology, University of Florida, Gainesville, Florida 32611 USA
- Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611 USA
- Miao Sun
- Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611 USA
- Xiaoxian Liu
- Department of Biology, University of Florida, Gainesville, Florida 32611 USA
- Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611 USA
- Norman A. Douglas
- Department of Biology, University of Florida, Gainesville, Florida 32611 USA
- Shichao Chen
- College of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Douglas E. Soltis
- Department of Biology, University of Florida, Gainesville, Florida 32611 USA
- Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611 USA
- The Genetics Institute, University of Florida, Gainesville, Florida 32611 USA
- Pamela S. Soltis
- Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611 USA
- The Genetics Institute, University of Florida, Gainesville, Florida 32611 USA
5
Novel microsatellite marker development from the unassembled genome sequence data of the marbled flounder Pseudopleuronectes yokohamae. Mar Genomics 2015; 24 Pt 3:357-61. [PMID: 26439000 DOI: 10.1016/j.margen.2015.09.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/01/2015] [Revised: 09/09/2015] [Accepted: 09/11/2015] [Indexed: 11/23/2022]
Abstract
Genome-scale data have been published for an increasing diversity of species, and they can be reused for other purposes by re-analyzing them in different ways. As a case study in using published genome data, we developed microsatellite markers from the genome sequence data (assembled contigs and unassembled reads) of the marbled flounder Pseudopleuronectes yokohamae. No microsatellites were identified in the contig sequences, whereas the software found 781,773 sequences containing microsatellites with di- to hexanucleotide motifs in the unassembled reads. For 86,732 unique sequences among them, a total of 331,368 primer pairs were designed. Screening based on PCR amplification, polymorphism, and genotyping accuracy yielded sixteen primer sets, which were then characterized using 45 samples collected in Onagawa Bay, Miyagi, Japan. Null alleles were suggested at four loci in the studied population, but no evidence of allelic dropout was found. The observed number of alleles and observed heterozygosity ranged from 2 to 20 and from 0 to 0.88889, respectively, indicating that the loci are polymorphic and useful for population genetic analyses of this species. In addition, many of the microsatellite primers developed in this study are potentially applicable to kinship estimation, individual fingerprinting, and linkage map construction.
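Two of the per-locus statistics reported above, the number of alleles and the observed heterozygosity, can be computed directly from diploid genotype calls. The sketch below assumes genotypes are given as pairs of allele sizes, with `None` for missing data; the function name and data layout are illustrative, not taken from the study. For example, 40 heterozygotes among 45 genotyped individuals gives 40/45 ≈ 0.88889.

```python
from collections import Counter


def locus_summary(genotypes):
    """Summarise one diploid locus.

    genotypes: list of (allele1, allele2) tuples, one per individual;
    None marks a missing genotype and is skipped.
    Returns (number_of_alleles, observed_heterozygosity).
    """
    typed = [g for g in genotypes if g is not None]
    # Distinct allele sizes seen across all genotyped individuals.
    alleles = Counter(a for g in typed for a in g)
    # Observed heterozygosity: fraction of genotyped individuals with two different alleles.
    heterozygotes = sum(1 for a1, a2 in typed if a1 != a2)
    h_obs = heterozygotes / len(typed) if typed else 0.0
    return len(alleles), h_obs
```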
6
Abstract
BACKGROUND With the advent of high-throughput sequencing technologies, large-scale identification of microsatellites became affordable and was directed especially at non-model species. By contrast, few efforts have been published toward the automatic identification of polymorphic microsatellites by exploiting sequence redundancy. Few tools for genotyping microsatellite repeats that can manage huge amounts of sequence data and handle the SAM/BAM file format have been implemented so far. Most of them have been developed for and tested on human or model organisms with high-quality reference genomes. RESULTS In this note we describe polymorphic SSR retrieval (PSR), a read counter and simple sequence repeat (SSR) length polymorphism detection tool. It is written in Perl and was developed to identify length polymorphisms in perfect microsatellites using next-generation sequencing (NGS) data. PSR was developed with non-model plant species in mind, for which a de novo transcriptome assembly is generally the first sequence resource available for SSR mining. PSR is divided into two modules: the read-counting module (PSR_read_retrieval) identifies all the reads that cover the full length of perfect microsatellites; the comparative module (PSR_poly_finder) detects both heterozygous and homozygous alleles at each microsatellite locus across all genotypes under investigation. Two user-defined thresholds are used to call a length polymorphism and reduce the number of false positives: the minimum number of reads overlapping the repetitive stretch and the minimum read depth. The first parameter determines whether the microsatellite-containing sequence is processed at all, while the second is decisive for identifying minor alleles. PSR was tested on two different case studies.
The first study aimed to identify polymorphic SSRs in a set of de novo assembled transcripts obtained by RNA sequencing of two different plant genotypes. The second investigated sequence variation within a collection of newly sequenced chloroplast genomes. In both cases, PSR results agreed with those obtained by capillary gel separation. CONCLUSION PSR was developed specifically to automate the gene-based and genome-wide identification of polymorphic microsatellites from NGS data. It overcomes the limitations of existing, time-consuming approaches based on tools developed in the pre-NGS era.
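The two user-defined thresholds described above can be illustrated with a short Python sketch. It is not PSR's Perl code: the function names, the default threshold values, and the input format (repeat lengths measured from reads that span the whole microsatellite) are assumptions made for illustration.

```python
from collections import Counter


def call_alleles(spanning_lengths, min_spanning_reads=5, min_allele_depth=2):
    """Call alleles at one SSR locus for one genotype (hypothetical helper).

    spanning_lengths: repeat lengths (bp) measured from reads that cover the
    whole microsatellite. Returns a sorted tuple of allele lengths, or None if
    the locus has too few spanning reads to be processed.
    """
    # Threshold 1: skip loci with too few reads covering the repetitive stretch.
    if len(spanning_lengths) < min_spanning_reads:
        return None
    # Threshold 2: keep only allele lengths supported by enough reads.
    depth = Counter(spanning_lengths)
    return tuple(sorted(length for length, n in depth.items() if n >= min_allele_depth))


def is_polymorphic(calls):
    """True if more than one allele length is observed across the called genotypes."""
    alleles = {a for c in calls if c is not None for a in c}
    return len(alleles) > 1
```

Raising `min_allele_depth` discards low-support lengths (for example, a single read carrying a stutter-like length), at the cost of possibly missing genuine minor alleles.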
Affiliation(s)
- Concita Cantarella
- Consiglio per la ricerca in agricoltura e l'analisi dell'economia agraria - Centro di ricerca per l'orticoltura, Via Cavalleggeri 25, 84098, Pontecagnano Faiano, Italy.
- Nunzio D'Agostino
- Consiglio per la ricerca in agricoltura e l'analisi dell'economia agraria - Centro di ricerca per l'orticoltura, Via Cavalleggeri 25, 84098, Pontecagnano Faiano, Italy.
7
Sablok G, Padma Raju GV, Mudunuri SB, Prabha R, Singh DP, Baev V, Yahubyan G, Ralph PJ, La Porta N. ChloroMitoSSRDB 2.00: more genomes, more repeats, unifying SSRs search patterns and on-the-fly repeat detection. Database (Oxford) 2015; 2015:bav084. [PMID: 26412851 PMCID: PMC4584093 DOI: 10.1093/database/bav084] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 03/30/2015] [Accepted: 08/17/2015] [Indexed: 01/13/2023]
Abstract
Organelle genomes evolve rapidly compared with nuclear genomes and have been widely used to develop microsatellite, or simple sequence repeat (SSR), markers for phylogenomic studies. In our previous reports, we established the largest repository of organelle SSRs, ChloroMitoSSRDB, which provides access to 2161 organelle genomes (1982 mitochondrial and 179 chloroplast genomes) with a total of 5838 perfect chloroplast SSRs, 37 297 imperfect chloroplast SSRs, 5898 perfect mitochondrial SSRs and 50 355 imperfect mitochondrial SSRs. In the present research, we updated ChloroMitoSSRDB by systematically analyzing and adding 191 chloroplast and 2102 mitochondrial genomes. With this update, ChloroMitoSSRDB 2.00 provides access to a total of 4454 organelle genomes displaying a total of 40 653 IMEx perfect SSRs (11 802 chloroplast perfect SSRs and 28 851 mitochondrial perfect SSRs), 275 981 IMEx imperfect SSRs (78 972 chloroplast imperfect SSRs and 197 009 mitochondrial imperfect SSRs), 35 250 MISA (MIcroSAtellite identification tool) perfect SSRs and 3211 MISA compound SSRs, together with associated information such as repeat location (coding or non-coding), repeat size, motif, length polymorphism, and primer pairs. Additionally, we have integrated several in silico SSR mining tools into a unified web portal for repeat mining in assembled organelle genomes and next-generation sequencing reads. ChloroMitoSSRDB 2.00 allows end users to perform multiple SSR searches, browse SSRs identified with two repeat algorithms, and retrieve primer pair information for identified SSRs for evolutionary genomics. Database URL: http://www.mcr.org.in/chloromitossrdb
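The MISA counts above distinguish perfect from compound SSRs. A compound SSR can be understood as two or more perfect SSRs separated by no more than a fixed interruption length; the sketch below merges SSR intervals under that rule. The 100 bp limit (`max_gap`) is an assumed default standing in for the tool's configurable setting, and the function is illustrative, not MISA's code.

```python
def merge_compound(ssrs, max_gap=100):
    """Group perfect SSRs into compound SSRs (hypothetical helper).

    ssrs: list of (start, end) intervals (0-based, end exclusive), any order.
    Two SSRs are merged when the gap between them is <= max_gap bp.
    Returns (start, end, n_components); n_components > 1 marks a compound SSR.
    """
    groups = []
    for start, end in sorted(ssrs):
        # Extend the previous group if this SSR starts within max_gap of its end.
        if groups and start - groups[-1][1] <= max_gap:
            last = groups[-1]
            groups[-1] = (last[0], max(last[1], end), last[2] + 1)
        else:
            groups.append((start, end, 1))
    return groups
```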
Affiliation(s)
- Gaurav Sablok
- Plant Functional Biology and Climate Change Cluster (C3), University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia
- Environmental Biotechnology Platform, Research and Innovation Center, Fondazione Edmund Mach (FEM), IASMA Via Mach 1., 38010 San Michele all'Adige (TN), Italy
- G V Padma Raju
- Department of Computer Science and Engineering, S.R.K.R Engineering College, Chinna Amiram, Bhimavaram 534204, Andhra Pradesh, India
- Suresh B Mudunuri
- Technology Centre, S.R.K.R. Engineering College, Chinna Amiram, Bhimavaram 534204, Andhra Pradesh, India
- Ratna Prabha
- National Bureau of Agriculturally Important Microorganisms (NBAIM) (Indian Council of Agricultural Research), Maunath Bhanjan 275101, Uttar Pradesh, India
- Dhananjaya P Singh
- National Bureau of Agriculturally Important Microorganisms (NBAIM) (Indian Council of Agricultural Research), Maunath Bhanjan 275101, Uttar Pradesh, India
- Vesselin Baev
- Department of Plant Physiology and Molecular Biology, University of Plovdiv, 24 Tsar Assen St, 4000 Plovdiv, Bulgaria
- Galina Yahubyan
- Department of Plant Physiology and Molecular Biology, University of Plovdiv, 24 Tsar Assen St, 4000 Plovdiv, Bulgaria
- Peter J Ralph
- Plant Functional Biology and Climate Change Cluster (C3), University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia
- Nicola La Porta
- Environmental Biotechnology Platform, Research and Innovation Center, Fondazione Edmund Mach (FEM), IASMA Via Mach 1., 38010 San Michele all'Adige (TN), Italy
8
Lee JCI, Tseng B, Ho BC, Linacre A. pSTR Finder: a rapid method to discover polymorphic short tandem repeat markers from whole-genome sequences. Investig Genet 2015; 6:10. [PMID: 26246889 PMCID: PMC4525727 DOI: 10.1186/s13323-015-0027-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 03/18/2015] [Accepted: 07/21/2015] [Indexed: 11/10/2022]
Abstract
Background Whole-genome sequencing is performed routinely as a means to identify polymorphic genetic loci such as short tandem repeat loci. We have developed a simple, freely available tool, called pSTR Finder, for identifying putative polymorphic short tandem repeat (STR) loci from genome-wide sequence data. The program cross-compares STR sequences identified by Tandem Repeats Finder in multiple genome samples in FASTA format. These comparisons generate reports listing identical, polymorphic, and different STR loci between two samples. Methods The website http://forensic.mc.ntu.edu.tw:9000/PSTRWeb/Default was developed as a means to identify polymorphic STR loci within large, complex genome sequences. The program was designed to generate a series of user-friendly reports. Results As a proof of concept, four FASTA genome sequence samples of human chromosome X (AC_000155.1, CM000685.1, NC_018934.2, and CM000274.1) were obtained from GenBank and analyzed for the presence of putative STR regions. The sequence AC_000155.1 was used as the initial reference, against which 5443 identical and 4305 polymorphic STR loci were identified using repeat units of 1–6 bp and 10 bp of flanking sequence on either side of each putative STR locus. A reliability test compared five FASTA samples, from which sections of DNA sequence had been removed to mimic partial or fragmented DNA sequences, to determine whether pSTR Finder can efficiently and consistently find identical, polymorphic, and different STR loci. Conclusions The program reproducibly identified polymorphic STR loci in large volumes of DNA sequence data and generated user-friendly reports detailing the number and location of these potential polymorphic loci. This freely available program was found to be a useful tool for finding polymorphic STRs within whole-genome sequence data in forensic genetic studies.
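The three report categories in the pSTR Finder abstract can be sketched as a comparison of STR calls keyed by their flanking sequences. This is a hypothetical illustration, not pSTR Finder's implementation: it assumes each sample's STRs have already been called (for example with Tandem Repeats Finder) and reduced to a 10 bp left and right flank plus a (motif, copy number) pair, and it treats "different" as a locus missing from one sample or carrying a different motif, which is an assumed reading of the abstract.

```python
def compare_str_loci(reference, sample):
    """Classify STR loci between two samples (hypothetical helper).

    Each input maps a locus key (left_flank, right_flank) to a
    (motif, copy_number) tuple. Returns a dict of three sorted key lists.
    """
    result = {"identical": [], "polymorphic": [], "different": []}
    for key in sorted(set(reference) | set(sample)):
        ref, smp = reference.get(key), sample.get(key)
        if ref is None or smp is None or ref[0] != smp[0]:
            # Locus absent from one sample, or flanks match but the motif does not.
            result["different"].append(key)
        elif ref[1] == smp[1]:
            result["identical"].append(key)
        else:
            # Same flanks and motif, different number of repeat units.
            result["polymorphic"].append(key)
    return result
```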
Affiliation(s)
- James Chun-I Lee
- Department of Forensic Medicine, College of Medicine, National Taiwan University, No. 1 Jen-Ai Road Section 1, Taipei, 10051 Taiwan
- Bill Tseng
- Department of Forensic Medicine, College of Medicine, National Taiwan University, No. 1 Jen-Ai Road Section 1, Taipei, 10051 Taiwan
- Bing-Ching Ho
- Department of Clinical Laboratory Sciences and Medical Biotechnology, College of Medicine, National Taiwan University, No. 1 Jen-Ai Road Section 1, Taipei, 10051 Taiwan
- NTU Center for Genomic Medicine, College of Medicine, National Taiwan University, No. 1 Jen-Ai Road Section 1, Taipei, 10051 Taiwan
- Adrian Linacre
- School of Biological Sciences, Flinders University, Adelaide, 5001 Australia
9
Oulas A, Pavloudi C, Polymenakou P, Pavlopoulos GA, Papanikolaou N, Kotoulas G, Arvanitidis C, Iliopoulos I. Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies. Bioinform Biol Insights 2015; 9:75-88. [PMID: 25983555 PMCID: PMC4426941 DOI: 10.4137/bbi.s12462] [Citation(s) in RCA: 187] [Impact Index Per Article: 18.7] [Received: 12/05/2014] [Revised: 03/09/2015] [Accepted: 03/13/2015] [Indexed: 12/14/2022]
Abstract
Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of "metagenomics", often defined as the analysis of DNA from microbial communities in environmental samples without the need for prior culturing. Many metagenomics statistical/computational tools and databases have been developed to exploit the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each step of a typical metagenomic dataset analysis. We also outline future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that can perform comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards.
Affiliation(s)
- Anastasis Oulas
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Christina Pavloudi
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Department of Biology, University of Ghent, Ghent, Belgium
- Department of Microbial Ecophysiology, University of Bremen, Bremen, Germany
- Paraskevi Polymenakou
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Georgios A Pavlopoulos
- Division of Basic Sciences, University of Crete, Medical School, Heraklion, Crete, Greece
- Nikolas Papanikolaou
- Division of Basic Sciences, University of Crete, Medical School, Heraklion, Crete, Greece
- Georgios Kotoulas
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Christos Arvanitidis
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, Crete, Greece
- Ioannis Iliopoulos
- Division of Basic Sciences, University of Crete, Medical School, Heraklion, Crete, Greece
10
Singh NV, Abburi VL, Ramajayam D, Kumar R, Chandra R, Sharma KK, Sharma J, Babu KD, Pal RK, Mundewadikar DM, Saminathan T, Cantrell R, Nimmakayala P, Reddy UK. Genetic diversity and association mapping of bacterial blight and other horticulturally important traits with microsatellite markers in pomegranate from India. Mol Genet Genomics 2015; 290:1393-402. [PMID: 25675870 DOI: 10.1007/s00438-015-1003-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 01/06/2015] [Accepted: 01/27/2015] [Indexed: 10/24/2022]
Abstract
This genetic diversity study aimed to estimate population structure and to explore association mapping strategies for identifying markers linked to bacterial blight resistance, growth, and fruit quality in pomegranate collections from India. In total, 88 accessions, including 37 cultivated types, were investigated. A total of 112 alleles were amplified using 44 publicly available microsatellites to estimate molecular genetic diversity and population structure. Neighbor-joining analysis, model-based population structure, and principal component analysis corroborated the genetic relationships among wild-type and cultivated pomegranate collections from India. Our study placed all 88 accessions into four clusters. We identified a cultivated clade of pomegranates in close proximity to Daru types of wild-type pomegranates that grow naturally near the foothills of the Himalayas. Admixture analysis assigned various lineages of cultivated pomegranates to their respective ancestral forms. We identified four linked markers for fruit weight, titratable acidity, and bacterial blight severity. PGCT001 was associated with both fruit weight and bacterial blight, and its association with fruit weight was significant after Bonferroni correction in both seasons analyzed. This research demonstrates the effectiveness of microsatellites in resolving population structure among wild and cultivated pomegranate collections, and their potential for future association mapping studies.
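The Bonferroni correction mentioned above compares each p-value with the significance level divided by the number of tests, which keeps the family-wise error rate at or below that level. The minimal sketch below applies it to a list of p-values; the function name is illustrative, and no values from the study are used. For example, with 4 tests at α = 0.05, only p-values at or below 0.0125 remain significant.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return indices of tests that remain significant after Bonferroni correction.

    With m tests, each p-value is compared against alpha / m.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [i for i, p in enumerate(p_values) if p <= threshold]
```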
Affiliation(s)
- Nripendra Vikram Singh
- ICAR-National Research Center on Pomegranate, Kegaon, Solapur, Maharashtra, 413255, India
11
Putman AI, Carbone I. Challenges in analysis and interpretation of microsatellite data for population genetic studies. Ecol Evol 2014; 4:4399-428. [PMID: 25540699 PMCID: PMC4267876 DOI: 10.1002/ece3.1305] [Citation(s) in RCA: 207] [Impact Index Per Article: 18.8] [Received: 05/02/2014] [Revised: 10/02/2014] [Accepted: 10/03/2014] [Indexed: 12/14/2022]
Abstract
Advancing technologies have facilitated the ever-widening application of genetic markers such as microsatellites to new systems and research questions in biology. In light of the data and experience accumulated from several years of using microsatellites, we present here a literature review that synthesizes the limitations of microsatellites in population genetic studies. With a focus on population structure, we review the widely used fixation (F_ST) statistics and Bayesian clustering algorithms and find that the former can be confusing and problematic for microsatellites and that the latter may be confounded by complex population models and lack power in certain cases. Clustering, multivariate analyses, and diversity-based statistics are increasingly being applied to infer population structure, but in some instances these methods lack formalization with microsatellites. Migration-specific methods perform well only under narrow constraints. We also examine the use of microsatellites for inferring effective population size, changes in population size, and deeper demographic history, and find that these methods are untested and/or highly context-dependent. Overall, each method has important weaknesses for use with microsatellites, and there are significant constraints on inferences commonly made using microsatellite markers in the areas of population structure, admixture, and effective population size. To ameliorate and better understand these constraints, researchers are encouraged to analyze simulated datasets both before and after data collection and analysis, the latter of which is formalized within the approximate Bayesian computation framework. We also examine trends in the literature and show that microsatellites continue to be widely used, especially in non-human subject areas.
This review assists with study design and molecular marker selection, facilitates sound interpretation of microsatellite data while fostering respect for their practical limitations, and identifies lessons that could be applied toward emerging markers and high-throughput technologies in population genetics.
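One frequently cited reason F_ST-type statistics are hard to interpret for microsatellites is that high within-population diversity caps their maximum value. The sketch below computes Nei's G_ST = (H_T - H_S) / H_T, one F_ST analogue, for equally weighted subpopulations; it illustrates that point and is not the review's own analysis. The function names are illustrative.

```python
def expected_heterozygosity(freqs):
    """Nei's gene diversity 1 - sum(p_i^2) for one dict of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def nei_gst(subpops):
    """Nei's G_ST = (H_T - H_S) / H_T for equally weighted subpopulations.

    subpops: list of dicts mapping allele -> frequency (each summing to 1).
    """
    k = len(subpops)
    # H_S: mean within-subpopulation expected heterozygosity.
    h_s = sum(expected_heterozygosity(p) for p in subpops) / k
    # H_T: expected heterozygosity of the pooled allele frequencies.
    alleles = set().union(*subpops)
    pooled = {a: sum(p.get(a, 0.0) for p in subpops) / k for a in alleles}
    h_t = expected_heterozygosity(pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0
```

Two populations fixed for different alleles give G_ST = 1, but two populations that share none of their ten equally frequent alleles give only 0.05 / 0.95 ≈ 0.053, because within-population diversity H_S is already 0.9.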
Affiliation(s)
- Alexander I Putman
- Department of Plant Pathology, North Carolina State University, Raleigh, North Carolina 27695-7616
- Ignazio Carbone
- Department of Plant Pathology, North Carolina State University, Raleigh, North Carolina 27695-7616
12
Gelfand Y, Hernandez Y, Loving J, Benson G. VNTRseek-a computational tool to detect tandem repeat variants in high-throughput sequencing data. Nucleic Acids Res 2014; 42:8884-94. [PMID: 25056320 PMCID: PMC4132751 DOI: 10.1093/nar/gku642] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 12/30/2022]
Abstract
DNA tandem repeats (TRs) are ubiquitous genomic features which consist of two or more adjacent copies of an underlying pattern sequence. The copies may be identical or approximate. Variable number of tandem repeats or VNTRs are polymorphic TR loci in which the number of pattern copies is variable. In this paper we describe VNTRseek, our software for discovery of minisatellite VNTRs (pattern size ≥ 7 nucleotides) using whole genome sequencing data. VNTRseek maps sequencing reads to a set of reference TRs and then identifies putative VNTRs based on a discrepancy between the copy number of a reference and its mapped reads. VNTRseek was used to analyze the Watson and Khoisan genomes (454 technology) and two 1000 Genomes family trios (Illumina). In the Watson genome, we identified 752 VNTRs with pattern sizes ranging from 7 to 84 nt. In the Khoisan genome, we identified 2572 VNTRs with pattern sizes ranging from 7 to 105 nt. In the trios, we identified between 2660 and 3822 VNTRs per individual and found nearly 100% consistency with Mendelian inheritance. VNTRseek is, to the best of our knowledge, the first software for genome-wide detection of minisatellite VNTRs. It is available at http://orca.bu.edu/vntrseek/.
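The two ideas described for VNTRseek can be sketched together. The first is calling a putative VNTR when reads mapped to a reference TR disagree with its copy number. The second is checking calls in a family trio for Mendelian consistency. This is a toy illustration, not VNTRseek's implementation. The data structures and the read-support threshold are hypothetical, and each read is assumed to span the whole repeat and yield one copy-number estimate.

```python
from dataclasses import dataclass
from typing import List, Tuple

Genotype = Tuple[int, int]  # two copy-number alleles at one VNTR locus


@dataclass
class ReferenceTR:
    locus_id: str
    pattern: str           # repeat unit (pattern) sequence
    reference_copies: int  # copy number of the pattern in the reference


def is_putative_vntr(tr: ReferenceTR, read_copies: List[int],
                     min_pattern: int = 7, min_support: int = 2) -> bool:
    """Flag a reference TR when enough mapped reads show a different copy number.

    read_copies: copy numbers estimated from reads assumed to span the repeat.
    min_support: hypothetical number of discordant reads required.
    """
    if len(tr.pattern) < min_pattern:
        return False  # outside the minisatellite range (pattern >= 7 nt)
    differing = sum(1 for c in read_copies if c != tr.reference_copies)
    return differing >= min_support


def mendelian_consistent(child: Genotype, mother: Genotype,
                         father: Genotype) -> bool:
    """True if one child allele can be inherited from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


tr = ReferenceTR("TR_example", "ACGTTGC", reference_copies=5)
print(is_putative_vntr(tr, [5, 5, 6, 6, 6]))  # True: 3 reads differ
print(is_putative_vntr(tr, [5, 5, 5, 6]))     # False: only 1 read differs
print(mendelian_consistent((5, 6), (5, 5), (6, 7)))  # True
print(mendelian_consistent((8, 8), (5, 6), (6, 7)))  # False
```

Requiring more than one discordant read is a simple guard against single-read sequencing or mapping errors. A real caller would also need to handle reads that only partially cover the repeat.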
Affiliation(s)
- Yevgeniy Gelfand
- Laboratory for Biocomputing and Informatics, Boston University, Boston, MA 02215, USA
- Yozen Hernandez
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Joshua Loving
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Gary Benson
- Laboratory for Biocomputing and Informatics, Boston University, Boston, MA 02215, USA
- Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, USA
- Department of Computer Science, Boston University, Boston, MA 02215, USA
13
Meglécz E, Pech N, Gilles A, Dubut V, Hingamp P, Trilles A, Grenier R, Martin JF. QDD version 3.1: a user-friendly computer program for microsatellite selection and primer design revisited: experimental validation of variables determining genotyping success rate. Mol Ecol Resour 2014; 14:1302-13. [PMID: 24785154 DOI: 10.1111/1755-0998.12271] [Citation(s) in RCA: 122] [Impact Index Per Article: 11.1] [Received: 02/27/2014] [Revised: 04/21/2014] [Accepted: 04/25/2014] [Indexed: 11/30/2022]
Abstract
Microsatellite marker development has been greatly simplified by the use of high-throughput sequencing followed by in silico microsatellite detection and primer design. However, the selection of markers designed by existing pipelines depends either on arbitrary criteria or on older studies of PCR success. Based on wet-laboratory experiments, we have identified the following factors as most likely to influence genotyping success rate: the alignment score between the primers and the amplicon; the distance between the primers and the microsatellite; the length of the PCR product; target region complexity; and the number of reads underlying the sequence. The QDD pipeline has been modified to include these most pertinent factors in its output to help marker selection. New features are also included in the present version: (i) contigs as well as raw sequencing reads are accepted as input, allowing the analysis of assembled high-coverage data; (ii) input data can be in either FASTA or FASTQ format, facilitating the use of Illumina and Ion Torrent reads; (iii) a comparison to known transposable elements allows their detection; (iv) a contamination check can be carried out by BLASTing potential markers against the NCBI nucleotide (nt) database; (v) QDD3 is now also available embedded in a virtual machine, making installation easier and independent of the operating system. It can be run from the command line or integrated into a Galaxy server, which provides a user-friendly interface and access to a large variety of NGS tools.
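Three of the factors listed above are simple to compute once primer and repeat coordinates are known: primer-to-microsatellite distance, PCR product length and read support. The sketch below is a hypothetical illustration, not QDD code. The class, field names and thresholds are assumptions, and the alignment-score and target-complexity factors are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MarkerCandidate:
    """Coordinates are 0-based and end-exclusive on the template sequence."""
    name: str
    fwd_start: int   # start of the forward primer
    fwd_len: int
    rev_end: int     # end of the reverse-primer binding site
    rev_len: int
    ssr_start: int   # microsatellite start
    ssr_end: int     # microsatellite end
    read_count: int  # reads underlying the sequence


def amplicon_length(m: MarkerCandidate) -> int:
    """Expected PCR product length, from forward-primer start to reverse-primer end."""
    return m.rev_end - m.fwd_start


def primer_ssr_gaps(m: MarkerCandidate) -> Tuple[int, int]:
    """Number of bases between each primer and the microsatellite."""
    left = m.ssr_start - (m.fwd_start + m.fwd_len)
    right = (m.rev_end - m.rev_len) - m.ssr_end
    return left, right


def failed_criteria(m: MarkerCandidate, min_gap: int = 10,
                    max_amplicon: int = 300, min_reads: int = 2) -> List[str]:
    """List the criteria a candidate fails (all thresholds are hypothetical)."""
    problems = []
    if min(primer_ssr_gaps(m)) < min_gap:
        problems.append("primer too close to SSR")
    if amplicon_length(m) > max_amplicon:
        problems.append("PCR product too long")
    if m.read_count < min_reads:
        problems.append("too few supporting reads")
    return problems


good = MarkerCandidate("locus_1", 0, 20, 150, 20, 40, 70, read_count=5)
poor = MarkerCandidate("locus_2", 0, 20, 400, 20, 22, 60, read_count=1)
print(amplicon_length(good), primer_ssr_gaps(good))  # 150 (20, 60)
print(failed_criteria(good))  # []
print(failed_criteria(poor))
```

Returning the list of failed criteria rather than a single pass/fail flag mirrors the abstract's point that these factors are reported to help selection rather than applied as a hard filter.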
Affiliation(s)
- Emese Meglécz
- Aix-Marseille Université, CNRS, IRD, Univ. Avignon, UMR 7263 - IMBE, Equipe EGE, Centre Saint-Charles, Case 36, 3 Place Victor Hugo, 13331, Marseille Cedex 3, France
14
Genomic and global approaches to unravelling how hypermutable sequences influence bacterial pathogenesis. Pathogens 2014; 3:164-84. [PMID: 25437613 PMCID: PMC4235727 DOI: 10.3390/pathogens3010164] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 12/04/2013] [Revised: 01/06/2014] [Accepted: 02/13/2014] [Indexed: 12/23/2022]
Abstract
Rapid adaptation to fluctuations in the host milieu contributes to the persistence in the host and the virulence of bacterial pathogens. Adaptation is frequently mediated by hypermutable sequences in bacterial pathogens. Early bacterial genomic studies identified the multiplicity and virulence-associated functions of these hypermutable sequences: simple sequence repeat tracts (SSRs) and site-specific recombination were found to control capsular type, lipopolysaccharide structure, pilin diversity and the expression of outer membrane proteins. We review how the population diversity inherent in the SSR-mediated mechanism of localised hypermutation is being unlocked by the investigation of whole-genome sequences of disease isolates, analysis of clinical samples and use of model systems. We contrast the difficulties of analysing simple sequence repeats in next-generation sequencing data with simpler, pragmatic PCR-based approaches. Specific examples are presented of the potential relevance of this localised hypermutation to meningococcal pathogenesis. This leads us to speculate on the future prospects for unravelling how hypermutable mechanisms may contribute to the transmission, spread and persistence of bacterial pathogens.
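One widely described way in which SSR tracts switch outer membrane protein expression on and off is frameshifting. When a tract lies within a coding region, gaining or losing repeat units shifts the reading frame unless the length change is a multiple of three. The minimal sketch below assumes a single tract inside the coding sequence and a hypothetical copy number at which the gene is expressed ("ON"). It ignores tracts located in promoters and any stop codons created within the tract itself.

```python
def same_frame(unit_len: int, copies: int, on_copies: int) -> bool:
    """True if a tract of `copies` units keeps the reading frame of the ON state.

    unit_len: repeat-unit length in nucleotides.
    on_copies: hypothetical copy number at which the downstream ORF is in frame.
    """
    length_change = (copies - on_copies) * unit_len
    # Python's % returns a non-negative result here, so deletions work too.
    return length_change % 3 == 0


def phase_state(unit_len: int, copies: int, on_copies: int) -> str:
    """Label the expression state implied by the tract's copy number."""
    return "ON" if same_frame(unit_len, copies, on_copies) else "OFF"


# Homopolymeric tract (unit length 1), assumed ON at 12 copies.
print([phase_state(1, n, 12) for n in range(10, 16)])
# Tetranucleotide tract (unit length 4), assumed ON at 20 copies.
print([phase_state(4, n, 20) for n in range(18, 24)])
```

For both unit lengths, only every third copy number restores the ON frame. This is why a single slippage event at such a tract typically toggles expression.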
15
Miah G, Rafii MY, Ismail MR, Puteh AB, Rahim HA, Islam KN, Latif MA. A review of microsatellite markers and their applications in rice breeding programs to improve blast disease resistance. Int J Mol Sci 2013; 14:22499-528. [PMID: 24240810 PMCID: PMC3856076 DOI: 10.3390/ijms141122499] [Citation(s) in RCA: 84] [Impact Index Per Article: 7.0] [Received: 08/12/2013] [Revised: 09/26/2013] [Accepted: 10/16/2013] [Indexed: 11/16/2022]
Abstract
Over the last few decades, molecular markers have played an increasingly important role in rice breeding and genetics. Of the different types of molecular markers, microsatellites have been used most extensively, because they can be readily amplified by PCR and show a large amount of allelic variation at each locus. Microsatellites are also known as simple sequence repeats (SSRs), and they typically consist of tandem repeats of 1- to 6-nucleotide motifs. These markers are abundant, distributed throughout the genome and highly polymorphic compared with other genetic markers, as well as being species-specific and co-dominant. For these reasons, they have become increasingly important genetic markers in rice breeding programs. The evolution of new biotypes of pests and diseases, as well as the pressures of climate change, pose serious challenges to rice breeders, who would like to increase rice production by introducing resistance to multiple biotic and abiotic stresses. Recent advances in rice genomics have made it possible to identify and map a number of genes through linkage to existing DNA markers. Among the more noteworthy examples of genes that have been tightly linked to molecular markers in rice are those that confer resistance or tolerance to blast. Therefore, in combination with conventional breeding approaches, marker-assisted selection (MAS) can be used to monitor the presence or absence of these genes in breeding populations. For example, marker-assisted backcross breeding has been used to integrate important genes with significant biological effects into a number of commonly grown rice varieties. The use of cost-effective, finely mapped microsatellite markers and MAS strategies should provide opportunities for breeders to develop high-yield, blast-resistant rice cultivars.
The aim of this review is to summarize the current knowledge concerning the linkage of microsatellite markers to rice blast resistance genes, as well as to explore the use of MAS in rice breeding programs aimed at improving blast resistance in this species. We also discuss the various advantages, disadvantages and uses of microsatellite markers relative to other molecular marker types.
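In silico SSR mining starts from the definition given above: tandem repeats of 1- to 6-nucleotide motifs. The minimal Python sketch below scans a sequence for perfect (uninterrupted) repeats and is not the algorithm of any published tool. The minimum copy numbers per motif length are example values chosen for illustration. Imperfect and compound repeats, which dedicated SSR mining tools handle, are not detected.

```python
from typing import Dict, List, NamedTuple, Optional


class SSR(NamedTuple):
    start: int   # 0-based start
    end: int     # 0-based, exclusive
    motif: str
    copies: int


# Example minimum copy numbers per motif length (illustrative values only).
DEFAULT_MIN_COPIES: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d)
               for d in range(1, k) if k % d == 0)


def find_perfect_ssrs(seq: str,
                      min_copies: Optional[Dict[int, int]] = None) -> List[SSR]:
    """Find perfect tandem repeats of 1-6 nt motifs in a DNA sequence."""
    seq = seq.upper()
    thresholds = DEFAULT_MIN_COPIES if min_copies is None else min_copies
    hits: List[SSR] = []
    for k, min_rep in sorted(thresholds.items()):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            j = i + k
            # Extend while the next k bases repeat the motif exactly.
            while seq[j:j + k] == motif:
                j += k
            copies = (j - i) // k
            if (copies >= min_rep and is_primitive(motif)
                    and set(motif) <= set("ACGT")):
                hits.append(SSR(i, j, motif, copies))
                i = j  # resume scanning after this repeat
            else:
                i += 1
    return sorted(hits, key=lambda h: (h.start, len(h.motif)))


example = "GGC" + "AT" * 7 + "CCG" + "A" * 10 + "TTT"
for hit in find_perfect_ssrs(example):
    print(hit)
```

On the example sequence, the scan reports an (AT)7 repeat at positions 3 to 17 and an (A)10 run at positions 20 to 30. Skipping non-primitive motifs prevents the poly-A run from also being reported as an (AA)n or (AAA)n repeat.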
Affiliation(s)
- Gous Miah
- Laboratory of Food Crops, Institute of Tropical Agriculture, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
- Mohd Y. Rafii
- Laboratory of Food Crops, Institute of Tropical Agriculture, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
- Author to whom correspondence should be addressed; Tel.: +603-8947-1149
- Mohd R. Ismail
- Laboratory of Food Crops, Institute of Tropical Agriculture, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
- Adam B. Puteh
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
- Harun A. Rahim
- Agrotechnology and Bioscience Division, Malaysian Nuclear Agency, 43000 Kajang, Selangor, Malaysia
- Kh. Nurul Islam
- Laboratory of Anatomy and Histology, Department of Veterinary Preclinical Sciences, Faculty of Veterinary Medicine, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
- Mohammad Abdul Latif
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
- Bangladesh Rice Research Institute, Gazipur 1701, Bangladesh