1
Alves SIA, Dantas CWD, Macedo DB, Ramos RTJ. What are microsatellites and how to choose the best tool: a user-friendly review of SSR and 74 SSR mining tools. Front Genet 2024; 15:1474611. [PMID: 39606018 PMCID: PMC11599195 DOI: 10.3389/fgene.2024.1474611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Accepted: 10/30/2024] [Indexed: 11/29/2024] [Open access]
Abstract
Microsatellites, also known as SSR or STR, are essential molecular markers in genomic research, playing crucial roles in genetic mapping, population genetics, and evolutionary studies. Their applications range from plant breeding to forensics, highlighting their diverse utility across disciplines. Despite their widespread use, traditional methods for SSR analysis are often laborious and time-consuming, requiring significant resources and expertise. To address these challenges, a variety of computational tools for SSR analysis have been developed, offering faster and more efficient alternatives to traditional methods. However, selecting the most appropriate tool can be daunting due to rapid technological advancements and the sheer number of options available. This study presents a comprehensive review and analysis of 74 SSR tools, aiming to provide researchers with a valuable resource for SSR analysis tool selection. The methodology employed includes thorough literature reviews, detailed tool comparisons, and in-depth analyses of tool functionality. By compiling and analyzing these tools, this study not only advances the field of genomic research but also contributes to the broader scientific community by facilitating informed decision-making in the selection of SSR analysis tools. Researchers seeking to understand SSRs and select the most appropriate tools for their projects will benefit from this comprehensive guide. Overall, this study enhances our understanding of SSR analysis tools, paving the way for more efficient and effective SSR research in various fields of study.
Affiliation(s)
- Sandy Ingrid Aguiar Alves
  - Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
  - Laboratory of Simulation and Computational Biology (SIMBIC), High Performance Computing Center (CCAD), Federal University of Pará, Belém, Pará, Brazil
  - Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Carlos Willian Dias Dantas
  - Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
  - Laboratory of Simulation and Computational Biology (SIMBIC), High Performance Computing Center (CCAD), Federal University of Pará, Belém, Pará, Brazil
  - Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Daralyns Borges Macedo
  - Laboratory of Simulation and Computational Biology (SIMBIC), High Performance Computing Center (CCAD), Federal University of Pará, Belém, Pará, Brazil
  - Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Rommel Thiago Jucá Ramos
  - Laboratory of Simulation and Computational Biology (SIMBIC), High Performance Computing Center (CCAD), Federal University of Pará, Belém, Pará, Brazil
  - Laboratory of Bioinformatics and Genomics of Microorganisms, Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
2
Geethanjali S, Kadirvel P, Anumalla M, Hemanth Sadhana N, Annamalai A, Ali J. Streamlining of Simple Sequence Repeat Data Mining Methodologies and Pipelines for Crop Scanning. Plants (Basel) 2024; 13:2619. [PMID: 39339594 PMCID: PMC11435353 DOI: 10.3390/plants13182619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2024] [Revised: 08/18/2024] [Accepted: 08/29/2024] [Indexed: 09/30/2024]
Abstract
Genetic markers are powerful tools for understanding genetic diversity and the molecular basis of traits, ushering in a new era of molecular breeding in crops. Over the past 50 years, DNA markers have rapidly changed, moving from hybridization-based and second-generation-based to sequence-based markers. Simple sequence repeats (SSRs) are the ideal markers in plant breeding, and they have numerous desirable properties, including their repeatability, codominance, multi-allelic nature, and locus specificity. They can be generated from any species, which requires prior sequence knowledge. SSRs may serve as evolutionary tuning knobs, allowing for rapid identification and adaptation to new circumstances. The evaluations published thus far have mostly ignored SSR polymorphism and gene evolution due to a lack of data regarding the precise placements of SSRs on chromosomes. However, NGS technologies have made it possible to produce high-throughput SSRs for any species using massive volumes of genomic sequence data that can be generated fast and at a minimal cost. Though SNP markers are gradually replacing the erstwhile DNA marker systems, SSRs remain the markers of choice in orphan crops due to the lack of genomic resources at the reference level and their adaptability to resource-limited labor. Several bioinformatic approaches and tools have evolved to handle genomic sequences to identify SSRs and generate primers for genotyping applications in plant breeding projects. This paper includes the currently available methodologies for producing SSR markers, genomic resource databases, and computational tools/pipelines for SSR data mining and primer generation. This review aims to provide a 'one-stop shop' of information to help each new user carefully select tools for identifying and utilizing SSRs in genetic research and breeding programs.
Affiliation(s)
- Subramaniam Geethanjali
  - Department of Plant Biotechnology, Centre for Plant Molecular Biology and Biotechnology, Tamil Nadu Agricultural University, Coimbatore 641003, India
- Palchamy Kadirvel
  - Crop Improvement Section, ICAR-Indian Institute of Oilseeds Research, Rajendranagar, Hyderabad 500030, India
- Mahender Anumalla
  - Rice Breeding Innovation Platform, International Rice Research Institute (IRRI), Los Baños 4031, Laguna, Philippines
  - IRRI South Asia Hub, Patancheru, Hyderabad 502324, India
- Nithyananth Hemanth Sadhana
  - Department of Plant Biotechnology, Centre for Plant Molecular Biology and Biotechnology, Tamil Nadu Agricultural University, Coimbatore 641003, India
- Anandan Annamalai
  - Indian Council of Agricultural Research (ICAR), Indian Institute of Seed Science, Bengaluru 560065, India
- Jauhar Ali
  - Rice Breeding Innovation Platform, International Rice Research Institute (IRRI), Los Baños 4031, Laguna, Philippines
3
Behboudi R, Nouri-Baygi M, Naghibzadeh M. RPTRF: A rapid perfect tandem repeat finder tool for DNA sequences. Biosystems 2023; 226:104869. [PMID: 36858110 DOI: 10.1016/j.biosystems.2023.104869] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/01/2022] [Revised: 01/23/2023] [Accepted: 02/23/2023] [Indexed: 03/02/2023]
Abstract
The sequencing of eukaryotic genomes has shown that tandem repeats are abundant in their sequences. In addition to affecting some cellular processes, tandem repeats in the genome may be associated with specific diseases and have been the key to resolving criminal cases. Any tool developed for detecting tandem repeats must be accurate, fast, and useable in thousands of laboratories worldwide, including those with not very advanced computing capabilities. The proposed method, the Rapid Perfect Tandem Repeat Finder (RPTRF), minimizes the need for excess character comparison processing by indexing the input file and significantly helps to accelerate and prepare the output without artifacts by using an interval tree in the filtering section. The experiments demonstrated that the RPTRF is very fast in discovering all perfect tandem repeats of all categories of any genomic sequences. Although the detection of imperfect TRs is not the focus of the RPTRF, comparisons show that it even outperforms some other tools (in five selected gold standards) designed explicitly for this purpose. The implemented tool and how to use it are available on GitHub.
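To make the notion of a perfect tandem repeat concrete, the following is a minimal Python sketch of a direct left-to-right scan. It is not the RPTRF algorithm (the abstract describes indexing and an interval tree, neither of which is reproduced here); the function name, the `max_unit` and `min_copies` parameters, and their defaults are illustrative assumptions. The scan is greedy and prefers the shortest repeat unit at each position, so it can miss a longer-unit repeat that begins with a short homopolymer run.

```python
def find_perfect_tandem_repeats(seq, max_unit=6, min_copies=3):
    """Greedy scan for perfect tandem repeats.

    Returns a list of (start, unit, copies) tuples. Hypothetical helper for
    illustration only; this is NOT the RPTRF method.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i < n:
        found = None
        # Try the shortest unit first, so a reported unit is never itself
        # a repetition of a shorter unit.
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # ran off the end of the sequence
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies:
                found = (i, unit, copies)
                break
        if found:
            hits.append(found)
            i += len(found[1]) * found[2]  # skip past the whole repeat
        else:
            i += 1
    return hits


example = "TTG" + "CA" * 5 + "GT" + "AAG" * 3 + "C"
print(find_perfect_tandem_repeats(example))
```

On this toy sequence the scan reports the (CA)5 and (AAG)3 tracts; real tools add indexing and redundancy filtering to make this practical on whole genomes.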
Affiliation(s)
- Reza Behboudi
  - Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
- Mostafa Nouri-Baygi
  - Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
- Mahmoud Naghibzadeh
  - Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
4
nTreeClus: A tree-based sequence encoder for clustering categorical series. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
5
Raman G, Park KT, Kim JH, Park S. Characteristics of the completed chloroplast genome sequence of Xanthium spinosum: comparative analyses, identification of mutational hotspots and phylogenetic implications. BMC Genomics 2020; 21:855. [PMID: 33267775 PMCID: PMC7709266 DOI: 10.1186/s12864-020-07219-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/18/2020] [Accepted: 11/09/2020] [Indexed: 01/02/2023] [Open access]
Abstract
BACKGROUND: The invasive species Xanthium spinosum has been used as a traditional Chinese medicine for many years. Unfortunately, no extensive molecular studies of this plant have been conducted.
RESULTS: Here, the complete chloroplast (cp) genome sequence of X. spinosum was assembled and analyzed. The cp genome of X. spinosum was 152,422 base pairs (bp) in length, with a quadripartite circular structure. The cp genome contained 115 unique genes, including 80 PCGs, 31 tRNA genes, and 4 rRNA genes. Comparative analyses revealed that X. spinosum contains a large number of repeats (999 repeats) and 701 SSRs in its cp genome. Fourteen divergences (Π > 0.03) were found in the intergenic spacer regions. Phylogenetic analyses revealed that Parthenium is a sister clade to both Xanthium and Ambrosia and an early-diverging lineage of subtribe Ambrosiinae, although this finding was supported with a very weak bootstrap value.
CONCLUSION: The identified hotspot regions could be used as molecular markers for resolving phylogenetic relationships and species identification in the genus Xanthium.
Affiliation(s)
- Gurusamy Raman
  - Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsangbuk-do, Republic of Korea, 38541
- Kyu Tae Park
  - Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsangbuk-do, Republic of Korea, 38541
- Joo-Hwan Kim
  - Department of Life Science, Gachon University, Seongnam, Gyeonggi-do, Republic of Korea
- SeonJoo Park
  - Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsangbuk-do, Republic of Korea, 38541
6
Zhang H, Li D, Zhao X, Pan S, Wu X, Peng S, Huang H, Shi R, Tan Z. Relatively semi-conservative replication and a folded slippage model for short tandem repeats. BMC Genomics 2020; 21:563. [PMID: 32807079 PMCID: PMC7430839 DOI: 10.1186/s12864-020-06949-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/13/2020] [Accepted: 07/27/2020] [Indexed: 12/11/2022] [Open access]
Abstract
Background: The ubiquitous presence of short tandem repeats (STRs) in virtually all genomes suggests their functional relevance, yet a widely accepted definition of an STR has not been established. Previous studies have mainly focused on relatively long STRs, while shorter repeats were generally excluded. Here, we adopted more generous criteria to define shorter repeats, which led to the identification of a much larger number of STRs that lack prior analysis. Using this definition, we analyzed the short repeats in 55 randomly selected segments in 55 randomly selected genomic sequences from a fairly wide range of species covering animals, plants, fungi, protozoa, bacteria, archaea and viruses.
Results: Our analysis reveals a high percentage of short repeats in all 55 randomly selected segments, indicating that the universal presence of high-content short repeats could be a common characteristic of genomes across all biological kingdoms. Therefore, it is reasonable to assume a mechanism for continuous production of repeats that can make the replicating process relatively semi-conservative. We have proposed a folded replication slippage model that considers the geometric space of nucleotides and hydrogen bond stability to explain the mechanism more explicitly, improving on the existing straight-line slippage model. The folded slippage model can explain the expansion and contraction of mono- to hexanucleotide repeats with proper folding angles. Analysis of external forces in the folding template strands also suggests that expansion is more common than contraction in short tandem repeats.
Conclusion: The folded replication slippage model provides a reasonable explanation for the continuous occurrence of simple sequence repeats in genomes. This model also contributes to the explanation of STR-to-genome evolution and is an alternative model that complements semi-conservative replication.
Affiliation(s)
- Hongxi Zhang
  - Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
- Douyue Li
  - Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
- Xiangyan Zhao
  - Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
- Saichao Pan
  - Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
- Xiaolong Wu
  - Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
- Shan Peng
  - Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
- Hanrou Huang
  - Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
- Ruixue Shi
  - Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
- Zhongyang Tan
  - Bioinformatics Center, College of Biology, Hunan University, Changsha, 410082, China
7
Toubiana W, Khila A. Fluctuating selection strength and intense male competition underlie variation and exaggeration of a water strider's male weapon. Proc Biol Sci 2020; 286:20182400. [PMID: 30991924 PMCID: PMC6501938 DOI: 10.1098/rspb.2018.2400] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/26/2023] [Open access]
Abstract
Sexually selected traits can reach high degrees of phenotypic expression and variation under directional selection. A growing number of studies suggest that such selection can vary in space, time and form within and between populations. However, the impact of these fluctuations on sexual trait evolution is poorly understood. In the water strider Microvelia longipes, males display striking trait exaggeration and phenotypic variation manifested as extreme differences in the rear leg length. To study the origin and maintenance of this exaggerated trait, we conducted comparative behavioural, morphometric and reaction norm experiments in a selection of Microvelia species. We uncovered differences both in the mating behaviour and the degree of sexual dimorphism across these species. Interestingly, M. longipes evolved a specific mating behaviour where males compete for egg-laying sites, consisting of small floating objects, to intercept and copulate with gravid females. Through male–male competition assays, we demonstrated that male rear legs are used as weapons to dominate egg-laying sites and that intense competition is associated with the evolution of rear leg length exaggeration. Field observations revealed rapid fluctuation in M. longipes habitat stability and the abundance of egg-laying sites. Paternity tests using genetic markers demonstrated that small males could only fertilize about 5% of the eggs when egg-laying sites are limiting, whereas this proportion increased to about 20% when egg-laying sites become abundant. Furthermore, diet manipulation and artificial selection experiments also showed that the exaggerated leg length in M. longipes males is influenced by both genetic and nutritional factors. Collectively, our results highlight how fluctuation in the strength of directional sexual selection, through changes in the intensity of male competition, can drive the exaggeration and phenotypic variation in this weapon trait.
Affiliation(s)
- William Toubiana
  - Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5242, Ecole Normale Supérieure de Lyon, 46, allée d'Italie, 69364 Lyon Cedex 07, France
- Abderrahman Khila
  - Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5242, Ecole Normale Supérieure de Lyon, 46, allée d'Italie, 69364 Lyon Cedex 07, France
8
Genovese LM, Mosca MM, Pellegrini M, Geraci F. Dot2dot: accurate whole-genome tandem repeats discovery. Bioinformatics 2019; 35:914-922. [PMID: 30165507 PMCID: PMC6419916 DOI: 10.1093/bioinformatics/bty747] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 03/30/2018] [Revised: 08/03/2018] [Accepted: 08/24/2018] [Indexed: 01/18/2023] [Open access]
Abstract
MOTIVATION: Large-scale sequencing projects have confirmed the hypothesis that eukaryotic DNA is rich in repetitions whose functional role needs to be elucidated. In particular, tandem repeats (TRs) (i.e. short, almost identical sequences that lie adjacent to each other) have been associated with many cellular processes and, indeed, are also involved in several genetic disorders. The need for comprehensive lists of TRs for association studies and the absence of a computational model able to capture their variability have revived research on discovery algorithms.
RESULTS: Building upon the idea that sequence similarities can be easily displayed using graphical methods, we formalized the structure that TRs induce in dot-plot matrices where a sequence is compared with itself. Leveraging the observation that a compact representation of these matrices can be built and searched in linear time, we developed Dot2dot: an accurate algorithm fast enough to be suitable for whole-genome discovery of TRs. Experiments on five manually curated collections of TRs have shown that Dot2dot is more accurate than other established methods, and completes the analysis of the biggest known reference genome in about one day on a standard PC.
AVAILABILITY AND IMPLEMENTATION: Source code and datasets are freely available upon paper acceptance at the URL: https://github.com/Gege7177/Dot2dot.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
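The structure that tandem repeats induce in a self dot-plot can be illustrated with a short Python sketch. When a sequence is compared with itself, a repeat with period p produces a run of matches along the diagonal offset by p, that is, positions where seq[i] == seq[i + p]. The code below only scans such diagonals for long runs of exact matches; it is not the Dot2dot algorithm, it does not build the compact matrix representation mentioned in the abstract, and the function names and thresholds are assumptions made for illustration.

```python
def diagonal_runs(seq, period, min_run):
    """Runs of consecutive matches seq[i] == seq[i + period] on one diagonal.

    Returns (start, run_length) pairs with run_length >= min_run.
    """
    runs = []
    start = None
    limit = len(seq) - period
    for i in range(limit):
        if seq[i] == seq[i + period]:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - start))
            start = None
    # Close a run that reaches the end of the diagonal.
    if start is not None and limit - start >= min_run:
        runs.append((start, limit - start))
    return runs


def tandem_repeats_from_dotplot(seq, max_period=6, min_copies=3):
    """Report (start, period, span) for exact repeats seen as diagonal runs.

    A run of length r on diagonal `period` means the stretch of length
    r + period starting at `start` repeats with that period.
    """
    found = []
    for p in range(1, max_period + 1):
        for start, run in diagonal_runs(seq, p, p * (min_copies - 1)):
            found.append((start, p, run + p))
    return found


example = "TTG" + "CA" * 5 + "GT" + "AAG" * 3 + "C"
print(tandem_repeats_from_dotplot(example))
```

A long enough repeat also shows up on diagonals at multiples of its period, so a practical tool has to merge or filter such redundant hits; tolerating mismatches (the "almost identical" copies in the abstract) would require allowing gaps in the runs, which this sketch does not attempt.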
Affiliation(s)
- Marco M Mosca
  - Department of Computer Science, University of Liverpool, Liverpool, UK
- Marco Pellegrini
  - Institute for Informatics and Telematics, CNR, Pisa, Italy
  - Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, Pisa, Italy
- Filippo Geraci
  - Institute for Informatics and Telematics, CNR, Pisa, Italy
9
Ruperao P, Edwards D. Bioinformatics: identification of markers from next-generation sequence data. Methods Mol Biol 2015; 1245:29-47. [PMID: 25373747 DOI: 10.1007/978-1-4939-1966-6_3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 05/03/2023]
Abstract
Next-generation sequencing (NGS) technology has dramatically changed plant genomics. NGS technology combined with new software tools enables the discovery, validation, and assessment of genetic markers on a large scale. Among the different marker systems, simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) are the markers of choice for genetics and plant breeding. SSR markers have been a common choice for large-scale characterization of germplasm collections, construction of genetic maps, and QTL identification. Similarly, SNPs are the most abundant genetic variations, occurring at high frequency throughout the genomes of plant species. This chapter discusses various tools available for genome assembly and focuses broadly on SSR and SNP marker discovery.
Affiliation(s)
- Pradeep Ruperao
  - School of Agriculture and Food Sciences, University of Queensland, Brisbane, QLD, Australia
10
Karaca M, Ince AG, Aydin A, Ay ST. Cross-genera transferable e-microsatellite markers for 12 genera of the Lamiaceae family. J Sci Food Agric 2013; 93:1869-1879. [PMID: 23238626 DOI: 10.1002/jsfa.5982] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 07/01/2012] [Revised: 10/10/2012] [Accepted: 11/09/2012] [Indexed: 06/01/2023]
Abstract
BACKGROUND: The Lamiaceae family contains many high-valued medicinal, aromatic and ornamental plant species. Several members of the genera in this family are under heavy collection pressure for commercial use. DNA markers such as microsatellites could be used to identify commercially important genotypes and to select high-yielding ones for development of new varieties.
RESULTS: A total of 12,432 expressed sequence tags (ESTs) from Salvia fruticosa, S. miltiorrhiza, S. sclarea and Stenogyne rugosa were analyzed. A total of 6216 ESTs were found to be unique according to the redundancy test used. Results of this study indicated that the use of redundant ESTs in comparison to non-redundant ESTs was advantageous in terms of higher cross-genera transferability of the markers. A total of 75 EST-microsatellite primer pairs were tested using two different polymerase chain reaction amplification profiles and 52 were found to be cross-genera transferable. Cross-genera transferability of the e-microsatellite primer pairs varied from one species to 12 species tested. It was noted that cross-genera transferability of e-microsatellite primer pairs decreased as the evolutionary distance between the source and target species increased.
CONCLUSION: This study indicated that EST resources from Salvia spp. and Stenogyne rugosa could be successfully used to identify cross-genera transferable e-microsatellite markers for uncharacterized genomes of the genera in the Lamiaceae family. These e-microsatellite markers could allow one to perform comparative analyses of population structure and genomic studies, and facilitate comparative linkage mapping in the genera studied. E-microsatellite primer pairs reported in this manuscript are equivalent to a total of 135 e-microsatellite primer pairs since many e-microsatellite primer pairs show cross-genera transferability.
Affiliation(s)
- Mehmet Karaca
  - Department of Field Crops, Faculty of Agriculture, Akdeniz University, Antalya, 07059, Turkey
11
Grover A, Aishwarya V, Sharma PC. Searching microsatellites in DNA sequences: approaches used and tools developed. Physiol Mol Biol Plants 2012; 18:11-9. [PMID: 23573036 PMCID: PMC3550526 DOI: 10.1007/s12298-011-0098-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 05/12/2023]
Abstract
Microsatellite instability-associated genomic activities and evolutionary changes have led to a renewed focus on microsatellite research. In the last decade, a number of microsatellite mining tools have been introduced based on different computational approaches. The choice is generally made between slow but exhaustive dynamic programming-based approaches and fast but incomplete heuristic methods. Tools based on stochastic approaches are more popular due to their simplicity and added ornamental features. We have performed a comparative evaluation of the relative efficiency of some microsatellite search tools with their default settings. The graphical user interface, the statistical analysis of the output and the ability to mine imperfect repeats are the most important criteria in selecting a tool for a particular investigation. However, none of the available tools alone provides complete and accurate information about microsatellites, and a lot depends on the discretion of the user.
Affiliation(s)
- Atul Grover
  - University School of Biotechnology, Guru Gobind Singh Indraprastha University, Sector 16C Dwarka, New Delhi, 110075 India
  - Molecular Biology and Genetic Engineering Laboratory, Defence Institute of Bio Energy Research, Goraparao, Haldwani, 263139 India
- Veenu Aishwarya
  - University School of Biotechnology, Guru Gobind Singh Indraprastha University, Sector 16C Dwarka, New Delhi, 110075 India
  - Division of Hematology/Oncology, Department of Medicine, University of Pennsylvania School of Medicine, Philadelphia, PA USA
- P. C. Sharma
  - University School of Biotechnology, Guru Gobind Singh Indraprastha University, Sector 16C Dwarka, New Delhi, 110075 India
12
Abstract
Advances in sequencing technologies have fundamentally changed the pace of genome sequencing projects and have contributed to the ever-increasing volume of genomic data. This has been paralleled by an increase in computational power and resources to process and translate raw sequence data into meaningful information. In addition to protein coding regions, an integral part of all the genomes studied so far has been the presence of repetitive sequences. Previously considered as "junk," numerous studies have implicated repetitive sequences in important biological and structural roles in the genome. Therefore, the identification and characterization of these repetitive sequences has become an indispensable part of genome sequencing projects. Numerous similarity-based and de novo methods have been developed to search for and annotate repeats in the genome, many of which have been discussed in this chapter.
13
Perry JC, Rowe L. Rapid microsatellite development for water striders by next-generation sequencing. J Hered 2010; 102:125-9. [PMID: 20810468 DOI: 10.1093/jhered/esq099] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.7] [Indexed: 01/30/2023]
Abstract
Water striders have become a model system for studies of sexual conflict and coevolution, but progress is currently limited by a lack of genetic resources. Next-generation sequencing technologies offer the potential for rapid and cost-effective development of molecular markers and hold particular promise for model organisms in ecology for which no reference genome exists. We used Roche 454 sequencing of genomic DNA to identify microsatellite loci for the water strider Gerris incognitus. A modest sequencing volume generated 182,912 reads, of which 30,820 (16.8%) contained microsatellite repeats. We selected 23 loci for primer development, based on criteria that maximized the likelihood of amplifying polymorphic loci, and tested them in G. incognitus and the related species G. buenoi. Of the 16 amplifying loci, 10 yielded reliable amplification and detectable polymorphism, with an average of 6.1 alleles per locus (range: 2-12). These markers should facilitate new avenues of study, including postcopulatory sexual selection, population genetic structure, phylogeography, and sexual coevolution, for a key taxon in studies of mating conflict. The current study demonstrates an effective method for microsatellite development and shows that light sequencing of genomic DNA can provide numerous and highly variable markers.
Affiliation(s)
- Jen C Perry
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto ON M5S 3B2, Canada
14
Rouchka EC. Database of exact tandem repeats in the Zebrafish genome. BMC Genomics 2010; 11:347. [PMID: 20515480 PMCID: PMC2901318 DOI: 10.1186/1471-2164-11-347] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 09/10/2009] [Accepted: 06/01/2010] [Indexed: 11/23/2022] [Open access]
Abstract
Background: Sequencing of the approximately 1.7 billion bases of the zebrafish genome is currently underway. To date, few high resolution genetic maps exist for the zebrafish genome, based mainly on single nucleotide polymorphisms (SNPs) and short microsatellite repeats. The desire to construct a higher resolution genetic map led to the construction of a database of tandemly repeating elements within the zebrafish Zv8 assembly.
Description: Exact tandem repeats with a repeat length of at least three bases and a copy number of at least 10 were reported. Repeats with a total length of 250 or fewer bases and their flanking regions were masked for known vertebrate repeats. Optimal primer pairs were computationally designed in the regions flanking the detected repeats. This database of exact tandem repeats can then be used as a resource by molecular biologists with interests in experimentally testing VNTRs within a zebrafish population.
Conclusions: A total of 116,915 repeats with a base length of at least three nucleotides were detected. The longest of these was a 54-base repeat with fourteen tandem copies. A significant number of repeats with a base length of 18, 24, 27 and 30 were detected, many with potentially novel proline-rich coding regions. Detection of exact tandem repeats in the zebrafish genome leads to a wealth of information regarding potential polymorphic sites for VNTRs. The association of many of these repeats with potentially novel yet similar coding regions yields an exciting potential for disease-associated genes. A web interface for querying repeats is available at http://bioinformatics.louisville.edu/zebrafish/. This portal allows users to search for repeats of a selected base size from any valid specified region within the 25 linkage groups.
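The reporting thresholds stated in the Description (repeat unit of at least three bases, at least 10 copies) can be expressed as a backreference regular expression. The Python sketch below is only an illustration of those two thresholds and is not the pipeline used to build the database; the function name and parameter names are hypothetical. Note one simplification: the pattern does not check that the unit is primitive, so a long homopolymer or dinucleotide run can be reported with a three- or four-base unit.

```python
import re


def exact_tandem_repeats(seq, min_unit=3, min_copies=10):
    """Find exact tandem repeats meeting unit-length and copy-number thresholds.

    Returns (start, unit, copies) tuples. Hypothetical helper; thresholds
    default to the values quoted in the abstract.
    """
    # Group 1 is the repeat unit (lazy, so the shortest qualifying unit is
    # tried first); \1{n,} demands at least n further exact copies.
    pattern = re.compile(r"([ACGT]{%d,}?)\1{%d,}" % (min_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


# Toy sequence: (CAG)10 and (ACGT)12 pass the thresholds, (AAC)9 does not.
toy = "TT" + "CAG" * 10 + "GG" + "ACGT" * 12 + "GG" + "AAC" * 9 + "T"
print(exact_tandem_repeats(toy))
```

Backtracking regular expressions are convenient for short test sequences but can become slow on chromosome-scale input, where dedicated repeat finders are the usual choice.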
Affiliation(s)
- Eric C Rouchka
  - Department of Computer Engineering and Computer Science, Speed School of Engineering, University of Louisville, Duthie Center, Room 208, Louisville, KY, USA
15
Minisatellites as DNA markers to classify bermudagrasses (Cynodon spp.): confirmation of minisatellite in amplified products. J Genet 2008; 87:83-6. [DOI: 10.1007/s12041-008-0011-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 10/21/2022]
16
Sharma PC, Grover A, Kahl G. Mining microsatellites in eukaryotic genomes. Trends Biotechnol 2007; 25:490-8. [PMID: 17945369 DOI: 10.1016/j.tibtech.2007.07.013] [Citation(s) in RCA: 170] [Impact Index Per Article: 9.4] [Received: 05/14/2007] [Revised: 07/12/2007] [Accepted: 07/31/2007] [Indexed: 12/13/2022]
Abstract
During recent decades, microsatellites have become the most popular source of genetic markers. More recently, the availability of enormous sequence data for a large number of eukaryotic genomes has accelerated research aimed at understanding the origin and functions of microsatellites and searching for new applications. This review presents recent developments of in silico mining of microsatellites to reveal various facets of the distribution and dynamics of microsatellites in eukaryotic genomes. Two aspects of microsatellite search strategies (using a suitable search tool and accessing a relevant microsatellite database) have been explored. Judicious microsatellite mining not only helps in addressing biological questions but also facilitates better exploitation of microsatellites for diverse applications.
Affiliation(s)
- Prakash C Sharma
  - University School of Biotechnology, Guru Gobind Singh Indraprastha University, Kashmere Gate, Delhi 110 006, India