1
Loh CA, Shields DA, Schwing A, Evrony GD. High-fidelity, large-scale targeted profiling of microsatellites. Genome Res 2024; 34:1008-1026. [PMID: 39013593 PMCID: PMC11368184 DOI: 10.1101/gr.278785.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 07/11/2024] [Indexed: 07/18/2024]
Abstract
Microsatellites are highly mutable sequences that can serve as markers for relationships among individuals or cells within a population. The accuracy and resolution of reconstructing these relationships depend on the fidelity of microsatellite profiling and the number of microsatellites profiled. However, current methods for targeted profiling of microsatellites incur significant "stutter" artifacts that interfere with accurate genotyping, and sequencing costs preclude whole-genome microsatellite profiling of a large number of samples. We developed a novel method for accurate and cost-effective targeted profiling of a panel of more than 150,000 microsatellites per sample, along with a computational tool for designing large-scale microsatellite panels. Our method addresses the greatest challenge for microsatellite profiling, "stutter" artifacts, with a low-temperature hybridization capture that significantly reduces these artifacts. We also developed a computational tool for accurate genotyping of the resulting microsatellite sequencing data that uses an ensemble approach integrating three microsatellite genotyping tools, which we optimize by analysis of de novo microsatellite mutations in human trios. Altogether, our suite of experimental and computational tools enables high-fidelity, large-scale profiling of microsatellites, which may find utility in diverse applications such as lineage tracing, population genetics, ecology, and forensics.
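The ensemble idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes a simple majority vote across three callers; the abstract does not describe the actual integration rule, and the function name `consensus_genotype`, the `min_agree` parameter, and the genotype encoding (a pair of repeat-unit counts) are all hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

Genotype = Tuple[int, int]  # repeat-unit counts of the two alleles (assumed encoding)


def consensus_genotype(
    calls: Sequence[Optional[Genotype]], min_agree: int = 2
) -> Optional[Genotype]:
    """Return the genotype reported by at least `min_agree` callers, else None.

    Hypothetical majority-vote rule, not the published method.
    """
    # Sort alleles so that (12, 14) and (14, 12) count as the same call;
    # callers that made no call (None) are ignored.
    normalized = [tuple(sorted(c)) for c in calls if c is not None]
    if not normalized:
        return None
    genotype, votes = Counter(normalized).most_common(1)[0]
    return genotype if votes >= min_agree else None
```

With three callers, `min_agree=2` accepts a locus when any two tools agree; raising it to 3 would trade sensitivity for stricter concordance.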
Affiliation(s)
- Caitlin A Loh
  - Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
  - Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA
- Danielle A Shields
  - Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
  - Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA
- Adam Schwing
  - Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
  - Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA
- Gilad D Evrony
  - Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, New York 10016, USA
  - Department of Pediatrics, Department of Neuroscience & Physiology, Institute for Systems Genetics, Perlmutter Cancer Center, and Neuroscience Institute, New York University Grossman School of Medicine, New York, New York 10016, USA
2
Behboudi R, Nouri-Baygi M, Naghibzadeh M. RPTRF: A rapid perfect tandem repeat finder tool for DNA sequences. Biosystems 2023; 226:104869. [PMID: 36858110 DOI: 10.1016/j.biosystems.2023.104869] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/01/2022] [Revised: 01/23/2023] [Accepted: 02/23/2023] [Indexed: 03/02/2023]
Abstract
The sequencing of eukaryotic genomes has shown that tandem repeats are abundant in their sequences. In addition to affecting some cellular processes, tandem repeats in the genome may be associated with specific diseases and have been the key to resolving criminal cases. Any tool developed for detecting tandem repeats must be accurate, fast, and usable in thousands of laboratories worldwide, including those with limited computing capabilities. The proposed method, the Rapid Perfect Tandem Repeat Finder (RPTRF), minimizes excess character comparisons by indexing the input file, and uses an interval tree in the filtering stage to speed up processing and produce output without artifacts. The experiments demonstrated that the RPTRF is very fast in discovering all perfect tandem repeats of all categories in any genomic sequence. Although the detection of imperfect TRs is not the focus of the RPTRF, comparisons show that it even outperforms some other tools (in five selected gold standards) designed explicitly for this purpose. The implemented tool and instructions for using it are available on GitHub.
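To make "perfect tandem repeat" concrete, the following is a minimal sketch of a naive left-to-right scan. It is not the RPTRF algorithm (which, per the abstract, relies on indexing and an interval tree); the function name and the `max_unit` and `min_copies` parameters are assumptions chosen for illustration.

```python
from typing import List, Tuple


def find_perfect_tandem_repeats(
    seq: str, max_unit: int = 6, min_copies: int = 3
) -> List[Tuple[int, str, int]]:
    """Return (start, unit, copies) for perfect tandem repeats found by a naive scan."""
    seq = seq.upper()
    hits = []
    for unit_len in range(1, max_unit + 1):
        i = 0
        # Stop once fewer than min_copies units could fit in the remaining sequence.
        while i + unit_len * min_copies <= len(seq):
            unit = seq[i:i + unit_len]
            # Skip units that are repeats of a shorter unit (e.g. "ATAT" = "AT" x 2),
            # so each repeat is reported with its shortest period only.
            if any(
                unit_len % k == 0 and unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
            ):
                i += 1
                continue
            # Count how many exact copies of the unit follow each other.
            copies = 1
            while seq[i + copies * unit_len: i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, unit, copies))
                i += copies * unit_len  # jump past the repeat just reported
            else:
                i += 1
    return sorted(hits)
```

For example, `find_perfect_tandem_repeats("GGACACACACTT")` reports the `AC` run starting at position 2 with four copies. The scan is quadratic in the worst case, which is the kind of cost an indexed approach is meant to avoid.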
Affiliation(s)
- Reza Behboudi
  - Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
- Mostafa Nouri-Baygi
  - Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
- Mahmoud Naghibzadeh
  - Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
3
Liu J, Maxwell M, Cuddihy T, Crawford T, Bassetti M, Hyde C, Peigneur S, Tytgat J, Undheim EAB, Mobli M. ScrepYard: An online resource for disulfide-stabilized tandem repeat peptides. Protein Sci 2023; 32:e4566. [PMID: 36644825 PMCID: PMC9885460 DOI: 10.1002/pro.4566] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/08/2022] [Revised: 01/05/2023] [Accepted: 01/12/2023] [Indexed: 01/17/2023]
Abstract
Receptor avidity through multivalency is a highly sought-after property of ligands. While readily available in nature in the form of bivalent antibodies, this property remains challenging to engineer in synthetic molecules. The discovery of several bivalent venom peptides containing two homologous and independently folded domains (in a tandem repeat arrangement) has provided a unique opportunity to better understand the underpinning design of multivalency in multimeric biomolecules, as well as how naturally occurring multivalent ligands can be identified. In previous work, we classified these molecules as a larger class termed secreted cysteine-rich repeat-proteins (SCREPs). Here, we present an online resource, ScrepYard, designed to assist researchers in the identification of SCREP sequences of interest and to aid in characterizing this emerging class of biomolecules. Analysis of sequences within ScrepYard reveals that two-domain tandem repeats constitute the most abundant SCREP domain architecture, while the interdomain "linker" regions connecting the functional domains are rich in amino acids with short or polar sidechains and contain an unusually high abundance of proline residues. Finally, we demonstrate the utility of ScrepYard as a virtual screening tool for discovery of putatively multivalent peptides, by using it as a resource to identify a previously uncharacterized serine protease inhibitor and confirm its predicted activity using an enzyme assay.
Affiliation(s)
- Junyu Liu
  - Centre for Advanced Imaging, The University of Queensland, St. Lucia, Queensland, Australia
- Michael Maxwell
  - Centre for Advanced Imaging, The University of Queensland, St. Lucia, Queensland, Australia
- Thom Cuddihy
  - Queensland Cyber Infrastructure Foundation Ltd., The University of Queensland, St. Lucia, Queensland, Australia
  - Centre for Clinical Research, The University of Queensland, St. Lucia, Queensland, Australia
- Theo Crawford
  - Centre for Advanced Imaging, The University of Queensland, St. Lucia, Queensland, Australia
- Madeline Bassetti
  - Queensland Cyber Infrastructure Foundation Ltd., The University of Queensland, St. Lucia, Queensland, Australia
- Cameron Hyde
  - Queensland Cyber Infrastructure Foundation Ltd., The University of Queensland, St. Lucia, Queensland, Australia
  - University of the Sunshine Coast, Maroochydore, Queensland, Australia
- Steve Peigneur
  - Toxicology and Pharmacology, University of Leuven (KU Leuven), Leuven, Belgium
- Jan Tytgat
  - Toxicology and Pharmacology, University of Leuven (KU Leuven), Leuven, Belgium
- Eivind A. B. Undheim
  - Centre for Advanced Imaging, The University of Queensland, St. Lucia, Queensland, Australia
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- Mehdi Mobli
  - Centre for Advanced Imaging, The University of Queensland, St. Lucia, Queensland, Australia
4
Li Z, Chen F, Huang C, Zheng W, Yu C, Cheng H, Zhou R. Genome-wide mapping and characterization of microsatellites in the swamp eel genome. Sci Rep 2017; 7:3157. [PMID: 28600492 PMCID: PMC5466649 DOI: 10.1038/s41598-017-03330-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 02/14/2017] [Accepted: 04/26/2017] [Indexed: 11/09/2022]
Abstract
We describe genome-wide screening and characterization of microsatellites in the swamp eel genome. A total of 99,293 microsatellite loci were identified in the genome, with an overall density of 179 microsatellites per megabase of genomic sequence. Dinucleotide microsatellites were the most abundant type, representing 71% of the total microsatellite loci, and AC-rich motifs were the most recurrent in all repeat types. Microsatellite frequency decreased as the number of repeat units increased, which was more obvious in long than in short microsatellite motifs. Most microsatellites were located in non-coding regions, whereas only approximately 1% of the microsatellites were detected in coding regions. Trinucleotide repeats were the most abundant microsatellites in the coding regions, where they represented amino acid repeats in proteins. There was a chromosome-biased distribution of microsatellites in non-coding regions, with the highest density of 203.95/Mb on chromosome 8 and the lowest on chromosome 7 (164.06/Mb). The most abundant dinucleotide, (AC)n, was mainly located on chromosome 8. Notably, genomic mapping showed that there was a chromosome-biased association of genomic distributions between microsatellites and transposon elements. Thus, the novel dataset of microsatellites in swamp eel provides a valuable resource for further studies on QTL-based selection breeding, genetic resource conservation and evolutionary genetics.
Affiliation(s)
- Zhigang Li
  - Hubei Key Laboratory of Cell Homeostasis, Laboratory of Molecular and Developmental Genetics, College of Life Sciences, Wuhan University, Wuhan, 430072, P. R. China
- Feng Chen
  - Hubei Key Laboratory of Cell Homeostasis, Laboratory of Molecular and Developmental Genetics, College of Life Sciences, Wuhan University, Wuhan, 430072, P. R. China
- Chunhua Huang
  - Hubei Key Laboratory of Cell Homeostasis, Laboratory of Molecular and Developmental Genetics, College of Life Sciences, Wuhan University, Wuhan, 430072, P. R. China
- Weixin Zheng
  - Hubei Key Laboratory of Cell Homeostasis, Laboratory of Molecular and Developmental Genetics, College of Life Sciences, Wuhan University, Wuhan, 430072, P. R. China
- Chunlai Yu
  - Hubei Key Laboratory of Cell Homeostasis, Laboratory of Molecular and Developmental Genetics, College of Life Sciences, Wuhan University, Wuhan, 430072, P. R. China
- Hanhua Cheng
  - Hubei Key Laboratory of Cell Homeostasis, Laboratory of Molecular and Developmental Genetics, College of Life Sciences, Wuhan University, Wuhan, 430072, P. R. China
- Rongjia Zhou
  - Hubei Key Laboratory of Cell Homeostasis, Laboratory of Molecular and Developmental Genetics, College of Life Sciences, Wuhan University, Wuhan, 430072, P. R. China
5
Fungtammasan A, Tomaszkiewicz M, Campos-Sánchez R, Eckert KA, DeGiorgio M, Makova KD. Reverse Transcription Errors and RNA-DNA Differences at Short Tandem Repeats. Mol Biol Evol 2016; 33:2744-58. [PMID: 27413049 PMCID: PMC5026258 DOI: 10.1093/molbev/msw139] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 12/11/2022]
Abstract
Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on assessing variation in gene expression levels and isoform representation. Variation at the level of transcript sequence is caused by RNA editing and transcription errors, and leads to nongenetically encoded transcript variants, or RNA–DNA differences (RDDs). Such variation has been understudied, in part because its detection is obscured by reverse transcription (RT) and sequencing errors. It has only been evaluated for intertranscript base substitution differences. Here, we investigated transcript sequence variation for short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) to infer RT error and RDD rates, taking next generation sequencing error rates into account. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale DNA and RNA replicated sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately 1 order of magnitude lower than the RT error rates. The RT error rates estimated with the MLE from a primate data set were concordant with those estimated with an independent method, barcoded RNA sequencing, from a Caenorhabditis elegans data set. Our results have important implications for medical genomics, as STR allelic variation is associated with >40 diseases. STR nonallelic transcript variation can also contribute to disease phenotype. The MLE and empirical rates presented here can be used to evaluate the probability of disease-associated transcripts arising due to RDD.
Affiliation(s)
- Arkarachai Fungtammasan
  - Integrative Biosciences, Bioinformatics and Genomics Option, Pennsylvania State University
  - Department of Biology, Pennsylvania State University
  - Center for Medical Genomics, Pennsylvania State University
  - Huck Institute of Genome Sciences, Pennsylvania State University
- Marta Tomaszkiewicz
  - Department of Biology, Pennsylvania State University
  - Center for Medical Genomics, Pennsylvania State University
- Rebeca Campos-Sánchez
  - Department of Biology, Pennsylvania State University
  - Center for Medical Genomics, Pennsylvania State University
- Kristin A Eckert
  - Center for Medical Genomics, Pennsylvania State University
  - Department of Pathology, The Jake Gittlen Laboratories for Cancer Research, The Pennsylvania State University College of Medicine
- Michael DeGiorgio
  - Department of Biology, Pennsylvania State University
  - Center for Medical Genomics, Pennsylvania State University
  - Institute for CyberScience, Pennsylvania State University
- Kateryna D Makova
  - Department of Biology, Pennsylvania State University
  - Center for Medical Genomics, Pennsylvania State University
  - Huck Institute of Genome Sciences, Pennsylvania State University
6
Someswara Rao C, Raju SV. Next generation sequencing (NGS) database for tandem repeats with multiple pattern 2°-shaft multicore string matching. Genomics Data 2016; 7:307-17. [PMID: 26981434 PMCID: PMC4778683 DOI: 10.1016/j.gdata.2016.01.015] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/14/2015] [Revised: 01/15/2016] [Accepted: 01/27/2016] [Indexed: 11/25/2022]
Abstract
Next generation sequencing (NGS) technologies have been rapidly applied in biomedical and biological research in recent years. To provide a comprehensive NGS resource for research, in this paper we consider 10 repeat loci: TAGA, TCAT, GAAT, AGAT, AGAA, GATA, TATC, CTTT, TCTG and TCTA. We then developed the NGS Tandem Repeat Database (TandemRepeatDB) for all the chromosomes of the Homo sapiens, Callithrix jacchus, Chlorocebus sabaeus, Gorilla gorilla, Macaca fascicularis, Macaca mulatta, Nomascus leucogenys, Pan troglodytes, Papio anubis and Pongo abelii genome data sets for all of these loci. We find the successive occurrence frequency of each of the above 10 SSRs (simple sequence repeats) in these genome data sets on a chromosome-by-chromosome basis with multiple pattern 2° shaft multicore string matching.
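One way to read "successive occurrence frequency" is as a count of runs of back-to-back motif copies. The sketch below rests on that assumption and uses an ordinary regular expression rather than the paper's 2° shaft multicore string matching, which is not reproduced here; the function name and the `min_copies` parameter are hypothetical.

```python
import re
from collections import Counter
from typing import Dict

# The ten tetranucleotide motifs listed in the abstract.
MOTIFS = ["TAGA", "TCAT", "GAAT", "AGAT", "AGAA", "GATA", "TATC", "CTTT", "TCTG", "TCTA"]


def successive_run_counts(seq: str, motif: str, min_copies: int = 2) -> Dict[int, int]:
    """Map run length (consecutive motif copies) to the number of such runs in seq."""
    # Builds e.g. "(?:GATA){2,}", which greedily matches the longest
    # non-overlapping run of at least min_copies back-to-back copies.
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_copies},}}")
    counts = Counter(
        len(m.group(0)) // len(motif) for m in pattern.finditer(seq.upper())
    )
    return dict(counts)
```

Calling `successive_run_counts(chromosome_sequence, motif)` for each entry of `MOTIFS` on each chromosome would give a per-chromosome table of run lengths; `chromosome_sequence` here stands for any DNA string loaded by the caller.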
Affiliation(s)
- S Viswanadha Raju
  - Department of CSE, JNTUCEJ, JNTUniversity Hyderabad, Telangana, India
7
Liang KC, Tseng JT, Tsai SJ, Sun HS. Characterization and distribution of repetitive elements in association with genes in the human genome. Comput Biol Chem 2015; 57:29-38. [DOI: 10.1016/j.compbiolchem.2015.02.007] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 12/31/2014] [Accepted: 02/03/2015] [Indexed: 11/27/2022]
8
Yu J, Ke T, Tehrim S, Sun F, Liao B, Hua W. PTGBase: an integrated database to study tandem duplicated genes in plants. Database: The Journal of Biological Databases and Curation 2015; 2015:bav017. [PMID: 25797062 PMCID: PMC4369376 DOI: 10.1093/database/bav017] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Indexed: 01/08/2023]
Abstract
Tandem duplication is a widespread phenomenon in plant genomes and plays significant roles in evolution and adaptation to changing environments. Tandem duplication of genes related to certain functions leads to the expansion of gene families and increases gene dosage in the form of gene cluster arrays. Many tandem duplication events have been studied in plant genomes; yet, there has been surprisingly little effort to systematically integrate and present the large amounts of publicly deposited tandem duplicated gene data across the plant kingdom. To address this shortcoming, we developed the first plant tandem duplicated genes database, PTGBase. It delivers the most comprehensive resource available to date, spanning 39 plant genomes, including model species and newly sequenced species alike. Across these genomes, 54 130 tandem duplicated gene clusters (129 652 genes) are presented in the database. Each tandem array, as well as its member genes, is characterized in complete detail. Tandem duplicated genes in PTGBase can be explored through browsing or searching by identifiers or keywords of functional annotation and sequence similarity. Users can easily download tandem duplicated gene arrays at any scale, up to the complete annotation data set for an entire plant genome. PTGBase will be updated regularly with newly sequenced plant species as they become available. Database URL: http://ocri-genomics.org/PTGBase/.
Affiliation(s)
- Jingyin Yu
  - The Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Oil Crops Research Institute, Chinese Academy of Agricultural Sciences, Wuhan 430062, China and Department of Life Science and Technology, Nanyang Normal University, Wolong Road, Nanyang 473061, China
- Tao Ke
  - The Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Oil Crops Research Institute, Chinese Academy of Agricultural Sciences, Wuhan 430062, China and Department of Life Science and Technology, Nanyang Normal University, Wolong Road, Nanyang 473061, China
- Sadia Tehrim
  - The Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Oil Crops Research Institute, Chinese Academy of Agricultural Sciences, Wuhan 430062, China and Department of Life Science and Technology, Nanyang Normal University, Wolong Road, Nanyang 473061, China
- Fengming Sun
  - The Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Oil Crops Research Institute, Chinese Academy of Agricultural Sciences, Wuhan 430062, China and Department of Life Science and Technology, Nanyang Normal University, Wolong Road, Nanyang 473061, China
- Boshou Liao
  - The Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Oil Crops Research Institute, Chinese Academy of Agricultural Sciences, Wuhan 430062, China and Department of Life Science and Technology, Nanyang Normal University, Wolong Road, Nanyang 473061, China
- Wei Hua
  - The Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Oil Crops Research Institute, Chinese Academy of Agricultural Sciences, Wuhan 430062, China and Department of Life Science and Technology, Nanyang Normal University, Wolong Road, Nanyang 473061, China
9
Taher L, Narlikar L, Ovcharenko I. Identification and computational analysis of gene regulatory elements. Cold Spring Harb Protoc 2015; 2015:pdb.top083642. [PMID: 25561628 PMCID: PMC5885252 DOI: 10.1101/pdb.top083642] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 06/04/2023]
Abstract
Over the last two decades, advances in experimental and computational technologies have greatly facilitated genomic research. Next-generation sequencing technologies have made de novo sequencing of large genomes affordable, and powerful computational approaches have enabled accurate annotations of genomic DNA sequences. Charting functional regions in genomes must account for not only the coding sequences, but also noncoding RNAs, repetitive elements, chromatin states, epigenetic modifications, and gene regulatory elements. A mix of comparative genomics, high-throughput biological experiments, and machine learning approaches has played a major role in this truly global effort. Here we describe some of these approaches and provide an account of our current understanding of the complex landscape of the human genome. We also present overviews of different publicly available, large-scale experimental data sets and computational tools, which we hope will prove beneficial for researchers working with large and complex genomes.
Affiliation(s)
- Leila Taher
  - Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894
  - Institute for Biostatistics and Informatics in Medicine and Ageing Research, University of Rostock, 18051 Rostock, Germany
- Leelavati Narlikar
  - Chemical Engineering and Process Development Division, National Chemical Laboratory, CSIR, Pune 411008, India
- Ivan Ovcharenko
  - Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894
10
Chaley M, Kutyrkin V, Tulbasheva G, Teplukhina E, Nazipova N. HeteroGenome: database of genome periodicity. Database: The Journal of Biological Databases and Curation 2014; 2014:bau040. [PMID: 24857969 PMCID: PMC4038257 DOI: 10.1093/database/bau040] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/16/2022]
Abstract
We present the first release of the HeteroGenome database collecting latent periodicity regions in genomes. Tandem repeats and highly divergent tandem repeats, along with the regions of a new type of periodicity known as profile periodicity, have been collected for the genomes of Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans and Drosophila melanogaster. We obtained the data with the aid of a spectral-statistical approach to search for reliable latent periodicity regions (with periods up to 2000 bp) in DNA sequences. The original two-level mode of data presentation (a broad view of the region of latent periodicity and a second level indicating conserved fragments of its structure) was further developed to enable us to obtain the estimate, without redundancy, that latent periodicity regions make up ∼10% of the analyzed genomes. Analysis of the quantitative and qualitative content of the periodicity regions located on all chromosomes of the analyzed organisms revealed dominant characteristic types of periodicity in the genomes. The pattern of density distribution of latent periodicity regions along a chromosome unambiguously characterizes each chromosome in the genome. Database URL: http://www.jcbi.ru/lp_baze/
Affiliation(s)
- Maria Chaley
  - Laboratory of Bioinformatics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st. 4, 142290 Pushchino, Russia and Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005 Moscow, Russia
- Vladimir Kutyrkin
  - Laboratory of Bioinformatics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st. 4, 142290 Pushchino, Russia and Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005 Moscow, Russia
- Gayane Tulbasheva
  - Laboratory of Bioinformatics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st. 4, 142290 Pushchino, Russia and Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005 Moscow, Russia
- Elena Teplukhina
  - Laboratory of Bioinformatics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st. 4, 142290 Pushchino, Russia and Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005 Moscow, Russia
- Nafisa Nazipova
  - Laboratory of Bioinformatics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Institutskaya st. 4, 142290 Pushchino, Russia and Department of Computational Mathematics and Mathematical Physics, Moscow State Technical University n.a. N.E. Bauman, the 2nd Baumanskaya st., 5, 105005 Moscow, Russia
11
Nagpure NS, Rashid I, Pati R, Pathak AK, Singh M, Singh SP, Sarkar UK. FishMicrosat: a microsatellite database of commercially important fishes and shellfishes of the Indian subcontinent. BMC Genomics 2013; 14:630. [PMID: 24047532 PMCID: PMC3852227 DOI: 10.1186/1471-2164-14-630] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 04/16/2013] [Accepted: 09/11/2013] [Indexed: 11/17/2022]
Abstract
Background: Microsatellite DNA is one of many powerful genetic markers used for the construction of genetic linkage maps and the study of population genetics. The biological databases in the public domain hold vast numbers of microsatellite sequences for many organisms, including fishes. The microsatellite data available in these data sources were extracted and organized into a database that facilitates sequence analysis and browsing of relevant information. The system also helps to design primer sequences for flanking regions of repeat loci for PCR identification of polymorphism within populations. Description: FishMicrosat is a database of microsatellite sequences of fishes and shellfishes that includes important aquaculture species such as Lates calcarifer, Ctenopharyngodon idella, Hypophthalmichthys molitrix, Penaeus monodon, Labeo rohita, Oreochromis niloticus, Fenneropenaeus indicus and Macrobrachium rosenbergii. The database contains 4398 microsatellite sequences of 41 species belonging to 15 families from the Indian subcontinent. GenBank of NCBI was used as the primary data source for developing the database. The database presents information about simple and compound microsatellites, their clusters and locus orientation within sequences. The database has been integrated with different tools in a web interface, such as primer designing, locus finding, mapping repeats, detecting similarities among sequences across species, and searching using motifs and keywords. In addition, the database allows browsing of information on the top 10 families and the top 10 species through a record overview. Conclusions: The FishMicrosat database is a useful resource for fish and shellfish microsatellite analyses and locus identification across species, which has important applications in population genetics, evolutionary studies and genetic relatedness among species. The database can be expanded further to include the microsatellite data of fishes and shellfishes from other regions and available information on genome sequencing projects of species of aquaculture importance.
Affiliation(s)
- Naresh Sahebrao Nagpure
  - Division of Molecular Biology and Biotechnology, National Bureau of Fish Genetic Resources, Lucknow 226002, India
12
Churbanov A, Ryan R, Hasan N, Bailey D, Chen H, Milligan B, Houde P. HighSSR: high-throughput SSR characterization and locus development from next-gen sequencing data. Bioinformatics 2012; 28:2797-803. [PMID: 22954626 DOI: 10.1093/bioinformatics/bts524] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 01/05/2023]
Abstract
MOTIVATION: Microsatellites are among the most useful genetic markers in population biology. High-throughput sequencing of microsatellite-enriched libraries dramatically expedites the traditional process of screening recombinant libraries for microsatellite markers. However, sorting through millions of reads to distill high-quality polymorphic markers requires special algorithms tailored to tolerate sequencing errors in locus reconstruction, distinguish paralogous loci, rarify raw reads originating from the same amplicon and sort out various artificial fragments resulting from recombination or concatenation of auxiliary adapters. Existing programs warrant improvement. RESULTS: We describe a microsatellite prediction framework named HighSSR for microsatellite genotyping based on high-throughput sequencing. We demonstrate the utility of HighSSR in comparison to Roche gsAssembler on two Roche 454 GS FLX runs. The majority of the HighSSR-assembled loci were reliably mapped against model organism reference genomes. HighSSR demultiplexes pooled libraries, assesses locus polymorphism and implements Primer3 for the design of PCR primers flanking polymorphic microsatellite loci. As sequencing costs drop and permit the analysis of all project samples on next-generation platforms, this framework can also be used for direct simple sequence repeats genotyping. AVAILABILITY: http://code.google.com/p/highssr/
Affiliation(s)
- Alexander Churbanov
  - New Mexico State University, Biology Department, MSC 3AF, PO Box 30001, Las Cruces, NM 88003, USA
13
|
Affiliation(s)
- Julien Jorda
- Centre de Recherches de Biochimie Macromoléculaire, UMR 5237 CNRS, University of Montpellier 1 and 2, Montpellier, France
- UCLA-DOE Institute for Genomics and Proteomics, Los Angeles, CA, USA
- Thierry Baudrand
- Centre de Recherches de Biochimie Macromoléculaire, UMR 5237 CNRS, University of Montpellier 1 and 2, Montpellier, France
- Andrey V. Kajava
- Centre de Recherches de Biochimie Macromoléculaire, UMR 5237 CNRS, University of Montpellier 1 and 2, Montpellier, France
14
Pellegrini M, Renda ME, Vecchio A. Tandem repeats discovery service (TReaDS) applied to finding novel cis-acting factors in repeat expansion diseases. BMC Bioinformatics 2012; 13 Suppl 4:S3. [PMID: 22536970 PMCID: PMC3303744 DOI: 10.1186/1471-2105-13-s4-s3] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 12/19/2022]
Abstract
Background Tandem repeats are multiple duplications of substrings in the DNA that occur contiguously, or at a short distance, and may involve some mutations (such as substitutions, insertions, and deletions). Tandem repeats have also been extensively studied for their association with the class of repeat expansion diseases (mostly affecting the nervous system). Comparative studies on the output of different tools for finding tandem repeats highlighted significant differences among the sets of detected tandem repeats, while many authors pointed out how critical the right choice of parameters is. Results In this paper we present TReaDS (Tandem Repeats Discovery Service), a tandem repeat meta search engine. TReaDS forwards user requests to several state-of-the-art tools for finding tandem repeats and merges their outcomes into a single report, providing a global, synthetic, and comparative view of the results. In particular, TReaDS allows the user to (i) simultaneously run different algorithms on the same data set, (ii) choose a different parameter setting for each algorithm, and (iii) obtain a report that can be downloaded for further off-line investigation. We used TReaDS to investigate sequences associated with repeat expansion diseases. Conclusions Using TReaDS, we find that, for 27 repeat expansion diseases out of a currently known set of 29, long fuzzy tandem repeats cover the expansion loci. Tests with control sets confirm the specificity of this association. This finding suggests that long fuzzy tandem repeats can be a new class of cis-acting elements involved in the mechanisms leading to expansion instability.
We strongly believe that biologists will be interested in a tool that not only lets them run multiple search algorithms at the same time, with the same effort as running just one, but also eases the burden of comparing and merging the results, thus expanding our ability to detect important phenomena related to tandem repeats.
Affiliation(s)
- Marco Pellegrini
- Istituto di Informatica e Telematica, Consiglio Nazionale delle Ricerche, Pisa I-56124, Italy
15
Chaturvedi A, Tiwari S, Jesudasan RA. RiDs db: Repeats in diseases database. Bioinformation 2011; 7:96-7. [PMID: 21938212 PMCID: PMC3174043 DOI: 10.6026/97320630007096] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/12/2011] [Accepted: 08/13/2011] [Indexed: 11/23/2022]
Abstract
The non-coding fraction of the human genome, approximately 98%, consists mainly of repeats. Transpositions, expansions and deletions of these repeat elements contribute to a number of diseases. None of the available databases consolidates information on both tandem and interspersed repeats with the flexibility of FASTA-based homology search with reference to disease genes. Repeats in diseases database (RiDs db) is a web-accessible relational database that aids analysis of repeats associated with Mendelian disorders. It is a repository of disease genes, which can be searched with the FASTA program or by limited or free-text keywords. Unlike other databases, RiDs db contains the sequences of these genes with access to corresponding information on both interspersed and tandem repeats contained within them, on a unified platform. Comparative analysis of novel or patient sequences with the reference sequences in RiDs db using FASTA search will indicate any change in repeat structure associated with a particular disorder. This database also provides links to orthologs in model organisms such as zebrafish, mouse and Drosophila. AVAILABILITY The database is available for free at http://115.111.90.196/ridsdb/index.php.
Affiliation(s)
- Anurag Chaturvedi
- Centre for Cellular and Molecular Biology, Habsiguda, Hyderabad − 500007, Andhra Pradesh, India
- Shrish Tiwari
- Centre for Cellular and Molecular Biology, Habsiguda, Hyderabad − 500007, Andhra Pradesh, India
- Rachel A Jesudasan
- Centre for Cellular and Molecular Biology, Habsiguda, Hyderabad − 500007, Andhra Pradesh, India
16
Transgenerational analysis of transcriptional silencing in zebrafish. Dev Biol 2011; 352:191-201. [PMID: 21223961 DOI: 10.1016/j.ydbio.2011.01.002] [Citation(s) in RCA: 118] [Impact Index Per Article: 9.1] [Received: 10/23/2010] [Revised: 12/31/2010] [Accepted: 01/04/2011] [Indexed: 12/11/2022]
Abstract
The yeast Gal4/UAS transcriptional activation system is a powerful tool for regulating gene expression in Drosophila and has been increasing in popularity for developmental studies in zebrafish. It is also useful for studying the basis of de novo transcriptional silencing. Fluorescent reporter genes under the control of multiple tandem copies of the upstream activator sequence (UAS) often show evidence of variegated expression and DNA methylation in transgenic zebrafish embryos. To characterize this systematically, we monitored the progression of transcriptional silencing of UAS-regulated transgenes that differ in their integration sites and in the repetitive nature of the UAS. Transgenic larvae were examined in three generations for tissue-specific expression of a green fluorescent protein (GFP) reporter and DNA methylation at the UAS. Single insertions containing four distinct upstream activator sequences were far less susceptible to methylation than insertions containing fourteen copies of the same UAS. In addition, transgenes that integrated in or adjacent to transposon sequence exhibited silencing regardless of the number of UAS sites included in the transgene. Placement of promoter-driven Gal4 upstream of UAS-regulated responder genes in a single bicistronic construct also appeared to accelerate silencing and methylation. The results demonstrate the utility of the zebrafish for efficient tracking of gene silencing mechanisms across several generations, as well as provide useful guidelines for optimal Gal4-regulated gene expression in organisms subject to DNA methylation.
17
Sokol D, Atagun F. TRedD--a database for tandem repeats over the edit distance. Database: The Journal of Biological Databases and Curation 2010; 2010:baq003. [PMID: 20624712 PMCID: PMC2911838 DOI: 10.1093/database/baq003] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/15/2022]
Abstract
A ‘tandem repeat’ in DNA is a sequence of two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats are common in the genomes of both eukaryotic and prokaryotic organisms. They are significant markers for human identity testing, disease diagnosis, sequence homology and population studies. In this article, we describe a new database, TRedD, which contains the tandem repeats found in the human genome. The database is publicly available online, and the software for locating the repeats is also freely available. The definition of tandem repeats used by TRedD is a new definition based upon the concept of ‘evolutive tandem repeats’. In addition, we have developed a tool, called TandemGraph, to graphically depict the repeats occurring in a sequence. This tool can be coupled with any repeat-finding software, and it should greatly facilitate analysis of results. Database URL: http://tandem.sci.brooklyn.cuny.edu/
Affiliation(s)
- Dina Sokol
- Department of Computer and Information Science, Brooklyn College of the City University of New York, 2900 Bedford Avenue, Brooklyn, NY 11210, USA.
18
Mayer C, Leese F, Tollrian R. Genome-wide analysis of tandem repeats in Daphnia pulex--a comparative approach. BMC Genomics 2010; 11:277. [PMID: 20433735 PMCID: PMC3152781 DOI: 10.1186/1471-2164-11-277] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.1] [Received: 09/29/2009] [Accepted: 04/30/2010] [Indexed: 11/10/2022]
Abstract
Background DNA tandem repeats (TRs) are not just popular molecular markers, but are also important genomic elements from an evolutionary and functional perspective. For various genomes, the densities of short TR types were shown to differ strongly among different taxa and genomic regions. In this study we analysed the TR characteristics in the genomes of Daphnia pulex and 11 other eukaryotic species. Characteristics of TRs in different genomic regions and among different strands are compared in detail for D. pulex and the two model insects Apis mellifera and Drosophila melanogaster. Results Profound differences in TR characteristics were found among all 12 genomes compared in this study. In D. pulex, the genomic density of TRs was low compared to the arthropod species D. melanogaster and A. mellifera. For these three species, very few common features in repeat type usage, density distribution, and length characteristics were observed in the genomes and in different genomic regions. In introns and coding regions an unexpectedly high strandedness was observed for several repeat motifs. In D. pulex, the density of TRs was highest in introns, a rare feature in animals. In coding regions, the density of TRs with unit sizes of 7-50 bp was more than three times as high as for 1-6 bp repeats. Conclusions TRs in the genome of D. pulex show several notable features, which distinguish it from the other genomes. Altogether, the highly non-random distribution of TRs among genomes, genomic regions and even among different DNA strands raises many questions concerning their functional and evolutionary importance. The high density of TRs with a unit size longer than 6 bp found in non-coding and coding regions underscores the importance of including longer TR units in comparative analyses.
Affiliation(s)
- Christoph Mayer
- Department of Animal Ecology, Evolution and Biodiversity, Ruhr University Bochum, Bochum, Germany.
19
Abstract
Single nucleotide polymorphisms (SNPs) are widely distributed in the human genome and although most SNPs are the result of independent point-mutations, there are exceptions. When studying distances between SNPs, a periodic pattern in the distance between pairs of identical SNPs has been found to be heavily correlated with periodicity in short tandem repeats (STRs). STRs are short DNA segments, widely distributed in the human genome and mainly found outside known tandem repeats. Because of the biased occurrence of SNPs, special care has to be taken when analyzing SNP-variation in STRs. We present a review of STRs in the human genome and discuss molecular mechanisms related to the biased occurrence of SNPs in STRs, and its implications for genome comparisons and genetic association studies.
Affiliation(s)
- Bo Eskerod Madsen
- AgroTech, Institute for Agri Technology and Food Innovation, Aarhus N, Denmark
20
Chaley MB, Nazipova NN, Kutyrkin VA. Statistical methods for detecting latent periodicity patterns in biological sequences: The case of small-size samples. Pattern Recognition and Image Analysis 2009. [DOI: 10.1134/s1054661809020217] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]
21
Madsen BE, Villesen P, Wiuf C. Short tandem repeats in human exons: a target for disease mutations. BMC Genomics 2008; 9:410. [PMID: 18789129 PMCID: PMC2543027 DOI: 10.1186/1471-2164-9-410] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Received: 03/12/2008] [Accepted: 09/12/2008] [Indexed: 11/30/2022]
Abstract
BACKGROUND In recent years it has been demonstrated that structural variations, such as indels (insertions and deletions), are common throughout the genome, but the implications of structural variations are still not clearly understood. Long tandem repeats (e.g. microsatellites or simple repeats) are known to be hypermutable (indel-rich), but are rare in exons and only occasionally associated with diseases. Here we focus on short (imperfect) tandem repeats (STRs) which fall below the radar of conventional tandem repeat detection, and investigate whether STRs are targets for disease-related mutations in human exons. In particular, we test whether they share the hypermutability of the longer tandem repeats and whether disease-related genes have a higher STR content than non-disease-related genes. RESULTS We show that validated human indels are extremely common in STR regions compared to non-STR regions. In contrast to longer tandem repeats, our definition of STRs found them to be present in exons of most known human genes (92%), 99% of all STR sequences in exons are shorter than 33 base pairs and 62% of all STR sequences are imperfect repeats. We also demonstrate that STRs are significantly overrepresented in disease-related genes in both human and mouse. These results are preserved when we limit the analysis to STRs outside known longer tandem repeats. CONCLUSION Based on our findings we conclude that STRs represent hypermutable regions in the human genome that are linked to human disease. In addition, STRs constitute an obvious target when screening for rare mutations, because of the relatively low amount of STRs in exons (1,973,844 bp) and the limited length of STR regions.
Affiliation(s)
- Bo Eskerod Madsen
- Bioinformatics Research Center (BiRC), University of Aarhus, DK-8000 Aarhus C, Denmark
- Palle Villesen
- Bioinformatics Research Center (BiRC), University of Aarhus, DK-8000 Aarhus C, Denmark
- Carsten Wiuf
- Bioinformatics Research Center (BiRC), University of Aarhus, DK-8000 Aarhus C, Denmark
22
Merkel A, Gemmell N. Detecting short tandem repeats from genome data: opening the software black box. Brief Bioinform 2008; 9:355-66. [PMID: 18621747 DOI: 10.1093/bib/bbn028] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Indexed: 01/23/2023]
Abstract
Short tandem repeats, specifically microsatellites, are widely used genetic markers, associated with human genetic diseases, and play an important role in various regulatory mechanisms and evolution. Despite their importance, much is yet unknown about their mutational dynamics. The increasing availability of genome data has led to several in silico studies of microsatellite evolution which have produced a vast range of algorithms and software for tandem repeat detection. Documentation of these tools is often sparse, or provided in a format that is impenetrable to most biologists without an informatics background. This article introduces the major concepts behind repeat-detecting software that are essential for informed tool selection. We reflect on issues such as parameter settings and program bias, as well as redundancy filtering and efficiency using examples from the currently available range of programs, to provide an integrated comparison and practical guide to microsatellite-detecting programs.
Affiliation(s)
- Angelika Merkel
- School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch 8041, New Zealand.
23
Abstract
Using the compiled human genome sequence, we systematically cataloged all tandem repeats with periods between 20 and 2000 bp and defined two subsets whose consensus sequences were found at either single-locus tandem repeats (slTRs) or multilocus tandem repeats (mlTRs). Parameters compiled for these subsets provide insights into mechanisms underlying the creation and evolution of tandem repeats. Both subsets of tandem repeats are nonrandomly distributed in the genome, being found at higher frequency at many but not all chromosome ends and internal clusters of mlTRs were also observed. Despite the integral role of recombination in the biology of tandem repeats, recombination hotspots colocalized only with shorter microsatellites and not the longer repeats examined here. An increased frequency of slTRs was observed near imprinted genes, consistent with a functional role, while both slTRs and mlTRs were found more frequently near genes implicated in triplet expansion diseases, suggesting a general instability of these regions. Using our collated parameters, we identified 2230 slTRs as candidates for highly informative molecular markers.
24
Grissa I, Bouchon P, Pourcel C, Vergnaud G. On-line resources for bacterial micro-evolution studies using MLVA or CRISPR typing. Biochimie 2008; 90:660-8. [PMID: 17822824 DOI: 10.1016/j.biochi.2007.07.014] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.8] [Received: 06/16/2007] [Accepted: 07/19/2007] [Indexed: 10/23/2022]
Abstract
The control of bacterial pathogens requires the development of tools allowing the precise identification of strains at the subspecies level. It is now widely accepted that these tools will need to be DNA-based assays (in contrast to identification at the species level, where biochemical-based assays are still widely used, even though very powerful 16S DNA sequence databases exist). Typing assays need to be cheap and amenable to the design of international databases. The success of such subspecies typing tools will eventually be measured by the size of the associated reference databases accessible over the internet. Three methods have shown some potential in this direction: the so-called spoligotyping assay (Mycobacterium tuberculosis, 40,000 entries database), Multiple Loci Sequence Typing (MLST; up to a few thousand entries for more than 20 bacterial species), and more recently Multiple Loci VNTR Analysis (MLVA; up to a few hundred entries, assays available for more than 20 pathogens). In the present report we review the current status of the tools and resources we have developed over the past seven years to help set up or use MLVA assays and, more recently, to analyse Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs), which are the basis for spoligotyping assays.
Affiliation(s)
- Ibtissem Grissa
- Univ Paris-Sud, Institut de Génétique et Microbiologie, Orsay F-91405, France.
25
Shelenkov AA, Skryabin KG, Korotkov EV. Classification analysis of a latent dinucleotide periodicity of plant genomes. Russ J Genet 2008. [DOI: 10.1134/s1022795408010134] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
26
Model of perfect tandem repeat with random pattern and empirical homogeneity testing poly-criteria for latent periodicity revelation in biological sequences. Math Biosci 2008; 211:186-204. [DOI: 10.1016/j.mbs.2007.10.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 03/09/2007] [Revised: 10/19/2007] [Accepted: 10/26/2007] [Indexed: 11/23/2022]
27
Abstract
MOTIVATION Microsatellites, also known as simple sequence repeats, are tandem repeats of nucleotide motifs of size 1-6 bp found in every genome known so far. Their importance in genomes is well known. Microsatellites are associated with various disease genes, have been used as molecular markers in linkage analysis and DNA fingerprinting studies, and also seem to play an important role in genome evolution. Therefore, it is important to study the distribution, enrichment and polymorphism of microsatellites in the genomes of interest. The prerequisite for this is a computational tool for extracting microsatellites (perfect as well as imperfect) and their related information from whole genome sequences. Examination of available tools revealed certain lacunae in them and prompted us to develop a new tool. RESULTS In order to efficiently screen genome sequences for microsatellites (perfect as well as imperfect), we developed a new tool called IMEx (Imperfect Microsatellite Extractor). IMEx uses a simple string-matching algorithm with a sliding-window approach to screen DNA sequences for microsatellites and reports the motif, copy number, genomic location, nearby genes, mutational events and many other features useful for in-depth studies. IMEx is more sensitive, efficient and useful than the widely used tools currently available. IMEx is available as a stand-alone program as well as a web server. AVAILABILITY A World Wide Web server and the stand-alone program are available for free access at http://203.197.254.154/IMEX/ or http://www.cdfd.org.in/imex.
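To make the sliding-window idea in this abstract concrete, here is a minimal Python sketch. It is not IMEx's implementation: it handles perfect repeats only, and the function name `find_perfect_microsatellites`, the `Repeat` record, and the `min_copies` threshold are illustrative assumptions. It reports only the motif, copy number and start position, for motifs of 1-6 bp as defined above.

```python
from typing import List, NamedTuple


class Repeat(NamedTuple):
    """One perfect microsatellite hit (hypothetical record, not IMEx output)."""
    start: int   # 0-based start position in the sequence
    motif: str   # repeating unit, 1-6 bp
    copies: int  # number of contiguous copies


def find_perfect_microsatellites(
    seq: str, min_copies: int = 3, max_motif: int = 6
) -> List[Repeat]:
    """Slide along seq and report perfect tandem repeats of short motifs.

    min_copies is an assumed threshold; real tools usually set it per motif size.
    """
    seq = seq.upper()
    n = len(seq)
    hits: List[Repeat] = []
    i = 0
    while i < n:
        best = None
        for k in range(1, max_motif + 1):
            motif = seq[i:i + k]
            if len(motif) < k:
                break  # not enough sequence left for a motif of this size
            copies = 1
            j = i + k
            # Extend while the next window of k bases equals the motif.
            while seq[j:j + k] == motif:
                copies += 1
                j += k
            if copies >= min_copies:
                span = copies * k
                # Keep the longest tract; ties go to the shorter motif.
                if best is None or span > best.copies * len(best.motif):
                    best = Repeat(i, motif, copies)
        if best is not None:
            hits.append(best)
            i = best.start + best.copies * len(best.motif)  # skip past the tract
        else:
            i += 1
    return hits


if __name__ == "__main__":
    for hit in find_perfect_microsatellites("GGCACACACATT"):
        print(hit)
```

Supporting imperfect repeats, as the abstract describes, would need a mismatch allowance during extension, which this sketch deliberately omits.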
Affiliation(s)
- Suresh B Mudunuri
- Laboratory of Computational Biology, Centre for DNA Fingerprinting and Diagnostics, ECIL Road, Nacharam, Hyderabad 500 076, India
28
Sakharkar MK, Sakharkar KR, Pervaiz S. Druggability of human disease genes. Int J Biochem Cell Biol 2007; 39:1156-64. [PMID: 17446117 DOI: 10.1016/j.biocel.2007.02.018] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.2] [Received: 11/15/2006] [Revised: 02/24/2007] [Accepted: 02/26/2007] [Indexed: 11/16/2022]
Abstract
The availability of complete genome sequences and the wealth of large-scale biological datasets provide an unprecedented opportunity to elucidate the genetic basis of human diseases. Here we use integrative in silico approaches to provide an accurate description of gene functions for a set of 1737 highly curated disease genes in the human genome. This analysis is the first attempt at in silico identification of druggable domains within disease genes. We provide information on gene architecture and function, druggability in the context of available drugs, and evolutionary conservation across 38 model eukaryotic genomes. These data could serve as a useful compendium of integrated information on disease genes with the potential for exploring pharmaceutically exploitable targets. Our analyses underscore the utility of large genomic databases for in silico systematic drug target identification in the post-genomic era.
Affiliation(s)
- Meena Kishore Sakharkar
- Nanyang Centre for Supercomputing and Visualization, School of Mechanical and Aerospace Engineering (MAE), Nanyang Technological University, Singapore
29
Shelenkov A, Skryabin K, Korotkov E. Search and Classification of Potential Minisatellite Sequences from Bacterial Genomes. DNA Res 2006; 13:89-102. [PMID: 16980713 DOI: 10.1093/dnares/dsl004] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
Abstract
We used the Information Decomposition method, which we developed, to identify regions of latent dinucleotide periodicity in bacterial genomes. The number of potential minisatellite sequences obtained at a high level of statistical significance was 454. We then classified the periodicity matrices and obtained 45 classes. Using these classes, we applied another method we developed, Modified Profile Analysis, to reveal more periodic sequences in the presence of indels. The number of sequences found by combining these two methods was 3949. Most of them cannot be revealed by other methods, including dynamic programming and Fourier transformation.
Affiliation(s)
- Andrew Shelenkov
- Bioengineering Centre of Russian Academy of Sciences, Prospect 60-tya Oktyabrya 7/1, 117312 Moscow, Russia.
30
Ruiz-Herrera A, Castresana J, Robinson TJ. Is mammalian chromosomal evolution driven by regions of genome fragility? Genome Biol 2006; 7:R115. [PMID: 17156441 PMCID: PMC1794428 DOI: 10.1186/gb-2006-7-12-r115] [Citation(s) in RCA: 107] [Impact Index Per Article: 5.9] [Received: 08/01/2006] [Revised: 11/06/2006] [Accepted: 12/08/2006] [Indexed: 11/22/2022]
Abstract
BACKGROUND A fundamental question in comparative genomics concerns the identification of mechanisms that underpin chromosomal change. In an attempt to shed light on the dynamics of mammalian genome evolution, we analyzed the distribution of syntenic blocks, evolutionary breakpoint regions, and evolutionary breakpoints taken from public databases available for seven eutherian species (mouse, rat, cattle, dog, pig, cat, and horse) and the chicken, and examined these for correspondence with human fragile sites and tandem repeats. RESULTS Our results confirm previous investigations that showed the presence of chromosomal regions in the human genome that have been repeatedly used as illustrated by a high breakpoint accumulation in certain chromosomes and chromosomal bands. We show, however, that there is a striking correspondence between fragile site location, the positions of evolutionary breakpoints, and the distribution of tandem repeats throughout the human genome, which similarly reflect a non-uniform pattern of occurrence. CONCLUSION These observations provide further evidence that certain chromosomal regions in the human genome have been repeatedly used in the evolutionary process. As a consequence, the genome is a composite of fragile regions prone to reorganization that have been conserved in different lineages, and genomic tracts that do not exhibit the same levels of evolutionary plasticity.
Affiliation(s)
- Aurora Ruiz-Herrera
- Evolutionary Genomics Group, Department of Botany & Zoology, University of Stellenbosch, Private Bag X1, Matieland 7602, South Africa
- Jose Castresana
- Institut de Biologia Molecular de Barcelona, CSIC, Department of Physiology and Molecular Biodiversity, Jordi Girona 18, 08034 Barcelona, Spain
- Terence J Robinson
- Evolutionary Genomics Group, Department of Botany & Zoology, University of Stellenbosch, Private Bag X1, Matieland 7602, South Africa
31
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2005. [PMCID: PMC2447491 DOI: 10.1002/cfg.425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]