1
Abebe M, Candales MA, Duong A, Hood KS, Li T, Neufeld RAE, Shakenov A, Sun R, Wu L, Jarding AM, Semper C, Zimmerly S. A pipeline of programs for collecting and analyzing group II intron retroelement sequences from GenBank. Mob DNA 2013; 4:28. [PMID: 24359548 PMCID: PMC4028801 DOI: 10.1186/1759-8753-4-28] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 09/19/2013] [Accepted: 10/28/2013] [Indexed: 11/16/2022]
Abstract
Background: Accurate and complete identification of mobile elements is a challenging task in the current era of sequencing, given their large numbers and frequent truncations. Group II intron retroelements, which consist of a ribozyme and an intron-encoded protein (IEP), are usually identified in bacterial genomes through their IEP; however, the RNA component that defines the intron boundaries is often difficult to identify because of a lack of strong sequence conservation corresponding to the RNA structure. Compounding the problem of boundary definition is the fact that a majority of group II intron copies in bacteria are truncated.

Results: Here we present a pipeline of 11 programs that collect and analyze group II intron sequences from GenBank. The pipeline begins with a BLAST search of GenBank using a set of representative group II IEPs as queries. Subsequent steps download the corresponding genomic sequences and flanks, filter out non-group II introns, assign introns to phylogenetic subclasses, filter out incomplete and/or non-functional introns, and assign IEP sequences and RNA boundaries to the full-length introns. In the final step, the redundancy in the data set is reduced by grouping introns into sets of ≥95% identity, with one example sequence chosen to be the representative.

Conclusions: These programs should be useful for comprehensive identification of group II introns in sequence databases as data continue to rapidly accumulate.
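The final redundancy-reduction step described in the abstract (grouping introns into sets of ≥95% identity and keeping one representative per set) can be illustrated with a minimal sketch. The abstract does not say how identity is computed or how the representative is chosen, so everything below is an assumption for illustration only: the sequences are taken to be already aligned to equal length, identity is the fraction of matching alignment columns, clustering is greedy, and the longest sequence in each set serves as the representative. The function names `percent_identity` and `cluster_by_identity` are hypothetical and are not part of the published pipeline.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Fraction of identical columns between two pre-aligned sequences.

    Assumption: both sequences come from the same alignment, so they
    have equal length and use '-' for gaps.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where both sequences have a gap.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return matches / len(pairs)


def cluster_by_identity(seqs: Dict[str, str],
                        threshold: float = 0.95) -> Dict[str, List[str]]:
    """Greedy clustering: representative name -> names of set members.

    Sequences are visited longest first (ungapped length, ties broken
    by name), so each representative is the longest member of its set.
    """
    clusters: Dict[str, List[str]] = {}
    order = sorted(seqs, key=lambda n: (-len(seqs[n].replace("-", "")), n))
    for name in order:
        for rep in clusters:
            if percent_identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No existing representative is similar enough: start a new set.
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    # Toy data, not real intron sequences.
    base = "ACGT" * 10
    toy = {
        "intron_a": base,
        "intron_b": "T" + base[1:],   # one mismatch in 40 columns
        "intron_c": "T" * 40,         # unrelated sequence
    }
    for rep, members in cluster_by_identity(toy).items():
        print(rep, members)
```

A greedy scheme like this depends on the order in which sequences are visited. A production pipeline would more likely derive identities from pairwise alignments or use a dedicated clustering tool.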
Affiliation(s)
- Michael Abebe
- Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Manuel A Candales
- Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Adrian Duong
- Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Keyar S Hood
- Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Tony Li
- Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Ryan A E Neufeld
- Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Abat Shakenov
- Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Runda Sun
- Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Li Wu
- Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Ashley M Jarding
- Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Cameron Semper
- Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Steven Zimmerly
- Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
2
Candales MA, Duong A, Hood KS, Li T, Neufeld RAE, Sun R, McNeil BA, Wu L, Jarding AM, Zimmerly S. Database for bacterial group II introns. Nucleic Acids Res 2011; 40:D187-90. [PMID: 22080509 PMCID: PMC3245105 DOI: 10.1093/nar/gkr1043] [Citation(s) in RCA: 89] [Impact Index Per Article: 6.8] [Indexed: 11/17/2022]
Abstract
The Database for Bacterial Group II Introns (http://webapps2.ucalgary.ca/~groupii/index.html#) provides a catalogue of full-length, non-redundant group II introns present in bacterial DNA sequences in GenBank. The website is divided into three sections. The first section provides general information on group II intron properties, structures and classification. The second and main section lists information for individual introns, including insertion sites, DNA sequences, intron-encoded protein sequences and RNA secondary structure models. The final section provides tools for identification and analysis of intron sequences. These include a step-by-step guide to identify introns in genomic sequences, a local BLAST tool to identify closest intron relatives to a query sequence, and a boundary-finding tool that predicts 5′ and 3′ intron–exon junctions in an input DNA sequence. Finally, selected intron data can be downloaded in FASTA format. It is hoped that this database will be a useful resource not only to group II intron and RNA researchers, but also to microbiologists who encounter these unexpected introns in genomic sequences.
Affiliation(s)
- Manuel A Candales
- Department of Biological Sciences, University of Calgary, Calgary, Alberta T2N 1N4, Canada