1
Dieci G, Conti A, Pagano A, Carnevali D. Identification of RNA polymerase III-transcribed genes in eukaryotic genomes. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 2012; 1829:296-305. [PMID: 23041497 DOI: 10.1016/j.bbagrm.2012.09.010] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Received: 08/30/2012] [Revised: 09/20/2012] [Accepted: 09/21/2012] [Indexed: 12/16/2022]
Abstract
The RNA polymerase (Pol) III transcription system is devoted to the production of short, generally abundant noncoding (nc) RNAs in all eukaryotic cells. Previously thought to be restricted to a few housekeeping genes easily detectable in genome sequences, the set of known Pol III-transcribed genes (class III genes) has been expanding in the last ten years, and the issue of their detection, annotation and actual expression has been stimulated and revived by the results of recent high-resolution genome-wide location analyses of the mammalian Pol III machinery, together with those of Pol III-centered computational studies and of ncRNA-focused transcriptomic approaches. In this article, we provide an outline of distinctive features of Pol III-transcribed genes that have allowed and currently allow for their detection in genome sequences, we critically review the currently practiced strategies for the identification of novel class III genes and transcripts, and we discuss emerging themes in Pol III transcription regulation which might orient future transcriptomic studies. This article is part of a Special Issue entitled: Transcription by Odd Pols.
Affiliation(s)
- Giorgio Dieci
- Dipartimento di Bioscienze, Università degli Studi di Parma, Parco Area delle Scienze 23/A, 43124 Parma, Italy.
2
Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 1997. [PMID: 9023104 DOI: 10.1093/nar/25.5.0955] [Citation(s) in RCA: 3191] [Impact Index Per Article: 114.0] [Indexed: 12/12/2022]
Abstract
We describe a program, tRNAscan-SE, which identifies 99-100% of transfer RNA genes in DNA sequence while giving less than one false positive per 15 gigabases. Two previously described tRNA detection programs are used as fast, first-pass prefilters to identify candidate tRNAs, which are then analyzed by a highly selective tRNA covariance model. This work represents a practical application of RNA covariance models, which are general, probabilistic secondary structure profiles based on stochastic context-free grammars. tRNAscan-SE searches at approximately 30 000 bp/s. Additional extensions to tRNAscan-SE detect unusual tRNA homologues such as selenocysteine tRNAs, tRNA-derived repetitive elements and tRNA pseudogenes.
Affiliation(s)
- T M Lowe
- Department of Genetics, Washington University School of Medicine, 660 South Euclid, Box 8232, St Louis, MO 63110, USA
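The two-stage design described in this abstract (fast prefilters that nominate candidates, followed by a more selective model) can be illustrated with a minimal sketch. Everything below is a toy assumption: `has_motif` and `stem_pairs` are hypothetical stand-ins for the real prefilter programs and the covariance model, and the window size and threshold are arbitrary.

```python
# Toy two-stage scan: a cheap prefilter rejects most windows, and only the
# survivors are passed to a slower, more selective scorer.
# All function names and parameters here are illustrative assumptions.

WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def has_motif(candidate, motif="GTTC"):
    """Cheap first pass: keep windows that contain a short motif."""
    return motif in candidate


def stem_pairs(candidate, stem=7):
    """Slower second pass (toy): count Watson-Crick pairs between the first
    `stem` bases and the reversed last `stem` bases of the window."""
    five_prime = candidate[:stem]
    three_prime = candidate[-stem:][::-1]
    return sum((a, b) in WATSON_CRICK for a, b in zip(five_prime, three_prime))


def two_stage_scan(sequence, window, fast_prefilter, slow_scorer, threshold):
    """Return (start, score) for windows that pass both stages."""
    hits = []
    for start in range(len(sequence) - window + 1):
        candidate = sequence[start:start + window]
        if not fast_prefilter(candidate):
            continue  # rejected cheaply, the expensive scorer never runs
        score = slow_scorer(candidate)
        if score >= threshold:
            hits.append((start, score))
    return hits


# A 20-base toy hairpin flanked by T runs.
toy_sequence = "TTT" + "GCGGATT" + "AGTTCA" + "AATCCGC" + "TTT"
toy_hits = two_stage_scan(toy_sequence, 20, has_motif, stem_pairs, 7)
```

The point of the arrangement is cost: the expensive scorer is only evaluated on the small fraction of windows that survive the first pass, so overall speed is governed by the prefilter while selectivity is governed by the second stage.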
3
Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 1997; 25:955-64. [PMID: 9023104 PMCID: PMC146525 DOI: 10.1093/nar/25.5.955] [Citation(s) in RCA: 7770] [Impact Index Per Article: 277.5] [Indexed: 02/03/2023]
4
Abstract
Recognition of function of newly sequenced DNA fragments is an important area of computational molecular biology. Here we present an extensive review of methods for prediction of functional sites, tRNA, and protein-coding genes and discuss possible further directions of research in this area.
Affiliation(s)
- M S Gelfand
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow region, Russia
5
Abstract
Quantitative similarity among tRNA gene sequences was measured with an artificial neural network. The evolutionary relationships derived from these results were consistent with those obtained by other methods. Based on this similarity analysis, the network recognized a new sequence as a tRNA-like gene. Together, the results show that artificial neural networks are effective for sequence analysis of biological molecules.
Affiliation(s)
- J Sun
- Institute of Biophysics, Academia Sinica, Beijing, People's Republic of China
6
Sakakibara Y, Brown M, Hughey R, Mian IS, Sjölander K, Underwood RC, Haussler D. Stochastic context-free grammars for tRNA modeling. Nucleic Acids Res 1994; 22:5112-20. [PMID: 7800507 PMCID: PMC523785 DOI: 10.1093/nar/22.23.5112] [Citation(s) in RCA: 131] [Impact Index Per Article: 4.2] [Indexed: 01/27/2023]
Abstract
Stochastic context-free grammars (SCFGs) are applied to the problems of folding, aligning and modeling families of tRNA sequences. SCFGs capture the sequences' common primary and secondary structure and generalize the hidden Markov models (HMMs) used in related work on protein and DNA. Results show that after having been trained on as few as 20 tRNA sequences from only two tRNA subfamilies (mitochondrial and cytoplasmic), the model can discern general tRNA from similar-length RNA sequences of other kinds, can find secondary structure of new tRNA sequences, and can produce multiple alignments of large sets of tRNA sequences. Our results suggest potential improvements in the alignments of the D- and T-domains in some mitochondrial tRNAs that cannot be fit into the canonical secondary structure.
Affiliation(s)
- Y Sakakibara
- Sinsheimer Laboratories, University of California, Santa Cruz 95064
7
Abstract
We describe a general approach to several RNA sequence analysis problems using probabilistic models that flexibly describe the secondary structure and primary sequence consensus of an RNA sequence family. We call these models 'covariance models'. A covariance model of tRNA sequences is an extremely sensitive and discriminative tool for searching for additional tRNAs and tRNA-related sequences in sequence databases. A model can be built automatically from an existing sequence alignment. We also describe an algorithm for learning a model and hence a consensus secondary structure from initially unaligned example sequences and no prior structural information. Models trained on unaligned tRNA examples correctly predict tRNA secondary structure and produce high-quality multiple alignments. The approach may be applied to any family of small RNA sequences.
Affiliation(s)
- S R Eddy
- MRC Laboratory of Molecular Biology, Cambridge, UK
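Models of this kind exploit the fact that base-paired positions in an RNA family vary together. One common way to expose that signal in an alignment is the mutual information between two columns. The sketch below is illustrative only: it is not the authors' model-building algorithm, and the function name and toy alignment are assumptions.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment
    given as a list of equal-length strings."""
    n = len(alignment)
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment: columns 0 and 2 always change together, column 1 is constant.
toy_alignment = ["GAC", "CAG", "AAU", "UAA"]
mi_covarying = column_mutual_information(toy_alignment, 0, 2)
mi_conserved = column_mutual_information(toy_alignment, 0, 1)
```

Columns that change together carry high mutual information even though neither is conserved on its own, whereas a perfectly conserved column carries none. That is why pairwise covariation can reveal secondary structure that a position-by-position consensus would miss.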
8
Pavesi A, Conterio F, Bolchi A, Dieci G, Ottonello S. Identification of new eukaryotic tRNA genes in genomic DNA databases by a multistep weight matrix analysis of transcriptional control regions. Nucleic Acids Res 1994; 22:1247-56. [PMID: 8165140 PMCID: PMC523650 DOI: 10.1093/nar/22.7.1247] [Citation(s) in RCA: 76] [Impact Index Per Article: 2.5] [Indexed: 01/29/2023]
Abstract
A linear method for the search of eukaryotic nuclear tRNA genes in DNA databases is described. Based on a modified version of the general weight matrix procedure, our algorithm relies on the recognition of two intragenic control regions known as A and B boxes, a transcription termination signal, and on the evaluation of the spacing between these elements. The scanning of the eukaryotic nuclear DNA database using this search algorithm correctly identified 933 of the 940 known tRNA genes (0.74% of false negatives). Thirty new potential tRNA genes were identified, and the transcriptional activity of two of them was directly verified by in vitro transcription. The total false positive rate of the algorithm was 0.014%. Structurally unusual tRNA genes, like those coding for selenocysteine tRNAs, could also be recognized using a set of rules concerning their specific properties, and one human gene coding for such tRNA was identified. Some of the newly identified tRNA genes were found in rather uncommon genomic positions: 2 in centromeric regions and 3 within introns. Furthermore, the presence of extragenically located B boxes in tRNA genes from various organisms could be detected through a specific subroutine of the standard search program.
Affiliation(s)
- A Pavesi
- Department of Evolutionary Biology, University of Parma, Italy
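The general idea in this abstract (score two control elements with weight matrices, require a plausible spacing between them, then look for a termination signal) can be sketched as follows. This is a simplified illustration under stated assumptions, not the published procedure: the training sites, thresholds, spacing limits, and the use of a plain `TTTT` run as the terminator are all toy choices.

```python
import math


def log_odds_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds weight matrix from equal-length example sites."""
    total = len(sites) + 4 * pseudocount
    matrix = []
    for position in range(len(sites[0])):
        column = [site[position] for site in sites]
        matrix.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def score_window(matrix, window):
    """Sum per-position scores; bases other than A/C/G/T contribute zero."""
    return sum(column.get(base, 0.0) for column, base in zip(matrix, window))


def matrix_hits(matrix, sequence, threshold):
    """Start positions of all windows scoring at or above the threshold."""
    width = len(matrix)
    return [
        start
        for start in range(len(sequence) - width + 1)
        if score_window(matrix, sequence[start:start + width]) >= threshold
    ]


def find_candidates(sequence, a_matrix, b_matrix, a_threshold, b_threshold,
                    min_gap, max_gap, terminator="TTTT"):
    """Return (a_start, b_start, terminator_start) for each A/B box pair with
    acceptable spacing that is followed by a terminator run."""
    candidates = []
    for a_start in matrix_hits(a_matrix, sequence, a_threshold):
        a_end = a_start + len(a_matrix)
        for b_start in matrix_hits(b_matrix, sequence, b_threshold):
            if not (min_gap <= b_start - a_end <= max_gap):
                continue
            term_start = sequence.find(terminator, b_start + len(b_matrix))
            if term_start != -1:
                candidates.append((a_start, b_start, term_start))
    return candidates


# Hypothetical 4-base "boxes"; real A and B boxes are longer.
toy_a = log_odds_matrix(["TAGC", "TAGC", "TGGC"])
toy_b = log_odds_matrix(["GTTC", "GTTC", "GTTC"])
toy_genome = "CC" + "TAGC" + "A" * 10 + "GTTC" + "ACAC" + "TTTT" + "GG"
toy_candidates = find_candidates(toy_genome, toy_a, toy_b, 4.0, 4.0, 5, 20)
```

Requiring all three signals together is what keeps the false-positive rate low: a short motif on its own matches by chance far more often than two motifs at a constrained distance followed by a terminator.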
9
Abstract
We have developed an algorithm that automatically and reproducibly identifies potential tRNA genes in genomic DNA sequences, and we present a general strategy for testing the sensitivity of such algorithms. This algorithm is useful for the flagging and characterization of long genomic sequences that have not been experimentally analyzed for identification of functional regions, and for the scanning of nucleotide sequence databases for errors in the sequences and the functional assignments associated with them. In an exhaustive scan of the GenBank database, 97.5% of the 744 known tRNA genes were correctly identified (true-positives), and 42 previously unidentified sequences were predicted to be tRNAs. A detailed analysis of these latter predictions reveals that 16 of the 42 are very similar to known tRNA genes, and we predict that they do, in fact, code for tRNA, yielding a false-positive rate for the algorithm of 0.003%. The new algorithm and testing strategy are a considerable improvement over any previously described strategies for recognizing tRNA genes, and they allow detections of genes (including introns) embedded in long genomic sequences.
Affiliation(s)
- G A Fichant
- Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, NM 87545
10
Zelnick CR, Burks DJ, Duncan CH. A composite transposon 3' to the cow fetal globin gene binds a sequence specific factor. Nucleic Acids Res 1987; 15:10437-53. [PMID: 2827124 PMCID: PMC339954 DOI: 10.1093/nar/15.24.10437] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.3] [Indexed: 01/02/2023]
Abstract
Two unusual sequence organizations were found within the beta-globin locus of the cow. Each was a composite, consisting of closely linked Alu-type repeats with a short stretch of genomic non-repetitive sequence, called a lagan, sandwiched between. One lagan was found 3' to the fetal globin gene, while the second lay between the adult globin gene and a globin pseudogene. Southern blot analysis indicated that both lagans appeared twice within the cow haploid genome, with the second copies lying outside the cow beta-globin locus. One of these non-globin locus homologues was cloned and subjected to sequence analysis. Comparison of the DNA sequence data showed that the lagan-Alu composite was transposed as a unit. The lagan 3' to the cow fetal globin gene contains the recognition site for a sequence specific DNA binding factor. This factor was present in extracts from fetal, but not from adult cow tissues.
Affiliation(s)
- C R Zelnick
- Division of Basic Science, Children's Hospital Research Foundation, Cincinnati, OH 45229