Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Gelfand MS, Roytberg MA. Prediction of the exon-intron structure by a dynamic programming approach. Biosystems 1993;30:173-82. [PMID: 8374074 DOI: 10.1016/0303-2647(93)90069-o] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.2] [Indexed: 01/30/2023]

Cited by Other Article(s)

Alioto T. Gene prediction. Methods Mol Biol 2012;855:175-201. [PMID: 22407709 DOI: 10.1007/978-1-61779-582-4_6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 05/31/2023]

Gene Identification: Classical and Computational Intelligence Approaches. IEEE Trans Syst Man Cybern C 2008. [DOI: 10.1109/tsmcc.2007.906066] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/06/2022]

Roy SW, Penny D. Intron length distributions and gene prediction. Nucleic Acids Res 2007;35:4737-42. [PMID: 17617639 PMCID: PMC1950532 DOI: 10.1093/nar/gkm281] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Indexed: 12/02/2022]

Farrar M. Striped Smith-Waterman speeds database searches six times over other SIMD implementations. Bioinformatics 2006;23:156-61. [PMID: 17110365 DOI: 10.1093/bioinformatics/btl582] [Citation(s) in RCA: 112] [Impact Index Per Article: 6.2] [Indexed: 11/14/2022]

Wu J, Haussler D. Coding exon detection using comparative sequences. J Comput Biol 2006;13:1148-64. [PMID: 16901234 DOI: 10.1089/cmb.2006.13.1148] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]

Bertone P, Trifonov V, Rozowsky JS, Schubert F, Emanuelsson O, Karro J, Kao MY, Snyder M, Gerstein M. Design optimization methods for genomic DNA tiling arrays. Genome Res 2005;16:271-81. [PMID: 16365382 PMCID: PMC1361723 DOI: 10.1101/gr.4452906] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.2] [Indexed: 02/03/2023]

Zhang MQ. Computational prediction of eukaryotic protein-coding genes. Nat Rev Genet 2002;3:698-709. [PMID: 12209144 DOI: 10.1038/nrg890] [Citation(s) in RCA: 124] [Impact Index Per Article: 5.6] [Indexed: 11/09/2022]

Searls DB. Bioinformatics tools for whole genomes. Annu Rev Genomics Hum Genet 2002;1:251-79. [PMID: 11701631 DOI: 10.1146/annurev.genom.1.1.251] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 11/09/2022]

Grosse I, Herzel H, Buldyrev SV, Stanley HE. Species independence of mutual information in coding and noncoding DNA. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics 2000;61:5624-5629. [PMID: 11031617 DOI: 10.1103/physreve.61.5624] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.1] [Received: 10/29/1999] [Indexed: 05/23/2023]

Stormo GD. Gene-finding approaches for eukaryotes. Genome Res 2000;10:394-7. [PMID: 10779479 DOI: 10.1101/gr.10.4.394] [Citation(s) in RCA: 78] [Impact Index Per Article: 3.3] [Indexed: 11/25/2022]

Guigó R. Assembling genes from predicted exons in linear time with dynamic programming. J Comput Biol 1999;5:681-702. [PMID: 10072084 DOI: 10.1089/cmb.1998.5.681] [Citation(s) in RCA: 63] [Impact Index Per Article: 2.5] [Indexed: 11/13/2022]

Abstract

In a number of programs for gene structure prediction in higher eukaryotic genomic sequences, exon prediction is decoupled from gene assembly: a large pool of candidate exons is predicted and scored from features located in the query DNA sequence, and candidate genes are assembled from such a pool as sequences of nonoverlapping frame-compatible exons. Genes are scored as a function of the scores of the assembled exons, and the highest scoring candidate gene is assumed to be the most likely gene encoded by the query DNA sequence. Considering additive gene scoring functions, currently available algorithms to determine such a highest scoring candidate gene run in time proportional to the square of the number of predicted exons. Here, we present an algorithm whose running time grows only linearly with the size of the set of predicted exons. Polynomial algorithms rely on the fact that, while scanning the set of predicted exons, the highest scoring gene ending in a given exon can be obtained by appending the exon to the highest scoring among the highest scoring genes ending at each compatible preceding exon. The algorithm here relies on the simple fact that such a highest scoring gene can be stored and updated. This requires scanning the set of predicted exons simultaneously by increasing acceptor and donor position. On the other hand, the algorithm described here does not assume an underlying gene structure model. Indeed, the definition of valid gene structures is externally defined in the so-called Gene Model. The Gene Model simply specifies which gene features are allowed immediately upstream of which other gene features in valid gene structures. This allows for great flexibility in formulating the gene identification problem. In particular, it allows for multiple-gene two-strand predictions and for considering gene features other than coding exons (such as promoter elements) in valid gene structures.
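
As a rough illustration of the scan described in the abstract above, the following Python sketch chains nonoverlapping exons with an additive score. It is not the published implementation: frame compatibility and the Gene Model are left out, the `(acceptor, donor, score)` tuple layout and the name `best_gene` are assumptions made here, and the two initial sorts cost O(n log n), so only the scan over already-sorted exons is linear.

```python
def best_gene(exons):
    """Return (score, chain) for the highest scoring chain of nonoverlapping exons.

    exons: list of (acceptor, donor, score) tuples with acceptor <= donor.
    Simplification (assumption): frame compatibility is ignored.
    """
    n = len(exons)
    if n == 0:
        return 0.0, []

    # Two views of the same exon set: one ordered by acceptor, one by donor.
    by_acceptor = sorted(range(n), key=lambda i: exons[i][0])
    by_donor = sorted(range(n), key=lambda i: exons[i][1])

    best_end = [0.0] * n   # score of the best gene ending in exon i
    prev = [None] * n      # predecessor of exon i in that gene
    running_best = 0.0     # best score among genes ending upstream of the scan point
    running_idx = None
    j = 0

    for i in by_acceptor:
        acceptor, donor, score = exons[i]
        # Absorb every exon whose donor lies strictly upstream of this acceptor.
        # Its best_end value is already final because its acceptor is smaller.
        while j < n and exons[by_donor[j]][1] < acceptor:
            k = by_donor[j]
            if best_end[k] > running_best:
                running_best, running_idx = best_end[k], k
            j += 1
        best_end[i] = score + running_best
        prev[i] = running_idx

    # Trace back from the best final exon.
    last = max(range(n), key=lambda i: best_end[i])
    chain = []
    cur = last
    while cur is not None:
        chain.append(exons[cur])
        cur = prev[cur]
    chain.reverse()
    return best_end[last], chain


candidates = [(1, 10, 5.0), (5, 20, 8.0), (12, 30, 4.0), (25, 40, 3.0), (35, 50, 6.0)]
total, gene = best_gene(candidates)
```

The stored `running_best` is what replaces the inner loop over all compatible preceding exons in a quadratic algorithm: each exon is absorbed into it exactly once, when the acceptor scan passes its donor position.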

Roytberg MA, Astakhova TV, Gelfand MS. Combinatorial approaches to gene recognition. Comput Chem 1998;21:229-35. [PMID: 9440930 DOI: 10.1016/s0097-8485(96)00034-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.3] [Indexed: 02/05/2023]

Sze SH, Pevzner PA. Las Vegas algorithms for gene recognition: suboptimal and error-tolerant spliced alignment. J Comput Biol 1997;4:297-309. [PMID: 9278061 DOI: 10.1089/cmb.1997.4.297] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.7] [Indexed: 02/05/2023]

Reese MG, Eeckman FH, Kulp D, Haussler D. Improved splice site detection in Genie. J Comput Biol 1997;4:311-23. [PMID: 9278062 DOI: 10.1089/cmb.1997.4.311] [Citation(s) in RCA: 1291] [Impact Index Per Article: 47.8] [Indexed: 02/05/2023]

Xu Y, Uberbacher EC. Automated gene identification in large-scale genomic sequences. J Comput Biol 1997;4:325-38. [PMID: 9278063 DOI: 10.1089/cmb.1997.4.325] [Citation(s) in RCA: 75] [Impact Index Per Article: 2.8] [Indexed: 02/05/2023]

Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J Mol Biol 1997;268:78-94. [PMID: 9149143 DOI: 10.1006/jmbi.1997.0951] [Citation(s) in RCA: 2592] [Impact Index Per Article: 96.0] [Indexed: 02/04/2023]

Gelfand MS, Mironov AA, Pevzner PA. Gene recognition via spliced sequence alignment. Proc Natl Acad Sci U S A 1996;93:9061-6. [PMID: 8799154 PMCID: PMC38595 DOI: 10.1073/pnas.93.17.9061] [Citation(s) in RCA: 192] [Impact Index Per Article: 6.9] [Indexed: 02/02/2023]

Wu TD. A segment-based dynamic programming algorithm for predicting gene structure. J Comput Biol 1996;3:375-94. [PMID: 8891956 DOI: 10.1089/cmb.1996.3.375] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Indexed: 02/02/2023]

Gelfand MS, Podolsky LI, Astakhova TV, Roytberg MA. Recognition of genes in human DNA sequences. J Comput Biol 1996;3:223-34. [PMID: 8811484 DOI: 10.1089/cmb.1996.3.223] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.3] [Indexed: 02/02/2023]

Reddy BV, Pandit MW. A statistical analytical approach to decipher information from biological sequences: application to murine splice-site analysis and prediction. J Biomol Struct Dyn 1995;12:785-801. [PMID: 7779300 DOI: 10.1080/07391102.1995.10508776] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/27/2023]

Abstract

A simple statistical approach for the analysis of biological sequences, such as splice-sites, promoter regions, helices and extended structure forming regions or any other sequence dependent functional entities in proteins, is presented. The approach has proved useful for developing a method to predict such entities in newly available sequences. We first search for invariant sequence features of each functional entity from the experimentally available sequences and identify a set of 'like' sequences with similar sequence features. In the next step, concrete features of sequence entities in terms of occurrences of smaller subsequences are identified at various positions, which are used as a knowledge base to select potential functional entities from the identified 'like' sequences. The third step consists of refinement of this pattern learning, statistical improvements of the knowledge base weight matrices, and finally its application to predict functional entities in newly available sequences. Such an analysis is operationally described for murine splice-site predictions. Regions comprising -30 to +30 nucleotides from the splice-junction at the murine splice-sites (donors and acceptors), reported earlier, were analyzed. Invariant sequence-specific features in terms of monomer frequency average were used to identify splice-site-like sequences in the EMBL murine DNA sequence data base. The frequencies of occurrence of mono-, di-, tri- and tetranucleotides in the known splice-sites were studied in comparison with the splice-site-like sequences; the significant differences in their occurrences were extracted as statistical knowledge coded in weight matrices for a computer to identify potential splice-sites. The algorithm was refined and a method was developed to predict potential splice-sites in a given murine DNA; the analysis was also extended to human DNA. The success rate of the method to predict correct splice-sites in these species is found to be 80% and 85%, respectively. The major strength of this method lies in significantly reducing the number of false positives which are normally picked up in such analysis.
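
The weight-matrix idea in the abstract above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' method: it uses mononucleotide frequencies only (the abstract also uses di-, tri- and tetranucleotides), the log-odds form, the pseudocount, the function names and the toy sequences are all hypothetical, and no significance filtering is applied.

```python
import math


def build_weight_matrix(sites, site_like, pseudocount=1.0):
    """Per-position log-odds of each base in known sites vs. site-like sequences.

    sites, site_like: lists of equal-length DNA strings over A, C, G, T.
    The pseudocount (an assumption) keeps unseen bases from giving log(0).
    """
    length = len(sites[0])
    matrix = []
    for pos in range(length):
        column = {}
        for base in "ACGT":
            fg = sum(s[pos] == base for s in sites) + pseudocount
            bg = sum(s[pos] == base for s in site_like) + pseudocount
            fg_freq = fg / (len(sites) + 4 * pseudocount)
            bg_freq = bg / (len(site_like) + 4 * pseudocount)
            column[base] = math.log(fg_freq / bg_freq)
        matrix.append(column)
    return matrix


def score(matrix, seq):
    """Sum the position-specific weights along a candidate sequence."""
    return sum(column[base] for column, base in zip(matrix, seq))


# Toy data (hypothetical): three "known" donor-like sites and three look-alikes.
known_sites = ["GTAAGT", "GTGAGT", "GTAAGA"]
look_alikes = ["GTCCAT", "GTTTGC", "GTACCA"]
weights = build_weight_matrix(known_sites, look_alikes)
true_score = score(weights, "GTAAGT")
decoy_score = score(weights, "GTCCAT")
```

Positions where known sites and look-alikes share the same base (here the leading GT) contribute a weight of zero, so the score is driven only by positions where the two sets differ, which is the sense in which contrasting against site-like sequences can reduce false positives.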

Gelfand MS. Prediction of function in DNA sequence analysis. J Comput Biol 1995;2:87-115. [PMID: 7497122 DOI: 10.1089/cmb.1995.2.87] [Citation(s) in RCA: 91] [Impact Index Per Article: 3.1] [Indexed: 01/25/2023]