Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zhang CT, Wang J. Recognition of protein coding genes in the yeast genome at better than 95% accuracy based on the Z curve. Nucleic Acids Res 2000;28:2804-14. [PMID: 10908339 PMCID: PMC102655 DOI: 10.1093/nar/28.14.2804] [Citation(s) in RCA: 100] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Cited by Other Article(s)

Dietrich FS, Magwene P, McCusker J. Core gene set of the species Saccharomyces cerevisiae. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2023.09.07.545205. [PMID: 40502033 PMCID: PMC12157680 DOI: 10.1101/2023.09.07.545205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/19/2025]

Shi H, Wu C, Bai T, Chen J, Li Y, Wu H. Identify essential genes based on clustering based synthetic minority oversampling technique. Comput Biol Med 2023;153:106523. [PMID: 36652869 DOI: 10.1016/j.compbiomed.2022.106523] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2022] [Revised: 12/13/2022] [Accepted: 12/31/2022] [Indexed: 01/03/2023]

Dong YM, Bi JH, He QE, Song K. ESDA: An Improved Approach to Accurately Identify Human snoRNAs for Precision Cancer Therapy. Curr Bioinform 2020. [DOI: 10.2174/1574893614666190424162230] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/14/2023]

Abstract

Background: SnoRNAs (small nucleolar RNAs) are small RNA molecules approximately 60-300 nucleotides long. They have been shown to play important roles in cancer occurrence and progression, so it is of great clinical importance to identify new snoRNAs as quickly and accurately as possible. Objective: A novel algorithm, ESDA (Elastically Sparse Partial Least Squares Discriminant Analysis), was proposed to improve the speed and performance of distinguishing snoRNAs from other RNAs in the human genome. Methods: In the ESDA algorithm, kernel features were selected from variables extracted from both primary sequences and secondary structures in order to optimize the extracted information. These features were then used as input variables by the SPLSDA (sparse partial least squares discriminant analysis) algorithm to train the final classification model, which distinguishes snoRNA sequences from other human RNAs. Because no prior biological knowledge is required to optimize the classification model, ESDA is a very practical method, especially for completely new sequences. Results: 89 human H/ACA snoRNAs and 269 human C/D snoRNAs were used as positive samples and 3403 non-snoRNAs as negative samples to test the identification performance of ESDA. For H/ACA snoRNA identification, the sensitivity and specificity were as high as 99.6% and 98.8%, respectively. For C/D snoRNAs, they were 96.1% and 98.3%, respectively. Furthermore, we compared ESDA with other widely used algorithms and classifiers: SnoReport, RF (Random Forest), DWD (Distance Weighted Discrimination) and SVM (Support Vector Machine). The largest improvement in accuracy obtained by ESDA was 25.1%. Conclusion: The results demonstrate the superior performance of ESDA and make it promising for identifying snoRNAs in the further development of precision medicine for cancers.

Li C, Zhao J, Wang C, Yao Y. Protein Sequence Comparison and DNA-binding Protein Identification with Generalized PseAAC and Graphical Representation. Comb Chem High Throughput Screen 2019;21:100-110. [PMID: 29380690 PMCID: PMC5930480 DOI: 10.2174/1386207321666180130100838] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2017] [Revised: 01/24/2018] [Accepted: 01/26/2018] [Indexed: 11/22/2022]

Guo FB, Dong C, Hua HL, Liu S, Luo H, Zhang HW, Jin YT, Zhang KY. Accurate prediction of human essential genes using only nucleotide composition and association information. Bioinformatics 2018;33:1758-1764. [PMID: 28158612 PMCID: PMC7110051 DOI: 10.1093/bioinformatics/btx055] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2016] [Accepted: 01/25/2017] [Indexed: 12/20/2022] Open

Deciphering the Origin, Evolution, and Physiological Function of the Subtelomeric Aryl-Alcohol Dehydrogenase Gene Family in the Yeast Saccharomyces cerevisiae. Appl Environ Microbiol 2017;84:AEM.01553-17. [PMID: 29079624 PMCID: PMC5734042 DOI: 10.1128/aem.01553-17] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2017] [Accepted: 10/23/2017] [Indexed: 12/02/2022] Open

Abstract

Homology searches indicate that Saccharomyces cerevisiae strain BY4741 contains seven redundant genes that encode putative aryl-alcohol dehydrogenases (AAD). Yeast AAD genes are located in subtelomeric regions of different chromosomes, and their functional role(s) remain enigmatic. Here, we show that two of these genes, AAD4 and AAD14, encode functional enzymes that reduce aliphatic and aryl-aldehydes concomitant with the oxidation of cofactor NADPH, and that Aad4p and Aad14p exhibit different substrate preference patterns. Other yeast AAD genes are undergoing pseudogenization. The 5′ sequence of AAD15 has been deleted from the genome. Repair of an AAD3 missense mutation at the catalytically essential Tyr⁷³ residue did not result in a functional enzyme. However, ancestral-state reconstruction by fusing Aad6 with Aad16 and by N-terminal repair of Aad10 restores NADPH-dependent aryl-alcohol dehydrogenase activities. Phylogenetic analysis indicates that AAD genes are narrowly distributed in wood-saprophyte fungi and in yeast that occupy lignocellulosic niches. Because yeast AAD genes exhibit activity on veratraldehyde, cinnamaldehyde, and vanillin, they could serve to detoxify aryl-aldehydes released during lignin degradation. However, none of these compounds induce yeast AAD gene expression, and Aad activities do not relieve aryl-aldehyde growth inhibition. Our data suggest an ancestral role for AAD genes in lignin degradation that is degenerating as a result of yeast's domestication and use in brewing, baking, and other industrial applications.

IMPORTANCE Functional characterization of hypothetical genes remains one of the chief tasks of the postgenomic era. Although the first Saccharomyces cerevisiae genome sequence was published over 20 years ago, 22% of its estimated 6,603 open reading frames (ORFs) remain unverified. One outstanding example of this category of genes is the enigmatic seven-member AAD family. Here, we demonstrate that proteins encoded by two members of this family exhibit aliphatic and aryl-aldehyde reductase activity, and further that such activity can be recovered from pseudogenized AAD genes via ancestral-state reconstruction. The phylogeny of yeast AAD genes suggests that these proteins may have played an important ancestral role in detoxifying aromatic aldehydes in ligninolytic fungi. However, in yeast adapted to niches rich in sugars, AAD genes become subject to mutational erosion. Our findings shed new light on the selective pressures and molecular mechanisms by which genes undergo pseudogenization.

Dong C, Yuan YZ, Zhang FZ, Hua HL, Ye YN, Labena AA, Lin H, Chen W, Guo FB. Combining pseudo dinucleotide composition with the Z curve method to improve the accuracy of predicting DNA elements: a case study in recombination spots. MOLECULAR BIOSYSTEMS 2017;12:2893-900. [PMID: 27410247 DOI: 10.1039/c6mb00374e] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]

Affiliation(s)

Chuan Dong Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China. and Center of Information in Biomedicine, University of Electronic Science and Technology of China, Chengdu, China and Key Laboratory for Neuro-information of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China
Ya-Zhou Yuan Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China. and Center of Information in Biomedicine, University of Electronic Science and Technology of China, Chengdu, China and Key Laboratory for Neuro-information of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China
Fa-Zhan Zhang Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China. and Center of Information in Biomedicine, University of Electronic Science and Technology of China, Chengdu, China and Key Laboratory for Neuro-information of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China
Hong-Li Hua Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China. and Center of Information in Biomedicine, University of Electronic Science and Technology of China, Chengdu, China and Key Laboratory for Neuro-information of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China
Yuan-Nong Ye School of Biology and Engineering, Guizhou Medical University, Guiyang, China
Abraham Alemayehu Labena Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China. and Center of Information in Biomedicine, University of Electronic Science and Technology of China, Chengdu, China and Key Laboratory for Neuro-information of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China
Hao Lin Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China. and Center of Information in Biomedicine, University of Electronic Science and Technology of China, Chengdu, China and Key Laboratory for Neuro-information of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China
Wei Chen Department of Physics, School of Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, China
Feng-Biao Guo Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China. and Center of Information in Biomedicine, University of Electronic Science and Technology of China, Chengdu, China and Key Laboratory for Neuro-information of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China

Ahmad M, Jung LT, Bhuiyan AA. From DNA to protein: Why genetic code context of nucleotides for DNA signal processing? A review. Biomed Signal Process Control 2017. [DOI: 10.1016/j.bspc.2017.01.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]

Mabrouk MS, Naeem SM, Eldosoky MA. DIFFERENT GENOMIC SIGNAL PROCESSING METHODS FOR EUKARYOTIC GENE PREDICTION: A SYSTEMATIC REVIEW. BIOMEDICAL ENGINEERING-APPLICATIONS BASIS COMMUNICATIONS 2017. [DOI: 10.4015/s1016237217300012] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

An estimator for local analysis of genome based on the minimal absent word. J Theor Biol 2016;395:23-30. [PMID: 26829314 DOI: 10.1016/j.jtbi.2016.01.023] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2015] [Revised: 01/17/2016] [Accepted: 01/19/2016] [Indexed: 11/22/2022]

Ahmad M, Jung LT, Bhuiyan MAA. On fuzzy semantic similarity measure for DNA coding. Comput Biol Med 2015;69:144-51. [PMID: 26773936 DOI: 10.1016/j.compbiomed.2015.12.017] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2015] [Revised: 12/22/2015] [Accepted: 12/23/2015] [Indexed: 11/28/2022]

Wang Y, Zhuang X, Zhong Y, Zhang C, Zhang Y, Zeng L, Zhu Y, He P, Dong K, Pal U, Guo X, Qin J. Distribution of Plasmids in Distinct Leptospira Pathogenic Species. PLoS Negl Trop Dis 2015;9:e0004220. [PMID: 26555137 PMCID: PMC4640553 DOI: 10.1371/journal.pntd.0004220] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2015] [Accepted: 10/19/2015] [Indexed: 11/18/2022] Open

Abstract

Leptospirosis, caused by pathogenic Leptospira, is a worldwide zoonotic infection. The genus Leptospira includes at least 21 species clustered into three groups--pathogens, non-pathogens, and intermediates--based on 16S rRNA phylogeny. Research on Leptospira is difficult due to slow growth and poor transformability of the pathogens. Recent identification of extrachromosomal elements besides the two chromosomes in L. interrogans has provided new insight into genome complexity of the genus Leptospira. The large size, low copy number, and high similarity of the sequence of these extrachromosomal elements with the chromosomes present challenges in isolating and detecting them without careful genome assembly. In this study, two extrachromosomal elements were identified in L. borgpetersenii serovar Ballum strain 56604 through whole genome assembly combined with S1 nuclease digestion following pulsed-field gel electrophoresis (S1-PFGE) analysis. Further, extrachromosomal elements in additional 15 Chinese epidemic strains of Leptospira, comprising L. borgpetersenii, L. weilii, and L. interrogans, were successfully separated and identified, independent of genome sequence data. Southern blot hybridization with extrachromosomal element-specific probes, designated as lcp1, lcp2 and lcp3-rep, further confirmed their occurrences as extrachromosomal elements. In total, 24 plasmids were detected in 13 out of 15 tested strains, among which 11 can hybridize with the lcp1-rep probe and 11 with the lcp2-rep probe, whereas two can hybridize with the lcp3-rep probe. None of them are likely to be species-specific. Blastp search of the lcp1, lcp2, and lcp3-rep genes with a nonredundant protein database of Leptospira species genomes showed that their homologous sequences are widely distributed among clades of pathogens but not non-pathogens or intermediates. 
These results suggest that the plasmids are widely distributed in Leptospira species, and further elucidation of their biological significance might contribute to our understanding of biology and infectivity of pathogenic spirochetes.

Affiliation(s)

Yanzhuo Wang Department of Microbiology and Immunology, Institutes of Medical Science, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Xuran Zhuang Department of Microbiology and Immunology, Institutes of Medical Science, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Yi Zhong Computational Biology Department, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America
Cuicai Zhang National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention (ICDC, CCDC), Beijing, China
Yan Zhang Department of Microbiology and Immunology, Institutes of Medical Science, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Lingbing Zeng The First Affiliated Hospital of Nanchang University, Nanchang, China
Yongzhang Zhu Department of Microbiology and Immunology, Institutes of Medical Science, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Ping He Department of Microbiology and Immunology, Institutes of Medical Science, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Ke Dong Department of Microbiology and Immunology, Institutes of Medical Science, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Utpal Pal Department of Veterinary Medicine, University of Maryland, College Park and Virginia-Maryland Regional College of Veterinary Medicine, College Park, Maryland, United States of America * E-mail: (UP); (XG); (JQ)
Xiaokui Guo Department of Microbiology and Immunology, Institutes of Medical Science, Shanghai Jiao Tong University School of Medicine, Shanghai, China * E-mail: (UP); (XG); (JQ)
Jinhong Qin Department of Microbiology and Immunology, Institutes of Medical Science, Shanghai Jiao Tong University School of Medicine, Shanghai, China * E-mail: (UP); (XG); (JQ)

Zhu W, Wang J, Zhu Y, Tang B, Zhang Y, He P, Zhang Y, Liu B, Guo X, Zhao G, Qin J. Identification of three extra-chromosomal replicons in Leptospira pathogenic strain and development of new shuttle vectors. BMC Genomics 2015;16:90. [PMID: 25887950 PMCID: PMC4338851 DOI: 10.1186/s12864-015-1321-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2014] [Accepted: 02/04/2015] [Indexed: 12/20/2022] Open

Abstract

BACKGROUND

The genome of pathogenic Leptospira interrogans contains two chromosomes. Plasmids and prophages are known to play specific roles in gene transfer in bacteria and can potentially serve as efficient genetic tools in these organisms. Although plasmids and prophage remnants have recently been reported in Leptospira species, their characteristics and potential applications in leptospiral genetic transformation systems have not been fully evaluated.

RESULTS

Three extrachromosomal replicons designated lcp1 (65,732 bp), lcp2 (56,757 bp), and lcp3 (54,986 bp) in the L. interrogans serovar Linhai strain 56609 were identified through whole genome sequencing. All three replicons were stable outside of the bacterial chromosomes. Phage particles were observed in the culture supernatant of 56609 after mitomycin C induction, and lcp3, which contained phage-related genes, was considered to be an inducible prophage. L. interrogans-Escherichia coli shuttle vectors, constructed with the predicted replication elements of single rep or rep combined with parAB loci from the three plasmids were shown to successfully transform into both saprophytic and pathogenic Leptospira species, suggesting an essential function for rep genes in supporting auto-replication of the plasmids. Additionally, a wide distribution of homologs of the three rep genes was identified in L. interrogans isolates, and correlation tests showed that the transformability of the shuttle vectors in L. interrogans isolates depended, to certain extent, on genetic compatibility between the rep sequences of both plasmid and host.

CONCLUSIONS

Three extrachromosomal replicons co-exist in L. interrogans, one of which we consider to be an inducible prophage. The vectors constructed with the rep genes of the three replicons successfully transformed into saprophytic and pathogenic Leptospira species alike, but this was partly dependent on genetic compatibility between the rep sequences of both plasmid and host.

Affiliation(s)

Weinan Zhu Department of Microbiology and Immunology, Institutes of Medical Science, Shanghai Jiao Tong University School of Medicine, 280 South Chongqing Road, Shanghai, 200025, China.
Jin Wang CAS Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology, Shanghai Institute for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.
Yongzhang Zhu Department of Microbiology and Immunology, Institutes of Medical Science, Shanghai Jiao Tong University School of Medicine, 280 South Chongqing Road, Shanghai, 200025, China.
Biao Tang State Key Laboratory of Genetic Engineering, Department of Microbiology, School of Life Sciences, Fudan University, 220 Handan Road, Shanghai, 200433, China.
Yunyi Zhang CAS Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology, Shanghai Institute for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.
Ping He Department of Microbiology and Immunology, Institutes of Medical Science, Shanghai Jiao Tong University School of Medicine, 280 South Chongqing Road, Shanghai, 200025, China.
Yan Zhang Department of Microbiology and Immunology, Institutes of Medical Science, Shanghai Jiao Tong University School of Medicine, 280 South Chongqing Road, Shanghai, 200025, China.
Boyu Liu Department of Microbiology and Immunology, Institutes of Medical Science, Shanghai Jiao Tong University School of Medicine, 280 South Chongqing Road, Shanghai, 200025, China.
Xiaokui Guo Department of Microbiology and Immunology, Institutes of Medical Science, Shanghai Jiao Tong University School of Medicine, 280 South Chongqing Road, Shanghai, 200025, China.
Guoping Zhao CAS Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology, Shanghai Institute for Biological Sciences, Chinese Academy of Sciences, Shanghai, China. State Key Laboratory of Genetic Engineering, Department of Microbiology, School of Life Sciences, Fudan University, 220 Handan Road, Shanghai, 200433, China.
Jinhong Qin Department of Microbiology and Immunology, Institutes of Medical Science, Shanghai Jiao Tong University School of Medicine, 280 South Chongqing Road, Shanghai, 200025, China.

Guo FB, Lin Y, Chen LL. Recognition of Protein-coding Genes Based on Z-curve Algorithms. Curr Genomics 2014;15:95-103. [PMID: 24822027 PMCID: PMC4009845 DOI: 10.2174/1389202915999140328162724] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2013] [Revised: 11/19/2013] [Accepted: 11/20/2013] [Indexed: 01/18/2023] Open

Zhang R, Zhang CT. A Brief Review: The Z-curve Theory and its Application in Genome Analysis. Curr Genomics 2014;15:78-94. [PMID: 24822026 PMCID: PMC4009844 DOI: 10.2174/1389202915999140328162433] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2013] [Revised: 10/16/2013] [Accepted: 10/16/2013] [Indexed: 11/22/2022] Open

Chen S, Zhang CY, Song K. Recognizing short coding sequences of prokaryotic genome using a novel iteratively adaptive sparse partial least squares algorithm. Biol Direct 2013;8:23. [PMID: 24067167 PMCID: PMC3852556 DOI: 10.1186/1745-6150-8-23] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2013] [Accepted: 09/23/2013] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Significant efforts have been made to address the problem of identifying short genes in prokaryotic genomes. However, most known methods are not effective in detecting short genes. Because of the limited information contained in short DNA sequences, it is very difficult to accurately distinguish between protein coding and non-coding sequences in prokaryotic genomes. We have developed a new Iteratively Adaptive Sparse Partial Least Squares (IASPLS) algorithm as the classifier to improve the accuracy of the identification process.

RESULTS

For testing, we chose the short coding and non-coding sequences from seven prokaryotic organisms. We used seven feature sets (including GC content, Z-curve, etc.) of short genes. In comparison with the GeneMarkS, Metagene, Orphelia, and Heuristic Approach methods, our model achieved the best prediction performance in identifying short prokaryotic genes. Even when we focused on the very short length group ([60, 100) nt), our model provided sensitivity as high as 83.44% and specificity as high as 92.8%. These values are two or three times higher than those of three of the other methods, while Metagene fails to recognize genes in this length range. The experiments also showed that IASPLS improves identification accuracy in comparison with other widely used classifiers, i.e., Logistic, Random Forest (RF) and K nearest neighbors (KNN). The accuracy with IASPLS improved by 5.90% or more in comparison with the other methods. In addition to the improvements in accuracy, IASPLS required ten times less computer time than KNN or RF.

CONCLUSIONS

It is conclusive that our method is preferable for application as an automated method of short gene classification. Its linearity and easily optimized parameters make it practicable for predicting short genes of newly-sequenced or under-studied species.

Re-annotation of protein-coding genes in the genome of Saccharomyces cerevisiae based on support vector machines. PLoS One 2013;8:e64477. [PMID: 23874379 PMCID: PMC3707884 DOI: 10.1371/journal.pone.0064477] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2013] [Accepted: 04/15/2013] [Indexed: 11/19/2022] Open

Guo FB, Xiong L, Teng JLL, Yuen KY, Lau SKP, Woo PCY. Re-annotation of protein-coding genes in 10 complete genomes of Neisseriaceae family by combining similarity-based and composition-based methods. DNA Res 2013;20:273-86. [PMID: 23571676 PMCID: PMC3686433 DOI: 10.1093/dnares/dst009] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open

Detecting the borders between coding and non-coding DNA regions in prokaryotes based on recursive segmentation and nucleotide doublets statistics. BMC Genomics 2012;13 Suppl 8:S19. [PMID: 23282225 PMCID: PMC3535712 DOI: 10.1186/1471-2164-13-s8-s19] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

SNR of DNA sequences mapped by general affine transformations of the indicator sequences. J Math Biol 2012;67:433-51. [DOI: 10.1007/s00285-012-0564-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2011] [Revised: 07/02/2012] [Indexed: 10/28/2022]

Song K, Zhang Z, Tong TP, Wu F. Classifier assessment and feature selection for recognizing short coding sequences of human genes. J Comput Biol 2012;19:251-60. [PMID: 22401589 DOI: 10.1089/cmb.2011.0078] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Abstract

With the ever-increasing pace of genome sequencing, there is a great need for fast and accurate computational tools to automatically identify genes in these genomes. Although great progress has been made in the development of gene-finding algorithms during the past decades, there is still room for further improvement. In particular, the issue of recognizing short exons in eukaryotes is still not solved satisfactorily. This article is devoted to assessing various linear and kernel-based classification algorithms and selecting the best combination of Z-curve features to address this issue. Eight state-of-the-art linear and kernel-based supervised pattern recognition techniques were used to identify the short (21-192 bp) coding sequences of human genes. By measuring the prediction accuracy, the tradeoff between sensitivity and specificity, and the time consumption, the partial least squares (PLS) and kernel partial least squares (KPLS) algorithms were found to be the best linear and kernel-based classifiers, respectively. A surprising result was that, by making good use of the interpretability of the PLS and the Z-curve methods, 93 Z-curve features proved to be the best selected combination. Using them, the average recognition accuracy was improved by as much as 7.7% by means of KPLS when compared with what was obtained by the Fisher discriminant analysis using 189 Z-curve variables (Gao and Zhang, 2004). The code used is freely available from the following sources (implemented in MATLAB and supported on Linux and MS Windows): (1) SVM: http://www.support-vector-machines.org/SVM_soft.html. (2) GP: http://www.gaussianprocess.org. (3) KPLS and KFDA: Taylor, J.S., and Cristianini, N. 2004. Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, UK. (4) PLS: Wise, B.M., and Gallagher, N.B. 2011. PLS-Toolbox for use with MATLAB: ver 1.5.2. Eigenvector Technologies, Manson, WA.
Supplementary Material for this article is available at www.liebertonline.com/cmb.

Goli B, Nair AS. The elusive short gene – an ensemble method for recognition for prokaryotic genome. Biochem Biophys Res Commun 2012;422:36-41. [DOI: 10.1016/j.bbrc.2012.04.090] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2012] [Accepted: 04/17/2012] [Indexed: 10/28/2022]

Chen B, Ji P. Numericalization of the self adaptive spectral rotation method for coding region prediction. J Theor Biol 2011;296:95-102. [PMID: 22178641 DOI: 10.1016/j.jtbi.2011.12.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2011] [Revised: 10/24/2011] [Accepted: 12/01/2011] [Indexed: 11/27/2022]

Yu JF, Xiao K, Jiang DK, Guo J, Wang JH, Sun X. An integrative method for identifying the over-annotated protein-coding genes in microbial genomes. DNA Res 2011;18:435-49. [PMID: 21903723 PMCID: PMC3223076 DOI: 10.1093/dnares/dsr030] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open

Bielińska-Wąż D. Graphical and numerical representations of DNA sequences: statistical aspects of similarity. JOURNAL OF MATHEMATICAL CHEMISTRY 2011;49:2345. [PMID: 32214591 PMCID: PMC7087963 DOI: 10.1007/s10910-011-9890-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 02/18/2011] [Accepted: 07/22/2011] [Indexed: 05/10/2023]

Sahu SS, Panda G. Identification of protein-coding regions in DNA sequences using a time-frequency filtering approach. GENOMICS, PROTEOMICS & BIOINFORMATICS 2011;9:45-55. [PMID: 21641562 PMCID: PMC5054166 DOI: 10.1016/s1672-0229(11)60007-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 07/19/2010] [Accepted: 10/31/2010] [Indexed: 11/13/2022]

Zhang R. A rebuttal to the comments on the genome order index and the Z-curve. Biol Direct 2011;6:10. [PMID: 21324187 PMCID: PMC3046898 DOI: 10.1186/1745-6150-6-10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/31/2010] [Accepted: 02/16/2011] [Indexed: 11/15/2022] Open

Abstract

Background

Elhaik, Graur and Josic recently commented on the genome order index (S) and the Z-curve (Elhaik et al. Biol Direct 2010, 5: 10). S is a quantity defined as S = a² + c² + g² + t², where a, c, g and t denote the corresponding base frequencies. The Z-curve is a three-dimensional curve that represents a DNA sequence such that each can be uniquely reconstructed from the other. Elhaik et al. made 4 major claims. 1) In the previous mapping system with the regular tetrahedron, calculation of the radius of the inscribed sphere is "a mathematical error". 2) S follows an exponential distribution and is narrowly distributed with a range of (0.25 - 0.33). 3) Based on Chargaff's second parity rule (PR2), "S is equivalent to H [Shannon entropy]" and they are derivable from each other. 4) The Z-curve "suffers from over dimensionality", because, based on the analysis of 235 bacterial genomes, the x and y components contributed less than 1% of the variance and therefore "would be of little use".
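
The two quantities defined in the Background can be sketched in a few lines. This is a minimal illustration, assuming the standard Z-curve transform (x: purine vs. pyrimidine, y: amino vs. keto, z: weak vs. strong hydrogen bonding) and an input containing only A, C, G and T. The function names are made up for this example. The round trip at the end shows the "uniquely reconstructed" property: each step along the curve identifies exactly one base.

```python
def genome_order_index(seq):
    """S = a^2 + c^2 + g^2 + t^2, computed from the base frequencies of `seq`."""
    seq = seq.upper()
    n = len(seq)
    return sum((seq.count(b) / n) ** 2 for b in "ACGT")


def z_curve(seq):
    """Return the Z-curve points (x_n, y_n, z_n) for n = 1..len(seq)."""
    counts = dict.fromkeys("ACGT", 0)
    points = []
    for base in seq.upper():
        counts[base] += 1              # cumulative count of each base up to position n
        a, c, g, t = (counts[b] for b in "ACGT")
        points.append(((a + g) - (c + t),
                       (a + c) - (g + t),
                       (a + t) - (g + c)))
    return points


def sequence_from_z_curve(points):
    """Invert z_curve: the step between consecutive points identifies the base."""
    step_to_base = {(1, 1, 1): "A", (-1, 1, -1): "C",
                    (1, -1, -1): "G", (-1, -1, 1): "T"}
    previous = (0, 0, 0)
    bases = []
    for point in points:
        step = tuple(p - q for p, q in zip(point, previous))
        bases.append(step_to_base[step])
        previous = point
    return "".join(bases)
```

Note the difference in shape of the two outputs: `genome_order_index` returns a single number for the whole sequence, while `z_curve` returns one 3D point per position.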

Results

1) Elhaik et al. mistakenly neglected the parameter 4/3 when calculating the radius of the inscribed sphere. 2) The exponential distribution of S is a restatement of our previous conclusion, and the range of (0.25 - 0.33) only paraphrases the previously suggested S range (0.25 - 1/3). 3) Elhaik et al. incorrectly disregard deviations from PR2 by treating them all as 0, thereby reducing S and H, which both have 4 variables (a, c, g and t), to functions of a single variable, a, and they apply this treatment to all DNA sequences as the basis of their "demonstration", which is therefore invalid. 4) Elhaik et al. confuse numerical smallness with biological insignificance, and disregard the distributions of purine/pyrimidine and amino/keto bases (x and y components), whose variations, although they can be smaller than that of GC content, contain rich information that is important and useful, for example in locating replication origins of bacterial and archaeal genomes and in studies of gene recognition in various species.
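
Point 3 turns on what happens when PR2 is assumed to hold exactly. The sketch below is an illustration under stated assumptions, not taken from either paper: exact PR2 is modelled as a = t and c = g, and the two base compositions are made-up numbers. It shows that S collapses to a function of a alone only when the PR2 deviation is zero, and that two compositions sharing the same a can otherwise differ in both S and H.

```python
import math


def s_index(a, c, g, t):
    """Genome order index S = a^2 + c^2 + g^2 + t^2."""
    return a * a + c * c + g * g + t * t


def shannon_entropy(a, c, g, t):
    """Shannon entropy H in bits; zero frequencies contribute nothing."""
    return -sum(p * math.log2(p) for p in (a, c, g, t) if p > 0)


def s_under_exact_pr2(a):
    """S when PR2 holds exactly (a == t and c == g), so c = g = 0.5 - a."""
    c = 0.5 - a
    return 2 * a * a + 2 * c * c


# Hypothetical compositions (a, c, g, t) with the same a = 0.30:
exact_pr2 = (0.30, 0.20, 0.20, 0.30)   # no deviation from PR2
skewed = (0.30, 0.25, 0.15, 0.30)      # c != g, so PR2 is violated

s_exact = s_index(*exact_pr2)          # matches s_under_exact_pr2(0.30)
s_skewed = s_index(*skewed)            # differs, although a is unchanged
h_exact = shannon_entropy(*exact_pr2)
h_skewed = shannon_entropy(*skewed)
```

Whether real genomes deviate enough from PR2 for this to matter is the empirical question the two papers disagree on; the sketch only shows the algebra.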

Conclusion

Elhaik et al. confuse S (a single number) with the Z-curve (a series of 3D coordinates), which are distinct. Using S by itself as a case study of the Z-curve is invalid. S and H are neither equivalent nor derivable from each other. The criticisms of Elhaik, Graur and Josic are wrong.

Reviewers

This article was reviewed by Erik van Nimwegen.

Yu JF, Sun X. Reannotation of protein-coding genes based on an improved graphical representation of DNA sequence. J Comput Chem 2010;31:2126-35. [PMID: 20175214 DOI: 10.1002/jcc.21500] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 11/06/2022]

Chen B, Ji P. Visualization of the protein-coding regions with a self adaptive spectral rotation approach. Nucleic Acids Res 2010;39:e3. [PMID: 20947567 PMCID: PMC3017620 DOI: 10.1093/nar/gkq891] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022] Open

Vector representations and related matrices of DNA primary sequence based on L-tuple. Math Biosci 2010;227:147-52. [DOI: 10.1016/j.mbs.2010.07.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/02/2007] [Revised: 07/24/2010] [Accepted: 07/27/2010] [Indexed: 11/24/2022]

Ji G, Wu X, Shen Y, Huang J, Quinn Li Q. A classification-based prediction model of messenger RNA polyadenylation sites. J Theor Biol 2010;265:287-96. [DOI: 10.1016/j.jtbi.2010.05.015] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Received: 12/13/2009] [Revised: 03/21/2010] [Accepted: 05/13/2010] [Indexed: 12/30/2022]

Luo L, Li H, Zhang L. ORF organization and gene recognition in the yeast genome. Comp Funct Genomics 2010;4:318-28. [PMID: 18629282 PMCID: PMC2448446 DOI: 10.1002/cfg.292] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/31/2002] [Revised: 03/03/2003] [Accepted: 03/10/2003] [Indexed: 11/10/2022] Open

Wood V, Rutherford KM, Ivens A, Rajandream MA, Barrell B. A re-annotation of the Saccharomyces cerevisiae genome. Comp Funct Genomics 2010;2:143-54. [PMID: 18628908 PMCID: PMC2447204 DOI: 10.1002/cfg.86] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.7] [Received: 03/14/2001] [Accepted: 04/19/2001] [Indexed: 11/22/2022] Open

Gao N, Chen LL, Ji HF, Wang W, Chang JW, Gao B, Zhang L, Zhang SC, Zhang HY. DIGA--a database of improved gene annotation for phytopathogens. BMC Genomics 2010;11:54. [PMID: 20089203 PMCID: PMC2825234 DOI: 10.1186/1471-2164-11-54] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 07/15/2009] [Accepted: 01/21/2010] [Indexed: 11/28/2022] Open

Lee A, Hansen KD, Bullard J, Dudoit S, Sherlock G. Novel low abundance and transient RNAs in yeast revealed by tiling microarrays and ultra high-throughput sequencing are not conserved across closely related yeast species. PLoS Genet 2008;4:e1000299. [PMID: 19096707 PMCID: PMC2601015 DOI: 10.1371/journal.pgen.1000299] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Received: 06/17/2008] [Accepted: 11/06/2008] [Indexed: 11/18/2022] Open

Abstract

A complete description of the transcriptome of an organism is crucial for a comprehensive understanding of how it functions and how its transcriptional networks are controlled, and may provide insights into the organism's evolution. Despite the status of Saccharomyces cerevisiae as arguably the most well-studied model eukaryote, we still do not have a full catalog or understanding of all its genes. In order to interrogate the transcriptome of S. cerevisiae for low abundance or rapidly turned over transcripts, we deleted elements of the RNA degradation machinery with the goal of preferentially increasing the relative abundance of such transcripts. We then used high-resolution tiling microarrays and ultra high–throughput sequencing (UHTS) to identify, map, and validate unannotated transcripts that are more abundant in the RNA degradation mutants relative to wild-type cells. We identified 365 currently unannotated transcripts, the majority presumably representing low abundance or short-lived RNAs, of which 185 are previously unknown and unique to this study. It is likely that many of these are cryptic unstable transcripts (CUTs), which are rapidly degraded and whose function(s) within the cell are still unclear, while others may be novel functional transcripts. Of the 185 transcripts we identified as novel to our study, greater than 80 percent come from regions of the genome that have lower conservation scores amongst closely related yeast species than 85 percent of the verified ORFs in S. cerevisiae. Such regions of the genome have typically been less well-studied, and by definition transcripts from these regions will distinguish S. cerevisiae from these closely related species.

The budding yeast Saccharomyces cerevisiae, because of the relative ease of its genetic manipulation and its ease of handling in the laboratory, has long served as a model on which studies in higher organisms have been based. To more fully understand how eukaryotic cells express their genomes, we sought to identify RNA species that are transcribed at very low levels or that are rapidly degraded. We created mutants deficient in the ability to degrade RNA, with the expectation that this would increase the relative abundance of such RNAs, and then used high-resolution microarrays and sequencing technologies to locate and identify from where these RNAs are transcribed. Using this approach, we have identified 365 transcripts that do not appear in the most current list of annotated S. cerevisiae RNA transcripts; of these, 185 are unique to our study. Many of these novel transcripts derive from regions of the genome that are poorly conserved between S. cerevisiae and other closely related yeast species, suggesting that these RNAs may play an important role in the divergent microevolution of S. cerevisiae.

Bioinformatics in China: a personal perspective. PLoS Comput Biol 2008;4:e1000020. [PMID: 18437216 PMCID: PMC2291564 DOI: 10.1371/journal.pcbi.1000020] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 12/26/2022] Open

Lin MF, Deoras AN, Rasmussen MD, Kellis M. Performance and scalability of discriminative metrics for comparative gene identification in 12 Drosophila genomes. PLoS Comput Biol 2008;4:e1000067. [PMID: 18421375 PMCID: PMC2291194 DOI: 10.1371/journal.pcbi.1000067] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.2] [Received: 10/22/2007] [Accepted: 03/20/2008] [Indexed: 01/22/2023] Open

Yang JY, Zhou Y, Yu ZG, Anh V, Zhou LQ. Human Pol II promoter recognition based on primary sequences and free energy of dinucleotides. BMC Bioinformatics 2008;9:113. [PMID: 18294399 PMCID: PMC2292139 DOI: 10.1186/1471-2105-9-113] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 08/13/2007] [Accepted: 02/24/2008] [Indexed: 01/29/2023] Open

Chen LL, Ma BG, Gao N. Reannotation of hypothetical ORFs in plant pathogen Erwinia carotovora subsp. atroseptica SCRI1043. FEBS J 2007;275:198-206. [PMID: 18067578 DOI: 10.1111/j.1742-4658.2007.06190.x] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 11/30/2022]

Ma BG. How to describe genes: Enlightenment from the quaternary number system. Biosystems 2007;90:20-7. [PMID: 16945479 DOI: 10.1016/j.biosystems.2006.06.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 05/13/2005] [Revised: 06/15/2006] [Accepted: 06/19/2006] [Indexed: 11/17/2022]

Law NF, Cheng KO, Siu WC. On relationship of Z-curve and Fourier approaches for DNA coding sequence classification. Bioinformation 2006;1:242-6. [PMID: 17597898 PMCID: PMC1891701 DOI: 10.6026/97320630001242] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 10/20/2006] [Accepted: 11/02/2006] [Indexed: 11/23/2022] Open

Menconi G, Marangoni R. A Compression-Based Approach for Coding Sequences Identification. I. Application to Prokaryotic Genomes. J Comput Biol 2006;13:1477-88. [PMID: 17061923 DOI: 10.1089/cmb.2006.13.1477] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022] Open

Fisk DG, Ball CA, Dolinski K, Engel SR, Hong EL, Issel-Tarver L, Schwartz K, Sethuraman A, Botstein D, Cherry JM, The Saccharomyces Genome Database Project. Saccharomyces cerevisiae S288C genome annotation: a working hypothesis. Yeast 2006;23:857-65. [PMID: 17001629 PMCID: PMC3040122 DOI: 10.1002/yea.1400] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.4] [Indexed: 11/11/2022] Open

Gao F, Zhang CT. Isochore structures in the chicken genome. FEBS J 2006;273:1637-48. [PMID: 16623701 DOI: 10.1111/j.1742-4658.2006.05178.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]

Cao Y, Tung WW, Gao JB. Recurrence time statistics: versatile tools for genomic DNA sequence analysis. PROCEEDINGS. IEEE COMPUTATIONAL SYSTEMS BIOINFORMATICS CONFERENCE 2006:40-51. [PMID: 16447998 DOI: 10.1109/csb.2004.1332415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]

Gao J, Qi Y, Cao Y, Tung WW. Protein coding sequence identification by simultaneously characterizing the periodic and random features of DNA sequences. J Biomed Biotechnol 2006;2005:139-46. [PMID: 16046819 PMCID: PMC1184046 DOI: 10.1155/jbb.2005.139] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Indexed: 12/04/2022] Open

Zhang CT, Gao F, Zhang R. Segmentation algorithm for DNA sequences. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2005;72:041917. [PMID: 16383430 DOI: 10.1103/physreve.72.041917] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 03/07/2005] [Indexed: 05/05/2023]

Cao Y, Tung WW, Gao JB, Qi Y. Recurrence time statistics: versatile tools for genomic DNA sequence analysis. J Bioinform Comput Biol 2005;3:677-96. [PMID: 16108089 DOI: 10.1142/s0219720005001235] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 08/25/2004] [Revised: 11/05/2004] [Accepted: 12/10/2004] [Indexed: 11/18/2022]

Wang Z, Chen Y, Li Y. A brief review of computational gene prediction methods. GENOMICS PROTEOMICS & BIOINFORMATICS 2005;2:216-21. [PMID: 15901250 PMCID: PMC5187414 DOI: 10.1016/s1672-0229(04)02028-5] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.0] [Indexed: 11/27/2022]

Kulkarni OC, Vigneshwar R, Jayaraman VK, Kulkarni BD. Identification of coding and non-coding sequences using local Holder exponent formalism. Bioinformatics 2005;21:3818-23. [PMID: 16118261 DOI: 10.1093/bioinformatics/bti639] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022] Open