Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Yin C, Yau SST. Prediction of protein coding regions by the 3-base periodicity analysis of a DNA sequence. J Theor Biol 2007;247:687-94. [PMID: 17509616 DOI: 10.1016/j.jtbi.2007.03.038] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2006] [Revised: 03/24/2007] [Accepted: 03/26/2007] [Indexed: 11/30/2022]

For:	Yin C, Yau SST. Prediction of protein coding regions by the 3-base periodicity analysis of a DNA sequence. J Theor Biol 2007;247:687-94. [PMID: 17509616 DOI: 10.1016/j.jtbi.2007.03.038] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2006] [Revised: 03/24/2007] [Accepted: 03/26/2007] [Indexed: 11/30/2022]

Number

Cited by Other Article(s)

Yadav AK, Nagar BC, Pradhan G. FPGA Implementation of IIR Notch and Anti-Notch Filters With an Application to Localization of Protein Hot-Spots. IEEE Trans Nanobioscience 2023;22:863-871. [PMID: 37022064 DOI: 10.1109/tnb.2023.3238733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/25/2023]

Wang Y, Zhao P, Du H, Cao Y, Peng Q, Fu L. LncDLSM: Identification of Long Non-Coding RNAs With Deep Learning-Based Sequence Model. IEEE J Biomed Health Inform 2023;27:2117-2127. [PMID: 37027676 DOI: 10.1109/jbhi.2023.3247805] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/24/2023]

Lehilahy M, Ferdi Y. Identification of exon locations in DNA sequences using a fractional digital anti-notch filter. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

Chen Z, Meng J, Zhao S, Yin C, Luan Y. sORFPred: A Method Based on Comprehensive Features and Ensemble Learning to Predict the sORFs in Plant LncRNAs. Interdiscip Sci 2023;15:189-201. [PMID: 36705893 DOI: 10.1007/s12539-023-00552-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2022] [Revised: 01/11/2023] [Accepted: 01/13/2023] [Indexed: 01/28/2023]

Pal J, Ghosh S, Maji B, Bhattacharya DK. Mathematical Approach to Protein Sequence Comparison Based on Physiochemical Properties. ACS OMEGA 2022;7:39446-39455. [PMID: 36340165 PMCID: PMC9631895 DOI: 10.1021/acsomega.2c06103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/21/2022] [Accepted: 09/27/2022] [Indexed: 06/16/2023]

Ansari SA, Dantoft W, Ruiz-Orera J, Syed AP, Blachut S, van Heesch S, Hübner N, Uhlenhaut NH. Integrative analysis of macrophage ribo-Seq and RNA-Seq data define glucocorticoid receptor regulated inflammatory response genes into distinct regulatory classes. Comput Struct Biotechnol J 2022;20:5622-5638. [PMID: 36284713 PMCID: PMC9582734 DOI: 10.1016/j.csbj.2022.09.042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2022] [Revised: 09/28/2022] [Accepted: 09/28/2022] [Indexed: 11/03/2022] Open

Language Inference Using Elman Networks with Evolutionary Training. SIGNALS 2022. [DOI: 10.3390/signals3030037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open

Bonidia RP, Sampaio LDH, Domingues DS, Paschoal AR, Lopes FM, de Carvalho ACPLF, Sanches DS. Feature extraction approaches for biological sequences: a comparative study of mathematical features. Brief Bioinform 2021;22:6135010. [PMID: 33585910 DOI: 10.1093/bib/bbab011] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2020] [Revised: 12/13/2020] [Accepted: 01/07/2021] [Indexed: 11/14/2022] Open

A pattern recognition model to distinguish cancerous DNA sequences via signal processing methods. Soft comput 2020. [DOI: 10.1007/s00500-020-04942-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]

Paredes O, Romo-Vázquez R, Román-Godínez I, Vélez-Pérez H, Salido-Ruiz RA, Morales JA. Frequency spectra characterization of noncoding human genomic sequences. Genes Genomics 2020;42:1215-1226. [DOI: 10.1007/s13258-020-00980-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2019] [Accepted: 04/27/2020] [Indexed: 11/28/2022]

Raman Kumar M, Vaegae NK. A new numerical approach for DNA representation using modified Gabor wavelet transform for the identification of protein coding regions. Biocybern Biomed Eng 2020. [DOI: 10.1016/j.bbe.2020.03.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Michel CJ, Thompson JD. Identification of a circular code periodicity in the bacterial ribosome: origin of codon periodicity in genes? RNA Biol 2020;17:571-583. [PMID: 31960748 PMCID: PMC8647727 DOI: 10.1080/15476286.2020.1719311] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2019] [Revised: 01/10/2020] [Accepted: 01/14/2020] [Indexed: 02/09/2023] Open

Abstract

Three-base periodicity (TBP), where nucleotides and higher order n-tuples are preferentially spaced by 3, 6, 9, etc. bases, is a well-known intrinsic property of protein-coding DNA sequences. However, its origins are still not fully understood. One hypothesis is that the periodicity reflects a primordial coding system that was used before the emergence of the modern standard genetic code (SGC). Recent evidence suggests that the X circular code, a set of 20 trinucleotides allowing the reading frames in genes to be retrieved locally, represents a possible ancestor of the SGC. Motifs from the X circular code have been found in the reading frame of protein-coding regions in extant organisms from bacteria to eukaryotes, in many transfer RNA (tRNA) genes and in important functional regions of the ribosomal RNA (rRNA), notably in the peptidyl transferase centre and the decoding centre. Here, we have used a powerful correlation function to search for periodicity patterns involving the 20 trinucleotides of the X circular code in a large set of bacterial protein-coding genes, as well as in the translation machinery, including rRNA and tRNA sequences. As might be expected, we found a strong circular code periodicity 0 modulo 3 in the protein-coding genes. More surprisingly, we also identified a similar circular code periodicity in a large region of the 16S rRNA. This region includes the 3' major domain corresponding to the primordial proto-ribosome decoding centre and containing numerous sites that interact with the tRNA and messenger RNA (mRNA) during translation. Furthermore, 3D structural analysis shows that the periodicity region surrounds the mRNA channel that lies between the head and the body of the SSU. Our results support the hypothesis that the X circular code may constitute an ancestral translation code involved in reading frame retrieval and maintenance, traces of which persist in modern mRNA, tRNA and rRNA despite their long evolution and adaptation to the SGC.

Collapse

Camargo AP, Sourkov V, Pereira G, Carazzolle M. RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences. NAR Genom Bioinform 2020;2:lqz024. [PMID: 33575571 PMCID: PMC7671399 DOI: 10.1093/nargab/lqz024] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2019] [Revised: 11/15/2019] [Accepted: 12/17/2019] [Indexed: 02/06/2023] Open

Suvorova YM, Korotkova MA, Skryabin KG, Korotkov EV. Search for potential reading frameshifts in cds from Arabidopsis thaliana and other genomes. DNA Res 2019;26:157-170. [PMID: 30726896 PMCID: PMC6476729 DOI: 10.1093/dnares/dsy046] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2018] [Accepted: 12/07/2018] [Indexed: 01/01/2023] Open

Pei S, Dong R, He RL, Yau SST. Large-Scale Genome Comparison Based on Cumulative Fourier Power and Phase Spectra: Central Moment and Covariance Vector. Comput Struct Biotechnol J 2019;17:982-994. [PMID: 31384399 PMCID: PMC6661692 DOI: 10.1016/j.csbj.2019.07.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2019] [Revised: 06/24/2019] [Accepted: 07/10/2019] [Indexed: 01/04/2023] Open

Suvorova YM, Korotkov EV. New Method for Potential Fusions Detection in Protein-Coding Sequences. J Comput Biol 2019;26:1253-1261. [PMID: 31211597 DOI: 10.1089/cmb.2019.0122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022] Open

Suvorova YM, Pugacheva VM, Korotkov EV. A Database of Potential Reading Frame Shifts in Coding Sequences from Different Eukaryotic Genomes. Biophysics (Nagoya-shi) 2019. [DOI: 10.1134/s0006350919030217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open

Li Z, Guan Y, Yuan X, Zheng P, Zhu H. Prediction of Sphingosine protein-coding regions with a self adaptive spectral rotation method. PLoS One 2019;14:e0214442. [PMID: 30943219 PMCID: PMC6447165 DOI: 10.1371/journal.pone.0214442] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2019] [Accepted: 03/13/2019] [Indexed: 01/08/2023] Open

Huang HH, Girimurugan SB. Discrete Wavelet Packet Transform Based Discriminant Analysis for Whole Genome Sequences. Stat Appl Genet Mol Biol 2019;18:/j/sagmb.ahead-of-print/sagmb-2018-0045/sagmb-2018-0045.xml. [PMID: 30772870 DOI: 10.1515/sagmb-2018-0045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]

Bonidia RP, Sampaio LDH, Lopes FM, Sanches DS. Feature Extraction of Long Non-coding RNAs: A Fourier and Numerical Mapping Approach. PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS 2019. [DOI: 10.1007/978-3-030-33904-3_44] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

Zhao J, Wang J, Jiang H. Detecting Periodicities in Eukaryotic Genomes by Ramanujan Fourier Transform. J Comput Biol 2018;25:963-975. [PMID: 29963923 DOI: 10.1089/cmb.2017.0252] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Huang HH, Girimurugan SB. A Novel Real-Time Genome Comparison Method Using Discrete Wavelet Transform. J Comput Biol 2017;25:405-416. [PMID: 29272149 DOI: 10.1089/cmb.2017.0115] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Hota MK, Srivastava VK. A multirate DSP structure for the identification of protein-coding regions. INT J BIOMATH 2017. [DOI: 10.1142/s1793524517501121] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Ahmad M, Jung LT, Bhuiyan AA. A biological inspired fuzzy adaptive window median filter (FAWMF) for enhancing DNA signal processing. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2017;149:11-17. [PMID: 28802326 DOI: 10.1016/j.cmpb.2017.06.021] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/15/2016] [Revised: 05/29/2017] [Accepted: 06/23/2017] [Indexed: 06/07/2023]

Abstract

BACKGROUND AND OBJECTIVE

Digital signal processing techniques commonly employ fixed length window filters to process the signal contents. DNA signals differ in characteristics from common digital signals since they carry nucleotides as contents. The nucleotides own genetic code context and fuzzy behaviors due to their special structure and order in DNA strand. Employing conventional fixed length window filters for DNA signal processing produce spectral leakage and hence results in signal noise. A biological context aware adaptive window filter is required to process the DNA signals.

METHODS

This paper introduces a biological inspired fuzzy adaptive window median filter (FAWMF) which computes the fuzzy membership strength of nucleotides in each slide of window and filters nucleotides based on median filtering with a combination of s-shaped and z-shaped filters. Since coding regions cause 3-base periodicity by an unbalanced nucleotides' distribution producing a relatively high bias for nucleotides' usage, such fundamental characteristic of nucleotides has been exploited in FAWMF to suppress the signal noise.

RESULTS

Along with adaptive response of FAWMF, a strong correlation between median nucleotides and the Π shaped filter was observed which produced enhanced discrimination between coding and non-coding regions contrary to fixed length conventional window filters. The proposed FAWMF attains a significant enhancement in coding regions identification i.e. 40% to 125% as compared to other conventional window filters tested over more than 250 benchmarked and randomly taken DNA datasets of different organisms.

CONCLUSION

This study proves that conventional fixed length window filters applied to DNA signals do not achieve significant results since the nucleotides carry genetic code context. The proposed FAWMF algorithm is adaptive and outperforms significantly to process DNA signal contents. The algorithm applied to variety of DNA datasets produced noteworthy discrimination between coding and non-coding regions contrary to fixed window length conventional filters.

Collapse

Chechetkin VR, Lobzin VV. Large-scale chromosome folding versus genomic DNA sequences: A discrete double Fourier transform technique. J Theor Biol 2017;426:162-179. [PMID: 28552553 DOI: 10.1016/j.jtbi.2017.05.033] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2017] [Revised: 04/23/2017] [Accepted: 05/23/2017] [Indexed: 12/15/2022]

Yin C, Yau SST. A coevolution analysis for identifying protein-protein interactions by Fourier transform. PLoS One 2017;12:e0174862. [PMID: 28430779 PMCID: PMC5400233 DOI: 10.1371/journal.pone.0174862] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2016] [Accepted: 03/16/2017] [Indexed: 12/29/2022] Open

Ahmad M, Jung LT, Bhuiyan AA. From DNA to protein: Why genetic code context of nucleotides for DNA signal processing? A review. Biomed Signal Process Control 2017. [DOI: 10.1016/j.bspc.2017.01.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]

Pal J, Ghosh S, Maji B, Bhattacharya DK. WITHDRAWN: A Novel Way of Comparing Protein Sequences Represented Under Physio-Chemical Properties of their Amino Acids. Comput Biol Chem 2017. [DOI: 10.1016/j.compbiolchem.2017.04.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]

Hou W, Pan Q, Peng Q, He M. A new method to analyze protein sequence similarity using Dynamic Time Warping. Genomics 2016;109:123-130. [PMID: 27974244 PMCID: PMC7125777 DOI: 10.1016/j.ygeno.2016.12.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2016] [Revised: 12/06/2016] [Accepted: 12/10/2016] [Indexed: 12/05/2022]

Morán Losada P, Fischer S, Chouvarine P, Tümmler B. Three-base periodicity of sites of sequence variation in Pseudomonas aeruginosa and Staphylococcus aureus core genomes. FEBS Lett 2016;590:3538-3543. [PMID: 27664047 DOI: 10.1002/1873-3468.12431] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2016] [Revised: 09/08/2016] [Accepted: 09/12/2016] [Indexed: 11/11/2022]

Gouveia S, Scotto MG, Weiß CH, Ferreira PJSG. Binary auto-regressive geometric modelling in a DNA context. J R Stat Soc Ser C Appl Stat 2016. [DOI: 10.1111/rssc.12172] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Hoang T, Yin C, Yau SST. Numerical encoding of DNA sequences by chaos game representation with application in similarity comparison. Genomics 2016;108:134-142. [PMID: 27538895 DOI: 10.1016/j.ygeno.2016.08.002] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2016] [Revised: 08/04/2016] [Accepted: 08/12/2016] [Indexed: 11/19/2022]

Gupta P, Rangan L, Ramesh TV, Gupta M. Comparative analysis of contextual bias around the translation initiation sites in plant genomes. J Theor Biol 2016;404:303-311. [PMID: 27316311 DOI: 10.1016/j.jtbi.2016.06.015] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2016] [Revised: 05/17/2016] [Accepted: 06/10/2016] [Indexed: 10/21/2022]

Pian C, Zhang G, Chen Z, Chen Y, Zhang J, Yang T, Zhang L. LncRNApred: Classification of Long Non-Coding RNAs and Protein-Coding Transcripts by the Ensemble Algorithm with a New Hybrid Feature. PLoS One 2016;11:e0154567. [PMID: 27228152 PMCID: PMC4882039 DOI: 10.1371/journal.pone.0154567] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2015] [Accepted: 04/15/2016] [Indexed: 12/31/2022] Open

Yin C, Wang J. Periodic power spectrum with applications in detection of latent periodicities in DNA sequences. J Math Biol 2016;73:1053-1079. [PMID: 26942584 DOI: 10.1007/s00285-016-0982-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2015] [Revised: 02/19/2016] [Indexed: 12/27/2022]

Woods T, Preeprem T, Lee K, Chang W, Vidakovic B. Characterizing exons and introns by regularity of nucleotide strings. Biol Direct 2016;11:6. [PMID: 26857564 PMCID: PMC4745173 DOI: 10.1186/s13062-016-0108-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2015] [Accepted: 01/28/2016] [Indexed: 11/13/2022] Open

Abstract

Background

Translation of nucleotides into a numeric form has been approached in many ways and has allowed researchers to investigate the properties of protein-coding sequences and noncoding sequences. Typically, more pronounced long-range correlations and increased regularity were found in intron-containing genes and in non-transcribed regulatory DNA sequences, compared to cDNA sequences or intron-less genes. The regularity is assessed by spectral tools defined on numerical translates. In most popular approaches of numerical translation the resulting spectra depend on the assignment of numerical values to nucleotides. Our contribution is to propose and illustrate a spectra which remains invariant to the translation rules used in traditional approaches.

Results

We outline a methodology for representing sequences of DNA nucleotides as numeric matrices in order to analytically investigate important structural characteristics of DNA. This representation allows us to compute the 2-dimensional wavelet transformation and assess regularity characteristics of the sequence via the slope of the wavelet spectra. In addition to computing a global slope measure for a sequence, we can apply our methodology for overlapping sections of nucleotides to obtain an “evolutionary slope.” To illustrate our methodology, we analyzed 376 gene sequences from the first chromosome of the honeybee.

Conclusion

For the genes analyzed, we find that introns are significantly more regular (lead to more negative spectral slopes) than exons, which agrees with the results from the literature where regularity is measured on “DNA walks”. However, unlike DNA walks where the nucleotides are assigned numerical values depending on nucleotide characteristics (purine-pyrimidine, weak-strong hydrogen bonds, keto-amino, etc.) or other spatial assignments, the proposed spectral tool is invariant to the assignment of nucleotides. Thus, ambiguity in numerical translation of nucleotides is eliminated.

Reviewers

This article was reviewed by Dr. Vladimir Kuznetsov, Professor Marek Kimmel and Dr. Natsuhiro Ichinose (nominated by Professor Masanori Arita).

Electronic supplementary material

The online version of this article (doi:10.1186/s13062-016-0108-7) contains supplementary material, which is available to authorized users.

Collapse

Chaley M, Kutyrkin V. Stochastic model of homogeneous coding and latent periodicity in DNA sequences. J Theor Biol 2016;390:106-16. [DOI: 10.1016/j.jtbi.2015.11.014] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2015] [Revised: 09/18/2015] [Accepted: 11/14/2015] [Indexed: 11/24/2022]

Zhang X, Shen Z, Zhang G, Shen Y, Chen M, Zhao J, Wu R. Short Exon Detection via Wavelet Transform Modulus Maxima. PLoS One 2016;11:e0163088. [PMID: 27635656 PMCID: PMC5026382 DOI: 10.1371/journal.pone.0163088] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2016] [Accepted: 09/04/2016] [Indexed: 02/05/2023] Open

Ahmad M, Jung LT, Bhuiyan MAA. On fuzzy semantic similarity measure for DNA coding. Comput Biol Med 2015;69:144-51. [PMID: 26773936 DOI: 10.1016/j.compbiomed.2015.12.017] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2015] [Revised: 12/22/2015] [Accepted: 12/23/2015] [Indexed: 11/28/2022]

Improved gene prediction by principal component analysis based autoregressive Yule-Walker method. Gene 2015;575:488-497. [PMID: 26385320 DOI: 10.1016/j.gene.2015.09.023] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2015] [Revised: 08/25/2015] [Accepted: 09/11/2015] [Indexed: 11/21/2022]

Yin C, Yau SST. An improved model for whole genome phylogenetic analysis by Fourier transform. J Theor Biol 2015;382:99-110. [PMID: 26151589 DOI: 10.1016/j.jtbi.2015.06.033] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2015] [Revised: 06/19/2015] [Accepted: 06/22/2015] [Indexed: 01/07/2023]

Abstract

DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. The sequence comparison of closely related DNA sequences and genomes is usually performed by multiple sequence alignments (MSA). While the MSA method is accurate for some types of sequences, it may produce incorrect results when DNA sequences undergone rearrangements as in many bacterial and viral genomes. It is also limited by its computational complexity for comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information contents of DNA sequences by Discrete Fourier Transform (DFT), but still with some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply DFT to transform the 2D numerical sequences into frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determinant factor and the 2D mapping reduces the nucleotide composition bias in distance measure, and thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences with different lengths, we propose an improved even scaling algorithm to extend shorter DFT power spectra to the longest length of the underlying sequences. After the DFT power spectra are evenly scaled, the spectra are in the same dimensionality of the Fourier frequency space, then the Euclidean distances of full Fourier power spectra of the DNA sequences are used as the dissimilarity metrics. The improved DFT method, with increased computational performance by 2D numerical representation, can be applicable to any DNA sequences of different length ranges. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences including simulated and real datasets. The method yields accurate and reliable phylogenetic trees and demonstrates that the improved DFT dissimilarity measure is an efficient and effective similarity measure of DNA sequences. Due to its high efficiency and accuracy, the proposed DFT similarity measure is successfully applied on phylogenetic analysis for individual genes and large whole bacterial genomes.

Collapse

Hoang T, Yin C, Zheng H, Yu C, Lucy He R, Yau SST. A new method to cluster DNA sequences using Fourier power spectrum. J Theor Biol 2015;372:135-45. [PMID: 25747773 PMCID: PMC7094126 DOI: 10.1016/j.jtbi.2015.02.026] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2014] [Revised: 01/15/2015] [Accepted: 02/23/2015] [Indexed: 11/27/2022]

Bioinformatics Analyses to Separate Species Specific mRNAs from Unknown Sequences in de novo Assembled Transcriptomes. ACTA ACUST UNITED AC 2015. [DOI: 10.1007/978-3-319-16480-9_32] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/29/2023]

Yin C. Representation of DNA sequences in genetic codon context with applications in exon and intron prediction. J Bioinform Comput Biol 2014;13:1550004. [PMID: 25491390 DOI: 10.1142/s0219720015500043] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Yin C, Yin XE, Wang J. A Novel Method for Comparative Analysis of DNA Sequences by Ramanujan-Fourier Transform. J Comput Biol 2014;21:867-79. [DOI: 10.1089/cmb.2014.0120] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

A measure of DNA sequence similarity by Fourier Transform with applications on hierarchical clustering. J Theor Biol 2014;359:18-28. [DOI: 10.1016/j.jtbi.2014.05.043] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2014] [Revised: 04/22/2014] [Accepted: 05/29/2014] [Indexed: 12/24/2022]

Hua W, Wang J, Zhao J. Discrete Ramanujan transform for distinguishing the protein coding regions from other regions. Mol Cell Probes 2014;28:228-36. [DOI: 10.1016/j.mcp.2014.04.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2013] [Revised: 03/31/2014] [Accepted: 04/17/2014] [Indexed: 11/25/2022]

A two-stage exon recognition model based on synergetic neural network. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2014;2014:503132. [PMID: 24790638 PMCID: PMC3984832 DOI: 10.1155/2014/503132] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/30/2014] [Revised: 02/27/2014] [Accepted: 03/03/2014] [Indexed: 11/24/2022]

Roy M, Barman S. Effective gene prediction by high resolution frequency estimator based on least-norm solution technique. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2014;2014:2. [PMID: 24386895 PMCID: PMC3895782 DOI: 10.1186/1687-4153-2014-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/20/2013] [Accepted: 12/15/2013] [Indexed: 11/10/2022]

Balakirev ES, Chechetkin VR, Lobzin VV, Ayala FJ. Computational methods of identification of pseudogenes based on functionality: entropy and GC content. Methods Mol Biol 2014;1167:41-62. [PMID: 24823770 DOI: 10.1007/978-1-4939-0835-6_4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]