1
Jayasree K, Kumar Hota M, Dwivedi AK, Ranjan H, Srivastava VK. Identification of exon regions in eukaryotes using fine-tuned variational mode decomposition based on kurtosis and short-time discrete Fourier transform. Nucleosides, Nucleotides & Nucleic Acids 2024; 44:507-530. [PMID: 39126405 DOI: 10.1080/15257770.2024.2388785] [Received: 03/31/2023] [Revised: 07/29/2024] [Accepted: 07/31/2024]
Abstract
In genomic research, identifying the exon regions in eukaryotes is one of the most cumbersome tasks. This article introduces a promising model-independent method, based on the short-time discrete Fourier transform (ST-DFT) and fine-tuned variational mode decomposition (FTVMD), for identifying exon regions. The proposed method uses the N/3 periodicity property of eukaryotic genes to detect the exon regions with the ST-DFT. However, background noise is present in the ST-DFT spectrum because the sliding rectangular window produces spectral leakage. To overcome this, FTVMD is proposed in this work. VMD is more resilient to noise and sampling errors than other decomposition techniques because it generalizes the Wiener filter into several adaptive bands. The performance of VMD degrades when the penalty factor (α) and the number of modes (K) are poorly chosen. Therefore, in fine-tuned VMD, the parameters K and α are optimized by maximizing the kurtosis value. The main objective of this article is to improve the accuracy of exon-region identification in a DNA sequence. Finally, a comparative study demonstrates that the proposed technique is superior to its counterparts.
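The N/3 periodicity step described in this abstract can be illustrated with a minimal sketch. It assumes binary (Voss) indicator sequences for the four bases and a rectangular sliding window whose length is a multiple of 3; the function name `period3_power`, the window length of 60, and the synthetic test sequence are hypothetical choices for illustration and are not taken from the paper. The FTVMD denoising stage is not reproduced here.

```python
import cmath
import random


def period3_power(seq, window=60):
    """Sliding-window DFT power at bin k = N/3, summed over the four bases.

    Each base is turned into a 0/1 indicator sequence (an assumed
    representation); the window is rectangular, as in a plain ST-DFT.
    """
    if window % 3 != 0:
        raise ValueError("window length should be a multiple of 3")
    # e^{-j 2 pi n / 3} repeats with period 3, so three twiddles suffice.
    twiddle = [cmath.exp(-2j * cmath.pi * r / 3) for r in range(3)]
    powers = []
    for start in range(len(seq) - window + 1):
        total = 0.0
        for base in "ACGT":
            acc = 0j
            for n in range(window):
                if seq[start + n] == base:  # indicator value is 1 here
                    acc += twiddle[n % 3]
            total += abs(acc) ** 2
        powers.append(total)
    return powers


# Synthetic illustration: random flanks around a strictly period-3 stretch.
rng = random.Random(0)


def random_bases(length):
    return "".join(rng.choice("ACGT") for _ in range(length))


synthetic = random_bases(150) + "GCT" * 50 + random_bases(150)
spectrum = period3_power(synthetic, window=60)
print(len(spectrum), round(spectrum[150], 1), round(spectrum[0], 1))
```

In this toy sequence, windows that lie fully inside the repeated-codon stretch give a much larger value than windows in the random flanks. Real exons show a far weaker contrast, which is the background-noise problem the abstract attributes to spectral leakage.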
Affiliation(s)
- K Jayasree
  - Department of Communication Engineering, School of Electronics Engineering, Vellore Institute of Technology, Vellore, India
- Malaya Kumar Hota
  - Department of Communication Engineering, School of Electronics Engineering, Vellore Institute of Technology, Vellore, India
- Atul Kumar Dwivedi
  - Department of Communication Engineering, School of Electronics Engineering, Vellore Institute of Technology, Vellore, India
- Himanshuram Ranjan
  - Department of Communication Engineering, School of Electronics Engineering, Vellore Institute of Technology, Vellore, India
- Vinay Kumar Srivastava
  - Department of Electronics and Communication Engineering, Motilal Nehru National Institute of Technology, Allahabad, India
2
Raman Kumar M, Vaegae NK. A new numerical approach for DNA representation using modified Gabor wavelet transform for the identification of protein coding regions. Biocybern Biomed Eng 2020. [DOI: 10.1016/j.bbe.2020.03.007]
3
Li J, Zhang L, Li H, Ping Y, Xu Q, Wang R, Tan R, Wang Z, Liu B, Wang Y. Integrated entropy-based approach for analyzing exons and introns in DNA sequences. BMC Bioinformatics 2019; 20:283. [PMID: 31182012 PMCID: PMC6557737 DOI: 10.1186/s12859-019-2772-y]
Abstract
BACKGROUND Numerous essential algorithms and methods, including entropy-based quantitative methods, have been developed over the last decade to analyze complex DNA sequences. Exons and introns are the most notable components of DNA, and their identification and prediction remain a focus of state-of-the-art research. RESULTS In this study, we designed an integrated entropy-based analysis approach, which involves a modified topological entropy calculation, a genomic signal processing (GSP) method, and singular value decomposition (SVD), to investigate exons and introns in DNA sequences. We optimized and implemented the topological entropy and the generalized topological entropy to calculate the complexity of DNA sequences, highlighting the characteristics of repetitive sequences. By comparing the digitized entropy values of exons and introns, we observed that they are significantly different. After converting DNA data to numerical topological entropy values, we applied the SVD method to investigate exon and intron regions on a single gene sequence. Additionally, several genes across five species were used for exon prediction. CONCLUSIONS Our approach not only helps to explore the complexity of DNA sequences and their functional elements, but also provides an entropy-based GSP method to analyze exon and intron regions. Our work is feasible across different species and can be extended to analyze other components in both coding and noncoding regions of DNA sequences.
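As a rough illustration of the entropy calculation mentioned in this abstract, the sketch below computes a plain finite-length topological entropy for a DNA string: pick the largest subword length n such that 4^n + n - 1 bases are available, count the distinct length-n subwords in that prefix, and take log base 4 of the count divided by n. This is an assumed baseline definition from the general literature, not the modified or generalized variant the authors implemented, and the function name `topological_entropy` is hypothetical. The SVD stage is not shown.

```python
import math


def topological_entropy(seq):
    """Finite-length topological entropy of a DNA string (assumed baseline form).

    Uses the largest n with 4**n + n - 1 <= len(seq), so the prefix holds
    exactly 4**n overlapping windows and the result lies between 0 and 1.
    """
    length = len(seq)
    n = 0
    while 4 ** (n + 1) + (n + 1) - 1 <= length:
        n += 1
    if n == 0:
        raise ValueError("sequence too short (need at least 4 bases)")
    prefix = seq[: 4 ** n + n - 1]
    # Set of distinct length-n subwords seen in the prefix.
    subwords = {prefix[i:i + n] for i in range(len(prefix) - n + 1)}
    return math.log(len(subwords), 4) / n


print(topological_entropy("A" * 100))      # a single repeated base
print(topological_entropy("ACGT" * 25))    # a short repeated motif
```

Highly repetitive input yields a low value because few distinct subwords occur, which matches the abstract's remark that the measure highlights repetitive sequences.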
Affiliation(s)
- Junyi Li
  - School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, 518055 China
- Li Zhang
  - School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, 518055 China
- Huinian Li
  - School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, 518055 China
- Yuan Ping
  - School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, 518055 China
- Qingzhe Xu
  - School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, 518055 China
- Rongjie Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001 China
- Renjie Tan
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001 China
- Zhen Wang
  - CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031 China
- Bo Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001 China
- Yadong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong, 518055 China
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001 China
4
Kar S, Ganguly M, Das S. Using DIT-FFT algorithm for identification of protein coding region in eukaryotic gene. Biomedical Engineering-Applications Basis Communications 2019. [DOI: 10.4015/s1016237219500029]
Abstract
Digital Signal Processing (DSP) is playing a vital role as a research platform in biomedical engineering for predicting protein coding regions (exons) from genomic sequences with great accuracy. The protein coding area in DNA sequences can be determined with the help of the period-3 property. The DFT algorithm is most often used to detect the period-3 property, but in this paper we have tested the FFT algorithm instead. DSP is basically concerned with processing numerical sequences, so when it is applied to DNA sequence analysis, the sequence of base characters must first be converted to a numerical version. The choice of numerical representation strongly affects how well the biological properties are reflected in the numerical sequence. In this work, the proposed technique based on the DIT-FFT algorithm is used to identify the exonic area, with an integer value representation for transforming the DNA sequences. Digital filters are used to extract the period-3 components from the output spectrum and to eliminate unwanted high-frequency noise from DNA sequences. Overcoming background noise here means suppressing the non-coding regions, i.e., introns. The proposed algorithm is tested on four nucleotide sequences having single or multiple exons.
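The combination of an integer representation and a decimation-in-time FFT described in this abstract can be sketched as follows. The mapping T=0, C=1, A=2, G=3 is an assumption, since the abstract does not give the integer values used, and the helper names `dit_fft` and `period3_bin` are hypothetical. The sketch uses a textbook recursive radix-2 DIT-FFT, which needs a power-of-two length, so the period-3 component falls between bins and shows up in the bin nearest N/3. The digital filtering stage from the paper is not reproduced.

```python
import cmath

# Assumed integer representation; the abstract does not specify the values.
INTEGER_MAP = {"T": 0, "C": 1, "A": 2, "G": 3}


def dit_fft(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = dit_fft(x[0::2])  # transform of even-indexed samples
    odd = dit_fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # butterfly twiddle
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def period3_bin(seq):
    """Index of the strongest non-DC bin in the lower half of the spectrum."""
    values = [INTEGER_MAP[b] for b in seq]
    mean = sum(values) / len(values)
    spectrum = dit_fft([v - mean for v in values])  # remove the DC offset
    half = range(1, len(seq) // 2 + 1)
    return max(half, key=lambda k: abs(spectrum[k]))


codons = ("ATG" * 22)[:64]  # strictly period-3 toy sequence, N = 64
print(period3_bin(codons), 64 / 3)
```

For a strictly period-3 toy input the strongest bin sits next to N/3. On real sequences the transform would normally be applied window by window, so that the period-3 strength can be tracked along the gene.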
Affiliation(s)
- Subhajit Kar
  - Department of Electronics, West Bengal State University, Barasat, Kolkata 700126, India
- Madhabi Ganguly
  - Department of Electronics, West Bengal State University, Barasat, Kolkata 700126, India
- Saptarshi Das
  - Department of Electronics, West Bengal State University, Barasat, Kolkata 700126, India
5
Chowdhury B, Garai A, Garai G. An optimized approach for annotation of large eukaryotic genomic sequences using genetic algorithm. BMC Bioinformatics 2017; 18:460. [PMID: 29065853 PMCID: PMC5655831 DOI: 10.1186/s12859-017-1874-7] [Received: 06/14/2017] [Accepted: 10/17/2017]
Abstract
BACKGROUND Detecting important functional and/or structural elements and identifying their positions in a large eukaryotic genomic sequence is an active research area. The gene is an important functional and structural unit of DNA. Computational gene prediction is therefore essential for detailed genome annotation. RESULTS In this paper, we propose a new gene prediction technique based on a Genetic Algorithm (GA) to determine the optimal positions of the exons of a gene in a chromosome or genome. The correct identification of the coding and non-coding regions is difficult and computationally demanding. The proposed genetic-based method, named Gene Prediction with Genetic Algorithm (GPGA), reduces this problem by searching for only one exon at a time instead of all exons along with their introns. This representation carries a significant advantage in that it breaks the entire gene-finding problem into a number of smaller sub-problems, thereby reducing the computational complexity. We tested the performance of GPGA on existing benchmark datasets and compared the results with well-known and relevant techniques. The comparison shows better or comparable performance for the proposed method. We also used GPGA to annotate human chromosome 21 (HS21) using cross-species comparisons with the mouse orthologs. CONCLUSION GPGA predicted true genes with better accuracy than other well-known approaches.
Affiliation(s)
- Biswanath Chowdhury
  - Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, 700009 WB India
- Arnav Garai
  - Unit of Energy, Utilities, Communications and Services, Infosys Technologies Ltd., Bhubaneswar, 751024 Odisha India
- Gautam Garai
  - Computational Sciences Division, Saha Institute of Nuclear Physics, Kolkata, 700064 WB India