1
Yang W, Li F, Zhao Y, Lu X, Yang S, Zhu P. Quantitative analysis of heavy metals in soil by X-ray fluorescence with PCA-ANOVA and support vector regression. Analytical Methods: Advancing Methods and Applications 2022; 14:3944-3952. [PMID: 36222117 DOI: 10.1039/d2ay00593j] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/16/2023]
Abstract
Heavy metal concentration is an important index for evaluating soil pollution, and accurate measurement of trace element content matters for green agriculture development. To this end, a new prediction framework comprising pre-processing, signal extraction, feature selection and decision-making was proposed. The energy dispersive X-ray fluorescence (ED-XRF) spectra of 57 national standard soil samples were investigated using the proposed methods. First, a background deduction method called iterative adaptive window empirical wavelet transform (IAWEWT) was introduced to extract the effective counts of characteristic peaks; it was validated by comparing the coefficient of determination (R²) of the instrumental calibration curve against two conventional methods. Second, principal component analysis (PCA) was combined with analysis of variance (ANOVA) to optimize variable selection for the ED-XRF spectrum. After PCA feature extraction and ANOVA variable selection, the optimum numbers of principal components for V, Cr, Cu, Zn, Mo, Cd and Pb were determined to be 7, 15, 4, 4, 4, 5 and 12, respectively. A support vector regression (SVR) model was then adopted for heavy metal estimation and evaluated by R² and root mean square error (RMSE). The proposed PCA-ANOVA-SVR model substantially improved the predictive capability for all seven heavy metals, with R² values of 0.993, 0.996, 0.999, 0.999, 0.997, 0.998 and 0.998 for V, Cr, Cu, Zn, Mo, Cd and Pb, respectively. The framework can therefore effectively eliminate redundant features and determine the concentration of trace elements in soil, providing an effective alternative for quantitative analysis by X-ray fluorescence spectrometry.
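The two evaluation indices named in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions of the coefficient of determination and RMSE; the function names and the sample concentrations are hypothetical and are not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical concentrations for illustration only (not data from the paper).
reference = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 31.0, 39.0]

print("RMSE:", rmse(reference, predicted))
print("R^2:", r_squared(reference, predicted))
```

An R² close to 1 together with a small RMSE (in the units of the measured concentration) indicates that predictions track the reference values closely.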
Affiliation(s)
- Wanqi Yang, Fusheng Li, Yanchun Zhao, Xin Lu, Siyuan Yang, Pengfei Zhu
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, P. R. China.
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou 313001, P. R. China.
2
Yadav Y, Sharma SN, Shakya DK. Detection of Tandem Repeats in DNA Sequences Using Short-Time Ramanujan Fourier Transform. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1583-1591. [PMID: 33493119 DOI: 10.1109/tcbb.2021.3053656] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Tandem repeats in genomic sequences are characterized by two or more contiguous copies of a pattern of nucleotides. The role of these repeats as molecular markers is well established in various genetic disorders, human evolution studies, DNA forensics and intron retention. In this work, a computational method has been developed for the extraction of both exact and approximate tandem repeats. The proposed algorithm uses the Ramanujan Fourier Transform (RFT) to identify periodicities in DNA sequences. Because the RFT estimates the period directly, rather than inferring it from the signal's spectrum, it detects tandem repeats more sensitively and rapidly than other popular computational methods.
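The idea of estimating a period directly from Ramanujan sums can be sketched as follows. This is a simplified whole-sequence illustration, not the authors' short-time algorithm: the binary indicator mapping of a single base, the 1/N normalisation, and the function names are all assumptions made for the example.

```python
import math


def ramanujan_sum(q, n):
    """c_q(n): sum of cos(2*pi*k*n/q) over 1 <= k <= q with gcd(k, q) == 1."""
    n %= q  # c_q(n) is periodic in n with period q
    total = sum(math.cos(2 * math.pi * k * n / q)
                for k in range(1, q + 1) if math.gcd(k, q) == 1)
    return round(total)  # Ramanujan sums are always integers


def rft_coefficient(x, q):
    """(1/N) * sum_n x[n] * c_q(n); normalisation conventions vary."""
    return sum(v * ramanujan_sum(q, n) for n, v in enumerate(x)) / len(x)


def dominant_period(sequence, base, max_period):
    """Return the period q >= 2 with the largest |RFT coefficient|, plus all scores."""
    # Assumed mapping: 1 where the chosen base occurs, 0 elsewhere.
    indicator = [1 if s == base else 0 for s in sequence]
    scores = {q: abs(rft_coefficient(indicator, q))
              for q in range(2, max_period + 1)}
    return max(scores, key=scores.get), scores


# Toy sequence: the unit "ACG" repeated four times.
period, scores = dominant_period("ACGACGACGACG", "A", 6)
print(period, scores)
```

Here each score is indexed by a candidate period q itself, so the repeat length is read off without converting a frequency bin back to a period. A short-time variant would apply the same computation inside a sliding window.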
3
Sharma S, Sharma SN, Saxena R. Identification of Short Exons Disunited by a Short Intron in Eukaryotic DNA Regions. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1660-1670. [PMID: 30794188 DOI: 10.1109/tcbb.2019.2900040] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 05/03/2023]
Abstract
Weak codon bias in short exons, together with their separation by a short intron, makes it difficult to extract the period-3 component that marks the presence of exonic regions. The annotation of such short exons is addressed by the proposed model-independent, signal-processing-based method, which has the following features: (a) DNA sequences are mapped using multiple mapping schemes; (b) the period-3 spectra corresponding to the multiple mappings are optimized to enhance short exon-short intron discrimination; and (c) the spectra corresponding to the multiple mapping schemes are subjected to Principal Component Analysis (PCA) to identify a greater number of such short exons. A comparative study with other methods indicates improved detection of contiguous short exons disunited by a short intron. Apart from annotating exonic and intronic regions, the proposed algorithm can also complement methods for detecting alternative splicing by intron retention, since one characteristic feature of intron retention is the presence of two short exons flanking a short intron.
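A period-3 spectrum of the kind the abstract refers to can be sketched with a sliding-window DFT evaluated at frequency 1/3. The sketch assumes a single binary indicator mapping (one 0/1 sequence per base); the paper's multiple mapping schemes, spectrum optimization and PCA step are not reproduced, and the function names and toy sequence are hypothetical.

```python
import cmath


def period3_power(window):
    """Sum over the four bases of |DFT at frequency 1/3|^2 of each indicator sequence."""
    total = 0.0
    for base in "ACGT":
        # DFT of the 0/1 indicator sequence, evaluated only at frequency 1/3.
        coeff = sum(cmath.exp(-2j * cmath.pi * n / 3)
                    for n, s in enumerate(window) if s == base)
        total += abs(coeff) ** 2
    return total


def sliding_period3(sequence, window_len, step=3):
    """Period-3 power for each window position along the sequence."""
    return [period3_power(sequence[i:i + window_len])
            for i in range(0, len(sequence) - window_len + 1, step)]


# Toy input: a strongly period-3 stretch followed by a homopolymer stretch.
toy = "ATG" * 6 + "A" * 18
profile = sliding_period3(toy, window_len=9, step=9)
print(profile)
```

Windows covering the repeated codon show high period-3 power, while the homopolymer windows show essentially none. With short windows, as required for short exons, this peak becomes weaker and noisier, which is the difficulty the abstract describes.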
4
Identification of CpG Islands in DNA Sequences Using Short-Time Fourier Transform. Interdiscip Sci 2020; 12:355-367. [PMID: 32394270 DOI: 10.1007/s12539-020-00370-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/18/2019] [Revised: 04/07/2020] [Accepted: 04/17/2020] [Indexed: 10/24/2022]
Abstract
In the era of big data analysis, genomic data analysis is needed to extract the hidden information present in DNA sequences. One important hidden feature of DNA sequences is CpG islands, which are used as gene markers and are also associated with diseases such as cancer. Various methods have therefore been reported for the identification of CpG islands in DNA sequences. The key contributions of this work are (i) extraction of the periodicity feature associated with CpG islands using the short-time Fourier transform and (ii) a short-time Fourier transform-based algorithm for the identification of CpG islands in DNA sequences. The results demonstrate that the proposed algorithm performs better than other reported methods for CpG island detection.
5
Das L, Nanda S, Das JK. An integrated approach for identification of exon locations using recursive Gauss Newton tuned adaptive Kaiser window. Genomics 2018; 111:284-296. [PMID: 30342085 DOI: 10.1016/j.ygeno.2018.10.008] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 02/01/2018] [Revised: 09/11/2018] [Accepted: 10/11/2018] [Indexed: 11/27/2022]
Abstract
Identification of exon locations in a DNA sequence is considered among the most demanding and challenging research topics in bioinformatics. This work proposes a robust approach that combines trigonometric mapping with an adaptively tuned Kaiser window for locating protein-coding regions (exons) in a genetic sequence. For better convergence and improved accuracy, the side-lobe height control parameter (β) of the Kaiser window is made adaptive so that it tracks the changing dynamics of the genetic sequence. The adaptive Kaiser algorithm gains its tracking ability from recursive Gauss-Newton tuning, which uses the covariance of the error signal to tune β; this is shown through numerous simulation results under a variety of practical test conditions. A detailed comparative analysis with existing mapping schemes, windowing techniques, and other signal processing methods such as SVD, AN, DFT, STDFT, WT, and ST is also included to highlight the strength and efficiency of the proposed approach. In addition, several critical performance parameters are computed to investigate the effectiveness and robustness of the algorithm. The approach has also been applied successfully to a number of benchmark gene sets, including Mus musculus, Homo sapiens, and C. elegans, where it predicted exon locations efficiently in contrast to the other existing mapping methods.
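The role of β can be illustrated with a plain, non-adaptive Kaiser window. The sketch uses the standard Kaiser window definition and a power-series approximation of the Bessel function I0. The recursive Gauss-Newton tuning of β described in the abstract is not implemented, and the function names and the number of series terms are assumptions made for the example.

```python
import math


def bessel_i0(x, terms=50):
    """Zeroth-order modified Bessel function of the first kind, by power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2  # ratio of consecutive series terms
        total += term
    return total


def kaiser_window(length, beta):
    """w[n] = I0(beta * sqrt(1 - (2n/(N-1) - 1)^2)) / I0(beta), n = 0..N-1."""
    if length == 1:
        return [1.0]
    denom = bessel_i0(beta)
    window = []
    for n in range(length):
        r = 2.0 * n / (length - 1) - 1.0  # position mapped to [-1, 1]
        window.append(bessel_i0(beta * math.sqrt(max(0.0, 1.0 - r * r))) / denom)
    return window


# A fixed beta is used here; the paper instead tunes beta adaptively.
for beta in (0.0, 6.0, 8.0):
    w = kaiser_window(9, beta)
    print(beta, [round(v, 4) for v in w])
```

With β = 0 the window is rectangular. Increasing β tapers the edges more strongly, lowering the side lobes at the cost of a wider main lobe, which is the trade-off an adaptive scheme would adjust along the sequence.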
Affiliation(s)
- Lopamudra Das, Sarita Nanda, J K Das
- School of Electronics Engineering, KIIT University, Bhubaneswar, India.