1
|
Sun X, Yan S, Liu C, Zhang S, Hu Y, Zhang H. Advancing water quality assurance: A comprehensive Investigation into quantitative bentazone detection in drinking water through surface-enhanced Raman spectroscopy and chemometric insights. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2025; 337:126067. [PMID: 40127613 DOI: 10.1016/j.saa.2025.126067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2024] [Revised: 02/25/2025] [Accepted: 03/17/2025] [Indexed: 03/26/2025]
Abstract
Water quality is paramount for human health and societal advancement. The widespread use of pesticides, such as the herbicide bentazone, poses a potential threat to the quality of source water and drinking water. This study employed surface-enhanced Raman spectroscopy (SERS) in conjunction with chemometric methods to provide a solution for the trace detection of bentazone residues in drinking water. Using nano-silver sol as an efficient SERS substrate, SERS spectra were acquired at ambient temperature for a total of 200 bentazone solution samples spanning 20 distinct concentrations. Feature selection and model optimization were conducted using UVE (uninformative variable elimination), ICO (interval combination optimization), CARS (competitive adaptive reweighted sampling) and BOSS (bootstrapping soft shrinkage). These approaches significantly enhanced the accuracy and stability of the quantitative model. In parallel, the Raman spectrum of an individual bentazone molecule was simulated using the B3LYP method within density functional theory (DFT) and the 6-31G* (d,p) basis set, and the molecular vibration modes corresponding to the characteristic peaks were analyzed. On this basis, the correctness and interpretability of the spectral regions selected by the algorithms were validated. The results indicated that the BOSS-PLS model showed superior performance, yielding Rc2 = 0.99289, Rp2 = 0.96697, RMSEC = 0.41031 and RMSEP = 0.89701. The BOSS feature selection retained 42 variables, far fewer than the full SERS spectrum. Based on the SERS method, the final limits of detection (LOD) and quantification (LOQ) for residual bentazone in drinking water were determined to be 0.016 mg/L and 0.05 mg/L, respectively, significantly lower than the limits set by the relevant standards. In summary, this study presented a novel and efficient approach integrating SERS technology with the BOSS-PLS model, offering a feasible and reliable solution for the detection of trace bentazone residues in drinking water.
Affiliation(s)
- Xiaorong Sun
- School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China; Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing 100048, China.
| | - Sining Yan
- School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China; Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing 100048, China
| | - Cuiling Liu
- School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China; Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing 100048, China
| | - Shanzhe Zhang
- School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China; Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing 100048, China.
| | - Yiran Hu
- School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China; Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing 100048, China
| | - Haoyue Zhang
- School of Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China; Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing 100048, China
| |
|
2
|
Wang T, Zheng Y, Xu L, Yun YH. Comprehensive comparison on different wavelength selection methods using several near-infrared spectral datasets with different dimensionalities. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2025; 331:125767. [PMID: 39874707 DOI: 10.1016/j.saa.2025.125767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2024] [Revised: 01/13/2025] [Accepted: 01/18/2025] [Indexed: 01/30/2025]
Abstract
NIR spectroscopy is widely used in chemical analysis, agricultural science, food safety, and other fields, but its high dimensionality and data redundancy bring analytical challenges. This study compares the performance of different wavelength selection methods on NIR spectral datasets with different dimensionalities to provide a reference for researchers. The wavelength selection methods were classified into four categories according to their principles: partial least squares (PLS) parameter-based methods, intelligent optimization algorithm (IOA)-based methods, model population analysis (MPA)-based methods, and wavelength interval selection (WIS) methods. The performance of the models was compared in terms of R2C, R2P, root mean square error of calibration (RMSEC), root mean square error of prediction (RMSEP), the number of selected variables, computational time, and the improvement ratio of RMSEP (iRMSEP). The results showed that the models established by MPA-based and WIS methods were more stable and superior to the other categories of wavelength selection methods on most datasets. Among the twenty characteristic wavelength selection methods in this study, bootstrapping soft shrinkage (BOSS) and genetic algorithm interval partial least squares (GA-iPLS) showed the best overall performance.
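The comparison metrics named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the iRMSEP formula (relative reduction of RMSEP against the full-spectrum model, in percent) is an assumption based on common usage, since the abstract does not define it.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error; RMSEC on calibration data, RMSEP on prediction data."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination (R2C or R2P depending on the data set used)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def improvement_ratio(rmsep_full, rmsep_selected):
    """Assumed iRMSEP: percentage drop in RMSEP after wavelength selection."""
    return (rmsep_full - rmsep_selected) / rmsep_full * 100.0


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0]
    predicted = [1.0, 2.0, 5.0]
    print(rmse(reference, predicted))
    print(r_squared(reference, predicted))
    print(improvement_ratio(2.0, 1.5))
```

A positive improvement ratio under this assumed definition means the selected wavelengths predict better than the full spectrum.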
Affiliation(s)
- Tao Wang
- School of Food Science and Engineering, Hainan University, Haikou 570228 PR China
| | - Yun Zheng
- School of Food Science and Engineering, Hainan University, Haikou 570228 PR China
| | - Lilan Xu
- School of Food Science and Engineering, Hainan University, Haikou 570228 PR China
| | - Yong-Huan Yun
- School of Food Science and Engineering, Hainan University, Haikou 570228 PR China; Hainan Institute for Food Control, Key Laboratory of Tropical Fruits and Vegetables Quality and Safety for State Market Regulation, Haikou 570314 PR China.
| |
|
3
|
Li B, Li W, Guo J, Wang H, Wan R, Liu Y, Fan M, Wang C, Yang S, Zhao L, Nie C. Outlier Removal with Weight Penalization and Aggregation: A Robust Variable Selection Method for Enhancing Near-Infrared Spectral Analysis Performance. Anal Chem 2025; 97:7325-7332. [PMID: 39970051 DOI: 10.1021/acs.analchem.4c07007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
Full-wavelength near-infrared (NIR) spectroscopy faces significant challenges due to the strong collinearity among spectral variables and the presence of variables that are highly sensitive to sample fluctuations. Additionally, not all spectral variables contribute equally to the NIR model. Weakly influential variables, although not important on their own, can provide substantial improvement when combined with stronger variables, thus increasing both model stability and prediction accuracy. Therefore, this study proposes a new variable selection method called outlier removal with weight penalization and aggregation (OR-WPA). The method begins by removing outlier spectral variables with high coefficient of variation, which enhances model stability. During the variable selection process, multiple submodels are constructed based on variable subsets, with variable weights assigned according to the absolute values of regression coefficients. A moving window is applied to average the weights, and variables with excessively high weights are penalized, promoting the selection of weakly influential variables that positively contribute to model accuracy. The variable space is iteratively reduced, and the subset of variables associated with the highest predictive accuracy is selected as the final characteristic variable combination. The OR-WPA method was evaluated on three NIR spectral data sets, involving corn, heated tobacco substrate, and flue-cured tobacco. The results were compared with three advanced variable selection methods: Monte Carlo uninformative variable elimination, competitive adaptive reweighted sampling, and bootstrapping soft shrinkage. The results indicate that OR-WPA demonstrates better predictive performance, particularly in predicting low-content components, where it significantly enhances both the accuracy and stability of the NIR model.
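Two of the steps described in this abstract, removing spectral variables with a high coefficient of variation and averaging variable weights over a moving window, can be sketched as below. This is only an illustration under stated assumptions: the function names, the threshold value, and the edge handling of the window are hypothetical, and the penalization and iterative shrinkage steps of OR-WPA are not reproduced because the abstract does not specify them.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        return float("inf")  # treat a zero-mean variable as maximally unstable
    return statistics.stdev(values) / abs(mean)


def keep_stable_variables(spectra, cv_threshold):
    """Return indices of variables (columns) whose CV is at or below the threshold.

    `spectra` is a list of samples, each a list of intensities per variable.
    """
    n_vars = len(spectra[0])
    kept = []
    for j in range(n_vars):
        column = [row[j] for row in spectra]
        if coefficient_of_variation(column) <= cv_threshold:
            kept.append(j)
    return kept


def moving_window_average(weights, window):
    """Average each weight with its neighbours; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(weights)):
        lo = max(0, i - half)
        hi = min(len(weights), i + half + 1)
        smoothed.append(sum(weights[lo:hi]) / (hi - lo))
    return smoothed


if __name__ == "__main__":
    spectra = [[1.0, 1.0, 10.0], [1.1, 5.0, 10.2], [0.9, 9.0, 9.8]]
    print(keep_stable_variables(spectra, cv_threshold=0.2))
    print(moving_window_average([0.0, 3.0, 0.0], window=3))
```

In this toy input the middle variable fluctuates strongly between samples, so it is the one dropped by the stability filter.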
Affiliation(s)
- Beibei Li
- Laboratory of Tobacco Chemistry, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China
| | - Wenting Li
- Laboratory of Tobacco Chemistry, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China
| | - Junwei Guo
- Laboratory of Tobacco Chemistry, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China
| | - Hongbo Wang
- Laboratory of Tobacco Chemistry, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China
| | - Ran Wan
- Laboratory of Tobacco Chemistry, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China
| | - Yu Liu
- Laboratory of Tobacco Chemistry, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China
| | - Meijuan Fan
- Laboratory of Tobacco Chemistry, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China
| | - Cong Wang
- Laboratory of Tobacco Chemistry, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China
| | - Song Yang
- Laboratory of Tobacco Chemistry, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China
| | - Le Zhao
- Laboratory of Tobacco Chemistry, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China
| | - Cong Nie
- Laboratory of Tobacco Chemistry, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China
| |
|
4
|
Sahu N, Mahanty B, Haldar D. Rapid quantification of pullulan in fermentation broth using UV-visible spectroscopy and partial least squares regression. ANALYTICAL METHODS : ADVANCING METHODS AND APPLICATIONS 2025; 17:2841-2849. [PMID: 40111211 DOI: 10.1039/d5ay00034c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/22/2025]
Abstract
Quantification of exopolysaccharide (EPS) production in fermentation broth requires solvent precipitation of the polymer, followed by acid or enzymatic hydrolysis, and colorimetric or chromatographic analysis. This lengthy multistep sample preparation and analysis is a major bottleneck in bioprocess monitoring. The development of a nondestructive analytical method requiring minimal sample preparation is warranted. In this study, partial least squares (PLS) regression models were developed to quantify pullulan in cell-free supernatant (PCS) and precipitated pullulan redissolved in distilled water (PDW) from spectral data (204-400 nm). Genetic algorithm, particle swarm optimization, competitive adaptive reweighted sampling, and adaptive bottom-up space exploration strategies were employed to select optimal spectral regions. The full-spectrum model on the PCS (5 latent variables, RMSECV: 0.020 g l-1, RCV2: 0.997) outperformed the PDW (3 latent variables, RCV2: 0.990). Adaptive bottom-up space exploration achieved the lowest RMSECV (0.009 g l-1 for the PCS, 0.027 g l-1 for the PDW), retaining just 16 and 21 spectral variables, respectively. The residual predictive deviation (RPD) for all PLS model variants remains satisfactory (>6.559). The method's limit of detection (0.021 g l-1) was suitable for quantifying pullulan in fermentation broth. The proposed method can be extended to other structurally similar biopolymers where PLS-based soft sensor integration would enable real-time monitoring and bioprocess control.
Affiliation(s)
- Nageswar Sahu
- Division of Biotechnology, Karunya Institute of Technology and Sciences, Coimbatore-641114, Tamil Nadu, India.
| | - Biswanath Mahanty
- Division of Biotechnology, Karunya Institute of Technology and Sciences, Coimbatore-641114, Tamil Nadu, India.
| | - Dibyajyoti Haldar
- Division of Biotechnology, Karunya Institute of Technology and Sciences, Coimbatore-641114, Tamil Nadu, India.
| |
|
5
|
Meng L, Ding P, Tan Y, Zhang Y, Zhao J. Study on the Ultrasonic-Assisted Extraction Process of Anthocyanin from Purple Cabbage with Deep Eutectic Solvent. Molecules 2025; 30:1281. [PMID: 40142057 PMCID: PMC11944879 DOI: 10.3390/molecules30061281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2024] [Revised: 12/19/2024] [Accepted: 12/24/2024] [Indexed: 03/28/2025]
Abstract
In this paper, purple cabbage was used as the raw material for ultrasonic-assisted extraction of anthocyanin with a deep eutectic solvent (DES). The effects of extraction solvent type, solid-liquid ratio, moisture, extraction temperature, and time on the yield of anthocyanin from purple cabbage were investigated by single-factor tests, and the feasibility of the extraction method was verified by a standard addition recovery test. The results showed that optimal extraction was obtained when DES-5 (choline chloride/1,2-propylene glycol/water) was used as the extraction solvent, with a solid-liquid ratio of 1:32, moisture of 50%, an extraction temperature of 50 °C, and an extraction time of 80 min. Under these conditions, the yield of anthocyanin extracted from purple cabbage reached 21.6%, and the recovery rates were 85.62-87.75%. DES is therefore a promising, environmentally friendly alternative to organic solvents for extracting anthocyanins.
Affiliation(s)
- Lifen Meng
- School of Chemical Engineering, Guizhou University of Engineering Science, Bijie 551700, China; (L.M.); (P.D.); (Y.T.); (Y.Z.)
- Analysis and Testing Center, Guizhou University of Engineering Science, Bijie 551700, China
| | - Pengpeng Ding
- School of Chemical Engineering, Guizhou University of Engineering Science, Bijie 551700, China; (L.M.); (P.D.); (Y.T.); (Y.Z.)
| | - Ye Tan
- School of Chemical Engineering, Guizhou University of Engineering Science, Bijie 551700, China; (L.M.); (P.D.); (Y.T.); (Y.Z.)
| | - Yinying Zhang
- School of Chemical Engineering, Guizhou University of Engineering Science, Bijie 551700, China; (L.M.); (P.D.); (Y.T.); (Y.Z.)
| | - Jun Zhao
- School of Chemical Engineering, Guizhou University of Engineering Science, Bijie 551700, China; (L.M.); (P.D.); (Y.T.); (Y.Z.)
| |
|
6
|
Tian J, Li M, Zhang X, Lei M, Ke L, Zou L. Enhancing moisture detection in coal gravels: A deep learning-based adaptive microwave spectra fusion method. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2024; 313:124147. [PMID: 38490123 DOI: 10.1016/j.saa.2024.124147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 02/22/2024] [Accepted: 03/09/2024] [Indexed: 03/17/2024]
Abstract
The accurate and effective detection of moisture in coal gravels is crucial. Conventional air oven-drying methods suffer from prolonged processing times and their disruptive nature. This paper proposes a deep learning-based adaptive fusion method for multiple microwave spectra to non-destructively detect the moisture content of coal gravels. First, a purpose-built free-space measurement platform is employed to acquire microwave spectra of coal samples, encompassing the magnitude and phase spectra of reflection coefficients (S11) and transmission coefficients (S21). Subsequently, a Monte-Carlo cross-validation-based method is adopted to detect and eliminate outliers in the spectra. Furthermore, a novel feature extraction module is proposed, enhancing the traditional U-shaped network using residual learning (ResNet) and the convolutional block attention module (CBAM) to extract and reconstruct subtle spectral features. Inspired by high-level data fusion, an adaptive spectra fusion method is then introduced that can autonomously balance the contributions of different spectra. The experimental results underscore the advantages of the proposed method, with narrow frequency intervals of 2.50-3.25 GHz, 3.75-4.00 GHz, and 4.75-5.00 GHz exhibiting superior detection accuracy compared to the entire frequency band, achieving R2 = 0.9034, MAE = 1.0254, RMSE = 1.2948 and RPIQ = 6.0630.
Affiliation(s)
- Jun Tian
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China; Institute of Materials Research and Engineering, Agency for Science, Technology and Research (A*Star), Singapore
| | - Ming Li
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China.
| | - Xiaofu Zhang
- China Energy Coal Trading Group Limited, Cangzhou 061199, China
| | - Meng Lei
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China
| | - Lin Ke
- Institute of Materials Research and Engineering, Agency for Science, Technology and Research (A*Star), Singapore
| | - Liang Zou
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116, China
| |
|
7
|
Li K, Zhang X, Zhang J, Du B, Song X, Wang G, Li Q, Zhang Y, Liu F, Zhang Z. Simultaneous Rapid Detection of Multiple Physicochemical Properties of Jet Fuel Using Near-Infrared Spectroscopy. ACS OMEGA 2024; 9:16138-16146. [PMID: 38617685 PMCID: PMC11007685 DOI: 10.1021/acsomega.3c09994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 03/08/2024] [Accepted: 03/18/2024] [Indexed: 04/16/2024]
Abstract
Jet fuel is the primary fuel used in the aviation industry, and its quality has a direct impact on the safety and operational efficiency of aircraft. The accurate quantitative detection and analysis of various physicochemical property indicators are important for improving and ensuring the quality of jet fuel in the domestic market. This study used near-infrared (NIR) spectroscopy to establish a suitable model for the simultaneous and rapid detection of multiple physicochemical properties in jet fuel. Using more than 40 different sources of jet fuel, a rapid detection model was established by optimizing the spectral processing methods. The measurement models were separately built using the partial least-squares (PLS) and orthogonal PLS algorithms, and the model parameters were optimized. The results show that after the Savitzky-Golay second derivative preprocessing, the PLS model built using the feature spectra selected by the uninformative variable elimination wavelength algorithm achieved the best measurement performance. Compared with the PLS model without preprocessing, the range of the resulting accuracy improvement was at least 15.01%. Under the optimal model parameters, the calibration set regression coefficient (Rc2) of the 11 jet fuel property index models ranged from 0.9102 to 0.9763, with the root-mean-square error of calibration values up to 0.8468 °C (for flash points). The regression coefficient (Rp2) of the validation set ranged from 0.8239 to 0.9557, with the root-mean-square error of prediction values up to 1.1354 °C (for flash points). The ratios of prediction to deviation (RPD) values were all in the range of 1.9-3.0, indicating high accuracy and reliability of the model. The rapid NIR analysis method established in this study enables the simultaneous and rapid detection of multiple physicochemical properties of jet fuel, thereby providing effective technical support for ensuring the quality of jet fuel in the market.
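The ratio of prediction to deviation (RPD) quoted in this abstract is commonly computed as the standard deviation of the reference values divided by RMSEP. The sketch below follows that common definition as an assumption, since the abstract does not give the formula; the function names and the sample numbers are hypothetical and are not taken from the study.

```python
import math
import statistics


def rmsep(reference, predicted):
    """Root mean square error of prediction on a validation set."""
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(reference, predicted):
    """Assumed RPD: sample standard deviation of the reference values over RMSEP."""
    return statistics.stdev(reference) / rmsep(reference, predicted)


if __name__ == "__main__":
    # Made-up reference and predicted values, for illustration only
    reference = [1.0, 2.0, 3.0, 4.0, 5.0]
    predicted = [1.5, 1.5, 3.5, 3.5, 5.5]
    print(rmsep(reference, predicted))
    print(rpd(reference, predicted))
```

A larger RPD means the prediction error is small relative to the natural spread of the property being measured.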
Affiliation(s)
- Ke Li
- Center for Environmental Metrology, National Institute of Metrology, Beijing 100029, China
| | - Xin Zhang
- College of Environmental and Chemical Engineering, Dalian University, Dalian 116622, China
| | - Jing Zhang
- College of Environmental and Chemical Engineering, Dalian University, Dalian 116622, China
| | - Biao Du
- Beijing Yixingyuan Petrochemical Technology Co., Ltd., Beijing 101301, China
| | - Xiaoping Song
- Center for Environmental Metrology, National Institute of Metrology, Beijing 100029, China
| | - Guixuan Wang
- Beijing Yixingyuan Petrochemical Technology Co., Ltd., Beijing 101301, China
| | - Qi Li
- Center for Environmental Metrology, National Institute of Metrology, Beijing 100029, China
| | - Yinglan Zhang
- Leibniz Institut für Polymerforschung Dresden e.V., Hohe Straße 6, Dresden 01069, Germany
- Institut für Werkstoffwissenschaft, Technische Universität Dresden, Dresden 01062, Germany
| | - Fan Liu
- Center for Environmental Metrology, National Institute of Metrology, Beijing 100029, China
| | - Zhengdong Zhang
- Center for Environmental Metrology, National Institute of Metrology, Beijing 100029, China
| |
|
8
|
Duan C, Liu X, Cai W, Shao X. Interpretable Perturbator for Variable Selection in near-Infrared Spectral Analysis. J Chem Inf Model 2024; 64:2508-2514. [PMID: 37801639 DOI: 10.1021/acs.jcim.3c01290] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 10/08/2023]
Abstract
A perturbator was developed for variable selection in near-infrared (NIR) spectral analysis based on the perturbation strategy in deep learning for developing interpretation methods. A deep learning predictor was first constructed to predict the targets from the spectra in the training set. Then, taking the output of the predictor as a reference, the perturbator was trained to derive the perturbation-positive (P+) and perturbation-negative (P-) features from the spectra. Therefore, the weight (σ) of the perturbator layer can be a criterion to evaluate the importance of the variables in the spectra. Ranking the spectral variables by the criterion, the number of the variables used in the quantitative model can be obtained through cross-validation. Three NIR data sets were used to evaluate the proposed method. The root mean squared error was found to be comparable with or superior to that obtained by the commonly used methods. Moreover, the selected spectral variables are interpretable in identifying the key spectral features related to the prediction target. Therefore, the proposed method provides not only an effective tool for optimizing quantitative model, but also an efficient way for explaining spectra of multicomponent samples.
Affiliation(s)
- Chaoshu Duan
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, P. R. China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, P. R. China
| | - Xuyang Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, P. R. China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, P. R. China
| | - Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, P. R. China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, P. R. China
| | - Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, P. R. China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, P. R. China
| |
|
9
|
Chen S, Du K, Shan B, Xu Q, Zhang F. A hybrid variable selection method combining Fisher's linear discriminant combined population analysis and an improved binary cuckoo search algorithm. ANALYTICAL METHODS : ADVANCING METHODS AND APPLICATIONS 2024; 16:1021-1033. [PMID: 38312025 DOI: 10.1039/d3ay01942j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2024]
Abstract
In this paper, a novel hybrid variable selection method for model building by near-infrared (NIR) spectroscopy is proposed for composition measurement in industrial processes. A double-layer structure is designed for variable selection by combining Fisher's linear discriminant combined population analysis (FCPA) and an improved binary cuckoo search algorithm (IBCS). The Fisher classifier combined with model population analysis is used to select the variable interval wherein the useful variables are roughly located even when strong multicollinearity exists among spectral variables. Opposition-based learning (OBL) and jumping genes (JG) are introduced to improve the binary cuckoo search algorithm for the fine selection of key variables, thus avoiding the loss of excellent solutions due to randomness and the local optimum. Different variable selection methods were used to select variables for beer, corn, and diesel fuel datasets, and the partial least squares (PLS) algorithms were used to build calibration models to predict the original extract concentration of beer, the protein and starch content of corn, and the boiling point of diesel fuel, respectively. The results showed that the proposed PLS modeling method based on FCPA-IBCS has higher fitting accuracy and smaller prediction errors.
Affiliation(s)
- Shuobo Chen
- College of Automation and Electronic Engineering, Qingdao University of Science & Technology, Qingdao, 266061, P. R. China.
| | - Kang Du
- College of Automation and Electronic Engineering, Qingdao University of Science & Technology, Qingdao, 266061, P. R. China.
| | - Baoming Shan
- College of Automation and Electronic Engineering, Qingdao University of Science & Technology, Qingdao, 266061, P. R. China.
| | - Qilei Xu
- College of Automation and Electronic Engineering, Qingdao University of Science & Technology, Qingdao, 266061, P. R. China.
| | - Fangkun Zhang
- College of Automation and Electronic Engineering, Qingdao University of Science & Technology, Qingdao, 266061, P. R. China.
| |
|
10
|
Tang C, Jiang B, Ejaz I, Ameen A, Zhang R, Mo X, Wang Z. High-throughput phenotyping of nutritional quality components in sweet potato roots by near-infrared spectroscopy and chemometrics methods. Food Chem X 2023; 20:100916. [PMID: 38144853 PMCID: PMC10739761 DOI: 10.1016/j.fochx.2023.100916] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/01/2023] [Revised: 09/18/2023] [Accepted: 09/30/2023] [Indexed: 12/26/2023]
Abstract
The lack of an efficient approach for quality evaluation of sweet potatoes significantly hinders progress in quality breeding. Therefore, this study aimed to establish a near-infrared spectroscopy (NIRS) assay for high-throughput analysis of sweet potato root quality, including total starch, amylose, amylopectin, the ratio of amylopectin to amylose, soluble sugar, crude protein, total flavonoid content, and total phenolic content. A total of 125 representative samples were utilized and a dual-optimized strategy (optimization of sample subset partitioning and variable selection) was applied to NIRS modeling. Eight optimal equations were developed with an excellent coefficient of determination for the calibration (R2C) at 0.95-0.99, cross-validation (R2CV) at 0.93-0.98, external validation (R2V) at 0.89-0.96, and the ratio of prediction to deviation (RPD) at 6.33-11.35. Overall, these NIRS models provide a feasible approach for high-throughput analysis of root quality and permit large-scale screening of elite germplasm in future sweet potato breeding.
Affiliation(s)
- Chaochen Tang
- Crops Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Crop Genetic Improvement of Guangdong Province, Guangzhou 510640, People's Republic of China
| | - Bingzhi Jiang
- Crops Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Crop Genetic Improvement of Guangdong Province, Guangzhou 510640, People's Republic of China
| | - Irsa Ejaz
- College of Agronomy and Biotechnology, China Agricultural University, Beijing 100193, People's Republic of China
| | - Asif Ameen
- Arid Zone Research Centre, Pakistan Agricultural Research Council, Dera Ismail Khan, Pakistan
| | - Rong Zhang
- Crops Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Crop Genetic Improvement of Guangdong Province, Guangzhou 510640, People's Republic of China
| | - Xueying Mo
- Crops Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Crop Genetic Improvement of Guangdong Province, Guangzhou 510640, People's Republic of China
| | - Zhangying Wang
- Crops Research Institute, Guangdong Academy of Agricultural Sciences & Key Laboratory of Crop Genetic Improvement of Guangdong Province, Guangzhou 510640, People's Republic of China
| |
|
11
|
Zeng Z, Zhang B, Zhan Y, Huo J, Shi Y, Li X, Zhe W, Li B, Zhang Y, Yang Q. Method Comparison of Sample Pretreatment and Discovery of Differential Compositions of Natural Flavors and Fragrances for Quality Analysis by Using Chemometric Tools. J Chromatogr B Analyt Technol Biomed Life Sci 2023; 1222:123690. [PMID: 37019038 DOI: 10.1016/j.jchromb.2023.123690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Revised: 03/10/2023] [Accepted: 03/22/2023] [Indexed: 03/29/2023]
Abstract
Natural flavors and fragrances or their extracts have been widely used in a large variety of areas, including food, cosmetic, and tobacco industrial processes, among others. The compositions and intrinsic attributes of flavors and fragrances were related to many factors, such as species, geographical origin, planting environment, storage condition, processing method, and so on. This not only increased the difficulty in analyzing the product quality of flavors and fragrances, but also challenged the idea of "quality-by-design (QbD)". This work proposed an integrated strategy for precise discovery of differential compounds among different classes and subsequent quality analysis of complex samples through flavors and fragrances used in tobacco industry as examples. Three pretreatment methods were first inspected to effectively characterize the sample compositions, including direct injection (DI), thermal desorption (TD), and stir bar sorptive extraction (SBSE)-TD, coupled with gas chromatography-mass spectrometry (GC-MS) analysis to obtain characteristic information of samples of flavors and fragrances. Then, principal component analysis (PCA) was applied to discover the relation and difference between chromatographic fingerprints and peak table data once significant components were recognized in a holistic manner. Model population analysis (MPA) was then used to quantitatively extract the characteristic chemicals representing the quality differences among different classes of samples. Some differential marker compounds were discovered for difference analysis, including benzyl alcohol, latin acid, l-menthol acid, decanoic acid ethyl ester, vanillin, trans-o-coumaric acid, benzyl benzoate, and so on. Furthermore, partial least squares-discriminant analysis (PLS-DA) and support vector machine (SVM) were respectively applied to construct multivariate models for evaluation of quality differences and variations. 
Sample classification accuracy reached 100%. With the help of the optimal sample pretreatment technique and chemometric methods, the strategy for quality analysis and difference discovery proposed in this work can be extended to other complex plant-derived samples with good interpretability and high accuracy.
Affiliation(s)
- Zhongda Zeng
- College of Environmental and Chemical Engineering, Dalian University, Dalian 116622, China
- Baohua Zhang
- College of Environmental and Chemical Engineering, Dalian University, Dalian 116622, China
- Yifei Zhan
- College of Environmental and Chemical Engineering, Dalian University, Dalian 116622, China
- Jinfeng Huo
- College of Environmental and Chemical Engineering, Dalian University, Dalian 116622, China
- Yingjiao Shi
- College of Environmental and Chemical Engineering, Dalian University, Dalian 116622, China
- Xianyi Li
- Technology Center of China Tobacco Yunnan Industrial Co. Ltd., Kunming 650231, China
- Wei Zhe
- Technology Center of China Tobacco Yunnan Industrial Co. Ltd., Kunming 650231, China
- Boyan Li
- School of Public Health, Guizhou Medical University, Guiyang 550025, China.
- Yipeng Zhang
- Technology Center of China Tobacco Yunnan Industrial Co. Ltd., Kunming 650231, China.
- Qianxu Yang
- Technology Center of China Tobacco Yunnan Industrial Co. Ltd., Kunming 650231, China.
|
12
|
Monitoring freshness of crayfish (Prokaryophyllus clarkii) through the combination of near-infrared spectroscopy and chemometric method. JOURNAL OF FOOD MEASUREMENT AND CHARACTERIZATION 2022. [DOI: 10.1007/s11694-022-01451-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/26/2022]
|
13
|
Xia J, Zhang J, Xiong Y, Min S. Feature selection of infrared spectra analysis with convolutional neural network. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2022; 266:120361. [PMID: 34601364 DOI: 10.1016/j.saa.2021.120361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2021] [Revised: 08/25/2021] [Accepted: 08/31/2021] [Indexed: 06/13/2023]
Abstract
Data-driven deep learning, especially the convolutional neural network (CNN), has been developed and successfully applied in many domains. However, a CNN is regarded as a black box, and its main drawback is the lack of interpretability. In this study, an interpretable CNN model was presented for infrared data analysis. An ascending stepwise linear regression (ASLR)-based approach was used to extract the informative neurons in the flatten layer of the trained model. The structure of the CNN was then used to visualize the active variables corresponding to the extracted neurons. A partial least squares (PLS) model was built to compare the performance of the extracted features and the model interpretation. On the test sets, the CNN models with extracted features yielded accuracies of 93.27%, 97.50% and 96.65% for the tablet, meat, and juice datasets, while the PLS-DA models with latent variables (LVs) obtained accuracies of 95.19%, 95.50% and 98.17%. Both the CNN and PLS models showed stable patterns in the active variables. The repeatability of the CNN model and the proposed strategies was verified by Monte-Carlo cross-validation.
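The Monte-Carlo cross-validation mentioned in this abstract is, in general terms, a repeated random split of the samples into training and test subsets. The following is a minimal Python sketch of that general idea only; the function name `monte_carlo_splits`, the test fraction, and the seed are illustrative assumptions and are not taken from the paper.

```python
import random


def monte_carlo_splits(n_samples, n_repeats, test_fraction, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits.

    Hypothetical helper: each repeat shuffles all sample indices and holds
    out a fixed fraction as the test subset.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_repeats):
        shuffled = rng.sample(range(n_samples), n_samples)
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])
```

A model would be refit on each training subset and scored on the matching test subset; the spread of those scores indicates how repeatable the result is.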
Affiliation(s)
- Jingjing Xia
- College of Science, China Agricultural University, Beijing 100193, PR China
- Jixiong Zhang
- National Academy of Agriculture Green Development, College of Resources and Environmental Sciences, China Agricultural University, Beijing 100193, PR China.
- Yanmei Xiong
- College of Science, China Agricultural University, Beijing 100193, PR China.
- Shungeng Min
- College of Science, China Agricultural University, Beijing 100193, PR China.
|
14
|
Zhang P, Xu Z, Wang Q, Fan S, Cheng W, Wang H, Wu Y. A novel variable selection method based on combined moving window and intelligent optimization algorithm for variable selection in chemical modeling. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2021; 246:118986. [PMID: 33032116 DOI: 10.1016/j.saa.2020.118986] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/16/2020] [Revised: 08/26/2020] [Accepted: 09/21/2020] [Indexed: 06/11/2023]
Abstract
We propose a new wavelength selection algorithm based on a combined moving window (CMW) and a variable dimension particle swarm optimization (VDPSO) algorithm. CMW retains the advantages of the moving window algorithm, and different windows can overlap each other to realize automatic optimization of spectral interval width and number. VDPSO improves on the PSO algorithm: it can search the data space in different dimensions, which reduces the risk of being limited to local extrema and of over-fitting. Four high-performance variable selection algorithms (BOSS, VCPA, iVISSA and IRF) are compared on three NIR data sets (corn, beer and fuel). The results show that VDPSO-CMW has better performance. The Matlab codes for implementing PSO-CMW and VDPSO-CMW are freely available on the website: https://www.mathworks.com/matlabcentral/fileexchange/75828-a-variable-selection-method.
Affiliation(s)
- Pengfei Zhang
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
- Zhuopin Xu
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China; University of Science and Technology of China, Hefei 230026, China
- Qi Wang
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China.
- Shuang Fan
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China; University of Science and Technology of China, Hefei 230026, China
- Weimin Cheng
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China; University of Science and Technology of China, Hefei 230026, China
- Haiping Wang
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China; University of Science and Technology of China, Hefei 230026, China
- Yuejin Wu
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China.
|
15
|
Sun J, Yang W, Feng M, Liu Q, Kubar MS. An efficient variable selection method based on random frog for the multivariate calibration of NIR spectra. RSC Adv 2020; 10:16245-16253. [PMID: 35498850 PMCID: PMC9052783 DOI: 10.1039/d0ra00922a] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/31/2020] [Accepted: 04/08/2020] [Indexed: 11/29/2022] Open
Abstract
Variable selection is a critical step in spectral modeling. In this study, a new method of variable interval selection based on random frog (RF), known as Interval Selection based on Random Frog (ISRF), is developed. In the ISRF algorithm, RF is used to search for the most likely informative variables, and a local search is then applied to expand the interval width around the informative variables. Through multiple runs and visualization of the results, the best informative interval variables are obtained. This method was tested on three near infrared (NIR) datasets. Four variable selection methods, namely genetic algorithm PLS (GA-PLS), random frog, interval random frog (iRF) and interval variable iterative space shrinkage approach (iVISSA), were used for comparison. The results show that the proposed method is efficient at finding the best interval variables and improves the model's prediction performance and interpretability.
Affiliation(s)
- Jingjing Sun
- College of Agriculture, Shanxi Agricultural University South Min-Xian Road, Taigu Shanxi China
- College of Arts and Science, Shanxi Agricultural University South Min-Xian Road, Taigu Shanxi China
- Wude Yang
- College of Agriculture, Shanxi Agricultural University South Min-Xian Road, Taigu Shanxi China
- Meichen Feng
- College of Agriculture, Shanxi Agricultural University South Min-Xian Road, Taigu Shanxi China
- Qifang Liu
- College of Information Science and Engineering, Shanxi Agricultural University South Min-Xian Road, Taigu Shanxi China
- Muhammad Saleem Kubar
- College of Agriculture, Shanxi Agricultural University South Min-Xian Road, Taigu Shanxi China
|
16
|
Xia Z, Yi T, Liu Y. Rapid and nondestructive determination of sesamin and sesamolin in Chinese sesames by near-infrared spectroscopy coupling with chemometric method. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2020; 228:117777. [PMID: 31727518 DOI: 10.1016/j.saa.2019.117777] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/14/2019] [Revised: 11/06/2019] [Accepted: 11/06/2019] [Indexed: 06/10/2023]
Abstract
Sesame is one of the most important crops in Africa and East Asia. The sesamin and sesamolin in sesame have shown various pharmacological, biological and physiological activities. In this study, a rapid and nondestructive method for the determination of sesamin and sesamolin in Chinese sesame by near-infrared spectroscopy coupled with chemometric methods was proposed. Near-infrared spectra of sesame samples from three different Chinese areas were collected, and partial least squares (PLS) was used to construct the quantitative models. Spectral preprocessing and variable selection methods were adopted to improve the predictability and stability of the models. Reasonable quantitative results were obtained when the samples used for model construction and prediction were harvested in the same year. For sesamin and sesamolin, the correlation coefficients (R) were 0.9754 and 0.9636, and the root mean square errors of prediction (RMSEP) were 151.2951 and 39.7720, respectively. The optimized models were less effective when used to predict samples harvested in other years or countries, although acceptable results could still be obtained.
Affiliation(s)
- Zhenzhen Xia
- Institute of Agricultural Quality Standards and Testing Technology Research, Hubei Academy of Agricultural Science, Wuhan 430064, PR China
- Tian Yi
- Institute of Agricultural Quality Standards and Testing Technology Research, Hubei Academy of Agricultural Science, Wuhan 430064, PR China
- Yan Liu
- College of Food Science and Engineering, Wuhan Polytechnic University, Wuhan 430023, PR China; Key Laboratory for Deep Processing of Major Grain and Oil (Wuhan Polytechnic University), Ministry of Education, College of Food Science and Engineering, Wuhan Polytechnic University, Wuhan 430023, PR China; Hubei Key Laboratory for Processing and Transformation of Agricultural Products (Wuhan Polytechnic University), College of Food Science and Engineering, Wuhan Polytechnic University, Wuhan 430023, PR China.
|
17
|
Discriminating geographic origin of sesame oils and determining lignans by near-infrared spectroscopy combined with chemometric methods. J Food Compost Anal 2019. [DOI: 10.1016/j.jfca.2019.103327] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 02/04/2023]
|
18
|
Ge X, Sun J, Lu B, Chen Q, Xun W, Jin Y. Classification of oolong tea varieties based on hyperspectral imaging technology and BOSS‐LightGBM model. J FOOD PROCESS ENG 2019. [DOI: 10.1111/jfpe.13289] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
Affiliation(s)
- Xiao Ge
- School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, China
- Jun Sun
- School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, China
- Bing Lu
- School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, China
- Quansheng Chen
- School of Food and Biological Engineering, Jiangsu University, Zhenjiang, China
- Wei Xun
- School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, China
- Yanting Jin
- School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, China
|
19
|
Xia J, Zhang J, Zhao Y, Huang Y, Xiong Y, Min S. Fourier transform infrared spectroscopy and chemometrics for the discrimination of paper relic types. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2019; 219:8-14. [PMID: 31030050 DOI: 10.1016/j.saa.2018.09.059] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 05/30/2018] [Revised: 08/27/2018] [Accepted: 09/30/2018] [Indexed: 06/09/2023]
Abstract
The identification of paper relics is an unresolved issue in the field of cultural heritage. Heritage paper is well known to be of significant importance in archaeological research. A variety of research methodologies currently focus on the analysis of inks for dating documents, while the analysis of the paper itself has received little attention. This work explores the non-destructive application of the ATR-FTIR technique to the discrimination of paper relics. Spectra of 15 types of paper were collected by ATR-FTIR over the wavenumber range from 4000 to 650 cm⁻¹. Moving average smoothing and normalization were used for pretreatment. Five classification algorithms, namely principal component analysis-linear discriminant analysis (PCA-LDA), partial least squares discriminant analysis (PLS-DA), soft independent modeling of class analogy (SIMCA), least squares-support vector machine (LS-SVM), and partial least squares-linear discriminant analysis (PLS-LDA), were used to classify the types of paper. PLS-LDA and LS-SVM were effective techniques, with 100% classification accuracy. PCA-LDA, PLS-DA and SIMCA gave accuracies of 98.67%, 97.33% and 95.56%, respectively. The experiment suggests that ATR-FTIR combined with chemometrics will be highly useful for identifying paper in cultural heritage.
Affiliation(s)
- Jingjing Xia
- College of Science, China Agricultural University, Beijing 100193, PR China
- Jixiong Zhang
- College of Science, China Agricultural University, Beijing 100193, PR China
- Yueting Zhao
- College of Science, China Agricultural University, Beijing 100193, PR China
- Yangming Huang
- College of Science, China Agricultural University, Beijing 100193, PR China
- Yanmei Xiong
- College of Science, China Agricultural University, Beijing 100193, PR China.
- Shungeng Min
- College of Science, China Agricultural University, Beijing 100193, PR China.
|
20
|
Xu B, Chung HY. Quantitative Structure-Activity Relationship Study of Bitter Di-, Tri- and Tetrapeptides Using Integrated Descriptors. Molecules 2019; 24:2846. [PMID: 31387305 PMCID: PMC6696392 DOI: 10.3390/molecules24152846] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/26/2019] [Revised: 07/23/2019] [Accepted: 08/05/2019] [Indexed: 11/16/2022] Open
Abstract
New quantitative structure–activity relationship (QSAR) models for bitter peptides were built with integrated amino acid descriptors. Datasets contained 48 dipeptides, 52 tripeptides and 23 tetrapeptides with their reported bitter taste thresholds. Independent variables consisted of 14 amino acid descriptor sets. A bootstrapping soft shrinkage approach was utilized for variable selection. The importance of a variable was evaluated by both variable selecting frequency and standardized regression coefficient. Results indicated model qualities for di-, tri- and tetrapeptides with R2 and Q2 at 0.950 ± 0.002, 0.941 ± 0.001; 0.770 ± 0.006, 0.742 ± 0.004; and 0.972 ± 0.002, 0.956 ± 0.002, respectively. The hydrophobic C-terminal amino acid was the key determinant for bitterness in dipeptides, followed by the contribution of bulky hydrophobic N-terminal amino acids. For tripeptides, hydrophobicity of C-terminal amino acids and the electronic properties of the amino acids at the second position were important. For tetrapeptides, bulky hydrophobic amino acids at N-terminus, hydrophobicity and partial specific volume of amino acids at the second position, and the electronic properties of amino acids of the remaining two positions were critical. In summary, this study not only constructs reliable models for predicting the bitterness in different groups of peptides, but also facilitates better understanding of their structure-bitterness relationships and provides insights for their future studies.
Affiliation(s)
- Biyang Xu
- Food and Nutritional Sciences Programme, School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Hau Yin Chung
- Food and Nutritional Sciences Programme, School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China.
|
21
|
Yun YH, Li HD, Deng BC, Cao DS. An overview of variable selection methods in multivariate analysis of near-infrared spectra. Trends Analyt Chem 2019. [DOI: 10.1016/j.trac.2019.01.018] [Citation(s) in RCA: 118] [Impact Index Per Article: 19.7] [Indexed: 10/27/2022]
|
22
|
Authenticating Raw from Reconstituted Milk Using Fourier Transform Infrared Spectroscopy and Chemometrics. J FOOD QUALITY 2019. [DOI: 10.1155/2019/5487890] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 11/17/2022] Open
Abstract
Fourier transform infrared (FTIR) spectroscopy combined with chemometrics was used to distinguish raw milk from its reconstituted counterpart. First, explanatory principal component analysis (PCA) was employed to visualize the relationship between raw and reconstituted milk samples. However, direct observation of the scores plot showed no clear separation between the two sample classes, indicating that the FTIR spectra may contain complicated chemical information. Second, partial least-squares-discriminant analysis (PLS-DA), which incorporates additional class membership information as modelling input, was applied. The PLS-DA scores yielded clear separation between the two classes of samples. Additionally, possible components were studied from the model loadings, and the PLS-DA model was validated internally under the model population analysis framework, as well as externally using an independent test set. This study gives insights into the authentication of milk using FTIR spectroscopy with chemometric techniques.
|
23
|
Liu Y, Wang Y, Xia Z, Wang Y, Wu Y, Gong Z. Rapid determination of phytosterols by NIRS and chemometric methods. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2019; 211:336-341. [PMID: 30583164 DOI: 10.1016/j.saa.2018.12.030] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 08/15/2018] [Revised: 12/10/2018] [Accepted: 12/16/2018] [Indexed: 06/09/2023]
Abstract
Phytosterols have been extensively studied because they play essential roles in the physiology of plants and can be used as nutritional supplements to promote human health. We used a rapid method that couples near-infrared spectroscopy (NIRS) with chemometric techniques to quickly and efficiently determine three essential phytosterols (β-sitosterol, campesterol and stigmasterol) in vegetable oils. The continuous wavelet transform (CWT) method was adopted to remove the baseline shift in the spectra. The quantitative analysis models were constructed by partial least squares (PLS) regression, and the randomization test (RT) method was used to further improve the models. The optimized models were used to calculate the phytosterol contents in the prediction set in order to evaluate their predictability. The phytosterol contents obtained by the optimized models and by gas chromatography/mass spectrometry (GC/MS) analysis were almost consistent. For the three phytosterols, the root mean square errors of prediction (RMSEP) are 525.7590, 212.2245 and 65.1611, and the ratios of prediction to deviation (RPD) are 4.0060, 4.7195 and 3.5441, respectively. The results demonstrate the feasibility of the proposed method for rapid and non-destructive analysis of phytosterols in edible oils.
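For readers unfamiliar with the two figures of merit quoted here, the sketch below shows how RMSEP and RPD are commonly computed from reference and predicted values. It assumes the widely used definition of RPD as the sample standard deviation of the reference values divided by RMSEP; the abstract does not state which definition the authors used, and the function names are illustrative.

```python
import math
import statistics


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a prediction set."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(y_true, y_pred):
    """RPD, assumed here to be stdev(reference values) / RMSEP."""
    return statistics.stdev(y_true) / rmsep(y_true, y_pred)
```

Under this definition a larger RPD means the prediction error is small relative to the natural spread of the reference values.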
Affiliation(s)
- Yan Liu
- College of Food Science and Engineering, Wuhan Polytechnic University, Wuhan 430023, PR China; Key Laboratory for Deep Processing of Major Grain and Oil (Wuhan Polytechnic University), Ministry of Education, College of Food Science and Engineering, Wuhan Polytechnic University, Wuhan 430023, PR China; Hubei Key Laboratory for Processing and Transformation of Agricultural Products (Wuhan Polytechnic University), College of Food Science and Engineering, Wuhan Polytechnic University, Wuhan 430023, PR China.
- Yixin Wang
- College of Food Science and Engineering, Wuhan Polytechnic University, Wuhan 430023, PR China
- Zhenzhen Xia
- Institute of Agricultural Quality Standards and Testing Technology Research, Hubei Academy of Agricultural Science, Wuhan 430064, PR China
- Yingjie Wang
- College of Food Science and Engineering, Wuhan Polytechnic University, Wuhan 430023, PR China
- Yongning Wu
- College of Food Science and Engineering, Wuhan Polytechnic University, Wuhan 430023, PR China
- Zhiyong Gong
- College of Food Science and Engineering, Wuhan Polytechnic University, Wuhan 430023, PR China; Key Laboratory for Deep Processing of Major Grain and Oil (Wuhan Polytechnic University), Ministry of Education, College of Food Science and Engineering, Wuhan Polytechnic University, Wuhan 430023, PR China; Hubei Key Laboratory for Processing and Transformation of Agricultural Products (Wuhan Polytechnic University), College of Food Science and Engineering, Wuhan Polytechnic University, Wuhan 430023, PR China
|
24
|
Yan H, Song X, Tian K, Gao J, Li Q, Xiong Y, Min S. A modification of the bootstrapping soft shrinkage approach for spectral variable selection in the issue of over-fitting, model accuracy and variable selection credibility. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2019; 210:362-371. [PMID: 30502724 DOI: 10.1016/j.saa.2018.10.034] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 08/13/2018] [Revised: 10/04/2018] [Accepted: 10/20/2018] [Indexed: 06/09/2023]
Abstract
In this study, we proposed a new computational method for variable selection, the stabilized bootstrapping soft shrinkage approach (SBOSS), based on the bootstrapping soft shrinkage approach (BOSS). It enhances the extraction of chemically relevant information from the massive number of variables in overlapped absorption bands. In SBOSS, variables are selected by an index of the stability of the regression coefficients instead of their absolute values. In each loop, weighted bootstrap sampling (WBS) is applied to generate sub-models, with the weights updated by conducting model population analysis (MPA) on the stability of the regression coefficients (RC) of these sub-models. Finally, the subset with the lowest RMSECV is chosen as the optimal variable set. The performance of SBOSS was evaluated on one simulated dataset and three NIR datasets. The results show that, compared with Monte Carlo uninformative variable elimination (MCUVE), genetic algorithm (GA), competitive adaptive reweighted sampling (CARS), stability competitive adaptive reweighted sampling (SCARS) and BOSS, SBOSS selects fewer variables and yields a PLS model with the lowest RMSEP, the fewest latent variables, and the best stability.
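The abstract does not define the stability index. A common choice in related methods such as MC-UVE is the absolute mean of a variable's regression coefficient across sub-models divided by its standard deviation, and the Python sketch below assumes that definition. It is an illustration of the idea, not the authors' formula, and `stability_index` is a hypothetical name.

```python
import statistics


def stability_index(coefficients):
    """Stability of one variable's coefficient across sub-models.

    Assumed definition: |mean| / standard deviation. A coefficient that is
    large and consistent in sign scores high; one that fluctuates around
    zero scores low.
    """
    spread = statistics.stdev(coefficients)
    if spread == 0:
        return float("inf")
    return abs(statistics.mean(coefficients)) / spread
```

With this index, a variable whose coefficients are `[1.0, 1.1, 0.9]` across three sub-models ranks above one with `[1.0, -1.0, 0.5]`, even though their largest absolute values are similar.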
Affiliation(s)
- Hong Yan
- College of Science, China Agricultural University, Beijing 100193, PR China
- Xiangzhong Song
- College of Science, China Agricultural University, Beijing 100193, PR China
- Kuangda Tian
- College of Science, China Agricultural University, Beijing 100193, PR China
- Jingxian Gao
- College of Science, China Agricultural University, Beijing 100193, PR China
- Qianqian Li
- School of Marine Science, China University of Geoscience, Beijing 100083, PR China
- Yanmei Xiong
- College of Science, China Agricultural University, Beijing 100193, PR China.
- Shungeng Min
- College of Science, China Agricultural University, Beijing 100193, PR China.
|
25
|
Li M, He S, Wang J, Liu Z, Xie GH. An NIRS-based assay of chemical composition and biomass digestibility for rapid selection of Jerusalem artichoke clones. BIOTECHNOLOGY FOR BIOFUELS 2018; 11:334. [PMID: 30574187 PMCID: PMC6299672 DOI: 10.1186/s13068-018-1335-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/19/2018] [Accepted: 12/10/2018] [Indexed: 06/09/2023]
Abstract
BACKGROUND High-throughput evaluation of lignocellulosic biomass feedstock quality is the key to the successful commercialization of bioethanol production. Currently, wet chemical methods for the determination of chemical composition and biomass digestibility are expensive and time-consuming, thus hindering comprehensive feedstock quality assessments based on these biomass specifications. To find the ideal bioethanol feedstock, we perform a near-infrared spectroscopic (NIRS) assay to rapidly and comprehensively analyze the chemical composition and biomass digestibility of 59 Jerusalem artichoke (Helianthus tuberosus L., abbreviated JA) clones collected from 24 provinces in six regions of China. RESULTS The distinct geographical distribution of JA accessions generated varied chemical composition as well as related biomass digestibility (after soluble sugars extraction and mild alkali pretreatment). Notably, the soluble sugars, cellulose, hemicellulose, lignin, ash, and released hexoses, pentoses, and total carbohydrates were rapidly and perfectly predicted by partial least squares regression coupled with model population analyses (MPA), which exhibited significantly higher predictive performance than controls. Subsequently, grey relational grade analysis was employed to correlate chemical composition and biomass digestibility with feedstock quality score (FQS), resulting in the assignment of tested JA clones to five feedstock quality grades (FQGs). Ultimately, the FQGs of JA clones were successfully classified using partial least squares-discriminant analysis model coupled with MPA, attaining a significantly higher correct rate of 97.8% in the calibration subset and 91.1% in the validation subset. CONCLUSIONS Based on the diversity of JA clones, the present study has not only rapidly and precisely examined the biomass composition and digestibility with MPA-optimized NIRS models but has also selected the ideal JA clones according to FQS. 
This method provides a new insight into the selection of ideal bioethanol feedstock for high-efficiency bioethanol production.
Affiliation(s)
- Meng Li
- College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193 China
- National Energy R&D Center for Non-food Biomass, China Agricultural University, Beijing, 100193 China
- Siyang He
- College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193 China
- National Energy R&D Center for Non-food Biomass, China Agricultural University, Beijing, 100193 China
- Jun Wang
- College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193 China
- National Energy R&D Center for Non-food Biomass, China Agricultural University, Beijing, 100193 China
- Zuxin Liu
- Chinese Academy of Agricultural Engineering Planning and Design, Beijing, 100125 China
- Guang Hui Xie
- College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193 China
- National Energy R&D Center for Non-food Biomass, China Agricultural University, Beijing, 100193 China
|
26
|
Yuan T, Zhao Y, Zhang J, Wang Y. Application of variable selection in the origin discrimination of Wolfiporia cocos (F.A. Wolf) Ryvarden & Gilb. based on near infrared spectroscopy. Sci Rep 2018; 8:89. [PMID: 29311739 PMCID: PMC5758700 DOI: 10.1038/s41598-017-18458-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 03/24/2017] [Accepted: 12/12/2017] [Indexed: 02/08/2023] Open
Abstract
Dried sclerotium of Wolfiporia cocos (F.A. Wolf) Ryvarden & Gilb. is a traditional Chinese medicine. Its chemical components differ among geographical origins, which makes it difficult to keep therapeutic potency consistent. The identification of the geographical origin of W. cocos is a fundamental prerequisite for its worldwide recognition and acceptance. Four variable selection methods were applied to near infrared (NIR) spectra, and the characteristic variables were screened to establish Fisher function models for identifying the origin of W. cocos from Yunnan, China. Because poriae cutis (fu-ling-pi in Chinese, or FLP) and the inner part (bai-fu-ling in Chinese, or BFL) of the sclerotia of W. cocos differed obviously in the pattern space of principal component analysis (PCA), we established discriminant models for FLP and BFL separately. Variable selection significantly improved the models and simplified them by using only a small part of the variables. The characteristic variables were screened (13 for BFL and 10 for FLP) to build Fisher discriminant function models, and the validation results showed that the models were reliable and effective. Additionally, the characteristic variables were interpreted.
Affiliation(s)
- Tianjun Yuan
- Institute of Medicinal Plants, Yunnan Academy of Agricultural Sciences, Kunming, 650200, China; Yunnan Comtestor CO., LTD., Kunming, 650106, China
- Yanli Zhao
- Institute of Medicinal Plants, Yunnan Academy of Agricultural Sciences, Kunming, 650200, China
- Ji Zhang
- Institute of Medicinal Plants, Yunnan Academy of Agricultural Sciences, Kunming, 650200, China
- Yuanzhong Wang
- Institute of Medicinal Plants, Yunnan Academy of Agricultural Sciences, Kunming, 650200, China.
27
|
Brownfield B, Kalivas JH. Consensus Outlier Detection Using Sum of Ranking Differences of Common and New Outlier Measures Without Tuning Parameter Selections. Anal Chem 2017; 89:5087-5094. [DOI: 10.1021/acs.analchem.7b00637] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Indexed: 11/28/2022]
Affiliation(s)
- Brett Brownfield
- Department of Chemistry, Idaho State University, Pocatello, Idaho 83209, United States
| | - John H. Kalivas
- Department of Chemistry, Idaho State University, Pocatello, Idaho 83209, United States
| |
|
28
|
Deng BC, Yun YH, Cao DS, Yin YL, Wang WT, Lu HM, Luo QY, Liang YZ. A bootstrapping soft shrinkage approach for variable selection in chemical modeling. Anal Chim Acta 2016; 908:63-74. [PMID: 26826688 DOI: 10.1016/j.aca.2016.01.001] [Citation(s) in RCA: 86] [Impact Index Per Article: 9.6] [Received: 08/31/2015] [Revised: 12/14/2015] [Accepted: 01/04/2016] [Indexed: 10/22/2022]
Abstract
In this study, a new variable selection method called bootstrapping soft shrinkage (BOSS) is developed. It is derived from the ideas of weighted bootstrap sampling (WBS) and model population analysis (MPA). The weights of variables are determined from the absolute values of regression coefficients. WBS is applied according to the weights to generate sub-models, and MPA is used to analyze the sub-models and update the weights of the variables. The optimization procedure follows the rule of soft shrinkage, in which less important variables are not eliminated directly but are assigned smaller weights. The algorithm runs iteratively and terminates when the number of variables reaches one. The variable set with the lowest root mean squared error of cross-validation (RMSECV) is selected as optimal. The method was tested on three groups of near infrared (NIR) spectroscopic datasets, i.e. corn, diesel fuel and soy datasets. Three high-performing variable selection methods, i.e. Monte Carlo uninformative variable elimination (MCUVE), competitive adaptive reweighted sampling (CARS) and genetic algorithm partial least squares (GA-PLS), are used for comparison. The results show that BOSS is promising, with improved prediction performance. The Matlab codes for implementing BOSS are freely available on the website: http://www.mathworks.com/matlabcentral/fileexchange/52770-boss.
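The two steps named in the abstract, weighted bootstrap sampling and the weight update from regression coefficients, can be illustrated with a minimal Python sketch. This is not the authors' Matlab code: the function names, the dictionary layout for coefficients, and the per-model normalisation of the coefficients are assumptions made for illustration only.

```python
import random

def weighted_bootstrap_subset(weights, n_draws, rng):
    """Draw variable indices with replacement, with probability proportional
    to their weights, and return the distinct indices in sorted order."""
    drawn = rng.choices(range(len(weights)), weights=weights, k=n_draws)
    return sorted(set(drawn))

def update_weights(submodel_coefs, n_vars):
    """Hypothetical weight update: sum the normalised absolute regression
    coefficients of each variable over the retained sub-models.
    Each element of submodel_coefs maps variable index -> coefficient."""
    new_weights = [0.0] * n_vars
    for coefs in submodel_coefs:
        norm = sum(abs(b) for b in coefs.values()) or 1.0  # avoid division by zero
        for j, b in coefs.items():
            new_weights[j] += abs(b) / norm
    return new_weights

rng = random.Random(0)
# Variables 0 and 3 carry zero weight, so they can never be drawn.
subset = weighted_bootstrap_subset([0.0, 1.0, 1.0, 0.0, 2.0], n_draws=50, rng=rng)
weights = update_weights([{0: 2.0, 1: -2.0}, {1: 1.0}], n_vars=3)
```

Variables with small weights are drawn less often but stay in the pool, which is the "soft shrinkage" behaviour the abstract contrasts with direct elimination.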
Affiliation(s)
- Bai-Chuan Deng
- College of Animal Science, South China Agricultural University, Guangzhou 510642, PR China; School of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China; Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha 410125, PR China
| | - Yong-Huan Yun
- School of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
| | - Dong-Sheng Cao
- School of Pharmaceutical Sciences, Central South University, Changsha 410083, PR China.
| | - Yu-Long Yin
- College of Animal Science, South China Agricultural University, Guangzhou 510642, PR China; Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha 410125, PR China
| | - Wei-Ting Wang
- School of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
| | - Hong-Mei Lu
- School of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
| | - Qian-Yi Luo
- School of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
| | - Yi-Zeng Liang
- School of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China.
| |
|
29
|
Wen M, Deng BC, Cao DS, Yun YH, Yang RH, Lu HM, Liang YZ. The model adaptive space shrinkage (MASS) approach: a new method for simultaneous variable selection and outlier detection based on model population analysis. Analyst 2016; 141:5586-97. [DOI: 10.1039/c6an00764c] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 11/21/2022]
Abstract
Variable selection and outlier detection are important processes in chemical modeling.
Affiliation(s)
- Ming Wen
- School of Pharmaceutical Sciences, Central South University, Changsha 410013, PR China; College of Chemistry and Chemical Engineering
| | - Bai-Chuan Deng
- College of Animal Science, South China Agricultural University, Guangzhou 510642, P.R. China
| | - Dong-Sheng Cao
- School of Pharmaceutical Sciences, Central South University, Changsha 410013, PR China
| | - Yong-Huan Yun
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
| | - Rui-Han Yang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
| | - Hong-Mei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
| | - Yi-Zeng Liang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
| |
|
30
|
Deng BC, Yun YH, Liang YZ, Yi LZ. A novel variable selection approach that iteratively optimizes variable space using weighted binary matrix sampling. Analyst 2014; 139:4836-45. [PMID: 25083512 DOI: 10.1039/c4an00730a] [Citation(s) in RCA: 74] [Impact Index Per Article: 7.4] [Indexed: 01/05/2023]
Abstract
In this study, a new optimization algorithm called the Variable Iterative Space Shrinkage Approach (VISSA) that is based on the idea of model population analysis (MPA) is proposed for variable selection. Unlike most of the existing optimization methods for variable selection, VISSA statistically evaluates the performance of variable space in each step of optimization. Weighted binary matrix sampling (WBMS) is proposed to generate sub-models that span the variable subspace. Two rules are highlighted during the optimization procedure. First, the variable space shrinks in each step. Second, the new variable space outperforms the previous one. The second rule, which is rarely satisfied in most of the existing methods, is the core of the VISSA strategy. Compared with some promising variable selection methods such as competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variable elimination (MCUVE) and iteratively retaining informative variables (IRIV), VISSA showed better prediction ability for the calibration of NIR data. In addition, VISSA is user-friendly; only a few insensitive parameters are needed, and the program terminates automatically without any additional conditions. The Matlab codes for implementing VISSA are freely available on the website: https://sourceforge.net/projects/multivariateanalysis/files/VISSA/.
Affiliation(s)
- Bai-chuan Deng
- Department of Chemistry, University of Bergen, Bergen N-5007, Norway
| | | | | | | |
|
31
|
Deng BC, Yun YH, Liang YZ, Cao DS, Xu QS, Yi LZ, Huang X. A new strategy to prevent over-fitting in partial least squares models based on model population analysis. Anal Chim Acta 2015; 880:32-41. [PMID: 26092335 DOI: 10.1016/j.aca.2015.04.045] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 09/30/2014] [Revised: 04/11/2015] [Accepted: 04/23/2015] [Indexed: 11/28/2022]
Abstract
Partial least squares (PLS) is one of the most widely used methods for chemical modeling. However, like many other methods with tunable parameters, it has a strong tendency to over-fit. Thus, a crucial step in PLS model building is to select the optimal number of latent variables (nLVs). Cross-validation (CV) is the most popular method for PLS model selection because it selects a model from the perspective of prediction ability. However, CV may not yield a clear minimum of prediction error, which makes model selection difficult. To solve this problem, we propose a new strategy for PLS model selection that combines the cross-validated coefficient of determination (Q²cv) and model stability (S). S is defined as the stability of the PLS regression vectors, which is obtained using model population analysis (MPA). The results show that, when a clear maximum of Q²cv is not obtained, S can provide additional information on over-fitting and helps in finding the optimal nLVs. Compared with other regression-vector-based indicators such as the Euclidean 2-norm (B2), the Durbin-Watson statistic (DW) and the jaggedness (J), S is more sensitive to over-fitting. The model selected by our method has both good prediction ability and stability.
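For orientation, the cross-validated coefficient of determination mentioned above is commonly computed from the cross-validated predictions as one minus the ratio of the prediction error sum of squares to the total sum of squares. The sketch below uses that textbook definition; the abstract does not state the exact formula the authors used, so the centering on the overall mean and the function names are assumptions.

```python
import math

def q2_cv(y_true, y_cv_pred):
    """Cross-validated coefficient of determination: 1 - PRESS / TSS,
    where y_cv_pred holds the prediction for each sample made while
    that sample was left out of the calibration set."""
    mean_y = sum(y_true) / len(y_true)
    press = sum((y - p) ** 2 for y, p in zip(y_true, y_cv_pred))
    tss = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - press / tss

def rmsecv(y_true, y_cv_pred):
    """Root mean squared error of cross-validation."""
    press = sum((y - p) ** 2 for y, p in zip(y_true, y_cv_pred))
    return math.sqrt(press / len(y_true))

# Perfect cross-validated predictions give a value of 1;
# predicting the mean for every sample gives a value of 0.
perfect = q2_cv([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = q2_cv([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

In model selection, such a function would be evaluated once per candidate number of latent variables, which is where a flat curve without a clear maximum can appear.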
Affiliation(s)
- Bai-Chuan Deng
- Department of Chemistry, University of Bergen, Bergen N-5007, Norway; School of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
| | - Yong-Huan Yun
- School of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
| | - Yi-Zeng Liang
- School of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China.
| | - Dong-Sheng Cao
- School of Pharmaceutical Sciences, Central South University, Changsha 410083, PR China.
| | - Qing-Song Xu
- School of Mathematics and Statistics, Central South University, Changsha 410083, PR China
| | - Lun-Zhao Yi
- Yunnan Food Safety Research Institute, Kunming University of Science and Technology, Kunming 650500, PR China
| | - Xin Huang
- School of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
| |
|
32
|
Deng BC, Yun YH, Ma P, Lin CC, Ren DB, Liang YZ. A new method for wavelength interval selection that intelligently optimizes the locations, widths and combinations of the intervals. Analyst 2015; 140:1876-85. [PMID: 25665981 DOI: 10.1039/c4an02123a] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Indexed: 11/21/2022]
Abstract
In this study, a new algorithm for wavelength interval selection, known as interval variable iterative space shrinkage approach (iVISSA), is proposed based on the VISSA algorithm. It combines global and local searches to iteratively and intelligently optimize the locations, widths and combinations of the spectral intervals. In the global search procedure, it inherits the merit of soft shrinkage from VISSA to search the locations and combinations of informative wavelengths, whereas in the local search procedure, it utilizes the continuity of spectroscopic data to determine the widths of wavelength intervals. The global and local search procedures are carried out alternately to realize wavelength interval selection. This method was tested using three near infrared (NIR) datasets. Some high-performing wavelength selection methods, such as synergy interval partial least squares (siPLS), moving window partial least squares (MW-PLS), competitive adaptive reweighted sampling (CARS), genetic algorithm PLS (GA-PLS) and interval random frog (iRF), were used for comparison. The results show that the proposed method is very promising, with good results in both prediction capability and stability. The MATLAB codes for implementing iVISSA are freely available on the website: .
Affiliation(s)
- Bai-Chuan Deng
- Department of Chemistry, University of Bergen, Bergen N-5007, Norway
| | | | | | | | | | | |
|
33
|
Yun YH, Wang WT, Deng BC, Lai GB, Liu XB, Ren DB, Liang YZ, Fan W, Xu QS. Using variable combination population analysis for variable selection in multivariate calibration. Anal Chim Acta 2014; 862:14-23. [PMID: 25682424 DOI: 10.1016/j.aca.2014.12.048] [Citation(s) in RCA: 95] [Impact Index Per Article: 8.6] [Received: 05/09/2014] [Revised: 11/11/2014] [Accepted: 12/26/2014] [Indexed: 11/30/2022]
Abstract
Variable (wavelength or feature) selection has become a critical step in the analysis of datasets with a large number of variables and relatively few samples. In this study, a novel variable selection strategy, variable combination population analysis (VCPA), is proposed. The strategy consists of two crucial procedures. First, the exponentially decreasing function (EDF), which follows the simple and effective principle of 'survival of the fittest' from Darwin's theory of natural evolution, is employed to determine the number of variables to keep and to continuously shrink the variable space. Second, in each EDF run, a binary matrix sampling (BMS) strategy, which gives each variable the same chance of being selected and generates different variable combinations, is used to produce a population of subsets from which a population of sub-models is constructed. Model population analysis (MPA) is then employed to find the variable subsets with the lower root mean square error of cross-validation (RMSECV). The frequency with which each variable appears in the best 10% of sub-models is computed. The higher the frequency, the more important the variable. The performance of the proposed procedure was investigated using three real NIR datasets. The results indicate that VCPA is a good variable selection strategy when compared with four high-performing variable selection methods: genetic algorithm-partial least squares (GA-PLS), Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS), competitive adaptive reweighted sampling (CARS) and iteratively retaining informative variables (IRIV). The MATLAB source code of VCPA is available for academic research on the website: http://www.mathworks.com/matlabcentral/fileexchange/authors/498750.
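The two procedures described above can be sketched in a few lines of Python. The binary matrix sampling follows the abstract's description (every variable gets the same number of selections across sub-models). The abstract does not give the EDF formula, so the exponential form below, which decays from the full variable count to a chosen final count, is an assumption, as are all function and parameter names.

```python
import math
import random

def binary_matrix_sampling(n_submodels, n_vars, ratio, rng):
    """Build an n_submodels x n_vars 0/1 matrix in which every column
    (variable) contains the same number of ones, shuffled independently,
    so each variable is selected equally often across sub-models."""
    n_ones = round(n_submodels * ratio)
    columns = []
    for _ in range(n_vars):
        col = [1] * n_ones + [0] * (n_submodels - n_ones)
        rng.shuffle(col)
        columns.append(col)
    # Transpose so that each row describes one sub-model.
    return [list(row) for row in zip(*columns)]

def edf_keep_count(n_vars, run, n_runs, n_final):
    """Assumed exponentially decreasing function: number of variables
    kept after `run` of `n_runs` runs, ending at `n_final` variables."""
    theta = math.log(n_vars / n_final) / n_runs
    return round(n_vars * math.exp(-theta * run))

rng = random.Random(1)
matrix = binary_matrix_sampling(n_submodels=10, n_vars=6, ratio=0.3, rng=rng)
schedule = [edf_keep_count(100, run, 50, 14) for run in range(51)]
```

Each row of `matrix` would define the variable subset of one sub-model; a row may happen to be empty when the ratio is small, so a real implementation would need to handle that case.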
Affiliation(s)
- Yong-Huan Yun
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
| | - Wei-Ting Wang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
| | - Bai-Chuan Deng
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China; Department of Chemistry, University of Bergen, Bergen N-5007, Norway
| | - Guang-Bi Lai
- Heilongjiang University of Chinese Medicine, Heilongjiang, Ha'erbin 150040, PR China
| | - Xin-bo Liu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
| | - Da-Bing Ren
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
| | - Yi-Zeng Liang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China.
| | - Wei Fan
- Joint Lab for Biological Quality and Safety, College of Bioscience and Biotechnology, Hunan Agriculture University, Changsha 410128, PR China.
| | - Qing-Song Xu
- School of Mathematics and Statistics, Central South University, Changsha 410083, PR China
| |
|
34
|
Zheng K, Hu H, Tong P, Du Y. Ensemble Regression Coefficient Analysis for Application to Near-Infrared Spectroscopy. ANAL LETT 2014. [DOI: 10.1080/00032719.2014.900776] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/25/2022]
|
35
|
Yan J, Zhu WW, Kong B, Lu HB, Yun YH, Huang JH, Liang YZ. A Combinational Strategy of Model Disturbance and Outlier Comparison to Define Applicability Domain in Quantitative Structural Activity Relationship. Mol Inform 2014; 33:503-13. [PMID: 27486037 DOI: 10.1002/minf.201300161] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 10/31/2013] [Accepted: 04/16/2014] [Indexed: 01/21/2023]
Abstract
To define an applicability domain for quantitative structure-activity relationship modeling, a combinational strategy of model disturbance and outlier comparison is developed. An indicator named the model disturbance index was defined to estimate the prediction error. Moreover, information on the outliers in the training set was used to filter out unreliable samples in the test set based on "structural similarity". Chromatographic retention index data were used to investigate this approach. A relationship between the model disturbance index and the prediction error was found. In addition, comparing the outlier set with the test set provided further information about which unknown samples deserve more attention. A novel technique based on model population analysis was used to evaluate the validity of the applicability domain. Finally, three commonly used methods, i.e. the leverage, descriptor range-based and model perturbation methods, were compared with the proposed approach.
Affiliation(s)
- Jun Yan
- Research Center of Modernization of Traditional Chinese Medicine, Central South University, Changsha 410083, P. R. China tel: +86 731 8830831; fax: +86 731 8830831
| | - Wei-Wei Zhu
- Department of Chemical and Bioscience, HeChi University, YiZhou 546300, P. R. China
| | - Bo Kong
- Technology Center of China Tobacco Hunan Industrial Co., LTD, Changsha 410014, P. R. China
| | - Hong-Bing Lu
- Technology Center of China Tobacco Hunan Industrial Co., LTD, Changsha 410014, P. R. China
| | - Yong-Huan Yun
- Research Center of Modernization of Traditional Chinese Medicine, Central South University, Changsha 410083, P. R. China tel: +86 731 8830831; fax: +86 731 8830831
| | - Jian-Hua Huang
- Research Center of Modernization of Traditional Chinese Medicine, Central South University, Changsha 410083, P. R. China tel: +86 731 8830831; fax: +86 731 8830831
| | - Yi-Zeng Liang
- Research Center of Modernization of Traditional Chinese Medicine, Central South University, Changsha 410083, P. R. China tel: +86 731 8830831; fax: +86 731 8830831.
| |
|
36
|
Yun YH, Wang WT, Tan ML, Liang YZ, Li HD, Cao DS, Lu HM, Xu QS. A strategy that iteratively retains informative variables for selecting optimal variable subset in multivariate calibration. Anal Chim Acta 2014; 807:36-43. [DOI: 10.1016/j.aca.2013.11.032] [Citation(s) in RCA: 133] [Impact Index Per Article: 12.1] [Received: 08/21/2013] [Revised: 11/13/2013] [Accepted: 11/14/2013] [Indexed: 11/12/2022]
|
37
|
Tang G, Tian K, Song X, Xiong Y, Min S. Comparison of several supervised pattern recognition techniques for detecting additive methamidophos in rotenone preparation by near-infrared spectroscopy. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2013; 121:678-684. [PMID: 24368288 DOI: 10.1016/j.saa.2013.11.104] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/18/2013] [Revised: 11/08/2013] [Accepted: 11/20/2013] [Indexed: 06/03/2023]
Abstract
In this paper, different supervised pattern recognition methods were applied to detect manually added methamidophos in rotenone preparation. The aim was to examine the performance of four supervised pattern recognition techniques: soft independent modeling of class analogy (SIMCA), partial least squares discriminant analysis (PLS-DA), artificial neural networks (ANN), and support vector machine (SVM). The results show that SVM is the most effective technique, with 100.0% classification accuracy, followed by ANN and PLS-DA with accuracies of 97.5% and 93.3%, respectively, while SIMCA yields the poorest result of 85.8%. We hope that these results will help both further chemometric investigations and investigations in the sphere of applied vibrational spectroscopy of sophisticated multicomponent systems. Furthermore, the use of a portable instrument together with the satisfactory classification indicates that illicit additions could be detected on site by near-infrared (NIR) spectroscopy, which is of great value for pesticide quality control.
Affiliation(s)
- Guo Tang
- College of Science, China Agricultural University, Beijing 100193, PR China
| | - Kuangda Tian
- College of Science, China Agricultural University, Beijing 100193, PR China
| | - Xiangzhong Song
- College of Science, China Agricultural University, Beijing 100193, PR China
| | - Yanmei Xiong
- College of Science, China Agricultural University, Beijing 100193, PR China
| | - Shungeng Min
- College of Science, China Agricultural University, Beijing 100193, PR China.
| |
|
38
|
Huang JH, Yan J, Wu QH, Duarte Ferro M, Yi LZ, Lu HM, Xu QS, Liang YZ. Selective of informative metabolites using random forests based on model population analysis. Talanta 2013; 117:549-55. [PMID: 24209380 DOI: 10.1016/j.talanta.2013.07.070] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 05/18/2013] [Revised: 07/22/2013] [Accepted: 07/27/2013] [Indexed: 01/31/2023]
Abstract
One of the main goals of metabolomics studies is to discover informative metabolites or biomarkers, which may be used to diagnose diseases and to elucidate pathology. Sophisticated feature selection approaches are required to extract the information hidden in such complex 'omics' data. In this study, a new and robust selection method that combines random forests (RF) with model population analysis (MPA) is proposed for selecting informative metabolites from three metabolomic datasets. According to their contribution to the classification accuracy, the metabolites were classified into three kinds: informative, non-informative, and interfering metabolites. Based on the proposed method, some informative metabolites were selected for the three datasets; further analyses of these metabolites between healthy and diseased groups were then performed, and a t-test showed that the P values for all the selected metabolites were lower than 0.05. Moreover, the informative metabolites identified by the current method were shown to be correlated with the clinical outcome under investigation. The source codes of MPA-RF in Matlab can be freely downloaded from http://code.google.com/p/my-research-list/downloads/list.
Affiliation(s)
- Jian-Hua Huang
- Research Center of Modernization of Traditional Chinese Medicines, Central South University, Changsha 410083, PR China.
| | | | | | | | | | | | | | | |
|
39
|
Yan J, Huang JH, He M, Lu HB, Yang R, Kong B, Xu QS, Liang YZ. Prediction of retention indices for frequently reported compounds of plant essential oils using multiple linear regression, partial least squares, and support vector machine. J Sep Sci 2013; 36:2464-71. [DOI: 10.1002/jssc.201300254] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 03/10/2013] [Revised: 05/08/2013] [Accepted: 05/11/2013] [Indexed: 11/06/2022]
Affiliation(s)
- Jun Yan
- Research Center of Modernization of Traditional Chinese Medicine; Central South University; Changsha P.R. China
| | - Jian-Hua Huang
- Research Center of Modernization of Traditional Chinese Medicine; Central South University; Changsha P.R. China
| | - Min He
- Research Center of Modernization of Traditional Chinese Medicine; Central South University; Changsha P.R. China
| | - Hong-Bing Lu
- Technology Center of China Tobacco Hunan Industrial Co; Changsha P. R. China
| | - Rui Yang
- Research Center of Modernization of Traditional Chinese Medicine; Central South University; Changsha P.R. China
| | - Bo Kong
- Technology Center of China Tobacco Hunan Industrial Co; Changsha P. R. China
| | - Qing-Song Xu
- School of Mathematical Sciences and Computing Technology; Central South University; Changsha P. R. China
| | - Yi-Zeng Liang
- Research Center of Modernization of Traditional Chinese Medicine; Central South University; Changsha P.R. China
| |
|
40
|
Yun YH, Li HD, Wood LRE, Fan W, Wang JJ, Cao DS, Xu QS, Liang YZ. An efficient method of wavelength interval selection based on random frog for multivariate spectral calibration. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2013; 111:31-6. [PMID: 23602956 DOI: 10.1016/j.saa.2013.03.083] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.0] [Received: 01/23/2013] [Revised: 03/08/2013] [Accepted: 03/16/2013] [Indexed: 05/16/2023]
Abstract
Wavelength selection is a critical step for improving prediction performance on spectral data. Considering that vibrational and rotational spectra consist of continuous spectral bands, we propose a novel method of wavelength interval selection based on random frog, called interval random frog (iRF). To obtain all the possible continuous intervals, the spectra are first divided into intervals by moving a window of fixed width over the whole spectrum. These overlapping intervals are ranked by applying random frog coupled with PLS, and the optimal ones are chosen. The method was applied to two near-infrared spectral datasets and showed higher efficiency in wavelength interval selection than the other methods compared. The source code of iRF can be freely downloaded for academic research at the website: http://code.google.com/p/multivariate-calibration/downloads/list.
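The first step described above, generating every overlapping interval with a fixed-width moving window, is simple enough to sketch. The function name and the zero-based indexing are illustrative choices and not part of the published iRF code; the ranking of the intervals by random frog with PLS is not shown.

```python
def moving_window_intervals(n_wavelengths, width):
    """Return all overlapping index intervals obtained by sliding a window
    of fixed width, one wavelength at a time, over the whole spectrum."""
    if not 1 <= width <= n_wavelengths:
        raise ValueError("width must be between 1 and n_wavelengths")
    return [range(start, start + width)
            for start in range(n_wavelengths - width + 1)]

# A spectrum with 10 wavelength points and a window of width 3
# yields 10 - 3 + 1 = 8 candidate intervals.
intervals = moving_window_intervals(10, 3)
```

Each interval would then be treated as a single candidate unit in the subsequent ranking step.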
Affiliation(s)
- Yong-Huan Yun
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China
| | | | | | | | | | | | | | | |
|
41
|
Classification of Green and Black Teas by PCA and SVM Analysis of Cyclic Voltammetric Signals from Metallic Oxide-Modified Electrode. FOOD ANAL METHOD 2013. [DOI: 10.1007/s12161-013-9649-x] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Indexed: 11/25/2022]
|
42
|
Yun YH, Liang YZ, Xie GX, Li HD, Cao DS, Xu QS. A perspective demonstration on the importance of variable selection in inverse calibration for complex analytical systems. Analyst 2013; 138:6412-21. [DOI: 10.1039/c3an00714f] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Indexed: 11/21/2022]
|
43
|
QSRR Study on Flavor Compounds of Diverse Structures on Different Columns with the Help of New Chemometric Methods. Chromatographia 2012. [DOI: 10.1007/s10337-012-2349-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/26/2022]
|