1. Lee G, Lee K. Feature selection using distributions of orthogonal PLS regression vectors in spectral data. BioData Min 2021; 14:7. [PMID: 33482872 PMCID: PMC7821640 DOI: 10.1186/s13040-021-00240-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/28/2020] [Accepted: 01/10/2021] [Indexed: 12/31/2022]
Abstract
Feature selection, which is important for successful analysis of chemometric data, aims to produce parsimonious and predictive models. Partial least squares (PLS) regression is one of the main methods in chemometrics for analyzing multivariate data with input X and response Y by modeling the covariance structure in the X and Y spaces. Recently, orthogonal projections to latent structures (OPLS) has been widely used in processing multivariate data because OPLS improves the interpretability of PLS models by removing systematic variation in the X space that is not correlated with Y. The purpose of this paper is to present a feature selection method for multivariate data based on orthogonal PLS regression (OPLSR), which combines orthogonal signal correction with PLS. The presented method generates empirical distributions of the effects of features upon Y in OPLSR vectors via permutation tests and examines the significance of the effects of the input features on Y. We show the performance of the proposed method, compared with the false discovery rate method, in a simulation study with a three-layer network structure. To demonstrate this method, we apply it to both real-life NIR spectra data and mass spectrometry data.
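The core step described above (building an empirical null distribution for each feature's effect by permuting Y) can be sketched in Python. This is a simplified illustration, not the authors' OPLSR procedure: it assumes the first PLS1 weight vector (proportional to Xᵀy after centring) stands in for the OPLSR vector and omits the orthogonal signal correction. `pls1_weight` and `permutation_pvalues` are hypothetical names.

```python
import random


def pls1_weight(X, y):
    """First PLS1 weight vector: Xc^T yc (centred data), scaled to unit length."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    w = [sum((X[i][j] - col_means[j]) * (y[i] - y_mean) for i in range(n))
         for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    return [v / norm for v in w]


def permutation_pvalues(X, y, n_perm=200, seed=0):
    """Permutation p-value for each feature's |weight| (assumption: PLS1 stands in for OPLSR)."""
    rng = random.Random(seed)
    observed = [abs(v) for v in pls1_weight(X, y)]
    exceed = [0] * len(observed)
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break the X-Y link while keeping Y's distribution
        for j, v in enumerate(pls1_weight(X, y_perm)):
            if abs(v) >= observed[j]:
                exceed[j] += 1
    # add-one correction keeps every p-value strictly positive
    return [(e + 1) / (n_perm + 1) for e in exceed]
```

A feature whose observed weight is rarely matched by weights computed on permuted responses receives a small p-value; such p-values could then be passed to a multiple-testing procedure, which is not shown.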
Affiliation(s)
- Geonseok Lee
- Industrial Engineering, Hanyang University, Seoul, Korea
- Kichun Lee
- Industrial Engineering, Hanyang University, Seoul, Korea.
2.
Abstract
As Industry 4.0 makes its way into the Chemical Processing Industry (CPI), new challenges emerge that require an adaptation of the Process Analytics toolkit. In particular, two recurring classes of problems arise, motivated by the growing complexity of systems on one hand and by increasing data throughput (i.e., the product of two well-known “V’s” of Big Data: Volume × Velocity) on the other. More specifically, as enabling IT technologies (IoT, smart sensors, etc.) widen the focus of analysis from the unit level to the entire plant or even to the supply chain, relevant dynamics at multiple scales become a common pattern; multiscale methods are therefore needed to avoid an analysis biased towards a single scale, which would forgo the benefits of exploiting the information content at all scales in a balanced way. These same enabling technologies also collect large volumes of data at high sampling rates, creating a flood of digital information that must be handled properly; optimal data aggregation provides an efficient solution to this challenge and has led to the emergence of multi-granularity frameworks. In this article, an overview is presented of multiscale and multi-granularity methods that are likely to play an important role in the future of Process Analytics in several common activities, such as data integration/fusion, de-noising, process monitoring and predictive modelling, among others.
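As a minimal sketch of what aggregating data to several granularities means in practice, the Python below averages a high-rate signal over non-overlapping windows of assumed lengths. Choosing the granularity optimally, which the article discusses, is not attempted here; the window lengths and function names are illustrative assumptions.

```python
def aggregate(signal, window):
    """Mean of each non-overlapping block of `window` samples; an incomplete trailing block is dropped."""
    n_blocks = len(signal) // window
    return [sum(signal[k * window:(k + 1) * window]) / window for k in range(n_blocks)]


def multi_granularity(signal, windows=(1, 2, 4, 8)):
    """The same signal viewed at several (assumed) granularities, keyed by window length."""
    return {w: aggregate(signal, w) for w in windows}
```

For example, `aggregate([1, 2, 3, 4, 5, 6, 7], 2)` keeps three block means and discards the unpaired last sample.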
3. Rato TJ, Reis MS. Building Optimal Multiresolution Soft Sensors for Continuous Processes. Ind Eng Chem Res 2018. [DOI: 10.1021/acs.iecr.7b04623] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 11/30/2022]
Affiliation(s)
- Tiago J. Rato
- CIEPQPF, Department of Chemical Engineering, University of Coimbra, Rua Sílvio Lima, 3030-790, Coimbra, Portugal
- Marco S. Reis
- CIEPQPF, Department of Chemical Engineering, University of Coimbra, Rua Sílvio Lima, 3030-790, Coimbra, Portugal
4. Chemometric Methods for Classification and Feature Selection. Compr Anal Chem 2018. [DOI: 10.1016/bs.coac.2018.08.006] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Indexed: 12/05/2022]
5. Application of Long-Wave Near Infrared Hyperspectral Imaging for Measurement of Soluble Solid Content (SSC) in Pear. Food Anal Methods 2016. [DOI: 10.1007/s12161-016-0498-2] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 10/22/2022]
6. Villa-Vialaneix N, Hernández N, Paris A, Domange C, Priymenko N, Besse P. On Combining Wavelets Expansion and Sparse Linear Models for Regression on Metabolomic Data and Biomarker Selection. Commun Stat Simul Comput 2016. [DOI: 10.1080/03610918.2013.862273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
7. Variable Selection in Visible and Near-Infrared Spectral Analysis for Noninvasive Determination of Soluble Solids Content of ‘Ya’ Pear. Food Anal Methods 2014. [DOI: 10.1007/s12161-014-9832-8] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Indexed: 10/25/2022]
8. Green method based on a flow-batch analyzer system for the simultaneous determination of ciprofloxacin and dexamethasone in pharmaceuticals using a chemometric approach. Talanta 2013; 115:314-22. [DOI: 10.1016/j.talanta.2013.05.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 02/13/2013] [Revised: 04/29/2013] [Accepted: 05/01/2013] [Indexed: 11/23/2022]
9. Cetó X, Céspedes F, del Valle M. Comparison of methods for the processing of voltammetric electronic tongues data. Mikrochim Acta 2013. [DOI: 10.1007/s00604-012-0938-7] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Indexed: 10/27/2022]
10. Alsberg BK, Kell DB, Goodacre R. Variable selection in discriminant partial least-squares analysis. Anal Chem 1998; 70:4126-33. [PMID: 21651249 DOI: 10.1021/ac980506o] [Citation(s) in RCA: 96] [Impact Index Per Article: 8.0] [Indexed: 11/28/2022]
Abstract
Variable selection enhances the understanding and interpretability of multivariate classification models. A new chemometric method based on the selection of the most important variables in discriminant partial least-squares (VS-DPLS) analysis is described. The suggested method is a simple extension of DPLS in which only a small number of elements of the weight vector w are retained for each factor. The optimal number of DPLS factors is determined by cross-validation. The new algorithm is applied to four different high-dimensional spectral data sets with excellent results. Spectral profiles from Fourier transform infrared spectroscopy and pyrolysis mass spectrometry are used. To investigate the uniqueness of the selected variables, an iterative VS-DPLS procedure is performed. At each iteration, the previously selected variables are removed to see whether a new VS-DPLS classification model can be constructed using a different set of variables. In this manner, it is possible to determine regions, rather than individual variables, that are important for a successful classification.
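The sparsification step described above (keeping only a few elements of w per factor) can be sketched for a single response as a PLS1 loop with deflation. This is an illustration under stated assumptions rather than the published VS-DPLS code: the class variable is treated as one numeric column, the k entries of w with the largest magnitude are kept, and the function name is hypothetical. Choosing the number of factors by cross-validation is not shown.

```python
def sparse_pls1_selection(X, y, n_factors, k):
    """Indices of the k weights kept for each PLS1 factor (hypothetical helper)."""
    n, p = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(p)]
    y_mean = sum(y) / n
    X = [[r[j] - means[j] for j in range(p)] for r in X]  # centre X
    y = [v - y_mean for v in y]                            # centre y
    selected = []
    for _ in range(n_factors):
        w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
        keep = sorted(range(p), key=lambda j: abs(w[j]), reverse=True)[:k]
        w = [w[j] if j in keep else 0.0 for j in range(p)]  # zero all but k elements
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        loading = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(t[i] * y[i] for i in range(n)) / tt
        # deflate X and y before extracting the next factor
        X = [[X[i][j] - t[i] * loading[j] for j in range(p)] for i in range(n)]
        y = [y[i] - q * t[i] for i in range(n)]
        selected.append(sorted(keep))
    return selected
```

Re-running the function after deleting the returned columns from X mimics, in spirit, the iterative procedure in the abstract that looks for alternative variable regions.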
Affiliation(s)
- B K Alsberg
- Institute of Biological Sciences, Cledwyn Building, University of Wales, Aberystwyth, Ceredigion, SY23 3DD, United Kingdom
11. Tom JA, Sinsheimer JS, Suchard MA. Does history repeat itself? Wavelets and the phylodynamics of influenza A. Mol Biol Evol 2011; 29:1367-77. [PMID: 22160768 DOI: 10.1093/molbev/msr305] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/01/2023]
Abstract
Unprecedented global surveillance of viruses will result in massive sequence data sets that require new statistical methods. These data sets press the limits of Bayesian phylogenetics as the high-dimensional parameters that comprise a phylogenetic tree increase the already sizable computational burden of these techniques. This burden often results in partitioning the data set, for example, by gene, and inferring the evolutionary dynamics of each partition independently, a compromise that results in stratified analyses that depend only on data within a given partition. However, parameter estimates inferred from these stratified models are likely strongly correlated, considering they rely on data from a single data set. To overcome this shortfall, we exploit the existing Monte Carlo realizations from stratified Bayesian analyses to efficiently estimate a nonparametric hierarchical wavelet-based model and learn about the time-varying parameters of effective population size that reflect levels of genetic diversity across all partitions simultaneously. Our methods are applied to complete genome influenza A sequences that span 13 years. We find that broad peaks and trends, as opposed to seasonal spikes, in the effective population size history distinguish individual segments from the complete genome. We also address hypotheses regarding intersegment dynamics within a formal statistical framework that accounts for correlation between segment-specific parameters.
Affiliation(s)
- Jennifer A Tom
- Department of Biostatistics, School of Public Health, University of California, Los Angeles, CA, USA.
12. Comparing different means of signal treatment for improving the detection power in HPLC-ICP-MS. Anal Bioanal Chem 2011; 403:1109-16. [DOI: 10.1007/s00216-011-5571-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/15/2011] [Revised: 11/09/2011] [Accepted: 11/09/2011] [Indexed: 10/15/2022]
13. Ensemble wavelet modelling for determination of wheat and gasoline properties by near and middle infrared spectroscopy. Anal Chim Acta 2010; 682:37-47. [DOI: 10.1016/j.aca.2010.09.039] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 12/17/2009] [Revised: 09/25/2010] [Accepted: 09/27/2010] [Indexed: 11/18/2022]
14. Acebal CC, Grünhut M, Lista AG, Fernández Band BS. Successive projections algorithm applied to spectral data for the simultaneous determination of flavour enhancers. Talanta 2010; 82:222-6. [DOI: 10.1016/j.talanta.2010.04.024] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 03/12/2010] [Revised: 04/12/2010] [Accepted: 04/14/2010] [Indexed: 11/27/2022]
15. Wongravee K, Heinrich N, Holmboe M, Schaefer ML, Reed RR, Trevejo J, Brereton RG. Variable selection using iterative reformulation of training set models for discrimination of samples: application to gas chromatography/mass spectrometry of mouse urinary metabolites. Anal Chem 2009; 81:5204-17. [PMID: 19507882 DOI: 10.1021/ac900251c] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 11/29/2022]
Abstract
The paper discusses variable selection as used in large metabolomic studies, exemplified by gas chromatography of urine from 441 mice in three experiments designed to detect the influence of age, diet, and stress on their chemosignal. Partial least squares discriminant analysis (PLS-DA) was applied to obtain class models, using a procedure of 20,000 iterations that includes the bootstrap for model optimization and random splits into test and training sets for validation. Variables are selected using PLS regression coefficients on the training set, with an optimized number of components obtained from the bootstrap. The variables are ranked in order of significance, and the overall optimal variables are selected as those that appear as highly significant over 100 different test and training set splits. A cost/benefit analysis of building the model on a reduced number of variables is also illustrated. This paper provides a strategy for properly validated methods to determine which variables are most significant for discriminating between two groups in large metabolomic data sets. The strategy avoids the common pitfall of overfitting that arises when variables are selected on a combined training and test set, and it takes into account that different variables may be selected each time the samples are split into training and test sets using iterative procedures.
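The idea of ranking variables on many random training splits and keeping those that recur can be sketched as below. It is a deliberate simplification: each variable is scored by its absolute training-set covariance with a 0/1 class label, standing in for the PLS-DA regression coefficients with bootstrap-optimized components used in the paper, and the function name and defaults are hypothetical.

```python
import random


def selection_frequency(X, labels, n_splits=100, train_frac=0.7, top_k=1, seed=0):
    """Fraction of random training splits in which each variable ranks in the top_k."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    n_train = int(train_frac * n)
    counts = [0] * p
    idx = list(range(n))
    for _ in range(n_splits):
        rng.shuffle(idx)
        train = idx[:n_train]  # remaining rows would serve as the test set
        y_mean = sum(labels[i] for i in train) / n_train
        scores = []
        for j in range(p):
            x_mean = sum(X[i][j] for i in train) / n_train
            # assumption: |covariance with the label| replaces PLS-DA coefficients
            cov = sum((X[i][j] - x_mean) * (labels[i] - y_mean) for i in train)
            scores.append(abs(cov))
        for j in sorted(range(p), key=lambda j: scores[j], reverse=True)[:top_k]:
            counts[j] += 1
    return [c / n_splits for c in counts]
```

Variables are selected only from training rows in each split, which is the point the abstract stresses: selecting on training and test rows together would leak information and overstate performance.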
Affiliation(s)
- Kanet Wongravee
- Centre for Chemometrics, School of Chemistry, University of Bristol, Cantocks Close, Bristol BS8 1TS, UK
16. Application of successive projections algorithm for variable selection to determine organic acids of plum vinegar. Food Chem 2009. [DOI: 10.1016/j.foodchem.2009.01.073] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.7] [Indexed: 11/18/2022]
17. Tan F, Feng X, Li M, Wang Z, Yang L, Li Y, Feng Y, Nie F. Construction and application of a novel library: Fourier transform infrared wavelet coefficients library. Anal Chim Acta 2008; 629:38-46. [DOI: 10.1016/j.aca.2008.09.017] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/03/2008] [Revised: 09/03/2008] [Accepted: 09/07/2008] [Indexed: 10/21/2022]
18. Reis MS, Saraiva PM, Bakshi BR. Multiscale statistical process control using wavelet packets. AIChE J 2008. [DOI: 10.1002/aic.11523] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Indexed: 11/08/2022]
19. Chu YH, Kim D, Han C, Yoon ES. Two-Stage Variable Selection Using the Wavelet Transform of Batch Trajectories for Data Interpretation and Construction of Parsimonious Quality-Estimation Models. Ind Eng Chem Res 2007. [DOI: 10.1021/ie0614475] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]
Affiliation(s)
- Young-Hwan Chu
- Samsung Petrochemical, Bugok-Dong 500, Nam-ku, Ulsan 680-110, Korea, and School of Chemical and Biological Engineering, Institute of Chemical Processes, Seoul National University, San 56-1, Shillim-dong, Kwanak-gu, Seoul 151-742, Korea
- Daeyoun Kim
- Samsung Petrochemical, Bugok-Dong 500, Nam-ku, Ulsan 680-110, Korea, and School of Chemical and Biological Engineering, Institute of Chemical Processes, Seoul National University, San 56-1, Shillim-dong, Kwanak-gu, Seoul 151-742, Korea
- Chonghun Han
- Samsung Petrochemical, Bugok-Dong 500, Nam-ku, Ulsan 680-110, Korea, and School of Chemical and Biological Engineering, Institute of Chemical Processes, Seoul National University, San 56-1, Shillim-dong, Kwanak-gu, Seoul 151-742, Korea
- En-Sup Yoon
- Samsung Petrochemical, Bugok-Dong 500, Nam-ku, Ulsan 680-110, Korea, and School of Chemical and Biological Engineering, Institute of Chemical Processes, Seoul National University, San 56-1, Shillim-dong, Kwanak-gu, Seoul 151-742, Korea
20. Rao R, Lakshminarayanan S. Variable interaction network based variable selection for multivariate calibration. Anal Chim Acta 2007; 599:24-35. [PMID: 17765060 DOI: 10.1016/j.aca.2007.08.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 05/30/2007] [Revised: 07/25/2007] [Accepted: 08/01/2007] [Indexed: 11/29/2022]
Abstract
Multivariate calibration problems often involve identifying a meaningful subset of variables, from a vast number of candidates, for better prediction of output variables. A new graph-theoretic method based on partial correlations (variable interaction network, VIN) is proposed. Many well-studied representative calibration datasets spanning different application domains are selected to investigate its performance. Partial least squares (PLS) regression models combined with variable selection techniques are employed as benchmarks. Variable subsets of different sizes are retained for the final analysis after VIN selection, and progressive prediction accuracies are used for comparison. VIN-PLS results show significant improvement in prediction efficiency and variable subset optimization. An improvement of up to 45% over existing methods, with significantly fewer variables, is achieved using the new method. The advantages of VIN-based variable selection are highlighted.
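VIN is built from partial correlations, which measure the association between two variables after removing the linear effect of others. The Python below is a minimal sketch of a first-order partial correlation (conditioning on a single variable z) using the standard formula r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)). The full VIN method, which conditions on other variables and derives a network from the result, is not reproduced, and the function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

Two variables that both track a common driver z can be strongly correlated (here about 0.79 in the test data) yet have a small partial correlation once z is accounted for, which is why a partial-correlation network can avoid selecting redundant variables.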
Affiliation(s)
- Raghuraj Rao
- Department of Chemical and Biomolecular Engineering, National University of Singapore, Singapore 117576
21. Sousa AC, Lucio MMLM, Bezerra Neto OF, Marcone GPS, Pereira AFC, Dantas EO, Fragoso WD, Araujo MCU, Galvão RKH. A method for determination of COD in a domestic wastewater treatment plant by using near-infrared reflectance spectrometry of seston. Anal Chim Acta 2007; 588:231-6. [PMID: 17386815 DOI: 10.1016/j.aca.2007.02.022] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Received: 10/23/2006] [Revised: 02/07/2007] [Accepted: 02/13/2007] [Indexed: 10/23/2022]
Abstract
This paper proposes a method for determination of chemical oxygen demand (COD) in domestic wastewater. The proposed method is based on near-infrared reflectance (NIRR) measurements of seston collected from wastewater samples by filtration. The analysis does not require any special reagent, catalyst or solvent. Inherent baseline and noise features present in NIRR spectra are removed by a Savitzky-Golay derivative procedure followed by wavelet denoising. The resulting wavelet approximation coefficients are used for partial-least-squares modelling and subsequent prediction of COD values in new samples. The model is calibrated by using COD values obtained according to the American Public Health Association (APHA) reference method. The proposed method is applied to effluent samples from the anaerobic ponds of the Mangabeira municipal wastewater treatment plant in the city of João Pessoa (Paraíba, Brazil). By comparing the NIRR prediction results with the APHA reference values, a root-mean-square error of prediction (RMSEP) of 19 mg O₂ L⁻¹ and a correlation of 0.97 were obtained. Such results are deemed adequate in view of the joint estimate of the standard error of the reference method, which was calculated as 21 mg O₂ L⁻¹.
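The first two pre-processing stages can be sketched in Python as a 5-point Savitzky-Golay first derivative (quadratic fit, coefficients (−2, −1, 0, 1, 2)/10) followed by Haar approximation coefficients. The window length, the Haar wavelet and the number of levels are assumptions, and the thresholding that performs the actual wavelet denoising is omitted.

```python
import math

# 5-point Savitzky-Golay first-derivative weights (quadratic fit, unit spacing); divide by 10.
SG5_D1 = [-2, -1, 0, 1, 2]


def sg_first_derivative(x):
    """Savitzky-Golay first derivative; the two points at each end are dropped."""
    return [sum(c * x[i + k - 2] for k, c in enumerate(SG5_D1)) / 10
            for i in range(2, len(x) - 2)]


def haar_approximation(x, levels=1):
    """Orthonormal Haar approximation coefficients after `levels` halvings (odd tail dropped)."""
    a = list(x)
    for _ in range(levels):
        a = [(a[2 * k] + a[2 * k + 1]) / math.sqrt(2) for k in range(len(a) // 2)]
    return a
```

A spectrum riding on a linear baseline such as 2i + 5 becomes the constant 2 after the derivative, so additive offsets disappear before the approximation coefficients are passed to PLS.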
Affiliation(s)
- Antonio C Sousa
- Departamento de Química, Universidade Federal da Paraíba, João Pessoa, PB, Brazil
22. Reis MS, Saraiva PM. Generalized Multiresolution Decomposition Frameworks for the Analysis of Industrial Data with Uncertainty and Missing Values. Ind Eng Chem Res 2006. [DOI: 10.1021/ie051313b] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Affiliation(s)
- Marco S. Reis
- GEPSI-PSE Group, Department of Chemical Engineering, University of Coimbra, Pólo II−Rua Sílvio Lima, 3030-790 Coimbra, Portugal
- Pedro M. Saraiva
- GEPSI-PSE Group, Department of Chemical Engineering, University of Coimbra, Pólo II−Rua Sílvio Lima, 3030-790 Coimbra, Portugal
23. Westra S, Sharma A. Dominant modes of interannual variability in Australian rainfall analyzed using wavelets. J Geophys Res 2006. [DOI: 10.1029/2005jd005996] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022]
24. de Medeiros VM, Ugulino Araújo MC, Harrop Galvão RK, da Silva EC, Bezerra Saldanha TC, Salata Toscano IA, Ribeiro de Oliveira MDS, Barbosa Freitas SK, Neto MM. Screening analysis of river seston downstream of an effluent discharge point using near-infrared reflectance spectrometry and wavelet-based spectral region selection. Water Res 2005; 39:3089-97. [PMID: 15998532 DOI: 10.1016/j.watres.2005.05.018] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 05/12/2004] [Revised: 05/16/2005] [Accepted: 05/18/2005] [Indexed: 05/03/2023]
Abstract
A methodology for screening analysis of river seston downstream of an industrial effluent discharge using near-infrared reflectance spectrometry was developed. A wavelet transform (WT)-based strategy is used to select a spectral region in which the effect of the effluent on the optical properties of the seston is most evident. The methodology was applied to samples from the River Mumbaba in northeast Brazil. Four sites were monitored: two upstream (1 and 2), one at the discharge point of the effluent (3), and another downstream (4). Soft Independent Modelling of Class Analogies (SIMCA) models were built for site 1 and then applied to classify samples from sites 2 and 4. The results reveal that the WT-based spectral region selection is essential to ensure good sensitivity and specificity with respect to the detection of events associated with the effluent discharges at site 3. In fact, when the full spectrum is employed, the changes at site 4 caused by the effluent are masked by other environmental factors.
Affiliation(s)
- Vânia Maria de Medeiros
- Universidade Federal da Paraíba, CCEN, Departamento de Química, Caixa Postal 5093, 58051-970-João Pessoa, PB, Brazil
25. Liu Y, Brown SD. Wavelet multiscale regression from the perspective of data fusion: new conceptual approaches. Anal Bioanal Chem 2004; 380:445-52. [PMID: 15448968 DOI: 10.1007/s00216-004-2776-x] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.1] [Received: 02/26/2004] [Revised: 05/28/2004] [Accepted: 07/20/2004] [Indexed: 11/28/2022]
Abstract
Wavelet regression is a very promising technique for modern multivariate calibration and calibration transfer. Multiscale analysis of wavelet scales provides a connection between wavelet regression and data fusion. In this paper, current wavelet regression methods are reviewed from the novel perspective of data fusion. Illustrated by analysis of a public domain near-infrared dataset, the advantages and drawbacks of these methods are examined. For wavelet regression, the non-uniformity of the wavelet components, the multiscale nature of the signal, and the prevention of information leakage are crucial issues that will be addressed.
Affiliation(s)
- Yang Liu
- Department of Chemistry and Biochemistry, University of Delaware, Newark, DE 19716, USA
26. Leger MN, Wentzell PD. Maximum likelihood principal components regression on wavelet-compressed data. Appl Spectrosc 2004; 58:855-862. [PMID: 15282053 DOI: 10.1366/0003702041389382] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/24/2023]
Abstract
Maximum likelihood principal component regression (MLPCR) is an errors-in-variables method used to accommodate measurement error information when building multivariate calibration models. A hindrance of MLPCR has been the substantial demand on computational resources sometimes made by the algorithm, especially for certain types of error structures. Operations on the large error covariance matrices involved are memory intensive and time consuming, especially when techniques such as cross-validation are used. This work describes the use of wavelet transforms (WT) as a data compression method for MLPCR. It is shown that the error covariance matrices in the wavelet and spectral domains are related through a two-dimensional WT. This allows the user to account for any effects of the wavelet transform on spectral and error structures. The wavelet transform can be applied to MLPCR when using either the full error covariance matrix or the smaller pooled error covariance matrix. Simulated and experimental near-infrared data sets are used to demonstrate the benefits of using wavelets with the MLPCR algorithm. In all cases, significant compression can be obtained while maintaining favorable predictive ability. Considerable time savings were also attained, with improvements ranging from a factor of 2 to a factor of 720. Using the WT-compressed data in MLPCR gave a reduction in prediction errors compared with using the raw data in MLPCR. An analogous reduction in prediction errors was not always seen when using PCR.
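The relation between the two covariance matrices can be sketched as Σ_w = W Σ Wᵀ, where W is an orthonormal wavelet matrix; applying W to both the rows and the columns of Σ is what makes it a two-dimensional transform. The Python below assumes a one-level Haar W and an even spectrum length; the function names are hypothetical.

```python
import math


def haar_matrix(n):
    """Orthonormal one-level Haar analysis matrix for even n: approximation rows, then detail rows."""
    s = 1 / math.sqrt(2)
    W = [[0.0] * n for _ in range(n)]
    for k in range(n // 2):
        W[k][2 * k] = W[k][2 * k + 1] = s
        W[n // 2 + k][2 * k] = s
        W[n // 2 + k][2 * k + 1] = -s
    return W


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def wavelet_domain_covariance(sigma):
    """Map a spectral-domain error covariance into the wavelet domain: W Sigma W^T."""
    W = haar_matrix(len(sigma))
    return matmul(matmul(W, sigma), transpose(W))
```

Independent, equal-variance noise stays independent and equal-variance in the wavelet domain because W is orthonormal, while correlated noise such as [[1, 0.5], [0.5, 1]] maps to variance 1.5 in the approximation coefficient and 0.5 in the detail coefficient.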
Affiliation(s)
- Marc N Leger
- Trace Analysis Research Centre, Department of Chemistry, Dalhousie University, Halifax, Nova Scotia B3H 4J3, Canada
27. Abdollahi H, Bagheri L. Simultaneous Spectrophotometric Determination of p-Benzoquinone and Chloranil after Microcrystalline Naphthalene Extraction by Using Genetic Algorithm-Based Wavelength Selection-Partial Least Squares Regression. Anal Sci 2004; 20:1701-6. [PMID: 15636519 DOI: 10.2116/analsci.20.1701] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
Abstract
Microcrystalline naphthalene extraction has been used for the preconcentration of p-benzoquinone and tetrachloro-p-benzoquinone (chloranil) after their reaction with aniline, followed by simultaneous spectrophotometric analysis with genetic algorithm-partial least squares (GA-PLS) calibration. The chemical variables affecting the analytical performance of the methodology were studied and optimized. Under the optimum conditions, i.e., [aniline] = 0.05 M and [naphthalene] = 2.2% (w/v), preconcentration of 25 mL of sample solution permitted the detection of 0.32 and 0.23 µg mL⁻¹ for p-benzoquinone and chloranil, respectively. The predictive abilities of partial least squares regression (PLS) and GA-PLS were examined for the simultaneous determination of the two quinones. GA-PLS outperforms the other PLS methods because of the genetic-algorithm wavelength selection in the PLS calibration, which causes no loss of prediction capacity and provides useful information about the chemical system.
Affiliation(s)
- Hamid Abdollahi
- Department of Chemistry, Institute for Advanced Studies in Basic Sciences, Zanjan 45195-159, Iran.
28. Martins VL, de Almeida LF, de Castro SL, Galvão RKH, de Araújo MCU, da Silva EC. A Multiscale Wavelet Data Treatment for Reliable Localization of Inflection Points for Analytical Purposes. J Chem Inf Comput Sci 2003; 43:1725-32. [PMID: 14632417 DOI: 10.1021/ci034112w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
Abstract
Instrumental analysis techniques that employ measurements based on inflection points may have their accuracy compromised due to the need for signal differentiation, which is very sensitive to instrumental noise. This paper presents a strategy for localizing inflection points that exploits the multiscale processing capability of the Wavelet Transform and avoids the need for explicit signal differentiation. The strategy is illustrated in simulated examples and also in a real analytical problem involving the determination of Pb and Cd by potentiometric stripping analysis. In this application, the results were in good agreement with the expected values and were slightly better than those obtained from the first derivative of the curves after smoothing by a Windowed Fourier Transform.
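One way to see how a wavelet can locate an inflection point without explicit differentiation: a Haar-shaped filter (sum of the next s samples minus sum of the previous s samples) acts as a smoothed first derivative at scale s, so its largest response marks the steepest point of a rising curve. The abstract does not specify the wavelet or how scales are chosen, so this Python is a sketch under those assumptions, with hypothetical function names.

```python
def haar_response(x, scale):
    """Sum of the `scale` samples from i onward minus the sum of the `scale` samples before i."""
    return {i: sum(x[i:i + scale]) - sum(x[i - scale:i])
            for i in range(scale, len(x) - scale + 1)}


def locate_inflection(x, scale):
    """Index of the largest Haar response, i.e. the steepest rise at this scale."""
    r = haar_response(x, scale)
    return max(r, key=r.get)
```

For a falling curve the minimum response would be used instead. Repeating the search at several scales and checking that the location stays put is one way to use multiscale information; averaging over s samples is what makes the estimate less sensitive to noise than point-wise differencing.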
Affiliation(s)
- Valdomiro Lacerda Martins
- Departamento de Química Fundamental, Universidade Federal de Pernambuco, CCEN, CEP 50740-901 - Recife PE, Brazil
29. Tan H, Brown SD. Multivariate calibration of spectral data using dual-domain regression analysis. Anal Chim Acta 2003. [DOI: 10.1016/s0003-2670(03)00351-9] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Indexed: 10/27/2022]
30. Gributs CEW, Burns DH. Haar transform analysis of photon time-of-flight measurements for quantification of optical properties in scattering media. Appl Opt 2003; 42:2923-2930. [PMID: 12790441 DOI: 10.1364/ao.42.002923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
A method to independently quantify the absorption and scattering properties of samples, based on the analysis of the Haar transform (HT) of photon time-of-flight (TOF) distributions, is described. A series of reflectance photon TOF measurements was acquired from absorbing/scattering milk samples of known composition (0 < μa < 0.025 mm⁻¹; 100 < μs < 250 mm⁻¹). The HT of the profiles was calculated, and the regression based on the most parsimonious subset of wavelets was determined by a genetic algorithm (GA). In addition, the utility of taking the logarithm of the profiles or of the absolute value of the wavelet coefficients before the GA was studied. Results show that the absorption coefficient could be estimated with a coefficient of variation (C.V.) of 6.7% and an r² of 0.99 by use of the log of selected wavelets of frequency less than 800 MHz. Scattering coefficients were estimated with a C.V. of 2.3% and an r² of 0.99 with the log of wavelets of frequency less than 400 MHz. These results suggest that a simplified instrument based on low-frequency switches could be developed to quantify the optical properties of highly scattering media.
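A full orthonormal Haar decomposition of a profile, together with the logarithm of the absolute coefficients (one of the variants examined), can be sketched as follows. The power-of-two length requirement and the small `eps` guard against log(0) are assumptions, the function names are hypothetical, and the genetic-algorithm selection of coefficients is not shown.

```python
import math


def haar_dwt(x):
    """Full orthonormal Haar decomposition of a length-2**J signal.

    Returns [a_J, d_J, ..., d_1]: the final approximation, then detail
    coefficients from the coarsest level to the finest.
    """
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    s = 1 / math.sqrt(2)
    approx = list(x)
    details = []
    while len(approx) > 1:
        pairs = range(len(approx) // 2)
        details.insert(0, [(approx[2 * k] - approx[2 * k + 1]) * s for k in pairs])
        approx = [(approx[2 * k] + approx[2 * k + 1]) * s for k in pairs]
    return approx + [c for level in details for c in level]


def log_abs(coeffs, eps=1e-12):
    """Log of absolute coefficients; eps (an assumption) avoids log(0)."""
    return [math.log(abs(c) + eps) for c in coeffs]
```

Because the transform is orthonormal, the sum of squared coefficients equals the energy of the original profile, and the coarse (low-frequency) coefficients sit at the start of the returned list.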
Affiliation(s)
- Claudia E W Gributs
- Department of Chemistry, McGill University, 801 Sherbrooke Street West, Montréal, Quebec, Canada H3A 2K6
31. Coelho CJ, Galvão RKH, de Araújo MCU, Pimentel MF, da Silva EC. A linear semi-infinite programming strategy for constructing optimal wavelet transforms in multivariate calibration problems. J Chem Inf Comput Sci 2003; 43:928-33. [PMID: 12767151 DOI: 10.1021/ci025657d] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Abstract
A novel strategy for optimizing wavelet transforms with respect to the statistics of the data set in multivariate calibration problems is proposed. The optimization follows a linear semi-infinite programming formulation, which does not suffer from local maxima and can be solved reproducibly with modest computational effort. After the optimization, a variable selection algorithm is employed to choose a subset of wavelet coefficients with minimal collinearity. The selection allows a calibration model to be built by direct multiple linear regression on the wavelet coefficients. In an illustrative application involving the simultaneous determination of Mn, Mo, Cr, Ni, and Fe in steel samples by ICP-AES, the proposed strategy yielded more accurate predictions than PCR, PLS, and non-optimized wavelet regression.
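The abstract does not name its minimal-collinearity selection algorithm. The successive projections algorithm (SPA), which appears in several titles in this list, is one such method; the Python below is a minimal sketch of it, offered as an illustration rather than the authors' procedure. Starting from a chosen column, each step projects the remaining columns onto the orthogonal complement of the last selected one and picks the column with the largest remaining norm. The function name is hypothetical.

```python
def successive_projections(X, k0, m):
    """Select up to m column indices of X (rows = samples), starting from column k0."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    selected = [k0]
    for _ in range(m - 1):
        last = cols[selected[-1]]
        ll = sum(v * v for v in last)
        if ll < 1e-12:  # remaining columns are already fully explained
            break
        for j in range(p):
            if j in selected:
                continue
            # remove the component of column j along the last selected column
            coef = sum(a * b for a, b in zip(cols[j], last)) / ll
            cols[j] = [a - coef * b for a, b in zip(cols[j], last)]
        remaining = [j for j in range(p) if j not in selected]
        if not remaining:
            break
        # the column least explained by the current selection is the least collinear
        selected.append(max(remaining, key=lambda j: sum(v * v for v in cols[j])))
    return selected
```

A column that is almost a copy of an already selected one has a tiny residual norm after projection and is skipped, which is how SPA keeps the subset well conditioned for multiple linear regression.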
Affiliation(s)
- Clarimar José Coelho
- Universidade Federal da Paraíba, Depto de Química, Caixa Postal 5093, 58051-970, João Pessoa, PB, Brazil
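The abstract above does not name the variable selection algorithm used to pick minimally collinear wavelet coefficients. One well-known procedure of this kind is the successive projections algorithm (SPA, see entry 34), sketched below purely as an illustration under that assumption: starting from a chosen column, each step projects the remaining candidate columns onto the orthogonal complement of the last selected one and keeps the column with the largest residual norm. The function name `successive_projections` and its parameters are hypothetical, and the semi-infinite programming optimization of the wavelet itself is not shown.

```python
def successive_projections(columns, start, n_select):
    """Greedy SPA-style selection of minimally collinear columns.

    columns: list of column vectors (lists of floats), one per candidate
             variable, e.g. one per wavelet coefficient across samples.
    start: index of the initial column.
    n_select: total number of columns to select.
    """
    if not 1 <= n_select <= len(columns):
        raise ValueError("n_select must be between 1 and the number of columns")
    selected = [start]
    # Working copies; unselected columns are progressively projected.
    work = [list(c) for c in columns]
    for _ in range(n_select - 1):
        ref = work[selected[-1]]  # projected version of the last selection
        ref_sq = sum(v * v for v in ref)
        best, best_norm = None, -1.0
        for j, col in enumerate(work):
            if j in selected:
                continue
            if ref_sq > 0:
                # Remove the component of col along the last selected column.
                coef = sum(a * b for a, b in zip(col, ref)) / ref_sq
                col[:] = [a - coef * b for a, b in zip(col, ref)]
            norm = sum(v * v for v in col)
            if norm > best_norm:
                best, best_norm = j, norm
        selected.append(best)
    return selected
```

For example, `successive_projections([[1, 0, 0], [1, 0.01, 0], [0, 1, 0], [0, 0, 1]], 0, 3)` returns `[0, 2, 3]`: the second column, nearly collinear with the first, is skipped. The selected coefficients could then be used for direct multiple linear regression, as the abstract describes.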
32
Galvão RKH, Hadjiloucas S, Bowen JW. Use of the statistical properties of the wavelet-transform coefficients for optimization of integration time in Fourier transform spectrometry. Opt Lett 2002; 27:643-5. [PMID: 18007889 DOI: 10.1364/ol.27.000643] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/25/2023]
Abstract
We show that an analysis of the mean and variance of discrete wavelet coefficients of coaveraged time-domain interferograms can be used as a specification for determining when to stop coaveraging. We also show that, if a prediction model built in the wavelet domain is used to determine the composition of unknown samples, a stopping criterion for the coaveraging process can be developed with respect to the uncertainty tolerated in the prediction.
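A stopping rule of the kind described above can be sketched by tracking the running mean and variance of each discrete wavelet coefficient as scans are coaveraged. Because the DWT is linear, the transform of the coaveraged interferogram equals the average of the per-scan transforms, so per-coefficient statistics can be accumulated scan by scan. The specific criterion used here (stop once the largest standard error of a coaveraged coefficient drops below a tolerance) and the class and method names are assumptions for illustration, not the letter's exact specification.

```python
import math


class CoefficientAverager:
    """Running mean/variance of wavelet coefficients over repeated scans.

    Uses Welford's online algorithm so that every scan is processed once.
    """

    def __init__(self, n_coeffs):
        self.count = 0
        self.mean = [0.0] * n_coeffs
        self.m2 = [0.0] * n_coeffs  # running sum of squared deviations

    def add_scan(self, coeffs):
        """Add the wavelet coefficients of one interferogram scan."""
        self.count += 1
        for i, x in enumerate(coeffs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def max_standard_error(self):
        """Largest standard error s / sqrt(n) over all coaveraged coefficients."""
        if self.count < 2:
            return math.inf
        return max(math.sqrt(m2 / (self.count - 1) / self.count) for m2 in self.m2)

    def should_stop(self, tolerance):
        """Hypothetical rule: stop coaveraging once every standard error < tolerance."""
        return self.max_standard_error() < tolerance
```

The tolerance could be derived from the prediction uncertainty that is acceptable for a wavelet-domain calibration model, which is the second use the abstract describes.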
33
Manganiello L, Vega C, Ríos A, Valcárcel M. Use of wavelet transform to enhance piezoelectric signals for analytical purposes. Anal Chim Acta 2002. [DOI: 10.1016/s0003-2670(02)00009-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/24/2022]
34
Aspects of the successive projections algorithm for variable selection in multivariate calibration applied to plasma emission spectrometry. Anal Chim Acta 2001. [DOI: 10.1016/s0003-2670(01)01182-5] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.0] [Indexed: 11/19/2022]
35
Ehrentreich F, Sümmchen L. Spike removal and denoising of Raman spectra by wavelet transform methods. Anal Chem 2001; 73:4364-73. [PMID: 11569832 DOI: 10.1021/ac0013756] [Citation(s) in RCA: 62] [Impact Index Per Article: 2.7] [Indexed: 11/29/2022]
Abstract
Wavelet decompositions of Raman spectra were investigated with respect to their usability for spike removal and denoising of the raw data. It was shown that these operations should be performed sequentially. Spikes cannot be suppressed directly by wavelet transformation; however, the wavelet transform can be used to recognize spikes by their first-level detail coefficients. Spike locations can be projected from the details to the approximations and, further, to the corresponding locations in the original spectrum. After spike recognition, those regions are replaced by interpolated values. To complete the processing, the despiked spectrum is denoised by repeated application of wavelet transform methods.
Affiliation(s)
- F Ehrentreich
- Technische Universität Dresden, Institut für Analytische Chemie, Germany
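The spike-recognition idea in the abstract above can be sketched with a single-level Haar transform: a spike produces an unusually large first-level detail coefficient, the flagged coefficient is mapped back to the two spectrum samples it covers, and those samples are replaced by linear interpolation between the nearest unflagged neighbours. This is a simplified sketch, not the authors' procedure: it uses only level 1 (the paper also projects locations through the approximations), the robust noise scale from the median absolute detail and the threshold multiplier `k` are assumptions, and the subsequent wavelet denoising step is not shown.

```python
import math
import statistics


def despike(spectrum, k=5.0):
    """Flag spikes via level-1 Haar details and replace them by interpolation.

    Assumes len(spectrum) is even. k is a hypothetical threshold multiplier.
    Returns (despiked spectrum, sorted list of flagged indices).
    """
    n = len(spectrum)
    details = [(spectrum[2 * i] - spectrum[2 * i + 1]) / math.sqrt(2)
               for i in range(n // 2)]
    # Robust noise scale: median absolute detail / 0.6745.
    sigma = statistics.median(abs(d) for d in details) / 0.6745
    flagged = set()
    for i, d in enumerate(details):
        if abs(d) > k * sigma:
            # Project the detail position back to both samples of its pair.
            flagged.update({2 * i, 2 * i + 1})
    out = list(spectrum)
    for idx in sorted(flagged):
        # Find the nearest unflagged samples on each side.
        left = idx - 1
        while left >= 0 and left in flagged:
            left -= 1
        right = idx + 1
        while right < n and right in flagged:
            right += 1
        if left >= 0 and right < n:
            t = (idx - left) / (right - left)
            out[idx] = spectrum[left] + t * (spectrum[right] - spectrum[left])
        elif left >= 0:
            out[idx] = spectrum[left]
        elif right < n:
            out[idx] = spectrum[right]
    return out, sorted(flagged)
```

Interpolation uses the original, unflagged values on both sides, so a cluster of adjacent flagged samples is bridged by a single straight line.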
36
Harrington PB, Rauch PJ, Cai C. Multivariate curve resolution of wavelet and Fourier compressed spectra. Anal Chem 2001; 73:3247-56. [PMID: 11476222 DOI: 10.1021/ac000956s] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
Abstract
The multivariate curve resolution method SIMPLe to use Interactive Self-Modeling Mixture Analysis (SIMPLISMA) was applied to Fourier- and wavelet-compressed ion mobility spectra. The spectra obtained from the SIMPLISMA model were transformed back to their original, uncompressed representation. SIMPLISMA was able to model the same pure variables for the partial wavelet transform, although satisfactory pure variables and models were not obtained for the Fourier and complete wavelet transforms. Data were acquired from two samples and two different ion mobility spectrometry (IMS) sensors. The first sample was thermally desorbed sodium gamma-hydroxybutyrate (GHB), and the second was a liquid mixture of dicyclohexylamine (DCHA) and diethylmethylphosphonate (DEMP). The spectra were compressed to 6.3% of their original size, and SIMPLISMA was applied to the compressed data in the Fourier and wavelet domains. An alternative method of normalizing SIMPLISMA spectra was devised that removes the variation in scale between SIMPLISMA results obtained from uncompressed and compressed data. SIMPLISMA accurately extracted the spectral features and concentration profiles directly from daublet-compressed IMS data at a compression ratio of 93.7%, with root-mean-square errors of reconstruction below 3%. The daublet wavelet filters were selected because they compared favorably with coiflet and symmlet filters. The effects of daublet filter width and compression ratio were evaluated with respect to the reconstruction errors of the data sets and the SIMPLISMA spectra. For these experiments, the daublet 14 filter performed well for both data sets.
Affiliation(s)
- P B Harrington
- Center for Intelligent Chemical Instrumentation, Chemistry Department, Ohio University, Athens 45701-2979, USA
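The core of SIMPLISMA, as applied in the study above, is the purity function used to pick pure variables: for each variable, the standard deviation across spectra divided by its mean plus a noise offset, where the offset is a percentage of the largest mean. The sketch below shows the standard first-pure-variable step only; it is not Harrington's modified normalization, the 5% offset is an assumed setting, and the determinant-based weighting used to find subsequent pure variables is omitted. The function name is hypothetical.

```python
import statistics


def simplisma_first_pure_variable(spectra, offset_percent=5.0):
    """Return the index of the first SIMPLISMA pure variable and all purities.

    spectra: list of rows (one spectrum per row); columns are variables.
    offset_percent: noise offset as a percentage of the maximum mean
                    intensity (5% is an assumed, tunable setting).
    """
    n_vars = len(spectra[0])
    columns = [[row[j] for row in spectra] for j in range(n_vars)]
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    alpha = offset_percent / 100.0 * max(means)
    # Purity: high relative spread marks variables dominated by one component.
    purity = [s / (m + alpha) for s, m in zip(stds, means)]
    best = max(range(n_vars), key=lambda j: purity[j])
    return best, purity
```

The offset keeps low-intensity, noise-dominated variables from receiving inflated purity values; a variable whose intensity is constant across all mixtures has zero purity.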
37
Albert S, Kinley RD. Multivariate statistical monitoring of batch processes: an industrial case study of fermentation supervision. Trends Biotechnol 2001; 19:53-62. [PMID: 11164554 DOI: 10.1016/s0167-7799(00)01528-6] [Citation(s) in RCA: 76] [Impact Index Per Article: 3.3] [Indexed: 10/17/2022]
Abstract
This article describes the development of Multivariate Statistical Process Control (MSPC) procedures for monitoring batch processes and demonstrates their application to industrial tylosin biosynthesis. Currently, the main fermentation phase is monitored using univariate statistical process control principles implemented within the G2 real-time expert system package. This development addresses the integration of various process stages into a monitoring system and the observation of interactions among individual variables through multivariate projection methods. The benefits of this approach are discussed from an industrial perspective.
Affiliation(s)
- S Albert
- Eli Lilly and Company Limited, Speke Operations, Fleming Road, L24 9LN, Liverpool, UK.
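The abstract above does not list the monitoring statistics used, but a common multivariate statistic in MSPC is Hotelling's T², which measures how far a new observation lies from in-control reference data while accounting for correlations among variables. The sketch below computes T² directly on raw process variables with a small Gaussian-elimination solve; in batch MSPC it is more typically computed on PCA or PLS scores of unfolded batch trajectories, and control limits are not shown. The function name is hypothetical.

```python
def hotelling_t2(reference, new_obs):
    """Hotelling's T^2 of one observation relative to in-control reference data.

    reference: list of rows (observations) x columns (process variables).
    new_obs: one observation with the same variables.
    """
    n = len(reference)
    p = len(reference[0])
    mean = [sum(row[j] for row in reference) / n for j in range(p)]
    # Sample covariance matrix with an (n - 1) denominator.
    cov = [[sum((row[a] - mean[a]) * (row[b] - mean[b]) for row in reference) / (n - 1)
            for b in range(p)] for a in range(p)]
    diff = [new_obs[j] - mean[j] for j in range(p)]
    # Solve cov * w = diff by Gaussian elimination with partial pivoting.
    aug = [cov[i][:] + [diff[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    w = [0.0] * p
    for i in reversed(range(p)):
        w[i] = (aug[i][p] - sum(aug[i][c] * w[c] for c in range(i + 1, p))) / aug[i][i]
    # T^2 = diff^T cov^{-1} diff
    return sum(d * wi for d, wi in zip(diff, w))
```

For the reference set `[[0, 0], [2, 0], [0, 2], [2, 2]]`, the observation `[3, 1]` gives T² = 3.0, while the reference mean `[1, 1]` gives 0.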
38
Jetter K, Depczynski U, Molt K, Niemöller A. Principles and applications of wavelet transformation to chemometrics. Anal Chim Acta 2000. [DOI: 10.1016/s0003-2670(00)00889-8] [Citation(s) in RCA: 80] [Impact Index Per Article: 3.3] [Indexed: 11/28/2022]