1
Severinsen I, Yu W, Walmsley T, Young B. COVERT: A classless approach to generating balanced datasets for process modelling. ISA TRANSACTIONS 2024; 144:1-10. [PMID: 37951753 DOI: 10.1016/j.isatra.2023.10.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Revised: 09/04/2023] [Accepted: 10/27/2023] [Indexed: 11/14/2023]
Abstract
In this work, a classless oversampling technique, Covert, was developed to improve historical datasets from industrial processing plants to aid process modelling. Using kernel density estimation and nearest neighbour algorithms, sparse regions are identified and resampled, developing a more balanced dataset. When applied to a real dataset from a geothermal power plant, Covert outperforms current best practice (Smote) in uniformly populating the input feature space and generating credible data in the output variable. When used to develop a data-driven model Covert improved model accuracy by 20% when predicting outside the original data's feature space. Smote, however, reduced model accuracy by 6% in the same feature space. Developing reliable models of industrial processes continues to be a significant hurdle in developing a digital twin. Using Covert, existing imbalanced historical data can be used to extend the range of applicability of any process model.
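The abstract names kernel density estimation and nearest-neighbour search as the building blocks for finding and resampling sparse regions. A minimal sketch of that general idea follows; it is not the Covert algorithm itself, whose details the abstract does not give. The function names, the `sparse_fraction` parameter, and the choice to interpolate between a sparse point and its nearest neighbour are all assumptions made for illustration.

```python
import math
import random


def gaussian_kde_density(points, bandwidth):
    """Unnormalised Gaussian kernel density estimate evaluated at each point."""
    densities = []
    for p in points:
        total = 0.0
        for q in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        densities.append(total / len(points))
    return densities


def nearest_neighbour(index, points):
    """Index of the closest other point by Euclidean distance."""
    best, best_dist = None, float("inf")
    for j, q in enumerate(points):
        if j == index:
            continue
        d = math.dist(points[index], q)
        if d < best_dist:
            best, best_dist = j, d
    return best


def oversample_sparse(points, n_new, bandwidth=1.0, sparse_fraction=0.25, seed=0):
    """Create n_new synthetic points near the lowest-density samples.

    Hypothetical helper: each new point is a random interpolation between
    a low-density sample and its nearest neighbour.
    """
    rng = random.Random(seed)
    densities = gaussian_kde_density(points, bandwidth)
    # Rank samples from sparsest to densest and keep the sparsest fraction.
    order = sorted(range(len(points)), key=lambda i: densities[i])
    sparse = order[:max(1, int(len(points) * sparse_fraction))]
    new_points = []
    for k in range(n_new):
        i = sparse[k % len(sparse)]
        j = nearest_neighbour(i, points)
        t = rng.random()
        new_points.append(
            tuple(a + t * (b - a) for a, b in zip(points[i], points[j]))
        )
    return new_points
```

With a dense cluster and a few isolated samples, the synthetic points land around the isolated samples rather than inside the cluster, which is the balancing effect the abstract describes. No class labels are needed, only the input coordinates.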
Affiliation(s)
- Isaac Severinsen
  - Department of Chemical and Materials Engineering, University of Auckland, 5 Grafton Road, Auckland, 1010, New Zealand
- Wei Yu
  - Department of Chemical and Materials Engineering, University of Auckland, 5 Grafton Road, Auckland, 1010, New Zealand
- Timothy Walmsley
  - Ahuora - Centre for Smart Energy Systems, School of Engineering, The University of Waikato, Gate 8, Hillcrest Road, Hamilton, 3240, New Zealand
- Brent Young
  - Department of Chemical and Materials Engineering, University of Auckland, 5 Grafton Road, Auckland, 1010, New Zealand
2
Wang J, Zhao H, Zhang Y, Wang H, Guo J. Unsupervised Ensemble Learning Improves Discriminability of Stochastic Neighbor Embedding. INT J COMPUT INT SYS 2023. [DOI: 10.1007/s44196-023-00203-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2023]
Abstract
The purpose of feature learning is to obtain effective representation of the raw data and then improve the performance of machine learning algorithms such as clustering or classification. Some of the existing feature learning algorithms use discriminant information in the data to improve the representation of data features, but the discrimination of the data feature representation is not enough. In order to further enhance the discrimination, discriminant feature learning based on t-distribution stochastic neighbor embedding guided by pairwise constraints (pcDTSNE) is proposed in this paper. pcDTSNE introduces pairwise constraints by clustering ensemble and uses these pairwise constraints to impose penalties on the objective function, which makes sample points in the mapping space present stronger discrimination. In order to verify the feature learning performance of pcDTSNE, extensive experiments are carried out on several public data sets. The experimental results show that the expression ability of data representation generated by pcDTSNE is further improved.
3
Zhu QX, Zhang HT, Tian Y, Zhang N, Xu Y, He YL. Co-training based virtual sample generation for solving the small sample size problem in process industry. ISA TRANSACTIONS 2023; 134:290-301. [PMID: 36064497 DOI: 10.1016/j.isatra.2022.08.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Revised: 08/20/2022] [Accepted: 08/22/2022] [Indexed: 06/15/2023]
Abstract
With the development of industrialization, the production scale and complexity of process industries are getting larger and larger. But, limited by the small amounts of samples and the uneven sample distribution in the process industry, it is difficult to establish accurate and efficient data-driven soft sensor models to predict some variables. To further develop the application of soft sensor models, generating new virtual samples based on the original sample distribution to extend the sample set is an ideal approach to solve this problem. In this paper, a novel virtual sample generation method based on the co-training of two K-Nearest Neighbor (KNN) models is proposed. First, according to the sparse parameter, sparse regions in each dimension of the feature space are identified. Second, the input features of virtual samples are generated in these sparse regions by performing interpolation operations. Third, the outputs of virtual samples are predicted by double KNN regressors based on co-training. The qualified virtual samples are screened and the model is updated using these virtual samples to improve the prediction accuracy of the double KNN models. To verify the effectiveness and superiority of the proposed virtual sample generation method based on the co-training (CTVSG), case studies are conducted using two standard functions and a Purified Terephthalic Acid (PTA) industrial dataset, where the effectiveness of CTVSG is confirmed.
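The three steps in this abstract (locate a sparse region, interpolate new inputs inside it, predict their outputs with two KNN regressors and screen the result) can be illustrated with a heavily simplified one-dimensional sketch. This is not the CTVSG method: the iterative co-training update is omitted, and the "widest gap" definition of sparsity, the agreement `tolerance`, and all function names are assumptions introduced here.

```python
def knn_predict(x, xs, ys, k):
    """Average the outputs of the k training inputs closest to x."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
    return sum(ys[i] for i in nearest) / k


def widest_gap(xs):
    """Return the (left, right) bounds of the largest gap between sorted inputs."""
    s = sorted(xs)
    return max(zip(s, s[1:]), key=lambda pair: pair[1] - pair[0])


def generate_virtual_samples(xs, ys, n_new, k_a=1, k_b=3, tolerance=0.5):
    """Interpolate inputs in the sparsest gap and keep those on which two
    KNN regressors (with different k) agree to within `tolerance`.

    Assumption: agreement between the two regressors stands in for the
    screening of "qualified" virtual samples described in the abstract.
    """
    left, right = widest_gap(xs)
    virtual = []
    for m in range(1, n_new + 1):
        # Evenly spaced inputs strictly inside the gap.
        x_new = left + (right - left) * m / (n_new + 1)
        y_a = knn_predict(x_new, xs, ys, k_a)
        y_b = knn_predict(x_new, xs, ys, k_b)
        if abs(y_a - y_b) <= tolerance:
            virtual.append((x_new, (y_a + y_b) / 2.0))
    return virtual
```

A tight `tolerance` rejects candidates where the two regressors disagree, which tends to happen deep inside a wide gap; loosening it admits more virtual samples at the cost of less reliable outputs.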
Affiliation(s)
- Qun-Xiong Zhu
  - College of Information Science & Technology, Beijing University of Chemical Technology, Beijing, 100029, China; Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China
- Hong-Tao Zhang
  - College of Information Science & Technology, Beijing University of Chemical Technology, Beijing, 100029, China; Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China
- Ye Tian
  - College of Information Science & Technology, Beijing University of Chemical Technology, Beijing, 100029, China; Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China
- Ning Zhang
  - College of Information Science & Technology, Beijing University of Chemical Technology, Beijing, 100029, China; Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China
- Yuan Xu
  - College of Information Science & Technology, Beijing University of Chemical Technology, Beijing, 100029, China; Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China
- Yan-Lin He
  - College of Information Science & Technology, Beijing University of Chemical Technology, Beijing, 100029, China; Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China
4
Paepae T, Bokoro PN, Kyamakya K. Data Augmentation for a Virtual-Sensor-Based Nitrogen and Phosphorus Monitoring. SENSORS (BASEL, SWITZERLAND) 2023; 23:1061. [PMID: 36772100 PMCID: PMC9920320 DOI: 10.3390/s23031061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Revised: 01/06/2023] [Accepted: 01/16/2023] [Indexed: 06/18/2023]
Abstract
To better control eutrophication, reliable and accurate information on phosphorus and nitrogen loading is desired. However, the high-frequency monitoring of these variables is economically impractical. This necessitates using virtual sensing to predict them by utilizing easily measurable variables as inputs. While the predictive performance of these data-driven, virtual-sensor models depends on the use of adequate training samples (in quality and quantity), the procurement and operational cost of nitrogen and phosphorus sensors make it impractical to acquire sufficient samples. For this reason, the variational autoencoder, which is one of the most prominent methods in generative models, was utilized in the present work for generating synthetic data. The generation capacity of the model was verified using water-quality data from two tributaries of the River Thames in the United Kingdom. Compared to the current state of the art, our novel data augmentation (including proper experimental settings or hyperparameter optimization) improved the root mean squared errors by 23-63%, with the most significant improvements observed when up to three predictors were used. In comparing the predictive algorithms' performances (in terms of the predictive accuracy and computational cost), k-nearest neighbors and extremely randomized trees were the best-performing algorithms on average.
Affiliation(s)
- Thulane Paepae
  - Department of Electrical and Electronic Engineering Technology, University of Johannesburg, Doornfontein 2028, South Africa
- Pitshou N. Bokoro
  - Department of Electrical and Electronic Engineering Technology, University of Johannesburg, Doornfontein 2028, South Africa
- Kyandoghere Kyamakya
  - Institute for Smart Systems Technologies, Transportation Informatics, Alpen-Adria Universität Klagenfurt, 9020 Klagenfurt, Austria
  - Faculté Polytechnique, Université de Kinshasa, P.O. Box 127, Kinshasa XI, Democratic Republic of the Congo
5
Dai Y, Liu A, Chen M, Liu Y, Yao Y. Enhanced Soft Sensor with Qualified Augmented Samples for Quality Prediction of the Polyethylene Process. Polymers (Basel) 2022; 14:4769. [PMID: 36365761 PMCID: PMC9656800 DOI: 10.3390/polym14214769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2022] [Revised: 11/04/2022] [Accepted: 11/04/2022] [Indexed: 11/09/2022]
Abstract
Data-driven soft sensors have increasingly been applied for the quality measurement of industrial polymerization processes in recent years. However, owing to the costly assay process, the limited labeled data available still pose significant obstacles to the construction of accurate models. In this study, a novel soft sensor named the selective Wasserstein generative adversarial network, with gradient penalty-based support vector regression (SWGAN-SVR), is proposed to enhance quality prediction with limited training samples. Specifically, the Wasserstein generative adversarial network with gradient penalty (WGAN-GP) is employed to capture the distribution of the available limited labeled data and to generate virtual candidates. Subsequently, an effective data-selection strategy is developed to alleviate the problem of varied-quality samples caused by the unstable training of the WGAN-GP. The selection strategy includes two parts: the centroid metric criterion and the statistical characteristic criterion. An SVR model is constructed based on the qualified augmented training data to evaluate the prediction performance. The superiority of SWGAN-SVR is demonstrated, using a numerical example and an industrial polyethylene process.
Affiliation(s)
- Yun Dai
  - Institute of Process Equipment and Control Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Angpeng Liu
  - Institute of Process Equipment and Control Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Meng Chen
  - Guangdong Basic and Applied Basic Research Foundation, Guangzhou 510640, China
- Yi Liu
  - Institute of Process Equipment and Control Engineering, Zhejiang University of Technology, Hangzhou 310023, China
  - Corresponding author
- Yuan Yao
  - Department of Chemical Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan
  - Corresponding author; Tel.: +886-3-5713690
6
Ultrasound Evaluation of the Primary α Phase Grain Size Based on Generative Adversarial Network. SENSORS 2022; 22:3274. [PMID: 35590964 PMCID: PMC9099485 DOI: 10.3390/s22093274] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/27/2022] [Revised: 04/08/2022] [Accepted: 04/17/2022] [Indexed: 02/05/2023]
Abstract
Because of the high cost of experimental data acquisition, the limited size of the sample set available when conducting tissue structure ultrasound evaluation can cause the evaluation model to have low accuracy. To address such a small-sample problem, the sample set size can be expanded by using virtual samples. In this study, an ultrasound evaluation method for the primary α phase grain size based on the generation of virtual samples by a generative adversarial network (GAN) was developed. TC25 titanium alloy forgings were treated as the research object. Virtual samples were generated by the GAN with a fully connected network of different sizes used as the generator and discriminator. A virtual sample screening mechanism was constructed to obtain the virtual sample set, taking the optimization rate as the validity criterion. Moreover, an ultrasound evaluation optimization problem was constructed with accuracy as the target. It was solved by using support vector machine regression to obtain the final ultrasound evaluation model. A benchmark function was adopted to verify the effectiveness of the method, and a series of experiments and comparison experiments were performed on the ultrasound evaluation model using test samples. The results show that the learning accuracy of the original small samples can be increased by effective virtual samples. The ultrasound evaluation model built based on the proposed method has a higher accuracy and better stability than other models.
7
Lin LS, Hu SC, Lin YS, Li DC, Siao LR. A new approach to generating virtual samples to enhance classification accuracy with small data - a case of bladder cancer. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2022; 19:6204-6233. [PMID: 35603398 DOI: 10.3934/mbe.2022290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
In the medical field, researchers are often unable to obtain the sufficient samples in a short period of time necessary to build a stable data-driven forecasting model used to classify a new disease. To address the problem of small data learning, many studies have demonstrated that generating virtual samples intended to augment the amount of training data is an effective approach, as it helps to improve forecasting models with small datasets. One of the most popular methods used in these studies is the mega-trend-diffusion (MTD) technique, which is widely used in various fields. The effectiveness of the MTD technique depends on the degree of data diffusion. However, data diffusion is seriously affected by extreme values. In addition, the MTD method only considers data fitted using a unimodal triangular membership function. However, in fact, data may come from multiple distributions in the real world. Therefore, considering the fact that data comes from multi-distributions, in this paper, a distance-based mega-trend-diffusion (DB-MTD) technique is proposed to appropriately estimate the degree of data diffusion with less impacts from extreme values. In the proposed method, it is assumed that the data is fitted by the triangular and trapezoidal membership functions to generate virtual samples. In addition, a possibility evaluation mechanism is proposed to measure the applicability of the virtual samples. In our experiment, two bladder cancer datasets are used to verify the effectiveness of the proposed DB-MTD method. The experimental results demonstrated that the proposed method outperforms other VSG techniques in classification and regression items for small bladder cancer datasets.
Affiliation(s)
- Liang-Sian Lin
  - Department of Information Management, National Taipei University of Nursing and Health Sciences, Ming-te Road, Taipei 112303, Taiwan
- Susan C Hu
  - Department of Public Health, College of Medicine, National Cheng Kung University, University Road, Tainan 70101, Taiwan
- Yao-San Lin
  - Singapore Centre for Chinese Language, Nanyang Technological University, Ghim Moh Road, Singapore 279623, Singapore
- Der-Chiang Li
  - Department of Industrial and Information Management, National Cheng Kung University, University Road, Tainan 70101, Taiwan
- Liang-Ren Siao
  - Department of Industrial and Information Management, National Cheng Kung University, University Road, Tainan 70101, Taiwan
8
Zhang Y, Zhao X, Hui Y, Liu K. Online monitoring and fault diagnosis for uneven length batch process based on multi‐way orthogonal enhanced neighborhood preserving embedding. ASIA-PAC J CHEM ENG 2022. [DOI: 10.1002/apj.2763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Affiliation(s)
- Yan Zhang
  - College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou, China
  - Key Laboratory of Gansu Advanced Control for Industrial Processes, Lanzhou University of Technology, Lanzhou, China
- Xiaoqiang Zhao
  - College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou, China
  - Key Laboratory of Gansu Advanced Control for Industrial Processes, Lanzhou University of Technology, Lanzhou, China
  - National Experimental Teaching Center of Electrical and Control Engineering, Lanzhou University of Technology, Lanzhou, China
- Yongyong Hui
  - College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou, China
  - Key Laboratory of Gansu Advanced Control for Industrial Processes, Lanzhou University of Technology, Lanzhou, China
  - National Experimental Teaching Center of Electrical and Control Engineering, Lanzhou University of Technology, Lanzhou, China
- Kai Liu
  - College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou, China
  - Key Laboratory of Gansu Advanced Control for Industrial Processes, Lanzhou University of Technology, Lanzhou, China
9
Li Z, Jin H, Dong S, Qian B, Yang B, Chen X. Semi-supervised ensemble support vector regression based soft sensor for key quality variable estimation of nonlinear industrial processes with limited labeled data. Chem Eng Res Des 2022. [DOI: 10.1016/j.cherd.2022.01.026] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/03/2022]
10
Zhu QX, Xu TX, Xu Y, He YL. Improved Virtual Sample Generation Method Using Enhanced Conditional Generative Adversarial Networks with Cycle Structures for Soft Sensors with Limited Data. Ind Eng Chem Res 2021. [DOI: 10.1021/acs.iecr.1c03197] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/26/2022]
Affiliation(s)
- Qun-Xiong Zhu
  - College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China
  - Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, P. R. China
- Tian-xiang Xu
  - College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China
  - Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, P. R. China
- Yuan Xu
  - College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China
  - Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, P. R. China
- Yan-Lin He
  - College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China
  - Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, P. R. China
11
Zhu QX, Jin C, He YL, Xu Y. Pattern Mining of Alarm Flood Sequences Using an Improved PrefixSpan Algorithm with Tolerance to Short-Term Order Ambiguity. Ind Eng Chem Res 2021. [DOI: 10.1021/acs.iecr.0c05618] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Affiliation(s)
- Qun-Xiong Zhu
  - College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China
  - Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China
- Chengyan Jin
  - College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China
  - Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China
- Yan-Lin He
  - College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China
  - Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China
- Yuan Xu
  - College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China
  - Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China
12
Zheng A, Yang H, Pan X, Yin L, Feng Y. Identification of Multi-Class Drugs Based on Near Infrared Spectroscopy and Bidirectional Generative Adversarial Networks. SENSORS 2021; 21:1088. [PMID: 33562502 PMCID: PMC7914674 DOI: 10.3390/s21041088] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/09/2020] [Revised: 01/24/2021] [Accepted: 01/28/2021] [Indexed: 11/16/2022]
Abstract
Drug detection and identification technology are of great significance in drug supervision and management. To determine the exact source of drugs, it is often necessary to directly identify multiple varieties of drugs produced by multiple manufacturers. Near-infrared spectroscopy (NIR) combined with chemometrics is generally used in these cases. However, existing NIR classification modeling methods have great limitations in dealing with a large number of categories and spectra, especially under the premise of insufficient samples, unbalanced samples, and sensitive identification error cost. Therefore, this paper proposes a NIR multi-classification modeling method based on a modified Bidirectional Generative Adversarial Networks (Bi-GAN). It makes full utilization of the powerful feature extraction ability and good sample generation quality of Bi-GAN and uses the generated samples with obvious features, an equal number between classes, and a sufficient number within classes to replace the unbalanced and insufficient real samples in the courses of spectral classification. 1721 samples of four kinds of drugs produced by 29 manufacturers were used as experimental materials, and the results demonstrate that this method is superior to other comparative methods in drug NIR classification scenarios, and the optimal accuracy rate is even more than 99% under ideal conditions.
Affiliation(s)
- Anbing Zheng
  - School of Automation, Beijing University of Posts and Telecommunications, 10 Xitucheng Road, Haidian District, Beijing 100086, China
- Huihua Yang
  - School of Automation, Beijing University of Posts and Telecommunications, 10 Xitucheng Road, Haidian District, Beijing 100086, China
  - School of Computer Science and Information Security, Guilin University of Electronic Technology, No.1 Jinji Road, Qixing District, Guilin 541004, China
  - Corresponding author
- Xipeng Pan
  - School of Computer Science and Information Security, Guilin University of Electronic Technology, No.1 Jinji Road, Qixing District, Guilin 541004, China
- Lihui Yin
  - China Institute for Food and Drug Control, 2 Tiantan Xili, Dongcheng District, Beijing 100086, China
- Yanchun Feng
  - China Institute for Food and Drug Control, 2 Tiantan Xili, Dongcheng District, Beijing 100086, China