1. Tharwat A, Schenck W. Active Learning for Handling Missing Data. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:3273-3287. [PMID: 38277246 DOI: 10.1109/tnnls.2024.3352279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2024]
Abstract
Recently, the massive growth of IoT devices and Internet data, which are widely used in many applications, including industry and healthcare, has dramatically increased the amount of free unlabeled data collected. However, this unlabeled data is of no use for training supervised machine learning models, and the expensive, time-consuming labeling process makes the problem even more challenging. Here, the active learning (AL) technique provides a solution by labeling a small number of highly informative and representative data points, which guarantees a high degree of generalizability over the input space and improves classification performance on previously unseen data. The task is more difficult when the active learner has no predefined knowledge, such as initial training data, and when the obtained data is incomplete (i.e., contains missing values). In previous studies, the missing data are first imputed; the active learner then selects from the available unlabeled data, regardless of whether the points were originally observed or imputed. However, selecting inaccurately imputed data points would negatively affect the active learner and prevent it from selecting informative and/or representative points, thus reducing the overall classification performance of the prediction models. This motivated us to introduce a novel query selection strategy that accounts for imputation uncertainty when querying new points. For this purpose, we first introduce a novel multiple imputation method that considers feature importance in selecting the most promising feature groups for missing value estimation. This multiple imputation method makes it possible to quantify the imputation uncertainty of each imputed data point. Furthermore, in each of the two phases of the proposed active learner (exploration and exploitation), imputation uncertainty is taken into account to reduce the probability of selecting points with high imputation uncertainty. We tested the effectiveness of the proposed active learner on different binary and multiclass datasets with different missing rates.
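As a rough illustration of the idea in this abstract, the sketch below discounts a candidate's informativeness by its imputation uncertainty before choosing a query point. The uncertainty measure (mean per-feature variance across several imputation runs), the scoring rule, and all function names are assumptions made for illustration; they are not the formulas used by Tharwat and Schenck.

```python
from statistics import pvariance


def imputation_uncertainty(imputations):
    """Mean per-feature variance across several imputed versions of one point.

    `imputations` is a list of equally long feature vectors, one per
    imputation run. Fully observed features are identical in every run and
    therefore contribute zero variance.
    """
    n_features = len(imputations[0])
    per_feature = [
        pvariance([version[j] for version in imputations])
        for j in range(n_features)
    ]
    return sum(per_feature) / n_features


def select_query(candidates, informativeness):
    """Return (index of best candidate, all scores).

    Hypothetical scoring rule: informativeness / (1 + uncertainty), so points
    whose imputed values disagree across runs are less likely to be queried.
    """
    scores = [
        info / (1.0 + imputation_uncertainty(versions))
        for versions, info in zip(candidates, informativeness)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Two toy candidates, three imputation runs each.
stable = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]    # fully observed point
unstable = [[1.0, 0.0], [1.0, 4.0], [1.0, 8.0]]  # second feature was imputed
best, scores = select_query([stable, unstable], informativeness=[0.6, 0.9])
```

In this toy case the second candidate is nominally more informative, but its imputed feature varies widely across runs, so the fully observed candidate is selected instead.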
2. Wang B, Wei J, Zhang L, Jiang H, Jin C, Huang S. Soft sensor modeling method for Pichia pastoris fermentation process based on substructure domain transfer learning. BMC Biotechnol 2024; 24:104. [PMID: 39696295 DOI: 10.1186/s12896-024-00928-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2024] [Accepted: 11/25/2024] [Indexed: 12/20/2024] Open
Abstract
BACKGROUND Traditional transfer methods are prone to losing data information during overall domain-level transfer and struggle to achieve a close match between the source and target domains, which reduces the accuracy of the soft sensor model. METHODS This paper proposes a soft sensor modeling method based on a substructure domain transfer modeling framework. Firstly, the Gaussian mixture model clustering algorithm is used to extract local information, cluster the source and target domains into multiple substructure domains, and adaptively weight the substructure domains according to the distances between the sub-source domains and sub-target domains. Secondly, the optimal subspace domain adaptation method integrating multiple metrics is used to obtain the mutually coupled optimal projection matrices W_s and W_t, and the source and target domain data are projected onto the corresponding subspaces for spatial alignment, so as to reduce the discrepancy between the sample data of different working conditions. Finally, based on the source and target domain data after substructure domain adaptation, the least squares support vector machine algorithm is used to establish the prediction model. RESULTS Taking Pichia pastoris fermentation to produce inulinase as an example, the simulation results verify that the root mean square error of the proposed soft sensor model in predicting Pichia pastoris concentration and inulinase concentration is reduced by 48.7% and 54.9%, respectively. CONCLUSION The proposed soft sensor modeling method can accurately predict Pichia pastoris concentration and inulinase concentration online under different working conditions, and has higher prediction accuracy than the traditional soft sensor modeling method.
Affiliation(s)
- Bo Wang
  - Key Laboratory of Agricultural Measurement and Control Technology and Equipment for Mechanical Industrial Facilities, School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, 212013, China
- Jun Wei
  - Key Laboratory of Agricultural Measurement and Control Technology and Equipment for Mechanical Industrial Facilities, School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, 212013, China
- Le Zhang
  - Wuxi Key Laboratory of Intelligent Robot and Special Equipment Technology, Wuxi Taihu University, Wuxi, 214064, China
- Hui Jiang
  - Key Laboratory of Agricultural Measurement and Control Technology and Equipment for Mechanical Industrial Facilities, School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, 212013, China
- Cheng Jin
  - Wuxi Key Laboratory of Intelligent Robot and Special Equipment Technology, Wuxi Taihu University, Wuxi, 214064, China
- Shaowen Huang
  - Key Laboratory of Agricultural Measurement and Control Technology and Equipment for Mechanical Industrial Facilities, School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, 212013, China
3. Wang B, Yu A, Wang H, Liu J. Modeling and Optimization of an Enhanced Soft Sensor for the Fermentation Process of Pichia pastoris. Sensors (Basel, Switzerland) 2024; 24:3017. [PMID: 38793872 PMCID: PMC11125098 DOI: 10.3390/s24103017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 05/02/2024] [Accepted: 05/08/2024] [Indexed: 05/26/2024]
Abstract
This paper proposes a novel soft sensor modeling approach, MIC-TCA-INGO-LSSVM, to address the decline in performance of soft sensor models during the fermentation process of Pichia pastoris caused by changes in working conditions. First, the transfer component analysis (TCA) method is used to minimize the differences in data distribution across working conditions. Next, a least squares support vector machine (LSSVM) model is constructed using the dataset adapted by TCA, and an improved northern goshawk optimization (INGO) algorithm is proposed to optimize the parameters of the LSSVM model. Finally, to further enhance the model's generalization ability and prediction accuracy by transferring knowledge from multiple source working conditions, a sub-model weighted ensemble scheme is proposed based on the maximum information coefficient (MIC) algorithm. The proposed soft sensor model is employed to predict cell and product concentrations during the fermentation process of Pichia pastoris. Simulation results indicate that the RMSE of the INGO-LSSVM model in predicting cell and product concentrations is reduced by 47.3% and 42.1%, respectively, compared to the NGO-LSSVM model. Additionally, TCA significantly enhances the model's adaptability when working conditions change. Moreover, the soft sensor model based on TCA and the MIC-weighted ensemble method achieves a reduction of 41.6% and 31.3% in the RMSE for predicting cell and product concentrations, respectively, compared to the single-source condition transfer model TCA-INGO-LSSVM. These results demonstrate the high reliability and predictive performance of the proposed soft sensor method under varying working conditions.
Affiliation(s)
- Ameng Yu
  - Key Laboratory of Agricultural Measurement and Control Technology and Equipment for Mechanical Industrial Facilities, School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China; (B.W.); (H.W.); (J.L.)
4. Yang C, Liu Q, Liu Y, Cheung YM. Transfer Dynamic Latent Variable Modeling for Quality Prediction of Multimode Processes. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:6061-6074. [PMID: 37079407 DOI: 10.1109/tnnls.2023.3265762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Quality prediction is beneficial to intelligent inspection, advanced process control, operation optimization, and product quality improvement in complex industrial processes. Most existing work relies on the assumption that training samples and testing samples follow similar data distributions. This assumption, however, does not hold for practical multimode processes with dynamics. In practice, traditional approaches mostly establish a prediction model using samples from the principal operating mode (POM), for which abundant samples are available. Such a model is inapplicable to other modes with few samples. In view of this, this article proposes a novel dynamic latent variable (DLV)-based transfer learning approach, called transfer DLV regression (TDLVR), for quality prediction of multimode processes with dynamics. The proposed TDLVR can not only derive the dynamics between process variables and quality variables in the POM but also extract the co-dynamic variations among process variables between the POM and the new mode. This can effectively overcome the marginal distribution discrepancy of the data and enrich the information of the new mode. To make full use of the available labeled samples from the new mode, an error compensation mechanism is incorporated into the established TDLVR, termed compensated TDLVR (CTDLVR), to adapt to the conditional distribution discrepancy. Empirical studies show the efficacy of the proposed TDLVR and CTDLVR methods in several case studies, including numerical simulation examples and two real industrial process examples.
5. Tian J, Jiang Y, Zhang J, Luo H, Yin S. A novel data augmentation approach to fault diagnosis with class-imbalance problem. Reliability Engineering & System Safety 2024; 243:109832. [DOI: 10.1016/j.ress.2023.109832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/03/2024]
6. Nettleton DF, Marí-Buyé N, Marti-Soler H, Egan JR, Hort S, Horna D, Costa M, Vallejo Benítez-Cano E, Goldrick S, Rafiq QA, König N, Schmitt RH, R. Reyes A. Smart Sensor Control and Monitoring of an Automated Cell Expansion Process. Sensors (Basel, Switzerland) 2023; 23:9676. [PMID: 38139523 PMCID: PMC10748109 DOI: 10.3390/s23249676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 11/19/2023] [Accepted: 12/05/2023] [Indexed: 12/24/2023]
Abstract
Immune therapy for cancer patients is a new and promising area that in the future may complement traditional chemotherapy. The cell expansion phase is a critical part of the process chain to produce a large number of high-quality, genetically modified immune cells from an initial sample from the patient. Smart sensors augment the ability of the control and monitoring system of the process to react in real time to key control parameter variations, adapt to different patient profiles, and optimize the process. The aim of the current work is to develop and calibrate smart sensors for their deployment in a real bioreactor platform, with adaptive control and monitoring for diverse patient/donor cell profiles. A set of contrasting smart sensors has been implemented and tested on automated cell expansion batch runs; the sensors incorporate advanced data-driven machine learning and statistical techniques to detect variations and disturbances of the key system features. Furthermore, a 'consensus' approach is applied to the six smart sensor alerts as a confidence factor, which helps the human operator identify significant events that require attention. Initial results show that the smart sensors can effectively model and track the data generated by the Aglaris FACER bioreactor, anticipate events within a 30 min time window, and mitigate perturbations in order to optimize the key performance indicators of cell quantity and quality. In quantitative terms for event detection, the consensus for sensors across batch runs demonstrated good stability: the AI-based smart sensors (Fuzzy and Weighted Aggregation) gave 88% and 86% consensus, respectively, whereas the statistically based sensors (Stability Detector and Bollinger) gave 25% and 42% consensus, respectively, the average consensus for all six being 65%. The different results reflect the different theoretical approaches. Finally, the consensus of batch runs across sensors gave even higher stability, ranging from 57% to 98% with an average consensus of 80%.
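The abstract does not specify how the individual sensors or the consensus are computed. As a hedged sketch only, the code below pairs a generic Bollinger-band style detector (mean plus or minus k standard deviations over a trailing window) with a consensus defined as the fraction of sensors alerting at each time step. The window length, the factor `k`, the consensus definition, and the function names are all assumptions for illustration.

```python
from statistics import mean, pstdev


def bollinger_alerts(series, window=5, k=2.0):
    """Flag samples that leave the band mean +/- k * std of the previous window."""
    alerts = [False] * len(series)
    for t in range(window, len(series)):
        past = series[t - window:t]
        centre, spread = mean(past), pstdev(past)
        alerts[t] = abs(series[t] - centre) > k * spread
    return alerts


def consensus(alerts_per_sensor):
    """Fraction of sensors raising an alert at each time step."""
    n_sensors = len(alerts_per_sensor)
    return [sum(step) / n_sensors for step in zip(*alerts_per_sensor)]


# Toy signal with a single disturbance at index 6.
signal = [1.0, 1.1, 0.9, 1.0, 1.1, 1.0, 5.0, 1.0]
band_alerts = bollinger_alerts(signal)

# Two further hypothetical sensors: one agrees with the detector, one stays silent.
agreeing = [False] * 6 + [True, False]
silent = [False] * 8
confidence = consensus([band_alerts, agreeing, silent])
```

A per-step consensus of this kind can be thresholded so that an operator is notified only when enough sensors agree.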
Affiliation(s)
- Joseph R. Egan
  - Department of Biochemical Engineering, University College London, London WC1E 6BT, UK; (J.R.E.); (Q.A.R.)
- Simon Hort
  - Fraunhofer Institute for Production Technology, 52074 Aachen, Germany (N.K.); (R.H.S.)
- David Horna
  - Aglaris Cell, 28760 Madrid, Spain; (N.M.-B.)
  - Aglaris Ltd., Stevenage SG1 2FX, UK
- Miquel Costa
  - Aglaris Cell, 28760 Madrid, Spain; (N.M.-B.)
  - Aglaris Ltd., Stevenage SG1 2FX, UK
- Stephen Goldrick
  - Department of Biochemical Engineering, University College London, London WC1E 6BT, UK; (J.R.E.); (Q.A.R.)
- Qasim A. Rafiq
  - Department of Biochemical Engineering, University College London, London WC1E 6BT, UK; (J.R.E.); (Q.A.R.)
- Niels König
  - Fraunhofer Institute for Production Technology, 52074 Aachen, Germany (N.K.); (R.H.S.)
- Robert H. Schmitt
  - Fraunhofer Institute for Production Technology, 52074 Aachen, Germany (N.K.); (R.H.S.)
  - Laboratory for Machine Tools and Production Engineering (WZL), RWTH Aachen University, 52074 Aachen, Germany
7. Wang B, Liu J, Yu A, Wang H. Development and Optimization of a Novel Soft Sensor Modeling Method for Fermentation Process of Pichia pastoris. Sensors (Basel, Switzerland) 2023; 23:6014. [PMID: 37447863 DOI: 10.3390/s23136014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 06/25/2023] [Accepted: 06/26/2023] [Indexed: 07/15/2023]
Abstract
This paper introduces a novel soft sensor modeling method based on BDA-IPSO-LSSVM designed to address the issue of model failure caused by varying fermentation data distributions resulting from different operating conditions during the fermentation of different batches of Pichia pastoris. First, the problem of significant differences in data distribution among different batches of the fermentation process is addressed by adopting the balanced distribution adaptation (BDA) method from transfer learning. This method reduces the data distribution differences among batches of the fermentation process, while the fuzzy set concept is employed to improve the BDA method by transforming the classification problem into a regression prediction problem for the fermentation process. Second, the soft sensor model for the fermentation process is developed using the least squares support vector machine (LSSVM). The model parameters are optimized by an improved particle swarm optimization (IPSO) algorithm based on individual differences. Finally, the data obtained from the Pichia pastoris fermentation experiment are used for simulation, and the developed soft sensor model is applied to predict the cell concentration and product concentration during the fermentation process of Pichia pastoris. Simulation results demonstrate that the IPSO algorithm has good convergence performance and optimization performance compared with other algorithms. The improved BDA algorithm can make the soft sensor model adapt to different operating conditions, and the proposed soft sensor method outperforms existing methods, exhibiting higher prediction accuracy and the ability to accurately predict the fermentation process of Pichia pastoris under different operating conditions.
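For readers unfamiliar with the LSSVM regressor named in this abstract, the following is a minimal sketch of standard LSSVM regression with an RBF kernel, fitted by solving its linear system directly. The BDA transfer step and the IPSO parameter search are not shown; the hyperparameters `gamma` and `sigma` are fixed by hand, and the toy data and function names are assumptions for illustration.

```python
import math


def rbf(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def lssvm_fit(xs, ys, gamma, sigma=1.0):
    """Return (bias, alphas) from the LSSVM system

        [ 0   1^T         ] [ b     ]   [ 0 ]
        [ 1   K + I/gamma ] [ alpha ] = [ y ]
    """
    n = len(xs)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(xs[i], xs[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term on the kernel diagonal
        A.append(row)
    solution = solve(A, [0.0] + list(ys))
    return solution[0], solution[1:]


def lssvm_predict(x, xs, bias, alphas, sigma=1.0):
    """Evaluate f(x) = b + sum_i alpha_i * k(x, x_i)."""
    return bias + sum(a * rbf(x, xi, sigma) for a, xi in zip(alphas, xs))


# Toy data; a large gamma means weak regularization, so the fit
# nearly interpolates the training targets.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.8, 0.9, 0.1, -0.8]
bias, alphas = lssvm_fit(xs, ys, gamma=1e6)
fitted = [lssvm_predict(x, xs, bias, alphas) for x in xs]
```

In practice `gamma` and `sigma` control the bias-variance trade-off, which is why the cited work tunes them with a swarm optimizer rather than fixing them by hand as done here.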
Affiliation(s)
- Bo Wang
  - Key Laboratory of Agricultural Measurement and Control Technology and Equipment for Mechanical Industrial Facilities, School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
- Jun Liu
  - Key Laboratory of Agricultural Measurement and Control Technology and Equipment for Mechanical Industrial Facilities, School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
- Ameng Yu
  - Key Laboratory of Agricultural Measurement and Control Technology and Equipment for Mechanical Industrial Facilities, School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
- Haibo Wang
  - Key Laboratory of Agricultural Measurement and Control Technology and Equipment for Mechanical Industrial Facilities, School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
8. Dai Y, Yang C, Zhu J, Liu Y. Adversarial Transferred Data-Assisted Soft Sensor for Enhanced Multigrade Quality Prediction. ACS Omega 2023; 8:19900-19911. [PMID: 37305252 PMCID: PMC10249142 DOI: 10.1021/acsomega.3c01832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Accepted: 05/12/2023] [Indexed: 06/13/2023]
Abstract
Although recent transfer learning soft sensors show promising applications in multigrade chemical processes, good prediction performance mainly relies on available target domain data, which is difficult to obtain for a start-up grade. Additionally, employing only a single global model is inadequate to characterize the inner relationships among process variables. A just-in-time adversarial transfer learning (JATL) soft sensing method is developed to enhance multigrade process prediction performance. The distribution discrepancies of process variables between two different operating grades are first reduced by the adversarial transfer learning (ATL) strategy. Subsequently, by applying the just-in-time learning approach, a similar data set is selected from the transferred source data for reliable model construction. Consequently, with the JATL-based soft sensor, quality prediction of a new target grade is implemented without its own labeled data. Experimental results on two multigrade chemical processes validate that the JATL method improves model performance.
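The just-in-time step mentioned in this abstract can be pictured as follows: for each query, pick the most similar stored samples and build a local model from them alone. This sketch assumes Euclidean distance as the similarity measure and uses an inverse-distance-weighted average as the local model, purely for brevity. The adversarial transfer step is not shown, and neither choice is taken from the paper.

```python
def jit_predict(query, xs, ys, k=3):
    """Predict y for `query` from a local model built on its k nearest samples.

    Returns (prediction, indices of the selected samples). The local model is
    an inverse-distance-weighted average; any regressor could instead be
    fitted on the selected subset.
    """
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    ranked = sorted(range(len(xs)), key=lambda i: dist(query, xs[i]))[:k]
    # Small constant avoids division by zero when the query matches a sample.
    weights = [1.0 / (dist(query, xs[i]) + 1e-9) for i in ranked]
    total = sum(weights)
    prediction = sum(w * ys[i] for w, i in zip(weights, ranked)) / total
    return prediction, ranked


# Toy "transferred source" data: two well-separated operating regions.
xs = [[0.0], [1.0], [2.0], [10.0], [11.0]]
ys = [0.0, 1.0, 2.0, 10.0, 11.0]
prediction, neighbours = jit_predict([1.0], xs, ys, k=3)
```

Because the model is rebuilt for every query, samples from the distant region (indices 3 and 4 here) never influence a prediction made near the first region.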
Affiliation(s)
- Yun Dai
  - Institute of Process Equipment and Control Engineering, Zhejiang University of Technology, Hangzhou 310023, People's Republic of China
- Chao Yang
  - State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, People's Republic of China
- Jialiang Zhu
  - Institute of Process Equipment and Control Engineering, Zhejiang University of Technology, Hangzhou 310023, People's Republic of China
- Yi Liu
  - Institute of Process Equipment and Control Engineering, Zhejiang University of Technology, Hangzhou 310023, People's Republic of China
9. Zhang Y, Jin H, Liu H, Yang B, Dong S. Deep Semi-Supervised Just-in-Time Learning Based Soft Sensor for Mooney Viscosity Estimation in Industrial Rubber Mixing Process. Polymers (Basel) 2022; 14:1018. [PMID: 35267845 PMCID: PMC8914694 DOI: 10.3390/polym14051018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/28/2022] [Revised: 02/26/2022] [Accepted: 03/01/2022] [Indexed: 02/05/2023] Open
Abstract
Soft sensor technology has become an effective tool for real-time estimation of key quality variables in industrial rubber-mixing processes, which facilitates efficient monitoring and control of rubber manufacturing. However, developing high-performance soft sensors remains challenging due to improper feature selection/extraction and insufficient labeled data. Thus, a deep semi-supervised just-in-time learning-based Gaussian process regression (DSSJITGPR) is developed for Mooney viscosity estimation. It integrates just-in-time learning, semi-supervised learning, and deep learning into a unified modeling framework. In the offline stage, the latent feature information behind the historical process data is extracted through a stacked autoencoder. Then, an evolutionary pseudo-labeling estimation approach is applied to extend the labeled modeling database, where high-confidence pseudo-labeled data are obtained by solving an explicit pseudo-labeling optimization problem. In the online stage, when the query sample arrives, a semi-supervised JITGPR model is built from the enlarged modeling database to achieve Mooney viscosity estimation. Compared with traditional Mooney-viscosity soft sensor methods, DSSJITGPR shows significant advantages in extracting latent features and handling label scarcity, thus delivering superior prediction performance. The effectiveness and superiority of DSSJITGPR have been verified through the Mooney viscosity prediction results from an industrial rubber-mixing process.
Affiliation(s)
- Yan Zhang
  - Department of Automation, Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; (Y.Z.); (H.L.); (B.Y.)
  - Yunnan Key Laboratory of Computer Technologies Application, Kunming 650500, China
- Huaiping Jin
  - Department of Automation, Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; (Y.Z.); (H.L.); (B.Y.)
  - Yunnan Key Laboratory of Computer Technologies Application, Kunming 650500, China
  - Correspondence: ; Tel.: +86-158-7798-6943
- Haipeng Liu
  - Department of Automation, Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; (Y.Z.); (H.L.); (B.Y.)
  - Yunnan Key Laboratory of Computer Technologies Application, Kunming 650500, China
- Biao Yang
  - Department of Automation, Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China; (Y.Z.); (H.L.); (B.Y.)
  - Yunnan Key Laboratory of Computer Technologies Application, Kunming 650500, China
- Shoulong Dong
  - Department of Chemical Engineering, School of Chemistry and Chemical Engineering, Beijing Institute of Technology, Beijing 100081, China