1
|
Elaziz MA, Ewees AA, Al-qaness MAA, Alshathri S, Ibrahim RA. Feature Selection for High Dimensional Datasets Based on Quantum-Based Dwarf Mongoose Optimization. MATHEMATICS 2022; 10:4565. [DOI: 10.3390/math10234565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
Abstract
Feature selection (FS) methods play essential roles in different machine learning applications. Several FS methods have been developed; however, those FS methods that depend on metaheuristic (MH) algorithms showed impressive performance in various domains. Thus, in this paper, based on the recent advances in MH algorithms, we introduce a new FS technique to modify the performance of the Dwarf Mongoose Optimization (DMO) Algorithm using quantum-based optimization (QBO). The main idea is to utilize QBO as a local search of the traditional DMO to avoid its search limitations. So, the developed method, named DMOAQ, benefits from the advantages of the DMO and QBO. It is tested with well-known benchmark and high-dimensional datasets, with comprehensive comparisons to several optimization methods, including the original DMO. The evaluation outcomes verify that the DMOAQ has significantly enhanced the search capability of the traditional DMO and outperformed other compared methods in the evaluation experiments.
|
2
|
Zhang J, Luo W, Dai Y, Yao Y. Cycle temporal algorithm-based multivariate statistical methods for fault diagnosis in chemical processes. Chin J Chem Eng 2022. [DOI: 10.1016/j.cjche.2021.03.058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
3
|
A Time Series Forecasting of Global Horizontal Irradiance on Geographical Data of Najran Saudi Arabia. ENERGIES 2022. [DOI: 10.3390/en15030928] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/05/2023]
Abstract
Environment-friendly and renewable energy resources are needed by every country, developed or not. Solar energy is one such resource, so accurate forecasting of it can be useful for electricity supply companies. This research analyzes the daily global solar radiation (GSR) data of Najran province in Saudi Arabia and proposes a model for the prediction of global horizontal irradiance (GHI). The weather data were collected from Najran University. After inspecting the data, we identified the dependent and independent variables for calculating the GHI. A dataset model has been trained by creating a tensor of variables belonging to air, wind, peak wind, relative humidity, and barometric pressure. Furthermore, six machine learning algorithms, namely convolutional neural networks (CNN), K-nearest neighbors (KNN), support vector machines (SVM), logistic regression (LR), random forest classifier (RFC), and support vector classifier (SVC), are applied to the dataset model to predict the GHI. The evaluation metrics, namely the coefficient of determination (R2), root mean square error (RMSE), relative root mean square error (rRMSE), mean bias error (MBE), mean absolute bias error (MABE), mean absolute percentage error (MAPE), and T-statistic (t-stat), are used to verify the results of the proposed models. Finally, the work reports that all methods examined may be used to accurately predict GHI; however, the SVC technique is the most suitable of all the techniques, giving the most precise results on the evaluation metrics.
|
4
|
Sadat Lavasani M, Raeisi Ardali N, Sotudeh-Gharebagh R, Zarghami R, Abonyi J, Mostoufi N. Big data analytics opportunities for applications in process engineering. REV CHEM ENG 2021. [DOI: 10.1515/revce-2020-0054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Big data is an expression for massive data sets consisting of both structured and unstructured data that are particularly difficult to store, analyze and visualize. Big data analytics has the potential to help companies or organizations improve operations as well as disclose hidden patterns and secret correlations to make faster and intelligent decisions. This article provides useful information on this emerging and promising field for companies, industries, and researchers to gain a richer and deeper insight into advancements. Initially, an overview of big data content, key characteristics, and related topics are presented. The paper also highlights a systematic review of available big data techniques and analytics. The available big data analytics tools and platforms are categorized. Besides, this article discusses recent applications of big data in chemical industries to increase understanding and encourage its implementation in their engineering processes as much as possible. Finally, by emphasizing the adoption of big data analytics in various areas of process engineering, the aim is to provide a practical vision of big data.
Affiliation(s)
- Mitra Sadat Lavasani
- Process Design and Simulation Research Center , School of Chemical Engineering, College of Engineering, University of Tehran , P.O. Box 11155-4563, Tehran , Iran
| | - Nahid Raeisi Ardali
- Process Design and Simulation Research Center , School of Chemical Engineering, College of Engineering, University of Tehran , P.O. Box 11155-4563, Tehran , Iran
| | - Rahmat Sotudeh-Gharebagh
- Process Design and Simulation Research Center , School of Chemical Engineering, College of Engineering, University of Tehran , P.O. Box 11155-4563, Tehran , Iran
| | - Reza Zarghami
- Process Design and Simulation Research Center , School of Chemical Engineering, College of Engineering, University of Tehran , P.O. Box 11155-4563, Tehran , Iran
| | - János Abonyi
- Department of Process Engineering , MTA – PE “Lendület” Complex Systems Monitoring Research Group, University of Pannonia , P.O. Box 158 , Veszprém , Hungary
| | - Navid Mostoufi
- Process Design and Simulation Research Center , School of Chemical Engineering, College of Engineering, University of Tehran , P.O. Box 11155-4563, Tehran , Iran
| |
|
5
|
Liu K, Chen J. Robust adaptive neural network event-triggered compensation control for continuous stirred tank reactors with prescribed performance and actuator failures. Chem Eng Sci 2021. [DOI: 10.1016/j.ces.2021.116953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
6
|
Gärtler M, Khaydarov V, Klöpper B, Urbas L. The Machine Learning Life Cycle in Chemical Operations – Status and Open Challenges. CHEM-ING-TECH 2021. [DOI: 10.1002/cite.202100134] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/25/2022]
Affiliation(s)
- Marco Gärtler
- ABB Corporate Research Center Wallstadter Straße 59 68526 Ladenburg Germany
| | - Valentin Khaydarov
- Technische Universität Dresden Professur für Prozessleittechnik 01062 Dresden Germany
| | - Benjamin Klöpper
- ABB Corporate Research Center Wallstadter Straße 59 68526 Ladenburg Germany
| | - Leon Urbas
- Technische Universität Dresden Professur für Prozessleittechnik 01062 Dresden Germany
| |
|
7
|
Gordon CAK, Pistikopoulos EN. Data‐driven prescriptive maintenance toward fault‐tolerant multiparametric control. AIChE J 2021. [DOI: 10.1002/aic.17489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Christopher A. K. Gordon
- Artie McFerrin Department of Chemical Engineering Texas A&M University College Station Texas USA
- Texas A&M Energy Institute Texas A&M University College Station Texas USA
| | - Efstratios N. Pistikopoulos
- Artie McFerrin Department of Chemical Engineering Texas A&M University College Station Texas USA
- Texas A&M Energy Institute Texas A&M University College Station Texas USA
| |
|
8
|
Rippon L, Yousef I, Hosseini B, Bouchoucha A, Beaulieu J, Prévost C, Ruel M, Shah S, Gopaluni R. Representation learning and predictive classification: Application with an electric arc furnace. Comput Chem Eng 2021. [DOI: 10.1016/j.compchemeng.2021.107304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
9
|
Backstepping Methodology to Troubleshoot Plant-Wide Batch Processes in Data-Rich Industrial Environments. Processes (Basel) 2021. [DOI: 10.3390/pr9061074] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022] Open
Abstract
Troubleshooting batch processes at a plant-wide level requires first finding the unit causing the fault, and then understanding why the fault occurs in that unit. Whereas in the literature case studies discussing the latter issue abound, little attention has been given so far to the former, which is complex for several reasons: the processing units are often operated in a non-sequential way, with unusual series-parallel arrangements; holding vessels may be required to compensate for lack of production capacity, and reacting phenomena can occur in these vessels; and the evidence of batch abnormality may be available only from the end unit and at the end of the production cycle. We propose a structured methodology to assist the troubleshooting of plant-wide batch processes in data-rich environments where multivariate statistical techniques can be exploited. Namely, we first analyze the last unit wherein the fault manifests itself, and we then step back across the units through the process flow diagram (according to the manufacturing recipe) until the fault cannot be detected by the available field sensors any more. That enables us to isolate the unit wherefrom the fault originates. Interrogation of multivariate statistical models for that unit coupled to engineering judgement allow identifying the most likely root cause of the fault. We apply the proposed methodology to troubleshoot a complex industrial batch process that manufactures a specialty chemical, where productivity was originally limited by unexplained variability of the final product quality. Correction of the fault allowed for a significant increase in productivity.
|
10
|
A Novel Domain Adaptation-Based Intelligent Fault Diagnosis Model to Handle Sample Class Imbalanced Problem. SENSORS 2021; 21:s21103382. [PMID: 34066271 PMCID: PMC8152017 DOI: 10.3390/s21103382] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/08/2021] [Revised: 04/29/2021] [Accepted: 05/09/2021] [Indexed: 11/17/2022]
Abstract
As the key component to transmit power and torque, the fault diagnosis of rotating machinery is crucial to guarantee the reliable operation of mechanical equipment. Regrettably, sample class imbalance is a common phenomenon in industrial applications, which causes large cross-domain distribution discrepancies for domain adaptation (DA) and results in performance degradation for most of the existing mechanical fault diagnosis approaches. To address this issue, a novel DA approach that simultaneously reduces the cross-domain distribution difference and the geometric difference is proposed, which is defined as MRMI. This work contains three parts to improve the sample class imbalance issue: (1) A novel distance metric method (MVD) is proposed and applied to improve the performance of marginal distribution adaptation. (2) Manifold regularization is combined with instance reweighting to simultaneously explore the intrinsic manifold structure and remove irrelevant source-domain samples adaptively. (3) The ℓ2-norm regularization is applied as the data preprocessing tool to improve the model generalization performance. The gear and rolling bearing datasets with class imbalanced samples are applied to validate the reliability of MRMI. According to the fault diagnosis results, MRMI can significantly outperform competitive approaches under the condition of sample class imbalance.
|
11
|
Pistikopoulos EN, Barbosa-Povoa A, Lee JH, Misener R, Mitsos A, Reklaitis GV, Venkatasubramanian V, You F, Gani R. Process systems engineering – The generation next? Comput Chem Eng 2021. [DOI: 10.1016/j.compchemeng.2021.107252] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Indexed: 12/14/2022]
|
12
|
Data-driven anomaly detection and diagnostics for changeover processes in biopharmaceutical drug product manufacturing. Chem Eng Res Des 2021. [DOI: 10.1016/j.cherd.2020.12.018] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/21/2022]
|
13
|
Taqvi SAA, Zabiri H, Tufa LD, Uddin F, Fatima SA, Maulud AS. A Review on Data‐Driven Learning Approaches for Fault Detection and Diagnosis in Chemical Processes. CHEMBIOENG REVIEWS 2021. [DOI: 10.1002/cben.202000027] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 11/11/2022]
Affiliation(s)
- Syed Ali Ammar Taqvi
- NED University of Engineering & Technology Department of Chemical Engineering 75270 Karachi Pakistan
- NED University of Engineering and Technology Neurocomputation Lab, National Centre of Artificial Intelligence 75270 Karachi Pakistan
| | - Haslinda Zabiri
- Universiti Teknologi PETRONAS Chemical Engineering Department 32610 Seri Iskandar, Perak Darul Ridzuan Malaysia
| | - Lemma Dendena Tufa
- Addis Ababa Institute of Technology School of Chemical and Bioengineering King George VI St 1000 Addis Ababa Ethiopia
| | - Fahim Uddin
- NED University of Engineering & Technology Department of Chemical Engineering 75270 Karachi Pakistan
| | - Syeda Anmol Fatima
- Universiti Teknologi PETRONAS Chemical Engineering Department 32610 Seri Iskandar, Perak Darul Ridzuan Malaysia
| | - Abdulhalim Shah Maulud
- Universiti Teknologi PETRONAS Chemical Engineering Department 32610 Seri Iskandar, Perak Darul Ridzuan Malaysia
| |
|
14
|
Integrated Diagnostic Framework for Process and Sensor Faults in Chemical Industry. SENSORS 2021; 21:s21030822. [PMID: 33530519 PMCID: PMC7865985 DOI: 10.3390/s21030822] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/02/2020] [Revised: 01/19/2021] [Accepted: 01/22/2021] [Indexed: 11/30/2022]
Abstract
This study considers the problem of distinguishing between process and sensor faults in nonlinear chemical processes. An integrated fault diagnosis framework is proposed to distinguish chemical process sensor faults from process faults. The key idea of the framework is to embed the cycle temporal algorithm into the dynamic kernel principal component analysis to improve the fault detection speed and accuracy. It is combined with the fault diagnosis method based on the reconstruction-based contribution graph to diagnose the fault variables and then distinguish the two fault types according to their characteristics. Finally, the integrated fault diagnosis framework is applied to the Tennessee Eastman process and acid gas absorption process, and its effectiveness is proved.
|
15
|
Kieslich CA, Alimirzaei F, Song H, Do M, Hall P. Data-driven prediction of antiviral peptides based on periodicities of amino acid properties. 31ST EUROPEAN SYMPOSIUM ON COMPUTER AIDED PROCESS ENGINEERING 2021. [PMCID: PMC8286203 DOI: 10.1016/b978-0-323-88506-5.50312-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 02/05/2023]
Abstract
With the emergence of new pathogens, e.g., methicillin-resistant Staphylococcus aureus (MRSA), and the recent novel coronavirus pandemic, there has been an ever-increasing need for novel antimicrobial therapeutics. In this work, we have developed support vector machine (SVM) models to predict antiviral peptide sequences. Oscillations in physicochemical properties in protein sequences have been shown to be predictive of protein structure and function, and in the presented work we have taken advantage of these known periodicities to develop models that predict antiviral peptide sequences. In developing the presented models, we first generated property factors by applying principal component analysis (PCA) to the AAindex dataset of 544 amino acid properties. We next converted peptide sequences into physicochemical vectors using 18 property factors resulting from the PCA. Fourier transforms were applied to the property factor vectors to measure the amplitude of the physicochemical oscillations, which served as the features to train our SVM models. To train and test the developed models we have used a publicly available database of antiviral peptides (http://crdd.osdd.net/servers/avppred/), and we have used cross-validation to train and tune models based on multiple training and testing sets. To further understand the physicochemical properties of antiviral peptides we have also applied a previously developed feature selection algorithm. Future work will be aimed at computationally designing novel antiviral therapeutics based on the developed machine learning models.
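The Fourier-amplitude feature step described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the single-property scale `TOY_SCALE`, the mean-centering step, and the function names are assumptions made for illustration, and the numbers in the scale are made up rather than taken from AAindex or the PCA property factors.

```python
import cmath

# Hypothetical one-property scale; illustrative numbers only, not AAindex values.
TOY_SCALE = {"A": 0.5, "K": -1.0, "L": 1.0, "E": -0.8}


def property_vector(peptide, scale):
    """Map each residue of the peptide to its value on one property scale."""
    return [scale[residue] for residue in peptide]


def dft_amplitudes(values):
    """Return amplitudes of the discrete Fourier transform for k = 0..n//2.

    The signal is mean-centered first (an assumption of this sketch), so the
    k = 0 amplitude is zero and the rest measure oscillation strength.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    amplitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        amplitudes.append(abs(coeff) / n)
    return amplitudes


# A strictly alternating peptide concentrates its amplitude at period 2.
features = dft_amplitudes(property_vector("LKLKLKLK", TOY_SCALE))
```

In a pipeline of the kind the abstract describes, one such amplitude vector would be computed per property factor and the results concatenated into the feature vector passed to the classifier.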
|
16
|
Integration of data analytics with cloud services for safer process systems, application examples and implementation challenges. J Loss Prev Process Ind 2020. [DOI: 10.1016/j.jlp.2020.104316] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/08/2023]
|
17
|
Beykal B, Onel M, Onel O, Pistikopoulos EN. A data-driven optimization algorithm for differential algebraic equations with numerical infeasibilities. AIChE J 2020; 66. [PMID: 32921798 DOI: 10.1002/aic.16657] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 02/06/2023]
Abstract
A Support Vector Machine (SVM)-based optimization framework is presented for the data-driven optimization of numerically infeasible Differential Algebraic Equations (DAEs) without the full discretization of the underlying first-principles model. By formulating the stability constraint of the numerical integration of a DAE system as a supervised classification problem, we are able to demonstrate that SVMs can accurately map the boundary of numerical infeasibility. The necessity of this data-driven approach is demonstrated on a 2-dimensional motivating example, where highly accurate SVM models are trained, validated, and tested using the data collected from the numerical integration of DAEs. Furthermore, this methodology is extended and tested for a multi-dimensional case study from reaction engineering (i.e., thermal cracking of natural gas liquids). The data-driven optimization of this complex case study is explored through integrating the SVM models with a constrained global grey-box optimization algorithm, namely the ARGONAUT framework.
Affiliation(s)
- Burcu Beykal
- Artie McFerrin Department of Chemical Engineering Texas A&M University College Station Texas USA
- Texas A&M Energy Institute, Texas A&M University College Station Texas USA
| | - Melis Onel
- Artie McFerrin Department of Chemical Engineering Texas A&M University College Station Texas USA
- Texas A&M Energy Institute, Texas A&M University College Station Texas USA
| | - Onur Onel
- Artie McFerrin Department of Chemical Engineering Texas A&M University College Station Texas USA
- Texas A&M Energy Institute, Texas A&M University College Station Texas USA
- Department of Chemical and Biological Engineering Princeton University Princeton New Jersey USA
| | - Efstratios N. Pistikopoulos
- Artie McFerrin Department of Chemical Engineering Texas A&M University College Station Texas USA
- Texas A&M Energy Institute, Texas A&M University College Station Texas USA
| |
|
18
|
Mukherjee R, Beykal B, Szafran AT, Onel M, Stossi F, Mancini MG, Lloyd D, Wright FA, Zhou L, Mancini MA, Pistikopoulos EN. Classification of estrogenic compounds by coupling high content analysis and machine learning algorithms. PLoS Comput Biol 2020; 16:e1008191. [PMID: 32970665 PMCID: PMC7538107 DOI: 10.1371/journal.pcbi.1008191] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/15/2020] [Revised: 10/06/2020] [Accepted: 07/25/2020] [Indexed: 12/28/2022] Open
Abstract
Environmental toxicants affect human health in various ways. Of the thousands of chemicals present in the environment, those with adverse effects on the endocrine system are referred to as endocrine-disrupting chemicals (EDCs). Here, we focused on a subclass of EDCs that impacts the estrogen receptor (ER), a pivotal transcriptional regulator in health and disease. Estrogenic activity of compounds can be measured by many in vitro or cell-based high throughput assays that record various endpoints from large pools of cells, and increasingly at the single-cell level. To simultaneously capture multiple mechanistic ER endpoints in individual cells that are affected by EDCs, we previously developed a sensitive high throughput/high content imaging assay that is based upon a stable cell line harboring a visible multicopy ER responsive transcription unit and expressing a green fluorescent protein (GFP) fusion of ER. High content analysis generates voluminous multiplex data comprised of minable features that describe numerous mechanistic endpoints. In this study, we present a machine learning pipeline for rapid, accurate, and sensitive assessment of the endocrine-disrupting potential of benchmark chemicals based on data generated from high content analysis. The multidimensional imaging data was used to train a classification model to ultimately predict the impact of unknown compounds on the ER, either as agonists or antagonists. To this end, both linear logistic regression and nonlinear Random Forest classifiers were benchmarked and evaluated for predicting the estrogenic activity of unknown compounds. Furthermore, through feature selection, data visualization, and model discrimination, the most informative features were identified for the classification of ER agonists/antagonists. 
The results of this data-driven study showed that highly accurate and generalized classification models with a minimum number of features can be constructed without loss of generality, where these machine learning models serve as a means for rapid mechanistic/phenotypic evaluation of the estrogenic potential of many chemicals. Chemical contaminants or toxicants pose environmental and health-related risks for exposure. The ability to rapidly understand their biological impact, specifically on a key modulator of important physiological and pathological states in the human body is essential for diagnosing and avoiding undesirable health outcomes during environmental emergencies. In this study, we use advanced data analytics for creating statistical models that can accurately predict the endocrinological activity of toxic chemicals based on high throughput/high content image analysis data. We focus on a subclass of chemicals that affect the estrogen receptor (ER), which is a pivotal transcriptional regulator in health and disease. The multidimensional imaging data of these benchmark chemicals are used to train a classification model to ultimately predict the impact of unknown compounds on the ER, either as agonists or antagonists. To this end, we evaluate linear and nonlinear classifiers for predicting the estrogenic activity of unknown compounds and use feature selection, data visualization, and model discrimination methodologies to identify the most informative features for the classification of ER agonists/antagonists.
Affiliation(s)
- Rajib Mukherjee
- Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America
| | - Burcu Beykal
- Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, United States of America
| | - Adam T. Szafran
- Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, United States of America
| | - Melis Onel
- Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, United States of America
| | - Fabio Stossi
- Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, United States of America
- GCC Center for Advanced Microscopy and Image Informatics, Houston, TX, United States of America
| | - Maureen G. Mancini
- Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, United States of America
- GCC Center for Advanced Microscopy and Image Informatics, Houston, TX, United States of America
| | - Dillon Lloyd
- Bioinformatics Research Center, Center for Human Health and the Environment, Department of Statistics, North Carolina State University, Raleigh, NC, United States of America
| | - Fred A. Wright
- Bioinformatics Research Center, Center for Human Health and the Environment, Department of Statistics, North Carolina State University, Raleigh, NC, United States of America
| | - Lan Zhou
- Department of Statistics, Texas A&M University, College Station, TX, United States of America
| | - Michael A. Mancini
- Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, United States of America
- GCC Center for Advanced Microscopy and Image Informatics, Houston, TX, United States of America
- Texas A&M University Institute for Bioscience and Technology, Houston, TX, United States of America
- Pharmacology and Chemical Genomics, Baylor College of Medicine, Houston, TX, United States of America
| | - Efstratios N. Pistikopoulos
- Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, United States of America
| |
|
19
|
|
20
|
Zhang Y, Han L, Zou L, Zhang M, Chi R. Development of an SVR model for microwave-assisted aqueous two-phase extraction of isoflavonoids from Radix Puerariae. CHEM ENG COMMUN 2020. [DOI: 10.1080/00986445.2020.1734578] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022]
Affiliation(s)
- Yuefei Zhang
- School of Chemistry and Environmental Engineering, Wuhan Institute of Technology, Wuhan, China
| | - Lei Han
- School of Electronic Information, Wuhan University, Wuhan, China
| | - Lian Zou
- School of Electronic Information, Wuhan University, Wuhan, China
| | - Mei Zhang
- School of Environmental Ecology and Biological Engineering, Wuhan Institute of Technology, Wuhan, China
| | - Ruan Chi
- Key Laboratory for Green Chemical Process of Ministry of Education, Wuhan Institute of Technology, Wuhan, China
| |
|
21
|
Marques CM, Moniz S, de Sousa JP, Barbosa-Povoa AP, Reklaitis G. Decision-support challenges in the chemical-pharmaceutical industry: Findings and future research directions. Comput Chem Eng 2020. [DOI: 10.1016/j.compchemeng.2019.106672] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/20/2022]
|
22
|
Onel M, Burnak B, Pistikopoulos EN. Integrated Data-Driven Process Monitoring and Explicit Fault-Tolerant Multiparametric Control. Ind Eng Chem Res 2020; 59:2291-2306. [PMID: 32549652 DOI: 10.1021/acs.iecr.9b04226] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/30/2022]
Abstract
We propose a novel active fault-tolerant control strategy that combines machine learning based process monitoring and explicit/multiparametric model predictive control (mp-MPC). The strategy features (i) data-driven fault detection and diagnosis models by using the support vector machine (SVM) algorithm, (ii) ranking via a nonlinear, kernel-dependent, SVM-based feature selection algorithm, (iii) data-driven regression models for fault magnitude estimation via the random forest algorithm, and (iv) a parametric optimization and control (PAROC) framework for the design of the explicit/multiparametric model predictive controller. The resulting explicit control strategies correspond to affine functions of the system states and the magnitude of the detected fault. A semibatch process, an example for penicillin production, is presented to demonstrate how the proposed framework ensures smart operation for which rapid switches between a priori computed explicit control action strategies are enabled by continuous process monitoring information.
Affiliation(s)
- Melis Onel
- † Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843, United States; ‡ Texas A&M Energy Institute, Texas A&M University, College Station, Texas 77843, United States
| | - Baris Burnak
- † Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843, United States; ‡ Texas A&M Energy Institute, Texas A&M University, College Station, Texas 77843, United States
| | - Efstratios N Pistikopoulos
- † Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843, United States; ‡ Texas A&M Energy Institute, Texas A&M University, College Station, Texas 77843, United States
| |
|
23
|
Quality-Relevant Monitoring of Batch Processes Based on Stochastic Programming with Multiple Output Modes. Processes (Basel) 2020. [DOI: 10.3390/pr8020164] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022] Open
Abstract
To implement the quality-relevant monitoring scheme for batch processes with multiple output modes, this paper presents a novel methodology based on stochastic programming. Bringing together tools from stochastic programming and ensemble learning, the developed methodology focuses on the robust monitoring of process quality-relevant variables by taking the stochastic nature of batch process parameters explicitly into consideration. To handle the problem of missing data and lack of historical batch data, a bagging approach is introduced to generate individual quality-relevant sub-datasets, which are used to construct the corresponding monitoring sub-models. For each model, stochastic programming is used to construct an optimal quality trajectory, which is regarded as the reference for online quality monitoring. Then, for each sub-model, a corresponding control limit is obtained by computing historical residuals between the actual output and the optimal trajectory. For online monitoring, the current sample is examined by all sub-models, and whether the monitoring statistic exceeds the control limits is recorded for further analysis. The final step is ensemble learning via Bayesian fusion strategy, which is under the probabilistic framework. The implementation and effectiveness of the developed methodology are demonstrated through two case studies, including a numerical example, and a simulated fed-batch penicillin fermentation process.
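The bagging and control-limit steps in this abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's method: the residuals are synthetic Gaussian draws rather than deviations from a stochastic-programming trajectory, the control limit is taken as an empirical 99% quantile (the abstract does not say how the limit is computed), and the final step is a plain alarm count rather than Bayesian fusion.

```python
import random


def bootstrap(data, rng):
    """Draw one bagging sub-dataset: sample with replacement, same size."""
    return [rng.choice(data) for _ in data]


def control_limit(residuals, quantile=0.99):
    """Empirical quantile of absolute residuals (assumed limit definition)."""
    ordered = sorted(abs(r) for r in residuals)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def alarms(residual, limits):
    """Count how many sub-models flag the current residual."""
    return sum(abs(residual) > limit for limit in limits)


rng = random.Random(0)
# Synthetic stand-in for historical residuals (actual output minus reference).
historical_residuals = [rng.gauss(0.0, 1.0) for _ in range(500)]
# One control limit per bagged sub-dataset, i.e. per monitoring sub-model.
limits = [control_limit(bootstrap(historical_residuals, rng)) for _ in range(10)]

n_flagged_large = alarms(100.0, limits)  # far outside every limit
n_flagged_zero = alarms(0.0, limits)     # inside every limit
```

The per-sub-model alarm record is the kind of input a probabilistic fusion step would then combine into a single monitoring decision.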
|
24
|
Jiang DN, Li W. Permissible Area Analyses of Measurement Errors with Required Fault Diagnosability Performance. SENSORS 2019; 19:4880. [PMID: 31717426 PMCID: PMC6891792 DOI: 10.3390/s19224880] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/14/2019] [Revised: 11/04/2019] [Accepted: 11/07/2019] [Indexed: 11/24/2022]
Abstract
Fault diagnosability is the basis of fault diagnosis. Fault diagnosability evaluation refers to whether there is enough measurable information in the system to support the rapid and reliable detection of a fault. However, due to unavoidable measurement errors in a system, a quantitative evaluation index of system fault diagnosability is inadequate. In order to overcome the adverse effects of measurement errors, improve the accuracy of the quantitative evaluation of fault diagnosability, and improve the safety level of the system, a method for a permissible area analysis of measurement errors for a quantitative evaluation of fault diagnosability is proposed in this paper. Firstly, so that the residuals obey a normal distribution, a design method for the permissible area of measurement errors based on the Kullback–Leibler divergence (KLD) is given. Secondly, two key problems in calculating the KLD are solved by sparse kernel density estimation and the Monte Carlo method. Finally, the feasibility and validity of the method are analyzed through a case study.
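As a rough illustration of the Monte Carlo step mentioned in the abstract, the sketch below estimates KL(p || q) by averaging log p(x) - log q(x) over samples drawn from p. It is not the authors' implementation: it assumes both densities are known univariate Gaussians instead of sparse kernel density estimates, and the function names (`gaussian_logpdf`, `mc_kld`, `exact_kld`) are hypothetical.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    """Log-density of a univariate normal distribution."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def mc_kld(mean_p, std_p, mean_q, std_q, n_samples=20000, seed=0):
    """Monte Carlo estimate of KL(p || q).

    Assumption: p and q are Gaussians with known parameters; in the paper
    the densities would come from sparse kernel density estimation.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mean_p, std_p)  # draw x ~ p
        total += gaussian_logpdf(x, mean_p, std_p) - gaussian_logpdf(x, mean_q, std_q)
    return total / n_samples


def exact_kld(mean_p, std_p, mean_q, std_q):
    """Closed-form KL divergence between two univariate normals (for checking)."""
    return (math.log(std_q / std_p)
            + (std_p ** 2 + (mean_p - mean_q) ** 2) / (2.0 * std_q ** 2)
            - 0.5)
```

Comparing `mc_kld(0.0, 1.0, 1.0, 1.0)` with `exact_kld(0.0, 1.0, 1.0, 1.0)` gives a quick sanity check of the estimator before swapping in estimated densities.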
Affiliation(s)
- Dong-Nian Jiang
- College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, China;
- Key Laboratory of Gansu Advanced Control for Industrial Processes, Lanzhou 730050, China
- Correspondence:
| | - Wei Li
- College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, China;
- Key Laboratory of Gansu Advanced Control for Industrial Processes, Lanzhou 730050, China
| |
|
25
|
Onel M, Beykal B, Ferguson K, Chiu WA, McDonald TJ, Zhou L, House JS, Wright FA, Sheen DA, Rusyn I, Pistikopoulos EN. Grouping of complex substances using analytical chemistry data: A framework for quantitative evaluation and visualization. PLoS One 2019; 14:e0223517. [PMID: 31600275 PMCID: PMC6786635 DOI: 10.1371/journal.pone.0223517] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 05/09/2019] [Accepted: 09/23/2019] [Indexed: 02/01/2023]
Abstract
A detailed characterization of the chemical composition of complex substances, such as products of petroleum refining and environmental mixtures, is greatly needed in exposure assessment and manufacturing. The inherent complexity and variability in the composition of complex substances obfuscate the choices for their detailed analytical characterization. Yet, in lieu of exact chemical composition of complex substances, evaluation of the degree of similarity is a sensible path toward decision-making in environmental health regulations. Grouping of similar complex substances is a challenge that can be addressed via advanced analytical methods and streamlined data analysis and visualization techniques. Here, we propose a framework with unsupervised and supervised analyses to optimally group complex substances based on their analytical features. We test two data sets of complex oil-derived substances. The first data set is from gas chromatography-mass spectrometry (GC-MS) analysis of 20 Standard Reference Materials representing crude oils and oil refining products. The second data set consists of 15 samples of various gas oils analyzed using three analytical techniques: GC-MS, GC×GC-flame ionization detection (FID), and ion mobility spectrometry-mass spectrometry (IM-MS). We use hierarchical clustering using Pearson correlation as a similarity metric for the unsupervised analysis and build classification models using the Random Forest algorithm for the supervised analysis. We present a quantitative comparative assessment of clustering results via Fowlkes-Mallows index, and classification results via model accuracies in predicting the group of an unknown complex substance. We demonstrate the effect of (i) different grouping methodologies, (ii) data set size, and (iii) dimensionality reduction on the grouping quality, and (iv) different analytical techniques on the characterization of the complex substances. 
While the complexity and variability in chemical composition are an inherent feature of complex substances, we demonstrate how the choices of the data analysis and visualization methods can impact the communication of their characteristics to delineate sufficient similarity.
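The Fowlkes-Mallows index named in the abstract can be computed by counting sample pairs that two groupings agree or disagree on. The sketch below is a minimal pure-Python version for illustration only; the function name `fowlkes_mallows` and the label lists are hypothetical and are not taken from the paper's data or code.

```python
import math
from itertools import combinations


def fowlkes_mallows(labels_a, labels_b):
    """Fowlkes-Mallows index between two groupings of the same samples.

    FM = TP / sqrt((TP + FP) * (TP + FN)), where pairs are counted as
    TP: grouped together in both, FN: together only in labels_a,
    FP: together only in labels_b.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same samples")
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            tp += 1
        elif same_a:
            fn += 1
        elif same_b:
            fp += 1
    denom = math.sqrt((tp + fp) * (tp + fn))
    # Convention chosen here: return 0.0 when no pair is grouped together.
    return tp / denom if denom > 0 else 0.0
```

The index depends only on which samples share a group, so relabeling the groups (for example swapping labels 0 and 1) leaves the score unchanged.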
Affiliation(s)
- Melis Onel
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, United States of America
- Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America
| | - Burcu Beykal
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, United States of America
- Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America
| | - Kyle Ferguson
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, United States of America
| | - Weihsueh A. Chiu
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, United States of America
| | - Thomas J. McDonald
- Department of Environmental and Occupational Health, Texas A&M University, College Station, TX, United States of America
| | - Lan Zhou
- Department of Statistics, Texas A&M University, College Station, TX, United States of America
| | - John S. House
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, United States of America
| | - Fred A. Wright
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, United States of America
- Departments of Statistics and Biological Sciences, North Carolina State University, Raleigh, NC, United States of America
| | - David A. Sheen
- Chemical Sciences Division, National Institute of Standards and Technology, Gaithersburg, MD, United States of America
| | - Ivan Rusyn
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, United States of America
| | - Efstratios N. Pistikopoulos
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, United States of America
- Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America
| |
|
26
|
Development of the Texas A&M Superfund Research Program Computational Platform for Data Integration, Visualization, and Analysis. Comput Aided Chem Eng 2019; 46:967-972. [PMID: 31612156 DOI: 10.1016/b978-0-12-818634-3.50162-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 04/22/2023]
Abstract
The National Institute of Environmental Health Sciences (NIEHS) Superfund Research Program (SRP) aims to support university-based multidisciplinary research on human health and environmental issues related to hazardous substances and pollutants. The Texas A&M Superfund Research Program comprehensively evaluates the complexities of hazardous chemical mixtures and their potential adverse health impacts due to exposure through a number of multidisciplinary projects and cores. One of the essential components of the Texas A&M Superfund Research Center is the Data Science Core, which serves as the basis for translating the data produced by the multidisciplinary research projects into useful knowledge for the community via data collection, quality control, analysis, and model generation. In this work, we demonstrate the Texas A&M Superfund Research Program computational platform, which houses and integrates large-scale, diverse datasets generated across the Center, provides a basic visualization service to facilitate interpretation, monitors data quality, and implements a variety of state-of-the-art statistical analyses for model/tool development. The platform aims to facilitate effective integration and collaboration across the Center and acts as an enabler for the dissemination of comprehensive ad hoc tools and models developed to address the environmental and health effects of chemical mixture exposure during environmental emergency-related contamination events.
|
27
|
Semi-infinite programming for global guarantees of robust fault detection and isolation in safety-critical systems. Comput Chem Eng 2019. [DOI: 10.1016/j.compchemeng.2019.04.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/09/2023]
|
28
|
Nentwich C, Engell S. Surrogate modeling of phase equilibrium calculations using adaptive sampling. Comput Chem Eng 2019. [DOI: 10.1016/j.compchemeng.2019.04.006] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 11/30/2022]
|
29
|
Dynamic process fault detection and diagnosis based on a combined approach of hidden Markov and Bayesian network model. Chem Eng Sci 2019. [DOI: 10.1016/j.ces.2019.01.060] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Indexed: 11/15/2022]
|
30
|
Rendall R, Chiang LH, Reis MS. Data-driven methods for batch data analysis – A critical overview and mapping on the complexity scale. Comput Chem Eng 2019. [DOI: 10.1016/j.compchemeng.2019.01.014] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 11/29/2022]
|
31
|
Casola G, Siegmund C, Mattern M, Sugiyama H. Data mining algorithm for pre-processing biopharmaceutical drug product manufacturing records. Comput Chem Eng 2019. [DOI: 10.1016/j.compchemeng.2018.12.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/08/2023]
|
32
|
Onel M, Kieslich CA, Pistikopoulos EN. A Nonlinear Support Vector Machine-Based Feature Selection Approach for Fault Detection and Diagnosis: Application to the Tennessee Eastman Process. AIChE J 2019; 65:992-1005. [PMID: 32377021 DOI: 10.1002/aic.16497] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Indexed: 02/05/2023]
Abstract
In this article, we present (1) a feature selection algorithm based on nonlinear support vector machine (SVM) for fault detection and diagnosis in continuous processes and (2) results for the Tennessee Eastman benchmark process. The presented feature selection algorithm is derived from the sensitivity analysis of the dual C-SVM objective function. This enables simultaneous modeling and feature selection paving the way for simultaneous fault detection and diagnosis, where feature ranking guides fault diagnosis. We train fault-specific two-class SVM models to detect faulty operations, while using the feature selection algorithm to improve the accuracy and perform the fault diagnosis. Our results show that the developed SVM models outperform the available ones in the literature both in terms of detection accuracy and latency. Moreover, it is shown that the loss of information is minimized with the use of feature selection techniques compared to feature extraction techniques such as principal component analysis (PCA). This further facilitates a more accurate interpretation of the results.
Affiliation(s)
- Melis Onel
- Artie McFerrin Dept. of Chemical Engineering, Texas A&M University, College Station, Texas 77843
- Texas A&M Energy Institute, Texas A&M University, College Station, Texas 77843
| | - Chris A. Kieslich
- Artie McFerrin Dept. of Chemical Engineering, Texas A&M University, College Station, Texas 77843
- Texas A&M Energy Institute, Texas A&M University, College Station, Texas 77843
- Coulter Dept. of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia
| | - Efstratios N. Pistikopoulos
- Artie McFerrin Dept. of Chemical Engineering, Texas A&M University, College Station, Texas 77843
- Texas A&M Energy Institute, Texas A&M University, College Station, Texas 77843
| |
|
33
|
|