1
|
Borkenhagen LK, Allen MW, Runstadler JA. Influenza virus genotype to phenotype predictions through machine learning: a systematic review. Emerg Microbes Infect 2021; 10:1896-1907. [PMID: 34498543 PMCID: PMC8462836 DOI: 10.1080/22221751.2021.1978824] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/22/2022]
Abstract
Background: There is great interest in understanding the viral genomic predictors of phenotypic traits that allow influenza A viruses to adapt to or become more virulent in different hosts. Machine learning techniques have demonstrated promise in addressing this critical need for other pathogens because the underlying algorithms are especially well equipped to uncover complex patterns in large datasets and produce generalizable predictions for new data. As the body of research where these techniques are applied for influenza A virus phenotype prediction continues to grow, it is useful to consider the strengths and weaknesses of these approaches to understand what has prevented these models from seeing widespread use by surveillance laboratories and to identify gaps that are underexplored with this technology. Methods and Results: We present a systematic review of English literature published through 15 April 2021 of studies employing machine learning methods to generate predictions of influenza A virus phenotypes from genomic or proteomic input. Forty-nine studies were included in this review, spanning the topics of host discrimination, human adaptability, subtype and clade assignment, pandemic lineage assignment, characteristics of infection, and antiviral drug resistance. Conclusions: Our findings suggest that biases in model design and a dearth of wet laboratory follow-up may explain why these models often go underused. We, therefore, offer guidance to overcome these limitations, aid in improving predictive models of previously studied influenza A virus phenotypes, and extend those models to unexplored phenotypes in the ultimate pursuit of tools to enable the characterization of virus isolates across surveillance laboratories.
Affiliation(s)
- Laura K Borkenhagen
- Department of Infectious Disease and Global Health, Cummings School of Veterinary Medicine, Tufts University, North Grafton, MA, USA
- Martin W Allen
- Department of Computer Science, School of Engineering, Tufts University, Medford, MA, USA
- Jonathan A Runstadler
- Department of Infectious Disease and Global Health, Cummings School of Veterinary Medicine, Tufts University, North Grafton, MA, USA
|
2
|
Orr A, Wang M, Beykal B, Ganesh HS, Hearon SE, Pistikopoulos EN, Phillips TD, Tamamis P. Combining Experimental Isotherms, Minimalistic Simulations, and a Model to Understand and Predict Chemical Adsorption onto Montmorillonite Clays. ACS Omega 2021; 6:14090-14103. [PMID: 34124432 PMCID: PMC8190805 DOI: 10.1021/acsomega.1c00481] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/26/2021] [Accepted: 05/11/2021] [Indexed: 05/05/2023]
Abstract
An attractive approach to minimize human and animal exposures to toxic environmental contaminants is the use of safe and effective sorbent materials to sequester them. Montmorillonite clays have been shown to tightly bind diverse toxic chemicals. Due to their promise as sorbents to mitigate chemical exposures, it is important to understand their function and rapidly screen and predict optimal clay-chemical combinations for further testing. We derived adsorption free-energy values for a structurally and physicochemically diverse set of toxic chemicals using experimental adsorption isotherms performed in the current and previous studies. We studied the diverse set of chemicals using minimalistic MD simulations and showed that their interaction energies with calcium montmorillonite clays calculated using simulation snapshots in combination with their net charge and their corresponding solvent's dielectric constant can be used as inputs to a minimalistic model to predict adsorption free energies in agreement with experiments. Additionally, experiments and computations were used to reveal structural and physicochemical properties associated with chemicals that can be adsorbed to calcium montmorillonite clay. These properties include positively charged groups, phosphine groups, halide-rich moieties, hydrogen bond donor/acceptors, and large, rigid structures. The combined experimental and computational approaches used in this study highlight the importance and potential applicability of analogous methods to study and design novel advanced sorbent systems in the future, broadening their applicability for environmental contaminants.
Affiliation(s)
- Asuka A. Orr
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
- Texas A&M Energy Institute, Texas A&M University, College Station, Texas 77843-3372, United States
- Meichen Wang
- Veterinary Integrative Biosciences Department, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, Texas 77843-3122, United States
- Burcu Beykal
- Texas A&M Energy Institute, Texas A&M University, College Station, Texas 77843-3372, United States
- Hari S. Ganesh
- Texas A&M Energy Institute, Texas A&M University, College Station, Texas 77843-3372, United States
- Sara E. Hearon
- Veterinary Integrative Biosciences Department, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, Texas 77843-3122, United States
- Efstratios N. Pistikopoulos
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
- Texas A&M Energy Institute, Texas A&M University, College Station, Texas 77843-3372, United States
- Timothy D. Phillips
- Veterinary Integrative Biosciences Department, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, Texas 77843-3122, United States
- Phanourios Tamamis
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
- Texas A&M Energy Institute, Texas A&M University, College Station, Texas 77843-3372, United States
- Department of Materials Science and Engineering, Texas A&M University, College Station, Texas 77843-3003, United States
|
3
|
Kieslich CA, Alimirzaei F, Song H, Do M, Hall P. Data-driven prediction of antiviral peptides based on periodicities of amino acid properties. 31st European Symposium on Computer Aided Process Engineering 2021. [PMCID: PMC8286203 DOI: 10.1016/b978-0-323-88506-5.50312-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 02/05/2023]
Abstract
With the emergence of new pathogens, e.g., methicillin-resistant Staphylococcus aureus (MRSA), and the recent novel coronavirus pandemic, there has been an ever-increasing need for novel antimicrobial therapeutics. In this work, we have developed support vector machine (SVM) models to predict antiviral peptide sequences. Oscillations in physicochemical properties in protein sequences have been shown to be predictive of protein structure and function, and in the presented work we have taken advantage of these known periodicities to develop models that predict antiviral peptide sequences. In developing the presented models, we first generated property factors by applying principal component analysis (PCA) to the AAindex dataset of 544 amino acid properties. We next converted peptide sequences into physicochemical vectors using 18 property factors resulting from the PCA. Fourier transforms were applied to the property factor vectors to measure the amplitude of the physicochemical oscillations, which served as the features to train our SVM models. To train and test the developed models we have used a publicly available database of antiviral peptides (http://crdd.osdd.net/servers/avppred/), and we have used cross-validation to train and tune models based on multiple training and testing sets. To further understand the physicochemical properties of antiviral peptides we have also applied a previously developed feature selection algorithm. Future work will be aimed at computationally designing novel antiviral therapeutics based on the developed machine learning models.
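The feature-extraction step described in this abstract (encode a peptide as a numeric property vector, then take Fourier amplitudes) can be illustrated with a minimal sketch. It is not the authors' implementation: `TOY_SCALE` is a made-up single property scale standing in for the 18 PCA-derived property factors, and the function names `encode` and `dft_amplitudes` are hypothetical.

```python
import cmath
import math

# Hypothetical one-dimensional property scale (assumption for illustration;
# NOT one of the AAindex-derived property factors used in the paper).
TOY_SCALE = {"A": 0.5, "L": 1.0, "K": -1.0, "E": -0.5}


def encode(peptide, scale):
    """Map each residue to its property value, giving a numeric signal."""
    return [scale[residue] for residue in peptide]


def dft_amplitudes(signal):
    """Amplitudes of the discrete Fourier transform of a mean-centered signal.

    Returns one amplitude per frequency index k = 0 .. n // 2.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    amplitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        amplitudes.append(abs(coeff) / n)
    return amplitudes


# A peptide that alternates between two residue types every position.
peptide = "LKLKLKLK"
features = dft_amplitudes(encode(peptide, TOY_SCALE))
print(features)
```

For this alternating toy peptide the amplitude concentrates at the highest frequency index, which corresponds to a period of two residues. In a workflow like the one described, such amplitude vectors (one per property factor) would be concatenated and passed to a classifier.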
|
4
|
Gartner MJ, Gorry PR, Tumpach C, Zhou J, Dantanarayana A, Chang JJ, Angelovich TA, Ellenberg P, Laumaea AE, Nonyane M, Moore PL, Lewin SR, Churchill MJ, Flynn JK, Roche M. Longitudinal analysis of subtype C envelope tropism for memory CD4+ T cell subsets over the first 3 years of untreated HIV-1 infection. Retrovirology 2020; 17:24. [PMID: 32762760 PMCID: PMC7409430 DOI: 10.1186/s12977-020-00532-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/02/2020] [Accepted: 07/27/2020] [Indexed: 12/14/2022]
Abstract
Background: HIV-1 infects a wide range of CD4+ T cells with different phenotypic properties and differing expression levels of entry coreceptors. We sought to determine the viral tropism of subtype C (C-HIV) Envelope (Env) clones for different CD4+ T cell subsets and whether tropism changes during acute to chronic disease progression. HIV-1 envs were amplified from the plasma of five C-HIV infected women from three untreated time points: less than 2 months, 1 year and 3 years post-infection. Pseudoviruses were generated from Env clones and phenotyped for coreceptor usage, and CD4+ T cell subset tropism was measured by flow cytometry. Results: A total of 50 C-HIV envs were cloned and screened for functionality in pseudovirus infection assays. Phylogenetic and variable region characteristic analysis demonstrated evolution in envs between time points. We found 45 pseudoviruses were functional and all used CCR5 to mediate entry into NP2/CD4/CCR5 cells. In vitro infection assays showed transitional memory (TM) and effector memory (EM) CD4+ T cells were more frequently infected (median: 46% and 25% of total infected CD4+ T cells respectively) than naïve, stem cell memory, central memory and terminally differentiated cells. This was not due to these subsets contributing a higher proportion of the CD4+ T cell pool; rather, these subsets were more susceptible to infection (median: 5.38% EM and 2.15% TM cells infected), consistent with heightened CCR5 expression on EM and TM cells. No inter- or intra-participant changes in CD4+ T cell subset tropism were observed across the three time points. Conclusions: CD4+ T cell subsets that express more CCR5 were more susceptible to infection with C-HIV Envs, suggesting that these may be the major cellular targets during the first 3 years of infection. Moreover, we found that viral tropism for different CD4+ T cell subsets in vitro did not change between Envs cloned from acute to chronic disease stages. Finally, central memory, naïve and stem cell memory CD4+ T cell subsets were susceptible to infection, albeit inefficiently, by Envs from all time points, suggesting that direct infection of these cells may help establish the latent reservoir early in infection.
Affiliation(s)
- Matthew J Gartner
- School of Health and Biomedical Sciences, RMIT University, Bundoora, Melbourne, VIC, Australia
- The Peter Doherty Institute for Infection and Immunity, University of Melbourne and Royal Melbourne Hospital, Melbourne, VIC, Australia
- Paul R Gorry
- School of Health and Biomedical Sciences, RMIT University, Bundoora, Melbourne, VIC, Australia
- Carolin Tumpach
- The Peter Doherty Institute for Infection and Immunity, University of Melbourne and Royal Melbourne Hospital, Melbourne, VIC, Australia
- Jingling Zhou
- School of Health and Biomedical Sciences, RMIT University, Bundoora, Melbourne, VIC, Australia
- Ashanti Dantanarayana
- The Peter Doherty Institute for Infection and Immunity, University of Melbourne and Royal Melbourne Hospital, Melbourne, VIC, Australia
- J Judy Chang
- The Peter Doherty Institute for Infection and Immunity, University of Melbourne and Royal Melbourne Hospital, Melbourne, VIC, Australia
- Thomas A Angelovich
- School of Health and Biomedical Sciences, RMIT University, Bundoora, Melbourne, VIC, Australia
- Life Sciences, Burnet Institute, Melbourne, VIC, Australia
- Paula Ellenberg
- The Peter Doherty Institute for Infection and Immunity, University of Melbourne and Royal Melbourne Hospital, Melbourne, VIC, Australia
- Annemarie E Laumaea
- School of Health and Biomedical Sciences, RMIT University, Bundoora, Melbourne, VIC, Australia
- Département de Microbiologie, Infectiologie et Immunologie, Université de Montréal, Montreal, QC, Canada
- Molati Nonyane
- Centre for HIV and STIs, National Institute for Communicable Diseases (NICD) of the National Health Laboratory Service (NHLS), Johannesburg, South Africa
- Penny L Moore
- Centre for HIV and STIs, National Institute for Communicable Diseases (NICD) of the National Health Laboratory Service (NHLS), Johannesburg, South Africa
- Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Centre for the AIDS Programme of Research in South Africa (CAPRISA), University of KwaZulu-Natal, Durban, South Africa
- Sharon R Lewin
- The Peter Doherty Institute for Infection and Immunity, University of Melbourne and Royal Melbourne Hospital, Melbourne, VIC, Australia
- Department of Infectious Diseases, Monash University and Alfred Hospital, Melbourne, Australia
- Melissa J Churchill
- School of Health and Biomedical Sciences, RMIT University, Bundoora, Melbourne, VIC, Australia
- Jacqueline K Flynn
- School of Health and Biomedical Sciences, RMIT University, Bundoora, Melbourne, VIC, Australia
- The Peter Doherty Institute for Infection and Immunity, University of Melbourne and Royal Melbourne Hospital, Melbourne, VIC, Australia
- School of Clinical Sciences, Monash University, Melbourne, VIC, Australia
- Michael Roche
- School of Health and Biomedical Sciences, RMIT University, Bundoora, Melbourne, VIC, Australia
- The Peter Doherty Institute for Infection and Immunity, University of Melbourne and Royal Melbourne Hospital, Melbourne, VIC, Australia
|
5
|
Distefano M, Lanzarotti E, Fernández MF, Mangano A, Martí M, Aulicino P. Identification of novel molecular determinants of co-receptor usage in HIV-1 subtype F V3 envelope sequences. Sci Rep 2020; 10:12583. [PMID: 32724045 PMCID: PMC7387458 DOI: 10.1038/s41598-020-69408-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/16/2019] [Accepted: 06/30/2020] [Indexed: 02/05/2023]
Abstract
HIV-1 determinants of coreceptor usage within the gp120 V3 loop have been broadly studied over the past years. This information has led to the development of state-of-the-art bioinformatic tools that are useful to predict co-receptor usage based on the V3 loop sequence, mainly of subtypes B, C and A. However, these methods show poor performance for subtype F V3 loops, which are found in an increasing number of HIV-1 strains worldwide. In the present work we investigated determinants of viral tropism in the understudied subtype F by looking at genotypic and structural information of coreceptor:V3 loop interactions in a novel group of 40 subtype F V3 loops obtained from HIV-1 strains phenotypically characterized either as syncytium inducing or non-syncytium inducing by the MT-2 assay. We provide novel information about estimated interaction energies between a set of V3 loops with known tropism in subtype F, which allowed us to improve predictions of the coreceptor usage for this subtype. Understanding genetic and structural features underlying HIV coreceptor usage across different subtypes is relevant for the rational design of preventive and therapeutic strategies aimed at limiting the HIV-1 epidemic worldwide.
Affiliation(s)
- Maximiliano Distefano
- Laboratorio de Biología Celular Y Retrovirus-CONICET, Hospital de Pediatría "J.P. Garrahan", Ciudad Autónoma de Buenos Aires, Argentina
- Esteban Lanzarotti
- Departamento de Computación, Facultad de Ciencias Exactas Y Naturales, Universidad de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina
- Departamento de Química Biológica, Facultad de Ciencias Exactas Y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Ciudad Autónoma de Buenos Aires, Argentina
- María Florencia Fernández
- Laboratorio de Biología Celular Y Retrovirus-CONICET, Hospital de Pediatría "J.P. Garrahan", Ciudad Autónoma de Buenos Aires, Argentina
- Andrea Mangano
- Laboratorio de Biología Celular Y Retrovirus-CONICET, Hospital de Pediatría "J.P. Garrahan", Ciudad Autónoma de Buenos Aires, Argentina
- Marcelo Martí
- Departamento de Química Biológica, Facultad de Ciencias Exactas Y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Ciudad Autónoma de Buenos Aires, Argentina
- Paula Aulicino
- Laboratorio de Biología Celular Y Retrovirus-CONICET, Hospital de Pediatría "J.P. Garrahan", Ciudad Autónoma de Buenos Aires, Argentina
|
6
|
HIV-1 Coreceptor Usage and Variable Loop Contact Impact V3 Loop Broadly Neutralizing Antibody Susceptibility. J Virol 2020; 94:JVI.01604-19. [PMID: 31694950 DOI: 10.1128/jvi.01604-19] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/19/2019] [Accepted: 10/22/2019] [Indexed: 12/24/2022]
Abstract
In clinical trials, HIV-1 broadly neutralizing antibodies (bnAbs) effectively lower plasma viremia and delay virus reemergence. The presence of less neutralization-susceptible strains prior to treatment decreases the efficacy of these antibody-based treatments, but neutralization sensitivity often cannot be predicted by sequence analysis alone. We found that phenotypically confirmed CXCR4-utilizing strains are less neutralization sensitive, especially to variable loop 3 (V3 loop)-directed bnAbs, than exclusively CCR5-utilizing strains in some, but not all, cases. Homology modeling suggested that the primary V3 loop bnAb epitope is equally accessible among CCR5- and CXCR4-using strains, although variants that exclusively use CXCR4 have V3 loop protrusions that interfere with CCR5 receptor interactions. Homology modeling also showed that among some, but not all, envelopes with decreased neutralization sensitivity, V1 loop orientation interfered with V3 loop-directed bnAb binding. Thus, there are likely different structural reasons for the coreceptor usage restriction and the different bnAb susceptibilities. Importantly, we show that individuals harboring envelopes with higher likelihood of using CXCR4 or greater predicted V1 loop interference have faster virus rebound and a lower maximum decrease in plasma viremia, respectively, after treatment with a V3 loop bnAb. Knowledge of receptor usage and homology models may be useful in developing future algorithms that predict treatment efficacy with V3 loop bnAbs. IMPORTANCE: The efficacy of HIV-1 broadly neutralizing antibody (bnAb) therapies may be compromised by the preexistence of less susceptible variants. Sequence-based methods are needed to predict pretreatment variants' neutralization sensitivities. HIV-1 strains that exclusively use the CXCR4 receptor rather than the CCR5 receptor are less neutralization susceptible, especially to variable loop 3 (V3 loop) bnAbs in some, but not all, instances. While the inability to utilize the CCR5 receptor maps to a predicted protrusion in the envelope V3 loop, this viral determinant does not directly influence V3 loop bnAb sensitivity. Homology modeling predicts that contact between the envelope V1 loop and the antibody impacts V3 loop bnAb susceptibility in some cases. Among pretreatment envelopes, increased probability of using CXCR4 and greater predicted V1 interference are associated with faster virus rebound and a smaller decrease in the plasma virus level, respectively, after V3 loop bnAb treatment. Receptor usage information and homology models may be useful for predicting V3 loop bnAb therapy efficacy.
|
7
|
Onel M, Beykal B, Ferguson K, Chiu WA, McDonald TJ, Zhou L, House JS, Wright FA, Sheen DA, Rusyn I, Pistikopoulos EN. Grouping of complex substances using analytical chemistry data: A framework for quantitative evaluation and visualization. PLoS One 2019; 14:e0223517. [PMID: 31600275 PMCID: PMC6786635 DOI: 10.1371/journal.pone.0223517] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 05/09/2019] [Accepted: 09/23/2019] [Indexed: 02/01/2023]
Abstract
A detailed characterization of the chemical composition of complex substances, such as products of petroleum refining and environmental mixtures, is greatly needed in exposure assessment and manufacturing. The inherent complexity and variability in the composition of complex substances obfuscate the choices for their detailed analytical characterization. Yet, in lieu of exact chemical composition of complex substances, evaluation of the degree of similarity is a sensible path toward decision-making in environmental health regulations. Grouping of similar complex substances is a challenge that can be addressed via advanced analytical methods and streamlined data analysis and visualization techniques. Here, we propose a framework with unsupervised and supervised analyses to optimally group complex substances based on their analytical features. We test two data sets of complex oil-derived substances. The first data set is from gas chromatography-mass spectrometry (GC-MS) analysis of 20 Standard Reference Materials representing crude oils and oil refining products. The second data set consists of 15 samples of various gas oils analyzed using three analytical techniques: GC-MS, GC×GC-flame ionization detection (FID), and ion mobility spectrometry-mass spectrometry (IM-MS). We use hierarchical clustering using Pearson correlation as a similarity metric for the unsupervised analysis and build classification models using the Random Forest algorithm for the supervised analysis. We present a quantitative comparative assessment of clustering results via Fowlkes-Mallows index, and classification results via model accuracies in predicting the group of an unknown complex substance. We demonstrate the effect of (i) different grouping methodologies, (ii) data set size, and (iii) dimensionality reduction on the grouping quality, and (iv) different analytical techniques on the characterization of the complex substances. While the complexity and variability in chemical composition are an inherent feature of complex substances, we demonstrate how the choices of the data analysis and visualization methods can impact the communication of their characteristics to delineate sufficient similarity.
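Two quantities named in this abstract, Pearson correlation as a similarity metric and the Fowlkes-Mallows index for comparing a clustering against reference groups, have standard textbook definitions that can be sketched in a few lines. The code below uses those general definitions only; the function names and the toy label vectors are illustrative assumptions and do not come from the paper.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def fowlkes_mallows(reference, predicted):
    """Fowlkes-Mallows index: TP / sqrt((TP + FP) * (TP + FN)) over sample pairs.

    TP: pairs grouped together in both labelings.
    FP: pairs grouped together only in `predicted`.
    FN: pairs grouped together only in `reference`.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(reference)), 2):
        same_ref = reference[i] == reference[j]
        same_pred = predicted[i] == predicted[j]
        if same_ref and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_ref:
            fn += 1
    if tp == 0:
        return 0.0
    return tp / math.sqrt((tp + fp) * (tp + fn))


# Toy labels (hypothetical): four samples, two reference groups.
reference_groups = [0, 0, 1, 1]
print(fowlkes_mallows(reference_groups, [1, 1, 0, 0]))  # same partition, renamed labels
print(fowlkes_mallows(reference_groups, [0, 1, 0, 1]))  # no pair agrees
```

Because the index is computed over pairs of samples, it does not depend on how cluster labels are named, which makes it convenient for scoring hierarchical clustering output against known groups.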
Affiliation(s)
- Melis Onel
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, United States of America
- Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America
- Burcu Beykal
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, United States of America
- Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America
- Kyle Ferguson
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, United States of America
- Weihsueh A. Chiu
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, United States of America
- Thomas J. McDonald
- Department of Environmental and Occupational Health, Texas A&M University, College Station, TX, United States of America
- Lan Zhou
- Department of Statistics, Texas A&M University, College Station, TX, United States of America
- John S. House
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, United States of America
- Fred A. Wright
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, United States of America
- Departments of Statistics and Biological Sciences, North Carolina State University, Raleigh, NC, United States of America
- David A. Sheen
- Chemical Sciences Division, National Institute of Standards and Technology, Gaithersburg, MD, United States of America
- Ivan Rusyn
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, United States of America
- Efstratios N. Pistikopoulos
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, United States of America
- Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America
|
8
|
Affiliation(s)
- Jianyuan Zhai
- School of Chemical & Biomolecular Engineering Georgia Institute of Technology Atlanta Georgia
- Fani Boukouvala
- School of Chemical & Biomolecular Engineering Georgia Institute of Technology Atlanta Georgia
|
9
|
Onel M, Kieslich CA, Pistikopoulos EN. A Nonlinear Support Vector Machine-Based Feature Selection Approach for Fault Detection and Diagnosis: Application to the Tennessee Eastman Process. AIChE J 2019; 65:992-1005. [PMID: 32377021 DOI: 10.1002/aic.16497] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Indexed: 02/05/2023]
Abstract
In this article, we present (1) a feature selection algorithm based on nonlinear support vector machine (SVM) for fault detection and diagnosis in continuous processes and (2) results for the Tennessee Eastman benchmark process. The presented feature selection algorithm is derived from the sensitivity analysis of the dual C-SVM objective function. This enables simultaneous modeling and feature selection paving the way for simultaneous fault detection and diagnosis, where feature ranking guides fault diagnosis. We train fault-specific two-class SVM models to detect faulty operations, while using the feature selection algorithm to improve the accuracy and perform the fault diagnosis. Our results show that the developed SVM models outperform the available ones in the literature both in terms of detection accuracy and latency. Moreover, it is shown that the loss of information is minimized with the use of feature selection techniques compared to feature extraction techniques such as principal component analysis (PCA). This further facilitates a more accurate interpretation of the results.
Affiliation(s)
- Melis Onel
- Artie McFerrin Dept. of Chemical Engineering Texas A&M University College Station, Texas 77843
- Texas A&M Energy Institute Texas A&M University College Station, Texas 77843
- Chris A. Kieslich
- Artie McFerrin Dept. of Chemical Engineering Texas A&M University College Station, Texas 77843
- Texas A&M Energy Institute Texas A&M University College Station, Texas 77843
- Coulter Dept. of Biomedical Engineering Georgia Institute of Technology Atlanta Georgia
- Efstratios N. Pistikopoulos
- Artie McFerrin Dept. of Chemical Engineering Texas A&M University College Station, Texas 77843
- Texas A&M Energy Institute Texas A&M University College Station, Texas 77843
|
10
|
|
11
|
Onel M, Kieslich CA, Guzman YA, Pistikopoulos EN. Simultaneous Fault Detection and Identification in Continuous Processes via nonlinear Support Vector Machine based Feature Selection. International Symposium on Process Systems Engineering 2018; 44:2077-2082. [PMID: 30534633 DOI: 10.1016/b978-0-444-64241-7.50341-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/13/2022]
Abstract
Rapid detection and identification of process faults in industrial applications is crucial to sustain a safe and profitable operation. Today, advances in sensor technologies have facilitated the collection of large amounts of chemical process data in real time, which has broadened the use of data-driven process monitoring techniques via machine learning and multivariate statistical analysis. One well-known machine learning technique is the Support Vector Machine (SVM), which allows the use of high dimensional feature sets for learning problems such as classification and regression. In this paper, we present the application of a novel nonlinear (kernel-dependent) SVM-based feature selection algorithm to process monitoring and fault detection of continuous processes. The developed methodology is derived from sensitivity analysis of the dual SVM objective and utilizes existing and novel greedy algorithms to rank features, which in turn guides fault diagnosis. Specifically, we train fault-specific two-class SVM models to detect faulty operations, while using the feature selection algorithm to improve the accuracy of the fault detection models and perform fault diagnosis. We present results for the Tennessee Eastman process as a case study and compare our approach to existing approaches for fault detection, diagnosis and identification.
Affiliation(s)
- Melis Onel
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, 77843, USA
- Texas A&M Energy Institute, Texas A&M University, College Station, TX, 77843, USA
- Chris A Kieslich
- Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
- Yannis A Guzman
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, 77843, USA
- Texas A&M Energy Institute, Texas A&M University, College Station, TX, 77843, USA
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, 08544, USA
- Efstratios N Pistikopoulos
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX, 77843, USA
- Texas A&M Energy Institute, Texas A&M University, College Station, TX, 77843, USA
|
12
|
Onel M, Kieslich CA, Guzman YA, Floudas CA, Pistikopoulos EN. Reprint of: Big data approach to batch process monitoring: Simultaneous fault detection and diagnosis using nonlinear support vector machine-based feature selection. Comput Chem Eng 2018. [DOI: 10.1016/j.compchemeng.2018.10.016] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 11/16/2022]
|
13
|
Model order reduction of nonlinear parabolic PDE systems with moving boundaries using sparse proper orthogonal decomposition: Application to hydraulic fracturing. Comput Chem Eng 2018. [DOI: 10.1016/j.compchemeng.2018.02.004] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Indexed: 12/29/2022]
|
14
|
Onel M, Kieslich CA, Guzman YA, Floudas CA, Pistikopoulos EN. Big Data Approach to Batch Process Monitoring: Simultaneous Fault Detection and Diagnosis Using Nonlinear Support Vector Machine-based Feature Selection. Comput Chem Eng 2018; 115:46-63. [PMID: 30386002 DOI: 10.1016/j.compchemeng.2018.03.025] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Indexed: 10/17/2022]
Abstract
This paper presents a novel data-driven framework for process monitoring in batch processes, a critical task in industry for attaining safe operability and minimizing loss of productivity and profit. We exploit high-dimensional process data with a nonlinear Support Vector Machine-based feature selection algorithm, aiming to retrieve the most informative process measurements for accurate and simultaneous fault detection and diagnosis. The proposed framework is applied to an extensive benchmark dataset that includes process data describing 22,200 batches with 15 faults. We train fault- and time-specific models on the prealigned batch data trajectories via three distinct time horizon approaches: one-step rolling, two-step rolling, and evolving, which vary the amount of data incorporated during modeling. The results show that the two-step rolling and evolving time horizon approaches outperform the one-step rolling approach. Regardless of the approach, the proposed framework provides a promising decision support tool for online simultaneous fault detection and diagnosis in batch processes.
Affiliation(s)
- Melis Onel
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77843, USA; Texas A&M Energy Institute, Texas A&M University, College Station, TX 77843, USA
| | - Chris A Kieslich
- Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA; Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77843, USA; Texas A&M Energy Institute, Texas A&M University, College Station, TX 77843, USA
| | - Yannis A Guzman
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA; Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77843, USA; Texas A&M Energy Institute, Texas A&M University, College Station, TX 77843, USA
| | - Christodoulos A Floudas
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77843, USA; Texas A&M Energy Institute, Texas A&M University, College Station, TX 77843, USA
| | - Efstratios N Pistikopoulos
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77843, USA; Texas A&M Energy Institute, Texas A&M University, College Station, TX 77843, USA
| |
|
15
|
Stylianou A, Gkretsi V, Patrickios CS, Stylianopoulos T. Exploring the Nano-Surface of Collagenous and Other Fibrotic Tissues with AFM. Methods Mol Biol 2017; 1627:453-489. [PMID: 28836219 DOI: 10.1007/978-1-4939-7113-8_29] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 12/19/2022]
Abstract
The atomic force microscope (AFM) is a powerful and invaluable tool for imaging and probing the mechanical properties of biological samples at the nanometric scale. The importance of nano-scale characterization and nanomechanics of soft biological tissues is becoming widely appreciated, and AFM offers unique advantages in this direction. In this chapter, we describe the procedure to collect data sets (imaging and mechanical property measurements) of collagen gels and tumor tissues. We provide step-by-step instructions throughout the procedure, from sample preparation to cantilever calibration, data acquisition, analysis, and visualization, using two commercial AFM systems (PicoPlus and Cypher ES) and software that either accompanies the AFM systems or is available as freeware (WSxM, AtomicJ). Our protocols are written specifically for these two systems and the mentioned software; however, most of the general concepts can be readily translated to other AFM systems and software.
Affiliation(s)
- Andreas Stylianou
- Cancer Biophysics Laboratory, Department of Mechanical and Manufacturing Engineering, University of Cyprus, Nicosia, Cyprus
| | - Vasiliki Gkretsi
- Cancer Biophysics Laboratory, Department of Mechanical and Manufacturing Engineering, University of Cyprus, Nicosia, Cyprus
| | | | - Triantafyllos Stylianopoulos
- Cancer Biophysics Laboratory, Department of Mechanical and Manufacturing Engineering, University of Cyprus, Nicosia, Cyprus
| |
|
16
|
Orr AA, Wördehoff MM, Hoyer W, Tamamis P. Uncovering the Binding and Specificity of β-Wrapins for Amyloid-β and α-Synuclein. J Phys Chem B 2016; 120:12781-12794. [DOI: 10.1021/acs.jpcb.6b08485] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 12/30/2022]
Affiliation(s)
- Asuka A. Orr
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
| | - Michael M. Wördehoff
- Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, 40204 Düsseldorf, Germany
| | - Wolfgang Hoyer
- Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, 40204 Düsseldorf, Germany
- Institute of Structural Biochemistry (ICS-6), Research Centre Jülich, 52425 Jülich, Germany
| | - Phanourios Tamamis
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, Texas 77843-3122, United States
| |
|
17
|
Kieslich CA, Smadbeck J, Khoury GA, Floudas CA. conSSert: Consensus SVM Model for Accurate Prediction of Ordered Secondary Structure. J Chem Inf Model 2016; 56:455-61. [DOI: 10.1021/acs.jcim.5b00566] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 11/28/2022]
Affiliation(s)
| | - James Smadbeck
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
| | - George A. Khoury
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
| | | |
|