1
Rosenberg MC, Proctor JL, Steele KM. Quantifying changes in individual-specific template-based representations of center-of-mass dynamics during walking with ankle exoskeletons using Hybrid-SINDy. Sci Rep 2024; 14:1031. [PMID: 38200078 PMCID: PMC10781730 DOI: 10.1038/s41598-023-50999-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 12/28/2023] [Indexed: 01/12/2024] Open
Abstract
Ankle exoskeletons alter whole-body walking mechanics, energetics, and stability by altering center-of-mass (CoM) motion. Controlling the dynamics governing CoM motion is, therefore, critical for maintaining efficient and stable gait. However, how CoM dynamics change with ankle exoskeletons is unknown, and how to optimally model individual-specific CoM dynamics, especially in individuals with neurological injuries, remains a challenge. Here, we evaluated individual-specific changes in CoM dynamics in unimpaired adults and one individual with post-stroke hemiparesis while walking in shoes-only and with zero-stiffness and high-stiffness passive ankle exoskeletons. To identify optimal sets of physically interpretable mechanisms describing CoM dynamics, termed template signatures, we leveraged hybrid sparse identification of nonlinear dynamics (Hybrid-SINDy), an equation-free data-driven method for inferring sparse hybrid dynamics from a library of candidate functional forms. In unimpaired adults, Hybrid-SINDy automatically identified spring-loaded inverted pendulum-like template signatures, which did not change with exoskeletons (p > 0.16), except for small changes in leg resting length (p < 0.001). Conversely, post-stroke paretic-leg rotary stiffness mechanisms increased by 37-50% with zero-stiffness exoskeletons. While unimpaired CoM dynamics appear robust to passive ankle exoskeletons, how neurological injuries alter exoskeleton impacts on CoM dynamics merits further investigation. Our findings support Hybrid-SINDy's potential to discover mechanisms describing individual-specific CoM dynamics with assistive devices.
Affiliation(s)
- Michael C Rosenberg
  - Department of Mechanical Engineering, University of Washington, Seattle, USA
- Joshua L Proctor
  - Department of Mechanical Engineering, University of Washington, Seattle, USA
  - Department of Applied Mathematics, University of Washington, Seattle, USA
- Katherine M Steele
  - Department of Mechanical Engineering, University of Washington, Seattle, USA

2
Haring M, Grotli EI, Riemer-Sorensen S, Seel K, Hanssen KG. A Levenberg-Marquardt Algorithm for Sparse Identification of Dynamical Systems. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:9323-9336. [PMID: 35316196 DOI: 10.1109/tnnls.2022.3157963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Low complexity of a system model is essential for its use in real-time applications. However, sparse identification methods commonly have stringent requirements that exclude them from being applied in an industrial setting. In this article, we introduce a flexible method for the sparse identification of dynamical systems described by ordinary differential equations. Our method relieves many of the requirements imposed by other methods that relate to the structure of the model and the dataset, such as fixed sampling rates, full state measurements, and linearity of the model. The Levenberg-Marquardt algorithm is used to solve the identification problem. We show that the Levenberg-Marquardt algorithm can be written in a form that enables parallel computing, which greatly diminishes the time required to solve the identification problem. An efficient backward elimination strategy is presented to construct a lean system model.

3
Goyal P, Benner P. Neural ordinary differential equations with irregular and noisy data. ROYAL SOCIETY OPEN SCIENCE 2023; 10:221475. [PMID: 37476515 PMCID: PMC10354476 DOI: 10.1098/rsos.221475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2022] [Accepted: 06/23/2023] [Indexed: 07/22/2023]
Abstract
Measurement noise is an integral part of collecting data of a physical process. Thus, noise removal is necessary to draw conclusions from these data, and it often becomes essential to construct dynamical models using these data. We discuss a methodology to learn differential equation(s) using noisy and irregularly sampled measurements. In our methodology, the main innovation can be seen in the integration of deep neural networks with the neural ordinary differential equations (ODEs) approach. Precisely, we aim at learning a neural network that provides (approximately) an implicit representation of the data and an additional neural network that models the vector fields of the dependent variables. We combine these two networks by constraints using neural ODEs. The proposed framework to learn a model describing the vector field is highly effective under noisy measurements. The approach can handle scenarios where dependent variables are unavailable at the same temporal grid. Moreover, a particular structure, e.g. second order with respect to time, can easily be incorporated. We demonstrate the effectiveness of the proposed method for learning models using data obtained from various differential equations and present a comparison with the neural ODE method that does not make any special treatment to noise. Additionally, we discuss an ensemble approach to improve the performance of the proposed approach further.
Affiliation(s)
- Pawan Goyal
  - Max Planck Institute for Dynamics of Complex Technical Systems, Standtorstrasse 1, 39106 Magdeburg, Germany
- Peter Benner
  - Max Planck Institute for Dynamics of Complex Technical Systems, Standtorstrasse 1, 39106 Magdeburg, Germany

4
Saha E, Ho LST, Tran G. SPADE4: Sparsity and Delay Embedding Based Forecasting of Epidemics. Bull Math Biol 2023; 85:71. [PMID: 37335437 DOI: 10.1007/s11538-023-01174-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Accepted: 05/27/2023] [Indexed: 06/21/2023]
Abstract
Predicting the evolution of diseases is challenging, especially when data are scarce and incomplete. The most popular tools for modelling and predicting infectious disease epidemics are compartmental models. They stratify the population into compartments according to health status and model the dynamics of these compartments using dynamical systems. However, these predefined systems may not capture the true dynamics of the epidemic due to the complexity of disease transmission and human interactions. To overcome this drawback, we propose Sparsity and Delay Embedding based Forecasting (SPADE4) for predicting epidemics. SPADE4 predicts the future trajectory of an observable variable without knowledge of the other variables or the underlying system. We use a random features model with sparse regression to handle the data scarcity issue and employ Takens' delay embedding theorem to capture the nature of the underlying system from the observed variable. We show that our approach outperforms compartmental models when applied to both simulated and real data.
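As a rough illustration of the delay-embedding idea mentioned in this abstract, the sketch below builds delay vectors from a single observed series. It is not the SPADE4 implementation: the function name `delay_embed`, the parameters `dim` and `tau`, and the forward-delay convention are assumptions made for this example only.

```python
def delay_embed(series, dim, tau=1):
    """Return delay vectors [s[i], s[i + tau], ..., s[i + (dim - 1) * tau]].

    Hypothetical helper for illustration; `dim` is the embedding
    dimension and `tau` the delay in samples.
    """
    n_vectors = len(series) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series is too short for this dim and tau")
    return [
        [series[i + j * tau] for j in range(dim)]
        for i in range(n_vectors)
    ]


# Toy observable: six samples, embedded in 3 dimensions with delay 2.
vectors = delay_embed([0, 1, 2, 3, 4, 5], dim=3, tau=2)
print(vectors)  # [[0, 2, 4], [1, 3, 5]]
```

Each row can then serve as the input to a regression model that predicts the next value of the observable, which is the general role delay vectors play in forecasting from a single measured variable.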
Affiliation(s)
- Esha Saha
  - Department of Applied Mathematics, University of Waterloo, Waterloo, Canada
- Lam Si Tung Ho
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada
- Giang Tran
  - Department of Applied Mathematics, University of Waterloo, Waterloo, Canada

5
Prokop B, Gelens L, Pelz PF, Friesen J. Challenges in identifying simple pattern-forming mechanisms in the development of settlements using demographic data. Phys Rev E 2023; 107:064305. [PMID: 37464706 DOI: 10.1103/physreve.107.064305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2021] [Accepted: 05/09/2023] [Indexed: 07/20/2023]
Abstract
The rapid increase of population and settlement structures in the Global South during recent decades has motivated the development of suitable models to describe their formation and evolution. Such settlement formation has been previously suggested to be dynamically driven by simple pattern-forming mechanisms. Here, we explore the use of a data-driven white-box approach, called SINDy, to discover differential equation models directly from available spatiotemporal demographic data for three representative regions of the Global South. We show that the current resolution and observation time of the available data are insufficient to uncover relevant pattern-forming mechanisms in settlement development. Using synthetic data generated with a generic pattern-forming model, the Allen-Cahn equation, we characterize what the requirements are for spatial and temporal resolution, as well as observation time, to successfully identify possible model system equations. Overall, the study provides a theoretical framework for the analysis of large-scale geographical and/or ecological systems, and it motivates further improvements in optimization approaches and data collection.
Affiliation(s)
- Bartosz Prokop
  - Laboratory of Dynamics in Biological Systems, Department of Cellular and Molecular Medicine, KU Leuven, Leuven 3000, Belgium
- Lendert Gelens
  - Laboratory of Dynamics in Biological Systems, Department of Cellular and Molecular Medicine, KU Leuven, Leuven 3000, Belgium
- Peter F Pelz
  - Chair of Fluid Systems, TU Darmstadt, 64287 Darmstadt, Germany
- John Friesen
  - Chair of Fluid Systems, TU Darmstadt, 64287 Darmstadt, Germany

6
Yamagami M, Peterson LN, Howell D, Roth E, Burden SA. Effect of Handedness on Learned Controllers and Sensorimotor Noise During Trajectory-Tracking. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:2039-2050. [PMID: 34587106 DOI: 10.1109/tcyb.2021.3110187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
In human-in-the-loop control systems, operators can learn to manually control dynamic machines with either hand using a combination of reactive (feedback) and predictive (feedforward) control. This article studies the effect of handedness on learned controllers and performance during a trajectory-tracking task. In an experiment with 18 participants, subjects perform an assay of unimanual trajectory-tracking and disturbance-rejection tasks through second-order machine dynamics, first with one hand then the other. To assess how hand preference (or dominance) affects learned controllers, we extend, validate, and apply a nonparametric modeling method to estimate the concurrent feedback and feedforward controllers. We find that performance improves because feedback adapts, regardless of the hand used. We do not detect statistically significant differences in performance or learned controllers between hands. Adaptation to reject disturbances arising exogenously (i.e., applied by the experimenter) and endogenously (i.e., generated by sensorimotor noise) explains observed performance improvements.

7
Supekar R, Song B, Hastewell A, Choi GPT, Mietke A, Dunkel J. Learning hydrodynamic equations for active matter from particle simulations and experiments. Proc Natl Acad Sci U S A 2023; 120:e2206994120. [PMID: 36763535 PMCID: PMC9963139 DOI: 10.1073/pnas.2206994120] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/25/2022] [Accepted: 01/12/2023] [Indexed: 02/11/2023] Open
Abstract
Recent advances in high-resolution imaging techniques and particle-based simulation methods have enabled the precise microscopic characterization of collective dynamics in various biological and engineered active matter systems. In parallel, data-driven algorithms for learning interpretable continuum models have shown promising potential for the recovery of underlying partial differential equations (PDEs) from continuum simulation data. By contrast, learning macroscopic hydrodynamic equations for active matter directly from experiments or particle simulations remains a major challenge, especially when continuum models are not known a priori or analytic coarse graining fails, as often is the case for nondilute and heterogeneous systems. Here, we present a framework that leverages spectral basis representations and sparse regression algorithms to discover PDE models from microscopic simulation and experimental data, while incorporating the relevant physical symmetries. We illustrate the practical potential through a range of applications, from a chiral active particle model mimicking nonidentical swimming cells to recent microroller experiments and schooling fish. In all these cases, our scheme learns hydrodynamic equations that reproduce the self-organized collective dynamics observed in the simulations and experiments. This inference framework makes it possible to measure a large number of hydrodynamic parameters in parallel and directly from video data.
Affiliation(s)
- Rohit Supekar
  - Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139
- Boya Song
  - Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139
- Alasdair Hastewell
  - Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139
- Gary P. T. Choi
  - Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139
- Alexander Mietke
  - Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139
- Jörn Dunkel
  - Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139

8
A computational framework for physics-informed symbolic regression with straightforward integration of domain knowledge. Sci Rep 2023; 13:1249. [PMID: 36690644 PMCID: PMC9870915 DOI: 10.1038/s41598-023-28328-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Accepted: 01/17/2023] [Indexed: 01/24/2023] Open
Abstract
Discovering a meaningful symbolic expression that explains experimental data is a fundamental challenge in many scientific fields. We present a novel, open-source computational framework called Scientist-Machine Equation Detector (SciMED), which integrates scientific discipline wisdom in a scientist-in-the-loop approach with state-of-the-art symbolic regression (SR) methods. SciMED combines a wrapper selection method based on a genetic algorithm with automatic machine learning and two levels of SR methods. We test SciMED on five configurations of a settling sphere, with and without aerodynamic non-linear drag force, and with excessive noise in the measurements. We show that SciMED is sufficiently robust to discover the correct physically meaningful symbolic expressions from the data, and demonstrate how the integration of domain knowledge enhances its performance. Our results indicate better performance on these tasks than the state-of-the-art SR software packages, even in cases where no knowledge is integrated. Moreover, we demonstrate how SciMED can alert the user about possible missing features, unlike the majority of current SR systems.

9
Precision Calorimeter Model Development: Generative Design Approach. Processes (Basel) 2023. [DOI: 10.3390/pr11010152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023] Open
Abstract
In a wide range of applications, heating or cooling systems provide not only temperature changes, but also small temperature gradients in a sample or industrial facility. Although a conventional proportional-integral-derivative (PID) controller usually solves the problem, it is not optimal because it does not use information about the main sources of change—the current power of the heater or cooler. The quality of control can be significantly improved by including a model of thermal processes in the control algorithm. Although the temperature distribution in the device can be calculated from a full-fledged 3D model based on partial differential equations, this approach has at least two drawbacks: the presence of many difficult-to-determine parameters and excessive complexity for control tasks. The development of a simplified mathematical model, free from these shortcomings, makes it possible to significantly improve the quality of control. The development of such a model using generative design techniques is considered as an example for a precision adiabatic calorimeter designed to measure the specific heat capacity of solids. The proposed approach, which preserves the physical meaning of the equations, allows for not only significantly improving the consistency between the calculation and experimental data, but also improving the understanding of real processes in the installation.

10
Jiang F, Du L, Yang F, Deng ZC. Regularized least absolute deviation-based sparse identification of dynamical systems. CHAOS (WOODBURY, N.Y.) 2023; 33:013103. [PMID: 36725653 DOI: 10.1063/5.0130526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2022] [Accepted: 12/06/2022] [Indexed: 06/18/2023]
Abstract
This work develops a regularized least absolute deviation-based sparse identification of dynamics (RLAD-SID) method to address outlier problems in the classical metric-based loss function and the sparsity constraint framework. Our method uses absolute deviation loss as a substitute for Euclidean loss. Moreover, because both the newly proposed loss function and the regularization term are non-smooth, a corresponding computationally efficient optimization algorithm is derived on the basis of the alternating direction method of multipliers. Numerical experiments are performed to evaluate the effectiveness of RLAD-SID using several exemplary nonlinear dynamical systems, such as the van der Pol equation, the Lorenz system, and the 1D discrete logistic map. Furthermore, detailed numerical comparisons are provided with other existing methods in metric-based sparse regression. Numerical results demonstrate that (1) RLAD-SID shows significant robustness toward a large outlier and (2) RLAD-SID can be seen as a particular metric-based sparse regression strategy that exhibits the effectiveness of the metric-based sparse regression framework for solving outlier problems in dynamical system identification.
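To make the robustness argument concrete, here is a deliberately tiny sketch comparing a squared (Euclidean) loss with an absolute deviation loss when fitting a single constant to data that contain one large outlier. It is not the RLAD-SID algorithm and uses no ADMM; it relies only on the standard facts that the mean minimizes the squared loss and the median minimizes the absolute loss. The data values and helper names are made up for illustration.

```python
from statistics import mean, median


def l1_loss(c, ys):
    """Sum of absolute deviations of ys from the constant c."""
    return sum(abs(y - c) for y in ys)


def l2_loss(c, ys):
    """Sum of squared deviations of ys from the constant c."""
    return sum((y - c) ** 2 for y in ys)


# Four measurements near 1.0 plus one large outlier (illustrative data).
data = [1.0, 1.1, 0.9, 1.0, 50.0]

l2_fit = mean(data)    # minimizer of the squared loss
l1_fit = median(data)  # minimizer of the absolute deviation loss

print(f"least squares fit:            {l2_fit:.2f}")  # 10.80
print(f"least absolute deviation fit: {l1_fit:.2f}")  # 1.00
```

The single outlier drags the least squares estimate far from the bulk of the data, while the absolute deviation estimate stays at 1.0, which is the kind of behaviour the abstract attributes to the absolute deviation loss.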
Affiliation(s)
- Feng Jiang
  - MIIT Key Laboratory of Dynamics and Control of Complex Systems, Northwestern Polytechnical University, Xi'an 710072, China
- Lin Du
  - MIIT Key Laboratory of Dynamics and Control of Complex Systems, Northwestern Polytechnical University, Xi'an 710072, China
- Fan Yang
  - MIIT Key Laboratory of Dynamics and Control of Complex Systems, Northwestern Polytechnical University, Xi'an 710072, China
- Zi-Chen Deng
  - MIIT Key Laboratory of Dynamics and Control of Complex Systems, Northwestern Polytechnical University, Xi'an 710072, China

11
Johnston ST, Faria M. Equation learning to identify nano-engineered particle-cell interactions: an interpretable machine learning approach. NANOSCALE 2022; 14:16502-16515. [PMID: 36314284 DOI: 10.1039/d2nr04668g] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/16/2023]
Abstract
Designing nano-engineered particles capable of the delivery of therapeutic and diagnostic agents to a specific target remains a significant challenge. Understanding how interactions between particles and cells are impacted by the physicochemical properties of the particle will help inform rational design choices. Mathematical and computational techniques allow for details regarding particle-cell interactions to be isolated from the interwoven set of biological, chemical, and physical phenomena involved in the particle delivery process. Here we present a machine learning framework capable of elucidating particle-cell interactions from experimental data. This framework employs a data-driven modelling approach, augmented by established biological knowledge. Crucially, the model of particle-cell interactions learned by the framework can be interpreted and analysed, in contrast to the 'black box' models inherent to other machine learning approaches. We apply the framework to association data for thirty different particle-cell pairs. This library of data contains both adherent and suspension cell lines, as well as a diverse collection of particles. We consider hyperbranched polymer and poly(methacrylic acid) particles, from 6 nm to 1032 nm in diameter, with small molecule, monoclonal antibody, and peptide surface functionalisations. Despite the diverse nature of the experiments, the learned models of particle-cell interactions for each particle-cell pair are remarkably consistent: out of 2048 potential models, only four unique models are learned. The models reveal that nonlinear saturation effects are a key feature governing particle-cell interactions. Further, the framework provides robust estimates of particle performance, which facilitates quantitative evaluation of particle design choices.
Affiliation(s)
- Stuart T Johnston
  - School of Mathematics and Statistics, The University of Melbourne, Victoria, Australia
- Matthew Faria
  - Department of Biomedical Engineering, The University of Melbourne, Victoria, Australia

12
Cárdenas SD, Reznik CJ, Ranaweera R, Song F, Chung CH, Fertig EJ, Gevertz JL. Model-informed experimental design recommendations for distinguishing intrinsic and acquired targeted therapeutic resistance in head and neck cancer. NPJ Syst Biol Appl 2022; 8:32. [PMID: 36075912 PMCID: PMC9458753 DOI: 10.1038/s41540-022-00244-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/18/2022] [Accepted: 08/05/2022] [Indexed: 11/09/2022] Open
Abstract
The promise of precision medicine has been limited by the pervasive resistance to many targeted therapies for cancer. Inferring the timing (i.e., pre-existing or acquired) and mechanism (i.e., drug-induced) of such resistance is crucial for designing effective new therapeutics. This paper studies cetuximab resistance in head and neck squamous cell carcinoma (HNSCC) using tumor volume data obtained from patient-derived tumor xenografts. We ask if resistance mechanisms can be determined from this data alone, and if not, what data would be needed to deduce the underlying mode(s) of resistance. To answer these questions, we propose a family of mathematical models, with each member of the family assuming a different timing and mechanism of resistance. We present a method for fitting these models to individual volumetric data, and utilize model selection and parameter sensitivity analyses to ask: which member(s) of the family of models best describes HNSCC response to cetuximab, and what does that tell us about the timing and mechanisms driving resistance? We find that along with time-course volumetric data to a single dose of cetuximab, the initial resistance fraction and, in some instances, dose escalation volumetric data are required to distinguish among the family of models and thereby infer the mechanisms of resistance. These findings can inform future experimental design so that we can best leverage the synergy of wet laboratory experimentation and mathematical modeling in the study of novel targeted cancer therapeutics.
Affiliation(s)
- Santiago D Cárdenas
  - Department of Mathematics and Statistics, The College of New Jersey, Ewing, NJ, USA
- Constance J Reznik
  - Department of Mathematics and Statistics, The College of New Jersey, Ewing, NJ, USA; Datacor, Inc., Florham Park, NJ, USA
- Ruchira Ranaweera
  - Department of Head and Neck-Endocrine Oncology, Moffitt Cancer Center, Tampa, FL, USA
- Feifei Song
  - Department of Head and Neck-Endocrine Oncology, Moffitt Cancer Center, Tampa, FL, USA
- Christine H Chung
  - Department of Head and Neck-Endocrine Oncology, Moffitt Cancer Center, Tampa, FL, USA
- Elana J Fertig
  - Convergence Institute, Department of Oncology, Department of Biomedical Engineering, Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA
- Jana L Gevertz
  - Department of Mathematics and Statistics, The College of New Jersey, Ewing, NJ, USA

13
Lau MSY, Becker A, Madden W, Waller LA, Metcalf CJE, Grenfell BT. Comparing and linking machine learning and semi-mechanistic models for the predictability of endemic measles dynamics. PLoS Comput Biol 2022; 18:e1010251. [PMID: 36074763 PMCID: PMC9455846 DOI: 10.1371/journal.pcbi.1010251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Accepted: 08/02/2022] [Indexed: 11/29/2022] Open
Abstract
Measles is one of the best-documented and most-mechanistically-studied non-linear infectious disease dynamical systems. However, systematic investigations into the comparative performance of traditional mechanistic models and machine learning approaches in forecasting the transmission dynamics of this pathogen are still rare. Here, we compare one of the most widely used semi-mechanistic models for measles (TSIR) with a commonly used machine learning approach (LASSO), comparing performance and limits in predicting short to long term outbreak trajectories and seasonality for both regular and less regular measles outbreaks in England and Wales (E&W) and the United States. First, our results indicate that the proposed LASSO model can efficiently use data from multiple major cities and achieve similar short-to-medium term forecasting performance to semi-mechanistic models for E&W epidemics. Second, interestingly, the LASSO model also captures annual to biennial bifurcation of measles epidemics in E&W caused by susceptible response to the late 1940s baby boom. LASSO may also outperform TSIR for predicting less-regular dynamics such as those observed in major cities in the US between 1932–45. Although both approaches capture short-term forecasts, accuracy suffers for both methods as we attempt longer-term predictions in highly irregular, post-vaccination outbreaks in E&W. Finally, we illustrate that the LASSO model can both qualitatively and quantitatively reconstruct mechanistic assumptions, notably susceptible dynamics, in the TSIR model. Our results characterize the limits of predictability of infectious disease dynamics for strongly immunizing pathogens with both mechanistic and machine learning models, and identify connections between these two approaches.

Machine learning techniques in infectious disease modeling have grown in popularity in recent years. However, systematic investigations into the comparative performance of these approaches with traditional mechanistic models are still rare. In this paper, we compare one of the most widely used semi-mechanistic models for measles (TSIR) with a commonly used machine learning approach (LASSO), comparing performance and limits in predicting short to long term outbreaks of measles, one of the best-documented and most-mechanistically-studied non-linear infectious disease dynamical systems. Our results show that in general the LASSO outperforms TSIR for predicting less-regular dynamics, and it can achieve similar performance in other scenarios when compared to the TSIR. The LASSO also has the advantage of not requiring explicit demographic data in model training. Finally, we identify connections between these two approaches and show that the LASSO model can both qualitatively and quantitatively reconstruct mechanistic assumptions, notably susceptible dynamics, in the TSIR model.
Affiliation(s)
- Max S. Y. Lau
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, United States of America
- Alex Becker
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, United States of America
- Wyatt Madden
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, United States of America
- Lance A. Waller
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, United States of America
- C. Jessica E. Metcalf
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, United States of America
- Bryan T. Grenfell
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, United States of America

14
Naozuka GT, Rocha HL, Silva RS, Almeida RC. SINDy-SA framework: enhancing nonlinear system identification with sensitivity analysis. NONLINEAR DYNAMICS 2022; 110:2589-2609. [PMID: 36060282 PMCID: PMC9424817 DOI: 10.1007/s11071-022-07755-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Accepted: 07/25/2022] [Indexed: 06/15/2023]
Abstract
Machine learning methods have revolutionized studies in several areas of knowledge, helping to understand and extract information from experimental data. Recently, these data-driven methods have also been used to discover structures of mathematical models. The sparse identification of nonlinear dynamics (SINDy) method has been proposed with the aim of identifying nonlinear dynamical systems, assuming that the equations have only a few important terms that govern the dynamics. By defining a library of possible terms, the SINDy approach solves a sparse regression problem by eliminating terms whose coefficients are smaller than a threshold. However, the choice of this threshold is decisive for the correct identification of the model structure. In this work, we build on the SINDy method by integrating it with a global sensitivity analysis (SA) technique that allows terms to be hierarchized according to their importance in relation to the desired quantity of interest, thus circumventing the need to define the SINDy threshold. The proposed SINDy-SA framework also includes the formulation of different experimental settings, recalibration of each identified model, and the use of model selection techniques to select the best and most parsimonious model. We investigate the use of the proposed SINDy-SA framework in a variety of applications. We also compare the results against the original SINDy method. The results demonstrate that the SINDy-SA framework is a promising methodology to accurately identify interpretable data-driven models. Supplementary information: The online version contains supplementary material available at 10.1007/s11071-022-07755-2.
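The thresholded sparse regression that this abstract attributes to the original SINDy method can be sketched as follows. This is a minimal illustration using sequentially thresholded least squares on synthetic data, not the SINDy-SA framework and not any particular library's API. The function name `stlsq`, the candidate library, the noise level, and the threshold of 0.1 are all assumptions chosen for the example.

```python
import numpy as np


def stlsq(theta, dxdt, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares for one state derivative.

    theta: (n_samples, n_terms) library of candidate terms.
    dxdt:  (n_samples,) measured derivative.
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0              # eliminate terms below the threshold
        keep = ~small
        if not keep.any():
            break
        # Refit using only the surviving terms.
        xi[keep] = np.linalg.lstsq(theta[:, keep], dxdt, rcond=None)[0]
    return xi


# Synthetic system (an assumption for this sketch): dx/dt = -2 x + 0.5 x^3.
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 200)
dxdt = -2.0 * x + 0.5 * x**3 + 0.05 * rng.standard_normal(x.size)

# Candidate library: [1, x, x^2, x^3].
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])
xi = stlsq(theta, dxdt, threshold=0.1)
print(xi)  # coefficients for [1, x, x^2, x^3]
```

With this setup the constant and quadratic coefficients fall below the threshold and are set to zero, while the linear and cubic terms survive. Raising the threshold above 0.5 would also eliminate the cubic term, which illustrates why the abstract calls the threshold choice decisive.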
Affiliation(s)
- Heber L. Rocha
  - Department of Intelligent Systems Engineering, Indiana University, Bloomington, IN USA
- Renato S. Silva
  - Laboratório Nacional de Computação Científica, Petrópolis, RJ Brazil
- Regina C. Almeida
  - Laboratório Nacional de Computação Científica, Petrópolis, RJ Brazil
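The thresholded sparse regression that the SINDy-SA abstract above refers to can be sketched as follows. This is a minimal illustration of sequentially thresholded least squares on a toy two-state linear system, not the SINDy-SA framework itself. The function name `stlsq`, the toy dynamics, the candidate library, and both threshold values are assumptions chosen for illustration.

```python
import numpy as np

def stlsq(theta, dxdt, threshold, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        # Refit each state equation using only the terms that survived.
        for k in range(dxdt.shape[1]):
            big = ~small[:, k]
            if big.any():
                xi[big, k] = np.linalg.lstsq(theta[:, big], dxdt[:, k], rcond=None)[0]
    return xi

# Hypothetical noise-free toy data: dx/dt = -2x, dy/dt = x - 0.5y.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
x, y = X[:, 0], X[:, 1]
dXdt = np.column_stack([-2.0 * x, x - 0.5 * y])

# Candidate library with columns [1, x, y, x*y, x^2, y^2].
theta = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])

xi_low = stlsq(theta, dXdt, threshold=0.1)   # keeps all three true terms
xi_high = stlsq(theta, dXdt, threshold=1.5)  # discards the weaker true terms
```

With the lower threshold the three true coefficients survive; with the higher one only the `-2x` term does, which illustrates why the abstract calls the threshold choice decisive.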
|
15
|
Ribera H, Shirman S, Nguyen AV, Mangan NM. Model selection of chaotic systems from data with hidden variables using sparse data assimilation. CHAOS (WOODBURY, N.Y.) 2022; 32:063101. [PMID: 35778121 DOI: 10.1063/5.0066066] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/07/2021] [Accepted: 05/06/2022] [Indexed: 06/15/2023]
Abstract
Many natural systems exhibit chaotic behavior, including the weather, hydrology, neuroscience, and population dynamics. Although many chaotic systems can be described by relatively simple dynamical equations, characterizing these systems can be challenging due to sensitivity to initial conditions and difficulties in differentiating chaotic behavior from noise. Ideally, one wishes to find a parsimonious set of equations that describe a dynamical system. However, model selection is more challenging when only a subset of the variables is experimentally accessible. Manifold learning methods using time-delay embeddings can successfully reconstruct the underlying structure of the system from data with hidden variables, but not the equations. Recent work in sparse-optimization-based model selection has enabled model discovery given a library of possible terms, but regression-based methods require measurements of all state variables. We present a method combining variational annealing (a technique previously used for parameter estimation in chaotic systems with hidden variables) with sparse-optimization methods to perform model identification for chaotic systems with unmeasured variables. We applied the method to ground-truth time-series simulated from the classic Lorenz system and experimental data from an electrical circuit with Lorenz-system-like behavior. In both cases, we successfully recover the expected equations with two measured and one hidden variable. Application to simulated data from the Colpitts oscillator demonstrates successful model selection of terms within nonlinear functions. We discuss the robustness of our method to varying noise.
Affiliation(s)
- H Ribera
  - Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, Illinois 60208, USA
- S Shirman
  - Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, Illinois 60208, USA
- A V Nguyen
  - Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, Illinois 60208, USA
- N M Mangan
  - Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, Illinois 60208, USA
|
16
|
Mojgani R, Chattopadhyay A, Hassanzadeh P. Discovery of interpretable structural model errors by combining Bayesian sparse regression and data assimilation: A chaotic Kuramoto-Sivashinsky test case. CHAOS (WOODBURY, N.Y.) 2022; 32:061105. [PMID: 35778119 DOI: 10.1063/5.0091282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2022] [Accepted: 06/01/2022] [Indexed: 06/15/2023]
Abstract
Models of many engineering and natural systems are imperfect. The discrepancy between the mathematical representations of a true physical system and its imperfect model is called the model error. These model errors can lead to substantial differences between the numerical solutions of the model and the state of the system, particularly in those involving nonlinear, multi-scale phenomena. Thus, there is increasing interest in reducing model errors, particularly by leveraging the rapidly growing observational data to understand their physics and sources. Here, we introduce a framework named MEDIDA: Model Error Discovery with Interpretability and Data Assimilation. MEDIDA only requires a working numerical solver of the model and a small number of noise-free or noisy sporadic observations of the system. In MEDIDA, first, the model error is estimated from differences between the observed states and model-predicted states (the latter are obtained from a number of one-time-step numerical integrations from the previous observed states). If observations are noisy, a data assimilation technique, such as the ensemble Kalman filter, is employed to provide the analysis state of the system, which is then used to estimate the model error. Finally, an equation-discovery technique, here the relevance vector machine, a sparsity-promoting Bayesian method, is used to identify an interpretable, parsimonious, and closed-form representation of the model error. Using the chaotic Kuramoto-Sivashinsky system as the test case, we demonstrate the excellent performance of MEDIDA in discovering different types of structural/parametric model errors, representing different types of missing physics, using noise-free and noisy observations.
Affiliation(s)
- Rambod Mojgani
  - Department of Mechanical Engineering, Rice University, Houston, Texas 77005, USA
- Ashesh Chattopadhyay
  - Department of Mechanical Engineering, Rice University, Houston, Texas 77005, USA
- Pedram Hassanzadeh
  - Department of Mechanical Engineering, Rice University, Houston, Texas 77005, USA
|
17
|
Kaheman K, Brunton SL, Nathan Kutz J. Automatic differentiation to simultaneously identify nonlinear dynamics and extract noise probability distributions from data. MACHINE LEARNING: SCIENCE AND TECHNOLOGY 2022. [DOI: 10.1088/2632-2153/ac567a] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/12/2022] Open
Abstract
The sparse identification of nonlinear dynamics (SINDy) is a regression framework for the discovery of parsimonious dynamic models and governing equations from time-series data. As with all system identification methods, noisy measurements compromise the accuracy and robustness of the model discovery procedure. In this work we develop a variant of the SINDy algorithm that integrates automatic differentiation and recent time-stepping constraints motivated by Rudy et al (2019 J. Computat. Phys. 396 483–506) for simultaneously (1) denoising the data, (2) learning and parametrizing the noise probability distribution, and (3) identifying the underlying parsimonious dynamical system responsible for generating the time-series data. Thus within an integrated optimization framework, noise can be separated from signal, resulting in an architecture that is approximately twice as robust to noise as state-of-the-art methods, handling as much as 40% noise on a given time-series signal and explicitly parametrizing the noise probability distribution. We demonstrate this approach on several numerical examples, from Lotka-Volterra models to the spatio-temporal Lorenz 96 model. Further, we show the method can learn a diversity of probability distributions for the measurement noise, including Gaussian, uniform, Gamma, and Rayleigh distributions.
|
18
|
Abstract
Information is the resolution of uncertainty and manifests itself as patterns. Although complex, most observable phenomena are not random and instead are associated with deterministic, chaotic systems. The underlying patterns and symmetries expressed from these phenomena determine their information content and compressibility. While some patterns, such as the existence of Fourier modes, are easy to extract, advances in machine learning have enabled more comprehensive methods in feature extraction, most notably in their ability to elicit non-linear relationships. Herein we review methods concerned with the encoding and reconstruction of natural signals and how they might inform the discovery of useful transform bases. Additionally, we illustrate the efficacy of data-driven bases over generic ones in encoding information whilst discussing these developments in the context of “fourth paradigm” metrology. Toward this end, we propose that existing metrological standards and norms may need to be redefined within the context of a data-rich world.
|
19
|
Lejarza F, Baldea M. Discovering governing equations via moving horizon learning: the case of reacting systems. AIChE J 2022. [DOI: 10.1002/aic.17567] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/07/2023]
Affiliation(s)
- Fernando Lejarza
  - McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, Texas, USA
- Michael Baldea
  - McKetta Department of Chemical Engineering, The University of Texas at Austin, Austin, Texas, USA
  - Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, Texas, USA
|
20
|
Abdullah F, Wu Z, Christofides PD. Handling noisy data in sparse model identification using subsampling and co-teaching. Comput Chem Eng 2022. [DOI: 10.1016/j.compchemeng.2021.107628] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/15/2023]
|
21
|
Messenger DA, Bortz DM. WEAK SINDY FOR PARTIAL DIFFERENTIAL EQUATIONS. JOURNAL OF COMPUTATIONAL PHYSICS 2021; 443:110525. [PMID: 34744183 PMCID: PMC8570254 DOI: 10.1016/j.jcp.2021.110525] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 05/30/2023]
Abstract
Sparse Identification of Nonlinear Dynamics (SINDy) is a method of system discovery that has been shown to successfully recover governing dynamical systems from data [6, 39]. Recently, several groups have independently discovered that the weak formulation provides orders of magnitude better robustness to noise. Here we extend our Weak SINDy (WSINDy) framework introduced in [28] to the setting of partial differential equations (PDEs). The elimination of pointwise derivative approximations via the weak form enables effective machine-precision recovery of model coefficients from noise-free data (i.e. below the tolerance of the simulation scheme) as well as robust identification of PDEs in the large noise regime (with signal-to-noise ratio approaching one in many well-known cases). This is accomplished by discretizing a convolutional weak form of the PDE and exploiting separability of test functions for efficient model identification using the Fast Fourier Transform. The resulting WSINDy algorithm for PDEs has a worst-case computational complexity of $\mathcal{O}(N^{D+1}\log(N))$ for datasets with $N$ points in each of $D+1$ dimensions. Furthermore, our Fourier-based implementation reveals a connection between robustness to noise and the spectra of test functions, which we utilize in an a priori selection algorithm for test functions. Finally, we introduce a learning algorithm for the threshold in sequential-thresholding least-squares (STLS) that enables model identification from large libraries, and we utilize scale invariance at the continuum level to identify PDEs from poorly-scaled datasets. We demonstrate WSINDy's robustness, speed and accuracy on several challenging PDEs. Code is publicly available on GitHub at https://github.com/MathBioCU/WSINDy_PDE.
Affiliation(s)
- Daniel A Messenger
  - Department of Applied Mathematics, University of Colorado Boulder, 11 Engineering Dr., Boulder, CO 80309, USA
- David M Bortz
  - Department of Applied Mathematics, University of Colorado Boulder, 11 Engineering Dr., Boulder, CO 80309, USA
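The idea of eliminating pointwise derivative approximations via the weak form, described in the abstract above, can be illustrated on a scalar ODE. The sketch below is not the WSINDy algorithm or its convolutional PDE implementation. It assumes a hypothetical model du/dt = -a u, polynomial bump test functions, and made-up noise and grid settings. Integrating by parts against a compactly supported test function φ gives ∫φ′u dt = a∫φu dt, so the coefficient a can be estimated without differentiating the data.

```python
import numpy as np

# Hypothetical data: u(t) = exp(-a t) solves du/dt = -a u with a = 1.5.
a_true = 1.5
t = np.linspace(0.0, 5.0, 1001)
dt = t[1] - t[0]
u_clean = np.exp(-a_true * t)
rng = np.random.default_rng(1)
u_noisy = u_clean + 0.02 * rng.standard_normal(t.size)

def bump(t, center, radius, p=4):
    """Compactly supported test function phi and its exact derivative."""
    s = (t - center) / radius
    base = np.where(np.abs(s) < 1.0, 1.0 - s**2, 0.0)
    phi = base**p
    dphi = -2.0 * p * s / radius * base**(p - 1)
    return phi, dphi

def weak_estimate(u, centers, radius=1.0):
    """Least-squares estimate of a from int(phi' u) = a * int(phi u)."""
    lhs, rhs = [], []
    for c in centers:
        phi, dphi = bump(t, c, radius)
        lhs.append(np.sum(dphi * u) * dt)  # integral of phi' * u
        rhs.append(np.sum(phi * u) * dt)   # integral of phi * u
    lhs, rhs = np.array(lhs), np.array(rhs)
    return float(lhs @ rhs / (rhs @ rhs))

# Test-function supports [c - 1, c + 1] stay inside the sampled interval.
centers = np.linspace(1.0, 4.0, 13)
a_clean = weak_estimate(u_clean, centers)
a_noisy = weak_estimate(u_noisy, centers)
```

Because the derivative falls on the known test function rather than on the measurements, the noise is averaged by the integrals instead of being amplified by finite differences.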
|
22
|
The interactive effects of non-alcoholic fatty liver disease and hemoglobin concentration in the first trimester on the development of gestational diabetes mellitus. PLoS One 2021; 16:e0257391. [PMID: 34516586 PMCID: PMC8437282 DOI: 10.1371/journal.pone.0257391] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/26/2021] [Accepted: 08/31/2021] [Indexed: 01/04/2023] Open
Abstract
Gestational diabetes mellitus (GDM) is associated with adverse perinatal and maternal outcomes. Epidemiological studies have reported that non-alcoholic fatty liver disease (NAFLD) and a high hemoglobin (Hb) concentration are risk factors for GDM in the middle trimester. However, no consistent conclusions have been reached, especially in Chinese pregnant women. A case-control study was conducted to better understand the associations between NAFLD and Hb concentration in the first trimester and the risk of GDM and their interactive effects. Multivariable logistic regression analysis and a cross-product term of Hb and steatosis were used to evaluate the associations between first trimester Hb concentration, steatosis, and GDM and their interactive effects. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated using two-sided statistical tests at an alpha level of 0.05. For the study, 1,017 normal pregnant women, and 343 pregnant women diagnosed with GDM (25.22%) were recruited from the First Hospital of Shanxi Medical University, Shanxi Province, China. NAFLD-associated steatosis was found to be independent risk factors for developing GDM compared with grade 0 steatosis, with ORs of 1.98 (95% CI: 1.35–2.89) and 2.27 (95% CI: 1.29–3.96), respectively. Meanwhile, a high Hb concentration was found to be a risk factor for developing GDM compared with the normal Hb concentration (OR = 1.88; 95% CI: 1.24–2.83). The risk of developing GDM was more pronounced among pregnant women who had both high-grade steatosis and higher Hb concentrations during their first trimester (OR = 6.24; 95% CI: 1.81–23.66). However, we found no significant interactions between Hb concentration and steatosis grade. In conclusion, our study confirmed that a high Hb concentration and NAFLD-associated steatosis during the first trimester play important roles in predicting the risk of GDM in Chinese women. Future studies are required to verify the interactive effects between NAFLD-associated steatosis and Hb concentration.
|
23
|
Benchmarking Optimisation Methods for Model Selection and Parameter Estimation of Nonlinear Systems. VIBRATION 2021. [DOI: 10.3390/vibration4030036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Characterisation and quantification of nonlinearities in engineering structures include selecting and fitting a good mathematical model to a set of experimental vibration data with significant nonlinear features. These tasks involve solving an optimisation problem where it is difficult to choose a priori the best optimisation technique. This paper presents a systematic comparison of ten optimisation methods used to select the best nonlinear model and estimate its parameters through nonlinear system identification. The model selection framework fits the structure's equations of motion using time-domain dynamic response data and takes into account couplings due to the presence of the nonlinearities. Three benchmark problems are used to evaluate the performance of two families of optimisation methods: (i) deterministic local searches and (ii) global optimisation metaheuristics. Furthermore, hybrid local–global optimisation methods are examined. All benchmark problems include a free play nonlinearity commonly found in mechanical structures. Multiple performance criteria are considered based on computational efficiency and robustness, that is, finding the best nonlinear model. Results show that hybrid methods, that is, the multi-start strategy with local gradient-based Levenberg–Marquardt method and the particle swarm with Levenberg–Marquardt method, lead to a successful selection of nonlinear models and an accurate estimation of their parameters within acceptable computational times.
|
24
|
Jiang YX, Xiong X, Zhang S, Wang JX, Li JC, Du L. Modeling and prediction of the transmission dynamics of COVID-19 based on the SINDy-LM method. NONLINEAR DYNAMICS 2021; 105:2775-2794. [PMID: 34312574 PMCID: PMC8295551 DOI: 10.1007/s11071-021-06707-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/25/2021] [Accepted: 07/04/2021] [Indexed: 06/13/2023]
Abstract
The transmission dynamics of COVID-19 are investigated in this study. A SINDy-LM modeling method that can effectively balance model complexity and prediction accuracy is proposed based on a data-driven technique. First, the Sparse Identification of Nonlinear Dynamical systems (SINDy) method is used to discover and describe the nonlinear functional relationship between the dynamic terms in the model in accordance with the observation data of the COVID-19 epidemic. Moreover, the Levenberg-Marquardt (LM) algorithm is utilized to optimize the obtained model and thereby improve the accuracy of the SINDy algorithm. Second, the obtained model, which is consistent with the logistic model in mathematical form with small errors and high robustness, is leveraged to review the epidemic situation in China. In addition, the evolution of the epidemic in Australia and Egypt is predicted, which demonstrates that this method has universality for constructing the global COVID-19 model. The proposed model is also compared with the extreme learning machine (ELM), which shows that the prediction accuracy of the SINDy-LM method outperforms that of the ELM method and the generated model has higher sparsity.
Affiliation(s)
- Yu-Xin Jiang
  - School of Mathematics and Statistics, Northwestern Polytechnical University, Xi’an, 710129 China
- Xiong Xiong
  - School of Mathematics and Statistics, Northwestern Polytechnical University, Xi’an, 710129 China
  - MIIT Key Laboratory of Dynamics and Control of Complex Systems, Xi’an, 710129 China
- Shuo Zhang
  - School of Mathematics and Statistics, Northwestern Polytechnical University, Xi’an, 710129 China
  - MIIT Key Laboratory of Dynamics and Control of Complex Systems, Xi’an, 710129 China
- Jia-Xiang Wang
  - School of Mathematics and Statistics, Northwestern Polytechnical University, Xi’an, 710129 China
- Jia-Chun Li
  - School of Mathematics and Statistics, Northwestern Polytechnical University, Xi’an, 710129 China
- Lin Du
  - School of Mathematics and Statistics, Northwestern Polytechnical University, Xi’an, 710129 China
  - MIIT Key Laboratory of Dynamics and Control of Complex Systems, Xi’an, 710129 China
|
25
|
|
26
|
Robust learning from noisy, incomplete, high-dimensional experimental data via physically constrained symbolic regression. Nat Commun 2021; 12:3219. [PMID: 34050155 PMCID: PMC8163752 DOI: 10.1038/s41467-021-23479-0] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 08/14/2020] [Accepted: 04/30/2021] [Indexed: 12/03/2022] Open
Abstract
Machine learning offers an intriguing alternative to first-principle analysis for discovering new physics from experimental data. However, to date, purely data-driven methods have only proven successful in uncovering physical laws describing simple, low-dimensional systems with low levels of noise. Here we demonstrate that combining a data-driven methodology with some general physical principles enables discovery of a quantitatively accurate model of a non-equilibrium spatially extended system from high-dimensional data that is both noisy and incomplete. We illustrate this using an experimental weakly turbulent fluid flow where only the velocity field is accessible. We also show that this hybrid approach allows reconstruction of the inaccessible variables – the pressure and forcing field driving the flow. Reinbold et al. propose a physics-informed data-driven approach that successfully discovers a dynamical model using high-dimensional, noisy and incomplete experimental data describing a weakly turbulent fluid flow. This approach is relevant to other non-equilibrium spatially-extended systems.
|
27
|
Hocharoen L, Noppiboon S, Kitsubun P. Toward QbD Process Understanding on DNA Vaccine Purification Using Design of Experiment. Front Bioeng Biotechnol 2021; 9:657201. [PMID: 34055759 PMCID: PMC8153680 DOI: 10.3389/fbioe.2021.657201] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/22/2021] [Accepted: 04/08/2021] [Indexed: 01/13/2023] Open
Abstract
DNA vaccines, the third generation of vaccines, are a promising therapeutic option for many diseases because they can be customized for protection and treatment and are highly stable. The production of DNA vaccines is considered rapid and less complicated than that of other vaccines such as mRNA vaccines, viral vaccines, or subunit protein vaccines. However, the main issue for DNA vaccines is how to produce the active DNA, a supercoiled isoform, in compliance with the regulations. Our work therefore focuses on gaining a process understanding of the purification step, namely which process parameters have impacts on the critical quality attribute (CQA), supercoiled DNA, and the performance attribute (PA), step yield. Herein, pVax1/lacZ was used as a model. The process parameters of interest were sample application flow rates and salt concentration at washing step and at elution step in the hydrophobic interaction chromatography (HIC). Using a Design of Experiment (DoE) with central composite face centered (CCF) approach, 14 experiments plus four additional runs at the center points were created. The response data was used to establish regression predictive models and simulation was conducted in 10,000 runs to provide tolerance intervals of these CQA and PA. The approach of this process understanding can be applied for Quality by Design (QbD) on other DNA vaccines and on a larger production scale as well.
Affiliation(s)
- Lalintip Hocharoen
  - Bioprocess Research and Innovation Centre (BRIC), National Biopharmaceutical Facility (NBF), King Mongkut's University of Technology Thonburi (KMUTT), Bangkok, Thailand
- Sarawuth Noppiboon
  - Bioprocess Research and Innovation Centre (BRIC), National Biopharmaceutical Facility (NBF), King Mongkut's University of Technology Thonburi (KMUTT), Bangkok, Thailand
- Panit Kitsubun
  - Biochemical Engineering and System Biology Research Group (IBEG), National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Bangkok, Thailand
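The run count quoted in the abstract above (14 experiments plus four center-point runs) is consistent with a face-centered central composite design in three factors, which can be enumerated in coded units as sketched below. Treating the design as having three factors (flow rate, wash salt concentration, elution salt concentration) is an assumption read from the abstract, and the function name `ccf_design` and the coded levels -1, 0, +1 are illustrative only.

```python
from itertools import product

def ccf_design(n_factors, n_center):
    """Face-centered central composite design in coded units (-1, 0, +1)."""
    # Two-level full factorial corners: 2**n_factors runs.
    factorial = [list(p) for p in product((-1, 1), repeat=n_factors)]
    # Axial points on the cube faces (alpha = 1): 2 * n_factors runs.
    axial = []
    for i in range(n_factors):
        for level in (-1, 1):
            point = [0] * n_factors
            point[i] = level
            axial.append(point)
    # Replicated center points.
    center = [[0] * n_factors for _ in range(n_center)]
    return factorial, axial, center

# Assumed setting: three factors and four center runs.
factorial, axial, center = ccf_design(n_factors=3, n_center=4)
n_non_center = len(factorial) + len(axial)
n_total = n_non_center + len(center)
```

Under these assumptions the design has 8 corner runs and 6 face-centered axial runs, giving the 14 non-center experiments, and 18 runs in total once the four center points are added.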
|
28
|
Nardini JT, Baker RE, Simpson MJ, Flores KB. Learning differential equation models from stochastic agent-based model simulations. J R Soc Interface 2021; 18:20200987. [PMID: 33726540 PMCID: PMC8086865 DOI: 10.1098/rsif.2020.0987] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 12/04/2020] [Accepted: 02/22/2021] [Indexed: 12/15/2022] Open
Abstract
Agent-based models provide a flexible framework that is frequently used for modelling many biological systems, including cell migration, molecular dynamics, ecology and epidemiology. Analysis of the model dynamics can be challenging due to their inherent stochasticity and heavy computational requirements. Common approaches to the analysis of agent-based models include extensive Monte Carlo simulation of the model or the derivation of coarse-grained differential equation models to predict the expected or averaged output from the agent-based model. Both of these approaches have limitations, however, as extensive computation of complex agent-based models may be infeasible, and coarse-grained differential equation models can fail to accurately describe model dynamics in certain parameter regimes. We propose that methods from the equation learning field provide a promising, novel and unifying approach for agent-based model analysis. Equation learning is a recent field of research from data science that aims to infer differential equation models directly from data. We use this tutorial to review how methods from equation learning can be used to learn differential equation models from agent-based model simulations. We demonstrate that this framework is easy to use, requires few model simulations, and accurately predicts model dynamics in parameter regions where coarse-grained differential equation models fail to do so. We highlight these advantages through several case studies involving two agent-based models that are broadly applicable to biological phenomena: a birth-death-migration model commonly used to explore cell biology experiments and a susceptible-infected-recovered model of infectious disease spread.
Affiliation(s)
- John T. Nardini
  - North Carolina State University, Mathematics, Raleigh, NC, USA
- Ruth E. Baker
  - Mathematical Institute, University of Oxford, Oxford, UK
- Matthew J. Simpson
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane 4001, Australia
- Kevin B. Flores
  - North Carolina State University, Mathematics, Raleigh, NC, USA
|
29
|
Hocharoen L, Noppiboon S, Kitsubun P. Process Characterization by Definitive Screening Design Approach on DNA Vaccine Production. Front Bioeng Biotechnol 2020; 8:574809. [PMID: 33178673 PMCID: PMC7593689 DOI: 10.3389/fbioe.2020.574809] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/21/2020] [Accepted: 09/14/2020] [Indexed: 11/16/2022] Open
Abstract
Plasmid DNA is a vital biological tool for molecular cloning and transgene expression of recombinant proteins; however, decades ago, it also became exceptionally appealing as a potential biopharmaceutical product for genetic immunization in animals and humans. The demand for large-quantity production of DNA vaccines is also increasing. Thus, we herein present a systematic approach for process characterization of fed-batch Escherichia coli DH5α fermentation producing a porcine DNA vaccine. Design of Experiments (DoE) was employed to determine process parameters that have impacts on a critical quality attribute of the product, which is the active form of plasmid DNA referred to as supercoiled plasmid DNA content, as well as the performance attributes, which are volumetric yield and specific yield from fermentation. The parameters of interest were temperature, pH, dissolved oxygen, cultivation time, and feed rate. Using the definitive-screening design, there were 16 runs, including 3 additional center points, to create the predictive model, which was then used to simulate the operational ranges for capability analysis.
Affiliation(s)
- Lalintip Hocharoen
  - Bioprocess Research and Innovation Centre (BRIC), National Biopharmaceutical Facility (NBF), King Mongkut’s University of Technology Thonburi (KMUTT), Bangkok, Thailand
- Sarawuth Noppiboon
  - Bioprocess Research and Innovation Centre (BRIC), National Biopharmaceutical Facility (NBF), King Mongkut’s University of Technology Thonburi (KMUTT), Bangkok, Thailand
- Panit Kitsubun
  - Biochemical Engineering and System Biology Research Group (IBEG), National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Bangkok, Thailand
|
30
|
Kaheman K, Kutz JN, Brunton SL. SINDy-PI: a robust algorithm for parallel implicit sparse identification of nonlinear dynamics. Proc Math Phys Eng Sci 2020; 476:20200279. [PMID: 33214760 PMCID: PMC7655768 DOI: 10.1098/rspa.2020.0279] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 04/14/2020] [Accepted: 09/10/2020] [Indexed: 12/15/2022] Open
Abstract
Accurately modelling the nonlinear dynamics of a system from measurement data is a challenging yet vital topic. The sparse identification of nonlinear dynamics (SINDy) algorithm is one approach to discover dynamical systems models from data. Although extensions have been developed to identify implicit dynamics, or dynamics described by rational functions, these extensions are extremely sensitive to noise. In this work, we develop SINDy-PI (parallel, implicit), a robust variant of the SINDy algorithm to identify implicit dynamics and rational nonlinearities. The SINDy-PI framework includes multiple optimization algorithms and a principled approach to model selection. We demonstrate the ability of this algorithm to learn implicit ordinary and partial differential equations and conservation laws from limited and noisy data. In particular, we show that the proposed approach is several orders of magnitude more noise robust than previous approaches, and may be used to identify a class of ODE and PDE dynamics that were previously unattainable with SINDy, including for the double pendulum dynamics and simplified model for the Belousov-Zhabotinsky (BZ) reaction.
Affiliation(s)
- Kadierdan Kaheman
  - Department of Mechanical Engineering, University of Washington, Seattle, WA 98195, USA
- J Nathan Kutz
  - Department of Applied Mathematics, University of Washington, Seattle, WA 98195, USA
- Steven L Brunton
  - Department of Mechanical Engineering, University of Washington, Seattle, WA 98195, USA
|
31
|
Nardini JT, Lagergren JH, Hawkins-Daarud A, Curtin L, Morris B, Rutter EM, Swanson KR, Flores KB. Learning Equations from Biological Data with Limited Time Samples. Bull Math Biol 2020; 82:119. [PMID: 32909137 PMCID: PMC8409251 DOI: 10.1007/s11538-020-00794-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/19/2020] [Accepted: 08/16/2020] [Indexed: 01/25/2023]
Abstract
Equation learning methods present a promising tool to aid scientists in the modeling process for biological data. Previous equation learning studies have demonstrated that these methods can infer models from rich datasets; however, the performance of these methods in the presence of common challenges from biological data has not been thoroughly explored. We present an equation learning methodology comprised of data denoising, equation learning, model selection and post-processing steps that infers a dynamical systems model from noisy spatiotemporal data. The performance of this methodology is thoroughly investigated in the face of several common challenges presented by biological data, namely, sparse data sampling, large noise levels, and heterogeneity between datasets. We find that this methodology can accurately infer the correct underlying equation and predict unobserved system dynamics from a small number of time samples when the data are sampled over a time interval exhibiting both linear and nonlinear dynamics. Our findings suggest that equation learning methods can be used for model discovery and selection in many areas of biology when an informative dataset is used. We focus on glioblastoma multiforme modeling as a case study in this work to highlight how these results are informative for data-driven modeling-based tumor invasion predictions.
Affiliation(s)
- John T Nardini
  - Department of Mathematics, North Carolina State University, Raleigh, NC, USA
  - The Statistical and Applied Mathematical Sciences Institute, Durham, NC, USA
- John H Lagergren
  - Department of Mathematics, North Carolina State University, Raleigh, NC, USA
- Andrea Hawkins-Daarud
  - Mathematical NeuroOncology Laboratory, Precision Neurotherapeutics Innovation Program, Mayo Clinic, Phoenix, AZ, USA
- Lee Curtin
  - Mathematical NeuroOncology Laboratory, Precision Neurotherapeutics Innovation Program, Mayo Clinic, Phoenix, AZ, USA
- Bethan Morris
  - Centre for Mathematical Medicine and Biology, University of Nottingham, Nottingham, UK
- Erica M Rutter
  - Department of Applied Mathematics, University of California, Merced, Merced, CA, USA
- Kristin R Swanson
  - Mathematical NeuroOncology Laboratory, Precision Neurotherapeutics Innovation Program, Mayo Clinic, Phoenix, AZ, USA
- Kevin B Flores
  - Department of Mathematics, North Carolina State University, Raleigh, NC, USA
|
32
|
de Silva BM, Higdon DM, Brunton SL, Kutz JN. Discovery of Physics From Data: Universal Laws and Discrepancies. Front Artif Intell 2020; 3:25. [PMID: 33733144 PMCID: PMC7861345 DOI: 10.3389/frai.2020.00025] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 06/18/2019] [Accepted: 03/30/2020] [Indexed: 12/30/2022] Open
Abstract
Machine learning (ML) and artificial intelligence (AI) algorithms are now being used to automate the discovery of physics principles and governing equations from measurement data alone. However, positing a universal physical law from data is challenging without simultaneously proposing an accompanying discrepancy model to account for the inevitable mismatch between theory and measurements. By revisiting the classic problem of modeling falling objects of different size and mass, we highlight a number of nuanced issues that must be addressed by modern data-driven methods for automated physics discovery. Specifically, we show that measurement noise and complex secondary physical mechanisms, like unsteady fluid drag forces, can obscure the underlying law of gravitation, leading to an erroneous model. We use the sparse identification of non-linear dynamics (SINDy) method to identify governing equations for real-world measurement data and simulated trajectories. Incorporating into SINDy the assumption that each falling object is governed by a similar physical law is shown to improve the robustness of the learned models, but discrepancies between the predictions and observations persist due to subtleties in drag dynamics. This work highlights the fact that the naive application of ML/AI will generally be insufficient to infer universal physical laws without further modification.
Affiliation(s)
- Brian M de Silva
  - Applied Mathematics, University of Washington, Seattle, WA, United States
- David M Higdon
  - Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
- Steven L Brunton
  - Mechanical Engineering, University of Washington, Seattle, WA, United States
- J Nathan Kutz
  - Applied Mathematics, University of Washington, Seattle, WA, United States
|
33
|
Horrocks J, Bauch CT. Algorithmic discovery of dynamic models from infectious disease data. Sci Rep 2020; 10:7061. [PMID: 32341374 PMCID: PMC7184751 DOI: 10.1038/s41598-020-63877-w] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 04/08/2019] [Accepted: 04/07/2020] [Indexed: 11/09/2022] Open
Abstract
Theoretical models are typically developed through a deductive process where a researcher formulates a system of dynamic equations from hypothesized mechanisms. Recent advances in algorithmic methods can discover dynamic models inductively, directly from data. Most previous research has tested these methods by rediscovering models from synthetic data generated by the already known model. Here we apply Sparse Identification of Nonlinear Dynamics (SINDy) to discover mechanistic equations for disease dynamics from case notification data for measles, chickenpox, and rubella. The discovered models provide a good qualitative fit to the observed dynamics for all three diseases. However, the SINDy chickenpox model appears to overfit the empirical data, and recovering qualitatively correct rubella dynamics requires using power spectral density in the goodness-of-fit criterion. When SINDy uses a library of second-order functions, the discovered models tend to include mass action incidence and a seasonally varying transmission rate, a common feature of existing epidemiological models for childhood infectious diseases. We also find that the SINDy measles model is capable of out-of-sample prediction of a dynamical regime shift in measles case notification data. These results demonstrate the potential for algorithmic model discovery to enrich scientific understanding by providing a complementary approach to developing theoretical models.
Affiliation(s)
- Jonathan Horrocks
  - Department of Applied Mathematics, University of Waterloo, Waterloo, N2L 3G1, Canada
- Chris T Bauch
  - Department of Applied Mathematics, University of Waterloo, Waterloo, N2L 3G1, Canada
|
34
|
Coenen AR, Hu SK, Luo E, Muratore D, Weitz JS. A Primer for Microbiome Time-Series Analysis. Front Genet 2020; 11:310. [PMID: 32373155 PMCID: PMC7186479 DOI: 10.3389/fgene.2020.00310] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 05/29/2019] [Accepted: 03/16/2020] [Indexed: 12/22/2022] Open
Abstract
Time-series can provide critical insights into the structure and function of microbial communities. The analysis of temporal data warrants statistical considerations, distinct from comparative microbiome studies, to address ecological questions. This primer identifies unique challenges and approaches for analyzing microbiome time-series. In doing so, we focus on (1) identifying compositionally similar samples, (2) inferring putative interactions among populations, and (3) detecting periodic signals. We connect theory, code and data via a series of hands-on modules with a motivating biological question centered on marine microbial ecology. The topics of the modules include characterizing shifts in community structure and activity, identifying expression levels with a diel periodic signal, and identifying putative interactions within a complex community. Modules are presented as self-contained, open-access, interactive tutorials in R and Matlab. Throughout, we highlight statistical considerations for dealing with autocorrelated and compositional data, with an eye to improving the robustness of inferences from microbiome time-series. In doing so, we hope that this primer helps to broaden the use of time-series analytic methods within the microbial ecology research community.
Affiliation(s)
- Ashley R. Coenen
  - School of Physics, Georgia Institute of Technology, Atlanta, GA, United States
- Sarah K. Hu
  - Woods Hole Oceanographic Institution, Marine Chemistry and Geochemistry, Woods Hole, MA, United States
- Elaine Luo
  - Daniel K. Inouye Center for Microbial Oceanography: Research and Education, University of Hawaii, Honolulu, HI, United States
- Daniel Muratore
  - Interdisciplinary Graduate Program in Quantitative Biosciences, Georgia Institute of Technology, Atlanta, GA, United States
- Joshua S. Weitz
  - School of Physics, Georgia Institute of Technology, Atlanta, GA, United States
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, United States
|
35
|
Lagergren JH, Nardini JT, Michael Lavigne G, Rutter EM, Flores KB. Learning partial differential equations for biological transport models from noisy spatio-temporal data. Proc Math Phys Eng Sci 2020; 476:20190800. [PMID: 32201481 DOI: 10.1098/rspa.2019.0800] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 11/18/2019] [Accepted: 01/17/2020] [Indexed: 12/20/2022] Open
Abstract
We investigate methods for learning partial differential equation (PDE) models from spatio-temporal data under biologically realistic levels and forms of noise. Recent progress in learning PDEs from data has used sparse regression to select candidate terms from a denoised set of data, including approximated partial derivatives. We analyse the performance in using previous methods to denoise data for the task of discovering the governing system of PDEs. We also develop a novel methodology that uses artificial neural networks (ANNs) to denoise data and approximate partial derivatives. We test the methodology on three PDE models for biological transport, i.e. the advection-diffusion, classical Fisher-Kolmogorov-Petrovsky-Piskunov (Fisher-KPP) and nonlinear Fisher-KPP equations. We show that the ANN methodology outperforms previous denoising methods, including finite differences and both local and global polynomial regression splines, in the ability to accurately approximate partial derivatives and learn the correct PDE model.
Affiliation(s)
- John H Lagergren
  - Department of Mathematics, North Carolina State University, Raleigh, NC, USA
  - Center for Research in Scientific Computation, North Carolina State University, Raleigh, NC, USA
- John T Nardini
  - Department of Mathematics, North Carolina State University, Raleigh, NC, USA
  - The Statistical and Applied Mathematical Sciences Institute, Durham, NC, USA
- G Michael Lavigne
  - Department of Mathematics, North Carolina State University, Raleigh, NC, USA
  - Center for Research in Scientific Computation, North Carolina State University, Raleigh, NC, USA
- Erica M Rutter
  - Department of Mathematics, North Carolina State University, Raleigh, NC, USA
  - Center for Research in Scientific Computation, North Carolina State University, Raleigh, NC, USA
  - Department of Applied Mathematics, University of California, Merced, CA, USA
- Kevin B Flores
  - Department of Mathematics, North Carolina State University, Raleigh, NC, USA
  - Center for Research in Scientific Computation, North Carolina State University, Raleigh, NC, USA
|
36
|
Dasgupta A, Wang H, O'Brien N, Burrows S. Separating the Wheat from the Chaff: Comparative Visual Cues for Transparent Diagnostics of Competing Models. IEEE Transactions on Visualization and Computer Graphics 2020; 26:1043-1053. [PMID: 31478858 DOI: 10.1109/tvcg.2019.2934540] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/10/2023]
Abstract
Experts in data and physical sciences have to regularly grapple with the problem of competing models. Be it analytical or physics-based models, a cross-cutting challenge for experts is to reliably diagnose which model outcomes appropriately predict or simulate real-world phenomena. Expert judgment involves reconciling information across many, and often, conflicting criteria that describe the quality of model outcomes. In this paper, through a design study with climate scientists, we develop a deeper understanding of the problem and solution space of model diagnostics, resulting in the following contributions: i) a problem and task characterization using which we map experts' model diagnostics goals to multi-way visual comparison tasks, ii) a design space of comparative visual cues for letting experts quickly understand the degree of disagreement among competing models and gauge the degree of stability of model outputs with respect to alternative criteria, and iii) design and evaluation of MyriadCues, an interactive visualization interface for exploring alternative hypotheses and insights about good and bad models by leveraging comparative visual cues. We present case studies and subjective feedback by experts, which validate how MyriadCues enables more transparent model diagnostic mechanisms, as compared to the state of the art.
|
37
|
Guimerà R, Reichardt I, Aguilar-Mogas A, Massucci FA, Miranda M, Pallarès J, Sales-Pardo M. A Bayesian machine scientist to aid in the solution of challenging scientific problems. Science Advances 2020; 6:eaav6971. [PMID: 32064326 PMCID: PMC6994216 DOI: 10.1126/sciadv.aav6971] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/11/2018] [Accepted: 11/20/2019] [Indexed: 05/06/2023]
Abstract
Closed-form, interpretable mathematical models have been instrumental for advancing our understanding of the world; with the data revolution, we may now be in a position to uncover new such models for many systems from physics to the social sciences. However, to deal with increasing amounts of data, we need "machine scientists" that are able to extract these models automatically from data. Here, we introduce a Bayesian machine scientist, which establishes the plausibility of models using explicit approximations to the exact marginal posterior over models and establishes its prior expectations about models by learning from a large empirical corpus of mathematical expressions. It explores the space of models using Markov chain Monte Carlo. We show that this approach uncovers accurate models for synthetic and real data and provides out-of-sample predictions that are more accurate than those of existing approaches and of other nonparametric methods.
Affiliation(s)
- Roger Guimerà
  - ICREA, Barcelona 08010, Catalonia, Spain
  - Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona 43007, Catalonia, Spain
  - Corresponding author.
- Ignasi Reichardt
  - Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona 43007, Catalonia, Spain
- Antoni Aguilar-Mogas
  - Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona 43007, Catalonia, Spain
  - Division of Research, Economic Development and Engagement, East Carolina University, Greenville, NC 27858, USA
- Francesco A. Massucci
  - Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona 43007, Catalonia, Spain
  - SIRIS Lab, Research Division of SIRIS Academic, Barcelona 08003, Catalonia, Spain
- Manuel Miranda
  - Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona 43007, Catalonia, Spain
- Jordi Pallarès
  - Department of Mechanical Engineering, Universitat Rovira i Virgili, Tarragona 43007, Catalonia, Spain
- Marta Sales-Pardo
  - Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona 43007, Catalonia, Spain
|
38
|
Chen Y, Angulo MT, Liu YY. Revealing Complex Ecological Dynamics via Symbolic Regression. Bioessays 2019; 41:e1900069. [PMID: 31617228 PMCID: PMC7339472 DOI: 10.1002/bies.201900069] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/24/2019] [Revised: 09/14/2019] [Indexed: 12/24/2022]
Abstract
Understanding the dynamics of complex ecosystems is a necessary step to maintain and control them. Yet, reverse-engineering ecological dynamics remains challenging largely due to the very broad class of dynamics that ecosystems may take. Here, this challenge is tackled through symbolic regression, a machine learning method that automatically reverse-engineers both the model structure and parameters from temporal data. How combining symbolic regression with a "dictionary" of possible ecological functional responses opens the door to correctly reverse-engineering ecosystem dynamics, even in the case of poorly informative data, is shown. This strategy is validated using both synthetic and experimental data, and it is found that this strategy is promising for the systematic modeling of complex ecological systems.
Affiliation(s)
- Yize Chen
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
  - Department of Electrical and Computer Engineering, University of Washington, Seattle, WA, 98195, USA
- Marco Tulio Angulo
  - CONACyT - Institute of Mathematics, Universidad Nacional Autónoma de México, Juriquilla, 76230, México
- Yang-Yu Liu
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA, 02115, USA
|
39
|
Machine learning-based adaptive model identification of systems: Application to a chemical process. Chem Eng Res Des 2019. [DOI: 10.1016/j.cherd.2019.09.009] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Indexed: 01/24/2023]
|
40
|
Abstract
Governing equations are essential to the study of physical systems, providing models that can generalize to predict previously unseen behaviors. There are many systems of interest across disciplines where large quantities of data have been collected, but the underlying governing equations remain unknown. This work introduces an approach to discover governing models from data. The proposed method addresses a key limitation of prior approaches by simultaneously discovering coordinates that admit a parsimonious dynamical model. Developing parsimonious and interpretable governing models has the potential to transform our understanding of complex systems, including in neuroscience, biology, and climate science.
The discovery of governing equations from scientific data has the potential to transform data-rich fields that lack well-characterized quantitative descriptions. Advances in sparse regression are currently enabling the tractable identification of both the structure and parameters of a nonlinear dynamical system from data. The resulting models have the fewest terms necessary to describe the dynamics, balancing model complexity with descriptive ability, and thus promoting interpretability and generalizability. This provides an algorithmic approach to Occam’s razor for model discovery. However, this approach fundamentally relies on an effective coordinate system in which the dynamics have a simple representation. In this work, we design a custom deep autoencoder network to discover a coordinate transformation into a reduced space where the dynamics may be sparsely represented. Thus, we simultaneously learn the governing equations and the associated coordinate system. We demonstrate this approach on several example high-dimensional systems with low-dimensional behavior. The resulting modeling framework combines the strengths of deep neural networks for flexible representation and sparse identification of nonlinear dynamics (SINDy) for parsimonious models. This method places the discovery of coordinates and models on an equal footing.
|
41
|
Gabel M, Hohl T, Imle A, Fackler OT, Graw F. FAMoS: A Flexible and dynamic Algorithm for Model Selection to analyse complex systems dynamics. PLoS Comput Biol 2019; 15:e1007230. [PMID: 31419221 PMCID: PMC6697322 DOI: 10.1371/journal.pcbi.1007230] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/27/2019] [Accepted: 06/30/2019] [Indexed: 01/12/2023] Open
Abstract
Most biological systems are difficult to analyse due to a multitude of interacting components and the concomitant lack of information about the essential dynamics. Finding appropriate models that provide a systematic description of such biological systems and that help to identify their relevant factors and processes can be challenging given the sheer number of possibilities. Model selection algorithms that evaluate the performance of a multitude of different models against experimental data provide a useful tool to identify appropriate model structures. However, many algorithms addressing the analysis of complex dynamical systems, as they are often used in biology, compare a preselected number of models or rely on exhaustive searches of the total model space which might be unfeasible dependent on the number of possibilities. Therefore, we developed an algorithm that is able to perform model selection on complex systems and searches large model spaces in a dynamical way. Our algorithm includes local and newly developed non-local search methods that can prevent the algorithm from ending up in local minima of the model space by accounting for structurally similar processes. We tested and validated the algorithm based on simulated data and showed its flexibility for handling different model structures. We also used the algorithm to analyse experimental data on the cell proliferation dynamics of CD4+ and CD8+ T cells that were cultured under different conditions. Our analyses indicated dynamical changes within the proliferation potential of cells that was reduced within tissue-like 3D ex vivo cultures compared to suspension. Due to the flexibility in handling various model structures, the algorithm is applicable to a large variety of different biological problems and represents a useful tool for the data-oriented evaluation of complex model spaces. 
Identifying the systematic interactions of multiple components within a complex biological system can be challenging due to the number of potential processes and the concomitant lack of information about the essential dynamics. Selection algorithms that allow an automated evaluation of a large number of different models provide a useful tool in identifying the systematic relationships between experimental data. However, many of the existing model selection algorithms are not able to address complex model structures, such as systems of differential equations, and partly rely on local or exhaustive search methods which are inappropriate for the analysis of various biological systems. Therefore, we developed a flexible model selection algorithm that performs a robust and dynamical search of large model spaces to identify complex systems dynamics and applied it to the analysis of T cell proliferation dynamics within different culture conditions. The algorithm, which is available as an R-package, provides an advanced tool for the analysis of complex systems behaviour and, due to its flexible structure, can be applied to a large variety of biological problems.
Affiliation(s)
- Michael Gabel
  - Center for Modelling and Simulation in the Biosciences, BioQuant-Center, Heidelberg University, Heidelberg, Germany
  - * E-mail: (MG); (FG)
- Tobias Hohl
  - Center for Modelling and Simulation in the Biosciences, BioQuant-Center, Heidelberg University, Heidelberg, Germany
- Andrea Imle
  - Department of Infectious Diseases, Centre for Integrative Infectious Disease Research (CIID), Integrative Virology, University Hospital Heidelberg, Heidelberg, Germany
- Oliver T. Fackler
  - Department of Infectious Diseases, Centre for Integrative Infectious Disease Research (CIID), Integrative Virology, University Hospital Heidelberg, Heidelberg, Germany
- Frederik Graw
  - Center for Modelling and Simulation in the Biosciences, BioQuant-Center, Heidelberg University, Heidelberg, Germany
  - * E-mail: (MG); (FG)
|
42
|
Equation Discovery Using Fast Function Extraction: a Deterministic Symbolic Regression Approach. Fluids 2019. [DOI: 10.3390/fluids4020111] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
Advances in machine learning (ML) coupled with increased computational power have enabled identification of patterns in data extracted from complex systems. ML algorithms are actively being sought in recovering physical models or mathematical equations from data. This is a highly valuable technique where models cannot be built using physical reasoning alone. In this paper, we investigate the application of fast function extraction (FFX), a fast, scalable, deterministic symbolic regression algorithm to recover partial differential equations (PDEs). FFX identifies active bases among a huge set of candidate basis functions and their corresponding coefficients from recorded snapshot data. This approach uses a sparsity-promoting technique from compressive sensing and sparse optimization called pathwise regularized learning to perform feature selection and parameter estimation. Furthermore, it recovers several models of varying complexity (number of basis terms). FFX finally filters out many identified models using non-dominated sorting and forms a Pareto front consisting of optimal models with respect to minimizing complexity and test accuracy. Numerical experiments are carried out to recover several ubiquitous PDEs such as wave and heat equations among linear PDEs and Burgers, Korteweg–de Vries (KdV), and Kawahara equations among higher-order nonlinear PDEs. Additional simulations are conducted on the same PDEs under noisy conditions to test the robustness of the proposed approach.
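The final filtering step described in the abstract above, non-dominated sorting into a Pareto front, can be illustrated in a few lines. This is a hedged sketch rather than the FFX code: the function `pareto_front` and the candidate list of (name, complexity, test error) tuples are hypothetical, and both objectives are treated as quantities to minimize.

```python
def pareto_front(models):
    """Return the non-dominated models, sorted by complexity.

    Each model is a (name, complexity, test_error) tuple; lower is
    better for both complexity and test_error (an assumption of this
    sketch).
    """
    front = []
    for name, c, e in models:
        # A model is dominated if some other model is at least as good
        # on both objectives and strictly better on at least one.
        dominated = any(
            (c2 <= c and e2 <= e) and (c2 < c or e2 < e)
            for _, c2, e2 in models
        )
        if not dominated:
            front.append((name, c, e))
    return sorted(front, key=lambda m: m[1])


# Hypothetical candidates: (name, number of basis terms, test error).
candidates = [
    ("m1", 1, 0.50),
    ("m2", 2, 0.20),
    ("m3", 3, 0.25),  # dominated by m2 (simpler and more accurate)
    ("m4", 4, 0.05),
    ("m5", 4, 0.10),  # dominated by m4 (same size, larger error)
]
front = pareto_front(candidates)
print([name for name, _, _ in front])
```

The quadratic scan is adequate for a handful of candidate models; a larger candidate set would call for a sort-based sweep instead.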
|
43
|
Mangan NM, Askham T, Brunton SL, Kutz JN, Proctor JL. Model selection for hybrid dynamical systems via sparse regression. Proc Math Phys Eng Sci 2019; 475:20180534. [PMID: 31007544 PMCID: PMC6451978 DOI: 10.1098/rspa.2018.0534] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 08/10/2018] [Accepted: 01/25/2019] [Indexed: 12/14/2022] Open
Abstract
Hybrid systems are traditionally difficult to identify and analyse using classical dynamical systems theory. Moreover, recently developed model identification methodologies largely focus on identifying a single set of governing equations solely from measurement data. In this article, we develop a new methodology, Hybrid-Sparse Identification of Nonlinear Dynamics, which identifies separate nonlinear dynamical regimes, employs information theory to manage uncertainty and characterizes switching behaviour. Specifically, we use the nonlinear geometry of data collected from a complex system to construct a set of coordinates based on measurement data and augmented variables. Clustering the data in these measurement-based coordinates enables the identification of nonlinear hybrid systems. This methodology broadly empowers nonlinear system identification without constraining the data locally in time and has direct connections to hybrid systems theory. We demonstrate the success of this method on numerical examples including a mass–spring hopping model and an infectious disease model. Characterizing complex systems that switch between dynamic behaviours is integral to overcoming modern challenges such as eradication of infectious diseases, the design of efficient legged robots and the protection of cyber infrastructures.
Affiliation(s)
- N M Mangan
  - Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, IL 60208, USA
- T Askham
  - Department of Applied Mathematics, University of Washington, Seattle, WA 98195, USA
- S L Brunton
  - Institute for Disease Modeling, Bellevue, WA 98005, USA
- J N Kutz
  - Department of Applied Mathematics, University of Washington, Seattle, WA 98195, USA
- J L Proctor
  - Department of Mechanical Engineering, University of Washington, Seattle, WA 98195, USA
|
44
|
Letten AD, Stouffer DB. The mechanistic basis for higher-order interactions and non-additivity in competitive communities. Ecol Lett 2019; 22:423-436. [PMID: 30675983 DOI: 10.1111/ele.13211] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 08/07/2018] [Revised: 09/18/2018] [Accepted: 11/27/2018] [Indexed: 11/27/2022]
Abstract
Motivated by both analytical tractability and empirical practicality, community ecologists have long treated the species pair as the fundamental unit of study. This notwithstanding, the challenge of understanding more complex systems has repeatedly generated interest in the role of so-called higher-order interactions (HOIs) imposed by species beyond the focal pair. Here we argue that HOIs - defined as non-additive effects of density on per capita growth - are best interpreted as emergent properties of phenomenological models (e.g. Lotka-Volterra competition) rather than as distinct 'ecological processes' in their own right. Using simulations of consumer-resource models, we explore the mechanisms and system properties that give rise to HOIs in observational data. We demonstrate that HOIs emerge under all but the most restrictive of assumptions, and that incorporating non-additivity into phenomenological models improves the quantitative and qualitative accuracy of model predictions. Notably, we also observe that HOIs derive primarily from mechanisms and system properties that apply equally to single-species or pairwise systems as they do to more diverse communities. Consequently, there exists a strong mandate for further recognition of non-additive effects in both theoretical and empirical research.
Affiliation(s)
- Andrew D Letten
  - Centre for Integrative Ecology, University of Canterbury, Christchurch, 8140, New Zealand
  - Institute of Integrative Biology, Department of Environmental Systems Science, ETH Zürich, 8092, Zürich, Switzerland
- Daniel B Stouffer
  - Centre for Integrative Ecology, University of Canterbury, Christchurch, 8140, New Zealand
|
45
|
Kaiser E, Kutz JN, Brunton SL. Sparse identification of nonlinear dynamics for model predictive control in the low-data limit. Proc Math Phys Eng Sci 2018; 474:20180335. [PMID: 30839858 PMCID: PMC6283900 DOI: 10.1098/rspa.2018.0335] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Received: 05/23/2018] [Accepted: 10/11/2018] [Indexed: 02/07/2023] Open
Abstract
Data-driven discovery of dynamics via machine learning is pushing the frontiers of modelling and control efforts, providing a tremendous opportunity to extend the reach of model predictive control (MPC). However, many leading methods in machine learning, such as neural networks (NN), require large volumes of training data, may not be interpretable, do not easily include known constraints and symmetries, and may not generalize beyond the attractor where models are trained. These factors limit their use for the online identification of a model in the low-data limit, for example following an abrupt change to the system dynamics. In this work, we extend the recent sparse identification of nonlinear dynamics (SINDY) modelling procedure to include the effects of actuation and demonstrate the ability of these models to enhance the performance of MPC, based on limited, noisy data. SINDY models are parsimonious, identifying the fewest terms in the model needed to explain the data, making them interpretable and generalizable. We show that the resulting SINDY-MPC framework has higher performance, requires significantly less data, and is more computationally efficient and robust to noise than NN models, making it viable for online training and execution in response to rapid system changes. SINDY-MPC also shows improved performance over linear data-driven models, although linear models may provide a stopgap until enough data is available for SINDY. SINDY-MPC is demonstrated on a variety of dynamical systems with different challenges, including the chaotic Lorenz system, a simple model for flight control of an F8 aircraft, and an HIV model incorporating drug treatment.
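The sparse-regression step behind SINDy, which the abstract above summarizes as identifying the fewest terms needed to explain the data, can be sketched with sequentially thresholded least squares. The following is a minimal illustration, not the authors' implementation: the function name `stlsq`, the threshold value, the polynomial library, and the toy linear system are all assumptions chosen for the example, and derivatives are computed exactly rather than estimated from noisy measurements.

```python
import numpy as np


def stlsq(theta, dxdt, threshold=0.05, max_iter=10):
    """Sequentially thresholded least squares (illustrative sketch).

    theta: (n_samples, n_features) library of candidate functions.
    dxdt:  (n_samples, n_states) time derivatives of the state.
    Returns a sparse (n_features, n_states) coefficient matrix.
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0  # prune terms with small coefficients
        for k in range(dxdt.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                # Refit only the surviving terms for state k.
                xi[keep, k] = np.linalg.lstsq(
                    theta[:, keep], dxdt[:, k], rcond=None
                )[0]
    return xi


# Toy system (assumed for illustration):
#   x' = -0.1 x + 2 y,   y' = -2 x - 0.1 y
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
A = np.array([[-0.1, 2.0], [-2.0, -0.1]])
dX = X @ A.T  # exact derivatives at the sampled states

x, y = X[:, 0], X[:, 1]
# Candidate library: 1, x, y, x^2, x*y, y^2
theta = np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])

xi = stlsq(theta, dX)
print(np.round(xi, 3))
```

With noise-free derivatives the four true coefficients survive and every other library term is pruned to zero; with measured data the threshold would need tuning and the derivatives would have to be estimated first.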
Affiliation(s)
- E. Kaiser
- Department of Mechanical Engineering, University of Washington, Seattle, WA, 98195
- J. N. Kutz
- Department of Applied Mathematics, University of Washington, Seattle, WA, 98195
- S. L. Brunton
- Department of Mechanical Engineering, University of Washington, Seattle, WA, 98195

46
Zhang S, Lin G. Robust data-driven discovery of governing physical laws with error bars. Proc Math Phys Eng Sci 2018; 474:20180305. [PMID: 30333709 PMCID: PMC6189595 DOI: 10.1098/rspa.2018.0305] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 05/11/2018] [Accepted: 08/22/2018] [Indexed: 11/12/2022] Open
Abstract
Discovering governing physical laws from noisy data is a grand challenge in many science and engineering research areas. We present a new approach to data-driven discovery of ordinary differential equations (ODEs) and partial differential equations (PDEs), in explicit or implicit form. We demonstrate our approach on a wide range of problems, including shallow water equations and Navier-Stokes equations. The key idea is to select candidate terms for the underlying equations using dimensional analysis, and to approximate the weights of the terms with error bars using our threshold sparse Bayesian regression. This new algorithm employs Bayesian inference to tune the hyperparameters automatically. Our approach is effective, robust and able to quantify uncertainties by providing an error bar for each discovered candidate equation. The effectiveness of our algorithm is demonstrated through a collection of classical ODEs and PDEs. Numerical experiments demonstrate the robustness of our algorithm with respect to noisy data and its ability to discover various candidate equations with error bars that represent the quantified uncertainties. Detailed comparisons with the sequential threshold least-squares algorithm and the lasso algorithm are studied from noisy time-series measurements and indicate that the proposed method provides more robust and accurate results. In addition, the data-driven prediction of dynamics with error bars using discovered governing physical laws is more accurate and robust than classical polynomial regressions.
Affiliation(s)
- Sheng Zhang
- Department of Mathematics, Purdue University, West Lafayette, IN 47907, USA
- Guang Lin
- Department of Mathematics, Purdue University, West Lafayette, IN 47907, USA
- School of Mechanical Engineering, Purdue University, West Lafayette, IN 47907, USA

47
Quade M, Abel M, Kutz JN, Brunton SL. Sparse identification of nonlinear dynamics for rapid model recovery. Chaos (Woodbury, N.Y.) 2018; 28:063116. [PMID: 29960401 DOI: 10.1063/1.5027470] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 06/08/2023]
Abstract
Big data have become a critically enabling component of emerging mathematical methods aimed at the automated discovery of dynamical systems, where first principles modeling may be intractable. However, in many engineering systems, abrupt changes must be rapidly characterized based on limited, incomplete, and noisy data. Many leading automated learning techniques rely on unrealistically large data sets, and it is unclear how to leverage prior knowledge effectively to re-identify a model after an abrupt change. In this work, we propose a conceptual framework to recover parsimonious models of a system in response to abrupt changes in the low-data limit. First, the abrupt change is detected by comparing the estimated Lyapunov time of the data with the model prediction. Next, we apply the sparse identification of nonlinear dynamics (SINDy) regression to update a previously identified model with the fewest changes, either by addition, deletion, or modification of existing model terms. We demonstrate this sparse model recovery on several examples for abrupt system change detection in periodic and chaotic dynamical systems. Our examples show that sparse updates to a previously identified model perform better with less data, have lower runtime complexity, and are less sensitive to noise than identifying an entirely new model. The proposed abrupt-SINDy architecture provides a new paradigm for the rapid and efficient recovery of a system model after abrupt changes.
Affiliation(s)
- Markus Quade
- Institut für Physik und Astronomie, Universität Potsdam, Karl-Liebknecht-Straße 24/25, 14476 Potsdam, Germany
- Markus Abel
- Institut für Physik und Astronomie, Universität Potsdam, Karl-Liebknecht-Straße 24/25, 14476 Potsdam, Germany
- J Nathan Kutz
- Department of Applied Mathematics, University of Washington, Seattle, Washington 98195, USA
- Steven L Brunton
- Department of Mechanical Engineering, University of Washington, Seattle, Washington 98195, USA

48
Brunton SL, Brunton BW, Proctor JL, Kaiser E, Kutz JN. Chaos as an intermittently forced linear system. Nat Commun 2017; 8:19. [PMID: 28559566 PMCID: PMC5449398 DOI: 10.1038/s41467-017-00030-8] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Received: 11/07/2016] [Accepted: 04/05/2017] [Indexed: 11/09/2022] Open
Abstract
Understanding the interplay of order and disorder in chaos is a central challenge in modern quantitative science. Approximate linear representations of nonlinear dynamics have long been sought, driving considerable interest in Koopman theory. We present a universal, data-driven decomposition of chaos as an intermittently forced linear system. This work combines delay embedding and Koopman theory to decompose chaotic dynamics into a linear model in the leading delay coordinates with forcing by low-energy delay coordinates; this is called the Hankel alternative view of Koopman (HAVOK) analysis. This analysis is applied to the Lorenz system and real-world examples including Earth’s magnetic field reversal and measles outbreaks. In each case, forcing statistics are non-Gaussian, with long tails corresponding to rare intermittent forcing that precedes switching and bursting phenomena. The forcing activity demarcates coherent phase space regions where the dynamics are approximately linear from those that are strongly nonlinear. The huge amount of data generated in fields like neuroscience or finance calls for effective strategies that mine data to reveal underlying dynamics. Here Brunton et al. develop a data-driven technique to analyze chaotic systems and predict their dynamics in terms of a forced linear model.
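The HAVOK steps named in the abstract (delay embedding, a decomposition into leading delay coordinates, and a linear model forced by a low-energy coordinate) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the RK4 integrator, the time step, the number of delays `q`, the rank `r`, and the function names are choices made for this example and are not the settings reported in the paper.

```python
import numpy as np

def lorenz_rk4(n_steps, dt=0.01, x0=(-8.0, 8.0, 27.0)):
    """Integrate the Lorenz system (sigma=10, rho=28, beta=8/3) with RK4."""
    def f(s):
        x, y, z = s
        return np.array([10.0 * (y - x), x * (28.0 - z) - y, x * y - (8.0 / 3.0) * z])
    traj = np.empty((n_steps, 3))
    s = np.array(x0, dtype=float)
    for i in range(n_steps):
        traj[i] = s
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        s = s + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return traj

def hankel(x, q):
    """Stack q time-shifted copies of the scalar series x as rows."""
    n_cols = len(x) - q + 1
    return np.array([x[i:i + n_cols] for i in range(q)])

def havok(x, dt, q=100, r=11):
    """Fit dv/dt = A v + B v_r, where v holds the first r-1 delay coordinates."""
    H = hankel(x, q)
    _, _, Vt = np.linalg.svd(H, full_matrices=False)
    V = Vt[:r].T                                   # delay coordinates, shape (n_cols, r)
    dV = np.gradient(V[:, :r - 1], dt, axis=0)     # finite-difference derivatives
    coef = np.linalg.lstsq(V, dV, rcond=None)[0]   # shape (r, r-1)
    A = coef[:r - 1].T                             # linear dynamics
    B = coef[r - 1:].T                             # forcing by the r-th coordinate
    return A, B, V, dV

traj = lorenz_rk4(5000)
A, B, V, dV = havok(traj[:, 0], dt=0.01)
print(A.shape, B.shape)
```

In this sketch the last retained coordinate, `V[:, -1]`, plays the role of the forcing signal; inspecting its distribution is one way to look for the intermittent, long-tailed forcing described above.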
Affiliation(s)
- Steven L Brunton
- Department of Mechanical Engineering, University of Washington, Seattle, WA, 98195, USA
- Bingni W Brunton
- Department of Biology, University of Washington, Seattle, WA, 98195, USA
- Eurika Kaiser
- Department of Mechanical Engineering, University of Washington, Seattle, WA, 98195, USA
- J Nathan Kutz
- Department of Applied Mathematics, University of Washington, Seattle, WA, 98195, USA
|