1
Schäfer JL, Keller BG. Implementation of Girsanov Reweighting in OpenMM and Deeptime. J Phys Chem B 2024. [PMID: 38865491 DOI: 10.1021/acs.jpcb.4c01702] [Indexed: 06/14/2024]
Abstract
Classical molecular dynamics (MD) simulations provide invaluable insights into complex molecular systems but face limitations in capturing phenomena occurring on time scales beyond their reach. To bridge this gap, various enhanced sampling techniques have been developed, which are complemented by reweighting techniques to recover the unbiased dynamics. Girsanov reweighting is a reweighting technique that reweights simulation paths, generated by a stochastic MD integrator, without invoking an effective model of the dynamics. Instead, it calculates the relative path probability density at the time resolution of the MD integrator. Efficient implementation of Girsanov reweighting requires that the reweighting factors be calculated on-the-fly during the simulations, and therefore it needs to be implemented within the MD integrator. Here, we present a comprehensive guide for implementing Girsanov reweighting into MD simulations. We demonstrate the implementation in the MD simulation package OpenMM by extending the library openmmtools. Additionally, we implemented a reweighted Markov state model estimator within the time series analysis package Deeptime.
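As a rough illustration of why the factors are accumulated inside the integrator, the Python sketch below computes a Girsanov path-reweighting factor on-the-fly in a toy simulation. It assumes one-dimensional overdamped Langevin (Brownian) dynamics discretized with Euler-Maruyama, unit friction, and a hypothetical double-well target potential with a hypothetical barrier-lowering bias; it is not the openmmtools or Deeptime implementation described in the paper, which targets Langevin integrators. The idea shown is general: for each step, compare the random number the biased integrator actually drew with the one the unbiased integrator would have needed for the same displacement.

```python
import numpy as np


def grad_v(x):
    # Hypothetical target potential V(x) = (x**2 - 1)**2 (double well)
    return 4.0 * x * (x**2 - 1.0)


def grad_u(x, k_bias):
    # Hypothetical bias U(x) = -k_bias * V(x) / 2, which lowers the barrier
    return -0.5 * k_bias * grad_v(x)


def simulate_with_girsanov(x0, n_steps, dt, beta, k_bias, rng):
    """Euler-Maruyama Brownian dynamics under V + U (unit friction assumed).

    Returns the trajectory and the cumulative log Girsanov factor, i.e. the
    log ratio of the path probability density under the unbiased dynamics
    (V only) to that under the simulated, biased dynamics (V + U).
    """
    sigma = np.sqrt(2.0 / beta)  # noise amplitude for unit friction
    x = np.empty(n_steps + 1)
    log_m = np.zeros(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        eta = rng.standard_normal()
        force_target = -grad_v(x[i])
        force_biased = force_target - grad_u(x[i], k_bias)
        x[i + 1] = x[i] + force_biased * dt + sigma * np.sqrt(dt) * eta
        # Random number the unbiased integrator would have needed
        # to produce exactly the same step x[i] -> x[i + 1]
        eta_target = eta + (force_biased - force_target) * np.sqrt(dt) / sigma
        # Log ratio of the standard-normal densities of the two random numbers
        log_m[i + 1] = log_m[i] - 0.5 * (eta_target**2 - eta**2)
    return x, log_m


rng = np.random.default_rng(42)
x, log_m = simulate_with_girsanov(x0=-1.0, n_steps=10_000, dt=1e-3,
                                  beta=3.0, k_bias=1.0, rng=rng)
lag = 100
# Path reweighting factor for every window [t, t + lag]
window_weights = np.exp(log_m[lag:] - log_m[:-lag])
```

A reweighted MSM estimator would consume per-window factors like `window_weights`; when the simulation itself is biased, a weight for the starting configuration of each window is also needed, which this sketch does not show.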
Affiliation(s)
- Joana-Lysiane Schäfer
- Department of Biology, Chemistry, and Pharmacy, Freie Universität Berlin, Berlin 14195, Germany
- Bettina G Keller
- Department of Biology, Chemistry, and Pharmacy, Freie Universität Berlin, Berlin 14195, Germany

2
Wang D, Qiu Y, Beyerle ER, Huang X, Tiwary P. Information Bottleneck Approach for Markov Model Construction. J Chem Theory Comput 2024. [PMID: 38859575 DOI: 10.1021/acs.jctc.4c00449] [Indexed: 06/12/2024]
Abstract
Markov state models (MSMs) have proven valuable in studying the dynamics of protein conformational changes via statistical analysis of molecular dynamics simulations. In MSMs, the complex configuration space is coarse-grained into conformational states, with dynamics modeled by a series of Markovian transitions among these states at discrete lag times. Constructing the Markovian model at a specific lag time requires defining states without significant internal energy barriers, so that the dynamics within each state relax within the lag time. This process effectively coarse-grains time and space, integrating out rapid motions within metastable states. Thus, MSMs possess a multiresolution nature, where the granularity of states can be adjusted according to the time resolution, offering flexibility in capturing system dynamics. This work introduces a continuous embedding approach for molecular conformations using the state predictive information bottleneck (SPIB), a framework that unifies dimensionality reduction and state space partitioning via a continuous, machine-learned basis set. Without explicitly optimizing VAMP-based scores, SPIB demonstrates state-of-the-art performance in identifying slow dynamical processes and constructing predictive multiresolution Markovian models. Through applications to well-validated mini-proteins, SPIB showcases unique advantages compared to competing methods. It autonomously and self-consistently adjusts the number of metastable states based on a specified minimal time resolution, eliminating the need for manual tuning. While maintaining efficacy in dynamical properties, SPIB excels in accurately distinguishing metastable states and capturing numerous well-populated macrostates. This contrasts with existing VAMP-based methods, which often emphasize slow dynamics at the expense of incorporating numerous sparsely populated states.
Furthermore, SPIB's ability to learn a low-dimensional continuous embedding of the underlying MSMs enhances the interpretation of dynamic pathways. With these benefits, we propose SPIB as an easy-to-implement methodology for end-to-end MSM construction.
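For orientation, the generic MSM estimator behind the lag-time discussion above can be sketched in a few lines of Python. This is the simplest non-reversible maximum-likelihood estimate (row-normalized transition counts at lag tau) together with implied timescales t_i = -tau / ln|lambda_i|; it is not SPIB, and the discrete trajectory in the usage lines is a hypothetical sequence of state labels.

```python
import numpy as np


def count_matrix(dtraj, lag, n_states):
    """Count transitions i -> j observed at the given lag (sliding window)."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return counts


def transition_matrix(counts):
    """Non-reversible maximum-likelihood estimate: row-normalize the counts.

    Assumes every state has at least one outgoing count.
    """
    return counts / counts.sum(axis=1, keepdims=True)


def implied_timescales(T, lag):
    """t_i = -lag / ln|lambda_i| for all eigenvalues except the stationary one."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(eigvals[1:])


# Usage with a short hypothetical discrete trajectory of state labels
dtraj = np.array([0, 0, 1, 1, 1, 0, 0, 0, 1, 1])
T = transition_matrix(count_matrix(dtraj, lag=1, n_states=2))
its = implied_timescales(T, lag=1)
```

In practice one repeats this for several lag times and looks for the implied timescales to level off, which is the Markovianity check that state definitions like SPIB's aim to satisfy at a chosen time resolution.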
Affiliation(s)
- Dedi Wang
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
- Yunrui Qiu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Data Science Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Eric R Beyerle
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
- Xuhui Huang
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Data Science Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Pratyush Tiwary
- Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
- University of Maryland Institute for Health Computing, Bethesda, Maryland 20852, United States

3
Liu Y, Ghosh TK, Lin G, Chen M. Unbiasing Enhanced Sampling on a High-Dimensional Free Energy Surface with a Deep Generative Model. J Phys Chem Lett 2024; 15:3938-3945. [PMID: 38568182 DOI: 10.1021/acs.jpclett.3c03515] [Indexed: 04/12/2024]
Abstract
Biased enhanced sampling methods that utilize collective variables (CVs) are powerful tools for sampling conformational ensembles. Due to their large intrinsic dimensions, efficiently generating conformational ensembles for complex systems requires enhanced sampling on high-dimensional free energy surfaces. While temperature-accelerated molecular dynamics (TAMD) can trivially adopt many CVs in a simulation, unbiasing the simulation to generate unbiased conformational ensembles requires accurate modeling of a high-dimensional CV probability distribution, which is challenging for traditional density estimation techniques. Here we propose an unbiasing method based on the score-based diffusion model, a deep generative learning method that excels in density estimation across complex data landscapes. We demonstrate that this unbiasing approach, tested on multiple TAMD simulations, significantly outperforms traditional unbiasing methods and can generate accurate unbiased conformational ensembles. With the proposed approach, TAMD can adopt CVs that focus on improving sampling efficiency and the proposed unbiasing method enables accurate evaluation of ensemble averages of important chemical features.
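The "traditional unbiasing" that the score-based model is compared against can be illustrated in a single CV dimension. Under the usual TAMD assumptions (adiabatic separation and stiff coupling), CV samples are distributed approximately as exp(-F(z) / k_B T_cv) at the elevated CV temperature T_cv, so the unbiased density at the physical temperature T is proportional to that density raised to the power T_cv / T. The Python sketch below applies this relation to a histogram estimate; the histogram baseline, bin settings, and synthetic samples are assumptions for illustration, not the diffusion-model method of the paper, and this density-estimation step is exactly what degrades as the number of CVs grows.

```python
import numpy as np


def unbias_tamd_histogram(z_samples, t_physical, t_cv, bins=80,
                          value_range=(-8.0, 8.0)):
    """Histogram estimate of the unbiased 1D CV density from TAMD samples.

    Uses p_unbiased(z) proportional to p_tamd(z) ** (t_cv / t_physical),
    which assumes the TAMD CV samples follow exp(-F(z) / (k_B * t_cv)).
    """
    density, edges = np.histogram(z_samples, bins=bins, range=value_range,
                                  density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    width = edges[1] - edges[0]
    unbiased = density ** (t_cv / t_physical)
    unbiased /= unbiased.sum() * width  # renormalize to unit area
    return centers, unbiased


# Usage with synthetic samples standing in for TAMD output (hypothetical):
# for F(z) = z**2 / 2 and k_B T = 1, sampling at T_cv = 4 gives variance 4.
rng = np.random.default_rng(0)
z_tamd = rng.normal(0.0, 2.0, size=200_000)
centers, p_unbiased = unbias_tamd_histogram(z_tamd, t_physical=1.0, t_cv=4.0)
```

In this toy case the recovered density should have unit variance; in many dimensions the histogram becomes sparse and the exponentiation amplifies its noise, which motivates replacing it with a learned density model.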
Affiliation(s)
- Yikai Liu
- Department of Mechanical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
- Tushar K Ghosh
- Department of Chemistry, Purdue University, West Lafayette, Indiana 47906, United States
- Guang Lin
- Department of Mechanical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
- Ming Chen
- Department of Chemistry, Purdue University, West Lafayette, Indiana 47906, United States

4
Xu T, Li Y, Gao X, Zhang L. Understanding the Fast-Triggering Unfolding Dynamics of FK-11 upon Photoexcitation of Azobenzene. J Phys Chem Lett 2024; 15:3531-3540. [PMID: 38526058 DOI: 10.1021/acs.jpclett.4c00091] [Indexed: 03/26/2024]
Abstract
Photoswitchable molecules can control the activity and functions of biomolecules by triggering conformational changes. However, it is still challenging to fully understand such fast-triggered conformational evolution from a nonequilibrium to an equilibrium distribution at the molecular level. Herein, we simulated the unfolding of the FK-11 peptide upon the photoinduced trans-to-cis isomerization of azobenzene using a Markov state model. We found that the ensemble of FK-11 contains five conformational states, constituting two unfolding pathways. More intriguingly, we observed the microsecond-scale conformational propagation of the FK-11 peptide from the fully folded state to the equilibrium populations of the five states. The computed CD spectra match well with the experimental data, validating our simulation method. Overall, our study not only offers a protocol to study the photoisomerization-induced conformational changes of enzymes but also could guide the rational design of photoswitchable molecules to manipulate biological functions.
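In an MSM, relaxation from a fully folded starting ensemble to equilibrium populations amounts to repeatedly applying the transition matrix to a probability vector, p(t + tau) = p(t) T(tau). The Python sketch below shows that propagation; the five-state transition matrix is entirely hypothetical and is not the FK-11 model from the study.

```python
import numpy as np

# Hypothetical 5-state transition matrix at lag tau (each row sums to 1);
# state 0 plays the role of the fully folded state.
T = np.array([
    [0.90, 0.06, 0.04, 0.00, 0.00],
    [0.05, 0.85, 0.05, 0.05, 0.00],
    [0.03, 0.04, 0.88, 0.00, 0.05],
    [0.00, 0.05, 0.00, 0.90, 0.05],
    [0.00, 0.00, 0.04, 0.04, 0.92],
])


def propagate(p0, T, n_lags):
    """Return populations after 0, 1, ..., n_lags applications of T."""
    traj = [np.asarray(p0, dtype=float)]
    for _ in range(n_lags):
        traj.append(traj[-1] @ T)  # p(t + tau) = p(t) T
    return np.array(traj)


def stationary_distribution(T):
    """Left eigenvector of T for eigenvalue 1, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(T.T)
    pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return pi / pi.sum()


p0 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # start entirely in the folded state
populations = propagate(p0, T, 500)       # shape (501, 5), one row per lag
```

Each column of `populations` traces one state's population over time; with a real MSM, multiplying the step index by the physical lag time gives the relaxation timescale that can be compared with experiment.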
Affiliation(s)
- Tiantian Xu
- State Key Laboratory of Structural Chemistry, Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences, Fuzhou, Fujian 350002, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yongfang Li
- State Key Laboratory of Structural Chemistry, Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences, Fuzhou, Fujian 350002, China
- Xin Gao
- Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
- Lu Zhang
- State Key Laboratory of Structural Chemistry, Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences, Fuzhou, Fujian 350002, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Fuzhou, Fujian 361005, China

5
Sahimi M. Physics-informed and data-driven discovery of governing equations for complex phenomena in heterogeneous media. Phys Rev E 2024; 109:041001. [PMID: 38755895 DOI: 10.1103/physreve.109.041001] [Received: 02/09/2023] [Indexed: 05/18/2024]
Abstract
Rapid evolution of sensor technology, advances in instrumentation, and progress in devising data-acquisition software and hardware are providing vast amounts of data for various complex phenomena that occur in heterogeneous media, ranging from those in the atmospheric environment to large-scale porous formations and biological systems. The tremendous increase in the speed of scientific computing has also made it possible to emulate diverse multiscale and multiphysics phenomena that contain elements of stochasticity or heterogeneity, and to generate large volumes of numerical data for them. Thus, given a heterogeneous system with annealed or quenched disorder in which a complex phenomenon occurs, how should one analyze and model the system and phenomenon, explain the data, and make predictions for length and time scales much larger than those over which the data were collected? We divide such systems into three distinct classes. (i) Those for which the governing equations for the physical phenomena of interest, as well as data, are known, but solving the equations over large length scales and long times is very difficult. (ii) Those for which data are available, but the governing equations are only partially known, in the sense that either they contain various coefficients that must be evaluated from the data, or the number of degrees of freedom of the system is so large that deriving the complete equations is very difficult, if not impossible, so that one must develop governing equations of reduced dimensionality. (iii) Those for which large amounts of data are available, but the governing equations for the phenomena of interest are not known.
Several classes of physics-informed and data-driven approaches for analyzing and modeling these three classes of systems have been emerging, based on machine learning, symbolic regression, the Koopman operator, the Mori-Zwanzig projection operator formulation, sparse identification of nonlinear dynamics, data assimilation combined with a neural network, and stochastic optimization and analysis. This perspective describes such methods and the latest developments in this highly important and rapidly expanding area and discusses possible future directions.
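Of the approaches listed, sparse identification of nonlinear dynamics (SINDy) is compact enough to sketch. As commonly formulated, it regresses measured time derivatives onto a library of candidate functions and promotes sparsity by sequentially thresholded least squares. The Python sketch below recovers the right-hand side of dx/dt = x - x**3 from sampled trajectories; the polynomial library, threshold, and test system are illustrative assumptions, not examples taken from the perspective.

```python
import numpy as np


def polynomial_library(x, degree=3):
    """Candidate terms [1, x, x**2, ..., x**degree] for a 1D state x."""
    return np.column_stack([x**p for p in range(degree + 1)])


def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if big.any():
            xi[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return xi


def exact_solution(x0, t):
    """Closed-form solution of dx/dt = x - x**3 (used only to make test data)."""
    return x0 * np.exp(t) / np.sqrt(1.0 + x0**2 * (np.exp(2.0 * t) - 1.0))


# Synthetic "measurements" from several initial conditions
t = np.linspace(0.0, 5.0, 1001)
xs, dxs = [], []
for x0 in (0.1, -0.1, 2.0, -2.0):
    x = exact_solution(x0, t)
    xs.append(x)
    dxs.append(np.gradient(x, t, edge_order=2))  # finite-difference derivative
xi = stlsq(polynomial_library(np.concatenate(xs)), np.concatenate(dxs))
# xi holds coefficients for [1, x, x**2, x**3]; ideally close to [0, 1, 0, -1]
```

The threshold sets the trade-off between sparsity and fit; with noisy data the derivative estimate, not the regression, is usually the weak point.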
Affiliation(s)
- Muhammad Sahimi
- Mork Family Department of Chemical Engineering and Materials Science, University of Southern California, Los Angeles, California 90089-1211, USA

6
Lelièvre T, Pigeon T, Stoltz G, Zhang W. Analyzing Multimodal Probability Measures with Autoencoders. J Phys Chem B 2024; 128:2607-2631. [PMID: 38466759 DOI: 10.1021/acs.jpcb.3c07075] [Indexed: 03/13/2024]
Abstract
Finding collective variables to describe some important coarse-grained information on physical systems, in particular metastable states, remains a key issue in molecular dynamics. Recently, machine learning techniques have been intensively used to complement and possibly bypass expert knowledge in order to construct collective variables. Our focus here is on neural network approaches based on autoencoders. We study some relevant mathematical properties of the loss function considered for training autoencoders and provide physical interpretations based on conditional variances and minimum energy paths. We also consider various extensions in order to better describe physical systems, by incorporating more information on transition states at saddle points, and/or allowing for multiple decoders in order to describe several transition paths. Our results are illustrated on toy two-dimensional systems and on alanine dipeptide.
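A standard fact that connects the autoencoder loss to conditional variances: for a fixed encoder xi, the mean-squared reconstruction loss is minimized by the decoder that returns E[X | xi(X)], and the minimal loss then equals the expected conditional variance, summed over coordinates. The Python sketch below checks this numerically on a toy two-dimensional sample by approximating the conditional expectation with bin averages of xi; the data, the encoder xi(x) = x_1, and the binning are assumptions for illustration, not the setup or training procedure of the paper.

```python
import numpy as np


def binned_optimal_decoder_loss(x, xi, n_bins=50):
    """Reconstruction MSE when the decoder is the per-bin mean of x given xi.

    Returns (loss, mean conditional variance). They agree exactly, because
    the squared error around bin means is the mass-weighted sum of
    within-bin variances.
    """
    edges = np.linspace(xi.min(), xi.max(), n_bins + 1)
    labels = np.clip(np.digitize(xi, edges) - 1, 0, n_bins - 1)
    recon = np.empty_like(x)
    cond_var = 0.0
    for b in np.unique(labels):
        mask = labels == b
        mean_b = x[mask].mean(axis=0)  # decoder output for this bin
        recon[mask] = mean_b
        # Within-bin variance summed over coordinates, weighted by bin mass
        cond_var += mask.mean() * x[mask].var(axis=0).sum()
    loss = np.mean(np.sum((x - recon) ** 2, axis=1))
    return loss, cond_var


# Toy data (hypothetical): a noisy curve, encoded by its first coordinate
rng = np.random.default_rng(0)
x1 = rng.uniform(-2.0, 2.0, 10_000)
x = np.column_stack([x1, np.sin(2.0 * x1) + 0.1 * rng.standard_normal(10_000)])
loss, cond_var = binned_optimal_decoder_loss(x, xi=x[:, 0])
```

A small loss here indicates that the chosen encoder retains almost all the information needed to reconstruct the sample; an encoder that mixed the two branches of a multimodal distribution would leave a large conditional variance.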
Affiliation(s)
- Tony Lelièvre
- CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
- MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
- Thomas Pigeon
- CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
- MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
- IFP Energies Nouvelles, Rond-Point de l'Echangeur de Solaize, BP 3, 69360 Solaize, France
- Gabriel Stoltz
- CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
- MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
- Wei Zhang
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 14, 14195 Berlin, Germany
- Zuse Institute Berlin, Takustraße 7, 14195 Berlin, Germany

7
Sisk TR, Robustelli P. Folding-upon-binding pathways of an intrinsically disordered protein from a deep Markov state model. Proc Natl Acad Sci U S A 2024; 121:e2313360121. [PMID: 38294935 PMCID: PMC10861926 DOI: 10.1073/pnas.2313360121] [Received: 08/10/2023] [Accepted: 11/22/2023] [Indexed: 02/02/2024]
Abstract
A central challenge in the study of intrinsically disordered proteins is the characterization of the mechanisms by which they bind their physiological interaction partners. Here, we utilize a deep learning-based Markov state modeling approach to characterize the folding-upon-binding pathways observed in a long timescale molecular dynamics simulation of a disordered region of the measles virus nucleoprotein NTAIL reversibly binding the X domain of the measles virus phosphoprotein complex. We find that folding-upon-binding predominantly occurs via two distinct encounter complexes that are differentiated by the binding orientation, helical content, and conformational heterogeneity of NTAIL. We observe that folding-upon-binding predominantly proceeds through a multi-step induced fit mechanism with several intermediates and do not find evidence for the existence of canonical conformational selection pathways. We observe four kinetically separated native-like bound states that interconvert on timescales of eighty to five hundred nanoseconds. These bound states share a core set of native intermolecular contacts and stable NTAIL helices and are differentiated by a sequential formation of native and non-native contacts and additional helical turns. Our analyses provide an atomic resolution structural description of intermediate states in a folding-upon-binding pathway and elucidate the nature of the kinetic barriers between metastable states in a dynamic and heterogeneous, or "fuzzy", protein complex.
Affiliation(s)
- Thomas R. Sisk
- Department of Chemistry, Dartmouth College, Hanover, NH 03755
- Paul Robustelli
- Department of Chemistry, Dartmouth College, Hanover, NH 03755

8
Wu H, Noé F. Reaction coordinate flows for model reduction of molecular kinetics. J Chem Phys 2024; 160:044109. [PMID: 38270975 DOI: 10.1063/5.0176078] [Received: 09/11/2023] [Accepted: 12/26/2023] [Indexed: 01/26/2024]
Abstract
In this work, we introduce a flow-based machine learning approach called reaction coordinate (RC) flow for the discovery of low-dimensional kinetic models of molecular systems. The RC flow utilizes a normalizing flow to design the coordinate transformation and a Brownian dynamics model to approximate the kinetics of the RC, where all model parameters can be estimated in a data-driven manner. In contrast to existing model reduction methods for molecular kinetics, RC flow offers a trainable and tractable model of reduced kinetics in continuous time and space due to the invertibility of the normalizing flow. Furthermore, the Brownian dynamics-based reduced kinetic model investigated in this work yields a readily discernible representation of metastable states within the phase space of the molecular system. Numerical experiments demonstrate that the proposed method effectively discovers interpretable and accurate low-dimensional representations of given full-state kinetics from simulations.
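Invertibility is what keeps densities tractable: if z = f(x) is a bijection, the change-of-variables formula gives log p_x(x) = log p_z(f(x)) + log|det df/dx|. The Python sketch below checks this with a hypothetical elementwise affine map standing in for a trained normalizing flow and a standard normal base density; it does not reproduce the RC flow architecture or its Brownian dynamics model.

```python
import numpy as np


def affine_flow(x, log_scale, shift):
    """Hypothetical invertible map z = exp(log_scale) * x + shift (elementwise).

    The Jacobian is diagonal, so log|det| is the sum of log_scale.
    """
    z = np.exp(log_scale) * x + shift
    log_det = np.full(x.shape[0], np.sum(log_scale))
    return z, log_det


def inverse_affine_flow(z, log_scale, shift):
    """Exact inverse of affine_flow."""
    return (z - shift) * np.exp(-log_scale)


def standard_normal_logpdf(z):
    """Log density of a standard normal base distribution, row by row."""
    return -0.5 * np.sum(z**2, axis=1) - 0.5 * z.shape[1] * np.log(2.0 * np.pi)


def flow_logpdf(x, log_scale, shift):
    """log p_x(x) = log p_z(f(x)) + log|det df/dx|."""
    z, log_det = affine_flow(x, log_scale, shift)
    return standard_normal_logpdf(z) + log_det


log_scale = np.array([np.log(2.0), np.log(0.5)])  # hypothetical parameters
shift = np.array([1.0, -3.0])
x = np.array([[0.3, -1.0], [1.5, 2.0]])
log_px = flow_logpdf(x, log_scale, shift)
```

For this affine map the induced density is simply a Gaussian with mean -shift / scale and standard deviation 1 / scale, which makes the formula easy to verify; a learned flow replaces the affine map with a deep invertible network but uses the same identity.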
Affiliation(s)
- Hao Wu
- School of Mathematical Sciences, Institute of Natural Sciences and MOE-LSC, Shanghai Jiao Tong University, Shanghai, People's Republic of China
- Frank Noé
- Department of Mathematics and Computer Science and Department of Physics, Freie Universität Berlin, Berlin, Germany
- Department of Chemistry, Rice University, Houston, Texas 77005, USA
- Microsoft Research AI4Science, Berlin, Germany

9
Copperman J, Mclean IC, Gross SM, Chang YH, Zuckerman DM, Heiser LM. Single-cell morphodynamical trajectories enable prediction of gene expression accompanying cell state change. bioRxiv 2024:2024.01.18.576248. [PMID: 38293173 PMCID: PMC10827140 DOI: 10.1101/2024.01.18.576248] [Indexed: 02/01/2024]
Abstract
Extracellular signals induce changes to molecular programs that modulate multiple cellular phenotypes, including proliferation, motility, and differentiation status. The connection between dynamically adapting phenotypic states and the molecular programs that define them is not well understood. Here we develop data-driven models of single-cell phenotypic responses to extracellular stimuli by linking gene transcription levels to "morphodynamics": changes in cell morphology and motility observable in time-lapse image data. We adopt a dynamics-first view of cell state by grouping single-cell trajectories into states with shared morphodynamic responses. The single-cell trajectories enable development of a first-of-its-kind computational approach to map live-cell dynamics to snapshot gene transcript levels, which we term MMIST, Molecular and Morphodynamics-Integrated Single-cell Trajectories. The key conceptual advance of MMIST is that cell behavior can be quantified based on dynamically defined states and that extracellular signals alter the overall distribution of cell states by altering rates of switching between states. We find a cell state landscape that is bounded by epithelial and mesenchymal endpoints, with distinct sequences of epithelial to mesenchymal transition (EMT) and mesenchymal to epithelial transition (MET) intermediates. The analysis yields predictions for gene expression changes consistent with curated EMT gene sets and predicts thousands of RNA transcript levels through extracellular signal-induced EMT and MET with near-continuous time resolution. The MMIST framework leverages true single-cell dynamical behavior to generate molecular-level omics inferences and is broadly applicable to other biological domains, time-lapse imaging approaches and molecular snapshot data.
Affiliation(s)
- Jeremy Copperman
- Cancer Early Detection Advanced Research Center, Oregon Health and Science University, Portland OR 97239, U.S.A
- Ian C. Mclean
- Department of Biomedical Engineering, Oregon Health and Science University, Portland OR 97239, U.S.A
- Young Hwan Chang
- Department of Biomedical Engineering, Oregon Health and Science University, Portland OR 97239, U.S.A
- Knight Cancer Institute, Oregon Health and Science University, Portland OR 97239, U.S.A
- Daniel M. Zuckerman
- Department of Biomedical Engineering, Oregon Health and Science University, Portland OR 97239, U.S.A
- Knight Cancer Institute, Oregon Health and Science University, Portland OR 97239, U.S.A
- Laura M. Heiser
- Department of Biomedical Engineering, Oregon Health and Science University, Portland OR 97239, U.S.A
- Knight Cancer Institute, Oregon Health and Science University, Portland OR 97239, U.S.A

10
Tian J, Dong X, Wu T, Wen P, Liu X, Zhang M, An X, Shi D. Revealing the conformational dynamics of UDP-GlcNAc recognition by O-GlcNAc transferase via Markov state model. Int J Biol Macromol 2024; 256:128405. [PMID: 38016609 DOI: 10.1016/j.ijbiomac.2023.128405] [Received: 09/12/2023] [Revised: 11/20/2023] [Accepted: 11/22/2023] [Indexed: 11/30/2023]
Abstract
The O-linked N-acetylglucosamine (O-GlcNAc) glycosylation is a critical post-translational modification and is closely linked to various physiological and pathological conditions. O-GlcNAc transferase (OGT) is the only glycosyltransferase responsible for O-GlcNAc glycosylation, transferring GlcNAc from UDP-GlcNAc to serine or threonine residues on protein substrates. The interaction mode of UDP-GlcNAc with OGT has been preliminarily revealed by crystal structures, yet an atomic-level understanding of the conformational dynamics of the recognition process remains elusive. Here, we construct a Markov state model based on extensive all-atom molecular dynamics (MD) simulations with an aggregated simulation time of ∼9 μs, and reveal that the recognition of UDP-GlcNAc by OGT encompasses four key metastable states and occurs within an estimated timescale of ∼10 μs. During the UDP-GlcNAc recognition process, we find that the pyrophosphate moiety (P2O52-) initially anchors to the active pocket via a salt bridge and hydrogen bonds, facilitating subsequent binding of the uridine and GlcNAc moieties. Furthermore, the functional role of K842, which forms the salt bridge with P2O52-, was evaluated through additional MD simulations of mutants. Overall, our study provides valuable insights into the UDP-GlcNAc recognition mechanism of OGT, which could further aid mechanistic studies of O-GlcNAc glycosylation and drug development targeting OGT.
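Process timescales like the ∼10 μs estimate are commonly extracted from an MSM as mean first passage times (MFPTs). One standard formulation, sketched below in Python, solves m_i = tau + sum_j T_ij m_j for every state i outside the target set, with m = 0 inside it. The transition matrix is hypothetical, and this is not claimed to be the exact procedure used in the study.

```python
import numpy as np


def mfpt_to_target(T, target, lag):
    """Mean first passage times into the `target` states from every state.

    Solves (I - T_oo) m_o = lag * 1, where T_oo is the block of T restricted
    to non-target states; m is zero on the target states.
    """
    n = T.shape[0]
    others = np.setdiff1d(np.arange(n), np.asarray(target))
    a = np.eye(len(others)) - T[np.ix_(others, others)]
    m = np.zeros(n)
    m[others] = np.linalg.solve(a, lag * np.ones(len(others)))
    return m


# Hypothetical two-state model: leaving state 0 has probability 0.02 per lag
T = np.array([[0.98, 0.02], [0.05, 0.95]])
m = mfpt_to_target(T, target=[1], lag=1.0)
# m[0] is about 1 / 0.02 = 50 lag units; multiply by the lag time for units of time
```

For a "bound" target defined as a set of several MSM states, `target` simply lists all of their indices.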
Affiliation(s)
- Jiaqi Tian
- School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu Province, China
- Xin Dong
- School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu Province, China
- Tianshuo Wu
- School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu Province, China
- Pengbo Wen
- School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu Province, China
- Xin Liu
- School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu Province, China
- Mengying Zhang
- School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu Province, China
- Xiaoli An
- School of Chemical Engineering, Institute of Pharmaceutical Engineering Technology and Application, Sichuan University of Science & Engineering, Xueyuan Street 180, Huixing Road, Zigong 643000, Sichuan, China
- Danfeng Shi
- Warshel Institute for Computational Biology, School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, Guangdong, China

11
Oh M, da Hora GCA, Swanson JMJ. tICA-Metadynamics for Identifying Slow Dynamics in Membrane Permeation. J Chem Theory Comput 2023; 19:8886-8900. [PMID: 37943658 DOI: 10.1021/acs.jctc.3c00526] [Indexed: 11/12/2023]
Abstract
Molecular simulations are commonly used to understand the mechanism of membrane permeation of small molecules, particularly for biomedical and pharmaceutical applications. However, despite significant advances in computing power and algorithms, calculating an accurate permeation free energy profile remains elusive for many drug molecules because it can require identifying the rate-limiting degrees of freedom (i.e., appropriate reaction coordinates). To resolve this issue, researchers have developed machine learning approaches to identify slow system dynamics. In this work, we apply time-lagged independent component analysis (tICA), an unsupervised dimensionality reduction algorithm, to molecular dynamics simulations with well-tempered metadynamics to find the slowest collective degrees of freedom of the permeation process of trimethoprim through a multicomponent membrane. We show that tICA-metadynamics yields translational and orientational collective variables (CVs) that improve convergence efficiency by a factor of ∼1.5. However, crossing the periodic boundary is shown to introduce artifacts in the translational CV that can be corrected by taking absolute values of molecular features. Additionally, we find that the convergence of the tICA CVs is reached with approximately five membrane crossings and that data reweighting is required to avoid deviations in the translational CV.
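tICA itself reduces to a generalized eigenvalue problem, C(tau) v = lambda C(0) v, where C(0) and C(tau) are the instantaneous and time-lagged covariance matrices of mean-free features; the eigenvectors with the largest lambda are the slowest linear combinations. The NumPy sketch below solves it by whitening with C(0)^(-1/2). It symmetrizes C(tau), a common simplification, uses synthetic data (one slow and one fast autoregressive signal, linearly mixed), and omits the metadynamics reweighting that the abstract notes is needed for biased data.

```python
import numpy as np


def tica(features, lag):
    """Return tICA eigenvalues (descending) and eigenvectors (as columns)."""
    x = features - features.mean(axis=0)
    x0, xt = x[:-lag], x[lag:]
    n = len(x0)
    c0 = (x0.T @ x0 + xt.T @ xt) / (2 * n)    # instantaneous covariance
    ctau = (x0.T @ xt + xt.T @ x0) / (2 * n)  # symmetrized lagged covariance
    # Whitening turns C(tau) v = lambda C(0) v into a symmetric eigenproblem
    w, u = np.linalg.eigh(c0)
    c0_inv_sqrt = u @ np.diag(1.0 / np.sqrt(w)) @ u.T
    lam, y = np.linalg.eigh(c0_inv_sqrt @ ctau @ c0_inv_sqrt)
    order = np.argsort(lam)[::-1]
    return lam[order], c0_inv_sqrt @ y[:, order]


# Synthetic data (hypothetical): slow and fast AR(1) signals, linearly mixed
rng = np.random.default_rng(0)
n = 100_000
slow, fast = np.zeros(n), np.zeros(n)
noise = rng.standard_normal((n, 2))
for t in range(1, n):
    slow[t] = 0.99 * slow[t - 1] + noise[t, 0]
    fast[t] = 0.10 * fast[t - 1] + noise[t, 1]
features = np.column_stack([slow + fast, slow - fast])
lam, vecs = tica(features, lag=1)
# vecs[:, 0] should point (up to scale) along [1, 1], which recovers `slow`
```

Projecting the features onto the leading eigenvector gives the slow CV; in a tICA-metadynamics workflow such a projection would then be biased.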
Affiliation(s)
- Myongin Oh
- Department of Chemistry, University of Utah, 315 South 1400 East, Rm 2020, Salt Lake City, Utah 84112, United States
- Gabriel C A da Hora
- Department of Chemistry, University of Utah, 315 South 1400 East, Rm 2020, Salt Lake City, Utah 84112, United States
- Jessica M J Swanson
- Department of Chemistry, University of Utah, 315 South 1400 East, Rm 2020, Salt Lake City, Utah 84112, United States

12
Fu H, Liu H, Xing J, Zhao T, Shao X, Cai W. Deep-Learning-Assisted Enhanced Sampling for Exploring Molecular Conformational Changes. J Phys Chem B 2023; 127:9926-9935. [PMID: 37947397 DOI: 10.1021/acs.jpcb.3c05284] [Indexed: 11/12/2023]
Abstract
We present a novel strategy to explore conformational changes and identify stable states of molecular objects, eliminating the need for a priori knowledge. The approach applies a deep learning method to extract information about the movement modes of the molecular object from a short, high-dimensional, and parameter-free preliminary enhanced-sampling simulation. The gathered information is described by a small set of deep-learning-based collective variables (dCVs), which steer the production-enhanced-sampling simulation. Considering the challenge of adequately exploring the configurational space using the low-dimensional, suboptimal dCVs, we incorporate a method designed for ergodic sampling, namely, Gaussian-accelerated molecular dynamics (MD), into the framework of CV-based enhanced sampling. MD simulations on both toy models and nontrivial examples demonstrate the remarkable computational efficiency of the strategy in capturing the conformational changes of molecular objects without a priori knowledge. Specifically, we achieved the blind folding of two fast folders, chignolin and villin, within a time scale of hundreds of nanoseconds and successfully reconstructed the free-energy landscapes that characterize their reversible folding. All in all, the presented strategy holds significant promise for investigating conformational changes in macromolecules, and it is anticipated to find extensive applications in the fields of chemistry and biology.
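For reference, the boost applied by Gaussian accelerated MD (GaMD), as usually formulated, is dV(r) = (1/2) k (E - V(r))**2 whenever V(r) < E and zero otherwise, where E is a threshold energy and k a harmonic force constant. The Python sketch below evaluates this boost and a naive exponential reweighting exp(beta dV) of a CV histogram. The energies, CV values, and parameters are hypothetical; practical GaMD analyses often replace the plain exponential average with a cumulant expansion, and how the study combines the boost with the dCV bias is not reproduced here.

```python
import numpy as np


def gamd_boost(v, e_threshold, k):
    """Boost dV = 0.5 * k * (E - V)**2 for V < E, else 0 (elementwise)."""
    v = np.asarray(v, dtype=float)
    return np.where(v < e_threshold, 0.5 * k * (e_threshold - v) ** 2, 0.0)


def reweighted_histogram(cv, boost, beta, bins):
    """Unbiased CV histogram estimate with per-frame weights exp(beta * dV)."""
    log_w = beta * boost
    w = np.exp(log_w - log_w.max())  # shift by the maximum for numerical stability
    return np.histogram(cv, bins=bins, weights=w, density=True)


# Hypothetical per-frame data from a boosted simulation
rng = np.random.default_rng(0)
potential = rng.normal(-50.0, 5.0, size=5000)  # potential energies, kcal/mol
cv = rng.normal(0.0, 1.0, size=5000)           # CV value for each frame
boost = gamd_boost(potential, e_threshold=-40.0, k=0.01)
hist, edges = reweighted_histogram(cv, boost, beta=1.0 / 0.596, bins=30)
```

Here beta = 1 / 0.596 (kcal/mol)^-1 assumes a temperature of about 300 K. Because the exponential weights are dominated by frames with the largest boosts, the plain estimator becomes noisy when the boost distribution is wide, which is why a cumulant expansion is often preferred.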
Affiliation(s)
- Haohao Fu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Han Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Jingya Xing
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Tong Zhao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China

13
Lemcke S, Appeldorn JH, Wand M, Speck T. Toward a structural identification of metastable molecular conformations. J Chem Phys 2023; 159:114105. [PMID: 37712784 DOI: 10.1063/5.0164145] [Received: 06/21/2023] [Accepted: 08/21/2023] [Indexed: 09/16/2023]
Abstract
Interpreting high-dimensional data from molecular dynamics simulations is a persistent challenge. In this paper, we show that for a small peptide, deca-alanine, metastable states can be identified through a neural net based on structural information alone. While processing molecular dynamics data, dimensionality reduction is a necessary step that projects high-dimensional data onto a low-dimensional representation that, ideally, captures the conformational changes in the underlying data. Conventional methods make use of the temporal information contained in trajectories generated through integrating the equations of motion, which forgoes more efficient sampling schemes. We demonstrate that EncoderMap, an autoencoder architecture with an additional distance metric, can find a suitable low-dimensional representation to identify long-lived molecular conformations using exclusively structural information. For deca-alanine, which exhibits several helix-forming pathways, we show that this approach allows us to combine simulations with different biasing forces and yields representations comparable in quality to other established methods. Our results contribute to computational strategies for the rapid automatic exploration of the configuration space of peptides and proteins.
Affiliation(s)
- Simon Lemcke
- Institut für Physik, Johannes Gutenberg-Universität Mainz, Staudingerweg 7-9, 55128 Mainz, Germany
- Jörn H Appeldorn
- Institut für Physik, Johannes Gutenberg-Universität Mainz, Staudingerweg 7-9, 55128 Mainz, Germany
- Michael Wand
- Institut für Informatik, Johannes Gutenberg-Universität Mainz, Staudingerweg 9, 55128 Mainz, Germany
- Thomas Speck
- Institut für Theoretische Physik IV, Universität Stuttgart, Heisenbergstr. 3, 70569 Stuttgart, Germany
14
Oh M, da Hora GCA, Swanson JMJ. tICA-Metadynamics for Identifying Slow Dynamics in Membrane Permeation. bioRxiv 2023:2023.08.16.553477. [PMID: 37645884 PMCID: PMC10462029 DOI: 10.1101/2023.08.16.553477] [Indexed: 08/31/2023]
Abstract
Molecular simulations are commonly used to understand the mechanism of membrane permeation of small molecules, particularly for biomedical and pharmaceutical applications. However, despite significant advances in computing power and algorithms, an accurate permeation free energy profile remains elusive for many drug molecules because computing it can require identifying the rate-limiting degrees of freedom (i.e., appropriate reaction coordinates). To resolve this issue, researchers have developed machine learning approaches to identify slow system dynamics. In this work, we apply time-lagged independent component analysis (tICA), an unsupervised dimensionality reduction algorithm, to molecular dynamics simulations with well-tempered metadynamics to find the slowest collective degrees of freedom of the permeation process of trimethoprim through a multicomponent membrane. We show that tICA-metadynamics yields translational and orientational collective variables (CVs) that increase convergence efficiency by a factor of ∼1.5. However, crossing the periodic boundary is shown to introduce artefacts in the translational CV that can be corrected by taking absolute values of molecular features. Additionally, we find that the tICA CVs converge after approximately five membrane crossings, and that data reweighting is required to avoid deviations in the translational CV.
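The core tICA computation named in this abstract is a generalized eigenvalue problem, C(τ)v = λC(0)v, between the time-lagged and instantaneous covariance matrices of the input features. The NumPy sketch below illustrates that step only. It assumes symmetrized covariance estimates and unbiased input data, so it omits the metadynamics reweighting that the abstract says is required; the function `tica` and the synthetic data are hypothetical, not the authors' code.

```python
import numpy as np

def tica(X, lag, dim=1):
    """Hypothetical minimal tICA: solve C(lag) v = lambda C(0) v for a (T, n_features) array X.

    Uses symmetrized covariance estimates and no bias reweighting (an assumption).
    """
    X = X - X.mean(axis=0)                   # remove the mean of each feature
    X0, Xt = X[:-lag], X[lag:]               # frame pairs separated by `lag` steps
    n = len(X0)
    C0 = (X0.T @ X0 + Xt.T @ Xt) / (2 * n)   # instantaneous covariance
    Ct = (X0.T @ Xt + Xt.T @ X0) / (2 * n)   # time-lagged covariance
    # Whiten with C0^(-1/2), then diagonalize the whitened lagged covariance.
    w0, v0 = np.linalg.eigh(C0)
    W = v0 @ np.diag(w0 ** -0.5) @ v0.T
    evals, evecs = np.linalg.eigh(W @ Ct @ W)
    order = np.argsort(evals)[::-1]          # slowest (largest autocorrelation) first
    evals, evecs = evals[order][:dim], evecs[:, order][:, :dim]
    components = W @ evecs                   # tICA directions in feature space
    return evals, X @ components             # eigenvalues and projected trajectory

# Usage on synthetic data: a slow AR(1) signal mixed with fast noise in two features.
rng = np.random.default_rng(0)
T = 20_000
slow = np.zeros(T)
for t in range(1, T):
    slow[t] = 0.99 * slow[t - 1] + rng.normal()
fast = 5.0 * rng.normal(size=T)
X = np.column_stack([slow + fast, slow - fast])
evals, Y = tica(X, lag=10)
```

Projecting onto the leading direction recovers the slow signal even though neither raw feature isolates it. For production work, maintained estimators such as the tICA implementation in Deeptime are preferable to a hand-rolled version.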
15
Nagel D, Sartore S, Stock G. Toward a Benchmark for Markov State Models: The Folding of HP35. J Phys Chem Lett 2023; 14:6956-6967. [PMID: 37504674 DOI: 10.1021/acs.jpclett.3c01561] [Indexed: 07/29/2023]
Abstract
Adopting a 300 μs long MD trajectory of the folding of villin headpiece (HP35) by D. E. Shaw Research, we recently constructed a Markov state model (MSM) based on inter-residue contacts. The model reproduces the folding time and predicts that the native basin and unfolded region consist of metastable substates that are structurally well-characterized. Recognizing the need to establish well-defined benchmark problems, we study to what extent and in what sense this MSM can be employed as a reference model. Hence, we test the robustness of the MSM by comparing it to models that use alternative combinations of features, dimensionality reduction methods, and clustering schemes. The study suggests some main characteristics of the folding of HP35 that should be reproduced by other competitive models. Moreover, the discussion reveals which parts of the MSM workflow matter most for the considered problem and illustrates the promises and pitfalls of state-based models for the interpretation of biomolecular simulations.
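For readers less familiar with the MSM step itself, the sketch below shows a generic transition-matrix estimate from an already discretized state trajectory, plus the implied time scales commonly used to compare models. It assumes that features, dimensionality reduction, and clustering have already produced integer state labels; the naive count symmetrization stands in for a proper maximum-likelihood reversible estimator, and the function names are hypothetical.

```python
import numpy as np

def estimate_msm(dtraj, n_states, lag):
    """Row-stochastic transition matrix from a discrete trajectory (naive reversible estimate)."""
    C = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        C[i, j] += 1                      # count transitions i -> j after `lag` steps
    C = C + C.T                           # symmetrize counts to enforce detailed balance
    return C / C.sum(axis=1, keepdims=True)

def implied_timescales(T, lag):
    """t_k = -lag / ln(lambda_k) for the non-stationary eigenvalues 0 < lambda_k < 1."""
    ev = np.sort(np.linalg.eigvals(T).real)[::-1][1:]   # drop the stationary eigenvalue 1
    ev = ev[(ev > 0) & (ev < 1)]
    return -lag / np.log(ev)

# Usage: a two-state trajectory that switches state with probability 0.01 per step.
# The exact chain has eigenvalue 0.98, i.e. an implied time scale of -1/ln(0.98), about 49.5 steps.
rng = np.random.default_rng(1)
dtraj = np.cumsum(rng.random(100_000) < 0.01) % 2
T = estimate_msm(dtraj, n_states=2, lag=1)
its = implied_timescales(T, lag=1)
```

Checking that the implied time scales level off as the lag time grows is the usual test of whether a chosen state decomposition behaves Markovian.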
Affiliation(s)
- Daniel Nagel
- Biomolecular Dynamics, Institute of Physics, University of Freiburg, 79104 Freiburg, Germany
- Sofia Sartore
- Biomolecular Dynamics, Institute of Physics, University of Freiburg, 79104 Freiburg, Germany
- Gerhard Stock
- Biomolecular Dynamics, Institute of Physics, University of Freiburg, 79104 Freiburg, Germany
16
Ghorbani M, Brooks BR, Klauda JB. Conformational Fluctuations in β2-Microglobulin Using Markov State Modeling and Molecular Dynamics. J Phys Chem B 2023; 127:6887-6895. [PMID: 37527428 DOI: 10.1021/acs.jpcb.3c02473] [Indexed: 08/03/2023]
Abstract
Conformational dynamics in proteins can give rise to aggregation-prone states during folding, and these kinetically stable states could form oligomers and aggregates. In this study, we investigate the intermediate states and near-folded states of β2-microglobulin and their physicochemical properties using molecular dynamics and Markov state modeling. Analysis of hundreds of microseconds of simulation shows the importance of the edge strands in the misfolded states that give rise to a high exposure of hydrophobic residues in the core of the protein, which could initiate oligomerization and aggregate formation. Our study sheds light on the first step of aggregation of β2m monomers and gives a better picture of the landscape of protein misfolding and aggregation.
Affiliation(s)
- Mahdi Ghorbani
- Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, Maryland 20742, United States
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20824, United States
- Bernard R Brooks
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20824, United States
- Jeffery B Klauda
- Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, Maryland 20742, United States
17
Sisk T, Robustelli P. Folding-upon-binding pathways of an intrinsically disordered protein from a deep Markov state model. bioRxiv 2023:2023.07.21.550103. [PMID: 37546728 PMCID: PMC10401938 DOI: 10.1101/2023.07.21.550103] [Indexed: 08/08/2023]
Abstract
A central challenge in the study of intrinsically disordered proteins is the characterization of the mechanisms by which they bind their physiological interaction partners. Here, we utilize a deep-learning-based Markov state modeling approach to characterize the folding-upon-binding pathways observed in a long-timescale molecular dynamics simulation of a disordered region of the measles virus nucleoprotein NTAIL reversibly binding the X domain of the measles virus phosphoprotein complex. We find that folding-upon-binding predominantly occurs via two distinct encounter complexes that are differentiated by the binding orientation, helical content, and conformational heterogeneity of NTAIL. We do not, however, find evidence for the existence of canonical conformational selection or induced-fit binding pathways. We observe four kinetically separated native-like bound states that interconvert on time scales of eighty to five hundred nanoseconds. These bound states share a core set of native intermolecular contacts and stable NTAIL helices and are differentiated by a sequential formation of native and non-native contacts and additional helical turns. Our analyses provide an atomic-resolution structural description of intermediate states in a folding-upon-binding pathway and elucidate the nature of the kinetic barriers between metastable states in a dynamic and heterogeneous, or "fuzzy", protein complex.
Affiliation(s)
- Thomas Sisk
- Dartmouth College, Department of Chemistry, Hanover, NH, 03755
- Paul Robustelli
- Dartmouth College, Department of Chemistry, Hanover, NH, 03755
18
Chen H, Roux B, Chipot C. Discovering Reaction Pathways, Slow Variables, and Committor Probabilities with Machine Learning. J Chem Theory Comput 2023. [PMID: 37224455 DOI: 10.1021/acs.jctc.3c00028] [Indexed: 05/26/2023]
Abstract
A significant challenge faced by atomistic simulations is the difficulty, and often impossibility, of sampling the transitions between metastable states of the free-energy landscape associated with slow molecular processes. Importance-sampling schemes represent an appealing option to accelerate the underlying dynamics by smoothing out the relevant free-energy barriers, but require the definition of suitable reaction-coordinate (RC) models expressed in terms of compact low-dimensional sets of collective variables (CVs). While most computational studies of slow molecular processes have traditionally relied on educated guesses based on human intuition to reduce the dimensionality of the problem at hand, a variety of machine-learning (ML) algorithms have recently emerged as powerful alternatives to discover meaningful CVs capable of capturing the dynamics of the slowest degrees of freedom. Considering a simple paradigmatic situation in which the long-time dynamics is dominated by the transition between two known metastable states, we compare two variational data-driven ML methods based on Siamese neural networks aimed at discovering a meaningful RC model: the slowest decorrelating CV of the molecular process, and the committor probability to first reach one of the two metastable states. One method is the state-free reversible variational approach for Markov processes networks (VAMPnets), or SRVs; the other, inspired by the transition path theory framework, is the variational committor-based neural networks, or VCNs. The relationship between these methodologies and their ability to discover the relevant descriptors of the slow molecular process of interest are illustrated with a series of simple model systems. We also show that both strategies are amenable to importance-sampling schemes through an appropriate reweighting algorithm that approximates the kinetic properties of the transition.
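The committor discussed here has a compact definition that is easy to check on a discrete model: it is 0 in state A, 1 in state B, and everywhere else equals the transition-weighted average of its values one step later. The sketch below solves that linear system for a given transition matrix. It is a discrete-state illustration under that assumption, not the SRV or VCN neural-network approach compared in the paper, and the function name is hypothetical.

```python
import numpy as np

def committor(T, A, B):
    """Forward committor q for a row-stochastic matrix T with boundary sets A (q=0) and B (q=1)."""
    n = T.shape[0]
    M = np.eye(n) - T                 # interior rows: q_i - sum_j T_ij q_j = 0
    rhs = np.zeros(n)
    for idx in list(A) + list(B):     # boundary rows become q_idx = rhs_idx
        M[idx, :] = 0.0
        M[idx, idx] = 1.0
    rhs[list(B)] = 1.0
    return np.linalg.solve(M, rhs)

# Usage: unbiased random walk on five states with A = {0} and B = {4}.
T = np.zeros((5, 5))
for i in range(1, 4):
    T[i, i - 1] = T[i, i + 1] = 0.5
T[0, 1] = T[4, 3] = 1.0               # boundary rows are overwritten inside committor()
q = committor(T, A=[0], B=[4])        # linear profile: [0, 0.25, 0.5, 0.75, 1]
```

For an unbiased walk the committor grows linearly between the two boundary states, which makes this a convenient sanity check before applying the same definition to a real transition matrix.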
Affiliation(s)
- Haochuan Chen
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, B.P. 70239, 54506 Vandœuvre-lès-Nancy cedex, France
- Benoît Roux
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, 60637, United States
- Christophe Chipot
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, B.P. 70239, 54506 Vandœuvre-lès-Nancy cedex, France
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, 60637, United States
- NIH Center for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, and Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
19
Nagel D, Sartore S, Stock G. Selecting Features for Markov Modeling: A Case Study on HP35. J Chem Theory Comput 2023. [PMID: 37167425 DOI: 10.1021/acs.jctc.3c00240] [Indexed: 05/13/2023]
Abstract
Markov state models represent a popular means to interpret molecular dynamics trajectories in terms of memoryless transitions between metastable conformational states. To provide a mechanistic understanding of the considered biomolecular process, these states should reflect structurally distinct conformations and ensure a time scale separation between fast intrastate and slow interstate dynamics. Adopting the folding of villin headpiece (HP35) as a well-established model problem, here we discuss the selection of suitable input coordinates or "features", such as backbone dihedral angles and interresidue distances. We show that dihedral angles accurately account for the structure of the native energy basin of HP35, while the unfolded region of the free energy landscape and the folding process are best described by tertiary contacts of the protein. To construct a contact-based model, we consider various ways to define and select contact distances and introduce a low-pass filtering of the feature trajectory as well as a correlation-based characterization of states. Relying on input data that faithfully account for the mechanistic origin of the studied process, the states of the resulting Markov model are clearly discriminated by the features, consistently describe the hierarchical structure of the free energy landscape, and, as a consequence, correctly reproduce the slow time scales of the process.
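The abstract mentions low-pass filtering of the feature trajectory but does not specify the filter. As a minimal sketch, assuming a Gaussian kernel applied independently to each feature along the time axis (the kernel shape, the edge padding, and the name `gaussian_lowpass` are assumptions, not necessarily the authors' choices), such a step might look like:

```python
import numpy as np

def gaussian_lowpass(features, sigma):
    """Smooth each column of a (T, n_features) array along time with a normalized Gaussian kernel.

    sigma is the kernel width in frames; edges are padded with the first/last value
    so the output has the same length as the input.
    """
    half = int(np.ceil(3 * sigma))
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()                                   # constant signals stay unchanged
    padded = np.pad(features, ((half, half), (0, 0)), mode="edge")
    return np.column_stack(
        [np.convolve(padded[:, k], kernel, mode="valid") for k in range(features.shape[1])]
    )

# Usage: smooth three noisy, contact-distance-like synthetic features.
rng = np.random.default_rng(2)
noisy = 0.5 + 0.1 * rng.normal(size=(10_000, 3))
smooth = gaussian_lowpass(noisy, sigma=5.0)
```

Filtering suppresses frame-to-frame fluctuations that would otherwise cause spurious transitions between neighboring states, at the cost of blurring events shorter than roughly the kernel width.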
Affiliation(s)
- Daniel Nagel
- Biomolecular Dynamics, Institute of Physics, University of Freiburg, 79104 Freiburg, Germany
- Sofia Sartore
- Biomolecular Dynamics, Institute of Physics, University of Freiburg, 79104 Freiburg, Germany
- Gerhard Stock
- Biomolecular Dynamics, Institute of Physics, University of Freiburg, 79104 Freiburg, Germany
20
Dominic AJ, Cao S, Montoya-Castillo A, Huang X. Memory Unlocks the Future of Biomolecular Dynamics: Transformative Tools to Uncover Physical Insights Accurately and Efficiently. J Am Chem Soc 2023; 145:9916-9927. [PMID: 37104720 DOI: 10.1021/jacs.3c01095] [Indexed: 04/29/2023]
Abstract
Conformational changes underpin function and encode complex biomolecular mechanisms. Gaining atomic-level detail of how such changes occur has the potential to reveal these mechanisms and is of critical importance in identifying drug targets, facilitating rational drug design, and enabling bioengineering applications. While the past two decades have brought Markov state model techniques to the point where practitioners can regularly use them to glimpse the long-time dynamics of slow conformations in complex systems, many systems are still beyond their reach. In this Perspective, we discuss how including memory (i.e., non-Markovian effects) can reduce the computational cost to predict the long-time dynamics in these complex systems by orders of magnitude and with greater accuracy and resolution than state-of-the-art Markov state models. We illustrate how memory lies at the heart of successful and promising techniques, ranging from the Fokker-Planck and generalized Langevin equations to deep-learning recurrent neural networks and generalized master equations. We delineate how these techniques work, identify insights that they can offer in biomolecular systems, and discuss their advantages and disadvantages in practical settings. We show how generalized master equations can enable the investigation of, for example, the gate-opening process in RNA polymerase II and demonstrate how our recent advances tame the deleterious influence of statistical underconvergence of the molecular dynamics simulations used to parameterize these techniques. This represents a significant leap forward that will enable our memory-based techniques to interrogate systems that are currently beyond the reach of even the best Markov state models. We conclude by discussing some current challenges and future prospects for how exploiting memory will open the door to many exciting opportunities.
Affiliation(s)
- Anthony J Dominic
- Department of Chemistry, University of Colorado Boulder, Boulder, Colorado 80309, USA
- Siqin Cao
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Xuhui Huang
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
21
Shmilovich K, Ferguson AL. Girsanov Reweighting Enhanced Sampling Technique (GREST): On-the-Fly Data-Driven Discovery of and Enhanced Sampling in Slow Collective Variables. J Phys Chem A 2023; 127:3497-3517. [PMID: 37036804 DOI: 10.1021/acs.jpca.3c00505] [Indexed: 04/11/2023]
Abstract
Molecular dynamics simulations of microscopic phenomena are limited by the short integration time steps that are required for numerical stability but limit the practically achievable simulation time scales. Collective variable (CV) enhanced sampling techniques apply biases to predefined collective coordinates to promote barrier crossing, phase space exploration, and sampling of rare events. The efficacy of these techniques is contingent on the selection of good CVs correlated with the molecular motions governing the long-time dynamical evolution of the system. In this work, we introduce Girsanov Reweighting Enhanced Sampling Technique (GREST) as an adaptive sampling scheme that interleaves rounds of data-driven slow CV discovery and enhanced sampling along these coordinates. Since slow CVs are inherently dynamical quantities, a key ingredient in our approach is the use of both thermodynamic and dynamical Girsanov reweighting corrections for rigorous estimation of slow CVs from biased simulation data. We demonstrate our approach on a toy 1D 4-well potential, the simple biomolecular system alanine dipeptide, and the Trp-Leu-Ala-Leu-Leu (WLALL) pentapeptide. In each case, GREST learns appropriate slow CVs and drives sampling of all thermally accessible metastable states starting from zero prior knowledge of the system. We make GREST accessible to the community via a publicly available open source Python package.
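Dynamical Girsanov reweighting compares the probability of the same simulated path under the biased and the unbiased dynamics. For a one-dimensional overdamped Langevin model integrated with the Euler-Maruyama scheme (an assumption chosen for brevity; GREST's own implementation and integrator may differ), each step is a Gaussian move, so the log path-probability ratio is a sum of Gaussian log-density differences. The sketch below, with hypothetical function names and a hypothetical constant bias, computes that ratio and uses it to reweight biased paths back to unbiased averages.

```python
import numpy as np

def log_path_ratio(x, grad_V, grad_U, dt, D, kT):
    """log P_unbiased[path] - log P_biased[path] for 1D overdamped Euler-Maruyama paths.

    x has shape (..., n_steps + 1). The simulation used the force -(grad_V + grad_U);
    the target dynamics uses -grad_V only. D is the diffusion constant, D / kT the mobility.
    """
    xk, dx = x[..., :-1], np.diff(x, axis=-1)
    mean_sim = -(D / kT) * (grad_V(xk) + grad_U(xk)) * dt   # biased mean displacement
    mean_tgt = -(D / kT) * grad_V(xk) * dt                   # unbiased mean displacement
    var = 2.0 * D * dt                                        # step variance (same for both)
    return np.sum(((dx - mean_sim) ** 2 - (dx - mean_tgt) ** 2) / (2.0 * var), axis=-1)

# Usage: free diffusion (V = 0) sampled with a constant biasing force from U(x) = f * x.
rng = np.random.default_rng(3)
n_paths, n_steps, dt, D, kT, f = 20_000, 100, 0.01, 1.0, 1.0, 1.0

def grad_V(x):
    return np.zeros_like(x)          # V = 0: free diffusion is the target dynamics

def grad_U(x):
    return np.full_like(x, f)        # U(x) = f * x: constant biasing force

steps = -(D / kT) * f * dt + np.sqrt(2.0 * D * dt) * rng.normal(size=(n_paths, n_steps))
x = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)], axis=1)

logw = log_path_ratio(x, grad_V, grad_U, dt, D, kT)
w = np.exp(logw - logw.max())                        # numerically stabilized path weights
biased_mean = x[:, -1].mean()                        # expectation -D*f*t/kT = -1 under the bias
reweighted_mean = np.sum(w * x[:, -1]) / np.sum(w)   # estimates the unbiased expectation, 0
```

Because the same random increments enter both densities, only the difference in drift matters; this is why the weights can be accumulated step by step during the simulation.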
Affiliation(s)
- Kirill Shmilovich
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
22
Yang W, Templeton C, Rosenberger D, Bittracher A, Nüske F, Noé F, Clementi C. Slicing and Dicing: Optimal Coarse-Grained Representation to Preserve Molecular Kinetics. ACS Cent Sci 2023; 9:186-196. [PMID: 36844497 PMCID: PMC9951291 DOI: 10.1021/acscentsci.2c01200] [Received: 10/10/2022] [Indexed: 05/05/2023]
Abstract
The aim of molecular coarse-graining approaches is to recover relevant physical properties of the molecular system via a lower-resolution model that can be more efficiently simulated. Ideally, the lower resolution still accounts for the degrees of freedom necessary to recover the correct physical behavior. The selection of these degrees of freedom has often relied on the scientist's chemical and physical intuition. In this article, we argue that, in soft matter contexts, desirable coarse-grained models accurately reproduce the long-time dynamics of a system by correctly capturing the rare-event transitions. We propose a bottom-up coarse-graining scheme that correctly preserves the relevant slow degrees of freedom, and we test this idea for three systems of increasing complexity. We show that, in contrast to this method, existing coarse-graining schemes, such as those from information theory or structure-based approaches, are not able to recapitulate the slow time scales of the system.
Affiliation(s)
- Wangfei Yang
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Graduate Program in Systems, Synthetic and Physical Biology, Rice University, Houston, Texas 77005, United States
- Clark Templeton
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- David Rosenberger
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Andreas Bittracher
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Feliks Nüske
- Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstrasse 1, 39106 Magdeburg, Germany
- Frank Noé
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Cecilia Clementi
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Department of Physics, Freie Universität Berlin, Arnimallee 12, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Department of Physics, Rice University, Houston, Texas 77005, United States
23
Cao Z, Bao R, Zheng J, Hou Z. Fast Functionalization with High Performance in the Autonomous Information Engine. J Phys Chem Lett 2023; 14:66-72. [PMID: 36566388 DOI: 10.1021/acs.jpclett.2c03335] [Indexed: 06/17/2023]
Abstract
Mandal and Jarzynski have proposed a fully autonomous information heat engine, consisting of a demon, a mass, and a memory register interacting with a thermal reservoir. This device converts thermal energy into mechanical work by writing information to a memory register or, conversely, erasing information by consuming mechanical work. Here, we derive a speed limit inequality between the relaxation time of state transformation and the distance between the initial and final distributions, where the combination of the dynamical activity and entropy production plays an important role. Such an inequality suggests that a speed-performance trade-off relation exists between the relaxation time to a functional state and the average production. To obtain fast functionalization while maintaining the performance, we show that the relaxation dynamics of the information heat engine can be accelerated significantly by devising an optimal initial state of the demon. Our design principle is inspired by the so-called Mpemba effect, where water freezes faster when initially heated.
Affiliation(s)
- Zhiyu Cao
- Department of Chemical Physics and Hefei National Research Center for Physical Sciences at the Microscale, University of Science and Technology of China, Hefei, Anhui 230026, China
- Ruicheng Bao
- Department of Chemical Physics and Hefei National Research Center for Physical Sciences at the Microscale, University of Science and Technology of China, Hefei, Anhui 230026, China
- Jiming Zheng
- Department of Chemical Physics and Hefei National Research Center for Physical Sciences at the Microscale, University of Science and Technology of China, Hefei, Anhui 230026, China
- Zhonghuai Hou
- Department of Chemical Physics and Hefei National Research Center for Physical Sciences at the Microscale, University of Science and Technology of China, Hefei, Anhui 230026, China
24
Predicting efficacy of drug-carrier nanoparticle designs for cancer treatment: a machine learning-based solution. Sci Rep 2023; 13:547. [PMID: 36631637 PMCID: PMC9834306 DOI: 10.1038/s41598-023-27729-7] [Received: 08/03/2022] [Accepted: 01/06/2023] [Indexed: 01/13/2023]
Abstract
Molecular dynamics (MD) simulations are very effective in the discovery of nanomedicines for treating cancer, but they are computationally expensive and time-consuming. Existing studies integrating machine learning (ML) into MD simulation to enhance the process and enable efficient analysis cannot provide direct insights without the complete simulation. In this study, we present an ML-based approach for predicting the solvent-accessible surface area (SASA) of a nanoparticle (NP), denoting its efficacy, from a fraction of the MD simulation data. The proposed framework uses a time series model for simulating the MD, resulting in an intermediate state, and a second model to calculate the SASA in that state. Empirically, the solution can predict the SASA value 260 timesteps ahead 7.5 times faster with a very low average error of 1956.93. We also introduce the use of an explainability technique to validate the predictions. This work can greatly reduce the computational expense of both processing and data size while providing reliable solutions for the nanomedicine design process.
25
Donati L, Weber M. Assessing transition rates as functions of environmental variables. J Chem Phys 2022; 157:224103. [PMID: 36546809 DOI: 10.1063/5.0109555] [Indexed: 11/23/2022]
Abstract
We present a method to estimate the transition rates of molecular systems under different environmental conditions that cause the formation or the breaking of bonds and require the sampling of the Grand Canonical Ensemble. For this purpose, we model the molecular system in terms of probable "scenarios," governed by different potential energy functions, which are separately sampled by classical MD simulations. Reweighting the canonical distribution of each scenario according to specific environmental variables, we estimate the grand canonical distribution, then use the Square Root Approximation method to discretize the Fokker-Planck operator into a rate matrix and the robust Perron Cluster Cluster Analysis method to coarse-grain the kinetic model. This permits efficiently estimating the transition rates of conformational states as functions of environmental variables, for example, the local pH at a cell membrane. In this work, we formalize the theoretical framework of the procedure, and we present a numerical experiment comparing the results with those provided by a constant-pH method based on non-equilibrium Molecular Dynamics Monte Carlo simulations. The method is relevant for the development of new drug design strategies that take into account how the cellular environment influences biochemical processes.
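The Square Root Approximation (SqRA) step can be illustrated on a uniform one-dimensional grid, where neighboring cells i and j exchange probability at rate Q_ij = Φ·sqrt(π_j/π_i) with π ∝ exp(-βV), and each diagonal entry makes its row sum to zero. In this sketch the flux prefactor Φ is treated as a single free constant (in general it depends on the cell geometry and the diffusion coefficient), the grid and double-well potential are hypothetical, and the reweighting across scenarios and the PCCA+ coarse-graining described in the abstract are omitted.

```python
import numpy as np

def sqra_rate_matrix(V, beta, flux):
    """SqRA rate matrix on a uniform 1D grid of potential values V (nearest-neighbor moves only)."""
    n = len(V)
    Q = np.zeros((n, n))
    for i in range(n - 1):
        # sqrt(pi_j / pi_i) = exp(-beta * (V_j - V_i) / 2) for a Boltzmann distribution
        Q[i, i + 1] = flux * np.exp(-0.5 * beta * (V[i + 1] - V[i]))
        Q[i + 1, i] = flux * np.exp(-0.5 * beta * (V[i] - V[i + 1]))
    np.fill_diagonal(Q, -Q.sum(axis=1))   # rows of a rate matrix sum to zero
    return Q

# Usage: double-well potential sampled on 41 grid points.
x = np.linspace(-2.0, 2.0, 41)
V = (x ** 2 - 1.0) ** 2
beta = 2.0
Q = sqra_rate_matrix(V, beta, flux=1.0)
pi = np.exp(-beta * V)
pi /= pi.sum()                            # Boltzmann distribution on the grid
```

Because π_i·Q_ij = Φ·sqrt(π_i π_j) is symmetric in i and j, the rate matrix satisfies detailed balance by construction, and the Boltzmann distribution is its stationary state.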
Affiliation(s)
- Luca Donati
- Zuse Institute Berlin, Takustr. 7, D-14195 Berlin, Germany
- Marcus Weber
- Zuse Institute Berlin, Takustr. 7, D-14195 Berlin, Germany
26
Shmilovich K, Stieffenhofer M, Charron NE, Hoffmann M. Temporally Coherent Backmapping of Molecular Trajectories From Coarse-Grained to Atomistic Resolution. J Phys Chem A 2022; 126:9124-9139. [PMID: 36417670 PMCID: PMC9743211 DOI: 10.1021/acs.jpca.2c07716] [Indexed: 11/24/2022]
Abstract
Coarse-graining offers a means to extend the achievable time and length scales of molecular dynamics simulations beyond what is practically possible in the atomistic regime. Sampling molecular configurations of interest can be done efficiently using coarse-grained simulations, from which meaningful physicochemical information can be inferred if the corresponding all-atom configurations are reconstructed. However, this procedure of backmapping to reintroduce the lost atomistic detail into coarse-grain structures has proven a challenging task due to the many feasible atomistic configurations that can be associated with one coarse-grain structure. Existing backmapping methods are strictly frame-based, relying on either heuristics to replace coarse-grain particles with atomic fragments and subsequent relaxation or parametrized models to propose atomic coordinates separately and independently for each coarse-grain structure. These approaches neglect information from previous trajectory frames that is critical to ensuring temporal coherence of the backmapped trajectory, while also offering information potentially helpful to producing higher-fidelity atomic reconstructions. In this work, we present a deep learning-enabled data-driven approach for temporally coherent backmapping that explicitly incorporates information from preceding trajectory structures. Our method trains a conditional variational autoencoder to nondeterministically reconstruct atomistic detail conditioned on both the target coarse-grain configuration and the previously reconstructed atomistic configuration. We demonstrate our backmapping approach on two exemplar biomolecular systems: alanine dipeptide and the miniprotein chignolin. We show that our backmapped trajectories accurately recover the structural, thermodynamic, and kinetic properties of the atomistic trajectory data.
Affiliation(s)
- Kirill Shmilovich
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Nicholas E. Charron
- Weiss School of Natural Sciences, Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
- Department of Physics, Freie Universität Berlin, Berlin 14195, Germany
- Moritz Hoffmann
- Fachbereich Mathematik und Informatik, Freie Universität Berlin, Berlin 14195, Germany
27
Jin J, Pak AJ, Durumeric AEP, Loose TD, Voth GA. Bottom-up Coarse-Graining: Principles and Perspectives. J Chem Theory Comput 2022; 18:5759-5791. [PMID: 36070494 PMCID: PMC9558379 DOI: 10.1021/acs.jctc.2c00643] [Received: 06/20/2022] [Indexed: 01/14/2023]
Abstract
Large-scale computational molecular models provide scientists a means to investigate the effect of microscopic details on emergent mesoscopic behavior. Elucidating the relationship between variations on the molecular scale and macroscopic observable properties facilitates an understanding of the molecular interactions driving the properties of real-world materials and complex systems (e.g., those found in biology, chemistry, and materials science). As a result, discovering an explicit, systematic connection between microscopic nature and emergent mesoscopic behavior is a fundamental goal for this type of investigation. The molecular forces critical to driving the behavior of complex heterogeneous systems are often unclear. More problematically, simulations of representative model systems are often prohibitively expensive from both spatial and temporal perspectives, impeding straightforward investigations of possible hypotheses characterizing molecular behavior. While the reduction in resolution of a study, such as moving from an atomistic simulation to that of the resolution of large coarse-grained (CG) groups of atoms, can partially ameliorate the cost of individual simulations, the relationship between the proposed microscopic details and this intermediate resolution is nontrivial and presents new obstacles to study. Small portions of these complex systems can be realistically simulated. Alone, these smaller simulations likely do not provide insight into collectively emergent behavior. However, by proposing that the driving forces in both smaller and larger systems (containing many related copies of the smaller system) have an explicit connection, systematic bottom-up CG techniques can be used to transfer CG hypotheses discovered using a smaller-scale system to a larger system of primary interest.
The proposed connection between different CG systems is prescribed by (i) the CG representation (mapping) and (ii) the functional form and parameters used to represent the CG energetics, which approximate potentials of mean force (PMFs). As a result, the design of CG methods that facilitate a variety of physically relevant representations, approximations, and force fields is critical to moving the frontier of systematic CG forward. Crucially, the proposed connection between the system used for parametrization and the system of interest is orthogonal to the optimization used to approximate the potential of mean force present in all systematic CG methods. The empirical efficacy of machine learning techniques on a variety of tasks provides strong motivation to consider these approaches for approximating the PMF and analyzing these approximations.
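To illustrate point (i): one widely used CG representation is a linear mapping that puts each CG site at the center of mass of the atoms assigned to it. The NumPy sketch below builds that mapping as a matrix M, so that CG coordinates are M @ r. The function name, the four-atom fragment, its masses, and the site assignment are all hypothetical and do not come from the review.

```python
import numpy as np

def com_mapping_matrix(site_of_atom, masses, n_sites):
    """Return an (n_sites x n_atoms) matrix M with R_cg = M @ r_atoms.

    Each CG site sits at the center of mass of the atoms assigned to it.
    Every site must have at least one atom, otherwise its row sum is zero.
    """
    masses = np.asarray(masses, dtype=float)
    M = np.zeros((n_sites, masses.size))
    for atom, (site, mass) in enumerate(zip(site_of_atom, masses)):
        M[site, atom] = mass
    # Divide each row by the total mass of that CG site
    return M / M.sum(axis=1, keepdims=True)

# Hypothetical 4-atom fragment mapped onto 2 CG beads
site_of_atom = [0, 0, 1, 1]
masses = [12.0, 1.0, 16.0, 1.0]
coords = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [2.0, 0.0, 0.0],
                   [3.0, 0.0, 0.0]])
M = com_mapping_matrix(site_of_atom, masses, n_sites=2)
cg_coords = M @ coords  # shape (2, 3): one row per CG bead
```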
Affiliation(s)
- Jaehyeok Jin
- Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
- Alexander J. Pak
- Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
- Aleksander E. P. Durumeric
- Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
- Timothy D. Loose
- Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
- Gregory A. Voth
- Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
28
Köhs L, Kukovetz K, Rauh O, Koeppl H. Nonparametric Bayesian inference for meta-stable conformational dynamics. Phys Biol 2022; 19. [PMID: 35944548 DOI: 10.1088/1478-3975/ac885e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 08/09/2022] [Indexed: 11/11/2022]
Abstract
Analyses of structural dynamics of biomolecules hold great promise to deepen the understanding of and ability to construct complex molecular systems. To this end, both experimental and computational means are available, such as fluorescence quenching experiments or molecular dynamics simulations, respectively. We argue that while seemingly disparate, both fields of study have to deal with the same type of data about the same underlying phenomenon of conformational switching. Two central challenges typically arise in both contexts: (i) the amount of obtained data is large, and (ii) it is often unknown how many distinct molecular states underlie these data. In this study, we build on the established idea of Markov state modeling and propose a generative, Bayesian nonparametric hidden Markov state model that addresses these challenges. Utilizing hierarchical Dirichlet processes, we treat different meta-stable molecule conformations as distinct Markov states, the number of which we then do not have to set a priori. In contrast to existing approaches to both experimental as well as simulation data that are based on the same idea, we leverage a mean-field variational inference approach, enabling scalable inference on large amounts of data. Furthermore, we specify the model also for the important case of angular data, which however proves to be computationally intractable. Addressing this issue, we propose a computationally tractable approximation to the angular model. We demonstrate the method on synthetic ground truth data and apply it to known benchmark problems as well as electrophysiological experimental data from a conformation-switching ion channel to highlight its practical utility.
Affiliation(s)
- Lukas Köhs
- Centre for Synthetic Biology, Technische Universität Darmstadt, Rundeturmstrasse 12, Darmstadt, 64283, Germany
- Kerri Kukovetz
- Biology Department, Technische Universität Darmstadt, Schnittspahnstrasse 3, Darmstadt, 64287, Germany
- Oliver Rauh
- Biology Department, Technische Universität Darmstadt, Schnittspahnstrasse 3, Darmstadt, 64287, Germany
- Heinz Koeppl
- Centre for Synthetic Biology, Technische Universität Darmstadt, Rundeturmstrasse 12, Darmstadt, 64283, Germany
29
Cao Z, Hou Z. Improved estimation for energy dissipation in biochemical oscillations. J Chem Phys 2022; 157:025102. [DOI: 10.1063/5.0092126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Abstract
Biochemical oscillations, regulating the timing of life processes, need to consume energy to achieve good performance on crucial functions, such as high accuracy of phase period and high sensitivity to external signals. However, it is a great challenge to precisely estimate the energy dissipation in such systems. Here, based on the stochastic normal form theory (SNFT), we calculate the Pearson correlation coefficient between the oscillatory amplitude and phase, and a trade-off relation between transport efficiency and phase sensitivity can then be derived, which serves as a tighter form than the estimator resulting from the conventional thermodynamic uncertainty relation (TUR). Our findings demonstrate that a more precise energy dissipation estimation can be obtained by enhancing the sensitivity of the biochemical oscillations. Moreover, the internal noise and amplitude power effects have also been discovered.
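For reference, the conventional TUR bound mentioned above says that for a time-integrated current J in a nonequilibrium steady state, Var(J)/⟨J⟩² ≥ 2k_B/Σ, where Σ is the entropy produced over the observation window. Rearranged, 2k_B⟨J⟩²/Var(J) is a lower bound on Σ. The sketch below computes that bound, in units of k_B, from repeated current measurements. It is a minimal illustration only: the function name and sample values are hypothetical, and it does not implement the tighter SNFT-based estimator derived in the paper.

```python
import statistics

def tur_entropy_lower_bound(current_samples):
    """Lower bound on entropy production (units of k_B) over the observation
    window from the conventional TUR: Sigma >= 2 <J>^2 / Var(J)."""
    mean = statistics.fmean(current_samples)
    var = statistics.variance(current_samples)  # unbiased sample variance
    if var == 0:
        raise ValueError("zero variance: the bound is undefined")
    return 2.0 * mean ** 2 / var

# Hypothetical measurements of the integrated current over repeated windows
samples = [9.0, 10.0, 11.0, 10.0]
bound = tur_entropy_lower_bound(samples)  # mean 10, variance 2/3
```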
Affiliation(s)
- Zhiyu Cao
- Department of Chemical Physics and Hefei National Laboratory for Physical Sciences at the Microscale, iChEM, University of Science and Technology of China, China
- Zhonghuai Hou
- Department of Chemical Physics, University of Science and Technology of China; Hefei National Laboratory for Physical Sciences at the Microscale, China
30
Hsu WT, Ramirez DA, Sammakia T, Tan Z, Shirts MR. Identifying signatures of proteolytic stability and monomeric propensity in O-glycosylated insulin using molecular simulation. J Comput Aided Mol Des 2022; 36:313-328. [PMID: 35507105 DOI: 10.1007/s10822-022-00453-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2021] [Accepted: 04/06/2022] [Indexed: 11/24/2022]
Abstract
Insulin has been commonly adopted as a peptide drug to treat diabetes as it facilitates the uptake of glucose from the blood. The development of oral insulin remains elusive over decades owing to its susceptibility to the enzymes in the gastrointestinal tract and poor permeability through the intestinal epithelium upon dimerization. Recent experimental studies have revealed that certain O-linked glycosylation patterns could enhance insulin's proteolytic stability and reduce its dimerization propensity, but understanding such phenomena at the molecular level is still difficult. To address this challenge, we proposed and tested several structural determinants that could potentially influence insulin's proteolytic stability and dimerization propensity. We used these metrics to assess the properties of interest from [Formula: see text] aggregate molecular dynamics of each of 12 targeted insulin glyco-variants from multiple wild-type crystal structures. We found that glycan-involved hydrogen bonds and glycan-dimer occlusion were useful metrics predicting the proteolytic stability and dimerization propensity of insulin, respectively, as was in part the solvent-accessible surface area of proteolytic sites. However, other plausible metrics were not generally predictive. This work helps better explain how O-linked glycosylation influences the proteolytic stability and monomeric propensity of insulin, illuminating a path towards rational molecular design of insulin glycoforms.
Affiliation(s)
- Wei-Tse Hsu
- Department of Chemical & Biological Engineering, University of Colorado Boulder, Boulder, CO, 80309, USA
- Dominique A Ramirez
- Department of Biochemistry, University of Colorado Boulder, Boulder, CO, 80309, USA
- Tarek Sammakia
- Department of Chemistry, University of Colorado Boulder, Boulder, CO, 80309, USA
- Zhongping Tan
- Institute of Materia Medica, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, 100050, China
- Michael R Shirts
- Department of Chemical & Biological Engineering, University of Colorado Boulder, Boulder, CO, 80309, USA
31
Integration of machine learning with computational structural biology of plants. Biochem J 2022; 479:921-928. [PMID: 35484946 DOI: 10.1042/bcj20200942] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/17/2021] [Revised: 04/01/2022] [Accepted: 04/06/2022] [Indexed: 11/17/2022]
Abstract
Computational structural biology of proteins has developed rapidly in recent decades with the development of new computational tools and the advancement of computing hardware. However, while these techniques have widely been used to make advancements in human medicine, these methods have seen less utilization in the plant sciences. In the last several years, machine learning methods have gained popularity in computational structural biology. These methods have enabled the development of new tools which are able to address the major challenges that have hampered the wide adoption of the computational structural biology of plants. This perspective examines the remaining challenges in computational structural biology and how the development of machine learning techniques enables more in-depth computational structural biology of plants.
32
Ghorbani M, Prasad S, Klauda J, Brooks B. GraphVAMPNet, using graph neural networks and variational approach to Markov processes for dynamical modeling of biomolecules. J Chem Phys 2022; 156:184103. [PMID: 35568532 PMCID: PMC9094994 DOI: 10.1063/5.0085607] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 11/14/2022]
Abstract
Finding low dimensional representation of data from long-timescale trajectories of biomolecular processes such as protein-folding or ligand-receptor binding is of fundamental importance and kinetic models such as Markov modeling have proven useful in describing the kinetics of these systems. Recently, an unsupervised machine learning technique called VAMPNet was introduced to learn the low dimensional representation and linear dynamical model in an end-to-end manner. VAMPNet is based on variational approach to Markov processes (VAMP) and relies on neural networks to learn the coarse-grained dynamics. In this contribution, we combine VAMPNet and graph neural networks to generate an end-to-end framework to efficiently learn high-level dynamics and metastable states from the long-timescale molecular dynamics trajectories. This method bears the advantages of graph representation learning and uses graph message passing operations to generate an embedding for each datapoint which is used in the VAMPNet to generate a coarse-grained representation. This type of molecular representation results in a higher resolution and more interpretable Markov model than the standard VAMPNet enabling a more detailed kinetic study of the biomolecular processes. Our GraphVAMPNet approach is also enhanced with an attention mechanism to find the important residues for classification into different metastable states.
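The VAMP-2 score that VAMPNet-style models maximize fits in a few lines of NumPy. You whiten the instantaneous and time-lagged feature covariances, then take the squared Frobenius norm of the resulting Koopman matrix estimate. A few caveats on this sketch:

- The features are mean-centered, so the constant singular function (which some implementations count as an extra +1) is left out.
- The function names are hypothetical.
- In GraphVAMPNet the features would be the network's outputs, not the raw synthetic AR(1) signal used here.

```python
import numpy as np

def _inv_sqrt(C, eps=1e-10):
    """Symmetric inverse square root, discarding near-zero eigenvalues."""
    w, V = np.linalg.eigh(C)
    keep = w > eps
    return (V[:, keep] / np.sqrt(w[keep])) @ V[:, keep].T

def vamp2_score(chi_t, chi_tau):
    """VAMP-2 score of features at time t and t + tau (arrays of shape N x k)."""
    x = chi_t - chi_t.mean(axis=0)
    y = chi_tau - chi_tau.mean(axis=0)
    n = x.shape[0]
    c00, c11, c01 = x.T @ x / n, y.T @ y / n, x.T @ y / n
    # Whitened time-lagged covariance (an estimate of the Koopman matrix)
    koopman = _inv_sqrt(c00) @ c01 @ _inv_sqrt(c11)
    return float(np.sum(koopman ** 2))

# Usage on a synthetic AR(1) signal with one feature per frame
rng = np.random.default_rng(0)
signal = np.zeros(5000)
for i in range(1, signal.size):
    signal[i] = 0.9 * signal[i - 1] + rng.normal()
features = signal[:, None]
score = vamp2_score(features[:-1], features[1:])   # roughly 0.9**2 for lag 1
```

At lag zero (identical inputs) the whitened matrix is the identity, so the score equals the number of features. This makes a quick sanity check.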
Affiliation(s)
- Mahdi Ghorbani
- University of Maryland at College Park, United States of America
- Samarjeet Prasad
- National Heart Lung and Blood Institute, United States of America
- Jeffery Klauda
- Chemical and Biomolecular Engineering, University of Maryland at College Park, United States of America
- Bernard Brooks
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, United States of America
33
Abstract
The kinetics of a dynamical system dominated by two metastable states is examined from the perspective of the activated-dynamics reactive flux formalism, Markov state eigenvalue spectral decomposition, and committor-based transition path theory. Analysis shows that the different theoretical formulations are consistent, clarifying the significance of the inherent microscopic lag-times that are implicated, and that the most meaningful one-dimensional reaction coordinate in the region of the transition state is along the gradient of the committor in the multidimensional subspace of collective variables. It is shown that the familiar reactive flux activated dynamics formalism provides an effective route to calculate the transition rate in the case of a narrow sharp barrier but much less so in the case of a broad flat barrier. In this case, the standard reactive flux correlation function decays very slowly to the plateau value that corresponds to the transmission coefficient. Treating the committor function as a reaction coordinate does not alleviate all issues caused by the slow relaxation of the reactive flux correlation function. A more efficient activated dynamics simulation algorithm may be achieved from a modified reactive flux weighted by the committor. Simulation results on simple systems are used to illustrate the various conceptual points.
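A small worked example shows how the Markov state eigenvalue picture connects to rates. A 2×2 transition matrix T = [[1 - a, a], [b, 1 - b]] estimated at lag time τ has eigenvalues 1 and λ₂ = 1 - a - b. Suppose the lag time is long enough that the dynamics behave like a two-state jump process; this is the lag-time question the abstract addresses. Then λ₂ = exp(-(k_AB + k_BA)τ), and detailed balance splits the total rate into k_AB and k_BA. The function below is a hypothetical helper written under that assumption, and the rates used to exercise it are made up.

```python
import math

def two_state_rates(a, b, tau):
    """Rates k_AB, k_BA and relaxation time from T = [[1-a, a], [b, 1-b]] at lag tau.

    Assumes a two-state jump process, so lambda2 = 1 - a - b = exp(-(k_AB + k_BA) * tau).
    """
    lam2 = 1.0 - a - b
    if not 0.0 < lam2 < 1.0:
        raise ValueError("lambda2 must lie in (0, 1) to define a relaxation rate")
    k_sum = -math.log(lam2) / tau
    pi_a, pi_b = b / (a + b), a / (a + b)   # stationary populations
    # Detailed balance pi_A k_AB = pi_B k_BA splits the total rate
    return k_sum * pi_b, k_sum * pi_a, 1.0 / k_sum

# Made-up rates used to generate an exact two-state transition matrix
k_ab, k_ba, tau = 0.1, 0.3, 1.0
decay = 1.0 - math.exp(-(k_ab + k_ba) * tau)
a = k_ab / (k_ab + k_ba) * decay   # P(A -> B within tau)
b = k_ba / (k_ab + k_ba) * decay   # P(B -> A within tau)
rate_ab, rate_ba, t_relax = two_state_rates(a, b, tau)
```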
Affiliation(s)
- Benoît Roux
- Department of Biochemistry and Molecular Biology, Department of Chemistry, The University of Chicago, 5735 S Ellis Ave., Chicago, Illinois 60637, USA
34
Hoffmann M, Scherer M, Hempel T, Mardt A, de Silva B, Husic BE, Klus S, Wu H, Kutz N, Brunton SL, Noé F. Deeptime: a Python library for machine learning dynamical models from time series data. Machine Learning: Science and Technology 2022. [DOI: 10.1088/2632-2153/ac3de0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/31/2022]
Abstract
Generation and analysis of time-series data is relevant to many quantitative fields ranging from economics to fluid mechanics. In the physical sciences, structures such as metastable and coherent sets, slow relaxation processes, collective variables, dominant transition pathways or manifolds and channels of probability flow can be of great importance for understanding and characterizing the kinetic, thermodynamic and mechanistic properties of the system. Deeptime is a general purpose Python library offering various tools to estimate dynamical models based on time-series data including conventional linear learning methods, such as Markov state models (MSMs), Hidden Markov Models and Koopman models, as well as kernel and deep learning approaches such as VAMPnets and deep MSMs. The library is largely compatible with scikit-learn, having a range of Estimator classes for these different models, but in contrast to scikit-learn also provides deep Model classes, e.g. in the case of an MSM, which provide a multitude of analysis methods to compute interesting thermodynamic, kinetic and dynamical quantities, such as free energies, relaxation times and transition paths. The library is designed for ease of use but also easily maintainable and extensible code. In this paper we introduce the main features and structure of the deeptime software. Deeptime can be found under https://deeptime-ml.github.io/.
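The NumPy sketch below builds the simplest Markov state model by hand, to show the kind of estimation the library automates:

1. Count transitions at a lag time τ in a discrete trajectory.
2. Row-normalize the counts, which gives the non-reversible maximum-likelihood estimate.
3. Convert the eigenvalues to implied time scales t_i = -τ/ln|λ_i|.

This is not Deeptime's API, and the function names are hypothetical. Deeptime's own estimators offer more options, such as reversible estimation.

```python
import numpy as np

def estimate_msm(dtraj, n_states, lag):
    """Non-reversible maximum-likelihood transition matrix from one discrete trajectory."""
    dtraj = np.asarray(dtraj)
    counts = np.zeros((n_states, n_states))
    # Sliding-window counts of transitions i -> j separated by `lag` frames
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every state needs at least one outgoing transition")
    return counts / row_sums

def implied_timescales(T, lag):
    """t_i = -lag / ln|lambda_i| for all but the stationary eigenvalue."""
    magnitudes = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(magnitudes[1:])

dtraj = [0, 0, 1, 1, 0, 0, 1, 1]
T = estimate_msm(dtraj, n_states=2, lag=1)
timescales = implied_timescales(T, lag=1)
```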
35
Gianti E, Percec S. Machine Learning at the Interface of Polymer Science and Biology: How Far Can We Go? Biomacromolecules 2022; 23:576-591. [PMID: 35133143 DOI: 10.1021/acs.biomac.1c01436] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Abstract
This Perspective outlines recent progress and future directions for using machine learning (ML), a data-driven method, to address critical questions in the design, synthesis, processing, and characterization of biomacromolecules. The achievement of these tasks requires the navigation of vast and complex chemical and biological spaces, difficult to accomplish with reasonable speed. Using modern algorithms and supercomputers, quantum physics methods are able to examine systems containing a few hundred interacting species and determine the probability of finding them in a particular region of phase space, thereby anticipating their properties. Likewise, modern approaches in chemistry and biomolecular simulation, supported by high performance computing, have culminated in producing data sets of escalating size and intrinsically high complexity. Hence, using ML to extract relevant information from these fields is of paramount importance to advance our understanding of chemical and biomolecular systems. At the heart of ML approaches lie statistical algorithms, which by evaluating a portion of a given data set, identify, learn, and manipulate the underlying rules that govern the whole data set. The assembly of a quality model to represent the data followed by the predictions and elimination of error sources are the key steps in ML. In addition to a growing infrastructure of ML tools to address complex problems, an increasing number of aspects related to our understanding of the fundamental properties of biomacromolecules are exposed to ML. These fields, including those residing at the interface of polymer science and biology (i.e., structure determination, de novo design, folding, and dynamics), strive to adopt and take advantage of the transformative power offered by approaches in the ML domain, which clearly has the potential of accelerating research in the field of biomacromolecules.
Affiliation(s)
- Eleonora Gianti
- Institute for Computational Molecular Science (ICMS), Temple University, Philadelphia, Pennsylvania 19122, United States; Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States
- Simona Percec
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States
36
Dylewsky D, Kaiser E, Brunton SL, Kutz JN. Principal component trajectories for modeling spectrally continuous dynamics as forced linear systems. Phys Rev E 2022; 105:015312. [PMID: 35193205 DOI: 10.1103/physreve.105.015312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2021] [Accepted: 01/07/2022] [Indexed: 05/08/2023]
Abstract
Delay embeddings of time series data have emerged as a promising coordinate basis for data-driven estimation of the Koopman operator, which seeks a linear representation for observed nonlinear dynamics. Recent work has demonstrated the efficacy of dynamic mode decomposition (DMD) for obtaining finite-dimensional Koopman approximations in delay coordinates. In this paper we demonstrate how nonlinear dynamics with sparse Fourier spectra can be (i) represented by a superposition of principal component trajectories and (ii) modeled by DMD in this coordinate space. For continuous or mixed (discrete and continuous) spectra, DMD can be augmented with an external forcing term. We present a method for learning linear control models in delay coordinates while simultaneously discovering the corresponding exogenous forcing signal in a fully unsupervised manner. This extends the existing DMD with control (DMDc) algorithm to cases where a control signal is not known a priori. We provide examples to validate the learned forcing against a known ground truth and illustrate their statistical similarity. Finally, we offer a demonstration of this method applied to real-world power grid load data to show its utility for diagnostics and interpretation on systems in which somewhat periodic behavior is strongly forced by unknown and unmeasurable environmental variables.
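The sketch below shows the delay-coordinate step, assuming "principal component trajectories" can be read as the leading right singular vectors of a Hankel (delay-embedding) matrix. For a signal with a sparse Fourier spectrum, such as a single sinusoid, only a few singular values are non-negligible, so a handful of these trajectories capture the dynamics. The function names and test signal are hypothetical, and the paper's normalization may differ.

```python
import numpy as np

def hankel_matrix(x, n_delays):
    """Stack n_delays time-shifted copies of a scalar series as rows."""
    x = np.asarray(x, dtype=float)
    n_cols = x.size - n_delays + 1
    return np.stack([x[i:i + n_cols] for i in range(n_delays)])

def principal_component_trajectories(x, n_delays, rank):
    """Singular values and the leading `rank` right singular vectors
    (time courses of the dominant delay-coordinate modes)."""
    H = hankel_matrix(x, n_delays)
    _, s, Vt = np.linalg.svd(H, full_matrices=False)
    return s, Vt[:rank]

t = np.linspace(0.0, 20.0, 1000)
s, pcts = principal_component_trajectories(np.sin(t), n_delays=50, rank=2)
```

A pure sinusoid yields a rank-2 Hankel matrix, because each shifted copy is a combination of sin t and cos t. The next step would be fitting a DMD model to the rows of `pcts`. For continuous spectra, the paper augments that model with a learned forcing term.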
Affiliation(s)
- Daniel Dylewsky
- Department of Physics, University of Washington, Seattle, Washington 98195, USA
- Eurika Kaiser
- Department of Mechanical Engineering, University of Washington, Seattle, Washington 98195, USA
- J Nathan Kutz
- Department of Applied Mathematics, University of Washington, Seattle, Washington 98195, USA
37
Belkacemi Z, Gkeka P, Lelièvre T, Stoltz G. Chasing Collective Variables Using Autoencoders and Biased Trajectories. J Chem Theory Comput 2021; 18:59-78. [PMID: 34965117 DOI: 10.1021/acs.jctc.1c00415] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/15/2022]
Abstract
Free energy biasing methods have proven to be powerful tools to accelerate the simulation of important conformational changes of molecules by modifying the sampling measure. However, most of these methods rely on the prior knowledge of low-dimensional slow degrees of freedom, i.e., collective variables (CVs). Alternatively, such CVs can be identified using machine learning (ML) and dimensionality reduction algorithms. In this context, approaches where the CVs are learned in an iterative way using adaptive biasing have been proposed: at each iteration, the learned CV is used to perform free energy adaptive biasing to generate new data and learn a new CV. In this paper, we introduce a new iterative method involving CV learning with autoencoders: Free Energy Biasing and Iterative Learning with AutoEncoders (FEBILAE). Our method includes a reweighting scheme to ensure that the learning model optimizes the same loss at each iteration and achieves CV convergence. Using the alanine dipeptide system and the solvated chignolin mini-protein system as examples, we present results of our algorithm using the extended adaptive biasing force as the free energy adaptive biasing method.
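The reweighting idea in the abstract rests on a standard identity. If samples are drawn under a potential V + V_bias, unbiased averages are recovered with weights proportional to exp(βV_bias). The sketch below applies this identity for a fixed bias, such as the final bias of an adaptive run. The function name is hypothetical, and the paper's scheme for reweighting the autoencoder loss across iterations may differ in detail.

```python
import math

def reweighted_mean(observable, bias_energy, beta):
    """Unbiased average of `observable` from samples drawn under a fixed bias,
    using weights w_i proportional to exp(beta * V_bias(x_i))."""
    exponents = [beta * v for v in bias_energy]
    # Shift by the largest exponent before exponentiating to avoid overflow
    shift = max(exponents)
    weights = [math.exp(e - shift) for e in exponents]
    return sum(w * o for w, o in zip(weights, observable)) / sum(weights)

# Two hypothetical samples; the second sat where the bias was raised by ln 3
estimate = reweighted_mean([0.0, 1.0], [0.0, math.log(3.0)], beta=1.0)
```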
Affiliation(s)
- Zineb Belkacemi
- CERMICS, Ecole des Ponts ParisTech, 77455 Marne-la-Vallée, France; Structure Design and Informatics, Sanofi 1371 R&D, 91385 Chilly-Mazarin, France
- Paraskevi Gkeka
- Structure Design and Informatics, Sanofi 1371 R&D, 91385 Chilly-Mazarin, France
- Tony Lelièvre
- CERMICS, Ecole des Ponts ParisTech, 77455 Marne-la-Vallée, France; MATHERIALS Team-Project, Inria, 75589 Paris, France
- Gabriel Stoltz
- CERMICS, Ecole des Ponts ParisTech, 77455 Marne-la-Vallée, France; MATHERIALS Team-Project, Inria, 75589 Paris, France
38
Vlachas PR, Zavadlav J, Praprotnik M, Koumoutsakos P. Accelerated Simulations of Molecular Systems through Learning of Effective Dynamics. J Chem Theory Comput 2021; 18:538-549. [PMID: 34890204 DOI: 10.1021/acs.jctc.1c00809] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/12/2022]
Abstract
Simulations are vital for understanding and predicting the evolution of complex molecular systems. However, despite advances in algorithms and special purpose hardware, accessing the time scales necessary to capture the structural evolution of biomolecules remains a daunting task. In this work, we present a novel framework to advance simulation time scales by up to 3 orders of magnitude by learning the effective dynamics (LED) of molecular systems. LED augments the equation-free methodology by employing a probabilistic mapping between coarse and fine scales using mixture density network (MDN) autoencoders and evolves the non-Markovian latent dynamics using long short-term memory MDNs. We demonstrate the effectiveness of LED in the Müller-Brown potential, the Trp cage protein, and the alanine dipeptide. LED identifies explainable reduced-order representations, i.e., collective variables, and can generate, at any instant, all-atom molecular trajectories consistent with the collective variables. We believe that the proposed framework provides a dramatic increase to simulation capabilities and opens new horizons for the effective modeling of complex molecular systems.
Affiliation(s)
- Pantelis R Vlachas
- Computational Science and Engineering Laboratory, ETH Zurich, CH-8092, Switzerland
- Julija Zavadlav
- Professorship of Multiscale Modeling of Fluid Materials, TUM School of Engineering and Design, Technical University of Munich, 85748 Garching bei München, Germany; Munich Data Science Institute, Technical University of Munich, 85748 Munich, Germany
- Matej Praprotnik
- Laboratory for Molecular Modeling, National Institute of Chemistry, SI-1001 Ljubljana, Slovenia; Department of Physics, Faculty of Mathematics and Physics, University of Ljubljana, SI-1000 Ljubljana, Slovenia
- Petros Koumoutsakos
- John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, United States
39
Mardt A, Noé F. Progress in deep Markov state modeling: Coarse graining and experimental data restraints. J Chem Phys 2021; 155:214106. [PMID: 34879670 DOI: 10.1063/5.0064668] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022]
Abstract
Recent advances in deep learning frameworks have established valuable tools for analyzing the long-timescale behavior of complex systems, such as proteins. In particular, the inclusion of physical constraints, e.g., time-reversibility, was a crucial step to make the methods applicable to biophysical systems. Furthermore, we advance the method by incorporating experimental observables into the model estimation showing that biases in simulation data can be compensated for. We further develop a new neural network layer in order to build a hierarchical model allowing for different levels of details to be studied. Finally, we propose an attention mechanism, which highlights important residues for the classification into different states. We demonstrate the new methodology on an ultralong molecular dynamics simulation of the Villin headpiece miniprotein.
Affiliation(s)
- Andreas Mardt
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Frank Noé
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
40
Wu Z, Brunton SL, Revzen S. Challenges in dynamic mode decomposition. J R Soc Interface 2021; 18:20210686. [PMID: 34932929 PMCID: PMC8692036 DOI: 10.1098/rsif.2021.0686] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/01/2021] [Accepted: 11/30/2021] [Indexed: 12/24/2022]
Abstract
Dynamic mode decomposition (DMD) is a powerful tool for extracting spatial and temporal patterns from multi-dimensional time series, and it has been used successfully in a wide range of fields, including fluid mechanics, robotics and neuroscience. Two of the main challenges remaining in DMD research are noise sensitivity and issues related to Krylov space closure when modelling nonlinear systems. Here, we investigate the combination of noise and nonlinearity in a controlled setting, by studying a class of systems with linear latent dynamics which are observed via multinomial observables. Our numerical models include system and measurement noise. We explore the influences of dataset metrics, the spectrum of the latent dynamics, the normality of the system matrix and the geometry of the dynamics. Our results show that even for these very mildly nonlinear conditions, DMD methods often fail to recover the spectrum and can have poor predictive ability. Our work is motivated by our experience modelling multilegged robot data, where we have encountered great difficulty in reconstructing time series for oscillatory systems with intermediate transients, which decay only slightly faster than a period.
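For reference, the standard exact DMD algorithm fits a linear map between successive snapshots through a truncated SVD. The sketch below uses noise-free snapshots of a damped rotation, for which DMD recovers the eigenvalues exactly. Adding measurement noise to `data` is an easy way to see the eigenvalue bias the abstract discusses. The function name and test system are hypothetical.

```python
import numpy as np

def exact_dmd(data, rank):
    """Exact DMD of a snapshot matrix `data` (n_features x n_snapshots).

    Returns DMD eigenvalues and modes of the best-fit linear map Y ~ A X.
    """
    X, Y = data[:, :-1], data[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, V = U[:, :rank], s[:rank], Vh[:rank].conj().T
    # Reduced operator A_tilde = U^* Y V S^{-1} (dividing by s scales columns)
    A_tilde = U.conj().T @ Y @ V / s
    eigvals, W = np.linalg.eig(A_tilde)
    modes = (Y @ V / s) @ W
    return eigvals, modes

# Noise-free snapshots of a damped rotation x_{k+1} = A x_k
theta = 0.3
A = 0.9 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
snapshots = [np.array([1.0, 0.0])]
for _ in range(20):
    snapshots.append(A @ snapshots[-1])
data = np.column_stack(snapshots)
eigvals, modes = exact_dmd(data, rank=2)
```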
Affiliation(s)
- Ziyou Wu
- University of Michigan, Ann Arbor, USA
41
Gin CR, Shea DE, Brunton SL, Kutz JN. DeepGreen: deep learning of Green's functions for nonlinear boundary value problems. Sci Rep 2021; 11:21614. [PMID: 34732757 PMCID: PMC8566504 DOI: 10.1038/s41598-021-00773-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/09/2021] [Accepted: 10/14/2021] [Indexed: 11/30/2022]
Abstract
Boundary value problems (BVPs) play a central role in the mathematical analysis of constrained physical systems subjected to external forces. Consequently, BVPs frequently emerge in nearly every engineering discipline and span problem domains including fluid mechanics, electromagnetics, quantum mechanics, and elasticity. The fundamental solution, or Green's function, is a leading method for solving linear BVPs that enables facile computation of new solutions to systems under any external forcing. However, fundamental Green's function solutions for nonlinear BVPs are not feasible since linear superposition no longer holds. In this work, we propose a flexible deep learning approach to solve nonlinear BVPs using a dual-autoencoder architecture. The autoencoders discover an invertible coordinate transform that linearizes the nonlinear BVP and identifies both a linear operator L and Green's function G which can be used to solve new nonlinear BVPs. We find that the method succeeds on a variety of nonlinear systems including nonlinear Helmholtz and Sturm-Liouville problems, nonlinear elasticity, and a 2D nonlinear Poisson equation and can solve nonlinear BVPs at orders of magnitude faster than traditional methods without the need for an initial guess. The method merges the strengths of the universal approximation capabilities of deep learning with the physics knowledge of Green's functions to yield a flexible tool for identifying fundamental solutions to a variety of nonlinear systems.
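The linear case that DeepGreen generalizes can be shown with a discretized 1D problem: -u'' = f on (0, 1) with u(0) = u(1) = 0. The inverse of the finite-difference operator is a discrete Green's function. The solution for any forcing is then a single matrix-vector product, and this superposition is exactly what fails for nonlinear BVPs. The sketch below is a hypothetical illustration, not the paper's method.

```python
import numpy as np

def discrete_greens_function(n):
    """Inverse of the finite-difference operator for -u'' on (0, 1) with
    u(0) = u(1) = 0, discretized on n interior grid points."""
    h = 1.0 / (n + 1)
    L = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    return np.linalg.inv(L)

n = 99
h = 1.0 / (n + 1)
x = np.linspace(0.0, 1.0, n + 2)[1:-1]      # interior grid points
G = discrete_greens_function(n)

# Once G is known, any forcing is solved by one matrix-vector product
u = G @ np.ones(n)                          # f = 1 gives u = x(1 - x)/2

# Continuous Green's function G(x, s) = min(x, s) * (1 - max(x, s)),
# which the discrete one matches up to the quadrature weight h
X, S = np.meshgrid(x, x, indexing="ij")
G_continuous = np.minimum(X, S) * (1.0 - np.maximum(X, S))
```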
Affiliation(s)
- Craig R Gin
- Department of Population Health and Pathobiology, North Carolina State University, Raleigh, NC, 27695, USA
- Daniel E Shea
- Department of Materials Science and Engineering, University of Washington, Seattle, WA, 98195, USA
- Steven L Brunton
- Department of Mechanical Engineering, University of Washington, Seattle, WA, 98195, USA
- J Nathan Kutz
- Department of Applied Mathematics, University of Washington, Seattle, WA, 98195, USA
42
Busto-Moner L, Feng CJ, Antoszewski A, Tokmakoff A, Dinner AR. Structural Ensemble of the Insulin Monomer. Biochemistry 2021; 60:3125-3136. [PMID: 34637307 PMCID: PMC8552439 DOI: 10.1021/acs.biochem.1c00583] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/30/2021] [Revised: 09/21/2021] [Indexed: 11/29/2022]
Abstract
Experimental evidence suggests that monomeric insulin exhibits significant conformational heterogeneity, and modifications of apparently disordered regions affect both biological activity and the longevity of pharmaceutical formulations, presumably through receptor binding and fibrillation/degradation, respectively. However, a microscopic understanding of conformational heterogeneity has been lacking. Here, we integrate all-atom molecular dynamics simulations with an analysis pipeline to investigate the structural ensemble of human insulin monomers. We find that 60% of the structures present at least one of the following elements of disorder: melting of the A-chain N-terminal helix, detachment of the B-chain N-terminus, and detachment of the B-chain C-terminus. We also observe partial melting and extension of the B-chain helix and significant conformational heterogeneity in the region containing the B-chain β-turn. We then estimate hydrogen-exchange protection factors for the sampled ensemble and find them in line with experimental results for KP-insulin, although the simulations underestimate the importance of unfolded states. Our results help explain the ready exchange of specific amide sites that appear to be protected in crystal structures. Finally, we discuss the implications for insulin function and stability.
Affiliation(s)
- Luis Busto-Moner
- Department of Chemistry, The University of Chicago, Chicago, Illinois 60637, United States
- Chi-Jui Feng
- Department of Chemistry, The University of Chicago, Chicago, Illinois 60637, United States
- Adam Antoszewski
- Department of Chemistry, The University of Chicago, Chicago, Illinois 60637, United States
- Andrei Tokmakoff
- Department of Chemistry, The University of Chicago, Chicago, Illinois 60637, United States
- James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
- Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois 60637, United States
- Aaron R. Dinner
- Department of Chemistry, The University of Chicago, Chicago, Illinois 60637, United States
- James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
- Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois 60637, United States
43
Sharpe DJ, Wales DJ. Nearly reducible finite Markov chains: Theory and algorithms. J Chem Phys 2021; 155:140901. [PMID: 34654307 DOI: 10.1063/5.0060978] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/03/2023]
Abstract
Finite Markov chains, memoryless random walks on complex networks, appear commonly as models for stochastic dynamics in condensed matter physics, biophysics, ecology, epidemiology, economics, and elsewhere. Here, we review exact numerical methods for the analysis of arbitrary discrete- and continuous-time Markovian networks. We focus on numerically stable methods that are required to treat nearly reducible Markov chains, which exhibit a separation of characteristic timescales and are therefore ill-conditioned. In this metastable regime, dense linear algebra methods are afflicted by propagation of error in the finite precision arithmetic, and the kinetic Monte Carlo algorithm to simulate paths is prohibitively inefficient. Furthermore, iterative eigendecomposition methods fail to converge without the use of nontrivial and system-specific preconditioning techniques. An alternative approach is provided by state reduction procedures, which do not require additional a priori knowledge of the Markov chain. Macroscopic dynamical quantities, such as moments of the first passage time distribution for a transition to an absorbing state, and microscopic properties, such as the stationary, committor, and visitation probabilities for nodes, can be computed robustly using state reduction algorithms. The related kinetic path sampling algorithm allows for efficient sampling of trajectories on a nearly reducible Markov chain. Thus, all of the information required to determine the kinetically relevant transition mechanisms, and to identify the states that have a dominant effect on the global dynamics, can be computed reliably even for computationally challenging models. Rare events are a ubiquitous feature of realistic dynamical systems, and so the methods described herein are valuable in many practical applications.
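One classic state-reduction procedure for the stationary distribution is the Grassmann-Taksar-Heyman (GTH) algorithm. It eliminates states one at a time and, unlike Gaussian elimination on I - P, performs no subtractions. That property is why it stays accurate for nearly reducible chains. The sketch below is a minimal NumPy version written for illustration, not code from the review. It assumes an irreducible, row-stochastic transition matrix, and the function name is hypothetical.

```python
import numpy as np

def gth_stationary(P):
    """Stationary distribution of an irreducible row-stochastic matrix P by GTH state reduction."""
    A = np.array(P, dtype=float)
    n = A.shape[0]
    # Eliminate states n-1, ..., 1. S is the probability of leaving state k for the
    # remaining states, computed as a sum (not as 1 - A[k, k]) to avoid cancellation.
    for k in range(n - 1, 0, -1):
        S = A[k, :k].sum()
        A[:k, k] /= S
        A[:k, :k] += np.outer(A[:k, k], A[k, :k])
    # Back-substitution: reinsert the eliminated states in order 1, ..., n-1.
    pi = np.zeros(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = pi[:j] @ A[:j, j]
    return pi / pi.sum()

# A nearly reducible two-state chain: exits happen with probability ~1e-12 per step.
eps = 1e-12
P_slow = np.array([[1 - eps, eps],
                   [2 * eps, 1 - 2 * eps]])
print(gth_stationary(P_slow))   # approximately [2/3, 1/3]
```

The off-diagonal entries enter only through ratios and sums. The tiny escape probabilities are therefore never swamped by the diagonal entries close to 1.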
Affiliation(s)
- Daniel J Sharpe
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- David J Wales
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
44
Thomas T, Roux B. Tyrosine Kinases: Complex Molecular Systems Challenging Computational Methodologies. Eur Phys J B 2021; 94:203. [PMID: 36524055 PMCID: PMC9749240 DOI: 10.1140/epjb/s10051-021-00207-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/22/2021] [Accepted: 09/14/2021] [Indexed: 05/28/2023]
Abstract
Classical molecular dynamics (MD) simulations based on atomic models play an increasingly important role in a wide range of applications in physics, biology, and chemistry. Nonetheless, generating genuine knowledge about biological systems using MD simulations remains challenging. Protein tyrosine kinases are important cellular signaling enzymes that regulate cell growth, proliferation, metabolism, differentiation, and migration. Due to the large conformational changes and long timescales involved in their function, these kinases present particularly challenging problems to modern computational and theoretical frameworks aimed at elucidating the dynamics of complex biomolecular systems. Markov state models have achieved limited success in tackling the broader conformational ensemble, and biased methods are often employed to examine specific long-timescale events. Recent advances in machine learning continue to push the limits of current methodologies and provide notable improvements when integrated with the existing frameworks. A broad perspective is drawn from a critical review of recent studies.
45
Abstract
The kinetics of a dynamical system comprising two metastable states is formulated in terms of a finite-time propagator in phase space (position and velocity) adapted to the underdamped Langevin equation. Dimensionality reduction to a subspace of collective variables yields familiar expressions for the propagator, committor, and steady-state flux. A quadratic expression for the steady-state flux between the two metastable states can serve as a robust variational principle to determine an optimal approximate committor expressed in terms of a set of collective variables. The theoretical formulation is exploited to clarify the foundation of the string method with swarms-of-trajectories, which relies on the mean drift of short trajectories to determine the optimal transition pathway. It is argued that the conditions for Markovity within a subspace of collective variables may not be satisfied with an arbitrarily short time step and that proper kinetic behaviors appear only when considering the effective propagator for longer lag times. The effective propagator with finite lag time is amenable to an eigenvalue-eigenvector spectral analysis, as elaborated previously in the context of position-based Markov models. The time-correlation functions calculated by swarms-of-trajectories along the string pathway constitute a natural extension of these developments. The present formulation provides a powerful theoretical framework to characterize the optimal pathway between two metastable states of a system.
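The committor and the variational role of a quadratic functional can be illustrated on a discrete-state analogue. This is an assumption made for illustration; the abstract itself works with the underdamped Langevin propagator in phase space. Consider a reversible Markov chain with transition matrix P and stationary distribution pi. The committor q, the probability of reaching B before A, solves q_i = sum_j P_ij q_j on the intermediate states, with q = 0 on A and q = 1 on B. Under the same boundary values, it minimizes the quadratic form (1/2) sum_ij pi_i P_ij (q_j - q_i)^2. The helper names below are hypothetical.

```python
import numpy as np

def committor(P, A, B):
    """Forward committor q_i = Prob(reach B before A | start in i) for a discrete Markov chain."""
    n = P.shape[0]
    A, B = list(A), list(B)
    inter = [i for i in range(n) if i not in A and i not in B]
    q = np.zeros(n)
    q[B] = 1.0
    # On intermediate states I: (Id - P_II) q_I = P_IB 1
    P_II = P[np.ix_(inter, inter)]
    rhs = P[np.ix_(inter, B)].sum(axis=1)
    q[inter] = np.linalg.solve(np.eye(len(inter)) - P_II, rhs)
    return q

def quadratic_form(P, pi, q):
    """(1/2) sum_ij pi_i P_ij (q_j - q_i)^2; for reversible P the committor minimizes it."""
    diff = q[None, :] - q[:, None]        # diff[i, j] = q_j - q_i
    return 0.5 * np.sum(pi[:, None] * P * diff**2)

# Symmetric random walk on 5 states with reflecting ends: pi is uniform and P is reversible.
P = np.array([[0.5, 0.5, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.0, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 0.5, 0.5]])
pi = np.full(5, 0.2)
q = committor(P, A=[0], B=[4])
print(q)   # approximately [0, 0.25, 0.5, 0.75, 1]
```

Perturbing the interior values of `q` while keeping the boundary values fixed can only increase `quadratic_form`. This is the discrete counterpart of using a quadratic flux expression to optimize an approximate committor.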
Affiliation(s)
- Benoît Roux
- Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, Illinois 60637, United States
- Department of Chemistry, The University of Chicago, 5735 S. Ellis Avenue, Chicago, Illinois 60637, United States
46
Mitxelena I, López X, de Sancho D. Markov state models from hierarchical density-based assignment. J Chem Phys 2021; 155:054102. [PMID: 34364321 DOI: 10.1063/5.0056748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Abstract
Markov state models (MSMs) have become one of the preferred methods for the analysis and interpretation of molecular dynamics (MD) simulations of conformational transitions in biopolymers. While there is great variation in terms of implementation, a well-defined workflow involving multiple steps is often adopted. Typically, molecular coordinates are first subjected to dimensionality reduction and then clustered into small "microstates," which are subsequently lumped into "macrostates" using the information from the slowest eigenmodes. However, the microstate dynamics is often non-Markovian, and long lag times are required to converge the relevant slow dynamics in the MSM. Here, we propose a variation on this typical workflow, taking advantage of hierarchical density-based clustering. When applied to simulation data, this type of clustering separates high population regions of conformational space from others that are rarely visited. In this way, density-based clustering naturally implements assignment of the data based on transitions between metastable states, resulting in a core-set MSM. As a result, the state definition becomes more consistent with the assumption of Markovianity, and the timescales of the slow dynamics of the system are recovered more effectively. We present results of this simplified workflow for a model potential and MD simulations of the alanine dipeptide and the FiP35 WW domain.
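The core-set assignment described here gives frames that the density-based clustering leaves unassigned to the last metastable core visited. It can be sketched in a few lines of NumPy. The sketch then builds a row-normalized count (maximum-likelihood, non-reversible) transition matrix at lag time tau and computes its implied timescales t_i = -tau / ln|lambda_i|. This is a minimal sketch, not the authors' code. It assumes that noise frames carry the label -1 (the convention HDBSCAN implementations use), a single trajectory, and that every core is visited. The function names are hypothetical.

```python
import numpy as np

def core_set_assign(labels):
    """Assign noise frames (label -1) to the last core visited; leading noise stays -1."""
    out = np.array(labels, dtype=int)
    last = -1
    for t in range(len(out)):
        if out[t] >= 0:
            last = out[t]
        else:
            out[t] = last
    return out

def transition_matrix(dtraj, n_states, lag):
    """Row-normalized transition counts at the given lag (non-reversible MLE)."""
    C = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        if i >= 0 and j >= 0:          # skip frames before the first core visit
            C[i, j] += 1
    return C / C.sum(axis=1, keepdims=True)

def implied_timescales(T, lag):
    """t_i = -lag / ln|lambda_i| for all eigenvalues except the stationary one."""
    lam = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(lam[1:])

labels = [-1, 0, 0, -1, -1, 1, 1, -1, 0]   # toy per-frame cluster labels
dtraj = core_set_assign(labels)            # [-1, 0, 0, 0, 0, 1, 1, 1, 0]
T = transition_matrix(dtraj, n_states=2, lag=1)
print(T)
print(implied_timescales(T, lag=1))
```

Assigning noise frames to the previous core means that a transition is counted only once the trajectory actually reaches the next core. This is how the core-set construction suppresses spurious recrossings at state boundaries.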
Affiliation(s)
- Ion Mitxelena
- Polimero eta Material Aurreratuak: Fisika, Kimika eta Teknologia, Kimika Fakultatea, UPV/EHU & Donostia International Physics Center (DIPC), PK 1072, 20018 Donostia-San Sebastian, Euskadi, Spain
- Xabier López
- Polimero eta Material Aurreratuak: Fisika, Kimika eta Teknologia, Kimika Fakultatea, UPV/EHU & Donostia International Physics Center (DIPC), PK 1072, 20018 Donostia-San Sebastian, Euskadi, Spain
- David de Sancho
- Polimero eta Material Aurreratuak: Fisika, Kimika eta Teknologia, Kimika Fakultatea, UPV/EHU & Donostia International Physics Center (DIPC), PK 1072, 20018 Donostia-San Sebastian, Euskadi, Spain
47
Glielmo A, Husic BE, Rodriguez A, Clementi C, Noé F, Laio A. Unsupervised Learning Methods for Molecular Simulation Data. Chem Rev 2021; 121:9722-9758. [PMID: 33945269 PMCID: PMC8391792 DOI: 10.1021/acs.chemrev.0c01195] [Citation(s) in RCA: 116] [Impact Index Per Article: 38.7] [Received: 11/05/2020] [Indexed: 12/21/2022]
Abstract
Unsupervised learning is becoming an essential tool to analyze the increasingly large amounts of data produced by atomistic and molecular simulations, in material science, solid state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms for dimensionality reduction, density estimation, clustering, and kinetic models. We divide our discussion into self-contained sections, each discussing a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used, or can be used, to analyze molecular simulation data.
Affiliation(s)
- Aldo Glielmo
- International School for Advanced Studies (SISSA), 34014 Trieste, Italy
- Brooke E. Husic
- Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
- Alex Rodriguez
- International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
- Cecilia Clementi
- Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
- Rice University, Department of Chemistry, Houston, Texas 77005, United States
- Frank Noé
- Freie Universität Berlin, Department of Mathematics and Computer Science, 14195 Berlin, Germany
- Freie Universität Berlin, Department for Physics, 14195 Berlin, Germany
- Rice University, Department of Chemistry, Houston, Texas 77005, United States
- Alessandro Laio
- International School for Advanced Studies (SISSA), 34014 Trieste, Italy
- International Centre for Theoretical Physics (ICTP), Condensed Matter and Statistical Physics Section, 34100 Trieste, Italy
48
Hempel T, Del Razo MJ, Lee CT, Taylor BC, Amaro RE, Noé F. Independent Markov decomposition: Toward modeling kinetics of biomolecular complexes. Proc Natl Acad Sci U S A 2021; 118:e2105230118. [PMID: 34321356 PMCID: PMC8346863 DOI: 10.1073/pnas.2105230118] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/18/2022]
Abstract
To advance the mission of in silico cell biology, modeling the interactions of large and complex biological systems becomes increasingly relevant. The combination of molecular dynamics (MD) simulations and Markov state models (MSMs) has enabled the construction of simplified models of molecular kinetics on long timescales. Despite its success, this approach is inherently limited by the size of the molecular system. With increasing size of macromolecular complexes, the number of independent or weakly coupled subsystems increases, and the number of global system states increases exponentially, making the sampling of all distinct global states unfeasible. In this work, we present a technique called independent Markov decomposition (IMD) that leverages weak coupling between subsystems to compute a global kinetic model without requiring the sampling of all combinatorial states of subsystems. We give a theoretical basis for IMD and propose an approach for finding and validating such a decomposition. Using empirical few-state MSMs of ion channel models that are well established in electrophysiology, we demonstrate that IMD models can reproduce experimental conductance measurements with a major reduction in sampling compared with a standard MSM approach. We further show how to find the optimal partition of all-atom protein simulations into weakly coupled subunits.
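The combinatorial growth the abstract describes, and why independence helps, can be seen in the idealized limit of exactly independent subsystems. In that limit the global transition matrix is the Kronecker product of the subsystem matrices. N two-state subunits then give 2^N global states, while only N small matrices need to be estimated. This is a minimal NumPy sketch under that exact-independence assumption. IMD as described in the abstract also handles weak coupling and searches for the partition; this sketch attempts neither. The function names and the example rates are hypothetical.

```python
import numpy as np
from functools import reduce

def global_transition_matrix(subsystem_Ts):
    """Joint transition matrix of exactly independent subsystems (Kronecker product)."""
    return reduce(np.kron, subsystem_Ts)

def stationary(T):
    """Left eigenvector of T for its largest eigenvalue, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(T.T)
    v = np.real(evecs[:, np.argmax(np.real(evals))])
    return v / v.sum()

# Four identical two-state subunits (e.g. closed/open) give 16 global states.
T_sub = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
T_global = global_transition_matrix([T_sub] * 4)
print(T_global.shape)   # (16, 16)

# Under independence the global stationary distribution factorizes.
pi_sub = stationary(T_sub)                 # approximately [2/3, 1/3]
pi_prod = reduce(np.kron, [pi_sub] * 4)
print(np.allclose(stationary(T_global), pi_prod))   # True
```

Estimating `T_global` directly would require sampling transitions among all 16 joint states. The factorized model needs only the 2 x 2 subunit statistics, and that saving grows exponentially with the number of subunits.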
Affiliation(s)
- Tim Hempel
- Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Department of Physics, Freie Universität Berlin, 14195 Berlin, Germany
- Mauricio J Del Razo
- Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Van't Hoff Institute for Molecular Sciences, University of Amsterdam, 1090 GD Amsterdam, The Netherlands
- Korteweg-de Vries Institute for Mathematics, University of Amsterdam, 1090 GE Amsterdam, The Netherlands
- Dutch Institute for Emergent Phenomena, 1090 GL Amsterdam, The Netherlands
- Christopher T Lee
- Department of Mechanical and Aerospace Engineering, University of California San Diego, La Jolla, CA 92093
- Bryn C Taylor
- Biomedical Sciences Graduate Program, University of California San Diego, La Jolla, CA 92093
- Rommie E Amaro
- Department of Chemistry & Biochemistry, University of California San Diego, La Jolla, CA 92093
- Frank Noé
- Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Department of Physics, Freie Universität Berlin, 14195 Berlin, Germany
- Department of Chemistry, Rice University, Houston, TX 77005
49
Computational methods for exploring protein conformations. Biochem Soc Trans 2021; 48:1707-1724. [PMID: 32756904 PMCID: PMC7458412 DOI: 10.1042/bst20200193] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 06/02/2020] [Revised: 07/07/2020] [Accepted: 07/09/2020] [Indexed: 12/13/2022]
Abstract
Proteins are dynamic molecules that can transition between a potentially wide range of structures comprising their conformational ensemble. The nature of these conformations and their relative probabilities are described by a high-dimensional free energy landscape. While computer simulation techniques such as molecular dynamics simulations allow the metastable conformational states and the transitions between them, and thus free energy landscapes, to be characterised, the barriers between states can be high, precluding efficient sampling without substantial computational resources. Over the past decades, a dizzying array of methods has emerged for enhancing conformational sampling, and for projecting the free energy landscape onto a reduced set of dimensions that allow conformational states to be distinguished, known as collective variables (CVs), along which sampling may be directed. Here, a brief description of what biomolecular simulation entails is followed by a more detailed exposition of the nature of CVs and methods for determining these, and, lastly, an overview of the myriad different approaches for enhancing conformational sampling, most of which rely upon CVs, including new advances in both CV determination and conformational sampling due to machine learning.
50
Ge Y, Zhang S, Erdelyi M, Voelz VA. Solution-State Preorganization of Cyclic β-Hairpin Ligands Determines Binding Mechanism and Affinities for MDM2. J Chem Inf Model 2021; 61:2353-2367. [PMID: 33905247 PMCID: PMC9960209 DOI: 10.1021/acs.jcim.1c00029] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/26/2022]
Abstract
Understanding mechanisms of protein folding and binding is crucial to designing their molecular function. Molecular dynamics (MD) simulations and Markov state model (MSM) approaches provide a powerful way to understand complex conformational change that occurs over long time scales. Such dynamics are important for the design of therapeutic peptidomimetic ligands, whose affinity and binding mechanism are dictated by a combination of folding and binding. To examine the role of preorganization in peptide binding to protein targets, we performed massively parallel explicit-solvent MD simulations of cyclic β-hairpin ligands designed to mimic the p53 transactivation domain and competitively bind mouse double minute 2 homologue (MDM2). Disrupting the MDM2-p53 interaction is a therapeutic strategy to prevent degradation of the p53 tumor suppressor in cancer cells. MSM analysis of over 3 ms of aggregate trajectory data enabled us to build a detailed mechanistic model of coupled folding and binding of four cyclic peptides which we compare to experimental binding affinities and rates. The results show a striking relationship between the relative preorganization of each ligand in solution and its affinity for MDM2. Specifically, changes in peptide conformational populations predicted by the MSMs suggest that entropy loss upon binding is the main factor influencing affinity. The MSMs also enable detailed examination of non-native interactions which lead to misfolded states and comparison of structural ensembles with experimental NMR measurements. In contrast to an MSM study of p53 transactivation domain (TAD) binding to MDM2, MSMs of cyclic β-hairpin binding show a conformational selection mechanism. Finally, we make progress toward predicting accurate off rates of cyclic peptides using multiensemble Markov models (MEMMs) constructed from unbiased and biased simulated trajectories.
Affiliation(s)
- Yunhui Ge
- Department of Chemistry, Temple University, Philadelphia, PA 19122, USA
- Si Zhang
- Department of Chemistry, Temple University, Philadelphia, PA 19122, USA
- Mate Erdelyi
- Department of Chemistry - BMC, Uppsala University, SE-75123 Uppsala, Sweden
- Vincent A. Voelz
- Department of Chemistry, Temple University, Philadelphia, PA 19122, USA