1
Liu Y, Ghosh TK, Lin G, Chen M. Unbiasing Enhanced Sampling on a High-Dimensional Free Energy Surface with a Deep Generative Model. J Phys Chem Lett 2024; 15:3938-3945. [PMID: 38568182 DOI: 10.1021/acs.jpclett.3c03515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/12/2024]
Abstract
Biased enhanced sampling methods that utilize collective variables (CVs) are powerful tools for sampling conformational ensembles. Due to their large intrinsic dimensions, efficiently generating conformational ensembles for complex systems requires enhanced sampling on high-dimensional free energy surfaces. While temperature-accelerated molecular dynamics (TAMD) can trivially adopt many CVs in a simulation, unbiasing the simulation to generate unbiased conformational ensembles requires accurate modeling of a high-dimensional CV probability distribution, which is challenging for traditional density estimation techniques. Here we propose an unbiasing method based on the score-based diffusion model, a deep generative learning method that excels in density estimation across complex data landscapes. We demonstrate that this unbiasing approach, tested on multiple TAMD simulations, significantly outperforms traditional unbiasing methods and can generate accurate unbiased conformational ensembles. With the proposed approach, TAMD can adopt CVs that focus on improving sampling efficiency and the proposed unbiasing method enables accurate evaluation of ensemble averages of important chemical features.
Affiliation(s)
- Yikai Liu
  - Department of Mechanical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
- Tushar K Ghosh
  - Department of Chemistry, Purdue University, West Lafayette, Indiana 47906, United States
- Guang Lin
  - Department of Mechanical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
- Ming Chen
  - Department of Chemistry, Purdue University, West Lafayette, Indiana 47906, United States
2
Hsu T, Sadigh B, Bulatov V, Zhou F. Score Dynamics: Scaling Molecular Dynamics with Picoseconds Time Steps via Conditional Diffusion Model. J Chem Theory Comput 2024; 20:2335-2348. [PMID: 38489243 DOI: 10.1021/acs.jctc.3c01361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2024]
Abstract
We propose score dynamics (SD), a general framework for learning accelerated evolution operators with large timesteps from molecular dynamics (MD) simulations. SD is centered around scores or derivatives of the transition log-probability with respect to the dynamical degrees of freedom. The latter play the same role as force fields in MD but are used in denoising diffusion probability models to generate discrete transitions of the dynamical variables in an SD time step, which can be orders of magnitude larger than a typical MD time step. In this work, we construct graph neural network-based SD models of realistic molecular systems that are evolved with 10 ps timesteps. We demonstrate the efficacy of SD with case studies of the alanine dipeptide and short alkanes in aqueous solution. Both equilibrium predictions derived from the stationary distributions of the conditional probability and kinetic predictions for the transition rates and transition paths are in good agreement with MD. Our current SD implementation is about 2 orders of magnitude faster than the MD counterpart for the systems studied in this work. Open challenges and possible future remedies to improve SD are also discussed.
Affiliation(s)
- Tim Hsu
  - Lawrence Livermore National Laboratory, Livermore, California 94551, United States
- Babak Sadigh
  - Lawrence Livermore National Laboratory, Livermore, California 94551, United States
- Vasily Bulatov
  - Lawrence Livermore National Laboratory, Livermore, California 94551, United States
- Fei Zhou
  - Lawrence Livermore National Laboratory, Livermore, California 94551, United States
3
Zou Z, Tiwary P. Enhanced Sampling of Crystal Nucleation with Graph Representation Learnt Variables. J Phys Chem B 2024. [PMID: 38502931 DOI: 10.1021/acs.jpcb.4c00080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/21/2024]
Abstract
In this study, we present a graph neural network (GNN)-based learning approach using an autoencoder setup to derive low-dimensional variables from features observed in experimental crystal structures. These variables are then biased in enhanced sampling to observe state-to-state transitions and reliable thermodynamic weights. In our approach, we used simple convolution and pooling methods. To verify the effectiveness of our protocol, we examined the nucleation of various allotropes and polymorphs of iron and glycine in their molten states. Our graph latent variables, when biased in well-tempered metadynamics, consistently show transitions between states and achieve accurate thermodynamic rankings in agreement with experiments, both of which are indicators of dependable sampling. This underscores the strength and promise of our GNN variables for improved sampling. The protocol shown here should be applicable for other systems and other sampling methods.
Affiliation(s)
- Ziyue Zou
  - Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland 20742, United States
- Pratyush Tiwary
  - Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland 20742, United States
  - Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
  - University of Maryland Institute for Health Computing, Rockville, Maryland 20852, United States
4
Jones MS, Shmilovich K, Ferguson AL. DiAMoNDBack: Diffusion-Denoising Autoregressive Model for Non-Deterministic Backmapping of Cα Protein Traces. J Chem Theory Comput 2023; 19:7908-7923. [PMID: 37906711 DOI: 10.1021/acs.jctc.3c00840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Coarse-grained molecular models of proteins permit access to length and time scales unattainable by all-atom models and the simulation of processes that occur on long time scales, such as aggregation and folding. The reduced resolution realizes computational accelerations, but an atomistic representation can be vital for a complete understanding of mechanistic details. Backmapping is the process of restoring all-atom resolution to coarse-grained molecular models. In this work, we report DiAMoNDBack (Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping) as an autoregressive denoising diffusion probability model to restore all-atom details to coarse-grained protein representations retaining only Cα coordinates. The autoregressive generation process proceeds from the protein N-terminus to C-terminus in a residue-by-residue fashion conditioned on the Cα trace and previously backmapped backbone and side-chain atoms within the local neighborhood. The local and autoregressive nature of our model makes it transferable between proteins. The stochastic nature of the denoising diffusion process means that the model generates a realistic ensemble of backbone and side-chain all-atom configurations consistent with the coarse-grained Cα trace. We train DiAMoNDBack over 65k+ structures from the Protein Data Bank (PDB) and validate it in applications to a hold-out PDB test set, intrinsically disordered protein structures from the Protein Ensemble Database (PED), molecular dynamics simulations of fast-folding mini-proteins from DE Shaw Research, and coarse-grained simulation data. We achieve state-of-the-art reconstruction performance in terms of correct bond formation, avoidance of side-chain clashes, and the diversity of the generated side-chain configurational states. We make the DiAMoNDBack model publicly available as a free and open-source Python package.
Affiliation(s)
- Michael S Jones
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Kirill Shmilovich
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
5
Sarkar D, Lee H, Vant JW, Turilli M, Vermaas JV, Jha S, Singharoy A. Adaptive Ensemble Refinement of Protein Structures in High Resolution Electron Microscopy Density Maps with Radical Augmented Molecular Dynamics Flexible Fitting. J Chem Inf Model 2023; 63:5834-5846. [PMID: 37661856 DOI: 10.1021/acs.jcim.3c00350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
Recent advances in cryo-electron microscopy (cryo-EM) have enabled modeling macromolecular complexes that are essential components of the cellular machinery. The density maps derived from cryo-EM experiments are often integrated with manual, knowledge-driven or artificial intelligence-driven and physics-guided computational methods to build, fit, and refine molecular structures. Going beyond a single stationary-structure determination scheme, it is becoming more common to interpret the experimental data with an ensemble of models that contributes to an average observation. Hence, there is a need to decide on the quality of an ensemble of protein structures on-the-fly while refining them against the density maps. We introduce such an adaptive decision-making scheme during the molecular dynamics flexible fitting (MDFF) of biomolecules. Using RADICAL-Cybertools, the new RADICAL augmented MDFF implementation (R-MDFF) is examined in high-performance computing environments for refinement of two prototypical protein systems, adenylate kinase and carbon monoxide dehydrogenase. For these test cases, use of multiple replicas in flexible fitting with adaptive decision making in R-MDFF improves the overall correlation to the density by 40% relative to the refinements of the brute-force MDFF. The improvements are particularly significant at high, 2-3 Å map resolutions. More importantly, the ensemble model captures key features of biologically relevant molecular dynamics that are inaccessible to a single-model interpretation. Finally, the pipeline is applicable to systems of growing sizes, which is demonstrated using ensemble refinement of capsid proteins from the chimpanzee adenovirus. The overhead for decision making remains low and robust to computing environments. The software is publicly available on GitHub and includes a short user guide to install R-MDFF on different computing environments, from local Linux-based workstations to high-performance computing environments.
Affiliation(s)
- Daipayan Sarkar
  - MSU-DOE Plant Research Laboratory, East Lansing, Michigan 48824, United States
  - School of Molecular Sciences, Arizona State University, Tempe, Arizona 85281, United States
- Hyungro Lee
  - Pacific Northwest National Laboratory, Richland, Washington 99354, United States
  - Electrical & Computer Engineering, Rutgers University, New Brunswick, New Jersey 08854, United States
- John W Vant
  - School of Molecular Sciences, Arizona State University, Tempe, Arizona 85281, United States
- Matteo Turilli
  - Electrical & Computer Engineering, Rutgers University, New Brunswick, New Jersey 08854, United States
  - Computational Science Initiative, Brookhaven National Laboratory, Upton, New York 11973, United States
- Josh V Vermaas
  - MSU-DOE Plant Research Laboratory, East Lansing, Michigan 48824, United States
- Shantenu Jha
  - Electrical & Computer Engineering, Rutgers University, New Brunswick, New Jersey 08854, United States
  - Computational Science Initiative, Brookhaven National Laboratory, Upton, New York 11973, United States
- Abhishek Singharoy
  - School of Molecular Sciences, Arizona State University, Tempe, Arizona 85281, United States
6
Yuan Y, Cui Q. Accurate and Efficient Multilevel Free Energy Simulations with Neural Network-Assisted Enhanced Sampling. J Chem Theory Comput 2023; 19:5394-5406. [PMID: 37527495 PMCID: PMC10810721 DOI: 10.1021/acs.jctc.3c00591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/03/2023]
Abstract
Free energy differences (ΔF) are essential to quantitative characterization and understanding of chemical and biological processes. Their direct estimation with an accurate quantum mechanical potential is of great interest and yet impractical due to high computational cost and incompatibility with typical alchemical free energy protocols. One promising solution is the multilevel free energy simulation in which the estimate of ΔF at an inexpensive low level of theory is combined with the correction toward a higher level of theory. The poor configurational overlap generally expected between the two levels of theory, however, presents a major challenge. We overcome this challenge by using a deep neural network model and enhanced sampling simulations. An adversarial autoencoder is used to identify a low-dimensional (latent) space that compactly represents the degrees of freedom that encode the distinct distributions at the two levels of theory. Enhanced sampling in this latent space is then used to drive the sampling of configurations that predominantly contribute to the free energy correction. Results for both gas phase and condensed phase systems demonstrate that this data-driven approach offers high accuracy and efficiency with great potential for scalability to complex systems.
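The abstract does not say which estimator supplies the correction between the two levels of theory. As an illustration only, one common choice is the one-sided Zwanzig (free energy perturbation) formula, ΔF(low→high) = -kT ln ⟨exp(-(U_high - U_low)/kT)⟩_low, evaluated on configurations sampled at the low level. The function name `fep_correction` and its arguments are hypothetical and are not taken from the paper.

```python
import numpy as np

def fep_correction(u_low, u_high, kT):
    """One-sided Zwanzig estimate of F_high - F_low.

    u_low, u_high: energies of the SAME configurations, sampled at the
    low level of theory, evaluated at each level (same units as kT).
    """
    du = (np.asarray(u_high, dtype=float) - np.asarray(u_low, dtype=float)) / kT
    # Stabilised log-mean-exp of -du to avoid overflow for large gaps.
    shift = np.max(-du)
    log_mean = shift + np.log(np.mean(np.exp(-du - shift)))
    return -kT * log_mean

# Hypothetical energies (kcal/mol) for three sampled configurations.
u_low = np.array([1.0, 2.0, 3.0])
u_high = u_low + np.array([0.5, 0.9, 0.7])
print(fep_correction(u_low, u_high, kT=0.6))
```

The exponential average is dominated by configurations where the two levels agree best, which is why poor configurational overlap, the challenge the abstract names, makes this kind of estimate converge slowly.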
Affiliation(s)
- Yuchen Yuan
  - Department of Chemistry, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, United States
- Qiang Cui
  - Department of Chemistry, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, United States
  - Department of Physics, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, United States
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, Massachusetts 02215, United States
7
Do HN, Miao Y. Deep Boosted Molecular Dynamics (DBMD): Accelerating molecular simulations with Gaussian boost potentials generated using probabilistic Bayesian deep neural network. bioRxiv 2023:2023.03.25.534210. [PMID: 37034713 PMCID: PMC10081221 DOI: 10.1101/2023.03.25.534210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/22/2023]
Abstract
We have developed a new Deep Boosted Molecular Dynamics (DBMD) method. Probabilistic Bayesian neural network models were implemented to construct boost potentials that exhibit a Gaussian distribution with minimized anharmonicity, thereby allowing for accurate energetic reweighting and enhanced sampling of molecular simulations. DBMD was demonstrated on model systems of alanine dipeptide and fast-folding protein and RNA structures. For alanine dipeptide, 30 ns DBMD simulations captured up to 83-125 times more backbone dihedral transitions than 1 μs conventional molecular dynamics (cMD) simulations and were able to accurately reproduce the original free energy profiles. Moreover, DBMD sampled multiple folding and unfolding events within 300 ns simulations of the chignolin model protein and identified low-energy conformational states comparable to previous simulation findings. Finally, DBMD captured a general folding pathway of three hairpin RNAs with the GCAA, GAAA, and UUCG tetraloops. Based on a deep learning neural network, DBMD provides a powerful and generally applicable approach to boosting biomolecular simulations. DBMD is available as open-source software in OpenMM at https://github.com/MiaoLab20/DBMD/.
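A near-Gaussian boost potential matters for reweighting because the exponential average ⟨exp(ΔV/kT)⟩ can then be approximated by a cumulant expansion truncated at second order, ln⟨exp(βΔV)⟩ ≈ β⟨ΔV⟩ + (β²/2)σ², which is exact when ΔV is Gaussian. The sketch below illustrates that approximation only; it assumes this is the reweighting scheme intended, and the function name is hypothetical rather than part of the DBMD package.

```python
import numpy as np

def log_reweighting_factor(delta_v, kT):
    """Second-order cumulant estimate of ln<exp(delta_v / kT)>.

    delta_v: boost potential values for the frames falling in one bin
    of the reaction coordinate (same energy units as kT).
    """
    delta_v = np.asarray(delta_v, dtype=float)
    beta = 1.0 / kT
    c1 = delta_v.mean()   # first cumulant: mean boost
    c2 = delta_v.var()    # second cumulant: population variance
    return beta * c1 + 0.5 * beta**2 * c2

# Hypothetical boost values (kcal/mol) collected in a single bin.
print(log_reweighting_factor([0.0, 2.0], kT=1.0))
```

Under this approximation the unbiased free energy of a bin would be the biased free energy minus kT times this factor, up to an additive constant.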
Affiliation(s)
- Hung N. Do
  - Center for Computational Biology and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas 66047
- Yinglong Miao
  - Center for Computational Biology and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas 66047
8
Jung H, Covino R, Arjun A, Leitold C, Dellago C, Bolhuis PG, Hummer G. Machine-guided path sampling to discover mechanisms of molecular self-organization. Nat Comput Sci 2023; 3:334-345. [PMID: 38177937 PMCID: PMC10766509 DOI: 10.1038/s43588-023-00428-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 02/15/2023] [Accepted: 03/10/2023] [Indexed: 01/06/2024]
Abstract
Molecular self-organization driven by concerted many-body interactions produces the ordered structures that define both inanimate and living matter. Here we present an autonomous path sampling algorithm that integrates deep learning and transition path theory to discover the mechanism of molecular self-organization phenomena. The algorithm uses the outcome of newly initiated trajectories to construct, validate and, if needed, update quantitative mechanistic models. Closing the learning cycle, the models guide the sampling to enhance the sampling of rare assembly events. Symbolic regression condenses the learned mechanism into a human-interpretable form in terms of relevant physical observables. Applied to ion association in solution, gas-hydrate crystal formation, polymer folding and membrane-protein assembly, we capture the many-body solvent motions governing the assembly process, identify the variables of classical nucleation theory, uncover the folding mechanism at different levels of resolution and reveal competing assembly pathways. The mechanistic descriptions are transferable across thermodynamic states and chemical space.
Affiliation(s)
- Hendrik Jung
  - Department of Theoretical Biophysics, Max Planck Institute of Biophysics, Frankfurt am Main, Germany
- Roberto Covino
  - Frankfurt Institute for Advanced Studies, Frankfurt am Main, Germany
- A Arjun
  - van 't Hoff Institute for Molecular Sciences, University of Amsterdam, Amsterdam, The Netherlands
- Peter G Bolhuis
  - van 't Hoff Institute for Molecular Sciences, University of Amsterdam, Amsterdam, The Netherlands
- Gerhard Hummer
  - Department of Theoretical Biophysics, Max Planck Institute of Biophysics, Frankfurt am Main, Germany
  - Institute of Biophysics, Goethe University Frankfurt, Frankfurt am Main, Germany
9
Abstract
Coarse-grained models are a core computational tool in theoretical chemistry and biophysics. A judicious choice of a coarse-grained model can yield physical insights by isolating the essential degrees of freedom that dictate the thermodynamic properties of a complex, condensed-phase system. The reduced complexity of the model typically leads to lower computational costs and more efficient sampling compared with atomistic models. Designing "good" coarse-grained models is an art. Generally, the mapping from fine-grained configurations to coarse-grained configurations itself is not optimized in any way; instead, the energy function associated with the mapped configurations is. In this work, we explore the consequences of optimizing the coarse-grained representation alongside its potential energy function. We use a graph machine learning framework to embed atomic configurations into a low-dimensional space to produce efficient representations of the original molecular system. Because the representation we obtain is no longer directly interpretable as a real-space representation of the atomic coordinates, we also introduce an inversion process and an associated thermodynamic consistency relation that allows us to rigorously sample fine-grained configurations conditioned on the coarse-grained sampling. We show that this technique is robust, recovering the first two moments of the distribution of several observables in proteins such as chignolin and alanine dipeptide.
Affiliation(s)
- David J Toomer
  - Department of Chemistry, Stanford University, Stanford, California 94305, USA
- Grant M Rotskoff
  - Department of Chemistry, Stanford University, Stanford, California 94305, USA
10
Abstract
We combine replica exchange (parallel tempering) with normalizing flows, a class of deep generative models. These two sampling strategies complement each other, resulting in an efficient method for sampling molecular systems characterized by rare events, which we call learned replica exchange (LREX). In LREX, a normalizing flow is trained to map the configurations of the fastest-mixing replica into configurations belonging to the target distribution, allowing direct exchanges between the two without the need to simulate intermediate replicas. This can significantly reduce the computational cost compared to standard replica exchange. The proposed method also offers several advantages with respect to Boltzmann generators that directly use normalizing flows to sample the target distribution. We apply LREX to some prototypical molecular dynamics systems, highlighting the improvements over previous methods.
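For context, the exchange step in standard replica exchange (not the flow-based exchange that LREX introduces) accepts a swap between replicas at inverse temperatures β_i and β_j with probability min(1, exp[(β_i - β_j)(U_i - U_j)]), where U_i and U_j are the potential energies of the two current configurations. A minimal sketch follows; the function names are illustrative and do not come from the paper.

```python
import math
import random

def swap_acceptance(beta_i, beta_j, u_i, u_j):
    """Metropolis acceptance probability for swapping two replicas.

    beta_i, beta_j: inverse temperatures 1/kT of the two replicas.
    u_i, u_j: potential energies of their current configurations.
    """
    log_ratio = (beta_i - beta_j) * (u_i - u_j)
    # Clamp at 0 before exponentiating so the result never exceeds 1
    # and math.exp cannot overflow.
    return math.exp(min(0.0, log_ratio))

def attempt_swap(beta_i, beta_j, u_i, u_j, rng=random):
    """Return True if the proposed swap is accepted."""
    return rng.random() < swap_acceptance(beta_i, beta_j, u_i, u_j)

# Cold replica (beta = 1.0) holds the lower-energy configuration.
print(swap_acceptance(1.0, 0.5, 1.0, 2.0))
```

The acceptance decays exponentially with the product of the temperature gap and the energy gap, which is why standard replica exchange needs a ladder of intermediate replicas between the fast-mixing and target distributions.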
Affiliation(s)
- Michele Invernizzi
  - Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Andreas Krämer
  - Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
- Cecilia Clementi
  - Department of Physics, Freie Universität Berlin, 14195 Berlin, Germany
  - Department of Chemistry, Rice University, 77005 Houston, United States
  - Center for Theoretical Biological Physics, Rice University, 77005 Houston, United States
- Frank Noé
  - Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
  - Department of Physics, Freie Universität Berlin, 14195 Berlin, Germany
  - Department of Chemistry, Rice University, 77005 Houston, United States
  - Microsoft Research AI4Science, 10178 Berlin, Germany
11
Matsunaga Y, Kamiya M, Oshima H, Jung J, Ito S, Sugita Y. Use of multistate Bennett acceptance ratio method for free-energy calculations from enhanced sampling and free-energy perturbation. Biophys Rev 2022; 14:1503-1512. [PMID: 36659993 PMCID: PMC9842838 DOI: 10.1007/s12551-022-01030-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/24/2022] [Accepted: 12/01/2022] [Indexed: 12/15/2022]
Abstract
Multistate Bennett acceptance ratio (MBAR) works as a method to analyze molecular dynamics (MD) simulation data after the simulations have been finished. It is widely used to estimate free-energy changes between different states and averaged properties at the states of interest. MBAR allows us to treat a wide range of states from those at different temperature/pressure to those with different model parameters. Due to the broad applicability, the MBAR equations are rather difficult to apply for free-energy calculations using different types of MD simulations including enhanced conformational sampling methods and free-energy perturbation. In this review, we first summarize the basic theory of the MBAR equations and categorize the representative usages into the following four: (i) perturbation, (ii) scaling, (iii) accumulation, and (iv) full potential energy. For each, we explain how to prepare input data using MD simulation trajectories for solving the MBAR equations. MBAR is also useful to estimate reliable free-energy differences using MD trajectories based on a semi-empirical quantum mechanics/molecular mechanics (QM/MM) model and ab initio QM/MM energy calculations on the MD snapshots. We also explain how to use the MBAR software in the GENESIS package, which we call mbar_analysis, for the four representative cases. The proposed estimations of free-energy changes and thermodynamic averages are effective and useful for various biomolecular systems.
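As a rough illustration of what solving the MBAR equations involves, the sketch below iterates the standard self-consistent form f_i = -ln Σ_n [exp(-u_i(x_n)) / Σ_k N_k exp(f_k - u_k(x_n))], where u_k(x_n) is the reduced potential of pooled sample n evaluated in state k and N_k is the number of samples drawn from state k. It is a plain NumPy fixed-point loop under those assumptions, not the `mbar_analysis` tool in GENESIS, and all function names are hypothetical.

```python
import numpy as np

def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(a_max, axis=axis) + np.log(np.sum(np.exp(a - a_max), axis=axis))

def mbar_free_energies(u_kn, n_k, tol=1e-10, max_iter=10000):
    """Reduced free energies f_k (with f_0 = 0) by self-consistent iteration.

    u_kn: array (K, N) of reduced potentials u_k(x_n) for all pooled samples.
    n_k:  array (K,) of sample counts per state; each must be > 0 here.
    """
    u_kn = np.asarray(u_kn, dtype=float)
    log_n_k = np.log(np.asarray(n_k, dtype=float))
    f_k = np.zeros(u_kn.shape[0])
    for _ in range(max_iter):
        # log of the mixture denominator for every sample n
        log_denom = _logsumexp(log_n_k[:, None] + f_k[:, None] - u_kn, axis=0)
        f_new = -_logsumexp(-u_kn - log_denom[None, :], axis=1)
        f_new -= f_new[0]  # fix the arbitrary additive constant
        if np.max(np.abs(f_new - f_k)) < tol:
            return f_new
        f_k = f_new
    return f_k

# Toy check: state 1 is state 0 shifted by a constant 1.5 (in units of kT),
# so the free-energy difference should come out as 1.5.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
u0 = 0.5 * x**2
print(mbar_free_energies(np.vstack([u0, u0 + 1.5]), n_k=[30, 20]))
```

The four usage categories listed in the abstract differ mainly in how the `u_kn` matrix is assembled from trajectories; the solver itself is unchanged.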
Affiliation(s)
- Yasuhiro Matsunaga
  - Graduate School of Science and Engineering, Saitama University, Saitama, Saitama 338-8570, Japan
- Motoshi Kamiya
  - Institute for Molecular Science, Myodaiji, Okazaki, Aichi 444-8585, Japan
- Hiraku Oshima
  - Laboratory for Biomolecular Function Simulation, RIKEN Center for Biosystems Dynamics Research, Kobe, Hyogo 650-0047, Japan
- Jaewoon Jung
  - Computational Biophysics Research Team, RIKEN Center for Computational Science, Kobe, Hyogo 650-0047, Japan
- Shingo Ito
  - Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, Wako, Saitama 351-0198, Japan
- Yuji Sugita
  - Laboratory for Biomolecular Function Simulation, RIKEN Center for Biosystems Dynamics Research, Kobe, Hyogo 650-0047, Japan
  - Computational Biophysics Research Team, RIKEN Center for Computational Science, Kobe, Hyogo 650-0047, Japan
  - Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, Wako, Saitama 351-0198, Japan