1
Vitalis A, Winkler S, Zhang Y, Widmer J, Caflisch A. A FAIR-Compliant Management Solution for Molecular Simulation Trajectories. J Chem Inf Model 2025; 65:2443-2455. [PMID: 39977657 PMCID: PMC11898051 DOI: 10.1021/acs.jcim.4c01301] [Received: 07/26/2024] [Revised: 02/06/2025] [Accepted: 02/07/2025] [Indexed: 02/22/2025]
Abstract
Simulation studies of molecules primarily produce data that represent the configuration of the system as a function of the progress variable, usually time. Because of the high-dimensional nature of these data, which grow very quickly, compromises are often necessary and achieved by storing only a subset of the system's components, for example, stripping solvent, and by restricting the time resolution to a scale significantly coarser than the basic time step of the simulation. The resultant trajectories thus describe the essentially stochastic evolution of the molecules of interest. Maintaining their interpretability through metadata is of interest not only because they can aid researchers interested in specific systems but also for reproducibility studies and model refinement. Here, we introduce a standard for the storage of data created by molecular simulations that improves compliance with the FAIR (Findable, Accessible, Interoperable, and Reusable) principles. We describe a solution conceived in PostgreSQL, along with reference implementations, that provides stringent links between metadata and raw data, which is a major weakness of the established file formats used for storing these data. A possible structure for the logic of SQL queries is included along with salient performance testing. To close, we suggest that a PostgreSQL-based storage of simulation data, in particular when coupled to a visual user interface, can improve the FAIR compliance of molecular simulation data at all levels of visibility, and a prototype solution for accomplishing this is presented.
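The "stringent links between metadata and raw data" mentioned above can be pictured as a foreign-key constraint between a metadata table and a frame table. The sketch below is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, and the table names, column names, and helper functions are hypothetical, not the schema published by the authors.

```python
import sqlite3
import struct

# Hypothetical two-table layout: one metadata row per trajectory and one row
# per stored frame, tied together by a foreign key.
SCHEMA = """
CREATE TABLE trajectory (
    id               INTEGER PRIMARY KEY,
    system_name      TEXT NOT NULL,
    force_field      TEXT NOT NULL,
    save_interval_ps REAL NOT NULL
);
CREATE TABLE frame (
    trajectory_id INTEGER NOT NULL REFERENCES trajectory(id),
    frame_index   INTEGER NOT NULL,
    coordinates   BLOB NOT NULL,
    PRIMARY KEY (trajectory_id, frame_index)
);
"""


def open_store():
    """Create an in-memory database with foreign-key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
    conn.executescript(SCHEMA)
    return conn


def add_trajectory(conn, system_name, force_field, save_interval_ps):
    """Insert one metadata row and return its identifier."""
    cur = conn.execute(
        "INSERT INTO trajectory (system_name, force_field, save_interval_ps) "
        "VALUES (?, ?, ?)",
        (system_name, force_field, save_interval_ps),
    )
    return cur.lastrowid


def add_frame(conn, trajectory_id, frame_index, xyz):
    """Store one frame as packed little-endian doubles under a trajectory."""
    blob = struct.pack(f"<{len(xyz)}d", *xyz)
    conn.execute(
        "INSERT INTO frame (trajectory_id, frame_index, coordinates) "
        "VALUES (?, ?, ?)",
        (trajectory_id, frame_index, blob),
    )


def frames_with_metadata(conn, system_name):
    """Return (force_field, save_interval_ps, frame_index) for one system."""
    return conn.execute(
        "SELECT t.force_field, t.save_interval_ps, f.frame_index "
        "FROM frame AS f JOIN trajectory AS t ON t.id = f.trajectory_id "
        "WHERE t.system_name = ? ORDER BY f.frame_index",
        (system_name,),
    ).fetchall()
```

With this layout, a frame cannot be inserted without a matching metadata row: the database rejects it with an integrity error, which is the kind of guarantee that loose trajectory files next to separate metadata files cannot give.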
Affiliation(s)
- Andreas Vitalis
  - Department of Biochemistry, University of Zurich, Winterthurerstr. 190, 8057 Zurich, Switzerland
- Steffen Winkler
  - Department of Biochemistry, University of Zurich, Winterthurerstr. 190, 8057 Zurich, Switzerland
- Yang Zhang
  - Department of Biochemistry, University of Zurich, Winterthurerstr. 190, 8057 Zurich, Switzerland
- Julian Widmer
  - Department of Biochemistry, University of Zurich, Winterthurerstr. 190, 8057 Zurich, Switzerland
- Amedeo Caflisch
  - Department of Biochemistry, University of Zurich, Winterthurerstr. 190, 8057 Zurich, Switzerland
2
Zupan H, Keller BG. Toward Grid-Based Models for Molecular Association. J Chem Theory Comput 2025; 21:614-628. [PMID: 39803919 PMCID: PMC11780749 DOI: 10.1021/acs.jctc.4c01293] [Received: 09/30/2024] [Revised: 11/27/2024] [Accepted: 12/27/2024] [Indexed: 01/29/2025]
Abstract
This paper presents a grid-based approach to model molecular association processes as an alternative to sampling-based Markov models. Our method discretizes the six-dimensional space of relative translation and orientation into grid cells. By discretizing the Fokker-Planck operator governing the system dynamics via the square-root approximation, we derive analytical expressions for the transition rate constants between grid cells. These expressions depend on geometric properties of the grid, such as the cell surface area and volume, which we provide. In addition, one needs only the molecular energy at the grid cell center, circumventing the need for extensive MD simulations and reducing the number of energy evaluations to the number of grid cells. The resulting rate matrix is closely related to the Markov state model transition matrix, offering insights into metastable states and association kinetics. We validate the accuracy of the model in identifying metastable states and binding mechanisms, though improvements are necessary to address limitations like ignoring bulk transitions and anisotropic rotational diffusion. The flexibility of this grid-based method makes it applicable to a variety of molecular systems and energy functions, including those derived from quantum mechanical calculations. The software package MolGri, which implements this approach, offers a systematic and computationally efficient tool for studying molecular association processes.
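To make the role of the grid geometry concrete, here is a minimal one-dimensional sketch of the square-root approximation in its commonly quoted form, Q_ij = D · S_ij / (h_ij · V_i) · exp(−β (U_j − U_i) / 2), where S_ij is the area of the shared cell surface, h_ij the distance between cell centers, and V_i the cell volume. This is an assumption-laden toy: the paper's own expressions for the six-dimensional translation-rotation grid are not reproduced here, and the function name and parameters are hypothetical.

```python
import math


def sqra_rate_matrix(energies, volumes, diffusion, beta, spacing, area=1.0):
    """Rate matrix for a 1D chain of cells in which only neighbours exchange.

    energies : energy at each cell center
    volumes  : volume of each cell
    spacing  : distance between neighbouring cell centers (h_ij)
    area     : area of the surface shared by neighbouring cells (S_ij)
    """
    n = len(energies)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                geometric = diffusion * area / (spacing * volumes[i])
                boltzmann = math.exp(-0.5 * beta * (energies[j] - energies[i]))
                q[i][j] = geometric * boltzmann
        # The diagonal is still zero here, so the sum covers off-diagonals only.
        q[i][i] = -sum(q[i])
    return q
```

Only one energy evaluation per cell enters the construction, and by design the resulting matrix has rows that sum to zero and satisfies detailed balance with respect to weights proportional to V_i · exp(−β U_i).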
Affiliation(s)
- Hana Zupan
  - Department of Biology, Chemistry and Pharmacy, Freie Universität Berlin, Arnimallee 22, 14195 Berlin, Germany
- Bettina G. Keller
  - Department of Biology, Chemistry and Pharmacy, Freie Universität Berlin, Arnimallee 22, 14195 Berlin, Germany
3
Shao D, Zhang Z, Liu X, Fu H, Shao X, Cai W. Screening Fast-Mode Motion in Collective Variable Discovery for Biochemical Processes. J Chem Theory Comput 2024; 20:10393-10405. [PMID: 39601677 DOI: 10.1021/acs.jctc.4c01282] [Indexed: 11/29/2024]
Abstract
Collective variables (CVs) describing slow degrees of freedom (DOFs) in biomolecular assemblies are crucial for analyzing molecular dynamics trajectories, creating Markov models and performing CV-based enhanced sampling simulations. While time-lagged independent component analysis (tICA) and its nonlinear successor, time-lagged autoencoder (tAE), are widely used, they often struggle to capture protein dynamics due to interference from random fluctuations along fast DOFs. To address this issue, we propose a novel approach integrating discrete wavelet transform (DWT) with dimensionality reduction techniques. DWT effectively separates fast and slow motion in protein simulation trajectories by decoupling high- and low-frequency signals. Based on the trajectory after filtering out high-frequency signals, which corresponds to fast motion, tICA and tAE can accurately extract CVs representing slow DOFs, providing reliable insights into protein dynamics. Our method demonstrates superior performance in identifying CVs that distinguish metastable states compared to standard tICA and tAE, as validated through analyses of conformational changes of alanine dipeptide and tripeptide and folding of CLN025. Moreover, we show that DWT can be used to improve the performance of a variety of CV-finding algorithms by combining it with Deep-tICA, a cutting-edge CV-finding algorithm, to extract CVs for enhanced-sampling calculations. Given its negligible computational cost and remarkable ability to screen fast motion, we propose DWT as a "free lunch" for CV extraction, applicable to a wide range of CV-finding algorithms.
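The filtering step described above can be sketched with the simplest wavelet, the Haar wavelet: decompose the signal, discard the detail (high-frequency) coefficients, and reconstruct. The choice of Haar, the number of levels, and the function names are assumptions made for illustration; the abstract does not state which wavelet family or decomposition depth the authors use.

```python
import math

_S = 1.0 / math.sqrt(2.0)


def haar_step(signal):
    """One level of the Haar DWT: return (approximation, detail) coefficients."""
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one Haar level from approximation and detail coefficients."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * _S)
        out.append((a - d) * _S)
    return out


def lowpass(signal, levels=1):
    """Remove the `levels` finest detail bands and reconstruct the signal."""
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    detail_lengths = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        detail_lengths.append(len(detail))
    # Reconstruct with every detail coefficient set to zero.
    for n in reversed(detail_lengths):
        approx = haar_inverse(approx, [0.0] * n)
    return approx
```

Applied per coordinate (or per feature) of a trajectory, the output of `lowpass` would then be passed to tICA or a time-lagged autoencoder in place of the raw signal. With Haar, each level simply replaces blocks of samples by their mean, which makes the effect easy to verify by hand.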
Affiliation(s)
- Donghui Shao
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Zhiteng Zhang
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xuyang Liu
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Haohao Fu
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
  - Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
  - Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
4
Widmer J, Vitalis A, Caflisch A. On the specificity of the recognition of m6A-RNA by YTH reader domains. J Biol Chem 2024; 300:107998. [PMID: 39551145 PMCID: PMC11699332 DOI: 10.1016/j.jbc.2024.107998] [Received: 07/18/2024] [Revised: 10/26/2024] [Accepted: 11/12/2024] [Indexed: 11/19/2024]
Abstract
Most processes of life are the result of polyvalent interactions between macromolecules, often of heterogeneous types and sizes. Frequently, the times associated with these interactions are prohibitively long for interrogation using atomistic simulations. Here, we study the recognition of N6-methylated adenine (m6A) in RNA by the reader domain YTHDC1, a prototypical, cognate pair that challenges simulations through its composition and required timescales. Simulations of RNA pentanucleotides in water reveal that the unbound state can impact (un)binding kinetics in a manner that is both model- and sequence-dependent. This is important because there are two contributions to the specificity of the recognition of the Gm6AC motif: from the sequence adjacent to the central adenine and from its methylation. Next, we establish a reductionist model consisting of an RNA trinucleotide binding to the isolated reader domain in high salt. An adaptive sampling protocol allows us to quantitatively study the dissociation of this complex. Through joint analysis of a data set including both the cognate and control sequences (GAC, Am6AA, and AAA), we derive that both contributions to specificity (sequence and methylation) are significant and in good agreement with experimental numbers. Analysis of the kinetics suggests that flexibility in both the RNA and the YTHDC1 recognition loop leads to many low-populated unbinding pathways. This multiple-pathway mechanism might be dominant for the binding of unstructured polymers, including RNA and peptides, to proteins when their association is driven by polyvalent, electrostatic interactions.
Affiliation(s)
- Julian Widmer
  - Department of Biochemistry, University of Zurich, Zurich, Switzerland
- Andreas Vitalis
  - Department of Biochemistry, University of Zurich, Zurich, Switzerland
- Amedeo Caflisch
  - Department of Biochemistry, University of Zurich, Zurich, Switzerland
5
Rydzewski J, Gökdemir T. Learning Markovian dynamics with spectral maps. J Chem Phys 2024; 160:091102. [PMID: 38436438 DOI: 10.1063/5.0189241] [Received: 11/27/2023] [Accepted: 02/05/2024] [Indexed: 03/05/2024]
Abstract
The long-time behavior of many complex molecular systems can often be described by Markovian dynamics in a slow subspace spanned by a few reaction coordinates referred to as collective variables (CVs). However, determining CVs poses a fundamental challenge in chemical physics. Depending on intuition or trial and error to construct CVs can lead to non-Markovian dynamics with long memory effects, hindering analysis. To address this problem, we continue to develop a recently introduced deep-learning technique called spectral map [J. Rydzewski, J. Phys. Chem. Lett. 14, 5216-5220 (2023)]. Spectral map learns slow CVs by maximizing a spectral gap of a Markov transition matrix describing anisotropic diffusion. Here, to represent heterogeneous and multiscale free-energy landscapes with spectral map, we implement an adaptive algorithm to estimate transition probabilities. Through a Markov state model analysis, we validate that spectral map learns slow CVs related to the dominant relaxation timescales and discerns between long-lived metastable states.
Affiliation(s)
- Jakub Rydzewski
  - Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland
- Tuğçe Gökdemir
  - Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland
6
Maggi L, Orozco M. Main role of fractal-like nature of conformational space in subdiffusion in proteins. Phys Rev E 2024; 109:034402. [PMID: 38632804 DOI: 10.1103/physreve.109.034402] [Received: 09/13/2023] [Accepted: 02/05/2024] [Indexed: 04/19/2024]
Abstract
Protein dynamics involves a myriad of mechanical movements happening at different time and space scales, which make it highly complex. One of the less understood features of protein dynamics is subdiffusivity, defined as a sublinear dependence of displacement on time. Here, we use all-atom molecular dynamics (MD) simulations to directly interrogate an already well-established theory and demonstrate that subdiffusivity arises from the fractal nature of the network of metastable conformations over which the dynamics, thought of as a diffusion process, takes place.
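Subdiffusion is usually quantified through the mean squared displacement, MSD(t) ≈ K · t^α, with α < 1 marking the sublinear regime and α = 1 normal diffusion. The sketch below estimates α by an ordinary least-squares fit in log-log space. It is a generic illustration, not the analysis performed in the paper, and the function name is hypothetical.

```python
import math


def fit_power_law(times, msd):
    """Fit msd ≈ K * t**alpha by linear regression of log(msd) on log(t).

    Returns (alpha, K). All times and MSD values must be strictly positive.
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(m) for m in msd]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    prefactor = math.exp(mean_y - alpha * mean_x)
    return alpha, prefactor
```

In practice the fit would be restricted to the lag-time window in which the log-log plot is approximately linear, since real MSD curves typically cross over between regimes.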
Affiliation(s)
- Luca Maggi
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Baldiri Reixac 10, Barcelona 08028, Spain
- Modesto Orozco
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Baldiri Reixac 10, Barcelona 08028, Spain
  - Departament de Bioquímica i Biomedicina. Facultat de Biologia, Universitat de Barcelona, Avgda Diagonal 647, Barcelona 08028, Spain