1
D'Hondt S, Oramas J, De Winter H. A beginner's approach to deep learning applied to VS and MD techniques. J Cheminform 2025; 17:47. [PMID: 40200329 PMCID: PMC11980327 DOI: 10.1186/s13321-025-00985-7] [Received: 12/20/2024] [Accepted: 03/12/2025] [Indexed: 04/10/2025]
Abstract
It has become impossible to imagine the fields of biochemistry and medicinal chemistry without computational chemistry and molecular modelling techniques. In many steps of the drug development process in silico methods have become indispensable. Virtual screening (VS) can tremendously expedite the early discovery phase, whilst the use of molecular dynamics (MD) simulations forms a powerful additional tool to in vitro methods throughout the entire drug discovery process. In the field of biochemistry, MD has also become a compelling method for studying biophysical systems (e.g., protein folding) complementary to experimental techniques. However, both VS and MD come with their own limitations and methodological difficulties, from hardware limitations to restrictions in algorithmic capabilities. One solution to overcoming these difficulties lies in the field of machine learning (ML), and more specifically deep learning (DL). There are many ways in which DL can be applied to these molecular modelling techniques to achieve more accurate results in a more efficient manner or expedite the data analysis of the acquired results. Despite steadily increasing interest in DL amidst computational chemists, knowledge is still limited and scattered over different resources. This review is aimed at computational chemists with knowledge of molecular modelling, who wish to possibly integrate DL approaches in their research and already have a basic understanding of the fundamentals of DL. This review focusses on a survey of recent applications of DL in molecular modelling techniques. The different sections are logically subdivided, based on where DL is integrated in the research: (1) for the improvement of VS workflows, (2) for the improvement of certain workflows in MD simulations, (3) for aiding in the calculations of interatomic forces, or (4) for data analysis of MD trajectories. 
It will become clear that DL has the capacity to completely transform the way molecular modelling is carried out.
Affiliation(s)
- Stijn D'Hondt
- Laboratory of Medicinal Chemistry, Department of Pharmaceutical Sciences, IDLab, University of Antwerp, Universiteitsplein 1, 2610, Wilrijk, Belgium
- José Oramas
- Department of Computer Science, Sint-Pietersvliet 7, 2000, Antwerp, Belgium
- Hans De Winter
- Laboratory of Medicinal Chemistry, Department of Pharmaceutical Sciences, IDLab, University of Antwerp, Universiteitsplein 1, 2610, Wilrijk, Belgium
2
Ortaakarsu AB, Boğa ÖB, Kurbanoğlu EB. Cardaria draba subspecies Shalepensis exerts in vitro and in silico inhibition of α-glucosidase, TRP1, and DLD-1 proliferation. Sci Rep 2025; 15:10402. [PMID: 40140437 PMCID: PMC11947245 DOI: 10.1038/s41598-025-95538-1] [Received: 12/23/2024] [Accepted: 03/21/2025] [Indexed: 03/28/2025]
Abstract
In this study, in vitro enzyme activity assays were performed to investigate the inhibitory effects on α-glucosidase and tyrosinase-related protein 1, while in silico molecular docking, molecular dynamics, and protein dynamics analyses were performed to provide information on molecular mechanisms. According to information obtained from in silico approaches, inhibition properties are responsible for conformational changes in protein structures, occupation of the active site cleft by the dominant compounds in the extract, as well as long-term changes in protein folding due to departure from the usual motion. The IC50 values of Cardaria draba (L.) DESV. subsp. Chalepensis (L.) extract for α-glucosidase and tyrosinase-related protein 1 were determined to be 1.89 ± 0.13 µg/mL and 1.53 ± 0.13 µg/mL, respectively. In addition, the IC50 value of the antiproliferative effects of the extract on DLD-1 colon cancer cells was found to be 6.9 µg/mL. Preclinical trials are warranted to validate the extract's therapeutic potential. These findings suggest that Cardaria draba extract exhibits enzyme inhibitory and antiproliferative properties, warranting further investigation for its potential role in therapeutic interventions. Further research, particularly in vivo studies, is required to explore the potential of this extract to address DLD-1.
Affiliation(s)
- Özlem Bakır Boğa
- Department of Biology, Faculty of Science, Ataturk University, Erzurum, Turkey
3
Marquez J, Cuendet MA, Caino-Lores S, Estrada T, Deelman E, Weinstein H, Taufer M. Increasing the Efficiency of Ensemble Molecular Dynamics Simulations with Termination of Unproductive Trajectories Identified at Runtime. J Phys Chem A 2025; 129:2317-2324. [PMID: 39903920 DOI: 10.1021/acs.jpca.4c05182] [Indexed: 02/06/2025]
Abstract
The application of molecular dynamics (MD) simulations to study increasingly larger and more complex systems is challenged by the amount of trajectory data needed to sample their conformational space appropriately. The analysis and interpretation of such massive data sets, which have to be stored and fed to the various algorithms that reveal the dynamic behaviors of the systems and the underlying energetics in structural terms related to functional mechanisms, is also a significant challenge. To address these challenges, we are developing a software framework that can increase the efficiency of this process. We present one component of this framework that can reduce the size of the accumulating data set while maintaining the structural attributes, distribution, and relative probability ranking of the minima in the free energy map for the system. This framework component utilizes early termination of individual trajectories identified as unproductive in the sampling of conformational space. The criteria for termination are derived quantities such as collective variables (CVs) and secondary quantities calculated from the time series of CVs. They are computed and applied during the trajectory generation. The approach is illustrated with simulations of the FS peptide and evaluated by comparing the free energy surfaces calculated from ensembles of complete, unabridged simulations with those obtained from ensembles in which ∼5-50% of trajectories were terminated early. Our early termination approach can optimize computational efficiency while achieving a robust representation of conformational space.
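The runtime termination idea described in this abstract can be illustrated with a minimal sketch. The criterion used here (the spread of a single CV over a sliding window falling below a threshold) and the names `is_unproductive`, `run_with_early_termination`, `window`, and `min_spread` are assumptions made for illustration only; the abstract does not specify the authors' actual criteria or software interface.

```python
from collections import deque


def is_unproductive(recent_cv, min_spread):
    """Hypothetical criterion: the CV barely moved over the recent window."""
    return max(recent_cv) - min(recent_cv) < min_spread


def run_with_early_termination(next_cv, n_steps, window=50, min_spread=0.05):
    """Generate CV values one step at a time and stop early if the criterion fires.

    next_cv is a callable standing in for one MD step followed by a CV evaluation.
    Returns (cv_series, terminated_early).
    """
    recent = deque(maxlen=window)  # sliding window of the latest CV values
    series = []
    for _ in range(n_steps):
        value = next_cv()
        series.append(value)
        recent.append(value)
        # Only test once the window is full, so short trajectories are not cut.
        if len(recent) == window and is_unproductive(recent, min_spread):
            return series, True
    return series, False


# A trajectory stuck at one CV value is cut as soon as the window fills.
stuck_series, stuck_stopped = run_with_early_termination(lambda: 0.3, n_steps=200)

# A trajectory whose CV keeps drifting is allowed to run to completion.
ramp = iter(range(200))
moving_series, moving_stopped = run_with_early_termination(
    lambda: 0.01 * next(ramp), n_steps=200
)
```

The criterion is evaluated while the trajectory is being generated, so no post hoc pass over stored frames is needed; a real criterion would likely combine several CVs and secondary quantities derived from their time series.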
Affiliation(s)
- Jack Marquez
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Tennessee 37916, United States
- Michel A Cuendet
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Oncology, Geneva University Hospital and University of Geneva, 1211 Geneva, Switzerland
- Department of Physiology and Biophysics and Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, United States
- Silvina Caino-Lores
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Tennessee 37916, United States
- INRIA Center at Rennes University, IRISA, 35042 Rennes, France
- Trilce Estrada
- Department of Computer Science, University of New Mexico, Albuquerque, New Mexico 87131, United States
- Ewa Deelman
- Information Sciences Institute, University of Southern California, Los Angeles, California 90089, United States
- Harel Weinstein
- Department of Physiology and Biophysics and Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, United States
- Michela Taufer
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Tennessee 37916, United States
4
Mukhaleva E, Manookian B, Chen H, Sivaraj IR, Ma N, Wei W, Urbaniak K, Gogoshin G, Bhattacharya S, Vaidehi N, Rodin AS, Branciamore S. BaNDyT: Bayesian Network Modeling of Molecular Dynamics Trajectories. J Chem Inf Model 2025; 65:1278-1288. [PMID: 39846243 DOI: 10.1021/acs.jcim.4c01981] [Indexed: 01/24/2025]
Abstract
Bayesian network modeling (BN modeling, or BNM) is an interpretable machine learning method for constructing probabilistic graphical models from the data. In recent years, it has been extensively applied to diverse types of biomedical data sets. Concurrently, our ability to perform long-time scale molecular dynamics (MD) simulations on proteins and other materials has increased exponentially. However, the analysis of MD simulation trajectories has not been data-driven but rather dependent on the user's prior knowledge of the systems, thus limiting the scope and utility of the MD simulations. Recently, we pioneered using BNM for analyzing the MD trajectories of protein complexes. The resulting BN models yield novel fully data-driven insights into the functional importance of the amino acid residues that modulate proteins' function. In this report, we describe the BaNDyT software package that implements the BNM specifically attuned to the MD simulation trajectories data. We believe that BaNDyT is the first software package to include specialized and advanced features for analyzing MD simulation trajectories using a probabilistic graphical network model. We describe here the software's uses, the methods associated with it, and a comprehensive Python interface to the underlying generalist BNM code. This provides a powerful and versatile mechanism for users to control the workflow. As an application example, we have utilized this methodology and associated software to study how membrane proteins, specifically the G protein-coupled receptors, selectively couple to G proteins. The software can be used for analyzing MD trajectories of any protein as well as polymeric materials.
Affiliation(s)
- Elizaveta Mukhaleva
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, 1218 S 5th Ave, Monrovia, California 91016, United States
- Irell and Manella Graduate School of Biological Sciences, Beckman Research Institute of the City of Hope, 1500 E Duarte Road, Duarte, California 91010, United States
- Babgen Manookian
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, 1218 S 5th Ave, Monrovia, California 91016, United States
- Hanyu Chen
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, 1218 S 5th Ave, Monrovia, California 91016, United States
- Irell and Manella Graduate School of Biological Sciences, Beckman Research Institute of the City of Hope, 1500 E Duarte Road, Duarte, California 91010, United States
- Indira R Sivaraj
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, 1218 S 5th Ave, Monrovia, California 91016, United States
- Ning Ma
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, 1218 S 5th Ave, Monrovia, California 91016, United States
- Wenyuan Wei
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, 1218 S 5th Ave, Monrovia, California 91016, United States
- Irell and Manella Graduate School of Biological Sciences, Beckman Research Institute of the City of Hope, 1500 E Duarte Road, Duarte, California 91010, United States
- Konstancja Urbaniak
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, 1218 S 5th Ave, Monrovia, California 91016, United States
- Grigoriy Gogoshin
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, 1218 S 5th Ave, Monrovia, California 91016, United States
- Supriyo Bhattacharya
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, 1218 S 5th Ave, Monrovia, California 91016, United States
- Nagarajan Vaidehi
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, 1218 S 5th Ave, Monrovia, California 91016, United States
- Irell and Manella Graduate School of Biological Sciences, Beckman Research Institute of the City of Hope, 1500 E Duarte Road, Duarte, California 91010, United States
- Andrei S Rodin
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, 1218 S 5th Ave, Monrovia, California 91016, United States
- Irell and Manella Graduate School of Biological Sciences, Beckman Research Institute of the City of Hope, 1500 E Duarte Road, Duarte, California 91010, United States
- Sergio Branciamore
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, 1218 S 5th Ave, Monrovia, California 91016, United States
- Irell and Manella Graduate School of Biological Sciences, Beckman Research Institute of the City of Hope, 1500 E Duarte Road, Duarte, California 91010, United States
5
Yang J, Balutowski A, Trivedi M, Wencewicz TA. Chemical Logic of Peptide Branching by Iterative Nonlinear Nonribosomal Peptide Synthetases. Biochemistry 2025; 64:719-734. [PMID: 39847710 DOI: 10.1021/acs.biochem.4c00749] [Indexed: 01/25/2025]
Abstract
Branch-point syntheses in nonribosomal peptide assembly are rare but useful strategies to generate tripodal peptides with advantageous hexadentate iron-chelating capabilities, as seen in siderophores. However, the chemical logic underlying the peptide branching by nonribosomal peptide synthetase (NRPS) often remains complex and elusive. Here, we review the common strategies for the biosynthesis of branched nonribosomal peptides (NRPs) and present our biochemical investigation on the NRPS-catalyzed assembly of fimsbactin A, a branched mixed-ligand siderophore produced by the human pathogenic strain Acinetobacter baumannii. We untangled the unusual branching mechanism of fimsbactin A biosynthesis through a combination of bioinformatics, site-directed mutagenesis, in vitro reconstitution, molecular modeling, and molecular dynamics simulation. Our findings clarify the roles of the fimsbactin NRPS enzymes, uncovering catalytically redundant domains and identifying the multifunctional nature of the FbsF cyclization (Cy) domain. We demonstrate the dynamic interplay between l-serine and 2,3-dihydroxybenzoic acid derived dipeptides, partitioning between amide and ester forms via a 1,2-N-to-O-acyl shift orchestrated by the noncanonical, multichannel FbsF Cy domain. The branching event occurs in a secondary condensation event facilitated by this Cy domain with two dipeptidyl intermediates, which generates a branched tetrapeptide thioester. Finally, the terminal condensation domain of FbsG recruits a soluble nucleophile to release the final product. This study advances our understanding of the intricate biosynthetic pathways and chemical logic employed by NRPSs, shedding light on the mechanisms underlying the synthesis of complex branched peptides.
Affiliation(s)
- Jinping Yang
- Department of Chemistry, Washington University in St. Louis, One Brookings Drive, St. Louis, Missouri 63130, United States
- Adam Balutowski
- Department of Chemistry, Washington University in St. Louis, One Brookings Drive, St. Louis, Missouri 63130, United States
- Megan Trivedi
- Department of Chemistry, Washington University in St. Louis, One Brookings Drive, St. Louis, Missouri 63130, United States
- Timothy A Wencewicz
- Department of Chemistry, Washington University in St. Louis, One Brookings Drive, St. Louis, Missouri 63130, United States
6
Mitra S, Biswas R, Chakrabarty S. WeTICA: A directed search weighted ensemble based enhanced sampling method to estimate rare event kinetics in a reduced dimensional space. J Chem Phys 2025; 162:034106. [PMID: 39812249 DOI: 10.1063/5.0239713] [Received: 09/21/2024] [Accepted: 12/30/2024] [Indexed: 01/16/2025]
Abstract
Estimating rare event kinetics from molecular dynamics simulations is a non-trivial task despite the great advances in enhanced sampling methods. Weighted Ensemble (WE) simulation, a special class of enhanced sampling techniques, offers a way to directly calculate kinetic rate constants from biased trajectories without the need to modify the underlying energy landscape using bias potentials. Conventional WE algorithms use different binning schemes to partition the collective variable (CV) space separating the two metastable states of interest. In this work, we have developed a new "binless" WE simulation algorithm to bypass the hurdles of optimizing binning procedures. Our proposed protocol (WeTICA) uses a low-dimensional CV space to drive the WE simulation toward the specified target state. We have applied this new algorithm to recover the unfolding kinetics of three proteins: (A) TC5b Trp-cage mutant, (B) TC10b Trp-cage mutant, and (C) Protein G, with unfolding times spanning the range between 3 and 40 μs using projections along predefined fixed Time-lagged Independent Component Analysis (TICA) eigenvectors as CVs. Calculated unfolding times converge to the reported values with good accuracy with more than one order of magnitude less cumulative WE simulation time than the unfolding time scales with or without a priori knowledge of the CVs that can capture unfolding. Our algorithm can be used with other linear CVs, not limited to TICA. Moreover, the new walker selection criteria for resampling employed in this algorithm can be used on more sophisticated nonlinear CV space for further improvements of binless WE methods.
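The weight bookkeeping behind a binless, target-directed weighted ensemble step can be sketched as follows. This is a generic split/merge illustration, not the WeTICA walker selection criterion: the rule used here (split the walker closest to the target along a one-dimensional CV, merge the two farthest) and the names `resample`, `walkers`, and `target` are assumptions made for this example.

```python
import random


def resample(walkers, target, rng=random):
    """One split/merge step on a list of (cv_value, weight) walkers.

    Hypothetical selection rule: the walker closest to `target` is split into
    two copies with half the weight each, and the two walkers farthest from
    `target` are merged into one. Walker count and total weight are conserved.
    Needs at least three walkers.
    """
    if len(walkers) < 3:
        raise ValueError("need at least three walkers to split one and merge two")
    ranked = sorted(walkers, key=lambda w: abs(w[0] - target))
    closest, middle = ranked[0], ranked[1:-2]
    far_a, far_b = ranked[-2], ranked[-1]

    # Split: two children share the parent's weight equally.
    cv, weight = closest
    children = [(cv, weight / 2.0), (cv, weight / 2.0)]

    # Merge: one parent survives with probability proportional to its weight
    # and inherits the combined weight, which keeps the ensemble unbiased.
    combined = far_a[1] + far_b[1]
    survivor = far_a if rng.random() < far_a[1] / combined else far_b
    merged = (survivor[0], combined)

    return children + middle + [merged]


walkers = [(0.1, 0.25), (0.5, 0.25), (0.9, 0.25), (1.4, 0.25)]
new_walkers = resample(walkers, target=0.0, rng=random.Random(0))
```

Because splitting and merging only redistribute weight, the total probability stays at 1, which is what allows rate constants to be estimated without adding a bias potential to the energy landscape.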
Affiliation(s)
- Sudipta Mitra
- Department of Chemical and Biological Sciences, S. N. Bose National Centre for Basic Sciences, Block-JD, Sector-III, Salt Lake, Kolkata 700106, India
- Ranjit Biswas
- Department of Chemical and Biological Sciences, S. N. Bose National Centre for Basic Sciences, Block-JD, Sector-III, Salt Lake, Kolkata 700106, India
- Suman Chakrabarty
- Department of Chemical and Biological Sciences, S. N. Bose National Centre for Basic Sciences, Block-JD, Sector-III, Salt Lake, Kolkata 700106, India
7
Vögele M, Thomson NJ, Truong ST, McAvity J, Zachariae U, Dror RO. Systematic analysis of biomolecular conformational ensembles with PENSA. J Chem Phys 2025; 162:014101. [PMID: 39745157 PMCID: PMC11698571 DOI: 10.1063/5.0235544] [Received: 08/28/2024] [Accepted: 12/12/2024] [Indexed: 01/06/2025]
Abstract
Atomic-level simulations are widely used to study biomolecules and their dynamics. A common goal in such studies is to compare simulations of a molecular system under several conditions (for example, with various mutations or bound ligands) in order to identify differences between the molecular conformations adopted under these conditions. However, the large amount of data produced by simulations of ever larger and more complex systems often renders it difficult to identify the structural features that are relevant to a particular biochemical phenomenon. We present a flexible software package named Python ENSemble Analysis (PENSA) that enables a comprehensive and thorough investigation into biomolecular conformational ensembles. It provides featurization and feature transformations that allow for a complete representation of biomolecules such as proteins and nucleic acids, including water and ion binding sites, thus avoiding the bias that would come with manual feature selection. PENSA implements methods to systematically compare the distributions of molecular features across ensembles to find the significant differences between them and identify regions of interest. It also includes a novel approach to quantify the state-specific information between two regions of a biomolecule, which allows, for example, tracing information flow to identify allosteric pathways. PENSA also comes with convenient tools for loading data and visualizing results, making them quick to process and easy to interpret. PENSA is an open-source Python library maintained at https://github.com/drorlab/pensa along with an example workflow and a tutorial. We demonstrate its usefulness in real-world examples by showing how it helps us determine molecular mechanisms efficiently.
Affiliation(s)
- Martin Vögele
- Author to whom correspondence should be addressed
- Neil J. Thomson
- Department of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, United Kingdom
- Sang T. Truong
- Department of Computer Science, Stanford University, Stanford, California 94305, USA
- Jasper McAvity
- Department of Computer Science, Stanford University, Stanford, California 94305, USA
- Ron O. Dror
- Author to whom correspondence should be addressed
8
Shao D, Zhang Z, Liu X, Fu H, Shao X, Cai W. Screening Fast-Mode Motion in Collective Variable Discovery for Biochemical Processes. J Chem Theory Comput 2024; 20:10393-10405. [PMID: 39601677 DOI: 10.1021/acs.jctc.4c01282] [Indexed: 11/29/2024]
Abstract
Collective variables (CVs) describing slow degrees of freedom (DOFs) in biomolecular assemblies are crucial for analyzing molecular dynamics trajectories, creating Markov models and performing CV-based enhanced sampling simulations. While time-lagged independent component analysis (tICA) and its nonlinear successor, time-lagged autoencoder (tAE), are widely used, they often struggle to capture protein dynamics due to interference from random fluctuations along fast DOFs. To address this issue, we propose a novel approach integrating discrete wavelet transform (DWT) with dimensionality reduction techniques. DWT effectively separates fast and slow motion in protein simulation trajectories by decoupling high- and low-frequency signals. Based on the trajectory after filtering out high-frequency signals, which corresponds to fast motion, tICA and tAE can accurately extract CVs representing slow DOFs, providing reliable insights into protein dynamics. Our method demonstrates superior performance in identifying CVs that distinguish metastable states compared to standard tICA and tAE, as validated through analyses of conformational changes of alanine dipeptide and tripeptide and folding of CLN025. Moreover, we show that DWT can be used to improve the performance of a variety of CV-finding algorithms by combining it with Deep-tICA, a cutting-edge CV-finding algorithm, to extract CVs for enhanced-sampling calculations. Given its negligible computational cost and remarkable ability to screen fast motion, we propose DWT as a "free lunch" for CV extraction, applicable to a wide range of CV-finding algorithms.
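The filtering step described in this abstract, removing high-frequency components with a discrete wavelet transform before CV extraction, can be sketched in plain Python. The Haar wavelet, the hard zeroing of all detail coefficients, and the names `haar_dwt`, `haar_idwt`, and `lowpass` are assumptions chosen to keep the example self-contained; the abstract does not state which wavelet family or decomposition level the authors use, and the subsequent tICA or tAE step is not shown.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the signal from one level of coefficients."""
    s = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / s)
        signal.append((a - d) / s)
    return signal


def lowpass(signal, levels):
    """Drop the `levels` finest detail bands and reconstruct the slow component."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    detail_sizes = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        detail_sizes.append(len(detail))
    # Reconstruct with every detail band set to zero (fast motion removed).
    for size in reversed(detail_sizes):
        approx = haar_idwt(approx, [0.0] * size)
    return approx


# A slow staircase with a fast alternating fluctuation superimposed on it.
slow = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
noisy = [s + (0.5 if i % 2 == 0 else -0.5) for i, s in enumerate(slow)]
filtered = lowpass(noisy, levels=1)  # approximately recovers `slow`
```

In practice each input feature of the trajectory (for example a dihedral angle time series) would be filtered this way, and the filtered features would then be passed to the dimensionality reduction method of choice.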
Affiliation(s)
- Donghui Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Zhiteng Zhang
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xuyang Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Haohao Fu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
9
Zhang C, Osato M, Mobley DL. Kinetics-Based State Definitions for Discrete Binding Conformations of T4 L99A in MD via Markov State Modeling. J Chem Inf Model 2024; 64:8870-8879. [PMID: 39589162 PMCID: PMC11812578 DOI: 10.1021/acs.jcim.4c01364] [Indexed: 11/27/2024]
Abstract
As a model system, the binding pocket of the L99A mutant of T4 lysozyme has been the subject of numerous computational free energy studies. However, previous studies have failed to fully sample and account for the observed changes in the binding pocket of T4 L99A upon binding of a congeneric ligand series, limiting the accuracy of results. In this work, we resolve the closed, intermediate, and open states for T4 L99A previously reported in experiment in MD and establish definitions for these states based on the dynamics of the system. From this analysis, we arrive at two primary conclusions. First, assignment of simulation trajectories into discrete states should not be done simply based on RMSD to crystal structures as this can result in misassignment of states. Second, the different metastable conformations studied here need to be carefully treated, as we estimate the time scales for conformational interconversion to be on the order of 10^2 to 10^3 ns, far longer than time scales for typical binding calculations. We conclude with a discussion on the need to develop enhanced sampling methods to generally account for significant changes in protein conformation due to relatively small ligand perturbations.
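Interconversion time scales of this kind are commonly read off a Markov state model through the standard implied-timescale relation t = -τ / ln λ, where τ is the lag time and λ a non-unit eigenvalue of the transition matrix. The sketch below applies it to a two-state model, whose second eigenvalue is 1 - p_ab - p_ba. The two-state simplification, the function name, and the transition probabilities are illustrative assumptions, not values from the paper.

```python
import math


def two_state_implied_timescale(p_ab, p_ba, lag_time_ns):
    """Implied timescale (ns) of a two-state Markov model.

    The row-stochastic transition matrix [[1 - p_ab, p_ab], [p_ba, 1 - p_ba]]
    has eigenvalues 1 and 1 - p_ab - p_ba; the second one sets the relaxation
    time between the two states.
    """
    second_eigenvalue = 1.0 - p_ab - p_ba
    if not 0.0 < second_eigenvalue < 1.0:
        raise ValueError("expected 0 < p_ab + p_ba < 1 for a slowly relaxing model")
    return -lag_time_ns / math.log(second_eigenvalue)


# Illustrative numbers only: rare transitions observed at a 1 ns lag time.
timescale_ns = two_state_implied_timescale(p_ab=0.004, p_ba=0.006, lag_time_ns=1.0)
```

With these made-up probabilities the timescale comes out close to 100 ns, which shows how per-lag transition probabilities below one percent translate into interconversion times far longer than a typical binding free energy calculation.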
Affiliation(s)
- Chris Zhang
- Department of Chemistry, University of California, Irvine, 1120 Natural Sciences II, Irvine, California 92697, United States
- Meghan Osato
- Department of Pharmaceutical Sciences, University of California, Irvine, 856 Health Sciences Road, Irvine, California 92697, United States
- David L. Mobley
- Department of Chemistry, University of California, Irvine, 1120 Natural Sciences II, Irvine, California 92697, United States
- Department of Pharmaceutical Sciences, University of California, Irvine, 856 Health Sciences Road, Irvine, California 92697, United States
10
Harding-Larsen D, Funk J, Madsen NG, Gharabli H, Acevedo-Rocha CG, Mazurenko S, Welner DH. Protein representations: Encoding biological information for machine learning in biocatalysis. Biotechnol Adv 2024; 77:108459. [PMID: 39366493 DOI: 10.1016/j.biotechadv.2024.108459] [Received: 04/18/2024] [Revised: 09/19/2024] [Accepted: 09/29/2024] [Indexed: 10/06/2024]
Abstract
Enzymes offer a more environmentally friendly and low-impact solution to conventional chemistry, but they often require additional engineering for their application in industrial settings, an endeavour that is challenging and laborious. To address this issue, the power of machine learning can be harnessed to produce predictive models that enable the in silico study and engineering of improved enzymatic properties. Such machine learning models, however, require the conversion of the complex biological information to a numerical input, also called protein representations. These inputs demand special attention to ensure the training of accurate and precise models, and, in this review, we therefore examine the critical step of encoding protein information to numeric representations for use in machine learning. We selected the most important approaches for encoding the three distinct biological protein representations - primary sequence, 3D structure, and dynamics - to explore their requirements for employment and inductive biases. Combined representations of proteins and substrates are also introduced as emergent tools in biocatalysis. We propose the division of fixed representations, a collection of rule-based encoding strategies, and learned representations extracted from the latent spaces of large neural networks. To select the most suitable protein representation, we propose two main factors to consider. The first one is the model setup, which is influenced by the size of the training dataset and the choice of architecture. The second factor is the model objectives such as consideration about the assayed property, the difference between wild-type models and mutant predictors, and requirements for explainability. This review is aimed at serving as a source of information and guidance for properly representing enzymes in future machine learning models for biocatalysis.
Affiliation(s)
- David Harding-Larsen
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Jonathan Funk
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Niklas Gesmar Madsen
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Hani Gharabli
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Carlos G Acevedo-Rocha
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
| | - Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| | - Ditte Hededam Welner
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark.
| |
|
11
|
Dey R, Taraphder S. Molecular Modeling of Glycosylated Catalytic Domain of Human Carbonic Anhydrase IX. J Phys Chem B 2024; 128:11054-11068. [PMID: 39487784 DOI: 10.1021/acs.jpcb.4c03514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2024]
Abstract
Glycans exhibit significant structural diversity due to the flexibility of glycosidic bonds linking their constituent monosaccharides and the formation of numerous hydrogen bonds. The present work searches a simulated ensemble of glycan chain conformations attached to the catalytic domain of N-glycosylated human carbonic anhydrase IX (HCA IX-c) to identify conformations pointing away from or back-folded toward the protein surface, guided by different amino acid residues. A series of classical molecular dynamics (MD) simulation studies for a total of 30 μs followed by accelerated MD simulations for a total of 2 μs have been performed using two different force fields to capture varying degrees of fluctuations of both the glycan chain and HCA IX. From the underlying free energy profile and kinetics derived using a hidden Markov state model, several stable glycan orientations are identified that extend away from the protein surface and interconvert with rate constants of the order of 10⁷-10¹⁰ s⁻¹. Most importantly, we have identified a rare glycan conformation which reaches close to a catalytically important amino acid residue, Glu-106. We further list the protein residues coupled to this less frequent event of the glycan chain back-folding toward the surface of the protein.
Affiliation(s)
- Ritwika Dey
- Department of Chemistry, Indian Institute of Technology, Kharagpur 721302, India
| | - Srabani Taraphder
- Department of Chemistry, Indian Institute of Technology, Kharagpur 721302, India
| |
|
12
|
Das M, Venkatramani R. A Mode Evolution Metric to Extract Reaction Coordinates for Biomolecular Conformational Transitions. J Chem Theory Comput 2024; 20:8422-8436. [PMID: 39287954 DOI: 10.1021/acs.jctc.4c00744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/19/2024]
Abstract
The complex, multidimensional energy landscape of biomolecules makes the extraction of suitable, nonintuitive collective variables (CVs) that describe their conformational transitions challenging. At present, dimensionality reduction approaches and machine learning (ML) schemes are employed to obtain CVs from molecular dynamics (MD)/Monte Carlo (MC) trajectories or structural databanks for biomolecules. However, minimum sampling conditions to generate reliable CVs that accurately describe the underlying energy landscape remain unclear. Here, we address this issue by developing a Mode evolution Metric (MeM) to extract CVs that can pinpoint new states and describe local transitions in the vicinity of a reference minimum from nonequilibrated MD/MC trajectories. We present a general mathematical formulation of MeM for both statistical dimensionality reduction and machine learning approaches. Application of MeM to MC trajectories of model potential energy landscapes and MD trajectories of solvated alanine dipeptide reveals that the principal components which locate new states in the vicinity of a reference minimum emerge well before the trajectories locally equilibrate between the associated states. Finally, we demonstrate a possible application of MeM in designing efficient biased sampling schemes to construct accurate energy landscape slices that link transitions between states. MeM can help speed up the search for new minima around a biomolecular conformational state and enable the accurate estimation of thermodynamics for states lying on the energy landscape and the description of associated transitions.
Affiliation(s)
- Mitradip Das
- Department of Chemical Sciences, Tata Institute of Fundamental Research, Colaba, Mumbai 400005, India
| | - Ravindra Venkatramani
- Department of Chemical Sciences, Tata Institute of Fundamental Research, Colaba, Mumbai 400005, India
| |
|
13
|
Oh M, Rosa M, Xie H, Khelashvili G. Automated collective variable discovery for MFSD2A transporter from molecular dynamics simulations. Biophys J 2024; 123:2934-2955. [PMID: 38932456 PMCID: PMC11393714 DOI: 10.1016/j.bpj.2024.06.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 06/03/2024] [Accepted: 06/24/2024] [Indexed: 06/28/2024] Open
Abstract
Biomolecules often exhibit complex free energy landscapes in which long-lived metastable states are separated by large energy barriers. Overcoming these barriers to robustly sample transitions between the metastable states with classical molecular dynamics (MD) simulations presents a challenge. To circumvent this issue, collective variable (CV)-based enhanced sampling MD approaches are often employed. Traditional CV selection relies on intuition and prior knowledge of the system. This approach introduces bias, which can lead to incomplete mechanistic insights. Thus, automated CV detection is desired to gain a deeper understanding of the system/process. Analysis of MD data with various machine-learning algorithms, such as principal component analysis (PCA), support vector machine, and linear discriminant analysis (LDA) based approaches have been implemented for automated CV detection. However, their performance has not been systematically evaluated on structurally and mechanistically complex biological systems. Here, we applied these methods to MD simulations of the MFSD2A (Major Facilitator Superfamily Domain 2A) lysolipid transporter in multiple functionally relevant metastable states with the goal of identifying optimal CVs that would structurally discriminate these states. Specific emphasis was on the automated detection and interpretive power of LDA-based CVs. We found that LDA methods, which included a novel gradient descent-based multiclass harmonic variant, termed GDHLDA, we developed here, outperform PCA in class separation, exhibiting remarkable consistency in extracting CVs critical for distinguishing metastable states. Furthermore, the identified CVs included features previously associated with conformational transitions in MFSD2A. Specifically, conformational shifts in transmembrane helix 7 and in residue Y294 on this helix emerged as critical features discriminating the metastable states in MFSD2A. 
This highlights the effectiveness of LDA-based approaches in automatically extracting from MD trajectories CVs of functional relevance that can be used to drive biased MD simulations to efficiently sample conformational transitions in the molecular system.
Affiliation(s)
- Myongin Oh
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York
| | - Margarida Rosa
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York
| | - Hengyi Xie
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York
| | - George Khelashvili
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York.
| |
|
14
|
Wang B, Wang J, Yang W, Zhao L, Wei B, Chen J. Unveiling Allosteric Regulation and Binding Mechanism of BRD9 through Molecular Dynamics Simulations and Markov Modeling. Molecules 2024; 29:3496. [PMID: 39124901 PMCID: PMC11314499 DOI: 10.3390/molecules29153496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Revised: 07/15/2024] [Accepted: 07/24/2024] [Indexed: 08/12/2024] Open
Abstract
Bromodomain-containing protein 9 (BRD9) is a key player in chromatin remodeling and gene expression regulation, and it is closely associated with the development of various diseases, including cancers. Recent studies have indicated that inhibition of BRD9 may have potential value in the treatment of certain cancers. Molecular dynamics (MD) simulations, Markov modeling and principal component analysis were performed to investigate the binding mechanisms of allosteric inhibitor POJ and orthosteric inhibitor 82I to BRD9 and its allosteric regulation. Our results indicate that binding of these two types of inhibitors induces significant structural changes in the protein, particularly in the formation and dissolution of α-helical regions. Markov flux analysis reveals notable changes occurring in the α-helicity near the ZA loop during the inhibitor binding process. Calculations of binding free energies reveal that the cooperation of orthosteric and allosteric inhibitors affects binding ability of inhibitors to BRD9 and modifies the active sites of orthosteric and allosteric positions. This research is expected to provide new insights into the inhibitory mechanism of 82I and POJ on BRD9 and offers a theoretical foundation for development of cancer treatment strategies targeting BRD9.
Affiliation(s)
- Bin Wang
- Center for Medical Artificial Intelligence, Shandong University of Traditional Chinese Medicine, Qingdao 266112, China
| | - Jian Wang
- School of Science, Shandong Jiaotong University, Jinan 250357, China
| | - Wanchun Yang
- School of Science, Shandong Jiaotong University, Jinan 250357, China
| | - Lu Zhao
- School of Science, Shandong Jiaotong University, Jinan 250357, China
| | - Benzheng Wei
- Center for Medical Artificial Intelligence, Shandong University of Traditional Chinese Medicine, Qingdao 266112, China
| | - Jianzhong Chen
- School of Science, Shandong Jiaotong University, Jinan 250357, China
| |
|
15
|
Bakker M, Svensson O, Sørensen HV, Skepö M. Exploring the Functional Landscape of the p53 Regulatory Domain: The Stabilizing Role of Post-Translational Modifications. J Chem Theory Comput 2024; 20:5842-5853. [PMID: 38973087 PMCID: PMC11270737 DOI: 10.1021/acs.jctc.4c00570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2024] [Revised: 06/25/2024] [Accepted: 06/25/2024] [Indexed: 07/09/2024]
Abstract
This study focuses on the intrinsically disordered regulatory domain of p53 and the impact of post-translational modifications. Through fully atomistic explicit water molecular dynamics simulations, we show the wealth of information and detailed understanding that can be obtained by varying the number of phosphorylated amino acids and implementing a restriction in the conformational entropy of the N-termini of that intrinsically disordered region. The take-home message for the reader is a detailed understanding of the impact of phosphorylation with respect to (1) the conformational dynamics and flexibility, (2) structural effects, (3) protein interactivity, and (4) energy landscapes and conformational ensembles. Although our model system is the regulatory domain of the tumor suppressor protein p53, this study contributes to understanding the general effects of intrinsically disordered phosphorylated proteins and the impact of phosphorylated groups, more specifically, how minor changes in the primary sequence can affect the properties mentioned above.
Affiliation(s)
- Michael J. Bakker
- Faculty of Pharmacy in Hradec Králové, Charles University, Akademika Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Division of Computational Chemistry, Department of Chemistry, Lund University, P.O. Box 124, 221 00 Lund, Sweden
| | - Oskar Svensson
- Division of Computational Chemistry, Department of Chemistry, Lund University, P.O. Box 124, 221 00 Lund, Sweden
- NanoLund, Lund University, Box 118, 221 00 Lund, Sweden
| | - Henrik V. Sørensen
- Division of Computational Chemistry, Department of Chemistry, Lund University, P.O. Box 124, 221 00 Lund, Sweden
- MAX IV Laboratory, Fotongatan 2, 224 84 Lund, Sweden
| | - Marie Skepö
- Division of Computational Chemistry, Department of Chemistry, Lund University, P.O. Box 124, 221 00 Lund, Sweden
- NanoLund, Lund University, Box 118, 221 00 Lund, Sweden
| |
|
16
|
Muduli S, Karmakar S, Mishra S. Conformational Dynamics in Corynebacterium glutamicum Diaminopimelate Epimerase: Insights from Ligand Parameterization, Atomistic Simulation, and Markov State Modeling. J Chem Inf Model 2024; 64:4250-4262. [PMID: 38701175 DOI: 10.1021/acs.jcim.4c00480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
The microbial enzyme diaminopimelate epimerase (DapF), a vital enzyme in the lysine biosynthetic pathway, catalyzes the conversion of L,L-diaminopimelate (L,L-DAP) to D,L-diaminopimelate (D,L-DAP) using a catalytic cysteine dyad with one cysteine in the thiol state and the other in the thiolate state. Under oxidizing conditions, the catalytic cysteines of apo DapF form a disulfide bond that alters the structure and function of DapF. Given its potential as a target for antimicrobial resistance treatments, understanding DapF's functional dynamics is imperative. In the present work, we employ microsecond-scale all-atom molecular dynamics simulations of product-bound DapF and apo-DapF under oxidized and reduced conditions. We employ a polarized charge model for the ligand and the active site residues, which was necessary to preserve the electrostatic environment in the active site and retain the ligand in the active site. The product-bound DapF and apo-DapF in oxidized and reduced conditions exhibit a closed, semi-open, and open conformation, respectively, as identified using the internal coordinates of the dimeric enzyme and the principal component analysis. The conformational switch is guided by the dynamic catalytic (DC) loop, loop II, and loop III movements in the active site. The time scale of the closed-to-open conformational transition is estimated to be 0.8 μs through Markov state modeling (MSM) and transition path theory (TPT). The present study explains the role of various active site residues and loops in ligand binding and protein dynamics in the DapF enzyme under different redox conditions. Such information will be helpful in future inhibitor design studies targeting the DapF enzyme.
Affiliation(s)
- Sunita Muduli
- Department of Chemistry, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
| | - Soumyajit Karmakar
- Department of Chemistry, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
| | - Sabyashachi Mishra
- Department of Chemistry, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
| |
|
17
|
Grassmann G, Miotto M, Desantis F, Di Rienzo L, Tartaglia GG, Pastore A, Ruocco G, Monti M, Milanetti E. Computational Approaches to Predict Protein-Protein Interactions in Crowded Cellular Environments. Chem Rev 2024; 124:3932-3977. [PMID: 38535831 PMCID: PMC11009965 DOI: 10.1021/acs.chemrev.3c00550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 02/20/2024] [Accepted: 02/21/2024] [Indexed: 04/11/2024]
Abstract
Investigating protein-protein interactions is crucial for understanding cellular biological processes because proteins often function within molecular complexes rather than in isolation. While experimental and computational methods have provided valuable insights into these interactions, they often overlook a critical factor: the crowded cellular environment. This environment significantly impacts protein behavior, including structural stability, diffusion, and ultimately the nature of binding. In this review, we discuss theoretical and computational approaches that allow the modeling of biological systems to guide and complement experiments and can thus significantly advance the investigation, and possibly the predictions, of protein-protein interactions in the crowded environment of cell cytoplasm. We explore topics such as statistical mechanics for lattice simulations, hydrodynamic interactions, diffusion processes in high-viscosity environments, and several methods based on molecular dynamics simulations. By synergistically leveraging methods from biophysics and computational biology, we review the state of the art of computational methods to study the impact of molecular crowding on protein-protein interactions and discuss its potential revolutionizing effects on the characterization of the human interactome.
Affiliation(s)
- Greta Grassmann
- Department of Biochemical Sciences “Alessandro Rossi Fanelli”, Sapienza University of Rome, Rome 00185, Italy
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
| | - Mattia Miotto
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
| | - Fausta Desantis
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- The Open University Affiliated Research Centre at Istituto Italiano di Tecnologia, Genoa 16163, Italy
| | - Lorenzo Di Rienzo
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
| | - Gian Gaetano Tartaglia
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Genoa 16163, Italy
- Center for Human Technologies, Genoa 16152, Italy
| | - Annalisa Pastore
- Experiment Division, European Synchrotron Radiation Facility, Grenoble 38043, France
| | - Giancarlo Ruocco
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Department of Physics, Sapienza University, Rome 00185, Italy
| | - Michele Monti
- RNA System Biology Lab, Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Genoa 16163, Italy
| | - Edoardo Milanetti
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Department of Physics, Sapienza University, Rome 00185, Italy
| |
|
18
|
Banerjee P, Monje-Galvan V, Voth GA. Cooperative Membrane Binding of HIV-1 Matrix Proteins. J Phys Chem B 2024; 128:2595-2606. [PMID: 38477117 PMCID: PMC10962350 DOI: 10.1021/acs.jpcb.3c06222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 02/24/2024] [Accepted: 02/27/2024] [Indexed: 03/14/2024]
Abstract
The HIV-1 assembly process begins with a newly synthesized Gag polyprotein being targeted to the inner leaflet of the plasma membrane of the infected cells to form immature viral particles. Gag-membrane interactions are mediated through the myristoylated (Myr) N-terminal matrix (MA) domain of Gag, which eventually multimerizes on the membrane to form trimers and higher order oligomers. The study of the structure and dynamics of peripheral membrane proteins like MA has been challenging for both experimental and computational studies due to the complex transient dynamics of protein-membrane interactions. Although the roles of anionic phospholipids (PIP₂, PS) and the Myr group in the membrane targeting and stable membrane binding of MA are now well-established, the cooperative interactions between the MA monomers and MA-membrane remain elusive in the context of viral assembly and release. Our present study focuses on the membrane binding dynamics of a higher order oligomeric structure of MA protein (a dimer of trimers), which has not been explored before. Applying time-lagged independent component analysis (tICA) to our microsecond-long trajectories, we investigate conformational changes of the matrix protein induced by membrane binding. Interestingly, the Myr switch of an MA monomer correlates with the conformational switch of adjacent monomers in the same trimer. Together, our findings suggest complex protein dynamics during the formation of the immature HIV-1 lattice; while MA trimerization facilitates Myr insertion, MA trimer-trimer interactions in the immature lattice can hinder the same.
Affiliation(s)
- Puja Banerjee
- Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
| | | | - Gregory A. Voth
- Department of Chemistry, Chicago Center for Theoretical Chemistry, Institute for Biophysical Dynamics, and James Franck Institute, The University of Chicago, Chicago, Illinois 60637, United States
| |
|
19
|
Rydzewski J, Gökdemir T. Learning Markovian dynamics with spectral maps. J Chem Phys 2024; 160:091102. [PMID: 38436438 DOI: 10.1063/5.0189241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 02/05/2024] [Indexed: 03/05/2024] Open
Abstract
The long-time behavior of many complex molecular systems can often be described by Markovian dynamics in a slow subspace spanned by a few reaction coordinates referred to as collective variables (CVs). However, determining CVs poses a fundamental challenge in chemical physics. Depending on intuition or trial and error to construct CVs can lead to non-Markovian dynamics with long memory effects, hindering analysis. To address this problem, we continue to develop a recently introduced deep-learning technique called spectral map [J. Rydzewski, J. Phys. Chem. Lett. 14, 5216-5220 (2023)]. Spectral map learns slow CVs by maximizing a spectral gap of a Markov transition matrix describing anisotropic diffusion. Here, to represent heterogeneous and multiscale free-energy landscapes with spectral map, we implement an adaptive algorithm to estimate transition probabilities. Through a Markov state model analysis, we validate that spectral map learns slow CVs related to the dominant relaxation timescales and discerns between long-lived metastable states.
Affiliation(s)
- Jakub Rydzewski
- Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Toruń, Poland
| | - Tuğçe Gökdemir
- Institute of Physics, Faculty of Physics, Astronomy and Informatics, Nicolaus Copernicus University, Grudziadzka 5, 87-100 Toruń, Poland
| |
|
20
|
Fu H, Bian H, Shao X, Cai W. Collective Variable-Based Enhanced Sampling: From Human Learning to Machine Learning. J Phys Chem Lett 2024; 15:1774-1783. [PMID: 38329095 DOI: 10.1021/acs.jpclett.3c03542] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 02/09/2024]
Abstract
Enhanced-sampling algorithms relying on collective variables (CVs) are extensively employed to study complex (bio)chemical processes that are not amenable to brute-force molecular simulations. The selection of appropriate CVs characterizing the slow movement modes is of paramount importance for reliable and efficient enhanced-sampling simulations. In this Perspective, we first review the application and limitations of CVs obtained from chemical and geometrical intuition. We also introduce path-sampling algorithms, which can identify path-like CVs in a high-dimensional free-energy space. Machine-learning algorithms offer a viable approach to finding suitable CVs by analyzing trajectories from preliminary simulations. We discuss both the performance of machine-learning-derived CVs in enhanced-sampling simulations of experimental models and the challenges involved in applying these CVs to realistic, complex molecular assemblies. Moreover, we provide a prospective view of the potential advancements of machine-learning algorithms for the development of CVs in the field of enhanced-sampling simulations.
Affiliation(s)
- Haohao Fu
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Hengwei Bian
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Xueguang Shao
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Wensheng Cai
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| |
|
21
|
Liu X, Xing J, Fu H, Shao X, Cai W. Analyzing Molecular Dynamics Trajectories Thermodynamically through Artificial Intelligence. J Chem Theory Comput 2024; 20:665-676. [PMID: 38193858 DOI: 10.1021/acs.jctc.3c00975] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/10/2024]
Abstract
Molecular dynamics simulations produce trajectories that correspond to vast amounts of structure when exploring biochemical processes. Extracting valuable information, e.g., important intermediate states and collective variables (CVs) that describe the major movement modes, from molecular trajectories to understand the underlying mechanisms of biological processes presents a significant challenge. To achieve this goal, we introduce a deep learning approach, coined DIKI (deep identification of key intermediates), to determine low-dimensional CVs distinguishing key intermediate conformations without a-priori assumptions. DIKI dynamically plans the distribution of latent space and groups together similar conformations within the same cluster. Moreover, by incorporating two user-defined parameters, namely, coarse focus knob and fine focus knob, to help identify conformations with low free energy and differentiate the subtle distinctions among these conformations, resolution-tunable clustering was achieved. Furthermore, the integration of DIKI with a path-finding algorithm contributes to the identification of crucial intermediates along the lowest free-energy pathway. We postulate that DIKI is a robust and flexible tool that can find widespread applications in the analysis of complex biochemical processes.
Affiliation(s)
- Xuyang Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Jingya Xing
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Haohao Fu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| |
|
22
|
Oh M, da Hora GCA, Swanson JMJ. tICA-Metadynamics for Identifying Slow Dynamics in Membrane Permeation. J Chem Theory Comput 2023; 19:8886-8900. [PMID: 37943658 PMCID: PMC11282584 DOI: 10.1021/acs.jctc.3c00526] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/12/2023]
Abstract
Molecular simulations are commonly used to understand the mechanism of membrane permeation of small molecules, particularly for biomedical and pharmaceutical applications. However, despite significant advances in computing power and algorithms, calculating an accurate permeation free energy profile remains elusive for many drug molecules because it can require identifying the rate-limiting degrees of freedom (i.e., appropriate reaction coordinates). To resolve this issue, researchers have developed machine learning approaches to identify slow system dynamics. In this work, we apply time-lagged independent component analysis (tICA), an unsupervised dimensionality reduction algorithm, to molecular dynamics simulations with well-tempered metadynamics to find the slowest collective degrees of freedom of the permeation process of trimethoprim through a multicomponent membrane. We show that tICA-metadynamics yields translational and orientational collective variables (CVs) that increase convergence efficiency ∼1.5 times. However, crossing the periodic boundary is shown to introduce artifacts in the translational CV that can be corrected by taking absolute values of molecular features. Additionally, we find that the convergence of the tICA CVs is reached with approximately five membrane crossings and that data reweighting is required to avoid deviations in the translational CV.
Affiliation(s)
- Myongin Oh
- Department of Chemistry, University of Utah, 315 South 1400 East, Rm 2020, Salt Lake City, Utah 84112, United States
| | - Gabriel C A da Hora
- Department of Chemistry, University of Utah, 315 South 1400 East, Rm 2020, Salt Lake City, Utah 84112, United States
| | - Jessica M J Swanson
- Department of Chemistry, University of Utah, 315 South 1400 East, Rm 2020, Salt Lake City, Utah 84112, United States
| |
|
23
|
Rai D, Mondal D, Taraphder S. pH-Dependent Structure and Dynamics of the Catalytic Domains of Human Carbonic Anhydrase II and IX. J Phys Chem B 2023; 127:10279-10294. [PMID: 37983689 DOI: 10.1021/acs.jpcb.3c04721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
Extensive computer simulation studies have been carried out to probe the pH-dependent structure and dynamics of the two most efficient isoenzymes II and IX of human carbonic anhydrase (HCA) that control the pH in the human body. The equilibrium structure and hydration of their catalytic domains are found to be largely unaffected by the variation of pH in the range studied, in close agreement with the known experimental results. In contrast, a significant effect of the change in pH is observed for the first time on the local electrostatic potential of the active site walls and the dynamics of active site water molecules. We also report for the first time the free energy and kinetics of coupled fluctuations of orientation and protonation states of the well-known His-mediated proton shuttle (His-64) in both isozymes at pH 7 and 8. The transitions between different tautomers of in or out conformations of His-64 side chain range between 10⁹ and 10⁶ s⁻¹ depending on pH. Possible implications of these results on conformation-dependent pKa of His-64 side chain and its role in driving the catalysis toward hydration of CO₂ or dehydration of HCO₃⁻ with varying pH are discussed.
Affiliation(s)
- Divya Rai
- Department of Chemistry, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
| | - Dulal Mondal
- Department of Chemistry, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
| | - Srabani Taraphder
- Department of Chemistry, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
| |
|
24
|
Shinn M. Phantom oscillations in principal component analysis. Proc Natl Acad Sci U S A 2023; 120:e2311420120. [PMID: 37988465 PMCID: PMC10691246 DOI: 10.1073/pnas.2311420120] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 07/10/2023] [Accepted: 10/18/2023] [Indexed: 11/23/2023] Open
Abstract
Principal component analysis (PCA) is a dimensionality reduction method that is known for being simple and easy to interpret. Principal components are often interpreted as low-dimensional patterns in high-dimensional space. However, this simple interpretation fails for timeseries, spatial maps, and other continuous data. In these cases, nonoscillatory data may have oscillatory principal components. Here, we show that two common properties of data cause oscillatory principal components: smoothness and shifts in time or space. These two properties implicate almost all neuroscience data. We show how the oscillations produced by PCA, which we call "phantom oscillations," impact data analysis. We also show that traditional cross-validation does not detect phantom oscillations, so we suggest procedures that do. Our findings are supported by a collection of mathematical proofs. Collectively, our work demonstrates that patterns which emerge from high-dimensional data analysis may not faithfully represent the underlying data.
Affiliation(s)
- Maxwell Shinn
- University College London (UCL) Queen Square Institute of Neurology, University College London, London WC1E 6BT, United Kingdom
| |
|
25
|
Conflitti P, Raniolo S, Limongelli V. Perspectives on Ligand/Protein Binding Kinetics Simulations: Force Fields, Machine Learning, Sampling, and User-Friendliness. J Chem Theory Comput 2023; 19:6047-6061. [PMID: 37656199 PMCID: PMC10536999 DOI: 10.1021/acs.jctc.3c00641] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/14/2023] [Indexed: 09/02/2023]
Abstract
Computational techniques applied to drug discovery have gained considerable popularity for their ability to filter potentially active drugs from inactive ones, reducing the time scale and costs of preclinical investigations. The main focus of these studies has historically been the search for compounds endowed with high affinity for a specific molecular target to ensure the formation of stable and long-lasting complexes. Recent evidence has also correlated the in vivo drug efficacy with its binding kinetics, thus opening new fascinating scenarios for ligand/protein binding kinetic simulations in drug discovery. The present article examines the state of the art in the field, providing a brief summary of the most popular and advanced ligand/protein binding kinetics techniques and evaluating their current limitations and the potential solutions to reach more accurate kinetic models. Particular emphasis is put on the need for a paradigm change in the present methodologies toward ligand and protein parametrization, the force field problem, characterization of the transition states, the sampling issue, and algorithms' performance, user-friendliness, and data openness.
Affiliation(s)
- Paolo Conflitti
- Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), 6900 Lugano, Switzerland
| | - Stefano Raniolo
- Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), 6900 Lugano, Switzerland
| | - Vittorio Limongelli
- Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), 6900 Lugano, Switzerland
- Department of Pharmacy, University of Naples “Federico II”, 80131 Naples, Italy
| |
|
26
|
Sarkar D, Lee H, Vant JW, Turilli M, Vermaas JV, Jha S, Singharoy A. Adaptive Ensemble Refinement of Protein Structures in High Resolution Electron Microscopy Density Maps with Radical Augmented Molecular Dynamics Flexible Fitting. J Chem Inf Model 2023; 63:5834-5846. [PMID: 37661856 DOI: 10.1021/acs.jcim.3c00350] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 09/05/2023]
Abstract
Recent advances in cryo-electron microscopy (cryo-EM) have enabled modeling macromolecular complexes that are essential components of the cellular machinery. The density maps derived from cryo-EM experiments are often integrated with manual, knowledge-driven or artificial intelligence-driven and physics-guided computational methods to build, fit, and refine molecular structures. Going beyond a single stationary-structure determination scheme, it is becoming more common to interpret the experimental data with an ensemble of models that contributes to an average observation. Hence, there is a need to decide on the quality of an ensemble of protein structures on-the-fly while refining them against the density maps. We introduce such an adaptive decision-making scheme during the molecular dynamics flexible fitting (MDFF) of biomolecules. Using RADICAL-Cybertools, the new RADICAL augmented MDFF implementation (R-MDFF) is examined in high-performance computing environments for refinement of two prototypical protein systems, adenylate kinase and carbon monoxide dehydrogenase. For these test cases, use of multiple replicas in flexible fitting with adaptive decision making in R-MDFF improves the overall correlation to the density by 40% relative to the refinements of the brute-force MDFF. The improvements are particularly significant at high, 2-3 Å map resolutions. More importantly, the ensemble model captures key features of biologically relevant molecular dynamics that are inaccessible to a single-model interpretation. Finally, the pipeline is applicable to systems of growing sizes, which is demonstrated using ensemble refinement of capsid proteins from the chimpanzee adenovirus. The overhead for decision making remains low and robust to computing environments. The software is publicly available on GitHub and includes a short user guide to install R-MDFF on different computing environments, from local Linux-based workstations to high-performance computing environments.
Affiliation(s)
- Daipayan Sarkar
- MSU-DOE Plant Research Laboratory, East Lansing, Michigan 48824, United States
- School of Molecular Sciences, Arizona State University, Tempe, Arizona 85281, United States
| | - Hyungro Lee
- Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Electrical & Computer Engineering, Rutgers University, New Brunswick, New Jersey 08854, United States
| | - John W Vant
- School of Molecular Sciences, Arizona State University, Tempe, Arizona 85281, United States
| | - Matteo Turilli
- Electrical & Computer Engineering, Rutgers University, New Brunswick, New Jersey 08854, United States
- Computational Science Initiative, Brookhaven National Laboratory, Upton, New York 11973, United States
| | - Josh V Vermaas
- MSU-DOE Plant Research Laboratory, East Lansing, Michigan 48824, United States
| | - Shantenu Jha
- Electrical & Computer Engineering, Rutgers University, New Brunswick, New Jersey 08854, United States
- Computational Science Initiative, Brookhaven National Laboratory, Upton, New York 11973, United States
| | - Abhishek Singharoy
- School of Molecular Sciences, Arizona State University, Tempe, Arizona 85281, United States
| |
|
27
|
Wellawatte GP, Hocky GM, White AD. Neural potentials of proteins extrapolate beyond training data. J Chem Phys 2023; 159:085103. [PMID: 37642255 PMCID: PMC10474891 DOI: 10.1063/5.0147240] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/20/2023] [Accepted: 07/31/2023] [Indexed: 08/31/2023] Open
Abstract
We evaluate neural network (NN) coarse-grained (CG) force fields compared to traditional CG molecular mechanics force fields. We conclude that NN force fields are able to extrapolate and sample from unseen regions of the free energy surface when trained with limited data. Our results come from 88 NN force fields trained on different combinations of clustered free energy surfaces from four protein mapped trajectories. We used a statistical measure named total variation similarity to assess the agreement between reference free energy surfaces from mapped atomistic simulations and CG simulations from trained NN force fields. Our conclusions support the hypothesis that NN CG force fields trained with samples from one region of the proteins' free energy surface can, indeed, extrapolate to unseen regions. Additionally, the force matching error was found to only be weakly correlated with a force field's ability to reconstruct the correct free energy surface.
Affiliation(s)
- Geemi P. Wellawatte
- Department of Chemistry, University of Rochester, Rochester, New York 14627, USA
| | - Glen M. Hocky
- Department of Chemistry, Simons Center for Computational Physical Chemistry, New York University, New York, New York 10003, USA
| | - Andrew D. White
- Department of Chemical Engineering, University of Rochester, Rochester, New York 14627, USA
| |
|
28
|
Kozlowski N, Grubmüller H. Uncertainties in Markov State Models of Small Proteins. J Chem Theory Comput 2023; 19:5516-5524. [PMID: 37540193 PMCID: PMC10448719 DOI: 10.1021/acs.jctc.3c00372] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/03/2023] [Indexed: 08/05/2023]
Abstract
Markov state models are widely used to describe and analyze protein dynamics based on molecular dynamics simulations, specifically to extract functionally relevant characteristic time scales and motions. Particularly for larger biomolecules such as proteins, however, insufficient sampling is a notorious concern and often the source of large uncertainties that are difficult to quantify. Furthermore, there are several other sources of uncertainty, such as choice of the number of Markov states and lag time, choice and parameters of dimension reduction preprocessing step, and uncertainty due to the limited number of observed transitions; the latter is often estimated via a Bayesian approach. Here, we quantified and ranked all of these uncertainties for four small globular test proteins. We found that the largest uncertainty is due to insufficient sampling and initially increases with the total trajectory length T up to a critical tipping point, after which it decreases as 1 / T , thus providing guidelines for how much sampling is required for given accuracy. We also found that single long trajectories yielded better sampling accuracy than many shorter trajectories starting from the same structure. In comparison, the remaining sources of the above uncertainties are generally smaller by a factor of about 5, rendering them less of a concern but certainly not negligible. Importantly, the Bayes uncertainty, commonly used as the only uncertainty estimate, captures only a relatively small part of the true uncertainty, which is thus often drastically underestimated.
Affiliation(s)
- Nicolai Kozlowski
- Department of Theoretical and Computational Biophysics, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen 37077, Germany
| | - Helmut Grubmüller
- Department of Theoretical and Computational Biophysics, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen 37077, Germany
| |
|
29
|
Oh M, da Hora GCA, Swanson JMJ. tICA-Metadynamics for Identifying Slow Dynamics in Membrane Permeation. bioRxiv 2023:2023.08.16.553477. [PMID: 37645884 PMCID: PMC10462029 DOI: 10.1101/2023.08.16.553477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Molecular simulations are commonly used to understand the mechanism of membrane permeation of small molecules, particularly for biomedical and pharmaceutical applications. However, despite significant advances in computing power and algorithms, calculating an accurate permeation free energy profile remains elusive for many drug molecules because it can require identifying the rate-limiting degrees of freedom (i.e., appropriate reaction coordinates). To resolve this issue, researchers have developed machine learning approaches to identify slow system dynamics. In this work, we apply time-lagged independent component analysis (tICA), an unsupervised dimensionality reduction algorithm, to molecular dynamics simulations with well-tempered metadynamics to find the slowest collective degrees of freedom of the permeation process of trimethoprim through a multicomponent membrane. We show that tICA-metadynamics yields translational and orientational collective variables (CVs) that increase convergence efficiency ∼1.5 times. However, crossing the periodic boundary is shown to introduce artefacts in the translational CV that can be corrected by taking absolute values of molecular features. Additionally, we find that the convergence of the tICA CVs is reached with approximately five membrane crossings, and that data reweighting is required to avoid deviations in the translational CV.
|
30
|
Hayward S. A Retrospective on the Development of Methods for the Analysis of Protein Conformational Ensembles. Protein J 2023:10.1007/s10930-023-10113-9. [PMID: 37072659 DOI: 10.1007/s10930-023-10113-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Accepted: 04/05/2023] [Indexed: 04/20/2023]
Abstract
Analysing protein conformational ensembles whether from molecular dynamics (MD) simulation or other sources for functionally relevant conformational changes can be very challenging. In the nineteen nineties dimensional reduction methods were developed primarily for analysing MD trajectories to determine dominant motions with the aim of understanding their relationship to function. Coarse-graining methods were also developed so the conformational change between two structures could be described in terms of the relative motion of a small number of quasi-rigid regions rather than in terms of a large number of atoms. When these methods are combined, they can characterize the large-scale motions inherent in a conformational ensemble providing insight into possible functional mechanism. The dimensional reduction methods first applied to protein conformational ensembles were referred to as Quasi-Harmonic Analysis, Principal Component Analysis and Essential Dynamics Analysis. A retrospective on the origin of these methods is presented, the relationships between them explained, and more recent developments reviewed.
Affiliation(s)
- Steven Hayward
- Laboratory for Computational Biology, School of Computing Sciences, University of East Anglia, Norwich, UK.
| |
|
31
|
Rai D, Khatua S, Taraphder S. Structure and Dynamics of the Isozymes II and IX of Human Carbonic Anhydrase. ACS Omega 2022; 7:31149-31166. [PMID: 36092600 PMCID: PMC9453958 DOI: 10.1021/acsomega.2c03356] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/30/2022] [Accepted: 08/15/2022] [Indexed: 06/15/2023]
Abstract
Human carbonic anhydrases (HCAs) are responsible for the pH control and sensing in our body and constitute key components in the central pH paradigm connected to cancer therapeutics. However, little or no molecular level studies are available on the pH-dependent stability and functional dynamics of the known isozymes of HCA. The main objective of this Article is to report the first bench-marking study on the structure and dynamics of the two most efficient isozymes, HCA II and IX, at neutral pH using classical molecular dynamics (MD) and constant pH MD (CpHMD) simulations combined with umbrella sampling, transition path sampling, and Markov state models. Starting from the known crystal structures of HCA II and the monomeric catalytic domain of HCA IX (labeled as HCA IX-c), we have generated classical MD and CpHMD trajectories (of length 1 μs each). In all cases, the overall stability, RMSD, and secondary structure segments of the two isozymes are found to be quite similar. Functionally important dynamics of these two enzymes have been probed in terms of active site hydration, coordination of the Zn(II) ion to a transient excess water, and the formation of putative proton transfer paths. The most important difference between the two isozymes is observed for the side-chain fluctuations of His-64 that is expected to shuttle an excess proton out of the active site as a part of the rate-determining intramolecular proton transfer reaction. The relative stability of the stable inward and outward conformations of the His-64 side-chain and the underlying free energy surfaces are found to depend strongly on the isozyme. In each case, a lower free energy barrier is detected between predominantly inward conformations from predominantly outward ones when simulated under constant pH conditions. The kinetic rate constants of interconversion between different free energy basins are found to span 10⁷-10⁸ s⁻¹ with faster conformational transitions predicted at constant pH condition.
The estimated rate constants and free energies are expected to validate if the fluctuation of the His-64 side-chain in HCA IX may have a significance similar to that known in the multistep catalytic cycle of HCA II.
|
32
|
Bhakat S. Collective variable discovery in the age of machine learning: reality, hype and everything in between. RSC Adv 2022; 12:25010-25024. [PMID: 36199882 PMCID: PMC9437778 DOI: 10.1039/d2ra03660f] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 06/13/2022] [Accepted: 08/20/2022] [Indexed: 11/21/2022] Open
Abstract
Understanding the kinetics and thermodynamics profile of biomolecules is necessary to understand their functional roles, which has a major impact in mechanism driven drug discovery. Molecular dynamics simulation has been routinely used to understand conformational dynamics and molecular recognition in biomolecules. Statistical analysis of high-dimensional spatiotemporal data generated from molecular dynamics simulation requires identification of a few low-dimensional variables which can describe the essential dynamics of a system without significant loss of information. In physical chemistry, these low-dimensional variables are often called collective variables. Collective variables are used to generate reduced representations of free energy surfaces and calculate transition probabilities between different metastable basins. However, the choice of collective variables is not trivial for complex systems. Collective variables range from geometric criteria such as distances and dihedral angles to abstract ones such as weighted linear combinations of multiple geometric variables. The advent of machine learning algorithms led to increasing use of abstract collective variables to represent biomolecular dynamics. In this review, I will highlight several nuances of commonly used collective variables ranging from geometric to abstract ones. Further, I will put forward some cases where machine learning based collective variables were used to describe simple systems which in principle could have been described by geometric ones. Finally, I will put forward my thoughts on artificial general intelligence and how it can be used to discover and predict collective variables from spatiotemporal data generated by molecular dynamics simulations.
Affiliation(s)
- Soumendranath Bhakat
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Pennsylvania 19104-6059, USA
| |
|
33
|
Principal Component Analysis and Related Methods for Investigating the Dynamics of Biological Macromolecules. J 2022. [DOI: 10.3390/j5020021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
Principal component analysis (PCA) is used to reduce the dimensionalities of high-dimensional datasets in a variety of research areas. For example, biological macromolecules, such as proteins, exhibit many degrees of freedom, allowing them to adopt intricate structures and exhibit complex functions by undergoing large conformational changes. Therefore, molecular simulations of and experiments on proteins generate a large number of structure variations in high-dimensional space. PCA and many PCA-related methods have been developed to extract key features from such structural data, and these approaches have been widely applied for over 30 years to elucidate macromolecular dynamics. This review mainly focuses on the methodological aspects of PCA and related methods and their applications for investigating protein dynamics.
|
34
|
Djokovic N, Ruzic D, Rahnasto-Rilla M, Srdic-Rajic T, Lahtela-Kakkonen M, Nikolic K. Expanding the Accessible Chemical Space of SIRT2 Inhibitors through Exploration of Binding Pocket Dynamics. J Chem Inf Model 2022; 62:2571-2585. [PMID: 35467856 DOI: 10.1021/acs.jcim.2c00241] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/28/2022]
Abstract
Considerations of binding pocket dynamics are one of the crucial aspects of the rational design of binders. Identification of alternative conformational states or cryptic subpockets could lead to the discovery of completely novel groups of the ligands. However, experimental characterization of pocket dynamics, besides being expensive, may not be able to elucidate all of the conformational states relevant for drug discovery projects. In this study, we propose the protocol for computational simulations of sirtuin 2 (SIRT2) binding pocket dynamics and its integration into the structure-based virtual screening (SBVS) pipeline. Initially, unbiased molecular dynamics simulations of SIRT2:inhibitor complexes were performed using optimized force field parameters of SIRT2 inhibitors. Time-lagged independent component analysis (tICA) was used to design pocket-related collective variables (CVs) for enhanced sampling of SIRT2 pocket dynamics. Metadynamics simulations in the tICA eigenvector space revealed alternative conformational states of the SIRT2 binding pocket and the existence of a cryptic subpocket. Newly identified SIRT2 conformational states outperformed experimentally resolved states in retrospective SBVS validation. After performing prospective SBVS, compounds from the under-represented portions of the SIRT2 inhibitor chemical space were selected for in vitro evaluation. Two compounds, NDJ18 and NDJ85, were identified as potent and selective SIRT2 inhibitors, which validated the in silico protocol and opened up the possibility for generalization and broadening of its application. The anticancer effects of the most potent compound NDJ18 were examined on the triple-negative breast cancer cell line. Results indicated that NDJ18 represents a promising structure suitable for further evaluation.
Affiliation(s)
- Nemanja Djokovic
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Belgrade, Vojvode Stepe 450, 11221 Belgrade, Serbia
| | - Dusan Ruzic
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Belgrade, Vojvode Stepe 450, 11221 Belgrade, Serbia
| | - Minna Rahnasto-Rilla
- School of Pharmacy, University of Eastern Finland, P.O. Box 1627, 70210 Kuopio, Finland
| | - Tatjana Srdic-Rajic
- Department of Experimental Oncology, Institute for Oncology and Radiology of Serbia, Pasterova 14, 11000 Belgrade, Serbia
| | | | - Katarina Nikolic
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Belgrade, Vojvode Stepe 450, 11221 Belgrade, Serbia
| |
|
35
|
Structural Consequence of Non-Synonymous Single-Nucleotide Variants in the N-Terminal Domain of LIS1. Int J Mol Sci 2022; 23:ijms23063109. [PMID: 35328531 PMCID: PMC8955593 DOI: 10.3390/ijms23063109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2022] [Revised: 03/10/2022] [Accepted: 03/11/2022] [Indexed: 02/04/2023] Open
Abstract
Disruptive neuronal migration during early brain development causes severe brain malformation. Characterized by mislocalization of cortical neurons, this condition is a result of the loss of function of migration regulating genes. One known neuronal migration disorder is lissencephaly (LIS), which is caused by deletions or mutations of the LIS1 (PAFAH1B1) gene that has been implicated in regulating the microtubule motor protein cytoplasmic dynein. Although this class of diseases has recently received considerable attention, the roles of non-synonymous polymorphisms (nsSNPs) in LIS1 on lissencephaly progression remain elusive. Therefore, the present study employed a combined bioinformatics and molecular modeling approach to identify potentially damaging nsSNPs in the LIS1 gene and provide atomic insight into their roles in LIS1 loss of function. Using this approach, we identified three high-risk nsSNPs, including rs121434486 (F31S), rs587784254 (W55R), and rs757993270 (W55L) in the LIS1 gene, which are located on the N-terminal domain of LIS1. Molecular dynamics simulation highlighted that all variants decreased helical conformation, increased the intermonomeric distance, and thus disrupted intermonomeric contacts in the LIS1 dimer. Furthermore, the presence of variants also caused a loss of positive electrostatic potential and reduced dimer binding potential. Since self-dimerization is essential for LIS1 to recruit interacting partners, these variants are associated with the loss of LIS1 functions. As a corollary, these findings may further provide critical insights on the roles of LIS1 variants in brain malformation.
|