1
|
Li J, Knijff L, Zhang ZY, Andersson L, Zhang C. PiNN: Equivariant Neural Network Suite for Modeling Electrochemical Systems. J Chem Theory Comput 2025; 21:1382-1395. [PMID: 39883580 PMCID: PMC11823406 DOI: 10.1021/acs.jctc.4c01570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2024] [Revised: 01/07/2025] [Accepted: 01/23/2025] [Indexed: 02/01/2025]
Abstract
Electrochemical energy storage and conversion play increasingly important roles in electrification and sustainable development across the globe. A key challenge therein is to understand, control, and design electrochemical energy materials with atomistic precision. This requires inputs from molecular modeling powered by machine learning (ML) techniques. In this work, we have upgraded our pairwise interaction neural network Python package PiNN via introducing equivariant features to the PiNet2 architecture for fitting potential energy surfaces along with PiNet2-dipole for dipole and charge predictions as well as PiNet2-χ for generating atom-condensed charge response kernels. By benchmarking publicly accessible data sets of small molecules, crystalline materials, and liquid electrolytes, we found that the equivariant PiNet2 shows significant improvements over the original PiNet architecture and provides a state-of-the-art overall performance. Furthermore, leveraging on plug-ins such as PiNNAcLe for an adaptive learn-on-the-fly workflow in generating ML potentials and PiNNwall for modeling heterogeneous electrodes under external bias, we expect PiNN to serve as a versatile and high-performing ML-accelerated platform for molecular modeling of electrochemical systems.
Collapse
Affiliation(s)
- Jichen Li
- Department
of Chemistry-Ångström Laboratory, Uppsala University, Lägerhyddsvägen 1, P.O. Box 538, 75121 Uppsala, Sweden
| | - Lisanne Knijff
- Department
of Chemistry-Ångström Laboratory, Uppsala University, Lägerhyddsvägen 1, P.O. Box 538, 75121 Uppsala, Sweden
| | - Zhan-Yun Zhang
- Department
of Chemistry-Ångström Laboratory, Uppsala University, Lägerhyddsvägen 1, P.O. Box 538, 75121 Uppsala, Sweden
- Wallenberg
Initiative Materials Science for Sustainability, Uppsala University, 75121 Uppsala, Sweden
| | - Linnéa Andersson
- Department
of Chemistry-Ångström Laboratory, Uppsala University, Lägerhyddsvägen 1, P.O. Box 538, 75121 Uppsala, Sweden
| | - Chao Zhang
- Department
of Chemistry-Ångström Laboratory, Uppsala University, Lägerhyddsvägen 1, P.O. Box 538, 75121 Uppsala, Sweden
- Wallenberg
Initiative Materials Science for Sustainability, Uppsala University, 75121 Uppsala, Sweden
| |
Collapse
|
2
|
Shao D, Zhang Z, Liu X, Fu H, Shao X, Cai W. Screening Fast-Mode Motion in Collective Variable Discovery for Biochemical Processes. J Chem Theory Comput 2024; 20:10393-10405. [PMID: 39601677 DOI: 10.1021/acs.jctc.4c01282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2024]
Abstract
Collective variables (CVs) describing slow degrees of freedom (DOFs) in biomolecular assemblies are crucial for analyzing molecular dynamics trajectories, creating Markov models and performing CV-based enhanced sampling simulations. While time-lagged independent component analysis (tICA) and its nonlinear successor, time-lagged autoencoder (tAE), are widely used, they often struggle to capture protein dynamics due to interference from random fluctuations along fast DOFs. To address this issue, we propose a novel approach integrating discrete wavelet transform (DWT) with dimensionality reduction techniques. DWT effectively separates fast and slow motion in protein simulation trajectories by decoupling high- and low-frequency signals. Based on the trajectory after filtering out high-frequency signals, which corresponds to fast motion, tICA and tAE can accurately extract CVs representing slow DOFs, providing reliable insights into protein dynamics. Our method demonstrates superior performance in identifying CVs that distinguish metastable states compared to standard tICA and tAE, as validated through analyses of conformational changes of alanine dipeptide and tripeptide and folding of CLN025. Moreover, we show that DWT can be used to improve the performance of a variety of CV-finding algorithms by combining it with Deep-tICA, a cutting-edge CV-finding algorithm, to extract CVs for enhanced-sampling calculations. Given its negligible computational cost and remarkable ability to screen fast motion, we propose DWT as a "free lunch" for CV extraction, applicable to a wide range of CV-finding algorithms.
Collapse
Affiliation(s)
- Donghui Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Zhiteng Zhang
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Xuyang Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Haohao Fu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| |
Collapse
|
3
|
Fiorin G, Marinelli F, Forrest LR, Chen H, Chipot C, Kohlmeyer A, Santuz H, Hénin J. Expanded Functionality and Portability for the Colvars Library. J Phys Chem B 2024; 128:11108-11123. [PMID: 39501453 PMCID: PMC11572706 DOI: 10.1021/acs.jpcb.4c05604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2024] [Revised: 10/08/2024] [Accepted: 10/14/2024] [Indexed: 11/15/2024]
Abstract
Colvars is an open-source C++ library that provides a modular toolkit for collective-variable-based molecular simulations. It allows practitioners to easily create and implement descriptors that best fit a process of interest and to apply a wide range of biasing algorithms in collective variable space. This paper reviews several features and improvements to Colvars that were added since its original introduction. Special attention is given to contributions that significantly expanded the capabilities of this software or its distribution with major MD simulation packages. Collective variables can now be optimized either manually or by machine-learning methods, and the space of descriptors can be explored interactively using the graphical interface included in VMD. Beyond the spatial coordinates of individual molecules, Colvars can now apply biasing forces to mesoscale structures and alchemical degrees of freedom and perform simulations guided by experimental data within ensemble averages or probability distributions. It also features advanced computational schemes to boost the accuracy, robustness, and general applicability of simulation methods, including extended-system and multiple-walker adaptive biasing force, boundary conditions for metadynamics, replica exchange with biasing potentials, and adiabatic bias molecular dynamics. The library is made available directly within the main distributions of the academic software GROMACS, LAMMPS, NAMD, Tinker-HP, and VMD. The robustness of the software and the reliability of the results are ensured through the use of continuous integration with a test suite within the source repository.
Collapse
Affiliation(s)
- Giacomo Fiorin
- National
Institute of Neurological Disorders and Stroke, Bethesda, Maryland 20814, United States
- National
Heart, Lung and Blood Institute, Bethesda, Maryland 20892, United States
| | - Fabrizio Marinelli
- National
Heart, Lung and Blood Institute, Bethesda, Maryland 20892, United States
- Department
of Biophysics and Data Science Institute, Medical College of Wisconsin, Milwaukee, Wisconsin 53226-3548, United States
| | - Lucy R. Forrest
- National
Institute of Neurological Disorders and Stroke, Bethesda, Maryland 20814, United States
| | - Haochuan Chen
- Theoretical
and Computational Biophysics Group, Beckman Institute, and Department
of Physics, University of Illinois at Urbana−Champaign, Urbana, Illinois 61820, United States
| | - Christophe Chipot
- Theoretical
and Computational Biophysics Group, Beckman Institute, and Department
of Physics, University of Illinois at Urbana−Champaign, Urbana, Illinois 61820, United States
- Laboratoire
International Associé CNRS et University of Illinois at Urbana−Champaign,
UMR 7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
- Department
of Biochemistry and Molecular Biology, The
University of Chicago, 929 E. 57th Street W225, Chicago, Illinois 60637, United States
- Department
of Chemistry, The University of Hawai’i
at Manoa, 2545 McCarthy
Mall, Honolulu, Hawaii 96822, United States
| | - Axel Kohlmeyer
- Institute
for Computational Molecular Science, Temple
University, Philadelphia, Pennsylvania 19122, United States
| | - Hubert Santuz
- Laboratoire
de Biochimie Théorique UPR 9080, Université Paris Cité, CNRS, 75005 Paris, France
| | - Jérôme Hénin
- Laboratoire
de Biochimie Théorique UPR 9080, Université Paris Cité, CNRS, 75005 Paris, France
| |
Collapse
|
4
|
An H, Liu X, Cai W, Shao X. AttenGpKa: A Universal Predictor of Solvation Acidity Using Graph Neural Network and Molecular Topology. J Chem Inf Model 2024; 64:5480-5491. [PMID: 38982757 DOI: 10.1021/acs.jcim.4c00449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/11/2024]
Abstract
Rapid and accurate calculation of acid dissociation constant (pKa) is crucial for designing chemical synthesis routes, optimizing catalysts, and predicting chemical behavior. Despite recent progress in machine learning, predicting solvation acidity, especially in nonaqueous solvents, remains challenging due to limited experimental data. This challenge arises from treating experimental values in different solvents as distinct data domains and modeling them separately. In this work, we treat both the solutes and solvents equally from a perspective of molecular topology and propose a highly universal framework called AttenGpKa for predicting solvation acidity. AttenGpKa is trained using 26,522 experimental pKa values from 60 pure and mixed solvents in the iBonD database. As a result, our model can simultaneously predict the pKa values of a compound in various solvents, including pure water, pure nonaqueous, and mixed solvents. AttenGpKa achieves universality by using graph neural networks and attention mechanisms to learn complex effects within solute and solvent molecules. Furthermore, encodings of both solute and solvent molecules are adaptively fused to simulate the influence of the solvent on acid dissociation. AttenGpKa demonstrates robust generalization in extensive validations. The interpretability studies further indicate that our model has effectively learnt electronic and solvent effects. A free-to-use software is provided to facilitate the use of AttenGpKa for pKa prediction.
Collapse
Affiliation(s)
- Hongle An
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Xuyang Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| |
Collapse
|
5
|
Lelièvre T, Pigeon T, Stoltz G, Zhang W. Analyzing Multimodal Probability Measures with Autoencoders. J Phys Chem B 2024; 128:2607-2631. [PMID: 38466759 DOI: 10.1021/acs.jpcb.3c07075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/13/2024]
Abstract
Finding collective variables to describe some important coarse-grained information on physical systems, in particular metastable states, remains a key issue in molecular dynamics. Recently, machine learning techniques have been intensively used to complement and possibly bypass expert knowledge in order to construct collective variables. Our focus here is on neural network approaches based on autoencoders. We study some relevant mathematical properties of the loss function considered for training autoencoders and provide physical interpretations based on conditional variances and minimum energy paths. We also consider various extensions in order to better describe physical systems, by incorporating more information on transition states at saddle points, and/or allowing for multiple decoders in order to describe several transition paths. Our results are illustrated on toy two-dimensional systems and on alanine dipeptide.
Collapse
Affiliation(s)
- Tony Lelièvre
- CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
- MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
| | - Thomas Pigeon
- CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
- MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
- IFP Energies Nouvelles, Rond-Point de l'Echangeur de Solaize, BP 3, 69360 Solaize, France
| | - Gabriel Stoltz
- CERMICS, École des Ponts ParisTech, 6-8 Avenue Blaise Pascal, 77455 Marne-la-Vallée, France
- MATHERIALS Team-project, Inria Paris, 2 Rue Simone Iff, 75012 Paris, France
| | - Wei Zhang
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 14, 14195 Berlin, Germany
- Zuse Institute Berlin, Takustraße 7, 14195 Berlin, Germany
| |
Collapse
|
6
|
Gong S, Zheng Z. A slow feature analysis approach for the optimization of collective variables. J Chem Phys 2024; 160:094104. [PMID: 38426510 DOI: 10.1063/5.0191014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2023] [Accepted: 02/13/2024] [Indexed: 03/02/2024] Open
Abstract
Molecular dynamics simulations have become increasingly important in understanding the microscopic mechanisms of various molecular systems. However, the high energy barriers in complicated molecules often make it difficult to observe events of interest within a reasonable timescale. To address this issue, researchers have developed a variety of enhanced sampling methods to explore configuration space by adding bias potentials along the slowly changing collective variables (CVs). In this study, we have developed a new tool that combines slow feature analysis and biasing-enhanced sampling methods to identify effective CVs and enhance the sampling efficiency of configuration space. We have demonstrated the effectiveness of this tool through three general examples.
Collapse
Affiliation(s)
- Shuai Gong
- School of Chemistry, Chemical Engineering and Life Science, Wuhan University of Technology, 122 Luoshi Road, Wuhan 430070, People's Republic of China
| | - Zheng Zheng
- School of Chemistry, Chemical Engineering and Life Science, Wuhan University of Technology, 122 Luoshi Road, Wuhan 430070, People's Republic of China
- Divamics Inc., Suzhou 215000, People's Republic of China
| |
Collapse
|
7
|
Fu H, Bian H, Shao X, Cai W. Collective Variable-Based Enhanced Sampling: From Human Learning to Machine Learning. J Phys Chem Lett 2024; 15:1774-1783. [PMID: 38329095 DOI: 10.1021/acs.jpclett.3c03542] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/09/2024]
Abstract
Enhanced-sampling algorithms relying on collective variables (CVs) are extensively employed to study complex (bio)chemical processes that are not amenable to brute-force molecular simulations. The selection of appropriate CVs characterizing the slow movement modes is of paramount importance for reliable and efficient enhanced-sampling simulations. In this Perspective, we first review the application and limitations of CVs obtained from chemical and geometrical intuition. We also introduce path-sampling algorithms, which can identify path-like CVs in a high-dimensional free-energy space. Machine-learning algorithms offer a viable approach to finding suitable CVs by analyzing trajectories from preliminary simulations. We discuss both the performance of machine-learning-derived CVs in enhanced-sampling simulations of experimental models and the challenges involved in applying these CVs to realistic, complex molecular assemblies. Moreover, we provide a prospective view of the potential advancements of machine-learning algorithms for the development of CVs in the field of enhanced-sampling simulations.
Collapse
Affiliation(s)
- Haohao Fu
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Hengwei Bian
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Xueguang Shao
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Wensheng Cai
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Nankai University, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| |
Collapse
|
8
|
Liu X, Xing J, Fu H, Shao X, Cai W. Analyzing Molecular Dynamics Trajectories Thermodynamically through Artificial Intelligence. J Chem Theory Comput 2024; 20:665-676. [PMID: 38193858 DOI: 10.1021/acs.jctc.3c00975] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2024]
Abstract
Molecular dynamics simulations produce trajectories that correspond to vast amounts of structure when exploring biochemical processes. Extracting valuable information, e.g., important intermediate states and collective variables (CVs) that describe the major movement modes, from molecular trajectories to understand the underlying mechanisms of biological processes presents a significant challenge. To achieve this goal, we introduce a deep learning approach, coined DIKI (deep identification of key intermediates), to determine low-dimensional CVs distinguishing key intermediate conformations without a-priori assumptions. DIKI dynamically plans the distribution of latent space and groups together similar conformations within the same cluster. Moreover, by incorporating two user-defined parameters, namely, coarse focus knob and fine focus knob, to help identify conformations with low free energy and differentiate the subtle distinctions among these conformations, resolution-tunable clustering was achieved. Furthermore, the integration of DIKI with a path-finding algorithm contributes to the identification of crucial intermediates along the lowest free-energy pathway. We postulate that DIKI is a robust and flexible tool that can find widespread applications in the analysis of complex biochemical processes.
Collapse
Affiliation(s)
- Xuyang Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Jingya Xing
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Haohao Fu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| |
Collapse
|
9
|
Fu H, Liu H, Xing J, Zhao T, Shao X, Cai W. Deep-Learning-Assisted Enhanced Sampling for Exploring Molecular Conformational Changes. J Phys Chem B 2023; 127:9926-9935. [PMID: 37947397 DOI: 10.1021/acs.jpcb.3c05284] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2023]
Abstract
We present a novel strategy to explore conformational changes and identify stable states of molecular objects, eliminating the need for a priori knowledge. The approach applies a deep learning method to extract information about the movement modes of the molecular object from a short, high-dimensional, and parameter-free preliminary enhanced-sampling simulation. The gathered information is described by a small set of deep-learning-based collective variables (dCVs), which steer the production-enhanced-sampling simulation. Considering the challenge of adequately exploring the configurational space using the low-dimensional, suboptimal dCVs, we incorporate a method designed for ergodic sampling, namely, Gaussian-accelerated molecular dynamics (MD), into the framework of CV-based enhanced sampling. MD simulations on both toy models and nontrivial examples demonstrate the remarkable computational efficiency of the strategy in capturing the conformational changes of molecular objects without a priori knowledge. Specifically, we achieved the blind folding of two fast folders, chignolin and villin, within a time scale of hundreds of nanoseconds and successfully reconstructed the free-energy landscapes that characterize their reversible folding. All in all, the presented strategy holds significant promise for investigating conformational changes in macromolecules, and it is anticipated to find extensive applications in the fields of chemistry and biology.
Collapse
Affiliation(s)
- Haohao Fu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Han Liu
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Jingya Xing
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Tong Zhao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
| | - Xueguang Shao
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Wensheng Cai
- Research Center for Analytical Sciences, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| |
Collapse
|
10
|
New avenues in artificial-intelligence-assisted drug discovery. Drug Discov Today 2023; 28:103516. [PMID: 36736583 DOI: 10.1016/j.drudis.2023.103516] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2022] [Revised: 12/08/2022] [Accepted: 01/26/2023] [Indexed: 02/05/2023]
Abstract
Over the past decade, the amount of biomedical data available has grown at unprecedented rates. Increased automation technology and larger data volumes have encouraged the use of machine learning (ML) or artificial intelligence (AI) techniques for mining such data and extracting useful patterns. Because the identification of chemical entities with desired biological activity is a crucial task in drug discovery, AI technologies have the potential to accelerate this process and support decision making. In addition, the advent of deep learning (DL) has shown great promise in addressing diverse problems in drug discovery, such as de novo molecular design. Herein, we will appraise the current state-of-the-art in AI-assisted drug discovery, discussing the recent applications covering generative models for chemical structure generation, scoring functions to improve binding affinity and pose prediction, and molecular dynamics to assist in the parametrization, featurization and generalization tasks. Finally, we will discuss current hurdles and the strategies to overcome them, as well as potential future directions.
Collapse
|
11
|
Heydari S, Raniolo S, Livi L, Limongelli V. Transferring chemical and energetic knowledge between molecular systems with machine learning. Commun Chem 2023; 6:13. [PMID: 36697971 PMCID: PMC9839695 DOI: 10.1038/s42004-022-00790-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2022] [Accepted: 12/07/2022] [Indexed: 01/15/2023] Open
Abstract
Predicting structural and energetic properties of a molecular system is one of the fundamental tasks in molecular simulations, and it has applications in chemistry, biology, and medicine. In the past decade, the advent of machine learning algorithms had an impact on molecular simulations for various tasks, including property prediction of atomistic systems. In this paper, we propose a novel methodology for transferring knowledge obtained from simple molecular systems to a more complex one, endowed with a significantly larger number of atoms and degrees of freedom. In particular, we focus on the classification of high and low free-energy conformations. Our approach relies on utilizing (i) a novel hypergraph representation of molecules, encoding all relevant information for characterizing multi-atom interactions for a given conformation, and (ii) novel message passing and pooling layers for processing and making free-energy predictions on such hypergraph-structured data. Despite the complexity of the problem, our results show a remarkable Area Under the Curve of 0.92 for transfer learning from tri-alanine to the deca-alanine system. Moreover, we show that the same transfer learning approach can also be used in an unsupervised way to group chemically related secondary structures of deca-alanine in clusters having similar free-energy values. Our study represents a proof of concept that reliable transfer learning models for molecular systems can be designed, paving the way to unexplored routes in prediction of structural and energetic properties of biologically relevant systems.
Collapse
Affiliation(s)
- Sajjad Heydari
- grid.21613.370000 0004 1936 9609Department of Computer Science, University of Manitoba, Winnipeg, MB R3T 2N2 Canada
| | - Stefano Raniolo
- grid.29078.340000 0001 2203 2861Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), via G. Buffi 13, CH-6900 Lugano, Switzerland
| | - Lorenzo Livi
- grid.21613.370000 0004 1936 9609Department of Computer Science, University of Manitoba, Winnipeg, MB R3T 2N2 Canada ,grid.8391.30000 0004 1936 8024Department of Computer Science, University of Exeter, Exeter, EX4 4QF UK
| | - Vittorio Limongelli
- grid.29078.340000 0001 2203 2861Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), via G. Buffi 13, CH-6900 Lugano, Switzerland ,grid.4691.a0000 0001 0790 385XDepartment of Pharmacy, University of Naples “Federico II”, via D. Montesano 49, I-80131 Naples, Italy
| |
Collapse
|
12
|
Chen H, Chipot C. Chasing collective variables using temporal data-driven strategies. QRB DISCOVERY 2023; 4:e2. [PMID: 37564298 PMCID: PMC10411323 DOI: 10.1017/qrd.2022.23] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2022] [Revised: 12/21/2022] [Accepted: 12/29/2022] [Indexed: 01/09/2023] Open
Abstract
The convergence of free-energy calculations based on importance sampling depends heavily on the choice of collective variables (CVs), which in principle, should include the slow degrees of freedom of the biological processes to be investigated. Autoencoders (AEs), as emerging data-driven dimension reduction tools, have been utilised for discovering CVs. AEs, however, are often treated as black boxes, and what AEs actually encode during training, and whether the latent variables from encoders are suitable as CVs for further free-energy calculations remains unknown. In this contribution, we review AEs and their time-series-based variants, including time-lagged AEs (TAEs) and modified TAEs, as well as the closely related model variational approach for Markov processes networks (VAMPnets). We then show through numerical examples that AEs learn the high-variance modes instead of the slow modes. In stark contrast, time series-based models are able to capture the slow modes. Moreover, both modified TAEs with extensions from slow feature analysis and the state-free reversible VAMPnets (SRVs) can yield orthogonal multidimensional CVs. As an illustration, we employ SRVs to discover the CVs of the isomerizations of N-acetyl-N'-methylalanylamide and trialanine by iterative learning with trajectories from biased simulations. Last, through numerical experiments with anisotropic diffusion, we investigate the potential relationship of time-series-based models and committor probabilities.
Collapse
Affiliation(s)
- Haochuan Chen
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
| | - Christophe Chipot
- Laboratoire International Associé Centre National de la Recherche Scientifique et University of Illinois at Urbana-Champaign, Unité Mixte de Recherche n°7019, Université de Lorraine, 54506 Vandœuvre-lès-Nancy, France
- Theoretical and Computational Biophysics Group, Beckman Institute, and Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL61801, USA
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL60637, USA
| |
Collapse
|
13
|
Ketkaew R, Luber S. DeepCV: A Deep Learning Framework for Blind Search of Collective Variables in Expanded Configurational Space. J Chem Inf Model 2022; 62:6352-6364. [PMID: 36445176 DOI: 10.1021/acs.jcim.2c00883] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022]
Abstract
We present Deep learning for Collective Variables (DeepCV), a computer code that provides an efficient and customizable implementation of the deep autoencoder neural network (DAENN) algorithm that has been developed in our group for computing collective variables (CVs) and can be used with enhanced sampling methods to reconstruct free energy surfaces of chemical reactions. DeepCV can be used to conveniently calculate molecular features, train models, generate CVs, validate rare events from sampling, and analyze a trajectory for chemical reactions of interest. We use DeepCV in an example study of the conformational transition of cyclohexene, where metadynamics simulations are performed using DAENN-generated CVs. The results show that the adopted CVs give free energies in line with those obtained by previously developed CVs and experimental results. DeepCV is open-source software written in Python/C++ object-oriented languages, based on the TensorFlow framework and distributed free of charge for noncommercial purposes, which can be incorporated into general molecular dynamics software. DeepCV also comes with several additional tools, i.e., an application program interface (API), documentation, and tutorials.
Collapse
Affiliation(s)
- Rangsiman Ketkaew
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
| | - Sandra Luber
- Department of Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland
| |
Collapse
|
14
|
Bhakat S. Collective variable discovery in the age of machine learning: reality, hype and everything in between. RSC Adv 2022; 12:25010-25024. [PMID: 36199882 PMCID: PMC9437778 DOI: 10.1039/d2ra03660f] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2022] [Accepted: 08/20/2022] [Indexed: 11/21/2022] Open
Abstract
Understanding the kinetics and thermodynamics profile of biomolecules is necessary to understand their functional roles which has a major impact in mechanism driven drug discovery. Molecular dynamics simulation has been routinely used to understand conformational dynamics and molecular recognition in biomolecules. Statistical analysis of high-dimensional spatiotemporal data generated from molecular dynamics simulation requires identification of a few low-dimensional variables which can describe the essential dynamics of a system without significant loss of information. In physical chemistry, these low-dimensional variables are often called collective variables. Collective variables are used to generate reduced representations of free energy surfaces and calculate transition probabilities between different metastable basins. However the choice of collective variables is not trivial for complex systems. Collective variables range from geometric criteria such as distances and dihedral angles to abstract ones such as weighted linear combinations of multiple geometric variables. The advent of machine learning algorithms led to increasing use of abstract collective variables to represent biomolecular dynamics. In this review, I will highlight several nuances of commonly used collective variables ranging from geometric to abstract ones. Further, I will put forward some cases where machine learning based collective variables were used to describe simple systems which in principle could have been described by geometric ones. Finally, I will put forward my thoughts on artificial general intelligence and how it can be used to discover and predict collective variables from spatiotemporal data generated by molecular dynamics simulations.
Collapse
Affiliation(s)
- Soumendranath Bhakat
- Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania Pennsylvania 19104-6059 USA +1 30549 32620
| |
Collapse
|
15
|
Duan C, Liu X, Cai W, Shao X. Spectral Encoder to Extract the Features of Near-Infrared Spectra for Multivariate Calibration. J Chem Inf Model 2022; 62:3695-3703. [PMID: 35916486 DOI: 10.1021/acs.jcim.2c00786] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
An autoencoder architecture was adopted for near-infrared (NIR) spectral analysis by extracting the common features in the spectra. Three autoencoder-based networks with different purposes were constructed. First, a spectral encoder was established by training the network with a set of spectra as the input. The features of the spectra can be encoded by the nodes in the bottleneck layer, which in turn can be used to build a sparse and robust model. Second, taking the spectra of one instrument as the input and that of another instrument as the reference output, the common features in both spectra can be obtained in the bottleneck layer. Therefore, in the prediction step, the spectral features of the second can be predicted by taking the reverse of the decoder as the encoder. Furthermore, transfer learning was used to build the model for the spectra of more instruments by fine-tuning the trained network. NIR datasets of plant, wheat, and pharmaceutical tablets measured on multiple instruments were used to test the method. The multi-linear regression (MLR) model with the encoded features was found to have a similar or slightly better performance in prediction compared with the partial least-squares (PLS) model.
Collapse
Affiliation(s)
- Chaoshu Duan
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin 300071, China.,Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Xuyang Liu
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin 300071, China.,Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Wensheng Cai
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin 300071, China.,Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| | - Xueguang Shao
- Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin 300071, China.,Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
| |
Collapse
|
16
|
Challenges and frontiers of computational modelling of biomolecular recognition. QRB DISCOVERY 2022. [DOI: 10.1017/qrd.2022.11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022] Open
Abstract
Abstract
Biomolecular recognition including binding of small molecules, peptides and proteins to their target receptors plays a key role in cellular function and has been targeted for therapeutic drug design. However, the high flexibility of biomolecules and slow binding and dissociation processes have presented challenges for computational modelling. Here, we review the challenges and computational approaches developed to characterise biomolecular binding, including molecular docking, molecular dynamics simulations (especially enhanced sampling) and machine learning. Further improvements are still needed in order to accurately and efficiently characterise binding structures, mechanisms, thermodynamics and kinetics of biomolecules in the future.
Collapse
|