1
Nadeem H, Shukla D. Ensemble Adaptive Sampling Scheme: Identifying an Optimal Sampling Strategy via Policy Ranking. J Chem Theory Comput 2025; 21:4626-4639. [PMID: 40261689 DOI: 10.1021/acs.jctc.4c01488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/24/2025]
Abstract
Efficient sampling in biomolecular simulations is critical for accurately capturing the complex dynamic behaviors of biological systems. Adaptive sampling techniques aim to improve efficiency by focusing computational resources on the most relevant regions of the phase space. In this work, we present a framework for identifying the optimal sampling policy through metric-driven ranking. Our approach systematically evaluates the policy ensemble and ranks the policies based on their ability to explore the conformational space effectively. Through a series of biomolecular simulation case studies, we demonstrate that the choice of a different adaptive sampling policy at each round significantly outperforms single policy sampling, leading to faster convergence and improved sampling performance. This approach takes an ensemble of adaptive sampling policies and identifies the optimal policy for the next round based on current data. Beyond presenting this ensemble view of adaptive sampling, we also propose two sampling algorithms that approximate this ranking framework on the fly. The modularity of this framework allows incorporation of any adaptive sampling policy, making it versatile and suitable as a comprehensive adaptive sampling scheme.
Affiliation(s)
- Hassan Nadeem
  - Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Diwakar Shukla
  - Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
2
Matsumura Y, Tabata K, Komatsuzaki T. Comparative Analysis of Reinforcement Learning Algorithms for Finding Reaction Pathways: Insights from a Large Benchmark Data Set. J Chem Theory Comput 2025; 21:3523-3535. [PMID: 40105681 DOI: 10.1021/acs.jctc.4c01780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2025]
Abstract
The identification of kinetically feasible reaction pathways that connect a reactant to its product, including numerous intermediates and transition states, is crucial for predicting chemical reactions and elucidating reaction mechanisms. However, as molecular systems become increasingly complex or larger, the number of local minimum structures and transition states grows, which makes this task challenging, even with advanced computational approaches. We introduced a reinforcement learning algorithm to efficiently identify a kinetically feasible reaction pathway between a given local minimum structure for the reactant and a given one for the product, starting from the reactant. The performance of the algorithm was validated using a benchmark data set of large-scale chemical reaction path networks. Several search policies were proposed, using metrics based on energetic or structural similarity to the product's goal structure, for each local minimum structure candidate found during the search. The performances of baseline greedy, random, and uniform search policies varied substantially depending on the system. In contrast, exploration-exploitation balanced policies such as Thompson sampling, probability of improvement, and expected improvement consistently demonstrated stable and high performance. Furthermore, we characterized the search mechanisms that depend on different policies in detail. This study also addressed potential avenues for further research, such as hierarchical reinforcement learning and multiobjective optimization, which could deepen the problem setting explored in this study.
Affiliation(s)
- Yoshihiro Matsumura
  - Institute for Chemical Reaction Design and Discovery (ICReDD), Hokkaido University, Sapporo 001-0020, Japan
- Koji Tabata
  - Institute for Chemical Reaction Design and Discovery (ICReDD), Hokkaido University, Sapporo 001-0020, Japan
  - Research Institute for Electronic Science, Hokkaido University, Sapporo 001-0020, Japan
  - Department of Mathematics, Hokkaido University, Sapporo 060-0810, Japan
- Tamiki Komatsuzaki
  - Institute for Chemical Reaction Design and Discovery (ICReDD), Hokkaido University, Sapporo 001-0020, Japan
  - Research Institute for Electronic Science, Hokkaido University, Sapporo 001-0020, Japan
  - Graduate School of Chemical Sciences and Engineering, Hokkaido University, Sapporo 060-8628, Japan
  - Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Yamadaoka Suita, 565-0871 Osaka, Japan
  - The Institute of Scientific and Industrial Research, Osaka University, Mihogaoka Ibaraki 8-1, 567-0047 Osaka, Japan
3
Shen W, Wan K, Li D, Gao H, Shi X. Adaptive CVgen: Leveraging reinforcement learning for advanced sampling in protein folding and chemical reactions. Proc Natl Acad Sci U S A 2024; 121:e2414205121. [PMID: 39475640 PMCID: PMC11551409 DOI: 10.1073/pnas.2414205121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2024] [Accepted: 09/24/2024] [Indexed: 11/13/2024] Open
Abstract
Enhanced sampling techniques have traditionally encountered two significant challenges: identifying suitable reaction coordinates and addressing the exploration-exploitation dilemma, particularly the difficulty of escaping local energy minima. Here, we introduce Adaptive CVgen, a universal adaptive sampling framework designed to tackle these issues. Our approach utilizes a set of collective variables (CVs) to comprehensively cover the system's potential evolutionary phase space, generating diverse reaction coordinates to address the first challenge. Moreover, we integrate reinforcement learning strategies to dynamically adjust the generated reaction coordinates, thereby effectively balancing the exploration-exploitation dilemma. We apply this framework to sample the conformational space of six proteins transitioning from completely disordered states to folded states, as well as to model the chemical synthesis process of C60, achieving conformations that perfectly match the standard C60 structure. The results demonstrate Adaptive CVgen's effectiveness in exploring new conformations and escaping local minima, achieving both sampling efficiency and exploration accuracy. This framework holds potential for extending to various related challenges, including protein folding dynamics, drug targeting, and complex chemical reactions, thereby opening promising avenues for application in these fields.
Affiliation(s)
- Wenhui Shen
  - Laboratory of Theoretical and Computational Nanoscience, National Center for Nanoscience and Technology, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Kaiwei Wan
  - Laboratory of Theoretical and Computational Nanoscience, National Center for Nanoscience and Technology, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Dechang Li
  - Key Laboratory of Soft Machines and Smart Devices of Zhejiang Province, Institute of Biomechanics and Applications, Department of Engineering Mechanics, Zhejiang University, Hangzhou 310027, China
- Huajian Gao
  - Mechano-X Institute, Applied Mechanics Laboratory, Department of Engineering Mechanics, Tsinghua University, Beijing 100084, China
- Xinghua Shi
  - Laboratory of Theoretical and Computational Nanoscience, National Center for Nanoscience and Technology, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
4
Yang DT, Goldberg AM, Chong LT. Rare-Event Sampling using a Reinforcement Learning-Based Weighted Ensemble Method. bioRxiv 2024:2024.10.09.617475. [PMID: 39416089 PMCID: PMC11482931 DOI: 10.1101/2024.10.09.617475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2024]
Abstract
Despite the power of path sampling strategies in enabling simulations of rare events, such strategies have not reached their full potential. A common challenge that remains is the identification of a progress coordinate that captures the slow relevant motions of a rare event. Here we have developed a weighted ensemble (WE) path sampling strategy that exploits reinforcement learning to automatically identify an effective progress coordinate among a set of potential coordinates during a simulation. We apply our WE strategy with reinforcement learning to three benchmark systems: (i) an egg carton-shaped toy potential, (ii) an S-shaped toy potential, and (iii) a dimer of the HIV-1 capsid protein (C-terminal domain). To enable rapid testing of the latter system at the atomic level, we employed discrete-state synthetic molecular dynamics trajectories using a generative, fine-grained Markov state model that was based on extensive conventional simulations. Our results demonstrate that using concepts from reinforcement learning with a weighted ensemble of trajectories automatically identifies relevant progress coordinates among multiple candidates at a given time during a simulation. Due to the rigorous weighting of trajectories, the simulations maintain rigorous kinetics.
Affiliation(s)
- Darian T. Yang
  - Molecular Biophysics and Structural Biology Graduate Program, University of Pittsburgh and Carnegie Mellon University, Pittsburgh, Pennsylvania 15260
  - Department of Structural Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15260
  - Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260
- Alex M. Goldberg
  - Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260
- Lillian T. Chong
  - Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260
5
Zhang M, Wu H, Wang Y. Enhanced Sampling of Biomolecular Slow Conformational Transitions Using Adaptive Sampling and Machine Learning. J Chem Theory Comput 2024; 20:8569-8582. [PMID: 39301626 DOI: 10.1021/acs.jctc.4c00764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2024]
Abstract
Biomolecular simulations often suffer from the "time scale problem", hindering the study of rare events occurring over extended time scales. Enhanced sampling techniques aim to alleviate this issue by accelerating conformational transitions, yet they typically necessitate well-defined collective variables (CVs), posing a significant challenge. Machine learning offers promising solutions but typically requires rich training data encompassing the entire free energy surface (FES). In this work, we introduce an automated iterative pipeline designed to mitigate these limitations. Our protocol first utilizes a CV-free count-based adaptive sampling method to generate a data set rich in rare events. From this data set, slow modes are identified using Koopman-reweighted time-lagged independent component analysis (KTICA), which are subsequently leveraged by on-the-fly probability enhanced sampling (OPES) to efficiently explore the FES. The effectiveness of our pipeline is demonstrated and further compared with the common Markov State Model (MSM) approach on two model systems with increasing complexity: alanine dipeptide (Ala2) and deca-alanine (Ala10), underscoring its applicability across diverse biomolecular simulations.
Affiliation(s)
- Mingyuan Zhang
  - College of Life Sciences, Zhejiang University, Hangzhou 310027, China
- Hao Wu
  - School of Mathematical Sciences, Institute of Natural Sciences, and MOE-LSC, Shanghai Jiao Tong University, Shanghai 200240, China
- Yong Wang
  - College of Life Sciences, Zhejiang University, Hangzhou 310027, China
6
Gapsys V, Kopec W, Matthes D, de Groot BL. Biomolecular simulations at the exascale: From drug design to organelles and beyond. Curr Opin Struct Biol 2024; 88:102887. [PMID: 39029280 DOI: 10.1016/j.sbi.2024.102887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Revised: 06/07/2024] [Accepted: 06/24/2024] [Indexed: 07/21/2024]
Abstract
The rapid advancement in computational power available for research offers to bring not only quantitative improvements, but also qualitative changes in the field of biomolecular simulation. Here, we review the state of biomolecular dynamics simulations at the threshold to exascale resources becoming available. Both developments in parallel and distributed computing will be discussed, providing a perspective on the state of the art of both. A main focus will be on obtaining binding and conformational free energies, with an outlook to macromolecular complexes and (sub)cellular assemblies.
Affiliation(s)
- Vytautas Gapsys
  - Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, Beerse 2340, Belgium. https://twitter.com/VytasGapsys
- Wojciech Kopec
  - Department of Chemistry, Queen Mary University of London, 327 Mile End Road, London E1 4NS, UK; Computational Biomolecular Dynamics Group, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, 37077 Göttingen, Germany. https://twitter.com/wojciechkopec3
- Dirk Matthes
  - Computational Biomolecular Dynamics Group, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, 37077 Göttingen, Germany
- Bert L de Groot
  - Computational Biomolecular Dynamics Group, Max Planck Institute for Multidisciplinary Sciences, Am Fassberg 11, 37077 Göttingen, Germany
7
Guneri D, Alexandrou E, El Omari K, Dvořáková Z, Chikhale RV, Pike DTS, Waudby CA, Morris CJ, Haider S, Parkinson GN, Waller ZAE. Structural insights into i-motif DNA structures in sequences from the insulin-linked polymorphic region. Nat Commun 2024; 15:7119. [PMID: 39164244 PMCID: PMC11336075 DOI: 10.1038/s41467-024-50553-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Accepted: 07/12/2024] [Indexed: 08/22/2024] Open
Abstract
The insulin-linked polymorphic region is a variable number of tandem repeats region of DNA in the promoter of the insulin gene that regulates transcription of insulin. This region is known to form the alternative DNA structures, i-motifs and G-quadruplexes. Individuals have different sequence variants of tandem repeats and although previous work investigated the effects of some variants on G-quadruplex formation, there is not a clear picture of the relationship between the sequence diversity, the DNA structures formed, and the functional effects on insulin gene expression. Here we show that different sequence variants of the insulin linked polymorphic region form different DNA structures in vitro. Additionally, reporter genes in cellulo indicate that insulin expression may change depending on which DNA structures form. We report the crystal structure and dynamics of an intramolecular i-motif, which reveal sequences within the loop regions forming additional stabilising interactions that are critical to formation of stable i-motif structures. The outcomes of this work reveal the detail in formation of stable i-motif DNA structures, with potential for rational based drug design for compounds to target i-motif DNA.
Affiliation(s)
- Dilek Guneri
  - School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK
- Effrosyni Alexandrou
  - School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK
- Kamel El Omari
  - Diamond Light Source, Harwell Science and Innovation Campus, Chilton, Didcot, OX11 0DE, UK
- Zuzana Dvořáková
  - Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 00, Brno, Czech Republic
- Rupesh V Chikhale
  - School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK
- Daniel T S Pike
  - School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK
- Christopher A Waudby
  - School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK
- Christopher J Morris
  - School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK
- Shozeb Haider
  - School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK
  - UCL Centre for Advanced Research Computing, University College London, Gower Street, London, WC1E 6BT, UK
- Gary N Parkinson
  - School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK
- Zoë A E Waller
  - School of Pharmacy, University College London, 29-39 Brunswick Square, London, WC1N 1AX, UK
8
Abedin MM, Tabata K, Matsumura Y, Komatsuzaki T. Multi-armed bandit algorithm for sequential experiments of molecular properties with dynamic feature selection. J Chem Phys 2024; 161:014115. [PMID: 38958158 DOI: 10.1063/5.0206042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 06/16/2024] [Indexed: 07/04/2024] Open
Abstract
Sequential optimization is one of the promising approaches in identifying the optimal candidate(s) (molecules, reactants, drugs, etc.) with desired properties (reaction yield, selectivity, efficacy, etc.) from a large set of potential candidates, while minimizing the number of experiments required. However, the high dimensionality of the feature space (e.g., molecular descriptors) makes it often difficult to utilize the relevant features during the process of updating the set of candidates to be examined. In this article, we developed a new sequential optimization algorithm for molecular problems based on reinforcement learning, multi-armed linear bandit framework, and online, dynamic feature selections in which relevant molecular descriptors are updated along with the experiments. We also designed a stopping condition aimed to guarantee the reliability of the chosen candidate from the dataset pool. The developed algorithm was examined by comparing with Bayesian optimization (BO), using two synthetic datasets and two real datasets in which one dataset includes hydration free energy of molecules and another one includes a free energy difference between enantiomer products in chemical reaction. We found that the dynamic feature selection in representing the desired properties along the experiments provides a better performance (e.g., time required to find the best candidate and stop the experiment) as the overall trend and that our multi-armed linear bandit approach with a dynamic feature selection scheme outperforms the standard BO with fixed feature variables. The comparison of our algorithm to BO with dynamic feature selection is also addressed.
Affiliation(s)
- Md Menhazul Abedin
  - Graduate School of Chemical Sciences and Engineering, Hokkaido University, Sapporo 060-8628, Japan
  - Khulna University, Khulna 9208, Bangladesh
- Koji Tabata
  - Research Institute for Electronic Science, Hokkaido University, Sapporo 001-0020, Japan
  - Department of Mathematics, Hokkaido University, Sapporo 060-0810, Japan
  - Institute for Chemical Reaction Design and Discovery (ICReDD), Hokkaido University, Sapporo 001-0020, Japan
- Yoshihiro Matsumura
  - Institute for Chemical Reaction Design and Discovery (ICReDD), Hokkaido University, Sapporo 001-0020, Japan
- Tamiki Komatsuzaki
  - Graduate School of Chemical Sciences and Engineering, Hokkaido University, Sapporo 060-8628, Japan
  - Research Institute for Electronic Science, Hokkaido University, Sapporo 001-0020, Japan
  - Institute for Chemical Reaction Design and Discovery (ICReDD), Hokkaido University, Sapporo 001-0020, Japan
  - Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Yamadaoka, Suita 565-0871, Osaka, Japan
  - The Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, Ibaraki 567-0047, Osaka, Japan
9
Rubina, Moin ST, Haider S. Identification of a Cryptic Pocket in Methionine Aminopeptidase-II Using Adaptive Bandit Molecular Dynamics Simulations and Markov State Models. ACS Omega 2024; 9:28534-28545. [PMID: 38973915 PMCID: PMC11223136 DOI: 10.1021/acsomega.4c02516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Revised: 06/03/2024] [Accepted: 06/10/2024] [Indexed: 07/09/2024]
Abstract
Methionine aminopeptidase-II (MetAP-II) is a metalloprotease, primarily responsible for the cotranslational removal of the N-terminal initiator methionine from the nascent polypeptide chain during protein synthesis. MetAP-II has been implicated in angiogenesis and endothelial cell proliferation and is therefore considered a validated target for cancer therapeutics. However, there is no effective drug available against MetAP-II. In this study, we employ Adaptive Bandit molecular dynamics simulations to investigate the structural dynamics of the apo and ligand-bound MetAP-II. Our results focus on the dynamic behavior of the disordered loop that is not resolved in most of the crystal structures. Further analysis of the conformational flexibility of the disordered loop reveals a hidden cryptic pocket that is predicted to be potentially druggable. The network analysis indicates that the disordered loop region has a direct signaling route to the active site. These findings highlight a new way to target MetAP-II by designing inhibitors for the allosteric site within this disordered loop region.
Affiliation(s)
- Rubina
  - Third World Center for Science and Technology, H.E.J. Research Institute of Chemistry, International Center for Chemical and Biological Sciences, University of Karachi, Karachi 75270, Pakistan
- Syed Tarique Moin
  - Third World Center for Science and Technology, H.E.J. Research Institute of Chemistry, International Center for Chemical and Biological Sciences, University of Karachi, Karachi 75270, Pakistan
- Shozeb Haider
  - UCL School of Pharmacy, University College London, London WC1N 1AX, U.K.
  - UCL Centre for Advanced Research Computing, University College London, London WC1H 9RN, U.K.
10
Zhu Y, Gu J, Zhao Z, Chan AWE, Mojica MF, Hujer AM, Bonomo RA, Haider S. Deciphering the Coevolutionary Dynamics of L2 β-Lactamases via Deep Learning. J Chem Inf Model 2024; 64:3706-3717. [PMID: 38687957 PMCID: PMC11094718 DOI: 10.1021/acs.jcim.4c00189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 03/10/2024] [Accepted: 04/09/2024] [Indexed: 05/02/2024]
Abstract
L2 β-lactamases, serine-based class A β-lactamases expressed by Stenotrophomonas maltophilia, play a pivotal role in antimicrobial resistance (AMR). However, limited studies have been conducted on these important enzymes. To understand the coevolutionary dynamics of L2 β-lactamase, innovative computational methodologies, including adaptive sampling molecular dynamics simulations, and deep learning methods (convolutional variational autoencoders and BindSiteS-CNN) explored conformational changes and correlations within the L2 β-lactamase family together with other representative class A enzymes including SME-1 and KPC-2. This work also investigated the potential role of hydrophobic nodes and binding site residues in facilitating the functional mechanisms. The convergence of analytical approaches utilized in this effort yielded comprehensive insights into the dynamic behavior of the β-lactamases, specifically from an evolutionary standpoint. In addition, this analysis presents a promising approach for understanding how the class A β-lactamases evolve in response to environmental pressure and establishes a theoretical foundation for forthcoming endeavors in drug development aimed at combating AMR.
Affiliation(s)
- Yu Zhu
  - Pharmaceutical and Biological Chemistry, UCL School of Pharmacy, London WC1N 1AX, U.K.
- Jing Gu
  - Pharmaceutical and Biological Chemistry, UCL School of Pharmacy, London WC1N 1AX, U.K.
- Zhuoran Zhao
  - Pharmaceutical and Biological Chemistry, UCL School of Pharmacy, London WC1N 1AX, U.K.
- A. W. Edith Chan
  - Division of Medicine, UCL School of Pharmacy, London WC1E 6BT, U.K.
- Maria F. Mojica
  - Department of Molecular Biology and Microbiology, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106-5029, United States
  - Research Service, Department of Veterans Affairs Medical Center, Louis Stokes Cleveland, Cleveland, Ohio 44106-1702, United States
  - CWRU-Cleveland VAMC Center for Antimicrobial Resistance and Epidemiology (Case VA CARES), Cleveland, Ohio 44106-5029, United States
- Andrea M. Hujer
  - Research Service, Department of Veterans Affairs Medical Center, Louis Stokes Cleveland, Cleveland, Ohio 44106-1702, United States
  - Department of Medicine, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106-5029, United States
- Robert A. Bonomo
  - Research Service, Department of Veterans Affairs Medical Center, Louis Stokes Cleveland, Cleveland, Ohio 44106-1702, United States
  - CWRU-Cleveland VAMC Center for Antimicrobial Resistance and Epidemiology (Case VA CARES), Cleveland, Ohio 44106-5029, United States
  - Clinician Scientist Investigator, Department of Veterans Affairs Medical Center, Louis Stokes Cleveland, Cleveland, Ohio 44106-1702, United States
  - Departments of Pharmacology, Biochemistry, and Proteomics and Bioinformatics, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106-5029, United States
  - Departments of Molecular Biology and Microbiology, Medicine, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106-5029, United States
- Shozeb Haider
  - Pharmaceutical and Biological Chemistry, UCL School of Pharmacy, London WC1N 1AX, U.K.
  - UCL Centre for Advanced Research in Computing, University College London, London WC1H 9RL, U.K.
11
Kleiman DE, Nadeem H, Shukla D. Adaptive Sampling Methods for Molecular Dynamics in the Era of Machine Learning. J Phys Chem B 2023; 127:10669-10681. [PMID: 38081185 DOI: 10.1021/acs.jpcb.3c04843] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/22/2023]
Abstract
Molecular dynamics (MD) simulations are fundamental computational tools for the study of proteins and their free energy landscapes. However, sampling protein conformational changes through MD simulations is challenging due to the relatively long time scales of these processes. Many enhanced sampling approaches have emerged to tackle this problem, including biased sampling and path-sampling methods. In this Perspective, we focus on adaptive sampling algorithms. These techniques differ from other approaches because the thermodynamic ensemble is preserved and the sampling is enhanced solely by restarting MD trajectories at particularly chosen seeds rather than introducing biasing forces. We begin our treatment with an overview of theoretically transparent methods, where we discuss principles and guidelines for adaptive sampling. Then, we present a brief summary of select methods that have been applied to realistic systems in the past. Finally, we discuss recent advances in adaptive sampling methodology powered by deep learning techniques, as well as their shortcomings.
Affiliation(s)
- Diego E Kleiman
  - Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Hassan Nadeem
  - Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Diwakar Shukla
  - Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
  - Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
12
Majewski M, Pérez A, Thölke P, Doerr S, Charron NE, Giorgino T, Husic BE, Clementi C, Noé F, De Fabritiis G. Machine learning coarse-grained potentials of protein thermodynamics. Nat Commun 2023; 14:5739. [PMID: 37714883 PMCID: PMC10504246 DOI: 10.1038/s41467-023-41343-1] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 06/02/2023] [Accepted: 08/29/2023] [Indexed: 09/17/2023] Open
Abstract
A generalized understanding of protein dynamics is an unsolved scientific problem, the solution of which is critical to the interpretation of the structure-function relationships that govern essential biological processes. Here, we approach this problem by constructing coarse-grained molecular potentials based on artificial neural networks and grounded in statistical mechanics. For training, we build a unique dataset of unbiased all-atom molecular dynamics simulations of approximately 9 ms for twelve different proteins with multiple secondary structure arrangements. The coarse-grained models are capable of accelerating the dynamics by more than three orders of magnitude while preserving the thermodynamics of the systems. Coarse-grained simulations identify relevant structural states in the ensemble with comparable energetics to the all-atom systems. Furthermore, we show that a single coarse-grained potential can integrate all twelve proteins and can capture experimental structural features of mutated proteins. These results indicate that machine learning coarse-grained potentials could provide a feasible approach to simulate and understand protein dynamics.
Collapse
Affiliation(s)
- Maciej Majewski
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003, Barcelona, Spain
- Acellera Labs, Doctor Trueta 183, 08005, Barcelona, Spain
| | - Adrià Pérez
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003, Barcelona, Spain
- Acellera Labs, Doctor Trueta 183, 08005, Barcelona, Spain
| | - Philipp Thölke
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Stefan Doerr
- Acellera Labs, Doctor Trueta 183, 08005, Barcelona, Spain
| | - Nicholas E Charron
- Department of Physics, Rice University, Houston, TX, 77005, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX, 77005, USA
- Department of Physics, FU Berlin, Arnimallee 12, 14195, Berlin, Germany
| | - Toni Giorgino
- Biophysics Institute, National Research Council (CNR-IBF), 20133, Milan, Italy
| | - Brooke E Husic
- Department of Mathematics and Computer Science, FU Berlin, Arnimallee 12, 14195, Berlin, Germany
- Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, 08540, USA
- Princeton Center for Theoretical Science, Princeton University, Princeton, NJ, 08540, USA
- Center for the Physics of Biological Function, Princeton University, Princeton, NJ, 08540, USA
| | - Cecilia Clementi
- Department of Physics, Rice University, Houston, TX, 77005, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX, 77005, USA
- Department of Physics, FU Berlin, Arnimallee 12, 14195, Berlin, Germany
- Department of Chemistry, Rice University, Houston, TX, 77005, USA
| | - Frank Noé
- Department of Physics, FU Berlin, Arnimallee 12, 14195, Berlin, Germany
- Department of Mathematics and Computer Science, FU Berlin, Arnimallee 12, 14195, Berlin, Germany
- Department of Chemistry, Rice University, Houston, TX, 77005, USA
- Microsoft Research AI4Science, Karl-Liebknecht Str. 32, 10178, Berlin, Germany
| | - Gianni De Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003, Barcelona, Spain
- Acellera Labs, Doctor Trueta 183, 08005, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010, Barcelona, Spain
| |
|
13
|
Herrera-Nieto P, Pérez A, De Fabritiis G. Binding-and-Folding Recognition of an Intrinsically Disordered Protein Using Online Learning Molecular Dynamics. J Chem Theory Comput 2023; 19:3817-3824. [PMID: 37341654 PMCID: PMC10863933 DOI: 10.1021/acs.jctc.3c00008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Indexed: 06/22/2023]
Abstract
Intrinsically disordered proteins participate in many biological processes by folding upon binding to other proteins. However, coupled folding and binding processes are not well understood from an atomistic point of view. One of the main questions is whether folding occurs prior to or after binding. Here we use a novel, unbiased, high-throughput adaptive sampling approach to reconstruct the binding and folding between the disordered transactivation domain of c-Myb and the KIX domain of the CREB-binding protein. The reconstructed long-term dynamical process highlights the binding of a short stretch of amino acids on c-Myb as a folded α-helix. Leucine residues, especially Leu298-Leu302, establish initial native contacts that prime the binding and folding of the rest of the peptide, through a mixture of conformational selection in the N-terminal region and induced fit in the C-terminal region.
Affiliation(s)
- Pablo Herrera-Nieto
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Adrià Pérez
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003, Barcelona, Spain
- Acellera Labs, C Dr Trueta 183, 08005, Barcelona, Spain
| | - Gianni De Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), C Dr. Aiguader 88, 08003, Barcelona, Spain
- Acellera Ltd, Devonshire House 582, Stanmore Middlesex, HA7 1JS, United Kingdom
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain
| |
|
14
|
Weigle AT, Feng J, Shukla D. Thirty years of molecular dynamics simulations on posttranslational modifications of proteins. Phys Chem Chem Phys 2022; 24:26371-26397. [PMID: 36285789 PMCID: PMC9704509 DOI: 10.1039/d2cp02883b] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 09/06/2023]
Abstract
Posttranslational modifications (PTMs) are integral to how cells respond to perturbation. While experimental advances have improved PTM identification, the throughput for characterizing how structural changes caused by PTMs translate into altered physiological function has not kept pace. In this Perspective, we cover the history of computational modeling and molecular dynamics simulations that have characterized the structural implications of PTMs. We distinguish results from different molecular dynamics studies based upon the timescales simulated and the analysis approaches used for PTM characterization. Lastly, we offer insights into how modern research efforts on in silico PTM characterization may proceed given current state-of-the-art computing capabilities and methodological advancements.
Affiliation(s)
- Austin T Weigle
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
| | - Jiangyan Feng
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
| | - Diwakar Shukla
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
| |
|
15
|
Kleiman DE, Shukla D. Multiagent Reinforcement Learning-Based Adaptive Sampling for Conformational Dynamics of Proteins. J Chem Theory Comput 2022; 18:5422-5434. [PMID: 36044642 DOI: 10.1021/acs.jctc.2c00683] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 11/29/2022]
Abstract
Machine learning is increasingly applied to improve the efficiency and accuracy of molecular dynamics (MD) simulations. Although the growth of distributed computer clusters has allowed researchers to obtain larger amounts of data, unbiased MD simulations have difficulty sampling rare states, even under massively parallel adaptive sampling schemes. To address this issue, several algorithms inspired by reinforcement learning (RL) have arisen to promote exploration of the slow collective variables (CVs) of complex systems. Nonetheless, most of these algorithms are not well-suited to leverage the information gained by simultaneously sampling a system from different initial states (e.g., a protein in different conformations associated with distinct functional states). To fill this gap, we propose two algorithms inspired by multiagent RL that extend the functionality of closely related techniques (REAP and TSLC) to situations where the sampling can be accelerated by learning from different regions of the energy landscape through coordinated agents. Essentially, the algorithms work by remembering which agent discovered each conformation and sharing this information with others at the action-space discretization step. A stakes function is introduced to modulate how different agents sense rewards from discovered states of the system. The consequences are three-fold: (i) agents learn to prioritize CVs using only relevant data, (ii) redundant exploration is reduced, and (iii) agents that obtain higher stakes are assigned more actions. We compare our algorithms with other adaptive sampling techniques (least counts, REAP, TSLC, and AdaptiveBandit) to show and rationalize the gain in performance.
Affiliation(s)
- Diego E Kleiman
- Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Diwakar Shukla
- Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| |
|
16
|
Krivov SV. Additive eigenvectors as optimal reaction coordinates, conditioned trajectories, and time-reversible description of stochastic processes. J Chem Phys 2022; 157:014108. [DOI: 10.1063/5.0088061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
A fundamental way to analyze complex multidimensional stochastic dynamics is to describe it as diffusion on a free energy landscape—free energy as a function of reaction coordinates (RCs). For such a description to be quantitatively accurate, the RC should be chosen in an optimal way. The committor function is a primary example of an optimal RC for the description of equilibrium reaction dynamics between two states. Here, additive eigenvectors (addevs) are considered as optimal RCs to address the limitations of the committor. An addev master equation for a Markov chain is derived. A stationary solution of the equation describes a sub-ensemble of trajectories conditioned on having the same optimal RC for the forward and time-reversed dynamics in the sub-ensemble. A collection of such sub-ensembles of trajectories, called stochastic eigenmodes, can be used to describe/approximate the stochastic dynamics. A non-stationary solution describes the evolution of the probability distribution. However, in contrast to the standard master equation, it provides a time-reversible description of stochastic dynamics. It can be integrated forward and backward in time. The developed framework is illustrated on two model systems—unidirectional random walk and diffusion.
Affiliation(s)
- Sergei V. Krivov
- University of Leeds, Astbury Center for Structural Molecular Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, United Kingdom
| |
|
17
|
Integration of machine learning with computational structural biology of plants. Biochem J 2022; 479:921-928. [PMID: 35484946 DOI: 10.1042/bcj20200942] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/17/2021] [Revised: 04/01/2022] [Accepted: 04/06/2022] [Indexed: 11/17/2022]
Abstract
Computational structural biology of proteins has developed rapidly in recent decades with the development of new computational tools and the advancement of computing hardware. However, while these techniques have widely been used to make advancements in human medicine, these methods have seen less utilization in the plant sciences. In the last several years, machine learning methods have gained popularity in computational structural biology. These methods have enabled the development of new tools which are able to address the major challenges that have hampered the wide adoption of the computational structural biology of plants. This perspective examines the remaining challenges in computational structural biology and how the development of machine learning techniques enables more in-depth computational structural biology of plants.
|
18
|
Tereshchenko A, Pashkov D, Guda A, Guda S, Rusalev Y, Soldatov A. Adsorption Sites on Pd Nanoparticles Unraveled by Machine-Learning Potential with Adaptive Sampling. Molecules 2022; 27:357. [PMID: 35056671 PMCID: PMC8780420 DOI: 10.3390/molecules27020357] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/07/2021] [Revised: 12/30/2021] [Accepted: 01/04/2022] [Indexed: 01/08/2023]
Abstract
Catalytic properties of noble-metal nanoparticles (NPs) are largely determined by their surface morphology, which is probed by surface-sensitive spectroscopic techniques in different spectral regions. A fast and precise computational approach for predicting surface-adsorbate interactions would support the reliable description and interpretation of experimental data. In this work, we applied Machine Learning (ML) algorithms to approximate the adsorption energy of CO on Pd nanoclusters. Because the binding energy depends strongly on the nature of the adsorbing site and its local coordination, we tested several structural descriptors for the ML algorithm, including mean Pd-C distances, coordination numbers (CN) and generalized coordination numbers (GCN), radial distribution functions (RDF), and angular distribution functions (ADF). To avoid overtraining and to probe the most relevant positions above the metal surface, we used an adaptive sampling methodology to guide the ab initio Density Functional Theory (DFT) calculations. The support vector machines (SVM) and Extra Trees algorithms provided the best approximation quality, with a mean absolute error in energy prediction of up to 0.12 eV. Based on the developed potential, we constructed a 3D energy-surface map for the whole Pd55 nanocluster and extended it to new geometries, Pd79 and Pd85, that were not included in the training sample. The methodology can be easily extended to adsorption energies on mono- and bimetallic NPs at an affordable computational cost and accuracy.
Affiliation(s)
- Andrei Tereshchenko
- The Smart Materials Research Institute, Southern Federal University, 344090 Rostov-on-Don, Russia
| | - Danil Pashkov
- The Smart Materials Research Institute, Southern Federal University, 344090 Rostov-on-Don, Russia
- Vorovich Institute of Mathematics, Mechanics, and Computer Sciences, Southern Federal University, 344058 Rostov-on-Don, Russia
| | - Alexander Guda
- The Smart Materials Research Institute, Southern Federal University, 344090 Rostov-on-Don, Russia
| | - Sergey Guda
- The Smart Materials Research Institute, Southern Federal University, 344090 Rostov-on-Don, Russia
- Vorovich Institute of Mathematics, Mechanics, and Computer Sciences, Southern Federal University, 344058 Rostov-on-Don, Russia
| | - Yury Rusalev
- The Smart Materials Research Institute, Southern Federal University, 344090 Rostov-on-Don, Russia
| | - Alexander Soldatov
- The Smart Materials Research Institute, Southern Federal University, 344090 Rostov-on-Don, Russia
| |
|
19
|
Abstract
We extend the nonparametric framework of reaction coordinate optimization to nonequilibrium ensembles of (short) trajectories. For example, we show how, starting from such an ensemble, one can obtain an equilibrium free-energy profile along the committor, which can be used to determine important properties of the dynamics exactly. A new adaptive sampling approach, the transition-state ensemble enrichment, is suggested, which samples the configuration space by "growing" committor segments toward each other starting from the boundary states. This framework is suggested as a general tool, alternative to the Markov state models, for a rigorous and accurate analysis of simulations of large biomolecular systems, as it has the following attractive properties. It is immune to the curse of dimensionality, does not require system-specific information, can approximate arbitrary reaction coordinates with high accuracy, and has sensitive and rigorous criteria to test optimality and convergence. The approaches are illustrated on a 50-dimensional model system and a realistic protein folding trajectory.
Affiliation(s)
- Sergei V Krivov
- Astbury Center for Structural Molecular Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, U.K
| |
|
20
|
Casalino L, Dommer AC, Gaieb Z, Barros EP, Sztain T, Ahn SH, Trifan A, Brace A, Bogetti AT, Clyde A, Ma H, Lee H, Turilli M, Khalid S, Chong LT, Simmerling C, Hardy DJ, Maia JD, Phillips JC, Kurth T, Stern AC, Huang L, McCalpin JD, Tatineni M, Gibbs T, Stone JE, Jha S, Ramanathan A, Amaro RE. AI-driven multiscale simulations illuminate mechanisms of SARS-CoV-2 spike dynamics. Int J High Perform Comput Appl 2021; 35:432-451. [PMID: 38603008 PMCID: PMC8064023 DOI: 10.1177/10943420211006452] [Citation(s) in RCA: 69] [Impact Index Per Article: 17.3] [Indexed: 05/02/2023]
Abstract
We develop a generalizable AI-driven workflow that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems. We use this workflow to investigate the mechanisms of infectivity of the SARS-CoV-2 spike protein, the main viral infection machinery. Our workflow enables more efficient investigation of spike dynamics in a variety of complex environments, including within a complete SARS-CoV-2 viral envelope simulation, which contains 305 million atoms and shows strong scaling on ORNL Summit using NAMD. We present several novel scientific discoveries, including the elucidation of the spike's full glycan shield, the role of spike glycans in modulating the infectivity of the virus, and the characterization of the flexible interactions between the spike and the human ACE2 receptor. We also demonstrate how AI can accelerate conformational sampling across different systems and pave the way for the future application of such methods to additional studies in SARS-CoV-2 and other molecular systems.
Affiliation(s)
- Lorenzo Casalino
- University of California San Diego, La Jolla, CA, USA
- Authors with symbol indicate equal contribution
| | - Abigail C Dommer
- University of California San Diego, La Jolla, CA, USA
- Authors with symbol indicate equal contribution
| | - Zied Gaieb
- University of California San Diego, La Jolla, CA, USA
- Authors with symbol indicate equal contribution
| | | | - Terra Sztain
- University of California San Diego, La Jolla, CA, USA
| | - Surl-Hee Ahn
- University of California San Diego, La Jolla, CA, USA
| | - Anda Trifan
- Argonne National Lab, Lemont, IL, USA
- University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | | | | | - Austin Clyde
- Argonne National Lab, Lemont, IL, USA
- University of Chicago, Chicago, IL, USA
| | - Heng Ma
- Argonne National Lab, Lemont, IL, USA
| | | | | | | | | | | | - David J Hardy
- University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Julio Dc Maia
- University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | | | | | | | - Lei Huang
- Texas Advanced Computing Center, Austin, TX, USA
| | | | | | - Tom Gibbs
- NVIDIA Corporation, Santa Clara, CA, USA
| | - John E Stone
- University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Shantenu Jha
- Rutgers University, Piscataway, NJ, USA
- Brookhaven National Lab, Upton, NY, USA
| | | | | |
|
21
|
Ramanathan A, Ma H, Parvatikar A, Chennubhotla SC. Artificial intelligence techniques for integrative structural biology of intrinsically disordered proteins. Curr Opin Struct Biol 2021; 66:216-224. [PMID: 33421906 DOI: 10.1016/j.sbi.2020.12.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 09/30/2020] [Revised: 12/01/2020] [Accepted: 12/03/2020] [Indexed: 12/16/2022]
Abstract
We outline recent developments in artificial intelligence (AI) and machine learning (ML) techniques for integrative structural biology of intrinsically disordered protein (IDP) ensembles. IDPs challenge the traditional protein structure-function paradigm by adapting their conformations in response to specific binding partners, leading them to mediate diverse and often complex cellular functions such as biological signaling, self-organization, and compartmentalization. Obtaining mechanistic insights into their function can therefore be challenging for traditional structural determination techniques. Often, scientists have to rely on piecemeal evidence drawn from diverse experimental techniques to characterize their functional mechanisms. Multiscale simulations can help bridge critical knowledge gaps about IDP structure-function relationships; however, these techniques also face challenges in resolving emergent phenomena within IDP conformational ensembles. We posit that scalable statistical inference techniques can effectively integrate information gleaned from multiple experimental techniques as well as from simulations, thus providing access to atomistic details of these emergent phenomena.
Affiliation(s)
- Arvind Ramanathan
- Data Science & Learning Division, Argonne National Laboratory, Lemont, IL 60439, United States
- Consortium for Advanced Science and Engineering (CASE), University of Chicago, Hyde Park, IL, United States
| | - Heng Ma
- Data Science & Learning Division, Argonne National Laboratory, Lemont, IL 60439, United States
| | - Akash Parvatikar
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260, United States
| | - S Chakra Chennubhotla
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260, United States
| |
|
22
|
Casalino L, Dommer A, Gaieb Z, Barros EP, Sztain T, Ahn SH, Trifan A, Brace A, Bogetti A, Ma H, Lee H, Turilli M, Khalid S, Chong L, Simmerling C, Hardy DJ, Maia JDC, Phillips JC, Kurth T, Stern A, Huang L, McCalpin J, Tatineni M, Gibbs T, Stone JE, Jha S, Ramanathan A, Amaro RE. AI-Driven Multiscale Simulations Illuminate Mechanisms of SARS-CoV-2 Spike Dynamics. bioRxiv 2020:2020.11.19.390187. [PMID: 33236007 PMCID: PMC7685317 DOI: 10.1101/2020.11.19.390187] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 04/17/2023]
Abstract
We develop a generalizable AI-driven workflow that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems. We use this workflow to investigate the mechanisms of infectivity of the SARS-CoV-2 spike protein, the main viral infection machinery. Our workflow enables more efficient investigation of spike dynamics in a variety of complex environments, including within a complete SARS-CoV-2 viral envelope simulation, which contains 305 million atoms and shows strong scaling on ORNL Summit using NAMD. We present several novel scientific discoveries, including the elucidation of the spike's full glycan shield, the role of spike glycans in modulating the infectivity of the virus, and the characterization of the flexible interactions between the spike and the human ACE2 receptor. We also demonstrate how AI can accelerate conformational sampling across different systems and pave the way for the future application of such methods to additional studies in SARS-CoV-2 and other molecular systems.
Affiliation(s)
| | | | | | | | | | | | - Anda Trifan
- Argonne National Lab
- University of Illinois at Urbana-Champaign
| | | | | | | | - Hyungro Lee
- Rutgers University & Brookhaven National Lab
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|
23
|
Herrera-Nieto P, Pérez A, De Fabritiis G. Small Molecule Modulation of Intrinsically Disordered Proteins Using Molecular Dynamics Simulations. J Chem Inf Model 2020; 60:5003-5010. [DOI: 10.1021/acs.jcim.0c00381] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Affiliation(s)
- Pablo Herrera-Nieto
- Computational Science Laboratory, Universitat Pompeu Fabra, 08003 Barcelona, Spain
| | - Adrià Pérez
- Computational Science Laboratory, Universitat Pompeu Fabra, 08003 Barcelona, Spain
| | - Gianni De Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Acellera Ltd., 08005 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats, 08010 Barcelona, Spain
| |
|