1
|
Lazar T, Connor A, DeLisle CF, Burger V, Tompa P. Targeting protein disorder: the next hurdle in drug discovery. Nat Rev Drug Discov 2025:10.1038/s41573-025-01220-6. [PMID: 40490488 DOI: 10.1038/s41573-025-01220-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/08/2025] [Indexed: 06/11/2025]
Abstract
Intrinsically disordered proteins have key signalling and regulatory roles in cells and are frequently dysregulated in diseases such as cancer, neurodegeneration, inflammation and autoimmune disorders. Preventing the pathological functions mediated by structural disorder is crucial to successfully target proteins that drive transcription, biomolecular condensation and protein aggregation. However, owing to their heterogeneous, highly dynamic structural states, with ensembles of rapidly interconverting conformations, disordered proteins have been considered largely 'undruggable' by traditional approaches. Here, we review key developments of the field and suggest that the synergy of advanced experimental and computational approaches needs to be pursued to conquer this barrier in drug discovery.
Affiliation(s)
- Tamas Lazar
  - VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnologie (VIB), Brussels, Belgium
  - Structural Biology Brussels, Vrije Universiteit Brussel (VUB), Brussels, Belgium
- Virginia Burger
  - New Equilibrium Biosciences, Boston, MA, USA.
  - Blackbird Laboratories, Baltimore, MD, USA.
- Peter Tompa
  - VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnologie (VIB), Brussels, Belgium.
  - Structural Biology Brussels, Vrije Universiteit Brussel (VUB), Brussels, Belgium.
  - New Equilibrium Biosciences, Boston, MA, USA.
  - Institute of Molecular Life Sciences, HUN-REN Research Centre for Natural Sciences (RCNS), Budapest, Hungary.
  - HUN-REN Office for Supported Research Groups (TKI), Cell Cycle Laboratory, National Institute of Oncology, Budapest, Hungary.
|
2
|
Sun Q, Wang H, Xie J, Wang L, Mu J, Li J, Ren Y, Lai L. Computer-Aided Drug Discovery for Undruggable Targets. Chem Rev 2025. [PMID: 40423592 DOI: 10.1021/acs.chemrev.4c00969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2025]
Abstract
Undruggable targets are those of therapeutical significance but challenging for conventional drug design approaches. Such targets often exhibit unique features, including highly dynamic structures, a lack of well-defined ligand-binding pockets, the presence of highly conserved active sites, and functional modulation by protein-protein interactions. Recent advances in computational simulations and artificial intelligence have revolutionized the drug design landscape, giving rise to innovative strategies for overcoming these obstacles. In this review, we highlight the latest progress in computational approaches for drug design against undruggable targets, present several successful case studies, and discuss remaining challenges and future directions. Special emphasis is placed on four primary target categories: intrinsically disordered proteins, protein allosteric regulation, protein-protein interactions, and protein degradation, along with discussion of emerging target types. We also examine how AI-driven methodologies have transformed the field, from applications in protein-ligand complex structure prediction and virtual screening to de novo ligand generation for undruggable targets. Integration of computational methods with experimental techniques is expected to bring further breakthroughs to overcome the hurdles of undruggable targets. As the field continues to evolve, these advancements hold great promise to expand the druggable space, offering new therapeutic opportunities for previously untreatable diseases.
Affiliation(s)
- Qi Sun
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
  - Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Chengdu, Sichuan 610213, China
- Hanping Wang
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Juan Xie
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Liying Wang
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Junxi Mu
  - Peking-Tsinghua Center for Life Science, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Junren Li
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Yuhao Ren
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Luhua Lai
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Peking-Tsinghua Center for Life Science, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
  - Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Chengdu, Sichuan 610213, China
  - Research Unit of Drug Design Method, Chinese Academy of Medical Sciences, Peking University, Beijing 100871, China
|
3
|
Karatzas P, Brotzakis ZF, Sarimveis H. Small Molecules Targeting the Structural Dynamics of AR-V7 Partially Disordered Proteins Using Deep Ensemble Docking. J Chem Theory Comput 2025; 21:4898-4909. [PMID: 40231860 PMCID: PMC12080126 DOI: 10.1021/acs.jctc.5c00171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2025] [Revised: 04/05/2025] [Accepted: 04/07/2025] [Indexed: 04/16/2025]
Abstract
The extensive conformational dynamics of partially disordered proteins hinders the efficiency of traditional in-silico structure-based drug discovery approaches due to the challenge of screening large chemical spaces of compounds, albeit with an excessive number of transient binding sites, quickly making this problem intractable. In this study, using the monomer of the AR-V7 transcription factor splicing variant related to prostate cancer as a test case, we present a deep ensemble docking pipeline that accelerates the screening of small molecule binders targeting partially disordered proteins at functional regions. By swiftly identifying the conformational ensemble of AR-V7 and reducing the dimension of binding sites by a factor of 90, we identify functionally relevant binding sites along the AR-V7 structural ensemble at phase separation-prone regions that have been experimentally shown to contribute to enhanced transcription activity and the onset of tumor growth. Following this, we combine physics-based molecular docking and multiobjective classification machine learning models to speed up the screening for binders in a larger chemical space able to target these functional multiple binding sites of AR-V7. This step increases the multibinding site hit rate of small molecules by a factor of 17 compared to naive molecular docking. Finally, assessing in atomistic molecular dynamics the effect of a selected binder on AR-V7 dynamics, we find that in the presence of the ChEMBL22003 compound, AR-V7 exhibits less conformational entropy, smaller solvent exposure of phase separation-prone regions, and higher solvent exposure of other protein regions, promoting this compound as a potential AR-V7 phase separation modulator.
Affiliation(s)
- Pantelis Karatzas
  - School of Chemical Engineering, National Technical University of Athens, 9 Heroon Polytechniou Street, Athens 15780, Greece
- Z. Faidon Brotzakis
  - Institute of Bioinnovation (IBI), Biomedical Science Research Center Alexander Fleming, 34 Fleming Street, Vari 16672, Greece
  - Centre for Misfolding Diseases, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
- Haralambos Sarimveis
  - School of Chemical Engineering, National Technical University of Athens, 9 Heroon Polytechniou Street, Athens 15780, Greece
|
4
|
Wankowicz SA, Fraser JS. Advances in uncovering the mechanisms of macromolecular conformational entropy. Nat Chem Biol 2025; 21:623-634. [PMID: 40275100 PMCID: PMC12103944 DOI: 10.1038/s41589-025-01879-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Accepted: 03/10/2025] [Indexed: 04/26/2025]
Abstract
During protein folding, proteins transition from a disordered polymer into a globular structure, markedly decreasing their conformational degrees of freedom, leading to a substantial reduction in entropy. Nonetheless, folded proteins retain substantial entropy as they fluctuate between the conformations that make up their native state. This residual entropy contributes to crucial functions like binding and catalysis, supported by growing evidence primarily from NMR and simulation studies. Here, we propose three major ways that macromolecules use conformational entropy to perform their functions; first, prepaying entropic cost through ordering of the ground state; second, spatially redistributing entropy, in which a decrease in entropy in one area is reciprocated by an increase in entropy elsewhere; third, populating catalytically competent ensembles, in which conformational entropy within the enzymatic scaffold aids in lowering transition state barriers. We also provide our perspective on how solving the current challenge of structurally defining the ensembles encoding conformational entropy will lead to new possibilities for controlling binding, catalysis and allostery.
Affiliation(s)
- Stephanie A Wankowicz
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA.
- James S Fraser
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA.
|
5
|
Vargas-Rosales PA, Caflisch A. The physics-AI dialogue in drug design. RSC Med Chem 2025; 16:1499-1515. [PMID: 39906313 PMCID: PMC11788922 DOI: 10.1039/d4md00869c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2024] [Accepted: 01/16/2025] [Indexed: 02/06/2025] Open
Abstract
A long path has led from the determination of the first protein structure in 1960 to the recent breakthroughs in protein science. Protein structure prediction and design methodologies based on machine learning (ML) have been recognized with the 2024 Nobel prize in Chemistry, but they would not have been possible without previous work and the input of many domain scientists. Challenges remain in the application of ML tools for the prediction of structural ensembles and their usage within the software pipelines for structure determination by crystallography or cryogenic electron microscopy. In the drug discovery workflow, ML techniques are being used in diverse areas such as scoring of docked poses, or the generation of molecular descriptors. As the ML techniques become more widespread, novel applications emerge which can profit from the large amounts of data available. Nevertheless, it is essential to balance the potential advantages against the environmental costs of ML deployment to decide if and when it is best to apply it. For hit to lead optimization ML tools can efficiently interpolate between compounds in large chemical series but free energy calculations by molecular dynamics simulations seem to be superior for designing novel derivatives. Importantly, the potential complementarity and/or synergism of physics-based methods (e.g., force field-based simulation models) and data-hungry ML techniques is growing strongly. Current ML methods have evolved from decades of research. It is now necessary for biologists, physicists, and computer scientists to fully understand advantages and limitations of ML techniques to ensure that the complementarity of physics-based methods and ML tools can be fully exploited for drug design.
Affiliation(s)
- Amedeo Caflisch
  - Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
|
6
|
Sil S, Datta I, Basu S. Use of AI-methods over MD simulations in the sampling of conformational ensembles in IDPs. Front Mol Biosci 2025; 12:1542267. [PMID: 40264953 PMCID: PMC12011600 DOI: 10.3389/fmolb.2025.1542267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2024] [Accepted: 03/17/2025] [Indexed: 04/24/2025] Open
Abstract
Intrinsically Disordered Proteins (IDPs) challenge traditional structure-function paradigms by existing as dynamic ensembles rather than stable tertiary structures. Capturing these ensembles is critical to understanding their biological roles, yet Molecular Dynamics (MD) simulations, though accurate and widely used, are computationally expensive and struggle to sample rare, transient states. Artificial intelligence (AI) offers a transformative alternative, with deep learning (DL) enabling efficient and scalable conformational sampling. They leverage large-scale datasets to learn complex, non-linear, sequence-to-structure relationships, allowing for the modeling of conformational ensembles in IDPs without the constraints of traditional physics-based approaches. Such DL approaches have been shown to outperform MD in generating diverse ensembles with comparable accuracy. Most models rely primarily on simulated data for training and experimental data serves a critical role in validation, aligning the generated conformational ensembles with observable physical and biochemical properties. However, challenges remain, including dependence on data quality, limited interpretability, and scalability for larger proteins. Hybrid approaches combining AI and MD can bridge the gaps by integrating statistical learning with thermodynamic feasibility. Future directions include incorporating physics-based constraints and learning experimental observables into DL frameworks to refine predictions and enhance applicability. AI-driven methods hold significant promise in IDP research, offering novel insights into protein dynamics and therapeutic targeting while overcoming the limitations of traditional MD simulations.
Affiliation(s)
- Souradeep Sil
  - Department of Genetics, Osmania University, Hyderabad, India
- Ishita Datta
  - Department of Genetics and Plant Breeding, Banaras Hindu University, Varanasi, India
- Sankar Basu
  - Department of Microbiology, Asutosh College (Affiliated with University of Calcutta), Kolkata, India
|
7
|
Kalakoti Y, Sanjeev A, Wallner B. Prediction of structural variation. Curr Opin Struct Biol 2025; 91:103003. [PMID: 39983409 DOI: 10.1016/j.sbi.2025.103003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2024] [Revised: 01/15/2025] [Accepted: 01/26/2025] [Indexed: 02/23/2025]
Abstract
Proteins are dynamic molecules that transition between conformational states to perform their functions, and characterizing the protein ensemble is important for understanding biology and therapeutic applications. While recent breakthroughs in machine learning have enabled the prediction of high-quality static models of individual proteins, generating reliable estimates of their conformational ensembles remains a challenge. Several recent methods have tried to utilize the evolutionary and structural features captured by effective sequence-to-structure models to enhance conformational diversity in generated models. Most of these approaches involve adapting existing inference pipelines, such as AlphaFold 2, combined with sampling techniques to induce the generation of diverse conformational states. Here, we describe the general problem of predicting structural variations in protein systems, explain the methods designed to address this challenge, explore why they are effective, discuss their limitations, and suggest potential future directions.
Affiliation(s)
- Yogesh Kalakoti
  - Linköping University, Division of Bioinformatics, Department of Physics, Chemistry and Biology, Linköping, 58183, Sweden
- Airy Sanjeev
  - Linköping University, Division of Bioinformatics, Department of Physics, Chemistry and Biology, Linköping, 58183, Sweden
- Björn Wallner
  - Linköping University, Division of Bioinformatics, Department of Physics, Chemistry and Biology, Linköping, 58183, Sweden.
|
8
|
Aranganathan A, Gu X, Wang D, Vani BP, Tiwary P. Modeling Boltzmann-weighted structural ensembles of proteins using artificial intelligence-based methods. Curr Opin Struct Biol 2025; 91:103000. [PMID: 39923288 PMCID: PMC12011212 DOI: 10.1016/j.sbi.2025.103000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2024] [Revised: 01/09/2025] [Accepted: 01/20/2025] [Indexed: 02/11/2025]
Abstract
This review highlights recent advances in AI-driven methods for generating Boltzmann-weighted structural ensembles, which are crucial for understanding biomolecular dynamics and drug discovery. With the rise of deep learning models such as AlphaFold2, there has been a shift toward more accurate and efficient sampling of structural ensembles. The review discusses the integration of AI with traditional molecular dynamics techniques as well as experiments, the challenges of conformational sampling, and future directions for AI-driven research in structural biology, particularly in drug discovery and protein dynamics.
Affiliation(s)
- Akashnathan Aranganathan
  - Biophysics Program, University of Maryland, College Park, 20742, MD, USA; Institute of Physical Science and Technology, University of Maryland, College Park, 20742, MD, USA
- Xinyu Gu
  - Institute of Physical Science and Technology, University of Maryland, College Park, 20742, MD, USA; University of Maryland Institute for Health Computing, Bethesda, 20852, MD, USA.
- Dedi Wang
  - Genentech, 1 DNA Way, South San Francisco, 94080, CA, USA
- Bodhi P Vani
  - Genentech, 1 DNA Way, South San Francisco, 94080, CA, USA
- Pratyush Tiwary
  - Institute of Physical Science and Technology, University of Maryland, College Park, 20742, MD, USA; University of Maryland Institute for Health Computing, Bethesda, 20852, MD, USA; Department of Chemistry and Biochemistry, University of Maryland, College Park, 20742, MD, USA.
|
9
|
Schafer JW, Porter LL. AlphaFold2's training set powers its predictions of some fold-switched conformations. Protein Sci 2025; 34:e70105. [PMID: 40130805 PMCID: PMC11934219 DOI: 10.1002/pro.70105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2024] [Revised: 02/04/2025] [Accepted: 03/07/2025] [Indexed: 03/26/2025]
Abstract
AlphaFold2 (AF2), a deep-learning-based model that predicts protein structures from their amino acid sequences, has recently been used to predict multiple protein conformations. In some cases, AF2 has successfully predicted both dominant and alternative conformations of fold-switching proteins, which remodel their secondary and/or tertiary structures in response to cellular stimuli. Whether AF2 has learned enough protein folding principles to reliably predict alternative conformations outside of its training set is unclear. Previous work suggests that AF2 predicted these alternative conformations by memorizing them during training. Here, we use CFold, an implementation of the AF2 network trained on a more limited subset of experimentally determined protein structures, to directly test how well the AF2 architecture predicts alternative conformations of fold switchers outside of its training set. We tested CFold on eight fold switchers from six protein families. These proteins, whose secondary structures switch between α-helix and β-sheet and/or whose hydrogen bonding networks are reconfigured dramatically, had not been tested previously, and only one of their alternative conformations was in CFold's training set. Successful CFold predictions would indicate that the AF2 architecture can predict disparate alternative conformations of fold-switched conformations outside of its training set, while unsuccessful predictions would suggest that AF2 predictions of these alternative conformations likely arise from association with structures learned during training. Despite sampling 1300-4300 structures/protein with various sequence sampling techniques, CFold predicted only one alternative structure outside of its training set accurately and with high confidence while also generating experimentally inconsistent structures with higher confidence. Though these results indicate that AF2's current success in predicting alternative conformations of fold switchers stems largely from its training data, results from a sequence pruning technique suggest developments that could lead to a more reliable generative model in the future.
Affiliation(s)
- Joseph W. Schafer
  - National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland, USA
- Lauren L. Porter
  - National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland, USA
  - National Heart, Lung, and Blood Institute, Biochemistry and Biophysics Center, National Institutes of Health, Bethesda, Maryland, USA
|
10
|
Janson G, Jussupow A, Feig M. Deep generative modeling of temperature-dependent structural ensembles of proteins. bioRxiv 2025:2025.03.09.642148. [PMID: 40161645 PMCID: PMC11952339 DOI: 10.1101/2025.03.09.642148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
Deep learning has revolutionized protein structure prediction, but capturing conformational ensembles and structural variability remains an open challenge. While molecular dynamics (MD) is the foundation method for simulating biomolecular dynamics, it is computationally expensive. Recently, deep learning models trained on MD have made progress in generating structural ensembles at reduced cost. However, they remain limited in modeling atomistic details and, crucially, incorporating the effect of environmental factors. Here, we present aSAM (atomistic structural autoencoder model), a latent diffusion model trained on MD to generate heavy atom protein ensembles. Unlike most methods, aSAM models atoms in a latent space, greatly facilitating accurate sampling of side chain and backbone torsion angle distributions. Additionally, we extended aSAM into the first reported transferable generator conditioned on temperature, named aSAMt. Trained on the large and open mdCATH dataset, aSAMt captures temperature-dependent ensemble properties and demonstrates generalization beyond training temperatures. By comparing aSAMt ensembles to long MD simulations of fast folding proteins, we find that high-temperature training enhances the ability of deep generators to explore energy landscapes. Finally, we also show that our MD-based aSAMt can already capture experimentally observed thermal behavior of proteins. Our work is a step towards generalizable ensemble generation to complement physics-based approaches.
Affiliation(s)
- Giacomo Janson
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Alexander Jussupow
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Michael Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
|
11
|
Brotzakis ZF, Zhang S, Murtada MH, Vendruscolo M. AlphaFold prediction of structural ensembles of disordered proteins. Nat Commun 2025; 16:1632. [PMID: 39952928 PMCID: PMC11829000 DOI: 10.1038/s41467-025-56572-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 03/06/2023] [Accepted: 01/23/2025] [Indexed: 02/17/2025] Open
Abstract
Deep learning methods of predicting protein structures have reached an accuracy comparable to that of high-resolution experimental methods. It is thus possible to generate accurate models of the native states of hundreds of millions of proteins. An open question, however, concerns whether these advances can be translated to disordered proteins, which should be represented as structural ensembles because of their heterogeneous and dynamical nature. To address this problem, we introduce the AlphaFold-Metainference method to use AlphaFold-derived distances as structural restraints in molecular dynamics simulations to construct structural ensembles of ordered and disordered proteins. The results obtained using AlphaFold-Metainference illustrate the possibility of making predictions of the conformational properties of disordered proteins using deep learning methods trained on the large structural databases available for folded proteins.
Affiliation(s)
- Z Faidon Brotzakis
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
  - Institute for Bioinnovation, Biomedical Sciences Research Center "Alexander Fleming", 16672, Vari, Greece
- Shengyu Zhang
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Mhd Hussein Murtada
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK
- Michele Vendruscolo
  - Centre for Misfolding Diseases, Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK.
|
12
|
Bemelmans MP, Cournia Z, Damm-Ganamet KL, Gervasio FL, Pande V. Computational advances in discovering cryptic pockets for drug discovery. Curr Opin Struct Biol 2025; 90:102975. [PMID: 39778412 DOI: 10.1016/j.sbi.2024.102975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2024] [Revised: 11/27/2024] [Accepted: 12/06/2024] [Indexed: 01/11/2025]
Abstract
A number of promising therapeutic target proteins have been considered "undruggable" due to the lack of well-defined ligandable pockets. Substantial research in protein dynamics has elucidated the existence of "cryptic" pockets that only exist transiently and become favorable for binding in the presence of a ligand. These pockets provide an avenue to target challenging proteins, inspiring the development of multiple computational methods. This review highlights established cryptic pocket modeling approaches like mixed solvent molecular dynamics and recent applications of enhanced sampling and AI-based methods in therapeutically relevant proteins.
Affiliation(s)
- Martijn P Bemelmans
  - Computer-Aided Drug Design, In Silico Discovery, Therapeutics Discovery, Johnson & Johnson Innovative Medicine, Turnhoutseweg 30, 2340 Beerse, Belgium; School of Pharmaceutical Sciences, University of Geneva, Rue Michel Servet 1, Geneva, 1206, Switzerland
- Zoe Cournia
  - Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephesiou, Athens 11527, Greece
- Kelly L Damm-Ganamet
  - Computer-Aided Drug Design, In Silico Discovery, Therapeutics Discovery, Johnson & Johnson Innovative Medicine, 3210 Merryfield Row, San Diego, CA 92121, United States
- Francesco L Gervasio
  - School of Pharmaceutical Sciences, University of Geneva, Rue Michel Servet 1, Geneva, 1206, Switzerland.
- Vineet Pande
  - Computer-Aided Drug Design, In Silico Discovery, Therapeutics Discovery, Johnson & Johnson Innovative Medicine, Turnhoutseweg 30, 2340 Beerse, Belgium.
|
13
|
Montserrat-Canals M, Cordara G, Krengel U. Allostery. Q Rev Biophys 2025; 58:e5. [PMID: 39849666 DOI: 10.1017/s0033583524000209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2025]
Abstract
Allostery describes the ability of biological macromolecules to transmit signals spatially through the molecule from an allosteric site – a site that is distinct from orthosteric binding sites of primary, endogenous ligands – to the functional or active site. This review starts with a historical overview and a description of the classical example of allostery – hemoglobin – and other well-known examples (aspartate transcarbamoylase, Lac repressor, kinases, G-protein-coupled receptors, adenosine triphosphate synthase, and chaperonin). We then discuss fringe examples of allostery, including intrinsically disordered proteins and inter-enzyme allostery, and the influence of dynamics, entropy, and conformational ensembles and landscapes on allosteric mechanisms, to capture the essence of the field. Thereafter, we give an overview over central methods for investigating molecular mechanisms, covering experimental techniques as well as simulations and artificial intelligence (AI)-based methods. We conclude with a review of allostery-based drug discovery, with its challenges and opportunities: with the recent advent of AI-based methods, allosteric compounds are set to revolutionize drug discovery and medical treatments.
Affiliation(s)
- Mateu Montserrat-Canals
  - Department of Chemistry, University of Oslo, Oslo, Norway
  - Hylleraas Centre for Quantum Molecular Sciences, University of Oslo, Oslo, Norway
- Gabriele Cordara
  - Department of Chemistry, University of Oslo, Oslo, Norway
  - Hylleraas Centre for Quantum Molecular Sciences, University of Oslo, Oslo, Norway
- Ute Krengel
  - Department of Chemistry, University of Oslo, Oslo, Norway
  - Hylleraas Centre for Quantum Molecular Sciences, University of Oslo, Oslo, Norway
|
14
|
van Aalst EJ, Wylie BJ. An in silico framework to visualize how cancer-associated mutations influence structural plasticity of the chemokine receptor CCR3. Protein Sci 2025; 34:e70013. [PMID: 39723881 DOI: 10.1002/pro.70013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Revised: 11/06/2024] [Accepted: 12/12/2024] [Indexed: 12/28/2024]
Abstract
G protein Coupled Receptors (GPCRs) are the largest family of cell surface receptors in humans. Somatic mutations in GPCRs are implicated in cancer progression and metastasis, but mechanisms are poorly understood. Emerging evidence implicates perturbation of intra-receptor activation pathway motifs whereby extracellular signals are transmitted intracellularly. Recently, sufficiently sensitive methodology was described to calculate structural strain as a function of missense mutations in AlphaFold-predicted model structures, which was extensively validated on experimental and predicted structural datasets. When paired with Molecular Dynamics (MD) simulations, these tools provide a facile approach to screen mutations in silico. We applied this framework to calculate the structural and dynamic effects of cancer-associated mutations in the chemokine receptor CCR3, a Class A GPCR involved in cancer and autoimmune disorders. Residue-residue contact scoring refined effective strain results, highlighting significant remodeling of inter- and intra-motif contacts along the highly conserved GPCR activation pathway network. We then integrated AlphaFold-derived predicted Local Distance Difference Test scores with per-residue Root Mean Square Fluctuations and activation pathway Contact Analysis (CONAN) from coarse grain MD simulations to identify statistically significant changes in receptor dynamics upon mutation. Finally, analysis of negative control mutants suggests false positive results in AlphaFold pipelines should be considered but can be mitigated with stricter control of statistical analysis. Our results indicate selected mutants influence structural plasticity of CCR3 related to ligand interaction, activation, and G protein coupling, using a framework that could be applicable to a wide range of biochemically relevant protein targets following further validation.
Affiliation(s)
- Evan J van Aalst
- Department of Chemistry and Biochemistry, Texas Tech University, Lubbock, Texas, USA
- Benjamin J Wylie
- Department of Chemistry and Biochemistry, Texas Tech University, Lubbock, Texas, USA
15
O'Donnell TJ, Kanduri C, Isacchini G, Limenitakis JP, Brachman RA, Alvarez RA, Haff IH, Sandve GK, Greiff V. Reading the repertoire: Progress in adaptive immune receptor analysis using machine learning. Cell Syst 2024; 15:1168-1189. [PMID: 39701034 DOI: 10.1016/j.cels.2024.11.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Revised: 08/16/2024] [Accepted: 11/14/2024] [Indexed: 12/21/2024]
Abstract
The adaptive immune system holds invaluable information on past and present immune responses in the form of B and T cell receptor sequences, but we are limited in our ability to decode this information. Machine learning approaches are under active investigation for a range of tasks relevant to understanding and manipulating the adaptive immune receptor repertoire, including matching receptors to the antigens they bind, generating antibodies or T cell receptors for use as therapeutics, and diagnosing disease based on patient repertoires. Progress on these tasks has the potential to substantially improve the development of vaccines, therapeutics, and diagnostics, as well as advance our understanding of fundamental immunological principles. We outline key challenges for the field, highlighting the need for software benchmarking, targeted large-scale data generation, and coordinated research efforts.
Affiliation(s)
- Chakravarthi Kanduri
- Department of Informatics, University of Oslo, Oslo, Norway; UiO:RealArt Convergence Environment, University of Oslo, Oslo, Norway
- Rebecca A Brachman
- Imprint Labs, LLC, New York, NY, USA; Cornell Tech, Cornell University, New York, NY, USA
- Ingrid H Haff
- Department of Mathematics, University of Oslo, 0371 Oslo, Norway
- Geir K Sandve
- Department of Informatics, University of Oslo, Oslo, Norway; UiO:RealArt Convergence Environment, University of Oslo, Oslo, Norway
- Victor Greiff
- Imprint Labs, LLC, New York, NY, USA; Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway.
16
Wang D, Tiwary P. Augmenting Human Expertise in Weighted Ensemble Simulations through Deep Learning-Based Information Bottleneck. J Chem Theory Comput 2024; 20:10371-10383. [PMID: 39589127 DOI: 10.1021/acs.jctc.4c00919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2024]
Abstract
The weighted ensemble (WE) method stands out as a widely used segment-based sampling technique renowned for its rigorous treatment of kinetics. The WE framework typically involves initially mapping the configuration space onto a low-dimensional collective variable (CV) space and then partitioning it into bins. The efficacy of WE simulations heavily depends on the selection of CVs and binning schemes. The recently proposed state predictive information bottleneck (SPIB) method has emerged as a promising tool for automatically constructing CVs from data and guiding enhanced sampling through an iterative manner. In this work, we advance this data-driven pipeline by incorporating prior expert knowledge. Our hybrid approach combines SPIB-learned CVs to enhance sampling in explored regions with expert-based CVs to guide exploration in regions of interest, synergizing the strengths of both methods. Through benchmarking on alanine dipeptide and chignolin systems, we demonstrate that our hybrid approach effectively guides WE simulations to sample states of interest and reduces run-to-run variances. Moreover, our integration of the SPIB model also enhances the analysis and interpretation of WE simulation data by effectively identifying metastable states and pathways and offering direct visualization of dynamics.
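The binning-and-resampling step that this abstract attributes to the WE framework can be illustrated with a short sketch. It assumes a one-dimensional collective variable, fixed bin edges, and a simple split-heaviest / merge-lightest rule; the names `bin_index` and `resample` and the toy walkers are hypothetical, and this is not the authors' SPIB-guided scheme or the API of any WE package.

```python
import random
from collections import defaultdict


def bin_index(cv, edges):
    """Return the index of the bin [edges[i], edges[i+1]) that contains cv."""
    for i in range(len(edges) - 1):
        if edges[i] <= cv < edges[i + 1]:
            return i
    raise ValueError("collective variable outside the binned range")


def resample(walkers, edges, target, rng):
    """Split or merge walkers so each occupied bin holds `target` walkers.

    `walkers` is a list of (cv, weight) pairs. Total weight is conserved.
    """
    bins = defaultdict(list)
    for cv, weight in walkers:
        bins[bin_index(cv, edges)].append((cv, weight))

    out = []
    for members in bins.values():
        members = list(members)
        # Too few walkers: split the heaviest one into two equal halves.
        while len(members) < target:
            members.sort(key=lambda m: m[1])
            cv, weight = members.pop()
            members += [(cv, weight / 2), (cv, weight / 2)]
        # Too many walkers: merge the two lightest, keeping one of them
        # with probability proportional to its weight.
        while len(members) > target:
            members.sort(key=lambda m: m[1])
            (cv_a, w_a), (cv_b, w_b) = members[0], members[1]
            total = w_a + w_b
            keep = cv_a if rng.random() < w_a / total else cv_b
            members = [(keep, total)] + members[2:]
        out.extend(members)
    return out


# Toy ensemble: four walkers in the first bin, one in the second.
walkers = [(0.1, 0.4), (0.2, 0.3), (0.25, 0.1), (0.3, 0.1), (0.9, 0.1)]
edges = [0.0, 0.5, 1.0]
resampled = resample(walkers, edges, target=2, rng=random.Random(0))
print(resampled)
```

Because splitting and merging only redistribute weight, the probabilities carried by the walkers remain statistically unbiased, which is what underlies the rigorous treatment of kinetics mentioned above. The choice of collective variable and bin edges is exactly the part the abstract proposes to learn or guide.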
Affiliation(s)
- Dedi Wang
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
- Pratyush Tiwary
- Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, United States
- University of Maryland Institute for Health Computing, Bethesda, Maryland 20852, United States
17
Harding-Larsen D, Funk J, Madsen NG, Gharabli H, Acevedo-Rocha CG, Mazurenko S, Welner DH. Protein representations: Encoding biological information for machine learning in biocatalysis. Biotechnol Adv 2024; 77:108459. [PMID: 39366493 DOI: 10.1016/j.biotechadv.2024.108459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Revised: 09/19/2024] [Accepted: 09/29/2024] [Indexed: 10/06/2024]
Abstract
Enzymes offer a more environmentally friendly and low-impact solution to conventional chemistry, but they often require additional engineering for their application in industrial settings, an endeavour that is challenging and laborious. To address this issue, the power of machine learning can be harnessed to produce predictive models that enable the in silico study and engineering of improved enzymatic properties. Such machine learning models, however, require the conversion of the complex biological information to a numerical input, also called protein representations. These inputs demand special attention to ensure the training of accurate and precise models, and, in this review, we therefore examine the critical step of encoding protein information to numeric representations for use in machine learning. We selected the most important approaches for encoding the three distinct biological protein representations - primary sequence, 3D structure, and dynamics - to explore their requirements for employment and inductive biases. Combined representations of proteins and substrates are also introduced as emergent tools in biocatalysis. We propose the division of fixed representations, a collection of rule-based encoding strategies, and learned representations extracted from the latent spaces of large neural networks. To select the most suitable protein representation, we propose two main factors to consider. The first one is the model setup, which is influenced by the size of the training dataset and the choice of architecture. The second factor is the model objectives such as consideration about the assayed property, the difference between wild-type models and mutant predictors, and requirements for explainability. This review is aimed at serving as a source of information and guidance for properly representing enzymes in future machine learning models for biocatalysis.
Affiliation(s)
- David Harding-Larsen
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Jonathan Funk
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Niklas Gesmar Madsen
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Hani Gharabli
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Carlos G Acevedo-Rocha
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark
- Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Ditte Hededam Welner
- The Novo Nordisk Center for Biosustainability, Technical University of Denmark, Søltofts Plads, Bygning 220, 2800 Kgs. Lyngby, Denmark.
18
Wang D, Tiwary P. Augmenting Human Expertise in Weighted Ensemble Simulations through Deep Learning based Information Bottleneck. ARXIV 2024:arXiv:2406.14839v2. [PMID: 38947925 PMCID: PMC11213147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
The weighted ensemble (WE) method stands out as a widely used segment-based sampling technique renowned for its rigorous treatment of kinetics. The WE framework typically involves initially mapping the configuration space onto a low-dimensional collective variable (CV) space and then partitioning it into bins. The efficacy of WE simulations heavily depends on the selection of CVs and binning schemes. The recently proposed State Predictive Information Bottleneck (SPIB) method has emerged as a promising tool for automatically constructing CVs from data and guiding enhanced sampling through an iterative manner. In this work, we advance this data-driven pipeline by incorporating prior expert knowledge. Our hybrid approach combines SPIB-learned CVs to enhance sampling in explored regions with expert-based CVs to guide exploration in regions of interest, synergizing the strengths of both methods. Through benchmarking on alanine dipeptide and chignolin systems, we demonstrate that our hybrid approach effectively guides WE simulations to sample states of interest, and reduces run-to-run variances. Moreover, our integration of the SPIB model also enhances the analysis and interpretation of WE simulation data by effectively identifying metastable states and pathways, and offering direct visualization of dynamics.
Affiliation(s)
- Dedi Wang
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA
- Pratyush Tiwary
- Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA
- University of Maryland Institute for Health Computing, Bethesda 20852, USA
19
Riccabona JR, Spoendlin FC, Fischer ALM, Loeffler JR, Quoika PK, Jenkins TP, Ferguson JA, Smorodina E, Laustsen AH, Greiff V, Forli S, Ward AB, Deane CM, Fernández-Quintero ML. Assessing AF2's ability to predict structural ensembles of proteins. Structure 2024; 32:2147-2159.e2. [PMID: 39332396 DOI: 10.1016/j.str.2024.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 08/07/2024] [Accepted: 09/02/2024] [Indexed: 09/29/2024]
Abstract
Recent breakthroughs in protein structure prediction have enhanced the precision and speed at which protein configurations can be determined. Additionally, molecular dynamics (MD) simulations serve as a crucial tool for capturing the conformational space of proteins, providing valuable insights into their structural fluctuations. However, the scope of MD simulations is often limited by the accessible timescales and the computational resources available, posing challenges to comprehensively exploring protein behaviors. Recently emerging approaches have focused on expanding the capability of AlphaFold2 (AF2) to predict conformational substates of protein. Here, we benchmark the performance of various workflows that have adapted AF2 for ensemble prediction and compare the obtained structures with ensembles obtained from MD simulations and NMR. We provide an overview of the levels of performance and accessible timescales that can currently be achieved with machine learning (ML) based ensemble generation. Significant minima of the free energy surfaces remain undetected.
Affiliation(s)
- Jakob R Riccabona
- Center for Molecular Biosciences Innsbruck, Department of General, Inorganic and Theoretical Chemistry, University of Innsbruck, Innsbruck, Austria
- Fabian C Spoendlin
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
- Anna-Lena M Fischer
- Center for Molecular Biosciences Innsbruck, Department of General, Inorganic and Theoretical Chemistry, University of Innsbruck, Innsbruck, Austria
- Johannes R Loeffler
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Patrick K Quoika
- Center for Functional Protein Assemblies, Technical University of Munich, Ernst-Otto-Fischer-Str. 8, 85748 Garching, Germany
- Timothy P Jenkins
- Department of Biotechnology and Biomedicine, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
- James A Ferguson
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Eva Smorodina
- Department of Immunology, University of Oslo, Oslo, Norway
- Andreas H Laustsen
- Department of Biotechnology and Biomedicine, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
- Victor Greiff
- Department of Immunology, University of Oslo, Oslo, Norway
- Stefano Forli
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Andrew B Ward
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA.
- Charlotte M Deane
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, UK.
- Monica L Fernández-Quintero
- Center for Molecular Biosciences Innsbruck, Department of General, Inorganic and Theoretical Chemistry, University of Innsbruck, Innsbruck, Austria; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA; Department of Biotechnology and Biomedicine, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark.
20
Georgouli K, Stephany RR, Tempkin JOB, Santiago C, Aydin F, Heimann MA, Pottier L, Zhang X, Carpenter TS, Hsu T, Nissley DV, Streitz FH, Lightstone FC, Ingolfsson HI, Bremer PT. Generating Protein Structures for Pathway Discovery Using Deep Learning. J Chem Theory Comput 2024; 20:8795-8806. [PMID: 39388723 PMCID: PMC11500303 DOI: 10.1021/acs.jctc.4c00816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Revised: 09/27/2024] [Accepted: 09/30/2024] [Indexed: 10/12/2024]
Abstract
Resolving the intricate details of biological phenomena at the molecular level is fundamentally limited by both length- and time scales that can be probed experimentally. Molecular dynamics (MD) simulations at various scales are powerful tools frequently employed to offer valuable biological insights beyond experimental resolution. However, while it is relatively simple to observe long-lived, stable configurations of, for example, proteins, at the required spatial resolution, simulating the more interesting rare transitions between such states often takes orders of magnitude longer than what is feasible even on the largest supercomputers available today. One common aspect of this challenge is pathway discovery, where the start and end states of a scientific phenomenon are known or can be approximated, but the mechanistic details in between are unknown. Here, we propose a representation-learning-based solution that uses interpolation and extrapolation in an abstract representation space to synthesize potential transition states, which are automatically validated using MD simulations. The new simulations of the synthesized transition states are subsequently incorporated into the representation learning, leading to an iterative framework for targeted path sampling. Our approach is demonstrated by recovering the transition of a RAS-RAF protein domain (CRD) from membrane-free to interacting with the membrane using coarse-grain MD simulations.
Affiliation(s)
- Konstantia Georgouli
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore 94550, California, United States
- Robert R. Stephany
- Center for Applied Mathematics, Cornell University, Ithaca 14853, New York, United States
- Jeremy O. B. Tempkin
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore 94550, California, United States
- Claudio Santiago
- Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore 94550, California, United States
- Fikret Aydin
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore 94550, California, United States
- Mark A. Heimann
- Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore 94550, California, United States
- Loïc Pottier
- Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore 94550, California, United States
- Xiaohua Zhang
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore 94550, California, United States
- Timothy S. Carpenter
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore 94550, California, United States
- Tim Hsu
- Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore 94550, California, United States
- Dwight V. Nissley
- RAS Initiative, The Cancer Research Technology Program, Frederick National Laboratory, Frederick 21701, Maryland, United States
- Frederick H. Streitz
- Computing Directorate, Lawrence Livermore National Laboratory, Livermore 94550, California, United States
- Felice C. Lightstone
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore 94550, California, United States
- Helgi I. Ingolfsson
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore 94550, California, United States
- Peer-Timo Bremer
- Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore 94550, California, United States
21
Schafer JW, Porter LL. AlphaFold2's training set powers its predictions of fold-switched conformations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.10.11.617857. [PMID: 39803493 PMCID: PMC11722258 DOI: 10.1101/2024.10.11.617857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/23/2025]
Abstract
AlphaFold2 (AF2), a deep-learning based model that predicts protein structures from their amino acid sequences, has recently been used to predict multiple protein conformations. In some cases, AF2 has successfully predicted both dominant and alternative conformations of fold-switching proteins, which remodel their secondary and tertiary structures in response to cellular stimuli. Whether AF2 has learned enough protein folding principles to reliably predict alternative conformations outside of its training set is unclear. Here, we address this question by assessing whether CFold, an implementation of the AF2 network trained on a more limited subset of experimentally determined protein structures, predicts alternative conformations of eight fold switchers from six protein families. Previous work suggests that AF2 predicted these alternative conformations by memorizing them during training. Unlike AF2, CFold's training set contains only one of these alternative conformations. Despite sampling 1300-4400 structures/protein with various sequence sampling techniques, CFold predicted only one alternative structure outside of its training set accurately and with high confidence while also generating experimentally inconsistent structures with higher confidence. Though these results indicate that AF2's current success in predicting alternative conformations of fold switchers stems largely from its training data, results from a sequence pruning technique suggest developments that could lead to a more reliable generative model in the future.
Affiliation(s)
- Joseph W. Schafer
- National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
- Lauren L. Porter
- National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
- National Heart, Lung, and Blood Institute, Biochemistry and Biophysics Center, National Institutes of Health, Bethesda, MD, 20892, USA
22
Benavides TL, Montelione GT. Integrative Modeling of Protein-Polypeptide Complexes by Bayesian Model Selection using AlphaFold and NMR Chemical Shift Perturbation Data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.09.19.613999. [PMID: 39345459 PMCID: PMC11430059 DOI: 10.1101/2024.09.19.613999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
Protein-polypeptide interactions, including those involving intrinsically-disordered peptides and intrinsically-disordered regions of protein binding partners, are crucial for many biological functions. However, experimental structure determination of protein-peptide complexes can be challenging. Computational methods, while promising, generally require experimental data for validation and refinement. Here we present CSP_Rank, an integrated modeling approach to determine the structures of protein-peptide complexes. This method combines AlphaFold2 (AF2) enhanced sampling methods with a Bayesian conformational selection process based on experimental Nuclear Magnetic Resonance (NMR) Chemical Shift Perturbation (CSP) data and AF2 confidence metrics. Using a curated dataset of 108 protein-peptide complexes from the Biological Magnetic Resonance Data Bank (BMRB), we observe that while AF2 typically yields models with excellent consistency with experimental CSP data, applying enhanced sampling followed by data-guided conformational selection routinely results in ensembles of structures with improved agreement with NMR observables. For two systems, we cross-validate the CSP-selected models using independently acquired nuclear Overhauser effect (NOE) NMR data and demonstrate how CSP and NMR can be combined using our Bayesian framework for model selection. CSP_Rank is a novel method for integrative modeling of protein-peptide complexes and has broad implications for studies of protein-peptide interactions and aiding in understanding their biological functions.
Affiliation(s)
- Tiburon L. Benavides
- Department of Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA
- Gaetano T. Montelione
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA
23
Gu X, Aranganathan A, Tiwary P. Empowering AlphaFold2 for protein conformation selective drug discovery with AlphaFold2-RAVE. eLife 2024; 13:RP99702. [PMID: 39240197 PMCID: PMC11379456 DOI: 10.7554/elife.99702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2024] Open
Abstract
Small-molecule drug design hinges on obtaining co-crystallized ligand-protein structures. Despite AlphaFold2's strides in protein native structure prediction, its focus on apo structures overlooks ligands and associated holo structures. Moreover, designing selective drugs often benefits from the targeting of diverse metastable conformations. Therefore, direct application of AlphaFold2 models in virtual screening and drug discovery remains tentative. Here, we demonstrate an AlphaFold2-based framework combined with all-atom enhanced sampling molecular dynamics and Induced Fit docking, named AF2RAVE-Glide, to conduct computational model-based small-molecule binding of metastable protein kinase conformations, initiated from protein sequences. We demonstrate the AF2RAVE-Glide workflow on three different mammalian protein kinases and their type I and II inhibitors, with special emphasis on binding of known type II kinase inhibitors which target the metastable classical DFG-out state. These states are not easy to sample from AlphaFold2. Here, we demonstrate how with AF2RAVE these metastable conformations can be sampled for different kinases with high enough accuracy to enable subsequent docking of known type II kinase inhibitors with more than 50% success rates across docking calculations. We believe the protocol should be deployable for other kinases and more proteins generally.
Affiliation(s)
- Xinyu Gu
- Institute for Physical Science and Technology, University of Maryland, College Park, United States
- University of Maryland Institute for Health Computing, Bethesda, United States
- Akashnathan Aranganathan
- Institute for Physical Science and Technology, University of Maryland, College Park, United States
- Biophysics Program, University of Maryland, College Park, United States
- Pratyush Tiwary
- Institute for Physical Science and Technology, University of Maryland, College Park, United States
- University of Maryland Institute for Health Computing, Bethesda, United States
- Department of Chemistry and Biochemistry, University of Maryland, College Park, United States
24
Liu ZH, Tsanai M, Zhang O, Forman-Kay J, Head-Gordon T. Computational Methods to Investigate Intrinsically Disordered Proteins and their Complexes. ARXIV 2024:arXiv:2409.02240v1. [PMID: 39279844 PMCID: PMC11398552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/18/2024]
Abstract
In 1999 Wright and Dyson highlighted the fact that large sections of the proteome of all organisms are comprised of protein sequences that lack globular folded structures under physiological conditions. Since then the biophysics community has made significant strides in unraveling the intricate structural and dynamic characteristics of intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs). Unlike crystallographic beamlines and their role in streamlining acquisition of structures for folded proteins, an integrated experimental and computational approach aimed at IDPs/IDRs has emerged. In this Perspective we aim to provide a robust overview of current computational tools for IDPs and IDRs, and most recently their complexes and phase separated states, including statistical models, physics-based approaches, and machine learning methods that permit structural ensemble generation and validation against many solution experimental data types.
Affiliation(s)
- Zi Hao Liu
- Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
- Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Maria Tsanai
- Kenneth S. Pitzer Center for Theoretical Chemistry and Department of Chemistry, University of California, Berkeley, Berkeley, California 94720, USA
- Oufan Zhang
- Kenneth S. Pitzer Center for Theoretical Chemistry and Department of Chemistry, University of California, Berkeley, Berkeley, California 94720, USA
- Julie Forman-Kay
- Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
- Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Teresa Head-Gordon
- Kenneth S. Pitzer Center for Theoretical Chemistry and Department of Chemistry, University of California, Berkeley, Berkeley, California 94720, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, California 94720, USA
25
Vats S, Bobrovs R, Söderhjelm P, Bhakat S. AlphaFold-SFA: Accelerated sampling of cryptic pocket opening, protein-ligand binding and allostery by AlphaFold, slow feature analysis and metadynamics. PLoS One 2024; 19:e0307226. [PMID: 39190764 DOI: 10.1371/journal.pone.0307226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Accepted: 07/02/2024] [Indexed: 08/29/2024] Open
Abstract
Sampling rare events in proteins is crucial for comprehending complex phenomena like cryptic pocket opening, where transient structural changes expose new binding sites. Understanding these rare events also sheds light on protein-ligand binding and allosteric communications, where distant site interactions influence protein function. Traditional unbiased molecular dynamics simulations often fail to sample such rare events, as the free energy barrier between metastable states is large relative to the thermal energy. This renders these events inaccessible on the timescales typically simulated by unbiased molecular dynamics, limiting our understanding of these critical processes. In this paper, we proposed a novel unsupervised learning approach termed as slow feature analysis (SFA) which aims to extract slowly varying features from high-dimensional temporal data. SFA trained on small unbiased molecular dynamics simulations launched from AlphaFold generated conformational ensembles manages to capture rare events governing cryptic pocket opening, protein-ligand binding, and allosteric communications in a kinase. Metadynamics simulations using SFA as collective variables manage to sample 'deep' cryptic pocket opening within a few hundreds of nanoseconds which was beyond the reach of microsecond long unbiased molecular dynamics simulations. SFA augmented metadynamics also managed to capture conformational plasticity of protein upon ligand binding/unbinding and provided novel insights into allosteric communication in receptor-interacting protein kinase 2 (RIPK2) which dictates protein-protein interaction. Taken together, our results show how SFA acts as a dimensionality reduction tool which bridges the gap between AlphaFold, molecular dynamics simulation and metadynamics in context of capturing rare events in biomolecules, extending the scope of structure-based drug discovery in the era of AlphaFold.
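The idea of extracting slowly varying features from high-dimensional temporal data, as described in this abstract, can be illustrated with textbook linear slow feature analysis: whiten the signal, then pick the direction whose time derivative has the smallest variance. The sketch below is a generic version using NumPy, not the AlphaFold-SFA implementation; the function name `linear_sfa` and the two-sinusoid toy signal are assumptions made for illustration.

```python
import numpy as np


def linear_sfa(X, n_components=1):
    """Linear slow feature analysis on a (n_samples, n_features) time series."""
    X = X - X.mean(axis=0)
    # Whiten the data so that every projection has unit variance.
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    whiten = evecs / np.sqrt(evals)
    Z = X @ whiten
    # Covariance of the finite-difference time derivative in whitened space.
    dZ = np.diff(Z, axis=0)
    _, d_evecs = np.linalg.eigh(dZ.T @ dZ / len(dZ))
    # eigh returns eigenvalues in ascending order, so the first columns are
    # the directions that change most slowly in time.
    W = whiten @ d_evecs[:, :n_components]
    return X @ W


# Toy signal: two observed channels, each a mixture of a slow and a fast mode.
t = np.linspace(0.0, 2.0 * np.pi, 2000)
slow, fast = np.sin(t), np.sin(40.0 * t)
X = np.column_stack([slow + 0.5 * fast, fast - 0.3 * slow])

y = linear_sfa(X)[:, 0]
corr = abs(np.corrcoef(y, slow)[0, 1])
print(f"|correlation| with the slow mode: {corr:.3f}")
```

In an enhanced-sampling setting, the leading slow feature would play the role of a collective variable along which a bias such as metadynamics is applied; here the inputs would be structural descriptors from short simulations rather than a synthetic signal.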
Affiliation(s)
- Shray Vats
- Department of Computer Science, University of Texas at Austin, Austin, TX, United States of America
- Pär Söderhjelm
- Division of Biophysical Chemistry, Chemical Center, Lund University, Lund, Sweden
26
Zhou J, Huang M. Navigating the landscape of enzyme design: from molecular simulations to machine learning. Chem Soc Rev 2024; 53:8202-8239. [PMID: 38990263 DOI: 10.1039/d4cs00196f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Global environmental issues and sustainable development call for new technologies for fine chemical synthesis and waste valorization. Biocatalysis has attracted great attention as the alternative to the traditional organic synthesis. However, it is challenging to navigate the vast sequence space to identify those proteins with admirable biocatalytic functions. The recent development of deep-learning based structure prediction methods such as AlphaFold2 reinforced by different computational simulations or multiscale calculations has largely expanded the 3D structure databases and enabled structure-based design. While structure-based approaches shed light on site-specific enzyme engineering, they are not suitable for large-scale screening of potential biocatalysts. Effective utilization of big data using machine learning techniques opens up a new era for accelerated predictions. Here, we review the approaches and applications of structure-based and machine-learning guided enzyme design. We also provide our view on the challenges and perspectives on effectively employing enzyme design approaches integrating traditional molecular simulations and machine learning, and the importance of database construction and algorithm development in attaining predictive ML models to explore the sequence fitness landscape for the design of admirable biocatalysts.
Affiliation(s)
- Jiahui Zhou
- School of Chemistry and Chemical Engineering, Queen's University, David Keir Building, Stranmillis Road, Belfast BT9 5AG, Northern Ireland, UK.
| | - Meilan Huang
- School of Chemistry and Chemical Engineering, Queen's University, David Keir Building, Stranmillis Road, Belfast BT9 5AG, Northern Ireland, UK.
|
27
|
Bowman GR. AlphaFold and Protein Folding: Not Dead Yet! The Frontier Is Conformational Ensembles. Annu Rev Biomed Data Sci 2024; 7:51-57. [PMID: 38603560 PMCID: PMC11892350 DOI: 10.1146/annurev-biodatasci-102423-011435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/13/2024]
Abstract
Like the black knight in the classic Monty Python movie, grand scientific challenges such as protein folding are hard to finish off. Notably, AlphaFold is revolutionizing structural biology by bringing highly accurate structure prediction to the masses and opening up innumerable new avenues of research. Despite this enormous success, calling structure prediction, much less protein folding and related problems, "solved" is dangerous, as doing so could stymie further progress. Imagine what the world would be like if we had declared flight solved after the first commercial airlines opened and stopped investing in further research and development. Likewise, there are still important limitations to structure prediction that we would benefit from addressing. Moreover, we are limited in our understanding of the enormous diversity of different structures a single protein can adopt (called a conformational ensemble) and the dynamics by which a protein explores this space. What is clear is that conformational ensembles are critical to protein function, and understanding this aspect of protein dynamics will advance our ability to design new proteins and drugs.
Affiliation(s)
- Gregory R Bowman
- Departments of Biochemistry and Biophysics and Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
|
28
|
Frasnetti E, Magni A, Castelli M, Serapian SA, Moroni E, Colombo G. Structures, dynamics, complexes, and functions: From classic computation to artificial intelligence. Curr Opin Struct Biol 2024; 87:102835. [PMID: 38744148 DOI: 10.1016/j.sbi.2024.102835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 04/14/2024] [Accepted: 04/22/2024] [Indexed: 05/16/2024]
Abstract
Computational approaches can provide highly detailed insight into the molecular recognition processes that underlie drug binding, the assembly of protein complexes, and the regulation of biological functional processes. Classical simulation methods can bridge a wide range of length- and time-scales typically involved in such processes. Lately, automated learning and artificial intelligence methods have shown the potential to expand the reach of physics-based approaches, ushering in the possibility to model and even design complex protein architectures. The synergy between atomistic simulations and AI methods is an emerging frontier with a huge potential for advances in structural biology. Herein, we explore various examples and frameworks for these approaches, providing select instances and applications that illustrate their impact on fundamental biomolecular problems.
Affiliation(s)
- Elena Frasnetti
- Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
| | - Andrea Magni
- Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
| | - Matteo Castelli
- Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
| | - Stefano A Serapian
- Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
| | | | - Giorgio Colombo
- Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy.
|
29
|
Biriukov D, Vácha R. Pathways to a Shiny Future: Building the Foundation for Computational Physical Chemistry and Biophysics in 2050. ACS Phys Chem Au 2024; 4:302-313. [PMID: 39069976 PMCID: PMC11274290 DOI: 10.1021/acsphyschemau.4c00003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2024] [Revised: 03/15/2024] [Accepted: 03/18/2024] [Indexed: 07/30/2024]
Abstract
In the last quarter-century, the field of molecular dynamics (MD) has undergone a remarkable transformation, propelled by substantial enhancements in software, hardware, and underlying methodologies. In this Perspective, we contemplate the future trajectory of MD simulations and how they might look in the year 2050. We spotlight the pivotal role of artificial intelligence (AI) in shaping the future of MD and the broader field of computational physical chemistry. We outline critical strategies and initiatives that are essential for the seamless integration of such technologies. Our discussion delves into topics like multiscale modeling, adept management of the ever-increasing data deluge, the establishment of centralized simulation databases, and the autonomous refinement, cross-validation, and self-expansion of these repositories. The successful implementation of these advancements requires scientific transparency, a cautiously optimistic approach to interpreting AI-driven simulations and their analysis, and a mindset that prioritizes knowledge-motivated research alongside AI-enhanced big data exploration. While history reminds us that the trajectory of technological progress can be unpredictable, this Perspective offers guidance on preparedness and proactive measures, aiming to steer future advancements in the most beneficial and successful direction.
Affiliation(s)
- Denys Biriukov
- CEITEC - Central European Institute of Technology, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
- National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
| | - Robert Vácha
- CEITEC - Central European Institute of Technology, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
- National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 753/5, 625 00 Brno, Czech Republic
- Department of Condensed Matter Physics, Faculty of Science, Masaryk University, Kotlářská 267/2, 611 37 Brno, Czech Republic
|
30
|
Lee S, Wang D, Seeliger MA, Tiwary P. Calculating Protein-Ligand Residence Times through State Predictive Information Bottleneck Based Enhanced Sampling. J Chem Theory Comput 2024; 20:6341-6349. [PMID: 38991145 PMCID: PMC11990086 DOI: 10.1021/acs.jctc.4c00503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2024]
Abstract
Understanding drug residence times in target proteins is key to improving drug efficacy and understanding target recognition in biochemistry. While drug residence time is just as important as binding affinity, atomic-level understanding of drug residence times through molecular dynamics (MD) simulations has been difficult primarily due to the extremely long time scales. Recent advances in rare event sampling have allowed us to reach these time scales, yet predicting protein-ligand residence times remains a significant challenge. Here we present a semi-automated protocol to calculate the ligand residence times across 12 orders of magnitude of time scales. In our proposed framework, we integrate a deep learning-based method, the state predictive information bottleneck (SPIB), to learn an approximate reaction coordinate (RC) and use it to guide the enhanced sampling method metadynamics. We demonstrate the performance of our algorithm by applying it to six different protein-ligand complexes with available benchmark residence times, including the dissociation of the widely studied anticancer drug Imatinib (Gleevec) from both wild-type Abl kinase and drug-resistant mutants. We show how our protocol can recover quantitatively accurate residence times, potentially opening avenues for deeper insights into drug development possibilities and ligand recognition mechanisms.
Affiliation(s)
- Suemin Lee
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA
| | - Dedi Wang
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA
| | - Markus A. Seeliger
- Department of Pharmacological Sciences, Stony Brook University, Stony Brook, NY 11794-8651, USA
| | - Pratyush Tiwary
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA
- Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA
- University of Maryland Institute for Health Computing, Bethesda, Maryland 20852, USA
|
31
|
Gu X, Aranganathan A, Tiwary P. Empowering AlphaFold2 for protein conformation selective drug discovery with AlphaFold2-RAVE. arXiv 2024:arXiv:2404.07102v3. [PMID: 38659642 PMCID: PMC11042445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Small molecule drug design hinges on obtaining co-crystallized ligand-protein structures. Despite AlphaFold2's strides in protein native structure prediction, its focus on apo structures overlooks ligands and associated holo structures. Moreover, designing selective drugs often benefits from the targeting of diverse metastable conformations. Therefore, direct application of AlphaFold2 models in virtual screening and drug discovery remains tentative. Here, we demonstrate an AlphaFold2 based framework combined with all-atom enhanced sampling molecular dynamics and induced fit docking, named AF2RAVE-Glide, to conduct computational model based small molecule binding of metastable protein kinase conformations, initiated from protein sequences. We demonstrate the AF2RAVE-Glide workflow on three different protein kinases and their type I and II inhibitors, with special emphasis on binding of known type II kinase inhibitors which target the metastable classical DFG-out state. These states are not easy to sample from AlphaFold2. Here we demonstrate how with AF2RAVE these metastable conformations can be sampled for different kinases with high enough accuracy to enable subsequent docking of known type II kinase inhibitors with more than 50% success rates across docking calculations. We believe the protocol should be deployable for other kinases and more proteins generally.
Affiliation(s)
- Xinyu Gu
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, USA
- University of Maryland Institute for Health Computing, Bethesda, United States
| | - Akashnathan Aranganathan
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, USA
- Biophysics Program, University of Maryland, College Park 20742, USA
| | - Pratyush Tiwary
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, USA
- Department of Chemistry and Biochemistry, University of Maryland, College Park 20742, USA
- University of Maryland Institute for Health Computing, Bethesda, United States
|
32
|
Wang D, Qiu Y, Beyerle ER, Huang X, Tiwary P. Information Bottleneck Approach for Markov Model Construction. J Chem Theory Comput 2024; 20:5352-5367. [PMID: 38859575 PMCID: PMC11199095 DOI: 10.1021/acs.jctc.4c00449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2024]
Abstract
Markov state models (MSMs) have proven valuable in studying the dynamics of protein conformational changes via statistical analysis of molecular dynamics simulations. In MSMs, the complex configuration space is coarse-grained into conformational states, with dynamics modeled by a series of Markovian transitions among these states at discrete lag times. Constructing the Markovian model at a specific lag time necessitates defining states that circumvent significant internal energy barriers, enabling internal dynamics relaxation within the lag time. This process effectively coarse-grains time and space, integrating out rapid motions within metastable states. Thus, MSMs possess a multiresolution nature, where the granularity of states can be adjusted according to the time-resolution, offering flexibility in capturing system dynamics. This work introduces a continuous embedding approach for molecular conformations using the state predictive information bottleneck (SPIB), a framework that unifies dimensionality reduction and state space partitioning via a continuous, machine learned basis set. Without explicit optimization of the VAMP-based scores, SPIB demonstrates state-of-the-art performance in identifying slow dynamical processes and constructing predictive multiresolution Markovian models. Through applications to well-validated mini-proteins, SPIB showcases unique advantages compared to competing methods. It autonomously and self-consistently adjusts the number of metastable states based on a specified minimal time resolution, eliminating the need for manual tuning. While maintaining efficacy in dynamical properties, SPIB excels in accurately distinguishing metastable states and capturing numerous well-populated macrostates. This contrasts with existing VAMP-based methods, which often emphasize slow dynamics at the expense of incorporating numerous sparsely populated states. 
Furthermore, SPIB's ability to learn a low-dimensional continuous embedding of the underlying MSMs enhances the interpretation of dynamic pathways. With these benefits, we propose SPIB as an easy-to-implement methodology for end-to-end MSM construction.
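As an illustration of the conventional MSM construction the abstract takes as its starting point (states, a discrete lag time, Markovian transitions), the following is a minimal sketch in Python with NumPy. It is not SPIB and not the authors' code: it assumes the trajectory has already been assigned to discrete states, and the function names `estimate_msm` and `implied_timescales` are hypothetical names chosen for this example.

```python
import numpy as np


def estimate_msm(dtraj, n_states, lag):
    """Row-normalized transition matrix from a discrete trajectory.

    Assumption for this sketch: `dtraj` holds integer state labels
    in the range [0, n_states), one per saved frame.
    """
    dtraj = np.asarray(dtraj, dtype=int)
    counts = np.zeros((n_states, n_states))
    # Count every transition i -> j separated by `lag` frames (sliding window).
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # States that are never left keep an all-zero row instead of dividing by 0.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)


def implied_timescales(T, lag):
    """Relaxation timescales t_i = -lag / ln(lambda_i), in units of frames."""
    eigvals = np.sort(np.real(np.linalg.eigvals(T)))[::-1]
    slow = eigvals[1:]                      # drop the stationary eigenvalue
    slow = slow[(slow > 0) & (slow < 1)]    # keep only decaying processes
    return -lag / np.log(slow)


if __name__ == "__main__":
    toy = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1]    # toy two-state trajectory
    T = estimate_msm(toy, n_states=2, lag=1)
    print(T)
    print(implied_timescales(T, lag=1))
```

In this plain counting estimator the number of states and the lag time are both chosen by hand, which is the manual tuning the abstract says SPIB is meant to remove.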
Affiliation(s)
- Dedi Wang
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, United States
| | - Yunrui Qiu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
- Data Science Institute, University of Wisconsin-Madison, Madison, WI, 53706, United States
| | - Eric R. Beyerle
- Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, United States
| | - Xuhui Huang
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
- Data Science Institute, University of Wisconsin-Madison, Madison, WI, 53706, United States
| | - Pratyush Tiwary
- Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, United States
- University of Maryland Institute for Health Computing, Bethesda, MD 20852, United States
|
33
|
Wang D, Qiu Y, Beyerle ER, Huang X, Tiwary P. An Information Bottleneck Approach for Markov Model Construction. arXiv 2024:arXiv:2404.02856v2. [PMID: 38947932 PMCID: PMC11213129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Markov state models (MSMs) have proven valuable in studying the dynamics of protein conformational changes via statistical analysis of molecular dynamics (MD) simulations. In MSMs, the complex configuration space is coarse-grained into conformational states, with dynamics modeled by a series of Markovian transitions among these states at discrete lag times. Constructing the Markovian model at a specific lag time necessitates defining states that circumvent significant internal energy barriers, enabling internal dynamics relaxation within the lag time. This process effectively coarse-grains time and space, integrating out rapid motions within metastable states. Thus, MSMs possess a multi-resolution nature, where the granularity of states can be adjusted according to the time-resolution, offering flexibility in capturing system dynamics. This work introduces a continuous embedding approach for molecular conformations using the state predictive information bottleneck (SPIB), a framework that unifies dimensionality reduction and state space partitioning via a continuous, machine learned basis set. Without explicit optimization of the VAMP-based scores, SPIB demonstrates state-of-the-art performance in identifying slow dynamical processes and constructing predictive multi-resolution Markovian models. Through applications to well-validated mini-proteins, SPIB showcases unique advantages compared to competing methods. It autonomously and self-consistently adjusts the number of metastable states based on a specified minimal time resolution, eliminating the need for manual tuning. While maintaining efficacy in dynamical properties, SPIB excels in accurately distinguishing metastable states and capturing numerous well-populated macrostates. This contrasts with existing VAMP-based methods, which often emphasize slow dynamics at the expense of incorporating numerous sparsely populated states.
Furthermore, SPIB's ability to learn a low-dimensional continuous embedding of the underlying MSMs enhances the interpretation of dynamic pathways. With these benefits, we propose SPIB as an easy-to-implement methodology for end-to-end MSM construction.
Affiliation(s)
- Dedi Wang
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, United States
| | - Yunrui Qiu
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
- Data Science Institute, University of Wisconsin-Madison, Madison, WI, 53706, United States
| | - Eric R. Beyerle
- Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, United States
| | - Xuhui Huang
- Department of Chemistry, Theoretical Chemistry Institute, University of Wisconsin-Madison, Madison, WI 53706, United States
- Data Science Institute, University of Wisconsin-Madison, Madison, WI, 53706, United States
| | - Pratyush Tiwary
- Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, United States
- University of Maryland Institute for Health Computing, Bethesda, MD 20852, United States
|
34
|
Abstract
Molecular dynamics (MD) enables the study of physical systems with excellent spatiotemporal resolution but suffers from severe timescale limitations. To address this, enhanced sampling methods have been developed to improve the exploration of configurational space. However, implementing these methods is challenging and requires domain expertise. In recent years, integration of machine learning (ML) techniques into different domains has shown promise, prompting their adoption in enhanced sampling as well. Although ML is often employed in various fields primarily due to its data-driven nature, its integration with enhanced sampling is more natural with many common underlying synergies. This review explores the merging of ML and enhanced MD by presenting different shared viewpoints. It offers a comprehensive overview of this rapidly evolving field, which can be difficult to stay updated on. We highlight successful strategies such as dimensionality reduction, reinforcement learning, and flow-based methods. Finally, we discuss open problems at the exciting ML-enhanced MD interface.
Affiliation(s)
- Shams Mehdi
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, USA
- Biophysics Program, University of Maryland, College Park, Maryland, USA
| | - Zachary Smith
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, USA
- Biophysics Program, University of Maryland, College Park, Maryland, USA
| | - Lukas Herron
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, USA
- Biophysics Program, University of Maryland, College Park, Maryland, USA
| | - Ziyue Zou
- Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland, USA
| | - Pratyush Tiwary
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, USA
- Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland, USA
|
35
|
Lee S, Wang D, Seeliger MA, Tiwary P. Calculating Protein-Ligand Residence Times Through State Predictive Information Bottleneck based Enhanced Sampling. bioRxiv 2024:2024.04.16.589710. [PMID: 38659748 PMCID: PMC11042289 DOI: 10.1101/2024.04.16.589710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Understanding drug residence times in target proteins is key to improving drug efficacy and understanding target recognition in biochemistry. While drug residence time is just as important as binding affinity, atomic-level understanding of drug residence times through molecular dynamics (MD) simulations has been difficult primarily due to the extremely long timescales. Recent advances in rare event sampling have allowed us to reach these timescales, yet predicting protein-ligand residence times remains a significant challenge. Here we present a semi-automated protocol to calculate ligand residence times across 12 orders of magnitude in timescale. In our proposed framework, we integrate a deep learning-based method, the state predictive information bottleneck (SPIB), to learn an approximate reaction coordinate (RC) and use it to guide the enhanced sampling method metadynamics. We demonstrate the performance of our algorithm by applying it to six different protein-ligand complexes with available benchmark residence times, including the dissociation of the widely studied anti-cancer drug Imatinib (Gleevec) from both wild-type Abl kinase and drug-resistant mutants. We show how our protocol can recover quantitatively accurate residence times, potentially opening avenues for deeper insights into drug development possibilities and ligand recognition mechanisms.
Affiliation(s)
- Suemin Lee
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA
| | - Dedi Wang
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA
| | - Markus A. Seeliger
- Department of Pharmacological Sciences, Stony Brook University, Stony Brook, NY 11794-8651, USA
| | - Pratyush Tiwary
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA
- Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA
- University of Maryland Institute for Health Computing, Rockville, United States
|
36
|
Smith Z, Strobel M, Vani BP, Tiwary P. Graph Attention Site Prediction (GrASP): Identifying Druggable Binding Sites Using Graph Neural Networks with Attention. J Chem Inf Model 2024; 64:2637-2644. [PMID: 38453912 PMCID: PMC11182664 DOI: 10.1021/acs.jcim.3c01698] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 03/09/2024]
Abstract
Identifying and discovering druggable protein binding sites is an important early step in computer-aided drug discovery, but it remains a difficult task where most campaigns rely on a priori knowledge of binding sites from experiments. Here, we present a binding site prediction method called Graph Attention Site Prediction (GrASP) and re-evaluate assumptions in nearly every step in the site prediction workflow from data set preparation to model evaluation. GrASP is able to achieve state-of-the-art performance at recovering binding sites in PDB structures while maintaining a high degree of precision which will minimize wasted computation in downstream tasks such as docking and free energy perturbation.
Affiliation(s)
- Zachary Smith
- Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA
- Biophysics Program, University of Maryland, College Park 20742, USA
| | - Michael Strobel
- Department of Computer Science, University of Maryland, College Park 20742, USA
| | - Bodhi P. Vani
- Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA
| | - Pratyush Tiwary
- Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA
- Department of Chemistry and Biochemistry, University of Maryland, College Park 20742, USA
|
37
|
Vani BP, Aranganathan A, Tiwary P. Exploring Kinase Asp-Phe-Gly (DFG) Loop Conformational Stability with AlphaFold2-RAVE. J Chem Inf Model 2024; 64:2789-2797. [PMID: 37981824 PMCID: PMC11001530 DOI: 10.1021/acs.jcim.3c01436] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Indexed: 11/21/2023]
Abstract
Kinases compose one of the largest fractions of the human proteome, and their misfunction is implicated in many diseases, in particular, cancers. The ubiquitousness and structural similarities of kinases make specific and effective drug design difficult. In particular, conformational variability due to the evolutionarily conserved Asp-Phe-Gly (DFG) motif adopting in and out conformations and the relative stabilities thereof are key in structure-based drug design for ATP competitive drugs. These relative conformational stabilities are extremely sensitive to small changes in sequence and provide an important problem for sampling method development. Since the invention of AlphaFold2, the world of structure-based drug design has noticeably changed. In spite of it being limited to crystal-like structure prediction, several methods have also leveraged its underlying architecture to improve dynamics and enhanced sampling of conformational ensembles, including AlphaFold2-RAVE. Here, we extend AlphaFold2-RAVE and apply it to a set of kinases: the wild type DDR1 sequence and three mutants with single point mutations that are known to behave drastically differently. We show that AlphaFold2-RAVE is able to efficiently recover the changes in relative stability using transferable learned order parameters and potentials, thereby supplementing AlphaFold2 as a tool for exploration of Boltzmann-weighted protein conformations (Meller, A.; Bhakat, S.; Solieva, S.; Bowman, G. R. Accelerating Cryptic Pocket Discovery Using AlphaFold. J. Chem. Theory Comput. 2023, 19, 4355-4363).
Affiliation(s)
- Bodhi P. Vani
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, USA
| | - Akashnathan Aranganathan
- Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA
| | - Pratyush Tiwary
- Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA
|
38
|
Müllender L, Rizzi A, Parrinello M, Carloni P, Mandelli D. Effective data-driven collective variables for free energy calculations from metadynamics of paths. PNAS Nexus 2024; 3:pgae159. [PMID: 38665160 PMCID: PMC11044970 DOI: 10.1093/pnasnexus/pgae159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Accepted: 04/04/2024] [Indexed: 04/28/2024]
Abstract
A variety of enhanced sampling (ES) methods predict multidimensional free energy landscapes associated with biological and other molecular processes as a function of a few selected collective variables (CVs). The accuracy of these methods is crucially dependent on the ability of the chosen CVs to capture the relevant slow degrees of freedom of the system. For complex processes, finding such CVs is the real challenge. Machine learning (ML) CVs offer, in principle, a solution to this problem. However, these methods rely on the availability of high-quality datasets, ideally incorporating information about physical pathways and transition states, which are difficult to obtain, greatly limiting their domain of application. Here, we demonstrate how these datasets can be generated by means of ES simulations in trajectory space via the metadynamics of paths algorithm. The approach is expected to provide a general way to generate efficient ML-based CVs for the fast prediction of free energy landscapes in ES simulations. We demonstrate our approach with two numerical examples, a 2D model potential and the isomerization of alanine dipeptide, using deep targeted discriminant analysis as our ML-based CV of choice.
Affiliation(s)
- Lukas Müllender
- Department of Applied Physics, Science for Life Laboratory, KTH Royal Institute of Technology, SE-171 21 Solna, Sweden
- Computational Biomedicine, Institute of Advanced Simulations IAS-5/Institute for Neuroscience and Medicine INM-9, Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- Department of Physics, RWTH Aachen University, 52062 Aachen, Germany
| | - Andrea Rizzi
- Computational Biomedicine, Institute of Advanced Simulations IAS-5/Institute for Neuroscience and Medicine INM-9, Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- Atomistic Simulations, Italian Institute of Technology, 16163 Genova, Italy
| | - Michele Parrinello
- Atomistic Simulations, Italian Institute of Technology, 16163 Genova, Italy
| | - Paolo Carloni
- Computational Biomedicine, Institute of Advanced Simulations IAS-5/Institute for Neuroscience and Medicine INM-9, Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- Department of Physics, RWTH Aachen University, 52062 Aachen, Germany
- Universitätsklinikum, RWTH Aachen University, 52062 Aachen, Germany
| | - Davide Mandelli
- Computational Biomedicine, Institute of Advanced Simulations IAS-5/Institute for Neuroscience and Medicine INM-9, Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
|
39
|
Monteiro da Silva G, Cui JY, Dalgarno DC, Lisi GP, Rubenstein BM. High-throughput prediction of protein conformational distributions with subsampled AlphaFold2. Nat Commun 2024; 15:2464. [PMID: 38538622 PMCID: PMC10973385 DOI: 10.1038/s41467-024-46715-9] [Citation(s) in RCA: 56] [Impact Index Per Article: 56.0] [Received: 08/03/2023] [Accepted: 02/28/2024] [Indexed: 04/12/2024]
Abstract
This paper presents an innovative approach for predicting the relative populations of protein conformations using AlphaFold 2, an AI-powered method that has revolutionized biology by enabling the accurate prediction of protein structures. While AlphaFold 2 has shown exceptional accuracy and speed, it is designed to predict proteins' ground state conformations and is limited in its ability to predict conformational landscapes. Here, we demonstrate how AlphaFold 2 can directly predict the relative populations of different protein conformations by subsampling multiple sequence alignments. We tested our method against nuclear magnetic resonance experiments on two proteins with drastically different amounts of available sequence data, Abl1 kinase and the granulocyte-macrophage colony-stimulating factor, and predicted changes in their relative state populations with more than 80% accuracy. Our subsampling approach worked best when used to qualitatively predict the effects of mutations or evolution on the conformational landscape and well-populated states of proteins. It thus offers a fast and cost-effective way to predict the relative populations of protein conformations at even single-point mutation resolution, making it a useful tool for pharmacology, analysis of experimental results, and predicting evolution.
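The subsampling idea can be sketched generically as drawing a random subset of rows from a multiple sequence alignment while always keeping the query sequence. The Python sketch below is an assumption-laden illustration only: the abstract does not give the sampling scheme or its parameters, so the function name `subsample_msa`, the `max_seq` argument, and the uniform sampling are all hypothetical, and the downstream AlphaFold 2 prediction and state classification steps are not shown.

```python
import random


def subsample_msa(msa, max_seq, seed=None):
    """Return the query (first row) plus a uniform random subset of other rows.

    Assumptions for this sketch: `msa` is a list of aligned sequence
    strings whose first entry is the query, and `max_seq` is the total
    number of rows to keep, query included.
    """
    if not msa:
        raise ValueError("MSA must contain at least the query sequence")
    rng = random.Random(seed)
    query, rest = msa[0], msa[1:]
    # Never request more rows than exist, and never fewer than zero.
    k = min(max(max_seq - 1, 0), len(rest))
    return [query] + rng.sample(rest, k)


if __name__ == "__main__":
    toy_msa = ["AAAA", "AAAC", "AACA", "ACAA", "CAAA"]  # toy alignment
    # Different seeds give different shallow alignments for repeated predictions.
    for seed in range(3):
        print(subsample_msa(toy_msa, max_seq=3, seed=seed))
```

Repeating the prediction over many such shallow alignments and counting how often each conformational state appears is one plausible way to turn the subsamples into relative populations; whether this matches the published protocol cannot be confirmed from the abstract alone.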
Affiliation(s)
| | - Jennifer Y Cui
- Brown University Department of Molecular and Cell Biology and Biochemistry, Providence, RI, USA
| | | | - George P Lisi
- Brown University Department of Molecular and Cell Biology and Biochemistry, Providence, RI, USA
- Brown University Department of Chemistry, Providence, RI, USA
| | - Brenda M Rubenstein
- Brown University Department of Molecular and Cell Biology and Biochemistry, Providence, RI, USA.
- Brown University Department of Chemistry, Providence, RI, USA.
|
40
|
Lotthammer JM, Ginell GM, Griffith D, Emenecker RJ, Holehouse AS. Direct prediction of intrinsically disordered protein conformational properties from sequence. Nat Methods 2024; 21:465-476. [PMID: 38297184 PMCID: PMC10927563 DOI: 10.1038/s41592-023-02159-5] [Citation(s) in RCA: 66] [Impact Index Per Article: 66.0] [Received: 05/28/2023] [Accepted: 12/20/2023] [Indexed: 02/02/2024]
Abstract
Intrinsically disordered regions (IDRs) are ubiquitous across all domains of life and play a range of functional roles. While folded domains are generally well described by a stable three-dimensional structure, IDRs exist in a collection of interconverting states known as an ensemble. This structural heterogeneity means that IDRs are largely absent from the Protein Data Bank, contributing to a lack of computational approaches to predict ensemble conformational properties from sequence. Here we combine rational sequence design, large-scale molecular simulations and deep learning to develop ALBATROSS, a deep-learning model for predicting ensemble dimensions of IDRs, including the radius of gyration, end-to-end distance, polymer-scaling exponent and ensemble asphericity, directly from sequences at a proteome-wide scale. ALBATROSS is lightweight, easy to use and accessible as both a locally installable software package and a point-and-click-style interface via Google Colab notebooks. We first demonstrate the applicability of our predictors by examining the generalizability of sequence-ensemble relationships in IDRs. Then, we leverage the high-throughput nature of ALBATROSS to characterize the sequence-specific biophysical behavior of IDRs within and between proteomes.
Affiliation(s)
- Jeffrey M Lotthammer
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA
| | - Garrett M Ginell
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA
| | - Daniel Griffith
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA
| | - Ryan J Emenecker
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA
| | - Alex S Holehouse
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA.
- Center for Biomolecular Condensates, Washington University in St. Louis, St. Louis, MO, USA.
|
41
|
Meller A, Kelly D, Smith LG, Bowman GR. Toward physics-based precision medicine: Exploiting protein dynamics to design new therapeutics and interpret variants. Protein Sci 2024; 33:e4902. [PMID: 38358129 PMCID: PMC10868452 DOI: 10.1002/pro.4902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 12/01/2023] [Accepted: 01/04/2024] [Indexed: 02/16/2024]
Abstract
The goal of precision medicine is to utilize our knowledge of the molecular causes of disease to better diagnose and treat patients. However, there is a substantial mismatch between the small number of Food and Drug Administration (FDA)-approved drugs and annotated coding variants compared to the needs of precision medicine. This review introduces the concept of physics-based precision medicine, a scalable framework that promises to improve our understanding of sequence-function relationships and accelerate drug discovery. We show that accounting for the ensemble of structures a protein adopts in solution with computer simulations overcomes many of the limitations imposed by assuming a single protein structure. We highlight studies of protein dynamics and recent methods for the analysis of structural ensembles. These studies demonstrate that differences in conformational distributions predict functional differences within protein families and between variants. Thanks to new computational tools that are providing unprecedented access to protein structural ensembles, this insight may enable accurate predictions of variant pathogenicity for entire libraries of variants. We further show that explicitly accounting for protein ensembles, with methods like alchemical free energy calculations or docking to Markov state models, can uncover novel lead compounds. To conclude, we demonstrate that cryptic pockets, or cavities absent in experimental structures, provide an avenue to target proteins that are currently considered undruggable. Taken together, our review provides a roadmap for the field of protein science to accelerate precision medicine.
Affiliation(s)
- Artur Meller
- Department of Biochemistry and Molecular Biophysics, Washington University in St. Louis, St. Louis, Missouri, USA
- Medical Scientist Training Program, Washington University in St. Louis, St. Louis, Missouri, USA
- Departments of Biochemistry & Biophysics and Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Devin Kelly
- Departments of Biochemistry & Biophysics and Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Louis G. Smith
- Departments of Biochemistry & Biophysics and Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Gregory R. Bowman
- Departments of Biochemistry & Biophysics and Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
|
42
|
Brown BP, Stein RA, Meiler J, Mchaourab HS. Approximating Projections of Conformational Boltzmann Distributions with AlphaFold2 Predictions: Opportunities and Limitations. J Chem Theory Comput 2024; 20:1434-1447. [PMID: 38215214 PMCID: PMC10867840 DOI: 10.1021/acs.jctc.3c01081] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/29/2023] [Revised: 12/13/2023] [Accepted: 12/13/2023] [Indexed: 01/14/2024]
Abstract
Protein thermodynamics is intimately tied to biological function and can enable processes such as signal transduction, enzyme catalysis, and molecular recognition. The relative free energies of conformations that contribute to these functional equilibria evolved for the physiology of the organism. Despite the importance of these equilibria for understanding biological function and developing treatments for disease, computational and experimental methods capable of quantifying the energetic determinants of these equilibria are limited to systems of modest size. Recently, it has been demonstrated that the artificial intelligence system AlphaFold2 can be manipulated to produce structurally valid protein conformational ensembles. Here, we extend these studies and explore the extent to which AlphaFold2 contact distance distributions can approximate projections of the conformational Boltzmann distributions. For this purpose, we examine the joint probability distributions of inter-residue contact distances along functionally relevant collective variables of several protein systems. Our studies suggest that AlphaFold2 normalized contact distance distributions can correlate with conformation probabilities obtained with other methods but that they suffer from peak broadening. We also find that the AlphaFold2 contact distance distributions can be sensitive to point mutations. Overall, we anticipate that our findings will be valuable as the community seeks to model the thermodynamics of conformational changes in large biomolecular systems.
Affiliation(s)
- Benjamin P. Brown
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37232, United States
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37232, United States
- Center for Applied AI in Protein Dynamics, Vanderbilt University, Nashville, Tennessee 37232, United States
| | - Richard A. Stein
- Center for Applied AI in Protein Dynamics, Vanderbilt University, Nashville, Tennessee 37232, United States
- Department of Molecular Physiology and Biophysics, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, United States
| | - Jens Meiler
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37232, United States
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37232, United States
- Center for Applied AI in Protein Dynamics, Vanderbilt University, Nashville, Tennessee 37232, United States
- Institute for Drug Discovery, Leipzig University Medical School, Leipzig, SAC 04103, Germany
| | - Hassane S. Mchaourab
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37232, United States
- Center for Applied AI in Protein Dynamics, Vanderbilt University, Nashville, Tennessee 37232, United States
- Department of Molecular Physiology and Biophysics, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, United States
|
43
|
Miller EB, Hwang H, Shelley M, Placzek A, Rodrigues JPGLM, Suto RK, Wang L, Akinsanya K, Abel R. Enabling structure-based drug discovery utilizing predicted models. Cell 2024; 187:521-525. [PMID: 38306979 DOI: 10.1016/j.cell.2023.12.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 12/28/2023] [Accepted: 12/29/2023] [Indexed: 02/04/2024]
Abstract
High-quality predicted structures enable structure-based approaches to an expanding number of drug discovery programs. We propose that by utilizing free energy perturbation (FEP), predicted structures can be confidently employed to achieve drug design goals. We use structure-based modeling of hERG inhibition to illustrate this value of FEP.
Affiliation(s)
- Edward B Miller
- Schrödinger New York, 1540 Broadway, 24th Floor, New York, NY 10036, USA.
| | - Howook Hwang
- Schrödinger New York, 1540 Broadway, 24th Floor, New York, NY 10036, USA
| | - Mee Shelley
- Schrödinger Portland, 101 SW Main Street, Suite 1300, Portland, OR 97204, USA
| | - Andrew Placzek
- Schrödinger Portland, 101 SW Main Street, Suite 1300, Portland, OR 97204, USA
| | | | - Robert K Suto
- Schrödinger Framingham, 200 Staples Drive, Suite 210, Framingham, MA 01702, USA
| | - Lingle Wang
- Schrödinger New York, 1540 Broadway, 24th Floor, New York, NY 10036, USA
| | - Karen Akinsanya
- Schrödinger New York, 1540 Broadway, 24th Floor, New York, NY 10036, USA
| | - Robert Abel
- Schrödinger New York, 1540 Broadway, 24th Floor, New York, NY 10036, USA
|
44
|
Hoff SE, Zinke M, Izadi-Pruneyre N, Bonomi M. Bonds and bytes: The odyssey of structural biology. Curr Opin Struct Biol 2024; 84:102746. [PMID: 38101027 DOI: 10.1016/j.sbi.2023.102746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 11/20/2023] [Accepted: 11/24/2023] [Indexed: 12/17/2023]
Abstract
Characterizing structural and dynamic properties of proteins and large macromolecular assemblies is crucial to understand the molecular mechanisms underlying biological functions. In the field of structural biology, no single method comprehensively reveals the behavior of biological systems across various spatiotemporal scales. Instead, we have a versatile toolkit of techniques, each contributing a piece to the overall puzzle. Integrative structural biology combines different techniques to create accurate and precise multi-scale models that expand our understanding of complex biological systems. This review outlines recent advancements in computational and experimental methods in structural biology, with special focus on recent Artificial Intelligence techniques, emphasizes integrative approaches that combine different types of data for precise spatiotemporal modeling, and provides an outlook into future directions of this field.
Affiliation(s)
- S E Hoff
- Institut Pasteur, Université Paris Cité, CNRS UMR 3528, Structural Bioinformatics Unit, Paris, France
| | - M Zinke
- Institut Pasteur, Université Paris Cité, CNRS UMR 3528, Bacterial Transmembrane Systems Unit, Paris, France. https://twitter.com/ZinkeMaximilian
| | - N Izadi-Pruneyre
- Institut Pasteur, Université Paris Cité, CNRS UMR 3528, Bacterial Transmembrane Systems Unit, Paris, France.
| | - M Bonomi
- Institut Pasteur, Université Paris Cité, CNRS UMR 3528, Structural Bioinformatics Unit, Paris, France.
|
45
|
Kobayashi H. Potential for artificial intelligence in medicine and its application to male infertility. Reprod Med Biol 2024; 23:e12590. [PMID: 38948339 PMCID: PMC11211808 DOI: 10.1002/rmb2.12590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Revised: 05/15/2024] [Accepted: 05/27/2024] [Indexed: 07/02/2024] Open
Abstract
Background: The third AI boom, which began in 2010, has been characterized by the rapid evolution and diversification of AI and marked by the development of key technologies such as machine learning and deep learning. AI is revolutionizing the medical field, enhancing diagnostic accuracy, surgical outcomes, and drug production. Methods: This review includes explanations of digital transformation (DX), the history of AI, the difference between machine learning and deep learning, recent AI topics, medical AI, and AI research in male infertility. Main findings (results): In research on male infertility, I established an AI-based prediction model for Johnsen scores and an AI predictive model for sperm retrieval in non-obstructive azoospermia, both by no-code AI. Conclusions: AI is making constant progress. It would be ideal for physicians to acquire a knowledge of AI and even create AI models. No-code AI tools have revolutionized model creation, allowing individuals to independently handle data preparation and model development. Previously a team effort, this shift empowers users to craft customized AI models solo, offering greater flexibility and control in the model creation process.
|
46
|
Kleiman DE, Nadeem H, Shukla D. Adaptive Sampling Methods for Molecular Dynamics in the Era of Machine Learning. J Phys Chem B 2023; 127:10669-10681. [PMID: 38081185 DOI: 10.1021/acs.jpcb.3c04843] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/22/2023]
Abstract
Molecular dynamics (MD) simulations are fundamental computational tools for the study of proteins and their free energy landscapes. However, sampling protein conformational changes through MD simulations is challenging due to the relatively long time scales of these processes. Many enhanced sampling approaches have emerged to tackle this problem, including biased sampling and path-sampling methods. In this Perspective, we focus on adaptive sampling algorithms. These techniques differ from other approaches because the thermodynamic ensemble is preserved and the sampling is enhanced solely by restarting MD trajectories at particularly chosen seeds rather than introducing biasing forces. We begin our treatment with an overview of theoretically transparent methods, where we discuss principles and guidelines for adaptive sampling. Then, we present a brief summary of select methods that have been applied to realistic systems in the past. Finally, we discuss recent advances in adaptive sampling methodology powered by deep learning techniques, as well as their shortcomings.
Affiliation(s)
- Diego E Kleiman
- Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Hassan Nadeem
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Diwakar Shukla
- Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
|
47
|
Day EC, Chittari SS, Bogen MP, Knight AS. Navigating the Expansive Landscapes of Soft Materials: A User Guide for High-Throughput Workflows. ACS POLYMERS AU 2023; 3:406-427. [PMID: 38107416 PMCID: PMC10722570 DOI: 10.1021/acspolymersau.3c00025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/15/2023] [Revised: 11/02/2023] [Accepted: 11/07/2023] [Indexed: 12/19/2023]
Abstract
Synthetic polymers are highly customizable with tailored structures and functionality, yet this versatility generates challenges in the design of advanced materials due to the size and complexity of the design space. Thus, exploration and optimization of polymer properties using combinatorial libraries has become increasingly common, which requires careful selection of synthetic strategies, characterization techniques, and rapid processing workflows to obtain fundamental principles from these large data sets. Herein, we provide guidelines for strategic design of macromolecule libraries and workflows to efficiently navigate these high-dimensional design spaces. We describe synthetic methods for multiple library sizes and structures as well as characterization methods to rapidly generate data sets, including tools that can be adapted from biological workflows. We further highlight relevant insights from statistics and machine learning to aid in data featurization, representation, and analysis. This Perspective acts as a "user guide" for researchers interested in leveraging high-throughput screening toward the design of multifunctional polymers and predictive modeling of structure-property relationships in soft materials.
Affiliation(s)
| | | | - Matthew P. Bogen
- Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Abigail S. Knight
- Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
|
48
|
Ramelot TA, Tejero R, Montelione GT. Representing structures of the multiple conformational states of proteins. Curr Opin Struct Biol 2023; 83:102703. [PMID: 37776602 PMCID: PMC10841472 DOI: 10.1016/j.sbi.2023.102703] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/04/2023] [Revised: 08/18/2023] [Accepted: 08/23/2023] [Indexed: 10/02/2023]
Abstract
Biomolecules exhibit dynamic behavior that single-state models of their structures cannot fully capture. We review some recent advances for investigating multiple conformations of biomolecules, including experimental methods, molecular dynamics simulations, and machine learning. We also address the challenges associated with representing single- and multiple-state models in data archives, with a particular focus on NMR structures. Establishing standardized representations and annotations will facilitate effective communication and understanding of these complex models to the broader scientific community.
Affiliation(s)
- Theresa A Ramelot
- Dept of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA.
| | - Roberto Tejero
- Dept of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
| | - Gaetano T Montelione
- Dept of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA.
|
49
|
Ahmed M, Maldonado AM, Durrant JD. From Byte to Bench to Bedside: Molecular Dynamics Simulations and Drug Discovery. ARXIV 2023:arXiv:2311.16946v1. [PMID: 38076508 PMCID: PMC10705576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Molecular dynamics (MD) simulations and computer-aided drug design (CADD) have advanced substantially over the past two decades, thanks to continuous computer hardware and software improvements. Given these advancements, MD simulations are poised to become even more powerful tools for investigating the dynamic interactions between potential small-molecule drugs and their target proteins, with significant implications for pharmacological research.
Affiliation(s)
- Mayar Ahmed
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Alex M. Maldonado
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Jacob D. Durrant
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
|
50
|
Sala D, Engelberger F, Mchaourab HS, Meiler J. Modeling conformational states of proteins with AlphaFold. Curr Opin Struct Biol 2023; 81:102645. [PMID: 37392556 DOI: 10.1016/j.sbi.2023.102645] [Citation(s) in RCA: 76] [Impact Index Per Article: 38.0] [Received: 03/21/2023] [Revised: 05/16/2023] [Accepted: 06/01/2023] [Indexed: 07/03/2023]
Abstract
Many proteins exert their function by switching among different structures. Knowing the conformational ensembles affiliated with these states is critical to elucidate key mechanistic aspects that govern protein function. While experimental determination efforts are still bottlenecked by cost, time, and technical challenges, the machine-learning technology AlphaFold showed near experimental accuracy in predicting the three-dimensional structure of monomeric proteins. However, an AlphaFold ensemble of models usually represents a single conformational state with minimal structural heterogeneity. Consequently, several pipelines have been proposed to either expand the structural breadth of an ensemble or bias the prediction toward a desired conformational state. Here, we analyze how those pipelines work, what they can and cannot predict, and future directions.
Affiliation(s)
- D Sala
- Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany. https://twitter.com/sala_davide
| | - F Engelberger
- Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany. https://twitter.com/fengel97
| | - H S Mchaourab
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA. https://twitter.com/Mchaourablab
| | - J Meiler
- Institute of Drug Discovery, Faculty of Medicine, University of Leipzig, 04103 Leipzig, Germany; Center for Structural Biology, Vanderbilt University, Nashville, TN 37240, USA; Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden/Leipzig, Germany.
|