1. Wróblewski K, Zalewski M, Kuriata A, Kmiecik S. CABS-flex 3.0: an online tool for simulating protein structural flexibility and peptide modeling. Nucleic Acids Res 2025:gkaf412. [PMID: 40366023 DOI: 10.1093/nar/gkaf412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2025] [Revised: 04/22/2025] [Accepted: 05/02/2025] [Indexed: 05/15/2025] Open
Abstract
Simulating protein structure flexibility using classical methods is computationally demanding, especially for large proteins. To address this challenge, we have been developing the CABS-flex method, which enables fast simulations of protein structural flexibility by combining a coarse-grained simulation approach with all-atom detail. Previously available as the CABS-flex 2.0 web server, the method has now undergone a major upgrade with the release of CABS-flex 3.0. Key improvements include the introduction of intuitive flexibility modes that simplify the control of distance restraints and allow users to reflect known or expected dynamic regions; improved all-atom reconstruction for higher-quality model generation; a new feature for de novo peptide structure prediction, supporting both linear and cyclic peptides along with their conformational flexibility; and new tools for result analysis and visualization, facilitating deeper insights into structural flexibility. Additionally, AlphaFold pLDDT-derived restraints can be used as optional input for guiding simulations. The method accepts input as either a PDB/mmCIF structure or a sequence (for peptide modeling). Advanced options allow users to incorporate experimental or computational restraints. The CABS-flex 3.0 web server is available at https://lcbio.pl/cabsflex3. This website is free and open to all users, with no login requirement.
Affiliation(s)
- Karol Wróblewski
- University of Warsaw, Biological and Chemical Research Centre, Faculty of Chemistry, 02-089 Warsaw, Poland
- Mateusz Zalewski
- University of Warsaw, Biological and Chemical Research Centre, Faculty of Chemistry, 02-089 Warsaw, Poland
- Aleksander Kuriata
- University of Warsaw, Biological and Chemical Research Centre, Faculty of Chemistry, 02-089 Warsaw, Poland
- Sebastian Kmiecik
- University of Warsaw, Biological and Chemical Research Centre, Faculty of Chemistry, 02-089 Warsaw, Poland
2. Wozniak S, Janson G, Feig M. Accurate Predictions of Molecular Properties of Proteins via Graph Neural Networks and Transfer Learning. J Chem Theory Comput 2025; 21:4830-4845. [PMID: 40270304 PMCID: PMC12080100 DOI: 10.1021/acs.jctc.4c01682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2024] [Revised: 03/14/2025] [Accepted: 04/15/2025] [Indexed: 04/25/2025]
Abstract
Machine learning has emerged as a promising approach for predicting molecular properties of proteins, as it addresses limitations of experimental and traditional computational methods. Here, we introduce GSnet, a graph neural network (GNN) trained to predict physicochemical and geometric properties including solvation-free energies, diffusion constants, and hydrodynamic radii, based on three-dimensional protein structures. By leveraging transfer learning, pretrained GSnet embeddings were adapted to predict solvent-accessible surface area (SASA) and residue-specific pKa values, achieving high accuracy and generalizability. Notably, GSnet outperformed existing protein embeddings for SASA prediction and a locally charge-aware variant, aLCnet, approached the accuracy of simulation-based and empirical methods for pKa prediction. Our GNN framework demonstrated robustness across diverse data sets, including intrinsically disordered peptides, and scalability for high-throughput applications. These results highlight the potential of GNN-based embeddings and transfer learning to advance protein structure analysis, providing a foundation for integrating predictive models into proteome-wide studies and structural biology pipelines.
Affiliation(s)
- Spencer Wozniak
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Giacomo Janson
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
3. Pfaendner C, Korn V, Gogoi P, Unger B, Pluhackova K. ART-SM: Boosting Fragment-Based Backmapping by Machine Learning. J Chem Theory Comput 2025; 21:4151-4166. [PMID: 40184371 DOI: 10.1021/acs.jctc.5c00189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/06/2025]
Abstract
In sequential multiscale molecular dynamics simulations, which advantageously combine the increased sampling and dynamics at coarse-grained resolution with the higher accuracy of atomistic simulations, the resolution is altered over time. While coarse-graining is straightforward once the mapping between atomistic and coarse-grained resolution is defined, reintroducing the atomistic details is still a nontrivial process called backmapping. Here, we present ART-SM, a fragment-based backmapping framework that learns from atomistic simulation data to seamlessly switch from coarse-grained to atomistic resolution. ART-SM requires minimal user input and goes beyond state-of-the-art fragment-based approaches by selecting from multiple conformations per fragment via machine learning to simultaneously reflect the coarse-grained structure and the Boltzmann distribution. Additionally, we introduce a novel refinement step to connect individual fragments by optimizing specific bonds, angles, and dihedral angles in the backmapping process. We demonstrate that our algorithm accurately restores the atomistic bond length, angle, and dihedral angle distributions for various small and linear molecules from Martini coarse-grained beads and that the resulting high-resolution structures are representative of the input coarse-grained conformations. Moreover, the reconstruction of the TIP3P water model is fast and robust, and we demonstrate that ART-SM can be applied to larger linear molecules as well. To illustrate the efficiency of the local and autoregressive approach of ART-SM, we simulated a large realistic system containing the surfactants TAPB and SDS in solution using the Martini3 force field. The self-assembled micelles of various shapes were backmapped with ART-SM after training on only short atomistic simulations of a single water-solvated SDS or TAPB molecule. 
Together, these results indicate the potential for the method to be extended to more complex molecules such as lipids, proteins, macromolecules, and materials in the future.
Affiliation(s)
- Christian Pfaendner
- Stuttgart Center for Simulation Science, Cluster of Excellence EXC 2075, University of Stuttgart, Universitätsstr. 32, 70569 Stuttgart, Germany
- Artificial Intelligence Software Academy, University of Stuttgart, 70569 Stuttgart, Germany
- Viktoria Korn
- Stuttgart Center for Simulation Science, Cluster of Excellence EXC 2075, University of Stuttgart, Universitätsstr. 32, 70569 Stuttgart, Germany
- Pritom Gogoi
- Stuttgart Center for Simulation Science, Cluster of Excellence EXC 2075, University of Stuttgart, Universitätsstr. 32, 70569 Stuttgart, Germany
- Benjamin Unger
- Stuttgart Center for Simulation Science, Cluster of Excellence EXC 2075, University of Stuttgart, Universitätsstr. 32, 70569 Stuttgart, Germany
- Kristyna Pluhackova
- Stuttgart Center for Simulation Science, Cluster of Excellence EXC 2075, University of Stuttgart, Universitätsstr. 32, 70569 Stuttgart, Germany
4. Janson G, Jussupow A, Feig M. Deep generative modeling of temperature-dependent structural ensembles of proteins. bioRxiv: the preprint server for biology 2025:2025.03.09.642148. [PMID: 40161645 PMCID: PMC11952339 DOI: 10.1101/2025.03.09.642148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
Deep learning has revolutionized protein structure prediction, but capturing conformational ensembles and structural variability remains an open challenge. While molecular dynamics (MD) is the foundation method for simulating biomolecular dynamics, it is computationally expensive. Recently, deep learning models trained on MD have made progress in generating structural ensembles at reduced cost. However, they remain limited in modeling atomistic details and, crucially, incorporating the effect of environmental factors. Here, we present aSAM (atomistic structural autoencoder model), a latent diffusion model trained on MD to generate heavy atom protein ensembles. Unlike most methods, aSAM models atoms in a latent space, greatly facilitating accurate sampling of side chain and backbone torsion angle distributions. Additionally, we extended aSAM into the first reported transferable generator conditioned on temperature, named aSAMt. Trained on the large and open mdCATH dataset, aSAMt captures temperature-dependent ensemble properties and demonstrates generalization beyond training temperatures. By comparing aSAMt ensembles to long MD simulations of fast folding proteins, we find that high-temperature training enhances the ability of deep generators to explore energy landscapes. Finally, we also show that our MD-based aSAMt can already capture experimentally observed thermal behavior of proteins. Our work is a step towards generalizable ensemble generation to complement physics-based approaches.
Affiliation(s)
- Giacomo Janson
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Alexander Jussupow
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
5. von Bülow S, Tesei G, Lindorff-Larsen K. Machine learning methods to study sequence-ensemble-function relationships in disordered proteins. Curr Opin Struct Biol 2025; 92:103028. [PMID: 40081192 DOI: 10.1016/j.sbi.2025.103028] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/21/2024] [Revised: 02/13/2025] [Accepted: 02/14/2025] [Indexed: 03/15/2025]
Abstract
Recent years have seen tremendous developments in the use of machine learning models to link amino-acid sequence, structure, and function of folded proteins. These methods are, however, rarely applicable to the wide range of proteins and sequences that comprise intrinsically disordered regions. We here review developments in the study of sequence-ensemble-function relationships of disordered proteins that exploit or are used to train machine learning models. These include methods for generating conformational ensembles and designing new sequences, and for linking sequences to biophysical properties and biological functions. We highlight how these developments are built on a tight integration between experiment, theory and simulations, and account for evolutionary constraints, which operate on sequences of disordered regions differently than on those of folded domains.
Affiliation(s)
- Sören von Bülow
- Structural Biology and NMR Laboratory & the Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200, Copenhagen, Denmark
- Giulio Tesei
- Structural Biology and NMR Laboratory & the Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200, Copenhagen, Denmark
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory & the Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200, Copenhagen, Denmark
6. Jussupow A, Bartley D, Lapidus LJ, Feig M. COCOMO2: A Coarse-Grained Model for Interacting Folded and Disordered Proteins. J Chem Theory Comput 2025; 21:2095-2107. [PMID: 39908323 PMCID: PMC11866933 DOI: 10.1021/acs.jctc.4c01460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2024] [Revised: 01/24/2025] [Accepted: 01/31/2025] [Indexed: 02/07/2025]
Abstract
Biomolecular interactions are essential in many biological processes, including complex formation and phase separation processes. Coarse-grained computational models are especially valuable for studying such processes via simulation. Here, we present COCOMO2, an updated residue-based coarse-grained model that extends its applicability from intrinsically disordered peptides to folded proteins. This is accomplished with the introduction of a surface exposure scaling factor, which adjusts interaction strengths based on solvent accessibility, to enable the more realistic modeling of interactions involving folded domains without additional computational costs. COCOMO2 was parametrized directly with solubility and phase separation data to improve its performance on predicting concentration-dependent phase separation for a broader range of biomolecular systems compared to the original version. COCOMO2 enables new applications including the study of condensates that involve IDPs together with folded domains and the study of complex assembly processes. COCOMO2 also provides an expanded foundation for the development of multiscale approaches for modeling biomolecular interactions that span from residue-level to atomistic resolution.
Affiliation(s)
- Alexander Jussupow
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Divya Bartley
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Lisa J. Lapidus
- Department of Physics and Astronomy, Michigan State University, East Lansing, Michigan 48824, United States
- Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
7. Co N, Czaplewski C, Lubecka EA, Liwo A. Implementation of Time-Averaged Restraints with UNRES Coarse-Grained Model of Polypeptide Chains. J Chem Theory Comput 2025; 21:1476-1493. [PMID: 39851064 PMCID: PMC11823420 DOI: 10.1021/acs.jctc.4c01504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2024] [Revised: 12/27/2024] [Accepted: 01/14/2025] [Indexed: 01/25/2025]
Abstract
Time-averaged restraints from nuclear magnetic resonance (NMR) measurements have been implemented in the UNRES coarse-grained model of polypeptide chains in order to develop a tool for data-assisted modeling of the conformational ensembles of multistate proteins, intrinsically disordered proteins (IDPs) and proteins with intrinsically disordered regions (IDRs), many of which are essential in cell biology. A numerically stable variant of molecular dynamics with time-averaged restraints has been introduced, in which the total energy is conserved in sections of a trajectory in microcanonical runs, the bath temperature is maintained in canonical runs, and the time-average-restraint-force components are scaled up with the length of the memory window so that the restraints affect the simulated structures. The new approach restores the conformational ensembles used to generate ensemble-averaged distances, as demonstrated with synthetic restraints. The approach results in a better fitting of the ensemble-averaged interproton distances to those determined experimentally for multistate proteins and proteins with intrinsically disordered regions, which puts it at an advantage over all-atom approaches with regard to the determination of the conformational ensembles of proteins with diffuse structures, owing to a faster and more robust conformational search.
Affiliation(s)
- Nguyen Truong Co
- Faculty of Chemistry, University of Gdańsk, Fahrenheit Union of Universities, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
- Cezary Czaplewski
- Faculty of Chemistry, University of Gdańsk, Fahrenheit Union of Universities, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
- Emilia A. Lubecka
- Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Fahrenheit Union of Universities in Gdańsk, ul. G. Narutowicza 11/12, 80-233 Gdańsk, Poland
- Adam Liwo
- Faculty of Chemistry, University of Gdańsk, Fahrenheit Union of Universities, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
8. Jones MS, Khanna S, Ferguson AL. FlowBack: A Generalized Flow-Matching Approach for Biomolecular Backmapping. J Chem Inf Model 2025; 65:672-692. [PMID: 39772562 DOI: 10.1021/acs.jcim.4c02046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2025]
Abstract
Coarse-grained models have become ubiquitous in biomolecular modeling tasks aimed at studying slow dynamical processes such as protein folding and DNA hybridization. These models can considerably accelerate sampling but it remains challenging to accurately and efficiently restore all-atom detail to the coarse-grained trajectory, which can be vital for detailed understanding of molecular mechanisms and calculation of observables contingent on all-atom coordinates. In this work, we introduce FlowBack as a deep generative model employing a flow-matching objective to map samples from a coarse-grained prior distribution to an all-atom data distribution. We construct our prior distribution to be agnostic to the coarse-grained map and molecular type. A protein-specific model trained on ∼65k structures from the Protein Data Bank achieves state-of-the-art performance on structural metrics compared to previous generative and rules-based approaches in applications to static PDB structures, all-atom simulations of fast-folding proteins, and coarse-grained trajectories generated by a machine-learned force field. A DNA-protein model trained on ∼1.5k DNA-protein complexes achieves excellent reconstruction and generative capabilities on static DNA-protein complexes from the Protein Data Bank as well as on out-of-distribution coarse-grained dynamical simulations of DNA-protein complexation. FlowBack offers an accurate, efficient, and easy-to-use tool to recover all-atom structures from coarse-grained molecular simulations with higher robustness and fewer steric clashes than previous approaches. We make FlowBack freely available to the community as an open source Python package.
Affiliation(s)
- Michael S Jones
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Smayan Khanna
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Andrew L Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Department of Chemistry, University of Chicago, Chicago, Illinois 60637, United States
9. Leśniewski M, Iłowska E, Sawicka J, Li Z, Tang C, Liwo A. Coarse-Grained Simulation Study of the Association of Selected Dipeptides. J Phys Chem B 2024; 128:12403-12415. [PMID: 39631776 DOI: 10.1021/acs.jpcb.4c06305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2024]
Abstract
The association of 55 dipeptides extracted from aggregation-prone regions of selected proteins was studied by means of multiplexed replica-exchange molecular dynamics simulations with the coarse-grained UNRES model of polypeptide chains. Each simulation was carried out with 320 dipeptide molecules in a periodic box at 0.24 mol/dm³ concentration, in the 260-370 K temperature range. The temperature profiles of the degree of association, distributions of dipeptide cluster size, and structures of clusters were examined. It has been found that the dipeptides composed of strongly nonpolar (aromatic or aliphatic) residues associate nearly completely at all temperatures to form tight clusters, while those composed of charged or polar residues exhibited no or residual association. The dipeptides composed of nonpolar and small polar residues and those composed of less hydrophobic residues formed single clusters, gradually dissolving with increasing temperature, while those composed of phenylalanine or tryptophan and polar or charged residues formed multiple irregular clusters with room to accommodate water inside, suggesting the formation of liquid droplets or gels. The logarithms of the average degree of association and the free energy of aggregation per monomer were found to correlate with the dipeptide hydrophobicity.
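As a quick check on the system size quoted in this abstract, the sketch below converts 320 molecules at 0.24 mol/dm³ into a box edge length. It assumes a cubic box, which the abstract does not state (it only says "periodic box"); the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

def cubic_box_side_angstrom(n_molecules, conc_mol_per_dm3):
    """Edge length (in angstroms) of a cubic box holding n_molecules
    at the given molar concentration. The cubic shape is an assumption."""
    volume_dm3 = n_molecules / (AVOGADRO * conc_mol_per_dm3)
    volume_a3 = volume_dm3 * 1e27  # 1 dm = 1e9 angstroms, so 1 dm^3 = 1e27 cubic angstroms
    return volume_a3 ** (1.0 / 3.0)

side = cubic_box_side_angstrom(320, 0.24)
print(f"{side:.1f} angstroms")  # roughly 130 angstroms
```

Under the cubic-box assumption the edge comes out at roughly 130 Å, i.e. a volume of about 2.2 million cubic angstroms for the 320 dipeptides.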
Affiliation(s)
- Mateusz Leśniewski
- Faculty of Chemistry, University of Gdańsk, Fahrenheit Union of Universities in Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
- Emilia Iłowska
- Faculty of Chemistry, University of Gdańsk, Fahrenheit Union of Universities in Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
- Justyna Sawicka
- Laboratory of Molecular and Cellular Nephrology, Department of Molecular Biotechnology, Faculty of Chemistry, Mossakowski Medical Research Institute, Polish Academy of Sciences, ul. Adolfa Pawińskiego 5, 02-106 Warsaw, Poland
- Zihan Li
- College of Chemistry and Molecular Engineering & PKU-Tsinghua Center for Life Sciences & Beijing National Laboratory for Molecular Sciences, Peking University, Beijing 100871, China
- Chun Tang
- College of Chemistry and Molecular Engineering & PKU-Tsinghua Center for Life Sciences & Beijing National Laboratory for Molecular Sciences, Peking University, Beijing 100871, China
- Adam Liwo
- Faculty of Chemistry, University of Gdańsk, Fahrenheit Union of Universities in Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
10. Wozniak S, Janson G, Feig M. Accurate Predictions of Molecular Properties of Proteins via Graph Neural Networks and Transfer Learning. bioRxiv: the preprint server for biology 2024:2024.12.10.627714. [PMID: 39713395 PMCID: PMC11661272 DOI: 10.1101/2024.12.10.627714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2024]
Abstract
Machine learning has emerged as a promising approach for predicting molecular properties of proteins, as it addresses limitations of experimental and traditional computational methods. Here, we introduce GSnet, a graph neural network (GNN) trained to predict physicochemical and geometric properties including solvation free energies, diffusion constants, and hydrodynamic radii, based on three-dimensional protein structures. By leveraging transfer learning, pre-trained GSnet embeddings were adapted to predict solvent-accessible surface area (SASA) and residue-specific pKa values, achieving high accuracy and generalizability. Notably, GSnet outperformed existing protein embeddings for SASA prediction, and a locally charge-aware variant, aLCnet, approached the accuracy of simulation-based and empirical methods for pKa prediction. Our GNN framework demonstrated robustness across diverse datasets, including intrinsically disordered peptides, and scalability for high-throughput applications. These results highlight the potential of GNN-based embeddings and transfer learning to advance protein structure analysis, providing a foundation for integrating predictive models into proteome-wide studies and structural biology pipelines.
Affiliation(s)
- Spencer Wozniak
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Giacomo Janson
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
11. Kryś JD, Głowacki M, Śmieja P, Gront D. deepBBQ: A Deep Learning Approach to the Protein Backbone Reconstruction. Biomolecules 2024; 14:1448. [PMID: 39595623 PMCID: PMC11592026 DOI: 10.3390/biom14111448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Revised: 10/10/2024] [Accepted: 11/01/2024] [Indexed: 11/28/2024] Open
Abstract
Coarse-grained models have provided researchers with greatly improved computational efficiency in modeling structures and dynamics of biomacromolecules, but, to be practically useful, they need fast and accurate conversion methods back to the all-atom representation. Reconstruction of atomic details may also be required in the case of some experimental methods, like electron microscopy, which may provide Cα-only structures. In this contribution, we present a new method for recovery of all backbone atom positions from just the Cα coordinates. Our approach, called deepBBQ, uses a deep convolutional neural network to predict a single internal coordinate per peptide plate, based on Cα trace geometric features, and then proceeds to recalculate the cartesian coordinates based on the assumption that the peptide plate atoms lie in the same plane. Extensive comparison with similar programs shows that our solution is accurate and cost-efficient. The deepBBQ program is available as part of the open-source bioinformatics toolkit Bioshell and is free for download and the documentation is available online.
Affiliation(s)
- Dominik Gront
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland; (J.D.K.); (M.G.); (P.Ś.)
12. González-Delgado J, Bernadó P, Neuvial P, Cortés J. Weighted families of contact maps to characterize conformational ensembles of (highly-)flexible proteins. Bioinformatics 2024; 40:btae627. [PMID: 39432675 PMCID: PMC11530230 DOI: 10.1093/bioinformatics/btae627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Revised: 09/17/2024] [Accepted: 10/16/2024] [Indexed: 10/23/2024] Open
Abstract
Motivation: Characterizing the structure of flexible proteins, particularly within the realm of intrinsic disorder, presents a formidable challenge due to their high conformational variability. Currently, their structural representation relies on (possibly large) conformational ensembles derived from a combination of experimental and computational methods. The detailed structural analysis of these ensembles is a difficult task, for which existing tools have limited effectiveness.
Results: This study proposes an innovative extension of the concept of contact maps to the ensemble framework, incorporating the intrinsic probabilistic nature of disordered proteins. Within this framework, a conformational ensemble is characterized through a weighted family of contact maps. To achieve this, conformations are first described using a refined definition of contact that appropriately accounts for the geometry of the inter-residue interactions and the sequence context. Representative structural features of the ensemble naturally emerge from the subsequent clustering of the resulting contact-based descriptors. Importantly, transiently populated structural features are readily identified within large ensembles. The performance of the method is illustrated by several use cases and compared with other existing approaches, highlighting its superiority in capturing relevant structural features of highly flexible proteins.
Availability and implementation: An open-source implementation of the method is provided together with an easy-to-use Jupyter notebook, available at https://gitlab.laas.fr/moma/WARIO.
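The notion of a weighted family of contact maps can be illustrated with a deliberately simplified sketch. It uses a plain distance cutoff between one point per residue and groups conformations whose contact maps are identical; both are stand-ins (assumptions) for the refined contact definition and the clustering step described in the abstract, and none of the names below come from the WARIO code.

```python
from collections import Counter
from itertools import combinations
import math

def contact_map(coords, cutoff=8.0, min_separation=3):
    """Return the set of residue pairs (i, j) closer than `cutoff`.

    Simplification: one point per residue and a fixed distance cutoff.
    """
    return frozenset(
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if j - i >= min_separation and math.dist(coords[i], coords[j]) < cutoff
    )

def weighted_contact_maps(ensemble, **kwargs):
    """Group conformations with identical contact maps.

    Each distinct map is weighted by the fraction of conformations showing it.
    """
    counts = Counter(contact_map(conf, **kwargs) for conf in ensemble)
    total = sum(counts.values())
    return {cmap: n / total for cmap, n in counts.items()}

# Toy ensemble: three extended chains and one hairpin-like conformation.
extended = [(3.8 * i, 0.0, 0.0) for i in range(5)]
hairpin = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
           (7.6, 3.8, 0.0), (3.8, 3.8, 0.0)]
weights = weighted_contact_maps([extended, extended, hairpin, extended])
```

In this toy ensemble the extended chains share an empty contact map while the hairpin contributes a distinct, low-weight map, which mirrors how a transiently populated feature would show up as a small-weight member of the family.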
Affiliation(s)
- Javier González-Delgado
- LAAS-CNRS, Université de Toulouse, CNRS, 31400 Toulouse, France
- Institut de Mathématiques de Toulouse, Université de Toulouse, CNRS, 31400 Toulouse, France
- Pau Bernadó
- Centre de Biologie Structurale, Université de Montpellier, INSERM, CNRS, 34090 Montpellier, France
- Pierre Neuvial
- Institut de Mathématiques de Toulouse, Université de Toulouse, CNRS, 31400 Toulouse, France
- Juan Cortés
- LAAS-CNRS, Université de Toulouse, CNRS, 31400 Toulouse, France
13. Jussupow A, Bartley D, Lapidus LJ, Feig M. COCOMO2: A coarse-grained model for interacting folded and disordered proteins. bioRxiv: the preprint server for biology 2024:2024.10.29.620916. [PMID: 39554101 PMCID: PMC11565878 DOI: 10.1101/2024.10.29.620916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2024]
Abstract
Biomolecular interactions are essential in many biological processes, including complex formation and phase separation processes. Coarse-grained computational models are especially valuable for studying such processes via simulation. Here, we present COCOMO2, an updated residue-based coarse-grained model that extends its applicability from intrinsically disordered peptides to folded proteins. This is accomplished with the introduction of a surface exposure scaling factor, which adjusts interaction strengths based on solvent accessibility, to enable the more realistic modeling of interactions involving folded domains without additional computational costs. COCOMO2 was parameterized directly with solubility and phase separation data to improve its performance on predicting concentration-dependent phase separation for a broader range of biomolecular systems compared to the original version. COCOMO2 enables new applications including the study of condensates that involve IDPs together with folded domains and the study of complex assembly processes. COCOMO2 also provides an expanded foundation for the development of multi-scale approaches for modeling biomolecular interactions that span from residue-level to atomistic resolution.
Affiliation(s)
- Alexander Jussupow
- Department of Biochemistry and Molecular Biology, East Lansing, MI 48824, USA
- Divya Bartley
- Department of Biochemistry and Molecular Biology, East Lansing, MI 48824, USA
- Lisa J. Lapidus
- Department of Physics and Astronomy, Michigan State University, East Lansing, MI 48824, USA
- Michael Feig
- Department of Biochemistry and Molecular Biology, East Lansing, MI 48824, USA
14. Cao F, von Bülow S, Tesei G, Lindorff‐Larsen K. A coarse-grained model for disordered and multi-domain proteins. Protein Sci 2024; 33:e5172. [PMID: 39412378 PMCID: PMC11481261 DOI: 10.1002/pro.5172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 07/12/2024] [Accepted: 08/23/2024] [Indexed: 10/20/2024]
Abstract
Many proteins contain more than one folded domain, and such modular multi-domain proteins help expand the functional repertoire of proteins. Because of their larger size and often substantial dynamics, it may be difficult to characterize the conformational ensembles of multi-domain proteins by simulations. Here, we present a coarse-grained model for multi-domain proteins that is both fast and provides an accurate description of the global conformational properties in solution. We show that the accuracy of a one-bead-per-residue coarse-grained model depends on how the interaction sites in the folded domains are represented. Specifically, we find excessive domain-domain interactions if the interaction sites are located at the position of the Cα atoms. We also show that if the interaction sites are located at the center of mass of the residue, we obtain good agreement between simulations and experiments across a wide range of proteins. We then optimize our previously described CALVADOS model using this center-of-mass representation, and validate the resulting model using independent data. Finally, we use our revised model to simulate phase separation of both disordered and multi-domain proteins, and to examine how the stability of folded domains may differ between the dilute and dense phases. Our results provide a starting point for understanding interactions between folded and disordered regions in proteins, and how these regions affect the propensity of proteins to self-associate and undergo phase separation.
Affiliation(s)
- Fan Cao
  - Structural Biology and NMR Laboratory & the Linderstrøm‐Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Sören von Bülow
  - Structural Biology and NMR Laboratory & the Linderstrøm‐Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Giulio Tesei
  - Structural Biology and NMR Laboratory & the Linderstrøm‐Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kresten Lindorff‐Larsen
  - Structural Biology and NMR Laboratory & the Linderstrøm‐Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
|
15
|
Hwang W, Austin SL, Blondel A, Boittier ED, Boresch S, Buck M, Buckner J, Caflisch A, Chang HT, Cheng X, Choi YK, Chu JW, Crowley MF, Cui Q, Damjanovic A, Deng Y, Devereux M, Ding X, Feig MF, Gao J, Glowacki DR, Gonzales JE, Hamaneh MB, Harder ED, Hayes RL, Huang J, Huang Y, Hudson PS, Im W, Islam SM, Jiang W, Jones MR, Käser S, Kearns FL, Kern NR, Klauda JB, Lazaridis T, Lee J, Lemkul JA, Liu X, Luo Y, MacKerell AD, Major DT, Meuwly M, Nam K, Nilsson L, Ovchinnikov V, Paci E, Park S, Pastor RW, Pittman AR, Post CB, Prasad S, Pu J, Qi Y, Rathinavelan T, Roe DR, Roux B, Rowley CN, Shen J, Simmonett AC, Sodt AJ, Töpfer K, Upadhyay M, van der Vaart A, Vazquez-Salazar LI, Venable RM, Warrensford LC, Woodcock HL, Wu Y, Brooks CL, Brooks BR, Karplus M. CHARMM at 45: Enhancements in Accessibility, Functionality, and Speed. J Phys Chem B 2024; 128:9976-10042. [PMID: 39303207 PMCID: PMC11492285 DOI: 10.1021/acs.jpcb.4c04100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Revised: 08/15/2024] [Accepted: 08/22/2024] [Indexed: 09/22/2024]
Abstract
Since its inception nearly a half century ago, CHARMM has been playing a central role in computational biochemistry and biophysics. Commensurate with the developments in experimental research and advances in computer hardware, the range of methods and applicability of CHARMM have also grown. This review summarizes major developments that occurred after 2009 when the last review of CHARMM was published. They include the following: new faster simulation engines, accessible user interfaces for convenient workflows, and a vast array of simulation and analysis methods that encompass quantum mechanical, atomistic, and coarse-grained levels, as well as extensive coverage of force fields. In addition to providing the current snapshot of the CHARMM development, this review may serve as a starting point for exploring relevant theories and computational methods for tackling contemporary and emerging problems in biomolecular systems. CHARMM is freely available for academic and nonprofit research at https://academiccharmm.org/program.
Affiliation(s)
- Wonmuk Hwang
  - Department of Biomedical Engineering, Texas A&M University, College Station, Texas 77843, United States
  - Department of Materials Science and Engineering, Texas A&M University, College Station, Texas 77843, United States
  - Department of Physics and Astronomy, Texas A&M University, College Station, Texas 77843, United States
  - Center for AI and Natural Sciences, Korea Institute for Advanced Study, Seoul 02455, Republic of Korea
- Steven L. Austin
  - Department of Chemistry, University of South Florida, Tampa, Florida 33620, United States
- Arnaud Blondel
  - Institut Pasteur, Université Paris Cité, CNRS UMR3825, Structural Bioinformatics Unit, 28 rue du Dr. Roux F-75015 Paris, France
- Eric D. Boittier
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Stefan Boresch
  - Faculty of Chemistry, Department of Computational Biological Chemistry, University of Vienna, Wahringerstrasse 17, 1090 Vienna, Austria
- Matthias Buck
  - Department of Physiology and Biophysics, Case Western Reserve University, School of Medicine, Cleveland, Ohio 44106, United States
- Joshua Buckner
  - Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Amedeo Caflisch
  - Department of Biochemistry, University of Zürich, CH-8057 Zürich, Switzerland
- Hao-Ting Chang
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan, ROC
- Xi Cheng
  - Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Yeol Kyo Choi
  - Department of Biological Sciences, Lehigh University, Bethlehem, Pennsylvania 18015, United States
- Jhih-Wei Chu
  - Institute of Bioinformatics and Systems Biology, Department of Biological Science and Technology, Institute of Molecular Medicine and Bioengineering, and Center for Intelligent Drug Systems and Smart Bio-devices (IDSB), National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan, ROC
- Michael F. Crowley
  - Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, Colorado 80401, United States
- Qiang Cui
  - Department of Chemistry, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, United States
  - Department of Physics, Boston University, 590 Commonwealth Avenue, Boston, Massachusetts 02215, United States
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, Massachusetts 02215, United States
- Ana Damjanovic
  - Department of Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, United States
  - Department of Physics and Astronomy, Johns Hopkins University, Baltimore, Maryland 21218, United States
  - Laboratory of Computational Biology, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Yuqing Deng
  - Shanghai R&D Center, DP Technology, Ltd., Shanghai 201210, China
- Mike Devereux
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Xinqiang Ding
  - Department of Chemistry, Tufts University, Medford, Massachusetts 02155, United States
- Michael F. Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Jiali Gao
  - School of Chemical Biology & Biotechnology, Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, Guangdong 518055, China
  - Department of Chemistry and Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota 55455, United States
- David R. Glowacki
  - CiTIUS Centro Singular de Investigación en Tecnoloxías Intelixentes da USC, 15705 Santiago de Compostela, Spain
- James E. Gonzales
  - Department of Biomedical Engineering, Texas A&M University, College Station, Texas 77843, United States
  - Laboratory of Computational Biology, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Mehdi Bagerhi Hamaneh
  - Department of Physiology and Biophysics, Case Western Reserve University, School of Medicine, Cleveland, Ohio 44106, United States
- Ryan L. Hayes
  - Department of Chemical and Biomolecular Engineering, University of California, Irvine, Irvine, California 92697, United States
  - Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, California 92697, United States
- Jing Huang
  - Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310024, China
- Yandong Huang
  - College of Computer Engineering, Jimei University, Xiamen 361021, China
- Phillip S. Hudson
  - Department of Chemistry, University of South Florida, Tampa, Florida 33620, United States
  - Medicine Design, Pfizer Inc., Cambridge, Massachusetts 02139, United States
- Wonpil Im
  - Department of Biological Sciences, Lehigh University, Bethlehem, Pennsylvania 18015, United States
- Shahidul M. Islam
  - Department of Chemistry, Delaware State University, Dover, Delaware 19901, United States
- Wei Jiang
  - Computational Science Division, Argonne National Laboratory, Argonne, Illinois 60439, United States
- Michael R. Jones
  - Laboratory of Computational Biology, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Silvan Käser
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Fiona L. Kearns
  - Department of Chemistry, University of South Florida, Tampa, Florida 33620, United States
- Nathan R. Kern
  - Department of Biological Sciences, Lehigh University, Bethlehem, Pennsylvania 18015, United States
- Jeffery B. Klauda
  - Department of Chemical and Biomolecular Engineering, Institute for Physical Science and Technology, Biophysics Program, University of Maryland, College Park, Maryland 20742, United States
- Themis Lazaridis
  - Department of Chemistry, City College of New York, New York, New York 10031, United States
- Jinhyuk Lee
  - Disease Target Structure Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Republic of Korea
  - Department of Bioinformatics, KRIBB School of Bioscience, University of Science and Technology, Daejeon 34141, Republic of Korea
- Justin A. Lemkul
  - Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, United States
- Xiaorong Liu
  - Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Yun Luo
  - Department of Biotechnology and Pharmaceutical Sciences, College of Pharmacy, Western University of Health Sciences, Pomona, California 91766, United States
- Alexander D. MacKerell
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, United States
- Dan T. Major
  - Department of Chemistry and Institute for Nanotechnology & Advanced Materials, Bar-Ilan University, Ramat-Gan 52900, Israel
- Markus Meuwly
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
  - Department of Chemistry, Brown University, Providence, Rhode Island 02912, United States
- Kwangho Nam
  - Department of Chemistry and Biochemistry, University of Texas at Arlington, Arlington, Texas 76019, United States
- Lennart Nilsson
  - Karolinska Institutet, Department of Biosciences and Nutrition, SE-14183 Huddinge, Sweden
- Victor Ovchinnikov
  - Harvard University, Department of Chemistry and Chemical Biology, Cambridge, Massachusetts 02138, United States
- Emanuele Paci
  - Dipartimento di Fisica e Astronomia, Universitá di Bologna, Bologna 40127, Italy
- Soohyung Park
  - Department of Biological Sciences, Lehigh University, Bethlehem, Pennsylvania 18015, United States
- Richard W. Pastor
  - Laboratory of Computational Biology, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Amanda R. Pittman
  - Department of Chemistry, University of South Florida, Tampa, Florida 33620, United States
- Carol Beth Post
  - Borch Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, Indiana 47907, United States
- Samarjeet Prasad
  - Laboratory of Computational Biology, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Jingzhi Pu
  - Department of Chemistry and Chemical Biology, Indiana University Indianapolis, Indianapolis, Indiana 46202, United States
- Yifei Qi
  - School of Pharmacy, Fudan University, Shanghai 201203, China
- Daniel R. Roe
  - Laboratory of Computational Biology, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Benoit Roux
  - Department of Chemistry, University of Chicago, Chicago, Illinois 60637, United States
- Jana Shen
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, United States
- Andrew C. Simmonett
  - Laboratory of Computational Biology, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Alexander J. Sodt
  - Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland 20892, United States
- Kai Töpfer
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Meenu Upadhyay
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Arjan van der Vaart
  - Department of Chemistry, University of South Florida, Tampa, Florida 33620, United States
- Richard M. Venable
  - Laboratory of Computational Biology, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Luke C. Warrensford
  - Department of Chemistry, University of South Florida, Tampa, Florida 33620, United States
- H. Lee Woodcock
  - Department of Chemistry, University of South Florida, Tampa, Florida 33620, United States
- Yujin Wu
  - Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Charles L. Brooks
  - Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Bernard R. Brooks
  - Laboratory of Computational Biology, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Martin Karplus
  - Harvard University, Department of Chemistry and Chemical Biology, Cambridge, Massachusetts 02138, United States
  - Laboratoire de Chimie Biophysique, ISIS, Université de Strasbourg, 67000 Strasbourg, France
|
16
|
Nithin C, Fornari RP, Pilla SP, Wroblewski K, Zalewski M, Madaj R, Kolinski A, Macnar JM, Kmiecik S. Exploring protein functions from structural flexibility using CABS-flex modeling. Protein Sci 2024; 33:e5090. [PMID: 39194135 PMCID: PMC11350595 DOI: 10.1002/pro.5090] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 02/29/2024] [Revised: 05/06/2024] [Accepted: 06/10/2024] [Indexed: 08/29/2024]
Abstract
Understanding protein function often necessitates characterizing the flexibility of protein structures. However, simulating protein flexibility poses significant challenges due to the complex dynamics of protein systems, requiring extensive computational resources and accurate modeling techniques. In response to these challenges, the CABS-flex method has been developed as an efficient modeling tool that combines coarse-grained simulations with all-atom detail. Available both as a web server and a standalone package, CABS-flex is dedicated to a wide range of users. The web server version offers an accessible interface for straightforward tasks, while the standalone command-line program is designed for advanced users, providing additional features, analytical tools, and support for handling large systems. This paper examines the application of CABS-flex across various structure-function studies, facilitating investigations into the interplay among protein structure, dynamics, and function in diverse research fields. We present an overview of the current status of the CABS-flex methodology, highlighting its recent advancements, practical applications, and forthcoming challenges.
Affiliation(s)
- Chandran Nithin
  - Biological and Chemical Research Centre, Faculty of Chemistry, University of Warsaw, Warsaw, Poland
- Rocco Peter Fornari
  - Biological and Chemical Research Centre, Faculty of Chemistry, University of Warsaw, Warsaw, Poland
- Smita P. Pilla
  - Biological and Chemical Research Centre, Faculty of Chemistry, University of Warsaw, Warsaw, Poland
- Karol Wroblewski
  - Biological and Chemical Research Centre, Faculty of Chemistry, University of Warsaw, Warsaw, Poland
- Mateusz Zalewski
  - Biological and Chemical Research Centre, Faculty of Chemistry, University of Warsaw, Warsaw, Poland
- Rafał Madaj
  - Institute of Evolutionary Biology, Biological and Chemical Research Centre, Faculty of Biology, University of Warsaw, Warsaw, Poland
- Andrzej Kolinski
  - Biological and Chemical Research Centre, Faculty of Chemistry, University of Warsaw, Warsaw, Poland
- Joanna M. Macnar
  - Biological and Chemical Research Centre, Faculty of Chemistry, University of Warsaw, Warsaw, Poland
  - Present address: Ryvu Therapeutics, Cracow, Poland
- Sebastian Kmiecik
  - Biological and Chemical Research Centre, Faculty of Chemistry, University of Warsaw, Warsaw, Poland
|
17
|
Janson G, Feig M. Transferable deep generative modeling of intrinsically disordered protein conformations. PLoS Comput Biol 2024; 20:e1012144. [PMID: 38781245 PMCID: PMC11152266 DOI: 10.1371/journal.pcbi.1012144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 06/05/2024] [Accepted: 05/07/2024] [Indexed: 05/25/2024] Open
Abstract
Intrinsically disordered proteins have dynamic structures through which they play key biological roles. The elucidation of their conformational ensembles is a challenging problem requiring an integrated use of computational and experimental methods. Molecular simulations are a valuable computational strategy for constructing structural ensembles of disordered proteins but are highly resource-intensive. Recently, machine learning approaches based on deep generative models that learn from simulation data have emerged as an efficient alternative for generating structural ensembles. However, such methods currently suffer from limited transferability when modeling sequences and conformations absent from the training data. Here, we develop a novel generative model that achieves high levels of transferability for intrinsically disordered protein ensembles. The approach, named idpSAM, is a latent diffusion model based on transformer neural networks. It combines an autoencoder to learn a representation of protein geometry and a diffusion model to sample novel conformations in the encoded space. IdpSAM was trained on a large dataset of simulations of disordered protein regions performed with the ABSINTH implicit solvent model. Thanks to the expressiveness of its neural networks and its training stability, idpSAM faithfully captures 3D structural ensembles of test sequences with no similarity to sequences in the training set. Our study also demonstrates the potential for generating full conformational ensembles from datasets with limited sampling and underscores the importance of training set size for generalization. We believe that idpSAM represents significant progress in transferable protein ensemble modeling through machine learning.
Affiliation(s)
- Giacomo Janson
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
- Michael Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
|
18
|
Christofi E, Bačová P, Harmandaris VA. Physics-Informed Deep Learning Approach for Reintroducing Atomic Detail in Coarse-Grained Configurations of Multiple Poly(lactic acid) Stereoisomers. J Chem Inf Model 2024; 64:1853-1867. [PMID: 38427962 PMCID: PMC10966642 DOI: 10.1021/acs.jcim.3c01870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 02/15/2024] [Accepted: 02/16/2024] [Indexed: 03/03/2024]
Abstract
Multiscale modeling of complex molecular systems, such as macromolecules, encompasses methods that combine information from fine and coarse representations of molecules to capture material properties over a wide range of spatiotemporal scales. Being able to exchange information between different levels of resolution is essential for the effective transfer of this information. The inverse problem of reintroducing atomistic degrees of freedom in coarse-grained (CG) molecular configurations is particularly challenging as, from a mathematical point of view, it is an ill-posed problem; the forward mapping from the atomistic to the CG description is typically defined via a deterministic operator ("one-to-one" problem), whereas the reversed mapping from the CG to the atomistic model refers to creating one representative configuration out of many possible ones ("one-to-many" problem). Most of the backmapping methods proposed so far balance accuracy, efficiency, and general applicability. This is particularly important for macromolecular systems with different types of isomers, i.e., molecules that have the same molecular formula and sequence of bonded atoms (constitution) but differ in the three-dimensional configurations of their atoms in space. Here, we introduce a versatile deep learning approach for backmapping multicomponent CG macromolecules with chiral centers, trained to learn structural correlations between polymer configurations at the atomistic level and their corresponding CG descriptions. This method is intended to be simple and flexible while presenting a generic solution for resolution transformation. In addition, the method aims to respect the structural features of the molecule, such as local packing, and thereby to capture the physical properties of the material. As an illustrative example, we apply the model to linear poly(lactic acid) (PLA) in the melt, which is one of the most popular biodegradable polymers.
The framework is tested on a number of model systems ranging from homopolymer stereoisomers of PLA to copolymers with randomly placed chiral centers. The results demonstrate the efficiency and efficacy of the new approach.
Affiliation(s)
- Eleftherios Christofi
  - Computation-based Science and Technology Research Center, The Cyprus Institute, Nicosia 2121, Cyprus
- Petra Bačová
  - Departamento de Ciencia de los Materiales e Ingeniería Metalúrgica y Química Inorgánica, Facultad de Ciencias, IMEYMAT, Campus Universitario Río San Pedro s/n., Puerto Real, Cádiz 11510, Spain
- Vagelis A. Harmandaris
  - Computation-based Science and Technology Research Center, The Cyprus Institute, Nicosia 2121, Cyprus
  - Department of Mathematics and Applied Mathematics, University of Crete, Heraklion GR-71110, Greece
  - Institute of Applied and Computational Mathematics, Foundation for Research and Technology - Hellas, Heraklion GR-71110, Crete, Greece
|
19
|
Janson G, Feig M. Transferable deep generative modeling of intrinsically disordered protein conformations. bioRxiv 2024:2024.02.08.579522. [PMID: 38370653 PMCID: PMC10871340 DOI: 10.1101/2024.02.08.579522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Intrinsically disordered proteins have dynamic structures through which they play key biological roles. The elucidation of their conformational ensembles is a challenging problem requiring an integrated use of computational and experimental methods. Molecular simulations are a valuable computational strategy for constructing structural ensembles of disordered proteins but are highly resource-intensive. Recently, machine learning approaches based on deep generative models that learn from simulation data have emerged as an efficient alternative for generating structural ensembles. However, such methods currently suffer from limited transferability when modeling sequences and conformations absent from the training data. Here, we develop a novel generative model that achieves high levels of transferability for intrinsically disordered protein ensembles. The approach, named idpSAM, is a latent diffusion model based on transformer neural networks. It combines an autoencoder to learn a representation of protein geometry and a diffusion model to sample novel conformations in the encoded space. IdpSAM was trained on a large dataset of simulations of disordered protein regions performed with the ABSINTH implicit solvent model. Thanks to the expressiveness of its neural networks and its training stability, idpSAM faithfully captures 3D structural ensembles of test sequences with no similarity to sequences in the training set. Our study also demonstrates the potential for generating full conformational ensembles from datasets with limited sampling and underscores the importance of training set size for generalization. We believe that idpSAM represents significant progress in transferable protein ensemble modeling through machine learning.
Affiliation(s)
- Giacomo Janson
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA
- Michael Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, USA
|
20
|
Pang YT, Yang L, Gumbart JC. From simple to complex: Reconstructing all-atom structures from coarse-grained models using cg2all. Structure 2024; 32:5-7. [PMID: 38181727 PMCID: PMC11283325 DOI: 10.1016/j.str.2023.12.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Revised: 12/08/2023] [Accepted: 12/08/2023] [Indexed: 01/07/2024]
Abstract
In this issue of Structure, Heo and Feig present cg2all, a novel deep-learning model capable of efficiently predicting all-atom protein structures from coarse-grained (CG) representations. The model maintains high accuracy, even when the CG model is simplified to a single bead per residue, and has a number of promising applications.
Affiliation(s)
- Yui Tik Pang
  - School of Physics, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Lixinhao Yang
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332, USA
- James C Gumbart
  - School of Physics, Georgia Institute of Technology, Atlanta, GA 30332, USA
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332, USA
|