1
Ruzmetov T, Hung TI, Jonnalagedda SP, Chen SH, Fasihianifard P, Guo Z, Bhanu B, Chang CEA. Sampling Conformational Ensembles of Highly Dynamic Proteins via Generative Deep Learning. J Chem Inf Model 2025; 65:2487-2502. [PMID: 39984300 DOI: 10.1021/acs.jcim.4c01838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2025]
Abstract
Proteins are inherently dynamic, and their conformational ensembles play a crucial role in biological function. Large-scale motions may govern the protein structure-function relationship, and numerous transient but stable conformations of intrinsically disordered proteins (IDPs) can play a crucial role in biological function. Investigating conformational ensembles to understand regulations and disease-related aggregations of IDPs is challenging, both experimentally and computationally. In this paper, we first introduce a deep learning-based model, termed Internal Coordinate Net (ICoN), which learns the physical principles of conformational changes from molecular dynamics simulation data. Second, we selected data points through interpolation in the learned latent space to rapidly identify novel synthetic conformations with sophisticated and large-scale side chains and backbone arrangements. Third, with the highly dynamic amyloid-β1-42 (Aβ42) monomer, our deep learning model provided a comprehensive sampling of Aβ42's conformational landscape. Analysis of these synthetic conformations revealed conformational clusters that could be used to rationalize experimental findings. Additionally, the method can identify novel conformations with important interactions in atomistic details that are not included in the training data. New synthetic conformations showed distinct side chain rearrangements that are probed by our electron paramagnetic resonance and amino acid substitution studies. This approach is highly transferable and can be used for any available data for training. The work also demonstrated the ability of deep learning to utilize natural atomistic motions in protein conformation sampling.
Affiliation(s)
- Talant Ruzmetov
  - Department of Chemistry, University of California, Riverside, California 92521, United States
- Ta I Hung
  - Department of Chemistry, University of California, Riverside, California 92521, United States
  - Department of Bioengineering, University of California, Riverside, California 92521, United States
- Saisri Padmaja Jonnalagedda
  - Department of Electrical and Computer Engineering, University of California, Riverside, California 92521, United States
- Si-Han Chen
  - Department of Chemistry, University of California, Riverside, California 92521, United States
- Parisa Fasihianifard
  - Department of Chemistry, University of California, Riverside, California 92521, United States
- Zhefeng Guo
  - Department of Neurology, Brain Research Institute, University of California, Los Angeles, California 90095, United States
- Bir Bhanu
  - Department of Bioengineering, University of California, Riverside, California 92521, United States
  - Department of Electrical and Computer Engineering, University of California, Riverside, California 92521, United States
- Chia-En A Chang
  - Department of Chemistry, University of California, Riverside, California 92521, United States
  - Department of Bioengineering, University of California, Riverside, California 92521, United States
2
Taneja I, Lasker K. Machine-learning-based methods to generate conformational ensembles of disordered proteins. Biophys J 2024; 123:101-113. [PMID: 38053335 PMCID: PMC10808026 DOI: 10.1016/j.bpj.2023.12.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 10/24/2023] [Accepted: 12/01/2023] [Indexed: 12/07/2023]
Abstract
Intrinsically disordered proteins are characterized by a conformational ensemble. While computational approaches such as molecular dynamics simulations have been used to generate such ensembles, their computational costs can be prohibitive. An alternative approach is to learn from data and train machine-learning models to generate conformational ensembles of disordered proteins. This has been a relatively unexplored approach, and in this work we demonstrate a proof-of-principle approach to do so. Specifically, we devised a two-stage computational pipeline: in the first stage, we employed supervised machine-learning models to predict ensemble-derived two-dimensional (2D) properties of a sequence, given the conformational ensemble of a closely related sequence. In the second stage, we used denoising diffusion models to generate three-dimensional (3D) coarse-grained conformational ensembles, given the two-dimensional predictions outputted by the first stage. We trained our models on a data set of coarse-grained molecular dynamics simulations of thousands of rationally designed synthetic sequences. The accuracy of our 2D and 3D predictions was validated across multiple metrics, and our work demonstrates the applicability of machine-learning techniques to predicting higher-dimensional properties of disordered proteins.
Affiliation(s)
- Ishan Taneja
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California
- Keren Lasker
  - Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, California
3
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Karl N. Kirschner
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany