1
Randolph NZ, Kuhlman B. Invariant point message passing for protein side chain packing. Proteins 2024. [PMID: 38790143 DOI: 10.1002/prot.26705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 04/19/2024] [Accepted: 05/13/2024] [Indexed: 05/26/2024]
Abstract
Protein side chain packing (PSCP) is a fundamental problem in the field of protein engineering, as high-confidence and low-energy conformations of amino acid side chains are crucial for understanding (and designing) protein folding, protein-protein interactions, and protein-ligand interactions. Traditional PSCP methods (such as the Rosetta Packer) often rely on a library of discrete side chain conformations, or rotamers, and a forcefield to guide the structure to low-energy conformations. Recently, deep learning (DL) based methods (such as DLPacker, AttnPacker, and DiffPack) have demonstrated state-of-the-art predictions and speed in the PSCP task. Building off the success of geometric graph neural networks for protein modeling, we present the Protein Invariant Point Packer (PIPPack) which effectively processes local structural and sequence information to produce realistic, idealized side chain coordinates using χ-angle distribution predictions and geometry-aware invariant point message passing (IPMP). On a test set of ∼1400 high-quality protein chains, PIPPack is highly competitive with other state-of-the-art PSCP methods in rotamer recovery and per-residue RMSD but is significantly faster.
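As a rough illustration of two ideas named in this abstract (χ-angle distribution predictions and rotamer recovery), the Python sketch below decodes a binned χ distribution to a single angle and counts a residue as recovered when every predicted χ angle lies within a tolerance of the native one. The 36 bins of 10°, the most-probable-bin decoding, the 20° tolerance, and all function names are assumptions for illustration, not details given in the abstract or taken from the PIPPack code; symmetric side chains (for example the 180° χ2 flip of Phe and Tyr) are not handled.

```python
from typing import Sequence


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (result in [0, 180])."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def decode_chi_bins(probs: Sequence[float]) -> float:
    """Return the center of the most probable bin of a distribution over [-180, 180)."""
    width = 360.0 / len(probs)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return -180.0 + (best + 0.5) * width


def rotamer_recovered(pred_chis: Sequence[float], native_chis: Sequence[float],
                      tol: float = 20.0) -> bool:
    """A residue counts as recovered only if all of its chi angles are within tol degrees."""
    return all(angular_difference(p, n) <= tol for p, n in zip(pred_chis, native_chis))


def rotamer_recovery_rate(preds, natives, tol: float = 20.0) -> float:
    """Fraction of residues whose predicted rotamer matches the native one."""
    hits = sum(rotamer_recovered(p, n, tol) for p, n in zip(preds, natives))
    return hits / len(natives)


if __name__ == "__main__":
    probs = [0.0] * 36
    probs[18] = 1.0
    print(decode_chi_bins(probs))  # 5.0
    preds = [[60.0, 180.0], [-60.0, 65.0]]
    natives = [[55.0, -175.0], [-60.0, 170.0]]
    print(rotamer_recovery_rate(preds, natives))  # 0.5
```

Differences are taken on the circle, so 180° and -175° count as only 5° apart; without that wrap-around, many correct predictions near ±180° would be scored as misses.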
Affiliation(s)
- Nicholas Z Randolph
- Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Brian Kuhlman
- Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
2
Xu G, Luo Z, Yan Y, Wang Q, Ma J. OPUS-Rota5: A highly accurate protein side-chain modeling method with 3D-Unet and RotaFormer. Structure 2024:S0969-2126(24)00126-6. [PMID: 38657613 DOI: 10.1016/j.str.2024.03.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 02/06/2024] [Accepted: 03/28/2024] [Indexed: 04/26/2024]
Abstract
Accurate protein side-chain modeling is crucial for protein folding and design. This is particularly true for molecular docking as ligands primarily interact with side chains. In this study, we introduce a two-stage side-chain modeling approach called OPUS-Rota5. It leverages a modified 3D-Unet to capture the local environmental features, including ligand information of each residue, and then employs the RotaFormer module to aggregate various types of features. Evaluation on three test sets, including recently released targets from CAMEO and CASP15, shows that OPUS-Rota5 significantly outperforms some other leading side-chain modeling methods. We also employ OPUS-Rota5 to refine the side chains of 25 G protein-coupled receptor targets predicted by AlphaFold2 and achieve a significantly improved success rate in a subsequent "back" docking of their natural ligands. Therefore, OPUS-Rota5 is a useful and effective tool for molecular docking, particularly for targets with relatively accurate predicted backbones but not side chains such as high-homology targets.
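OPUS-Rota5 feeds each residue's local environment to a modified 3D-Unet. One common way to present such an environment to a 3D convolutional network is a per-element occupancy grid centered on the residue; the NumPy sketch below shows that general idea. The 16 x 16 x 16 grid, the 1 Å spacing, the C/N/O/S channel set, and the `voxelize` name are assumptions, not the featurization reported for OPUS-Rota5; ligand atoms could be given their own channel in the same way.

```python
import numpy as np

ELEMENTS = ("C", "N", "O", "S")  # assumed channel order; other elements are ignored


def voxelize(center, coords, elements, grid_size=16, spacing=1.0):
    """Count atoms of each element in a cubic grid centered on `center` (coordinates in Å)."""
    grid = np.zeros((len(ELEMENTS), grid_size, grid_size, grid_size), dtype=np.float32)
    center = np.asarray(center, dtype=float)
    coords = np.asarray(coords, dtype=float).reshape(-1, 3)
    half_width = grid_size * spacing / 2.0
    # Shift so the grid spans [center - half_width, center + half_width) on each axis.
    idx = np.floor((coords - center + half_width) / spacing).astype(int)
    inside = np.all((idx >= 0) & (idx < grid_size), axis=1)
    for (i, j, k), elem, ok in zip(idx, elements, inside):
        if ok and elem in ELEMENTS:
            grid[ELEMENTS.index(elem), i, j, k] += 1.0
    return grid


if __name__ == "__main__":
    atoms = [[0.0, 0.0, 0.0], [-8.0, -8.0, -8.0], [100.0, 0.0, 0.0]]
    grid = voxelize([0.0, 0.0, 0.0], atoms, ["C", "N", "O"])
    print(grid.shape, grid.sum())  # the far-away O atom falls outside the box
```

A real pipeline would typically also rotate the box into a residue-centered frame so that the network sees a consistent orientation; that step is omitted here.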
Affiliation(s)
- Gang Xu
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China
- Zhenwei Luo
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China
- Yaming Yan
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China
- Qinghua Wang
- Center for Biomolecular Innovation, Harcam Biomedicines, Shanghai 200131, China
- Jianpeng Ma
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China
3
Tandiana R, Barletta GP, Soler MA, Fortuna S, Rocchia W. Computational Mutagenesis of Antibody Fragments: Disentangling Side Chains from ΔΔG Predictions. J Chem Theory Comput 2024; 20:2630-2642. [PMID: 38445482 DOI: 10.1021/acs.jctc.3c01225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2024]
Abstract
The development of highly potent antibodies and antibody fragments as binding agents holds significant implications in fields such as biosensing and biotherapeutics. Their binding strength is intricately linked to the arrangement and composition of residues at the binding interface. Computational techniques offer a robust means to predict the three-dimensional structure of these complexes and to assess the affinity changes resulting from mutations. Given the interdependence of structure and affinity prediction, our objective here is to disentangle their roles. We independently evaluate six side-chain reconstruction methods and ten binding affinity estimation techniques, an evaluation that is pivotal for predicting affinity alterations due to single mutations, a key step in computational affinity maturation protocols. Our analysis focuses on a data set comprising 27 distinct antibody/hen egg white lysozyme complexes, each with a crystal structure and experimentally determined binding affinities. Using six different side-chain reconstruction methods, we transformed each structure into its corresponding mutant via in silico single-point mutations. These structures then underwent energy minimization and molecular dynamics simulation, and we estimated ΔΔG values from the original crystal structure, its energy-minimized form, and the ensuing molecular dynamics trajectories. Our research underscores the critical importance of selecting reliable side-chain reconstruction methods and conducting thorough molecular dynamics simulations to accurately predict the impact of mutations. In summary, our study demonstrates that integrating conformational sampling and scoring is a potent approach to precisely characterizing mutation processes in single-point mutagenesis protocols and is crucial for computational antibody design.
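The ΔΔG values discussed above compare binding free energies of mutant and wild-type complexes. Under the common convention ΔΔG = ΔG_bind(mutant) - ΔG_bind(wild type), a positive value means the mutation weakens binding. The sketch below is an illustration under that assumed convention, not one of the ten estimators evaluated in the paper: it averages hypothetical per-frame binding-energy estimates from MD trajectories and propagates a naive standard error. Because consecutive MD frames are correlated, this error bar is an underestimate unless the frames are decorrelated first.

```python
import statistics
from typing import Sequence, Tuple


def mean_and_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean, treating samples as independent."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / len(values) ** 0.5 if len(values) > 1 else 0.0
    return mean, sem


def ddg(wt_frames: Sequence[float], mut_frames: Sequence[float]) -> Tuple[float, float]:
    """ddG = <dG_bind(mutant)> - <dG_bind(wild type)>, with errors added in quadrature."""
    wt_mean, wt_sem = mean_and_sem(wt_frames)
    mut_mean, mut_sem = mean_and_sem(mut_frames)
    return mut_mean - wt_mean, (wt_sem ** 2 + mut_sem ** 2) ** 0.5


if __name__ == "__main__":
    wild_type = [-10.0, -12.0]  # hypothetical per-frame dG_bind values (kcal/mol)
    mutant = [-9.0, -9.0]
    print(ddg(wild_type, mutant))
```

With these made-up numbers the mutant binds about 2 kcal/mol more weakly, with an uncertainty of roughly 1 kcal/mol driven entirely by the spread in the wild-type frames.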
Affiliation(s)
- Rika Tandiana
- Computational MOdelling of NanosCalE and BioPhysical SysTems─CONCEPT Lab Istituto Italiano di Tecnologia (IIT), Via Melen-83, B Block, 16152 Genoa, Italy
- German P Barletta
- Computational MOdelling of NanosCalE and BioPhysical SysTems─CONCEPT Lab Istituto Italiano di Tecnologia (IIT), Via Melen-83, B Block, 16152 Genoa, Italy
- The Abdus Salam International Centre for Theoretical Physics─ICTP, Strada Costiera 11, 34151 Trieste, Italy
- Miguel Angel Soler
- Dipartimento di Scienze Matematiche, Informatiche e Fisiche, Università di Udine, Via delle Scienze 206, 33100 Udine, Italy
- Sara Fortuna
- Computational MOdelling of NanosCalE and BioPhysical SysTems─CONCEPT Lab Istituto Italiano di Tecnologia (IIT), Via Melen-83, B Block, 16152 Genoa, Italy
- Walter Rocchia
- Computational MOdelling of NanosCalE and BioPhysical SysTems─CONCEPT Lab Istituto Italiano di Tecnologia (IIT), Via Melen-83, B Block, 16152 Genoa, Italy
4
Kim DN, McNaughton AD, Kumar N. Leveraging Artificial Intelligence to Expedite Antibody Design and Enhance Antibody-Antigen Interactions. Bioengineering (Basel) 2024; 11:185. [PMID: 38391671 PMCID: PMC10886287 DOI: 10.3390/bioengineering11020185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 01/30/2024] [Accepted: 02/06/2024] [Indexed: 02/24/2024] [Open Access]
Abstract
This perspective sheds light on the transformative impact of recent computational advancements in the field of protein therapeutics, with a particular focus on the design and development of antibodies. Cutting-edge computational methods have revolutionized our understanding of protein-protein interactions (PPIs), enhancing the efficacy of protein therapeutics in preclinical and clinical settings. Central to these advancements is the application of machine learning and deep learning, which offers unprecedented insights into the intricate mechanisms of PPIs and facilitates precise control over protein functions. Despite these advancements, the complex structural nuances of antibodies pose ongoing challenges in their design and optimization. Our review provides a comprehensive exploration of the latest deep learning approaches, including language models and diffusion techniques, and their role in surmounting these challenges. We also present a critical analysis of these methods, offering insights to drive further progress in this rapidly evolving field. The paper includes practical recommendations for the application of these computational techniques, supplemented with independent benchmark studies. These studies focus on key performance metrics such as accuracy and the ease of program execution, providing a valuable resource for researchers engaged in antibody design and development. Through this detailed perspective, we aim to contribute to the advancement of antibody design, equipping researchers with the tools and knowledge to navigate the complexities of this field.
Affiliation(s)
- Doo Nam Kim
- Pacific Northwest National Laboratory, 902 Battelle Blvd., Richland, WA 99352, USA
- Andrew D McNaughton
- Pacific Northwest National Laboratory, 902 Battelle Blvd., Richland, WA 99352, USA
- Neeraj Kumar
- Pacific Northwest National Laboratory, 902 Battelle Blvd., Richland, WA 99352, USA
5
Pun MN, Ivanov A, Bellamy Q, Montague Z, LaMont C, Bradley P, Otwinowski J, Nourmohammad A. Learning the shape of protein microenvironments with a holographic convolutional neural network. Proc Natl Acad Sci U S A 2024; 121:e2300838121. [PMID: 38300863 PMCID: PMC10861886 DOI: 10.1073/pnas.2300838121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Accepted: 11/29/2023] [Indexed: 02/03/2024] [Open Access]
Abstract
Proteins play a central role in biology, from immune recognition to brain activity. While major advances in machine learning have improved our ability to predict protein structure from sequence, determining protein function from its sequence or structure remains a major challenge. Here, we introduce a holographic convolutional neural network (H-CNN) for proteins, a physically motivated machine learning approach to model amino acid preferences in protein structures. H-CNN reflects physical interactions in a protein structure and recapitulates the functional information stored in evolutionary data. H-CNN accurately predicts the impact of mutations on protein stability and on binding of protein complexes. Our interpretable computational model for protein structure-function maps could guide the design of novel proteins with desired functions.
Affiliation(s)
- Michael N. Pun
- Department of Physics, University of Washington, Seattle, WA 98195
- The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Andrew Ivanov
- Department of Physics, University of Washington, Seattle, WA 98195
- Quinn Bellamy
- Department of Physics, University of Washington, Seattle, WA 98195
- Zachary Montague
- Department of Physics, University of Washington, Seattle, WA 98195
- The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Colin LaMont
- The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Philip Bradley
- Fred Hutchinson Cancer Center, Seattle, WA 98102
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
- Jakub Otwinowski
- The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Dyno Therapeutics, Watertown, MA 02472
- Armita Nourmohammad
- Department of Physics, University of Washington, Seattle, WA 98195
- The Department for Statistical Physics of Evolving Systems, Max Planck Institute for Dynamics and Self-Organization, Göttingen 37077, Germany
- Fred Hutchinson Cancer Center, Seattle, WA 98102
- Department of Applied Mathematics, University of Washington, Seattle, WA 98105
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195
6
Heo L, Feig M. One bead per residue can describe all-atom protein structures. Structure 2024; 32:97-111.e6. [PMID: 38000367 PMCID: PMC10872525 DOI: 10.1016/j.str.2023.10.013] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/19/2023] [Revised: 09/16/2023] [Accepted: 10/30/2023] [Indexed: 11/26/2023]
Abstract
Atomistic resolution is the standard for high-resolution biomolecular structures, but experimental structural data are often at lower resolution. Coarse-grained models are also used extensively in computational studies to reach biologically relevant spatial and temporal scales. This study explores the use of advanced machine learning networks for reconstructing atomistic models from reduced representations. The main finding is that a single bead per amino acid residue allows construction of accurate and stereochemically realistic all-atom structures with minimal loss of information. This suggests that lower resolution representations of proteins may be sufficient for many applications when combined with a machine learning framework that encodes knowledge from known structures. Practical applications include the rapid addition of atomistic detail to low-resolution structures from experiment or computational coarse-grained models. The application of rapid, deterministic all-atom reconstruction within multi-scale frameworks is further demonstrated with a rapid protocol for the generation of accurate models from cryo-EM densities close to experimental structures.
Affiliation(s)
- Lim Heo
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
7
Randolph NZ, Kuhlman B. Invariant point message passing for protein side chain packing. bioRxiv: the preprint server for biology 2023:2023.08.03.551328. [PMID: 38187664 PMCID: PMC10769188 DOI: 10.1101/2023.08.03.551328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Protein side chain packing (PSCP) is a fundamental problem in the field of protein engineering, as high-confidence and low-energy conformations of amino acid side chains are crucial for understanding (and designing) protein folding, protein-protein interactions, and protein-ligand interactions. Traditional PSCP methods (such as the Rosetta Packer) often rely on a library of discrete side chain conformations, or rotamers, and a forcefield to guide the structure to low-energy conformations. Recently, deep learning (DL) based methods (such as DLPacker, AttnPacker, and DiffPack) have demonstrated state-of-the-art predictions and speed in the PSCP task. Building off the success of geometric graph neural networks for protein modeling, we present the Protein Invariant Point Packer (PIPPack) which effectively processes local structural and sequence information to produce realistic, idealized side chain coordinates using χ-angle distribution predictions and geometry-aware invariant point message passing (IPMP). On a test set of ~1,400 high-quality protein chains, PIPPack is highly competitive with other state-of-the-art PSCP methods in rotamer recovery and per-residue RMSD but is significantly faster.
Affiliation(s)
- Nicholas Z Randolph
- Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Brian Kuhlman
- Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
8
Cole CC, Yu LT, Misiura M, Williams J, Bui TH, Hartgerink JD. Stabilization of Synthetic Collagen Triple Helices: Charge Pairs and Covalent Capture. Biomacromolecules 2023; 24:5083-5090. [PMID: 37871141 DOI: 10.1021/acs.biomac.3c00680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Collagen mimetic peptides are composed of triple helices. Triple helical formation frequently utilizes charge pair interactions to direct protein assembly. The design of synthetic triple helices is challenging due to the large number of competing species and the overall fragile nature of collagen mimetics. A successfully designed triple helix incorporates both positive and negative criteria to achieve maximum specificity of the supramolecular assembly. Intrahelical charge pair interactions, particularly those involved in lysine-aspartate and lysine-glutamate pairs, have been especially successful both in driving helix specificity and for subsequent stabilization by covalent capture. Despite this progress, the important sequential and geometric relationships of charged residues in a triple helical context have not been fully explored for either supramolecular assembly or covalent capture stabilization. In this study, we compare the eight canonical axial and lateral charge pairs of lysine and arginine with glutamate and aspartate to their noncanonical, reversed charge pairs. These findings are put into the context of collagen triple helical design and synthesis.
Affiliation(s)
- Carson C Cole
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Le Tracy Yu
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Mikita Misiura
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Joseph Williams
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Thi H Bui
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Jeffrey D Hartgerink
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Department of Bioengineering, Rice University, 6100 Main Street, Houston, Texas 77005, United States
9
Shao Q, Jiang Y, Yang ZJ. EnzyHTP Computational Directed Evolution with Adaptive Resource Allocation. J Chem Inf Model 2023; 63:5650-5659. [PMID: 37611241 DOI: 10.1021/acs.jcim.3c00618] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 08/25/2023]
Abstract
Directed evolution facilitates enzyme engineering via iterative rounds of mutagenesis. Despite the wide applications of high-throughput screening, building "smart libraries" to effectively identify beneficial variants remains a major challenge in the community. Here, we developed a new computational directed evolution protocol based on EnzyHTP, a software that we have previously reported to automate enzyme modeling. To enhance the throughput efficiency, we implemented an adaptive resource allocation strategy that dynamically allocates different types of computing resources (e.g., GPU/CPU) based on the specific needs of each enzyme modeling subtask in the workflow. We implemented the strategy as a Python library and tested the library using fluoroacetate dehalogenase as a model enzyme. The results show that compared to fixed resource allocation, where both CPU and GPU are on call for the entire workflow, adaptive resource allocation saves 87% of CPU hours and 14% of GPU hours. Furthermore, we constructed a computational directed evolution protocol under the framework of adaptive resource allocation. The workflow was tested against two rounds of mutational screening in the directed evolution experiments of Kemp eliminase (KE07) with a total of 184 mutants. Using folding stability and electrostatic stabilization energy as computational readouts, we identified all four experimentally observed target variants. Enabled by the workflow, the entire computation task (i.e., 18.4 μs MD and 18,400 QM single-point calculations) completes in 3 days of wall-clock time using ∼30 GPUs and ∼1000 CPUs.
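The CPU/GPU savings reported above come from charging each resource only while a subtask actually needs it. The toy accounting below contrasts the two policies. The `Subtask` fields, the assumption that subtasks run one after another, and the example numbers are all hypothetical; this is not the EnzyHTP library's API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Subtask:
    name: str
    resource: str  # "cpu" or "gpu"; assumed: each subtask needs exactly one kind
    hours: float   # wall-clock duration of the subtask
    units: int     # CPU cores or GPUs the subtask uses while it runs


def fixed_allocation_cost(tasks: List[Subtask], n_cpu: int, n_gpu: int) -> Dict[str, float]:
    """Both pools are reserved for the whole workflow (subtasks assumed to run sequentially)."""
    wall_clock = sum(t.hours for t in tasks)
    return {"cpu": n_cpu * wall_clock, "gpu": n_gpu * wall_clock}


def adaptive_allocation_cost(tasks: List[Subtask]) -> Dict[str, float]:
    """Each resource is charged only for the subtasks that actually use it."""
    cost = {"cpu": 0.0, "gpu": 0.0}
    for t in tasks:
        cost[t.resource] += t.units * t.hours
    return cost


if __name__ == "__main__":
    # Hypothetical single-mutant workflow: structure prep, MD on a GPU, QM single points on CPUs.
    workflow = [
        Subtask("mutant_prep", "cpu", 1.0, 8),
        Subtask("md", "gpu", 10.0, 1),
        Subtask("qm_single_points", "cpu", 2.0, 8),
    ]
    print(fixed_allocation_cost(workflow, n_cpu=8, n_gpu=1))  # {'cpu': 104.0, 'gpu': 13.0}
    print(adaptive_allocation_cost(workflow))                 # {'cpu': 24.0, 'gpu': 10.0}
```

For these made-up numbers the adaptive policy uses 24 instead of 104 CPU-hours and 10 instead of 13 GPU-hours; real savings depend on how long each resource would otherwise sit idle.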
Affiliation(s)
- Qianzhen Shao
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Yaoyukun Jiang
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Zhongyue J Yang
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee 37235, United States
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37235, United States
- Vanderbilt Institute of Chemical Biology, Vanderbilt University, Nashville, Tennessee 37235, United States
- Data Science Institute, Vanderbilt University, Nashville, Tennessee 37235, United States
- Department of Chemical and Biomolecular Engineering, Vanderbilt University, Nashville, Tennessee 37235, United States
10
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on the projects deposited in GitHub repositories, we identify the most commonly used Python libraries. We hope that this survey will serve as a resource for learning about machine learning, or specific architectures thereof, by identifying accessible code with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work (open data, open source code, and open models), we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Karl N. Kirschner
- Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
11
Yan J, Li S, Zhang Y, Hao A, Zhao Q. ZetaDesign: an end-to-end deep learning method for protein sequence design and side-chain packing. Brief Bioinform 2023; 24:bbad257. [PMID: 37429578 DOI: 10.1093/bib/bbad257] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/28/2023] [Revised: 06/05/2023] [Accepted: 06/21/2023] [Indexed: 07/12/2023] [Open Access]
Abstract
Over the last few years, computational protein design has proven to be the most powerful tool for protein design and repacking tasks. In practice, these two tasks are strongly related but often treated separately. In addition, state-of-the-art deep-learning-based methods cannot provide interpretability from an energy perspective, which affects the accuracy of the design. Here we propose a new systematic approach, comprising a posterior-probability part and a joint-probability part, to solve these two essential problems once and for all. This approach takes the physicochemical properties of amino acids into consideration and uses the joint probability model to ensure convergence between structure and amino acid type. Our results demonstrate that this method can generate feasible, high-confidence sequences with low-energy side-chain conformations. The designed sequences fold into the target structures with high confidence and maintain relatively stable biochemical properties. The predicted side-chain conformations have significantly lower energies, obtained without relying on a rotamer library or performing expensive conformational searches. Overall, we propose an end-to-end method that combines the advantages of both deep learning and energy-based methods. The design results of this model demonstrate high efficiency and precision, as well as a low-energy state and good interpretability.
Affiliation(s)
- Junyu Yan
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Shuai Li
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Ying Zhang
- The Key Laboratory of Cell Proliferation and Regulation Biology, Ministry of Education, College of Life Sciences, Beijing Normal University, Beijing, China
- Aimin Hao
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Qinping Zhao
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
12
Grybauskas A, Gražulis S. Building protein structure-specific rotamer libraries. Bioinformatics 2023; 39:btad429. [PMID: 37439702 PMCID: PMC10359632 DOI: 10.1093/bioinformatics/btad429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2022] [Revised: 06/19/2023] [Indexed: 07/14/2023] [Open Access]
Abstract
MOTIVATION Identifying the probable positions of protein side chains is one of the protein modelling steps that can improve the prediction of protein-ligand and protein-protein interactions. Most strategies for predicting side-chain conformations use predetermined lists of dihedral angles, also called rotamer libraries, that are usually generated from a subset of high-quality protein structures. Although these methods are fast to apply, they tend to average out geometries instead of taking the surrounding atoms and molecules into account, and they ignore structures not included in the selected subset. Such simplifications can result in inaccuracies when predicting possible side-chain atom positions. RESULTS We propose an approach that takes both of these circumstances into account by scanning through sterically accessible side-chain conformations and generating dihedral angle libraries specific to the target proteins. The method avoids the drawback of missing conformations due to unusual or rare protein structures and successfully suggests potential rotamers whose average RMSD to the experimentally determined side-chain atom positions is lower than that of other widely used rotamer libraries. AVAILABILITY AND IMPLEMENTATION The technique is implemented in the open-source software package rotag and is available on GitHub: https://www.github.com/agrybauskas/rotag, under the GNU Lesser General Public License.
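The core idea of scanning sterically accessible side-chain conformations can be sketched as follows: place the γ atom for each trial χ1 value with the standard NeRF (natural extension reference frame) construction, then reject angles that bring it too close to surrounding atoms. This is a simplified hard-sphere illustration, not rotag's algorithm; the 10° step, the CB-CG bond length of 1.52 Å, the CA-CB-CG angle of 114°, the clash cutoff, and the function names are assumed values chosen for the example.

```python
import numpy as np


def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """NeRF: return d with |c-d| = bond, angle(b, c, d) = angle_deg, dihedral(a, b, c, d) = torsion_deg."""
    a, b, c = (np.asarray(x, dtype=float) for x in (a, b, c))
    bc = (c - b) / np.linalg.norm(c - b)
    normal = np.cross(b - a, bc)
    normal /= np.linalg.norm(normal)
    m = np.cross(normal, bc)
    theta, phi = np.radians(angle_deg), np.radians(torsion_deg)
    # Coordinates of d in the local frame (bc, m, normal), then mapped back to global space.
    d_local = bond * np.array([-np.cos(theta),
                               np.sin(theta) * np.cos(phi),
                               np.sin(theta) * np.sin(phi)])
    return c + d_local[0] * bc + d_local[1] * m + d_local[2] * normal


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in [-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(x, dtype=float) for x in (p0, p1, p2, p3))
    b0, b1, b2 = p0 - p1, p2 - p1, p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    return float(np.degrees(np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w))))


def scan_chi1(n, ca, cb, environment, step=10.0, bond=1.52, angle=114.0, clash=3.0):
    """Return the chi1 values (degrees) whose CG position clears every environment atom."""
    env = np.asarray(environment, dtype=float).reshape(-1, 3)
    accepted = []
    for chi in np.arange(-180.0, 180.0, step):
        cg = place_atom(n, ca, cb, bond, angle, chi)  # chi1 is the N-CA-CB-CG dihedral
        if env.shape[0] == 0 or np.min(np.linalg.norm(env - cg, axis=1)) >= clash:
            accepted.append(float(chi))
    return accepted


if __name__ == "__main__":
    n, ca, cb = [1.458, 0.0, 0.0], [0.0, 0.0, 0.0], [-0.53, 1.43, 0.0]  # hypothetical coordinates
    cg = place_atom(n, ca, cb, 1.52, 114.0, -65.0)
    print(round(dihedral(n, ca, cb, cg), 3))  # -65.0
    blocker = place_atom(n, ca, cb, 1.52, 114.0, 60.0)  # an atom sitting where chi1 = 60 puts CG
    print(len(scan_chi1(n, ca, cb, [blocker], clash=0.5)))
```

Longer side chains would repeat the same placement for χ2, χ3, and so on, checking clashes at each level; a fuller method would also score near-misses instead of using a single hard cutoff.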
Affiliation(s)
- Algirdas Grybauskas
- Sector of Crystallography and Cheminformatics, Institute of Biotechnology, Life Sciences Center, Vilnius University, 7 Saulėtekio Ave, Vilnius, LT-10257, Lithuania
- Saulius Gražulis
- Sector of Crystallography and Cheminformatics, Institute of Biotechnology, Life Sciences Center, Vilnius University, 7 Saulėtekio Ave, Vilnius, LT-10257, Lithuania
13
Xu G, Wang Q, Ma J. OPUS-Mut: Studying the Effect of Protein Mutation through Side-Chain Modeling. J Chem Theory Comput 2023; 19:1629-1640. [PMID: 36813264 PMCID: PMC10018731 DOI: 10.1021/acs.jctc.2c00847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2023]
Abstract
Predicting the effect of protein mutation is crucial in many applications such as protein design, protein evolution, and genetic disease analysis. Structurally, a mutation is essentially the replacement of the side chain of a particular residue. Therefore, accurate side-chain modeling is useful in studying the effect of mutation. Here, we propose a computational method, OPUS-Mut, which significantly outperforms other backbone-dependent side-chain modeling methods, including our previous method OPUS-Rota4. We evaluate OPUS-Mut with four case studies on myoglobin, p53, HIV-1 protease, and T4 lysozyme. The results show that the predicted side-chain structures of different mutants agree well with their experimentally determined structures. In addition, when the residues with significant structural shifts upon mutation are considered, the extent of the predicted structural shift of these affected residues correlates reasonably well with the functional changes of the mutants measured by experiments. OPUS-Mut can also help identify harmful and benign mutations and thus may guide the construction of a protein with relatively low sequence homology but a similar structure.
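One simple way to quantify the "structural shift" of residues affected by a mutation is a per-residue side-chain RMSD between the wild-type and mutant models. The sketch below assumes both models share the same fixed backbone frame (so no superposition is needed), that atoms are listed in the same order, and that the mutated position itself is excluded because its atom set changes; the 1.0 Å threshold, the residue labels, and the function names are hypothetical, and this is not how OPUS-Mut is stated to compute its shifts.

```python
import numpy as np


def side_chain_rmsd(a, b) -> float:
    """RMSD (Å) between two equally ordered (n_atoms, 3) coordinate sets, without superposition."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


def shifted_residues(wild_type, mutant, threshold=1.0, skip=()):
    """Residues whose side chains move by at least `threshold` Å, largest shift first."""
    shifts = {}
    for res_id, coords in wild_type.items():
        if res_id in skip or res_id not in mutant:
            continue  # e.g. the mutated site itself, whose atom set differs
        rmsd = side_chain_rmsd(coords, mutant[res_id])
        if rmsd >= threshold:
            shifts[res_id] = rmsd
    return dict(sorted(shifts.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    wt = {"A10": [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], "A11": [[0.0, 0.0, 0.0]], "A12": [[5.0, 5.0, 5.0]]}
    mut = {"A10": [[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]], "A11": [[0.0, 0.0, 0.5]], "A12": [[9.0, 9.0, 9.0]]}
    print(shifted_residues(wt, mut, skip={"A12"}))  # {'A10': 2.0}
```

The sum of such per-residue shifts, or the count of residues above the threshold, could then serve as a single number to compare against measured functional changes.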
Affiliation(s)
- Gang Xu
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China
- Qinghua Wang
- Center for Biomolecular Innovation, Harcam Biomedicines, Shanghai 200131, China
- Jianpeng Ma
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China
14
Liu J, Zhang C, Lai L. GeoPacker: A novel deep learning framework for protein side-chain modeling. Protein Sci 2022; 31:e4484. [PMID: 36309961 PMCID: PMC9667900 DOI: 10.1002/pro.4484] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/11/2022] [Revised: 10/23/2022] [Accepted: 10/26/2022] [Indexed: 12/13/2022]
Abstract
Atomic interactions play essential roles in protein folding, structure stabilization, and function. Recent advances in deep learning-based methods have achieved impressive success not only in protein structure prediction but also in protein sequence design. However, highly efficient and accurate protein side-chain prediction methods that capture detailed atomic interactions are still lacking. In the present study, we developed a deep learning-based method, GeoPacker, that couples geometric deep learning with a ResNet for protein side-chain modeling. GeoPacker explicitly represents atomic interactions with rotational and translational invariance to extract information about relative locations. GeoPacker outperformed state-of-the-art energy function-based methods in side-chain structure prediction accuracy and, at comparable prediction accuracy, runs about 10 and 700 times faster than the deep learning-based methods DLPacker and OPUS-Rota4, respectively. The performance of GeoPacker does not depend on the secondary structure to which a residue belongs. GeoPacker gives highly accurate predictions for buried residues in the protein core as well as at protein-protein interfaces, making it a useful tool for protein structure modeling and for protein and interaction design.
Affiliation(s)
- Jiale Liu
- Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Changsheng Zhang
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Luhua Lai
- Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
15
Dicks L, Wales DJ. Exploiting Sequence-Dependent Rotamer Information in Global Optimization of Proteins. J Phys Chem B 2022; 126:8381-8390. [PMID: 36257022 PMCID: PMC9623586 DOI: 10.1021/acs.jpcb.2c04647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
Abstract
Rotamers, namely amino acid side chain conformations common to many different peptides, can be compiled into libraries. These rotamer libraries are used in protein modeling, where the limited conformational space occupied by amino acid side chains is exploited. Here, we construct a sequence-dependent rotamer library from simulations of all possible tripeptides, which provides rotameric states dependent on adjacent amino acids. We observe significant sensitivity of rotamer populations to sequence and find that the library is successful in locating side chain conformations present in crystal structures. The library is designed for applications with basin-hopping global optimization, where we use it to propose moves in conformational space. The addition of rotamer moves significantly increases the efficiency of protein structure prediction within this framework, and we determine parameters to optimize efficiency.
Affiliation(s)
- L. Dicks
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom; IBM Research, The Hartree Centre STFC Laboratory, Sci-Tech Daresbury, Warrington WA4 4AD, United Kingdom
- D. J. Wales
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
16
Cole CC, Misiura M, Hulgan SAH, Peterson CM, Williams JW, Kolomeisky AB, Hartgerink JD. Cation-π Interactions and Their Role in Assembling Collagen Triple Helices. Biomacromolecules 2022; 23:4645-4654. [PMID: 36239387 DOI: 10.1021/acs.biomac.2c00856] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/28/2022]
Abstract
Cation-π interactions play a significant role in the stabilization of globular proteins. However, their role in collagen triple helices is less well understood and they have rarely been used in de novo designed collagen mimetic systems. In this study, we analyze the stabilizing and destabilizing effects in pairwise amino acid interactions between cationic and aromatic residues in both axial and lateral sequential relationships. Thermal unfolding experiments demonstrated that only axial pairs are stabilizing, while the lateral pairs are uniformly destabilizing. Molecular dynamics simulations show that pairs with an axial relationship can achieve a near-ideal interaction distance, but pairs in a lateral relationship do not. Arginine-π systems were found to be more stabilizing than lysine-π and histidine-π. Arginine-π interactions were then studied in more chemically diverse ABC-type heterotrimeric helices, where arginine-tyrosine pairs were found to form the best helix. This work helps elucidate the role of cation-π interactions in triple helices and illustrates their utility in designing collagen mimetic peptides.
Affiliation(s)
- Carson C Cole
- Department of Chemistry, Rice University, 6100 Main Street, Houston, Texas 77005, United States
- Mikita Misiura
- Department of Chemistry, Rice University, 6100 Main Street, Houston, Texas 77005, United States
- Sarah A H Hulgan
- Department of Chemistry, Rice University, 6100 Main Street, Houston, Texas 77005, United States
- Caroline M Peterson
- Department of Chemistry, Rice University, 6100 Main Street, Houston, Texas 77005, United States
- Joseph W Williams
- Department of Chemistry, Rice University, 6100 Main Street, Houston, Texas 77005, United States
- Anatoly B Kolomeisky
- Department of Chemistry, Rice University, 6100 Main Street, Houston, Texas 77005, United States
- Jeffrey D Hartgerink
- Department of Chemistry, Rice University, 6100 Main Street, Houston, Texas 77005, United States; Department of Bioengineering, Rice University, 6100 Main Street, Houston, Texas 77005, United States
17
Protein Function Analysis through Machine Learning. Biomolecules 2022; 12:biom12091246. [PMID: 36139085 PMCID: PMC9496392 DOI: 10.3390/biom12091246] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/16/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022]
Abstract
Machine learning (ML) has been an important part of the computational biology arsenal for elucidating protein function for decades. With the recent proliferation of novel ML methods and applications, new ML approaches have been incorporated into many areas of computational biology that deal with protein function. We examine how ML has been integrated into a wide range of computational models to improve prediction accuracy and gain a better understanding of protein function. The applications discussed are protein structure prediction, protein engineering using sequence modifications to achieve stability and druggability characteristics, molecular docking in terms of protein–ligand binding, including allosteric effects, protein–protein interactions, and protein-centric drug discovery. To quantify the mechanisms underlying protein function, a holistic approach that takes structure, flexibility, stability, and dynamics into account is required, as these aspects become inseparable through their interdependence. Another key component of protein function is conformational dynamics, which often manifest as protein kinetics. Computational methods that use ML to generate representative conformational ensembles and to quantify differences between conformational ensembles that are important for function are included in this review. Future opportunities are highlighted for each of these topics.
18
Xu G, Wang Y, Wang Q, Ma J. Studying protein-protein interaction through side-chain modeling method OPUS-Mut. Brief Bioinform 2022; 23:6663639. [PMID: 35959990 DOI: 10.1093/bib/bbac330] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/15/2022] [Revised: 07/17/2022] [Accepted: 07/20/2022] [Indexed: 12/12/2022]
Abstract
Protein side chains are vitally important to many biological processes, such as protein-protein interaction. In this study, we evaluate the performance of our previously released side-chain modeling method, OPUS-Mut, together with several other methods, on three oligomer datasets: CASP14 (11), CAMEO-Homo (65), and CAMEO-Hetero (21). The results show that OPUS-Mut outperforms the other methods, whether measured over all residues or over interfacial residues only. We also demonstrate our method on evaluating protein-protein docking poses on a dataset, Oligomer-Dock (75), created from the top 10 predictions of ZDOCK 3.0.2. Our scoring function correctly identifies the native pose as the top-1 prediction in 45 of 75 targets. Unlike traditional scoring functions, our method is based on the overall favorableness of side-chain packing with respect to the local packing environment. It emphasizes the significance of side chains and provides a new and effective scoring term for studying protein-protein interaction.
Affiliation(s)
- Gang Xu
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China
- Yilin Wang
- Georgetown Preparatory School, North Bethesda, MD 20852, USA
- Qinghua Wang
- Center for Biomolecular Innovation, Harcam Biomedicines, Shanghai, China
- Jianpeng Ma
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China