1
Zheng W, Wuyun Q, Li Y, Liu Q, Zhou X, Peng C, Zhu Y, Freddolino L, Zhang Y. Deep-learning-based single-domain and multidomain protein structure prediction with D-I-TASSER. Nat Biotechnol 2025:10.1038/s41587-025-02654-4. [PMID: 40410405 DOI: 10.1038/s41587-025-02654-4] [Received: 04/13/2024] [Accepted: 03/26/2025] [Indexed: 05/25/2025]
Abstract
The dominant success of deep learning techniques on protein structure prediction has challenged the necessity and usefulness of traditional force field-based folding simulations. We proposed a hybrid approach, deep-learning-based iterative threading assembly refinement (D-I-TASSER), which constructs atomic-level protein structural models by integrating multisource deep learning potentials with iterative threading fragment assembly simulations. D-I-TASSER introduces a domain splitting and assembly protocol for the automated modeling of large multidomain protein structures. Benchmark tests and the most recent Critical Assessment of Protein Structure Prediction (CASP15) experiments demonstrate that D-I-TASSER outperforms AlphaFold2 and AlphaFold3 on both single-domain and multidomain proteins. Large-scale folding experiments further show that D-I-TASSER could fold 81% of protein domains and 73% of full-chain sequences in the human proteome, with results highly complementary to recently released models by AlphaFold2. These results highlight a new avenue to integrate deep learning with classical physics-based folding simulations for high-accuracy protein structure and function predictions that are usable in genome-wide applications.
Affiliation(s)
- Wei Zheng
  - NITFID, School of Statistics and Data Science, AAIS, LPMC and KLMDASR, Nankai University, Tianjin, China
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Qiqige Wuyun
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
- Yang Li
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
- Quancheng Liu
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Xiaogen Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Chunxiang Peng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
- Yiheng Zhu
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Lydia Freddolino
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
- Yang Zhang
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
  - Department of Computer Science, School of Computing, National University of Singapore, Singapore, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
2
Zhang J, Xiong Y. PackPPI: An integrated framework for protein-protein complex side-chain packing and ΔΔG prediction based on diffusion model. Protein Sci 2025; 34:e70110. [PMID: 40260988 PMCID: PMC12012842 DOI: 10.1002/pro.70110] [Received: 11/04/2024] [Revised: 03/07/2025] [Accepted: 03/17/2025] [Indexed: 04/24/2025]
Abstract
Deep learning methods have played an increasingly pivotal role in advancing side-chain packing and mutation effect prediction (ΔΔG) for protein complexes. Although these two tasks are inherently closely related, they are typically treated separately in practice. Furthermore, the lack of effective post-processing in most approaches results in sub-optimal refinement of generated conformations, limiting the plausibility of the predicted conformations. In this study, we introduce an integrated framework, PackPPI, which employs a diffusion model and a proximal optimization algorithm to improve side-chain prediction for protein complexes while using learned representations to predict ΔΔG. The results demonstrate that PackPPI achieved the lowest atom RMSD (0.9822) on the CASP15 dataset. The proximal optimization algorithm effectively reduces spatial clashes between side-chain atoms while maintaining a low-energy landscape. Furthermore, PackPPI achieves state-of-the-art performance in predicting binding affinity changes induced by multi-point mutations on the SKEMPI v2.0 dataset. These findings underscore the potential of PackPPI as a robust and versatile computational tool for protein design and engineering. The implementation of PackPPI is available at https://github.com/Jackz915/PackPPI.
Affiliation(s)
- Jingkai Zhang
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Yuanyan Xiong
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
3
Zhang O, Liu ZH, Forman-Kay JD, Head-Gordon T. Deep Learning of Proteins with Local and Global Regions of Disorder. arXiv 2025:arXiv:2502.11326v2. [PMID: 40034137 PMCID: PMC11875298] [Indexed: 03/05/2025]
Abstract
Although machine learning has transformed protein structure prediction of folded protein ground states with remarkable accuracy, intrinsically disordered proteins and regions (IDPs/IDRs) are defined by diverse and dynamical structural ensembles that are predicted with low confidence by algorithms such as AlphaFold. We present a new machine learning method, IDPForge (Intrinsically Disordered Protein, FOlded and disordered Region GEnerator), that exploits a transformer protein language diffusion model to create all-atom IDP ensembles and IDR disordered ensembles that maintain the folded domains. IDPForge does not require sequence-specific training, back transformations from coarse-grained representations, or ensemble reweighting, as in general the created IDP/IDR conformational ensembles show good agreement with solution experimental data, and options for biasing with experimental restraints are provided if desired. We envision that IDPForge with these diverse capabilities will facilitate integrative and structural studies for proteins that contain intrinsic disorder.
4
Norton T, Bhattacharya D. Sifting through the noise: A survey of diffusion probabilistic models and their applications to biomolecules. J Mol Biol 2025; 437:168818. [PMID: 39389290 PMCID: PMC11885034 DOI: 10.1016/j.jmb.2024.168818] [Received: 05/31/2024] [Revised: 09/20/2024] [Accepted: 10/03/2024] [Indexed: 10/12/2024]
Abstract
Diffusion probabilistic models have made their way into a number of high-profile applications since their inception. In particular, there has been a wave of research into using diffusion models in the prediction and design of biomolecular structures and sequences. Their growing ubiquity makes it imperative for researchers in these fields to understand them. This paper serves as a general overview for the theory behind these models and the current state of research. We first introduce diffusion models and discuss common motifs used when applying them to biomolecules. We then present the significant outcomes achieved through the application of these models in generative and predictive tasks. This survey aims to provide readers with a comprehensive understanding of the increasingly critical role of diffusion models.
Affiliation(s)
- Trevor Norton
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
5
Vangaru S, Bhattacharya D. To pack or not to pack: revisiting protein side-chain packing in the post-AlphaFold era. bioRxiv 2025:2025.02.22.639681. [PMID: 40060396 PMCID: PMC11888329 DOI: 10.1101/2025.02.22.639681] [Indexed: 03/20/2025]
Abstract
Motivation: Protein side-chain packing (PSCP), the problem of predicting side-chain conformation given a fixed backbone structure, has important implications in modeling of structures and interactions. However, despite the groundbreaking progress in protein structure prediction pioneered by AlphaFold, the existing PSCP methods still rely on experimental inputs, and do not leverage AlphaFold-predicted backbone coordinates to enable PSCP at scale. Results: Here, we perform a large-scale benchmarking of the predictive performance of various PSCP methods on public datasets from multiple rounds of the Critical Assessment of Structure Prediction (CASP) challenges using a diverse set of evaluation metrics. Empirical results demonstrate that the PSCP methods perform well in packing the side-chains with experimental inputs, but they fail to generalize in repacking AlphaFold-generated structures. We additionally explore the effectiveness of leveraging the self-assessment confidence scores from AlphaFold by implementing a backbone confidence-aware integrative approach. While such a protocol often leads to performance improvement by attaining modest yet statistically significant accuracy gains over the AlphaFold baseline, it does not yield consistent and pronounced improvements. Our study highlights the recent advances and remaining challenges in PSCP in the post-AlphaFold era. Availability: The code and raw data are freely available at https://github.com/Bhattacharya-Lab/PackBench.
Affiliation(s)
- Sriniketh Vangaru
  - Department of Computer Science, Virginia Tech, Blacksburg, 24061, Virginia, USA
6
Hehlert P, Effertz T, Gu RX, Nadrowski B, Geurten BRH, Beutner D, de Groot BL, Göpfert MC. NOMPC ion channel hinge forms a gating spring that initiates mechanosensation. Nat Neurosci 2025; 28:259-267. [PMID: 39762662 DOI: 10.1038/s41593-024-01849-3] [Received: 09/26/2023] [Accepted: 11/12/2024] [Indexed: 02/08/2025]
Abstract
The sensation of mechanical stimuli is initiated by elastic gating springs that pull open mechanosensory transduction channels. Searches for gating springs have focused on force-conveying protein tethers such as the amino-terminal ankyrin tether of the Drosophila mechanosensory transduction channel NOMPC. Here, by combining protein domain duplications with mechanical measurements, electrophysiology, molecular dynamics simulations and modeling, we identify the NOMPC gating-spring as the short linker between the ankyrin tether and the channel gate. This linker acts as a Hookean hinge that is ten times more elastic than the tether, with the linker hinge dictating channel gating and the intrinsic stiffness of the gating spring. Our study shows how mechanosensation is initiated molecularly; disentangles gating springs and tethers, and respective paradigms of channel gating; and puts forward gating springs as core ion channel constituents that enable efficient gating by diverse stimuli and in a wide variety of channels.
Affiliation(s)
- Philip Hehlert
  - Department of Cellular Neurobiology, University of Göttingen, Göttingen, Germany
- Thomas Effertz
  - Department of Otorhinolaryngology, Head and Neck Surgery and InnerEarLab, University Medical Center Göttingen, Göttingen, Germany
- Ruo-Xu Gu
  - Computational Biomolecular Dynamics Group, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Björn Nadrowski
  - Department of Cellular Neurobiology, University of Göttingen, Göttingen, Germany
- Bart R H Geurten
  - Department of Cellular Neurobiology, University of Göttingen, Göttingen, Germany
  - Department of Zoology, University of Otago, Dunedin, New Zealand
- Dirk Beutner
  - Department of Otorhinolaryngology, Head and Neck Surgery and InnerEarLab, University Medical Center Göttingen, Göttingen, Germany
- Bert L de Groot
  - Computational Biomolecular Dynamics Group, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
  - Multiscale Bioimaging Cluster of Excellence (MBExC), University of Göttingen, Göttingen, Germany
- Martin C Göpfert
  - Department of Cellular Neurobiology, University of Göttingen, Göttingen, Germany
  - Multiscale Bioimaging Cluster of Excellence (MBExC), University of Göttingen, Göttingen, Germany
7
Vinterbladh I, Soussi RH, Forsman J, Bouhallab S, Lund M. Strong electrostatic attraction drives milk heteroprotein complex coacervation. Int J Biol Macromol 2025; 286:137790. [PMID: 39603294 DOI: 10.1016/j.ijbiomac.2024.137790] [Received: 08/26/2024] [Revised: 11/06/2024] [Accepted: 11/15/2024] [Indexed: 11/29/2024]
Abstract
Coacervates of oppositely charged milk proteins are used in functional food development, mainly to encapsulate bioactives. To uncover the driving forces behind coacervate formation, we study the association of lactoferrin and β-lactoglobulin at amino-acid level detail, using molecular simulations. Our findings show that inter-protein electrostatic interactions dominate and are, surprisingly, equally divided between an isotropic part, due to monopole-monopole attraction of the oppositely charged proteins, and an anisotropic part, due to uneven surface charge distributions. In good agreement with recent experimental association constants, the calculated protein-protein interaction free energy is strongly dependent on pH and salt concentration. In addition to thermodynamics, we also investigate amino acid contacts in microstates of trimeric and pentameric protein complexes, and identify interaction hot-spots that drive the heteroprotein complex coacervation process.
Affiliation(s)
- Isabel Vinterbladh
  - Division of Computational Chemistry, Lund University, Naturvetarvägen 24, SE-223 62 Lund, Sweden
- Rima Hachfi Soussi
  - INRAE, Institut Agro, STLO, 65 Rue de Saint Brieuc, 35042 Rennes, France; Université Paris-Saclay, CNRS, Institut Galien Paris-Saclay, 91400 Orsay, France
- Jan Forsman
  - Division of Computational Chemistry, Lund University, Naturvetarvägen 24, SE-223 62 Lund, Sweden
- Said Bouhallab
  - INRAE, Institut Agro, STLO, 65 Rue de Saint Brieuc, 35042 Rennes, France
- Mikael Lund
  - Division of Computational Chemistry, Lund University, Naturvetarvägen 24, SE-223 62 Lund, Sweden; LINXS - Institute of advanced Neutron and X-ray Science, Lund University, Scheelevägen 19, 223 70 SE-Lund, Sweden
8
Gut JA, Lemmin T. Dissecting AlphaFold2's capabilities with limited sequence information. Bioinformatics Advances 2024; 5:vbae187. [PMID: 39846081 PMCID: PMC11751578 DOI: 10.1093/bioadv/vbae187] [Received: 08/06/2024] [Revised: 10/24/2024] [Accepted: 11/21/2024] [Indexed: 01/24/2025]
Abstract
Summary: Protein structure prediction aims to infer a protein's three-dimensional (3D) structure from its amino acid sequence. Protein structure is pivotal for elucidating protein functions and interactions, and for driving biotechnological innovation. The deep learning model AlphaFold2 has revolutionized this field by leveraging phylogenetic information from multiple sequence alignments (MSAs) to achieve remarkable accuracy in protein structure prediction. However, a key question remains: how well does AlphaFold2 understand protein structures? This study investigates AlphaFold2's capabilities when relying primarily on high-quality template structures, without the additional information provided by MSAs. By designing experiments that probe local and global structural understanding, we aimed to dissect its dependence on specific features and its ability to handle missing information. Our findings revealed AlphaFold2's reliance on sterically valid Cβ atoms for correctly interpreting structural templates. Additionally, we observed its remarkable ability to recover 3D structures from certain perturbations and the negligible impact of the previous structure in recycling. Collectively, these results support the hypothesis that AlphaFold2 has learned an accurate biophysical energy function. However, this function seems most effective for local interactions. Our work advances understanding of how deep learning models predict protein structures and provides guidance for researchers aiming to overcome limitations in these models. Availability and implementation: Data and implementation are available at https://github.com/ibmm-unibe-ch/template-analysis.
Affiliation(s)
- Jannik Adrian Gut
  - Institute of Biochemistry and Molecular Medicine, University of Bern, Bern 3012, Switzerland
  - Graduate School for Cellular and Biomedical Sciences (GCB), University of Bern, Bern 3012, Switzerland
- Thomas Lemmin
  - Institute of Biochemistry and Molecular Medicine, University of Bern, Bern 3012, Switzerland
9
Randolph NZ, Kuhlman B. Invariant point message passing for protein side chain packing. Proteins 2024; 92:1220-1233. [PMID: 38790143 PMCID: PMC11511640 DOI: 10.1002/prot.26705] [Received: 12/21/2023] [Revised: 04/19/2024] [Accepted: 05/13/2024] [Indexed: 05/26/2024]
Abstract
Protein side chain packing (PSCP) is a fundamental problem in the field of protein engineering, as high-confidence and low-energy conformations of amino acid side chains are crucial for understanding (and designing) protein folding, protein-protein interactions, and protein-ligand interactions. Traditional PSCP methods (such as the Rosetta Packer) often rely on a library of discrete side chain conformations, or rotamers, and a forcefield to guide the structure to low-energy conformations. Recently, deep learning (DL) based methods (such as DLPacker, AttnPacker, and DiffPack) have demonstrated state-of-the-art predictions and speed in the PSCP task. Building off the success of geometric graph neural networks for protein modeling, we present the Protein Invariant Point Packer (PIPPack), which effectively processes local structural and sequence information to produce realistic, idealized side chain coordinates using χ-angle distribution predictions and geometry-aware invariant point message passing (IPMP). On a test set of ∼1400 high-quality protein chains, PIPPack is highly competitive with other state-of-the-art PSCP methods in rotamer recovery and per-residue RMSD but is significantly faster.
Affiliation(s)
- Nicholas Z Randolph
  - Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
  - Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Brian Kuhlman
  - Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
  - Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
10
Liu J, Guo Z, You H, Zhang C, Lai L. All-Atom Protein Sequence Design Based on Geometric Deep Learning. Angew Chem Int Ed Engl 2024:e202411461. [PMID: 39295564 DOI: 10.1002/anie.202411461] [Received: 06/18/2024] [Revised: 09/09/2024] [Accepted: 09/18/2024] [Indexed: 09/21/2024]
Abstract
Designing sequences for specific protein backbones is a key step in creating new functional proteins. Here, we introduce GeoSeqBuilder, a deep learning framework that integrates protein sequence generation with side chain conformation prediction to produce the complete all-atom structures for designed sequences. GeoSeqBuilder uses spatial geometric features from protein backbones and explicitly includes three-body interactions of neighboring residues. GeoSeqBuilder achieves a native residue type recovery rate of 51.6%, comparable to ProteinMPNN and other leading methods, while accurately predicting side chain conformations. We first used GeoSeqBuilder to design sequences for thioredoxin and a hallucinated three-helical bundle protein. All 15 tested sequences expressed as soluble monomeric proteins with high thermal stability, and the 2 high-resolution crystal structures solved closely match the designed models. The generated protein sequences exhibit low similarity (minimum 23%) to the original sequences, with significantly altered hydrophobic cores. We further redesigned the hydrophobic core of glutathione peroxidase 4, and 3 of the 5 designs showed improved enzyme activity. Although further testing is needed, the high experimental success rate in our testing demonstrates that GeoSeqBuilder is a powerful tool for designing novel sequences for predefined protein structures with atomic details. GeoSeqBuilder is available at https://github.com/PKUliujl/GeoSeqBuilder.
Affiliation(s)
- Jiale Liu
  - Center for Life Sciences Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Zheng Guo
  - Center for Life Sciences Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
- Hantian You
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, China
- Changsheng Zhang
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, China
- Luhua Lai
  - Center for Life Sciences Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, China
  - Center for Quantitative Biology Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
  - Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Peking University, Chengdu, 510100, Sichuan, China
11
Bastida A, Zúñiga J, Fogolari F, Soler MA. Statistical accuracy of molecular dynamics-based methods for sampling conformational ensembles of disordered proteins. Phys Chem Chem Phys 2024; 26:23213-23227. [PMID: 39190324 DOI: 10.1039/d4cp02564d] [Indexed: 08/28/2024]
Abstract
The characterization of the statistical ensemble of conformations of intrinsically disordered regions (IDRs) is a great challenge both from experimental and computational points of view. In this respect, a number of protocols have been developed using molecular dynamics (MD) simulations to sample the huge conformational space of the molecule. In this work, we consider one of the best methods available, replica exchange solute tempering (REST), as a reference to compare the results obtained using this method with the results obtained using other methods, in terms of experimentally measurable quantities. Along with the methods assessed, we propose here a novel protocol called probabilistic MD chain growth (PMD-CG), which combines the flexible-meccano and hierarchical chain growth methods with the statistical data obtained from tripeptide MD trajectories as the starting point. The system chosen for testing is a 20-residue region from the C-terminal domain of the p53 tumor suppressor protein (p53-CTD). Our results show that PMD-CG provides an ensemble of conformations extremely quickly, after suitable computation of the conformational pool for all peptide triplets of the IDR sequence. The measurable quantities computed on the ensemble of conformations agree well with those based on the REST conformational ensemble.
Affiliation(s)
- Adolfo Bastida
  - Departamento de Química Física, Universidad de Murcia, 30100 Murcia, Spain
- José Zúñiga
  - Departamento de Química Física, Universidad de Murcia, 30100 Murcia, Spain
- Federico Fogolari
  - Dipartimento di Scienze Matematiche, Informatiche e Fisiche, Università di Udine, 33100 Udine, Italy
- Miguel A Soler
  - Dipartimento di Scienze Matematiche, Informatiche e Fisiche, Università di Udine, 33100 Udine, Italy
12
Xu G, Luo Z, Yan Y, Wang Q, Ma J. OPUS-Rota5: A highly accurate protein side-chain modeling method with 3D-Unet and RotaFormer. Structure 2024; 32:1001-1010.e2. [PMID: 38657613 DOI: 10.1016/j.str.2024.03.015] [Received: 11/24/2023] [Revised: 02/06/2024] [Accepted: 03/28/2024] [Indexed: 04/26/2024]
Abstract
Accurate protein side-chain modeling is crucial for protein folding and design. This is particularly true for molecular docking as ligands primarily interact with side chains. In this study, we introduce a two-stage side-chain modeling approach called OPUS-Rota5. It leverages a modified 3D-Unet to capture the local environmental features, including ligand information of each residue, and then employs the RotaFormer module to aggregate various types of features. Evaluation on three test sets, including recently released targets from CAMEO and CASP15, shows that OPUS-Rota5 significantly outperforms some other leading side-chain modeling methods. We also employ OPUS-Rota5 to refine the side chains of 25 G protein-coupled receptor targets predicted by AlphaFold2 and achieve a significantly improved success rate in a subsequent "back" docking of their natural ligands. Therefore, OPUS-Rota5 is a useful and effective tool for molecular docking, particularly for targets with relatively accurate predicted backbones but not side chains such as high-homology targets.
Affiliation(s)
- Gang Xu
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China
- Zhenwei Luo
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China
- Yaming Yan
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China
- Qinghua Wang
  - Center for Biomolecular Innovation, Harcam Biomedicines, Shanghai 200131, China
- Jianpeng Ma
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China
13
Zhang O, Naik SA, Liu ZH, Forman-Kay J, Head-Gordon T. A curated rotamer library for common post-translational modifications of proteins. Bioinformatics 2024; 40:btae444. [PMID: 38995731 PMCID: PMC11254353 DOI: 10.1093/bioinformatics/btae444] [Received: 04/26/2024] [Revised: 06/06/2024] [Accepted: 07/11/2024] [Indexed: 07/14/2024]
Abstract
Motivation: Sidechain rotamer libraries of the common amino acids of a protein are useful for folded protein structure determination and for generating ensembles of intrinsically disordered proteins (IDPs). However, much of protein function is modulated beyond the translated sequence through the introduction of post-translational modifications (PTMs). Results: In this work, we have provided a curated set of side chain rotamers for the most common PTMs derived from the RCSB PDB database, including phosphorylated, methylated, and acetylated sidechains. Our rotamer libraries improve upon existing methods such as SIDEpro, Rosetta, and AlphaFold3 in predicting the experimental structures for PTMs in folded proteins. In addition, we showcase our PTM libraries in full use by generating ensembles with the Monte Carlo Side Chain Entropy (MCSCE) for folded proteins, and combining MCSCE with the Local Disordered Region Sampling algorithms within IDPConformerGenerator for proteins with intrinsically disordered regions. Availability and implementation: The codes for dihedral angle computations and library creation are available at https://github.com/THGLab/ptm_sc.git.
Affiliation(s)
- Oufan Zhang
  - Kenneth S. Pitzer Center for Theoretical Chemistry, University of California, Berkeley, CA 94720, United States
- Shubhankar A Naik
  - Department of Chemistry, University of California, Berkeley, CA 94720, United States
- Zi Hao Liu
  - Molecular Medicine Program, Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
  - Department of Biochemistry, University of Toronto, Toronto, ON M5S 1A8, Canada
- Julie Forman-Kay
  - Molecular Medicine Program, Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
  - Department of Biochemistry, University of Toronto, Toronto, ON M5S 1A8, Canada
- Teresa Head-Gordon
  - Kenneth S. Pitzer Center for Theoretical Chemistry, University of California, Berkeley, CA 94720, United States
  - Department of Chemistry, University of California, Berkeley, CA 94720, United States
  - Department of Bioengineering, University of California, Berkeley, CA 94720, United States
  - Department of Chemical and Biomolecular Engineering, University of California, Berkeley, CA 94720, United States
14
Gautheron J, Elsayed S, Pistorio V, Lockhart S, Zammouri J, Auclair M, Koulman A, Meadows SR, Lhomme M, Ponnaiah M, Si-Bouazza R, Fabrega S, Belkadi A, Qatar Genome Project, Delaunay JL, Aït-Slimane T, Fève B, Vigouroux C, Abdel Ghaffar TY, O’Rahilly S, Jéru I. ADH1B, the adipocyte-enriched alcohol dehydrogenase, plays an essential, cell-autonomous role in human adipogenesis. Proc Natl Acad Sci U S A 2024; 121:e2319301121. [PMID: 38838011 PMCID: PMC11181076 DOI: 10.1073/pnas.2319301121] [Received: 11/03/2023] [Accepted: 05/06/2024] [Indexed: 06/07/2024]
Abstract
Alcohol dehydrogenase 1B (ADH1B) is a primate-specific enzyme which, uniquely among the ADH class 1 family, is highly expressed both in adipose tissue and liver. Its expression in adipose tissue is reduced in obesity and increased by insulin stimulation. Interference with ADH1B expression has also been reported to impair adipocyte function. To better understand the role of ADH1B in adipocytes, we used CRISPR/Cas9 to delete ADH1B in human adipose stem cells (ASC). Cells lacking ADH1B failed to differentiate into mature adipocytes, as manifested by minimal triglyceride accumulation and a marked reduction in expression of established adipocyte markers. As ADH1B is capable of converting retinol to retinoic acid (RA), we conducted rescue experiments. Incubation of ADH1B-deficient preadipocytes with 9-cis-RA, but not with all-trans-retinol, significantly rescued their ability to accumulate lipids and express markers of adipocyte differentiation. A homozygous missense variant in ADH1B (p.Arg313Cys) was found in a patient with congenital lipodystrophy of unknown cause. This variant significantly impaired the protein's dimerization, enzymatic activity, and its ability to rescue differentiation in ADH1B-deficient ASC. The allele frequency of this variant in the Middle Eastern population suggests that it is unlikely to be a fully penetrant cause of severe lipodystrophy. In conclusion, ADH1B appears to play an unexpected, crucial and cell-autonomous role in human adipocyte differentiation by serving as a necessary source of endogenous retinoic acid.
Affiliation(s)
- Jérémie Gautheron
  - Centre de Recherche Saint-Antoine, Sorbonne Université-Inserm, Paris 75012, France
  - Foundation for Innovation in Cardiometabolism and Nutrition, Paris 75013, France
- Solaf Elsayed
  - Medical Genetics Department, Faculty of Medicine, Ain Shams University, Cairo 11566, Egypt
- Valeria Pistorio
  - Centre de Recherche Saint-Antoine, Sorbonne Université-Inserm, Paris 75012, France
  - Foundation for Innovation in Cardiometabolism and Nutrition, Paris 75013, France
- Sam Lockhart
  - Wellcome Trust-Medical Research Council Institute of Metabolic Science, University of Cambridge, Cambridge CB2 1TN, United Kingdom
- Jamila Zammouri
  - Centre de Recherche Saint-Antoine, Sorbonne Université-Inserm, Paris 75012, France
  - Foundation for Innovation in Cardiometabolism and Nutrition, Paris 75013, France
- Martine Auclair
  - Centre de Recherche Saint-Antoine, Sorbonne Université-Inserm, Paris 75012, France
  - Foundation for Innovation in Cardiometabolism and Nutrition, Paris 75013, France
- Albert Koulman
  - Wellcome Trust-Medical Research Council Institute of Metabolic Science, University of Cambridge, Cambridge CB2 1TN, United Kingdom
- Sarah R. Meadows
  - Wellcome Trust-Medical Research Council Institute of Metabolic Science, University of Cambridge, Cambridge CB2 1TN, United Kingdom
- Marie Lhomme
  - Omics Lipidomics, Foundation for Innovation in Cardiometabolism and Nutrition, Paris 75013, France
- Maharajah Ponnaiah
  - Data sciences unit, Foundation for Innovation in Cardiometabolism and Nutrition, Paris 75013, France
- Redouane Si-Bouazza
  - Viral Vector and Gene Transfer Platform, Structure Federative de Recherche Necker, Université Paris Cité, Paris 75015, France
- Sylvie Fabrega
  - Viral Vector and Gene Transfer Platform, Structure Federative de Recherche Necker, Université Paris Cité, Paris 75015, France
- Abdelaziz Belkadi
  - Bioinformatics Core, Weill Cornell Medicine-Qatar, Education City, Doha 24144, Qatar
- Qatar Genome Project
  - Qatar Genome Program, Foundation Research, Development and Innovation, Qatar Foundation, Doha 24144, Qatar
- Jean-Louis Delaunay
  - Centre de Recherche Saint-Antoine, Sorbonne Université-Inserm, Paris 75012, France
  - Foundation for Innovation in Cardiometabolism and Nutrition, Paris 75013, France
- Tounsia Aït-Slimane
  - Centre de Recherche Saint-Antoine, Sorbonne Université-Inserm, Paris 75012, France
  - Foundation for Innovation in Cardiometabolism and Nutrition, Paris 75013, France
- Bruno Fève
  - Centre de Recherche Saint-Antoine, Sorbonne Université-Inserm, Paris 75012, France
  - Foundation for Innovation in Cardiometabolism and Nutrition, Paris 75013, France
  - Centre National de Référence des Pathologies Rares de l’Insulino-Sécrétion et de l’Insulino-Sensibilité, Service de Diabétologie et Endocrinologie de la Reproduction, Hôpital Saint-Antoine, Assistance Publique-Hôpitaux de Paris, Paris 75012, France
- Corinne Vigouroux
  - Centre de Recherche Saint-Antoine, Sorbonne Université-Inserm, Paris 75012, France
  - Foundation for Innovation in Cardiometabolism and Nutrition, Paris 75013, France
  - Centre National de Référence des Pathologies Rares de l’Insulino-Sécrétion et de l’Insulino-Sensibilité, Service de Diabétologie et Endocrinologie de la Reproduction, Hôpital Saint-Antoine, Assistance Publique-Hôpitaux de Paris, Paris 75012, France
- Stephen O’Rahilly
  - Wellcome Trust-Medical Research Council Institute of Metabolic Science, University of Cambridge, Cambridge CB2 1TN, United Kingdom
- Isabelle Jéru
  - Centre de Recherche Saint-Antoine, Sorbonne Université-Inserm, Paris 75012, France
  - Foundation for Innovation in Cardiometabolism and Nutrition, Paris 75013, France
  - Medical Genetics Unit, Biology, Genomics and Hygiene Medical-University Department, Pitié-Salpêtrière Hospital, Sorbonne Université, Assistance Publique-Hôpitaux de Paris, Paris 75013, France
|
15
|
Robeson L, Casanova‐Morales N, Burgos‐Bravo F, Alfaro‐Valdés HM, Lesch R, Ramírez‐Álvarez C, Valdivia‐Delgado M, Vega M, Matute RA, Schekman R, Wilson CAM. Characterization of the interaction between the Sec61 translocon complex and ppαF using optical tweezers. Protein Sci 2024; 33:e4996. [PMID: 38747383 PMCID: PMC11094780 DOI: 10.1002/pro.4996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 04/03/2024] [Accepted: 04/05/2024] [Indexed: 05/19/2024]
Abstract
The Sec61 translocon allows the translocation of secretory preproteins from the cytosol to the endoplasmic reticulum lumen during polypeptide biosynthesis. These proteins possess an N-terminal signal peptide (SP) which docks at the translocon. SP mutations can abolish translocation and cause diseases, suggesting an essential role for this SP/Sec61 interaction. However, a detailed biophysical characterization of this binding is still missing. Here, optical tweezers force spectroscopy was used to characterize the kinetic parameters of the dissociation process between Sec61 and the SP of prepro-alpha-factor. The unbinding parameters, including the off-rate constant and the distance to the transition state, were obtained by fitting rupture force data to Dudko-Hummer-Szabo models. Interestingly, the translocation inhibitor mycolactone increases the off-rate and accelerates the SP/Sec61 dissociation, while also weakening the interaction. In contrast, a translocation-deficient mutant containing a single point mutation in the SP abolished the specificity of the SP/Sec61 binding, resulting in an unstable interaction. In conclusion, we quantitatively characterize the dissociation process between the signal peptide and the translocon, and how the unbinding parameters are modified by a translocation inhibitor.
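For orientation, the force-dependent off-rate in the Dudko-Hummer-Szabo framework is usually written as below. This is the commonly cited general form, not an expression taken from the abstract, which does not state which shape parameter was used or how the fit was performed. The symbols are assumptions of this note: $k_0$ is the zero-force off-rate, $x^{\ddagger}$ the distance to the transition state, $\Delta G^{\ddagger}$ the apparent activation free energy, $\beta = 1/k_{B}T$, and $\nu$ the parameter that selects the free-energy profile.

```latex
k(F) = k_0 \left(1 - \frac{\nu F x^{\ddagger}}{\Delta G^{\ddagger}}\right)^{\frac{1}{\nu} - 1}
       \exp\!\left\{ \beta \Delta G^{\ddagger}
       \left[ 1 - \left(1 - \frac{\nu F x^{\ddagger}}{\Delta G^{\ddagger}}\right)^{\frac{1}{\nu}} \right] \right\}
```

Setting $\nu = 1$ reduces this to the Bell expression $k(F) = k_0 \exp(\beta F x^{\ddagger})$, while $\nu = 1/2$ and $\nu = 2/3$ correspond to the cusp and linear-cubic profiles, respectively.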
Affiliation(s)
- Luka Robeson
  - Departamento de Bioquímica y Biología Molecular, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
- Nathalie Casanova‐Morales
  - Departamento de Bioquímica y Biología Molecular, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
  - Facultad de Artes Liberales, Universidad Adolfo Ibáñez, Santiago, Chile
- Francesca Burgos‐Bravo
  - Departamento de Bioquímica y Biología Molecular, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
  - California Institute for Quantitative Biosciences, Howard Hughes Medical Institute, University of California, Berkeley, California, USA
- Hilda M. Alfaro‐Valdés
  - Departamento de Bioquímica y Biología Molecular, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
- Robert Lesch
  - Department of Molecular and Cellular Biology, Howard Hughes Medical Institute, University of California, Berkeley, California, USA
- Carolina Ramírez‐Álvarez
  - Departamento de Bioquímica y Biología Molecular, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
- Mauricio Valdivia‐Delgado
  - Departamento de Bioquímica y Biología Molecular, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
- Marcela Vega
  - Departamento de Bioquímica y Biología Molecular, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
- Ricardo A. Matute
  - Centro Integrativo de Biología y Química Aplicada (CIBQA), Universidad Bernardo O'Higgins, Santiago, Chile
  - Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California, USA
- Randy Schekman
  - Department of Molecular and Cellular Biology, Howard Hughes Medical Institute, University of California, Berkeley, California, USA
- Christian A. M. Wilson
  - Departamento de Bioquímica y Biología Molecular, Facultad de Ciencias Químicas y Farmacéuticas, Universidad de Chile, Santiago, Chile
|
16
|
Rochat J, Blavier A, Ruet S, Vasseur S, Puma A, Desnous B, Chan V, Delmont E, Attarian S, Juntas Morales R, Quadrio I, Vidoni L, Bonello-Palot N, Cheillan D. Functional and Molecular Characterization of New SPTLC1 Missense Variants in Patients with Hereditary Sensory and Autonomic Neuropathy Type 1 (HSAN1). Genes (Basel) 2024; 15:692. [PMID: 38927628 PMCID: PMC11203308 DOI: 10.3390/genes15060692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Revised: 05/16/2024] [Accepted: 05/23/2024] [Indexed: 06/28/2024] Open
Abstract
Hereditary sensory and autonomic neuropathy type 1 (HSAN1) is an autosomal dominant neuropathy caused by variants in SPTLC1 or SPTLC2. These variants modify the preferred substrate of serine palmitoyl transferase, the enzyme responsible for the first step of de novo sphingolipid synthesis, leading to accumulation of cytotoxic deoxysphingolipids. Diagnosis of HSAN1 is based on clinical symptoms, mainly progressive loss of distal sensation, and genetic analysis. Aim: Identifying new SPTLC1 or SPTLC2 "gain-of-function" variants raises the question as to their pathogenicity. This work focused on characterizing six new SPTLC1 variants using in silico prediction tools, new meta-scores, 3D modeling, and functional testing to establish their pathogenicity. Methods: Variants from six patients with HSAN1 were studied. In silico, CADD and REVEL scores and the 3D modeling software MITZLI were used to characterize the pathogenic effect of the variants. Functional tests based on plasma sphingolipids quantification (total deoxysphinganine, ceramides, and dihydroceramides) were performed by tandem mass spectrometry. Results: In silico predictors did not provide very contrasting results, whereas functional tests discriminated the different variants according to their impact on deoxysphinganine level or canonical sphingolipid synthesis. Two SPTLC1 variants were newly described as pathogenic: SPTLC1 NM_006415.4:c.998A>G and NM_006415.4:c.1015G>A. Discussion: The combination of the different tools provides arguments to establish the pathogenicity of these new variants. When available, functional testing remains the best option to establish the in vivo impact of a variant. Moreover, the comprehension of metabolic dysregulation offers opportunities to develop new therapeutic strategies for these genetic disorders.
Affiliation(s)
- Julie Rochat
  - Unité Pathologies Métaboliques, Érythrocytaires et Dépistage Périnatal, Service de Biochimie et Biologie Moléculaire, Centre de Biologie et de Pathologie Est, Hospices Civils de Lyon, 69500 Bron, France
- Séverine Ruet
  - Unité Pathologies Métaboliques, Érythrocytaires et Dépistage Périnatal, Service de Biochimie et Biologie Moléculaire, Centre de Biologie et de Pathologie Est, Hospices Civils de Lyon, 69500 Bron, France
- Sophie Vasseur
  - Unité Pathologies Métaboliques, Érythrocytaires et Dépistage Périnatal, Service de Biochimie et Biologie Moléculaire, Centre de Biologie et de Pathologie Est, Hospices Civils de Lyon, 69500 Bron, France
- Angela Puma
  - Service Système Nerveux Périphérique et Muscle, Université Côte d’Azur, Centre Hospitalier Universitaire Nice, 06000 Nice, France
- Béatrice Desnous
  - Centre de Référence des Maladies Neuromusculaires de l’Enfant, Hôpital Timone Enfants, Assistance Publique Hôpitaux de Marseille, 13915 Marseille, France
- Victor Chan
  - Service de Neurologie et Unité Neuro-Vasculaire, Centre Hospitalier de Valence, 26953 Valence, France
- Emilien Delmont
  - Centre de Référence des Maladies Neuromusculaires et SLA, Hôpital de la Timone, Assistance Publique Hôpitaux de Marseille, 13915 Marseille, France
- Shahram Attarian
  - Centre de Référence des Maladies Neuromusculaires et SLA, Hôpital de la Timone, Assistance Publique Hôpitaux de Marseille, 13915 Marseille, France
- Raul Juntas Morales
  - Centre de Reference des Maladies Neuromusculaires Atlantique Occitanie Caraïbe, Département de Neurologie, Centre Hospitalier Universitaire Montpellier, 34295 Montpellier, France
- Isabelle Quadrio
  - Unité Neurogénétique Moléculaire, Service de Biochimie et Biologie Moléculaire, Centre de Biologie et de Pathologie Est, Hospices Civils de Lyon, 69500 Bron, France
- Léo Vidoni
  - Unité Neurogénétique Moléculaire, Service de Biochimie et Biologie Moléculaire, Centre de Biologie et de Pathologie Est, Hospices Civils de Lyon, 69500 Bron, France
- Nathalie Bonello-Palot
  - Département de Génétique Médicale, Hôpital Timone Enfants, Assistance Publique Hôpitaux de Marseille, 13915 Marseille, France
- David Cheillan
  - Unité Pathologies Métaboliques, Érythrocytaires et Dépistage Périnatal, Service de Biochimie et Biologie Moléculaire, Centre de Biologie et de Pathologie Est, Hospices Civils de Lyon, 69500 Bron, France
  - Laboratoire Carmen INSERM INRAE, Centre Hospitalier Lyon Sud, 69310 Pierre Bénite, France
|
17
|
Chen X, Liu J, Park N, Cheng J. A Survey of Deep Learning Methods for Estimating the Accuracy of Protein Quaternary Structure Models. Biomolecules 2024; 14:574. [PMID: 38785981 PMCID: PMC11117562 DOI: 10.3390/biom14050574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 04/07/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024] Open
Abstract
The quality prediction of quaternary structure models of a protein complex, in the absence of its true structure, is known as the Estimation of Model Accuracy (EMA). EMA is useful for ranking predicted protein complex structures and using them appropriately in biomedical research, such as protein-protein interaction studies, protein design, and drug discovery. With the advent of more accurate protein complex (multimer) prediction tools, such as AlphaFold2-Multimer and ESMFold, the estimation of the accuracy of protein complex structures has attracted increasing attention. Many deep learning methods have been developed to tackle this problem; however, there is a noticeable absence of a comprehensive overview of these methods to facilitate future development. Addressing this gap, we present a review of deep learning EMA methods for protein complex structures developed in the past several years, analyzing their methodologies, data and feature construction. We also provide a prospective summary of some potential new developments for further improving the accuracy of the EMA methods.
Affiliation(s)
- Xiao Chen
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jian Liu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
- Nolan Park
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
|
18
|
Zhang O, Naik SA, Liu ZH, Forman-Kay J, Head-Gordon T. A Curated Rotamer Library for Common Post-Translational Modifications of Proteins. ARXIV 2024:arXiv:2405.03120v1. [PMID: 38764597 PMCID: PMC11100909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2024]
Abstract
Sidechain rotamer libraries of the common amino acids of a protein are useful for folded protein structure determination and for generating ensembles of intrinsically disordered proteins (IDPs). However, much of protein function is modulated beyond the translated sequence through the introduction of post-translational modifications (PTMs). In this work, we have provided a curated set of side chain rotamers for the most common PTMs derived from the RCSB PDB database, including phosphorylated, methylated, and acetylated sidechains. Our rotamer libraries improve upon existing methods such as SIDEpro and Rosetta in predicting the experimental structures for PTMs in folded proteins. In addition, we showcase our PTM libraries in full use by generating ensembles with the Monte Carlo Side Chain Entropy (MCSCE) for folded proteins, and combining MCSCE with the Local Disordered Region Sampling algorithms within IDPConformerGenerator for proteins with intrinsically disordered regions.
Affiliation(s)
- Oufan Zhang
  - Kenneth S. Pitzer Center for Theoretical Chemistry, University of California, Berkeley, Berkeley, California 94720, USA
- Shubhankar A. Naik
  - Department of Chemistry, University of California, Berkeley, Berkeley, California 94720, USA
- Zi Hao Liu
  - Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
  - Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Julie Forman-Kay
  - Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
  - Department of Biochemistry, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Teresa Head-Gordon
  - Kenneth S. Pitzer Center for Theoretical Chemistry, University of California, Berkeley, Berkeley, California 94720, USA
  - Department of Chemistry, University of California, Berkeley, Berkeley, California 94720, USA
  - Department of Bioengineering, University of California, Berkeley, Berkeley, California 94720, USA
  - Department of Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, California 94720, USA
|
19
|
Wu KE, Yang KK, van den Berg R, Alamdari S, Zou JY, Lu AX, Amini AP. Protein structure generation via folding diffusion. Nat Commun 2024; 15:1059. [PMID: 38316764 PMCID: PMC10844308 DOI: 10.1038/s41467-024-45051-2] [Citation(s) in RCA: 31] [Impact Index Per Article: 31.0] [Received: 07/18/2023] [Accepted: 01/12/2024] [Indexed: 02/07/2024] Open
Abstract
The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments targeting yet incurable diseases. Despite recent advances in protein structure prediction, directly generating diverse, novel protein structures from neural networks remains difficult. In this work, we present a diffusion-based generative model that generates protein backbone structures via a procedure inspired by the natural folding process. We describe a protein backbone structure as a sequence of angles capturing the relative orientation of the constituent backbone atoms, and generate structures by denoising from a random, unfolded state towards a stable folded structure. Not only does this mirror how proteins natively twist into energetically favorable conformations, the inherent shift and rotational invariance of this representation crucially alleviates the need for more complex equivariant networks. We train a denoising diffusion probabilistic model with a simple transformer backbone and demonstrate that our resulting model unconditionally generates highly realistic protein structures with complexity and structural patterns akin to those of naturally-occurring proteins. As a useful resource, we release an open-source codebase and trained models for protein structure diffusion.
Affiliation(s)
- Kevin E Wu
  - Department of Computer Science, Stanford University, Stanford, CA, USA
  - Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- James Y Zou
  - Department of Computer Science, Stanford University, Stanford, CA, USA
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Alex X Lu
  - Microsoft Research, Cambridge, MA, USA
|
20
|
Vasilyeva TA, Sukhanova NV, Khalanskaya OV, Marakhonov AV, Prokhorov NS, Kadyshev VV, Skryabin NA, Kutsev SI, Zinchenko RA. An Unusual Presentation of Novel Missense Variant in PAX6 Gene: NM_000280.4:c.341A>G, p.(Asn114Ser). Curr Issues Mol Biol 2023; 46:96-105. [PMID: 38248310 PMCID: PMC10814852 DOI: 10.3390/cimb46010008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 12/18/2023] [Accepted: 12/20/2023] [Indexed: 01/23/2024] Open
Abstract
This study investigates a unique and complex eye phenotype characterized by minimal iris defects, foveal hypoplasia, optic nerve coloboma, and severe posterior segment damage. Through genetic analysis and bioinformatic tools, a specific nonsynonymous substitution, p.(Asn114Ser), within the PAX6 gene's paired domain is identified. Although this substitution is not in direct contact with DNA, its predicted stabilizing effect on the protein structure challenges the traditional understanding of PAX6 mutations, suggesting a gain-of-function mechanism. Contrary to classical loss-of-function effects, this gain-of-function hypothesis aligns with research demonstrating PAX6's dosage sensitivity. Gain-of-function mutations, though less common, can lead to diverse phenotypes distinct from aniridia. Our findings emphasize PAX6's multifaceted influence on ocular phenotypes and the importance of genetic variations. We contribute a new perspective on PAX6 mutations by suggesting a potential gain-of-function mechanism and showcasing the complexities of ocular development. This study sheds light on the intricate interplay of the genetic alterations and regulatory mechanisms underlying complex eye phenotypes. Further research, validation, and collaboration are crucial to unravel the nuanced interactions shaping ocular health and development.
Affiliation(s)
- Tatyana A. Vasilyeva
  - Research Centre for Medical Genetics, 115522 Moscow, Russia
- Natella V. Sukhanova
  - Research Centre for Medical Genetics, 115522 Moscow, Russia
- Olga V. Khalanskaya
  - Research Centre for Medical Genetics, 115522 Moscow, Russia
- Andrey V. Marakhonov
  - Research Centre for Medical Genetics, 115522 Moscow, Russia
- Nikolai S. Prokhorov
  - Department of Molecular and Cellular Biochemistry, Indiana University, Bloomington, IN 47405, USA
- Vitaly V. Kadyshev
  - Research Centre for Medical Genetics, 115522 Moscow, Russia
- Nikolay A. Skryabin
  - Research Institute of Medical Genetics, Tomsk National Research Medical Center of the Russian Academy of Sciences, 634050 Tomsk, Russia
- Sergey I. Kutsev
  - Research Centre for Medical Genetics, 115522 Moscow, Russia
- Rena A. Zinchenko
  - Research Centre for Medical Genetics, 115522 Moscow, Russia
|
21
|
Olechnovič K, Venclovas Č. VoroIF-GNN: Voronoi tessellation-derived protein-protein interface assessment using a graph neural network. Proteins 2023; 91:1879-1888. [PMID: 37482904 DOI: 10.1002/prot.26554] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/21/2023] [Revised: 06/19/2023] [Accepted: 07/01/2023] [Indexed: 07/25/2023]
Abstract
We present VoroIF-GNN (Voronoi InterFace Graph Neural Network), a novel method for assessing inter-subunit interfaces in a structural model of a protein-protein complex, relying solely on the input structure without any additional information. Given a multimeric protein structural model, we derive interface contacts from the Voronoi tessellation of atomic balls, construct a graph of those contacts, and predict the accuracy of every contact using an attention-based GNN. The contact-level predictions are then summarized to produce whole interface-level scores. VoroIF-GNN was blindly tested for its ability to estimate the accuracy of protein complexes during CASP15 and showed strong performance in selecting the best multimeric model out of many. The method implementation is freely available at https://kliment-olechnovic.github.io/voronota/expansion_js/.
Affiliation(s)
- Kliment Olechnovič
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Česlovas Venclovas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
|
22
|
Zheng W, Wuyun Q, Freddolino PL, Zhang Y. Integrating deep learning, threading alignments, and a multi-MSA strategy for high-quality protein monomer and complex structure prediction in CASP15. Proteins 2023; 91:1684-1703. [PMID: 37650367 PMCID: PMC10840719 DOI: 10.1002/prot.26585] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 04/13/2023] [Revised: 08/04/2023] [Accepted: 08/14/2023] [Indexed: 09/01/2023]
Abstract
We report the results of the "UM-TBM" and "Zheng" groups in CASP15 for protein monomer and complex structure prediction. These prediction sets were obtained using the D-I-TASSER and DMFold-Multimer algorithms, respectively. For monomer structure prediction, D-I-TASSER introduced four new features during CASP15: (i) a multiple sequence alignment (MSA) generation protocol that combines multi-source MSA searching and a structural modeling-based MSA ranker; (ii) attention-network based spatial restraints; (iii) a multi-domain module containing domain partition and arrangement for domain-level templates and spatial restraints; (iv) an optimized I-TASSER-based folding simulation system for full-length model creation guided by a combination of deep learning restraints, threading alignments, and knowledge-based potentials. For 47 free modeling targets in CASP15, the final models predicted by D-I-TASSER showed average TM-score 19% higher than the standard AlphaFold2 program. We thus showed that traditional Monte Carlo-based folding simulations, when appropriately coupled with deep learning algorithms, can generate models with improved accuracy over end-to-end deep learning methods alone. For protein complex structure prediction, DMFold-Multimer generated models by integrating a new MSA generation algorithm (DeepMSA2) with the end-to-end modeling module from AlphaFold2-Multimer. For the 38 complex targets, DMFold-Multimer generated models with an average TM-score of 0.83 and Interface Contact Score of 0.60, both significantly higher than those of competing complex prediction tools. Our analyses on complexes highlighted the critical role played by MSA generating, ranking, and pairing in protein complex structure prediction. We also discuss future room for improvement in the areas of viral protein modeling and complex model ranking.
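Because the comparisons above are reported as TM-scores, a minimal sketch of how that score is defined may help. It is not code from D-I-TASSER or from the TM-score program. The function names are hypothetical, and the floor of 0.5 Å on the distance scale for very short targets is an assumption of this sketch. The function scores one given superposition, whereas the reported TM-score is the maximum over superpositions, so this value is a lower bound.

```python
def tm_d0(l_target: int) -> float:
    """Length-dependent distance scale d0 in angstroms for a target of l_target residues."""
    if l_target > 15:
        d0 = 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    # Assumption of this sketch: keep d0 from becoming tiny or negative for short chains.
    return max(d0, 0.5)


def tm_score_fixed(distances, l_target: int) -> float:
    """TM-score contribution of one fixed superposition.

    distances: C-alpha distances (angstroms) between aligned residue pairs
               after the model has already been superposed on the target.
    l_target:  number of residues in the target, used for normalisation.
    """
    d0 = tm_d0(l_target)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target


if __name__ == "__main__":
    # A perfect model over a 100-residue target scores 1.0.
    print(tm_score_fixed([0.0] * 100, 100))
    # Perfect coordinates for only half of the residues score 0.5.
    print(tm_score_fixed([0.0] * 50, 100))
```

Normalising by the target length rather than by the number of aligned pairs is what penalises models that cover only part of the chain.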
Affiliation(s)
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
- Qiqige Wuyun
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Peter L Freddolino
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Computer Science, School of Computing, National University of Singapore, 117417 Singapore
  - Cancer Science Institute of Singapore, National University of Singapore, 117599, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 117596, Singapore
|
23
|
Visani GM, Galvin W, Pun MN, Nourmohammad A. H-Packer: Holographic Rotationally Equivariant Convolutional Neural Network for Protein Side-Chain Packing. ARXIV 2023:arXiv:2311.09312v2. [PMID: 38013891 PMCID: PMC10680869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Accurately modeling protein 3D structure is essential for the design of functional proteins. An important sub-task of structure modeling is protein side-chain packing: predicting the conformation of side-chains (rotamers) given the protein's backbone structure and amino-acid sequence. Conventional approaches for this task rely on expensive sampling procedures over hand-crafted energy functions and rotamer libraries. Recently, several deep learning methods have been developed to tackle the problem in a data-driven way, albeit with vastly different formulations (from image-to-image translation to directly predicting atomic coordinates). Here, we frame the problem as a joint regression over the side-chains' true degrees of freedom: the dihedral χ angles. We carefully study possible objective functions for this task, while accounting for the underlying symmetries of the task. We propose Holographic Packer (H-Packer), a novel two-stage algorithm for side-chain packing built on top of two light-weight rotationally equivariant neural networks. We evaluate our method on CASP13 and CASP14 targets. H-Packer is computationally efficient and shows favorable performance against conventional physics-based algorithms and is competitive against alternative deep learning solutions.
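The χ angles mentioned above are ordinary dihedral angles over four consecutive side-chain atoms. For example, χ1 is conventionally measured over N, Cα, Cβ and the first γ atom. The sketch below shows one standard way to compute such an angle from coordinates. It is not code from H-Packer, the function name is hypothetical, and the coordinates in the usage lines are made up for illustration.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in (-180, 180], about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    # atan2 keeps the sign, which a plain arccos of the dot product would lose.
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Made-up coordinates: the fourth atom is rotated a quarter turn about the z axis.
    print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Because the result is periodic, regression targets and error measures for χ angles generally need to treat -180° and 180° as the same value, which is one reason the choice of objective function matters for this task.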
Affiliation(s)
- Gian Marco Visani
- Paul G. Allen School of Computer Science and Engineering, University of Washington
- William Galvin
- Paul G. Allen School of Computer Science and Engineering, University of Washington
- Armita Nourmohammad
- Paul G. Allen School of Computer Science and Engineering, University of Washington
- Department of Physics, University of Washington
- Department of Applied Mathematics, University of Washington
- Fred Hutch Cancer Research Center, Seattle, WA
|
24
|
Hoffman J, Tan H, Sandoval-Cooper C, de Villiers K, Reed SM. GTExome: Modeling commonly expressed missense mutations in the human genome. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.14.567143. [PMID: 38014287 PMCID: PMC10680684 DOI: 10.1101/2023.11.14.567143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/29/2023]
Abstract
A web application, GTExome, is described that quickly identifies, classifies, and models missense mutations in commonly expressed human proteins. GTExome can be used to categorize genomic mutation data with tissue specific expression data from the Genotype-Tissue Expression (GTEx) project. Commonly expressed missense mutations in proteins from a wide range of tissue types can be selected and assessed for modeling suitability. Information about the consequences of each mutation is provided to the user including if disulfide bonds, hydrogen bonds, or salt bridges are broken, buried prolines introduced, buried charges are created or lost, charge is swapped, a buried glycine is replaced, or if the residue that would be removed is a proline in the cis configuration. Also, if the mutation site is in a binding pocket the number of pockets and their volumes are reported. The user can assess this information and then select from available experimental or computationally predicted structures of native proteins to create, visualize, and download a model of the mutated protein using Fast and Accurate Side-chain Protein Repacking (FASPR). For AlphaFold modeled proteins, confidence scores for native proteins are provided. Using this tool, we explored a set of 9,666 common missense mutations from a variety of tissues from GTEx and show that most mutations can be modeled using this tool to facilitate studies of protein-protein and protein-drug interactions. The open-source tool is freely available at https://pharmacogenomics.clas.ucdenver.edu/gtexome/.
Affiliation(s)
- Scott M. Reed
- Department of Chemistry, University of Colorado Denver, 1151 Arapahoe St., Denver, CO 80204 USA
|
25
|
Yan J, Li S, Zhang Y, Hao A, Zhao Q. ZetaDesign: an end-to-end deep learning method for protein sequence design and side-chain packing. Brief Bioinform 2023; 24:bbad257. [PMID: 37429578 DOI: 10.1093/bib/bbad257] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2023] [Revised: 06/05/2023] [Accepted: 06/21/2023] [Indexed: 07/12/2023] Open
Abstract
Computational protein design has been demonstrated to be the most powerful tool in the last few years among protein designing and repacking tasks. In practice, these two tasks are strongly related but often treated separately. Besides, state-of-the-art deep-learning-based methods cannot provide interpretability from an energy perspective, affecting the accuracy of the design. Here we propose a new systematic approach, including both a posterior probability part and a joint probability part, to solve the two essential questions once and for all. This approach takes the physicochemical property of amino acids into consideration and uses the joint probability model to ensure the convergence between structure and amino acid type. Our results demonstrated that this method could generate feasible, high-confidence sequences with low-energy side-chain conformations. The designed sequences can fold into target structures with high confidence and maintain relatively stable biochemical properties. The side chain conformation has a significantly lower energy landscape without delegating to a rotamer library or performing the expensive conformational searches. Overall, we propose an end-to-end method that combines the advantages of both deep learning and energy-based methods. The design results of this model demonstrate high efficiency and precision, as well as a low energy state and good interpretability.
Affiliation(s)
- Junyu Yan
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Shuai Li
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Ying Zhang
- The Key Laboratory of Cell Proliferation and Regulation Biology, Ministry of Education, College of Life Sciences, Beijing Normal University, Beijing, China
- Aimin Hao
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Qinping Zhao
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
|
26
|
Hameduh T, Mokry M, Miller AD, Heger Z, Haddad Y. Solvent Accessibility Promotes Rotamer Errors during Protein Modeling with Major Side-Chain Prediction Programs. J Chem Inf Model 2023. [PMID: 37410883 PMCID: PMC10369486 DOI: 10.1021/acs.jcim.3c00134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/08/2023]
Abstract
Side-chain rotamer prediction is one of the most critical late stages in protein 3D structure building. Highly advanced and specialized algorithms (e.g., FASPR, RASP, SCWRL4, and SCWRL4v) optimize this process by use of rotamer libraries, combinatorial searches, and scoring functions. We seek to identify the sources of key rotamer errors as a basis for correcting and improving the accuracy of protein modeling going forward. In order to evaluate the aforementioned programs, we process 2496 high-quality single-chained all-atom filtered 30% homology protein 3D structures and use discretized rotamer analysis to compare original with calculated structures. Among 513,024 filtered residue records, increased amino acid residue-dependent rotamer errors, associated in particular with polar and charged amino acid residues (ARG, LYS, and GLN), clearly correlate with increased amino acid residue solvent accessibility and an increased residue tendency toward the adoption of non-canonical off rotamers which modeling programs struggle to predict accurately. Understanding the impact of solvent accessibility now appears key to improved side-chain prediction accuracies.
Affiliation(s)
- Tareq Hameduh
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00 Brno, Czech Republic
- Michal Mokry
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00 Brno, Czech Republic
- Andrew D Miller
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00 Brno, Czech Republic
- Veterinary Research Institute, Hudcova 296/70, CZ-621 00 Brno, Czech Republic
- KP Therapeutics (Europe) s.r.o., Purkyňova 649/127, CZ-612 00 Brno, Czech Republic
- Zbynek Heger
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00 Brno, Czech Republic
- Yazan Haddad
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00 Brno, Czech Republic
|
27
|
Zhang C. BeEM: fast and faithful conversion of mmCIF format structure files to PDB format. BMC Bioinformatics 2023; 24:260. [PMID: 37340457 DOI: 10.1186/s12859-023-05388-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2023] [Accepted: 06/16/2023] [Indexed: 06/22/2023] Open
Abstract
BACKGROUND Although mmCIF is the current official format for deposition of protein and nucleic acid structures to the protein data bank (PDB) database, the legacy PDB format is still the primary supported format for many structural bioinformatics tools. Therefore, reliable software to convert mmCIF structure files to PDB files is needed. Unfortunately, existing conversion programs fail to correctly convert many mmCIF files, especially those with many atoms and/or long chain identifiers. RESULTS This study proposed BeEM, which converts any mmCIF format structure files to PDB format. BeEM conversion faithfully retains all atomic and chain information, including chain IDs with more than 2 characters, which are not supported by any existing mmCIF to PDB converters. The conversion speed of BeEM is at least ten times faster than existing converters such as MAXIT and Phenix. Part of the reason for the speed improvement is the avoidance of conversion between numerical values and text strings. CONCLUSION BeEM is a fast and accurate tool for mmCIF-to-PDB format conversion, which is a common procedure in structural biology. The source code is available under the BSD licence at https://github.com/kad-ecoli/BeEM/ .
Affiliation(s)
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA.
|
28
|
McPartlon M, Xu J. An end-to-end deep learning method for protein side-chain packing and inverse folding. Proc Natl Acad Sci U S A 2023; 120:e2216438120. [PMID: 37253017 PMCID: PMC10266014 DOI: 10.1073/pnas.2216438120] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2022] [Accepted: 04/24/2023] [Indexed: 06/01/2023] Open
Abstract
Protein side-chain packing (PSCP), the task of determining amino acid side-chain conformations given only backbone atom positions, has important applications to protein structure prediction, refinement, and design. Many methods have been proposed to tackle this problem, but their speed or accuracy is still unsatisfactory. To address this, we present AttnPacker, a deep learning (DL) method for directly predicting protein side-chain coordinates. Unlike existing methods, AttnPacker directly incorporates backbone 3D geometry to simultaneously compute all side-chain coordinates without delegating to a discrete rotamer library or performing expensive conformational search and sampling steps. This enables a significant increase in computational efficiency, decreasing inference time by over 100× compared to the DL-based method DLPacker and physics-based RosettaPacker. Tested on the CASP13 and CASP14 native and nonnative protein backbones, AttnPacker computes physically realistic side-chain conformations, reducing steric clashes and improving both rmsd and dihedral accuracy compared to state-of-the-art methods SCWRL4, FASPR, RosettaPacker, and DLPacker. Different from traditional PSCP approaches, AttnPacker can also codesign sequences and side chains, producing designs with subnative Rosetta energy and high in silico consistency.
Affiliation(s)
- Matthew McPartlon
- Department of Computer Science, Physical Sciences, The University of Chicago, Chicago, IL 60637
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL 60637
- MoleculeMind Inc., Beijing 100086, China
|
29
|
Huang X, Zhou J, Yang D, Zhang J, Xia X, Chen YE, Xu J. Decoding CRISPR-Cas PAM recognition with UniDesign. Brief Bioinform 2023; 24:bbad133. [PMID: 37078688 PMCID: PMC10199764 DOI: 10.1093/bib/bbad133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2022] [Revised: 02/09/2023] [Accepted: 03/16/2023] [Indexed: 04/21/2023] Open
Abstract
The critical first step in Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (CRISPR-Cas) protein-mediated gene editing is recognizing a preferred protospacer adjacent motif (PAM) on target DNAs by the protein's PAM-interacting amino acids (PIAAs). Thus, accurate computational modeling of PAM recognition is useful in assisting CRISPR-Cas engineering to relax or tighten PAM requirements for subsequent applications. Here, we describe a universal computational protein design framework (UniDesign) for designing protein-nucleic acid interactions. As a proof of concept, we applied UniDesign to decode the PAM-PIAA interactions for eight Cas9 and two Cas12a proteins. We show that, given native PIAAs, the UniDesign-predicted PAMs are largely identical to the natural PAMs of all Cas proteins. In turn, given natural PAMs, the computationally redesigned PIAA residues largely recapitulated the native PIAAs (74% and 86% in terms of identity and similarity, respectively). These results demonstrate that UniDesign faithfully captures the mutual preference between natural PAMs and native PIAAs, suggesting it is a useful tool for engineering CRISPR-Cas and other nucleic acid-interacting proteins. UniDesign is open-sourced at https://github.com/tommyhuangthu/UniDesign.
Affiliation(s)
- Xiaoqiang Huang
- Center for Advanced Models for Translational Sciences and Therapeutics, Department of Internal Medicine, University of Michigan Medical School, 2800 Plymouth Road, Ann Arbor, MI 48109, USA
- Jun Zhou
- Center for Advanced Models for Translational Sciences and Therapeutics, Department of Internal Medicine, University of Michigan Medical School, 2800 Plymouth Road, Ann Arbor, MI 48109, USA
- Dongshan Yang
- Center for Advanced Models for Translational Sciences and Therapeutics, Department of Internal Medicine, University of Michigan Medical School, 2800 Plymouth Road, Ann Arbor, MI 48109, USA
- Jifeng Zhang
- Center for Advanced Models for Translational Sciences and Therapeutics, Department of Internal Medicine, University of Michigan Medical School, 2800 Plymouth Road, Ann Arbor, MI 48109, USA
- Xiaofeng Xia
- Research & Development, ATGC Inc., 100 E Lancaster Avenue, LIMR Building Lab 129, Wynnewood, PA 19096, USA
- Yuqing Eugene Chen
- Center for Advanced Models for Translational Sciences and Therapeutics, Department of Internal Medicine, University of Michigan Medical School, 2800 Plymouth Road, Ann Arbor, MI 48109, USA
- Jie Xu
- Center for Advanced Models for Translational Sciences and Therapeutics, Department of Internal Medicine, University of Michigan Medical School, 2800 Plymouth Road, Ann Arbor, MI 48109, USA
|
30
|
Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
|
31
|
Liu J, Zhang C, Lai L. GeoPacker: A novel deep learning framework for protein side-chain modeling. Protein Sci 2022; 31:e4484. [PMID: 36309961 PMCID: PMC9667900 DOI: 10.1002/pro.4484] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2022] [Revised: 10/23/2022] [Accepted: 10/26/2022] [Indexed: 12/13/2022]
Abstract
Atomic interactions play essential roles in protein folding, structure stabilization, and function performance. Recent advances in deep learning-based methods have achieved impressive success not only in protein structure prediction, but also in protein sequence design. However, highly efficient and accurate protein side-chain prediction methods that can give detailed atomic interactions are still lacking. In the present study, we developed a deep learning based method, GeoPacker, that uses geometric deep learning coupled ResNet for protein side-chain modeling. GeoPacker explicitly represents atomic interactions with rotational and translational invariance for information extraction of relative locations. GeoPacker outperformed the state-of-the-art energy function-based methods in side-chain structure prediction accuracy and runs about 10 and 700 times faster than the deep learning-based method DLPacker and OPUS-rota4 with comparable prediction accuracy, respectively. The performance of GeoPacker does not depend on the secondary structures that the residues belong to. GeoPacker gives highly accurate predictions for buried residues in the protein core as well as protein-protein interface, making it a useful tool for protein structure modeling, protein, and interaction design.
Affiliation(s)
- Jiale Liu
- Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Changsheng Zhang
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Luhua Lai
- Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
|
32
|
Zheng W, Du Z, Ko SB, Wickramasinghe N, Yang S. Incorporation of D2O-Induced Fluorine Chemical Shift Perturbations into Ensemble-Structure Characterization of the ERalpha Disordered Region. J Phys Chem B 2022; 126:9176-9186. [PMID: 36331868 PMCID: PMC10066504 DOI: 10.1021/acs.jpcb.2c05456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
Structural characterization of intrinsically disordered proteins (IDPs) requires a concerted effort between experiments and computations by accounting for their conformational heterogeneity. Given the diversity of experimental tools providing local and global structural information, constructing an experimental restraint-satisfying structural ensemble remains challenging. Here, we use the disordered N-terminal domain (NTD) of the estrogen receptor alpha (ERalpha) as a model system to combine existing small-angle X-ray scattering (SAXS) and hydroxyl radical protein footprinting (HRPF) data and newly acquired solvent accessibility data via D2O-induced fluorine chemical shifting (DFCS) measurements. A new set of DFCS data for the solvent exposure of a set of 12 amino acid positions were added to complement previously acquired HRPF measurements for the solvent exposure of the other 16 nonoverlapping amino acids, thereby improving the NTD ensemble characterization considerably. We also found that while choosing an initial ensemble of structures generated from a different atomic-level force field or sampling/modeling method can lead to distinct contact maps even when the same sets of experimental measurements were used for ensemble-fitting, comparative analyses from these initial ensembles reveal commonly recurring structural features in their ensemble-averaged contact map. Specifically, nonlocal or long-range transient interactions were found consistently between the N-terminal segments and the central region, sufficient to mediate the conformational ensemble and regulate how the NTD interacts with its coactivator proteins.
Affiliation(s)
- Wenwei Zheng
- College of Integrative Sciences and Arts, Arizona State University, Mesa, Arizona 85212, United States
- Zhanwen Du
- Center for Proteomics and Department of Nutrition, School of Medicine, Case Western Reserve University, Cleveland, Ohio, 44106, United States
- Soo Bin Ko
- Center for Proteomics and Department of Nutrition, School of Medicine, Case Western Reserve University, Cleveland, Ohio, 44106, United States
- Nalinda Wickramasinghe
- Chemistry-NMR Facility, Case Western Reserve University, Cleveland, Ohio 44106, United States
- Sichun Yang
- Center for Proteomics and Department of Nutrition, School of Medicine, Case Western Reserve University, Cleveland, Ohio, 44106, United States
|
33
|
Dicks L, Wales DJ. Exploiting Sequence-Dependent Rotamer Information in Global Optimization of Proteins. J Phys Chem B 2022; 126:8381-8390. [PMID: 36257022 PMCID: PMC9623586 DOI: 10.1021/acs.jpcb.2c04647] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023]
Abstract
Rotamers, namely amino acid side chain conformations common to many different peptides, can be compiled into libraries. These rotamer libraries are used in protein modeling, where the limited conformational space occupied by amino acid side chains is exploited. Here, we construct a sequence-dependent rotamer library from simulations of all possible tripeptides, which provides rotameric states dependent on adjacent amino acids. We observe significant sensitivity of rotamer populations to sequence and find that the library is successful in locating side chain conformations present in crystal structures. The library is designed for applications with basin-hopping global optimization, where we use it to propose moves in conformational space. The addition of rotamer moves significantly increases the efficiency of protein structure prediction within this framework, and we determine parameters to optimize efficiency.
Affiliation(s)
- L. Dicks
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- IBM Research, The Hartree Centre STFC Laboratory, Sci-Tech Daresbury, Warrington WA4 4AD, United Kingdom
- D. J. Wales
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
|
34
|
Characterization of Treponema denticola Major Surface Protein (Msp) by Deletion Analysis and Advanced Molecular Modeling. J Bacteriol 2022; 204:e0022822. [PMID: 35913147 PMCID: PMC9487533 DOI: 10.1128/jb.00228-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023] Open
Abstract
Treponema denticola, a keystone pathogen in periodontitis, is a model organism for studying Treponema physiology and host-microbe interactions. Its major surface protein Msp forms an oligomeric outer membrane complex that binds fibronectin, has cytotoxic pore-forming activity, and disrupts several intracellular processes in host cells. T. denticola msp is an ortholog of the Treponema pallidum tprA to -K gene family that includes tprK, whose remarkable in vivo hypervariability is proposed to contribute to T. pallidum immune evasion. We recently identified the primary Msp surface-exposed epitope and proposed a model of the Msp protein as a β-barrel protein similar to Gram-negative bacterial porins. Here, we report fine-scale Msp mutagenesis demonstrating that both the N and C termini as well as the centrally located Msp surface epitope are required for native Msp oligomer expression. Removal of as few as three C-terminal amino acids abrogated Msp detection on the T. denticola cell surface, and deletion of four residues resulted in complete loss of detectable Msp. Substitution of a FLAG tag for either residues 6 to 13 of mature Msp or an 8-residue portion of the central Msp surface epitope resulted in expression of full-length Msp but absence of the oligomer, suggesting roles for both domains in oligomer formation. Consistent with previously reported Msp N-glycosylation, proteinase K treatment of intact cells released a 25 kDa polypeptide containing the Msp surface epitope into culture supernatants. Molecular modeling of Msp using novel metagenome-derived multiple sequence alignment (MSA) algorithms supports the hypothesis that Msp is a large-diameter, trimeric outer membrane porin-like protein whose potential transport substrate remains to be identified. IMPORTANCE The Treponema denticola gene encoding its major surface protein (Msp) is an ortholog of the T. pallidum tprA to -K gene family that includes tprK, whose remarkable in vivo hypervariability is proposed to contribute to T. pallidum immune evasion. Using a combined strategy of fine-scale mutagenesis and advanced predictive molecular modeling, we characterized the Msp protein and present a high-confidence model of its structure as an oligomer embedded in the outer membrane. This work adds to knowledge of Msp-like proteins in oral treponemes and may contribute to understanding the evolutionary and potential functional relationships between T. denticola Msp and the orthologous T. pallidum Tpr proteins.
|
35
|
Tessmer MH, Canarie ER, Stoll S. Comparative evaluation of spin-label modeling methods for protein structural studies. Biophys J 2022; 121:3508-3519. [PMID: 35957530 PMCID: PMC9515001 DOI: 10.1016/j.bpj.2022.08.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2022] [Revised: 07/01/2022] [Accepted: 08/04/2022] [Indexed: 11/18/2022] Open
Abstract
Site-directed spin-labeling electron paramagnetic resonance spectroscopy is a powerful technique for the investigation of protein structure and dynamics. Accurate spin-label modeling methods are essential to make full quantitative use of site-directed spin-labeling electron paramagnetic resonance data for protein modeling and model validation. Using a set of double electron-electron resonance data from seven different site pairs on maltodextrin/maltose-binding protein under two different conditions using five different spin labels, we compare the ability of two widely used spin-label modeling methods, based on accessible volume sampling and rotamer libraries, to predict experimental distance distributions. We present a spin-label modeling approach inspired by canonical side-chain modeling methods and compare modeling accuracy with the established methods.
Affiliation(s)
- Maxx H Tessmer
- Department of Chemistry, University of Washington, Seattle, Washington
- Stefan Stoll
- Department of Chemistry, University of Washington, Seattle, Washington.
|
36
|
Zhang H, Shang R, Kim K, Zheng W, Johnson CJ, Sun L, Niu X, Liu L, Zhou J, Liu L, Zhang Z, Uyeno TA, Pei J, Fissette SD, Green SA, Samudra SP, Wen J, Zhang J, Eggenschwiler JT, Menke DB, Bronner ME, Grishin NV, Li W, Ye K, Zhang Y, Stolfi A, Bi P. Evolution of a chordate-specific mechanism for myoblast fusion. SCIENCE ADVANCES 2022; 8:eadd2696. [PMID: 36054355 PMCID: PMC10848958 DOI: 10.1126/sciadv.add2696] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/31/2022] [Accepted: 07/15/2022] [Indexed: 06/15/2023]
Abstract
Vertebrate myoblast fusion allows for multinucleated muscle fibers to compound the size and strength of mononucleated cells, but the evolution of this important process is unknown. We investigated the evolutionary origins and function of membrane-coalescing agents Myomaker and Myomixer in various groups of chordates. Here, we report that Myomaker likely arose through gene duplication in the last common ancestor of tunicates and vertebrates, while Myomixer appears to have evolved de novo in early vertebrates. Functional tests revealed a complex evolutionary history of myoblast fusion. A prevertebrate phase of muscle multinucleation driven by Myomaker was followed by the later emergence of Myomixer that enables the highly efficient fusion system of vertebrates. Evolutionary comparisons between vertebrate and nonvertebrate Myomaker revealed key structural and mechanistic insights into myoblast fusion. Thus, our findings suggest an evolutionary model of chordate fusogens and illustrate how new genes shape the emergence of novel morphogenetic traits and mechanisms.
Affiliation(s)
- Haifeng Zhang
- Center for Molecular Medicine, University of Georgia, Athens, GA, USA
- Renjie Shang
- Center for Molecular Medicine, University of Georgia, Athens, GA, USA
- Department of Genetics, University of Georgia, Athens, GA, USA
- Kwantae Kim
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
- Lei Sun
- The Fifth People’s Hospital of Shanghai, and Shanghai Key Laboratory of Medical Epigenetics, Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- Xiang Niu
- Tri-Institutional Program in Computational Biology and Medicine, Weill Cornell Medical College, New York, USA
- Liang Liu
- Department of Statistics, University of Georgia, Athens, GA, USA
- Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- Jingqi Zhou
- Department of Genetics, University of Georgia, Athens, GA, USA
- Lingshu Liu
- Department of Genetics, University of Georgia, Athens, GA, USA
- Zheng Zhang
- Center for Molecular Medicine, University of Georgia, Athens, GA, USA
- Jimin Pei
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Skye D. Fissette
- Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI, USA
- Stephen A. Green
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Junfei Wen
- Center for Molecular Medicine, University of Georgia, Athens, GA, USA
- Jianli Zhang
- College of Engineering, University of Georgia, Athens, GA, USA
- Marianne E. Bronner
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Nick V. Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Weiming Li
- Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI, USA
- Kaixiong Ye
- Department of Genetics, University of Georgia, Athens, GA, USA
- Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
- Alberto Stolfi
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- Pengpeng Bi
- Center for Molecular Medicine, University of Georgia, Athens, GA, USA
- Department of Genetics, University of Georgia, Athens, GA, USA
|
37
|
Yuan B, Ru X, Lin Z. Analysis of the sidechain structures of amino acids and peptides and a deduced method for the efficient search of peptide conformations. Comput Theor Chem 2022. [DOI: 10.1016/j.comptc.2022.113815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
38
Nnyigide OS, Nnyigide TO, Lee SG, Hyun K. Protein Repair and Analysis Server: A Web Server to Repair PDB Structures, Add Missing Heavy Atoms and Hydrogen Atoms, and Assign Secondary Structures by Amide Interactions. J Chem Inf Model 2022; 62:4232-4246. [DOI: 10.1021/acs.jcim.2c00571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Sun-Gu Lee
  - School of Chemical Engineering, Pusan National University, Busan 46241, Korea
- Kyu Hyun
  - School of Chemical Engineering, Pusan National University, Busan 46241, Korea
39
Xu G, Wang Y, Wang Q, Ma J. Studying protein-protein interaction through side-chain modeling method OPUS-Mut. Brief Bioinform 2022; 23:6663639. [PMID: 35959990 DOI: 10.1093/bib/bbac330] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/15/2022] [Revised: 07/17/2022] [Accepted: 07/20/2022] [Indexed: 12/12/2022]
Abstract
Protein side chains are vitally important to many biological processes such as protein-protein interaction. In this study, we evaluate the performance of our previously released side-chain modeling method OPUS-Mut, together with some other methods, on three oligomer datasets: CASP14 (11), CAMEO-Homo (65) and CAMEO-Hetero (21). The results show that OPUS-Mut outperforms the other methods whether measured over all residues or over the interfacial residues only. We also demonstrate our method for evaluating protein-protein docking poses on a dataset, Oligomer-Dock (75), created using the top 10 predictions from ZDOCK 3.0.2. Our scoring function correctly identifies the native pose as the top-1 in 45 out of 75 targets. Unlike traditional scoring functions, our method is based on the overall side-chain packing favorableness in accordance with the local packing environment. It emphasizes the significance of side chains and provides a new and effective scoring term for studying protein-protein interaction.
Affiliation(s)
- Gang Xu
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
  - Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China
  - Shanghai AI Laboratory, Shanghai 200030, China
- Yilin Wang
  - Georgetown Preparatory School, North Bethesda, MD 20852, USA
- Qinghua Wang
  - Center for Biomolecular Innovation, Harcam Biomedicines, Shanghai, China
- Jianpeng Ma
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
  - Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China
  - Shanghai AI Laboratory, Shanghai 200030, China
40
I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction. Nat Protoc 2022; 17:2326-2353. [PMID: 35931779 DOI: 10.1038/s41596-022-00728-0] [Citation(s) in RCA: 224] [Impact Index Per Article: 74.7] [Received: 02/04/2022] [Accepted: 05/24/2022] [Indexed: 01/17/2023]
Abstract
Most proteins in cells are composed of multiple folding units (or domains) to perform complex functions in a cooperative manner. Relative to the rapid progress in single-domain structure prediction, there are few effective tools available for multi-domain protein structure assembly, mainly due to the complexity of modeling multi-domain proteins, which involves higher degrees of freedom in domain-orientation space and various levels of continuous and discontinuous domain assembly and linker refinement. To meet the challenge and the high demand of the community, we developed I-TASSER-MTD to model the structures and functions of multi-domain proteins through a progressive protocol that combines sequence-based domain parsing, single-domain structure folding, inter-domain structure assembly and structure-based function annotation in a fully automated pipeline. Advanced deep-learning models have been incorporated into each of the steps to enhance both the domain modeling and inter-domain assembly accuracy. The protocol allows for the incorporation of experimental cross-linking data and cryo-electron microscopy density maps to guide the multi-domain structure assembly simulations. I-TASSER-MTD is built on I-TASSER but substantially extends its ability and accuracy in modeling large multi-domain protein structures and provides meaningful functional insights for the targets at both the domain- and full-chain levels from the amino acid sequence alone.
41
Zhou X, Peng C, Zheng W, Li Y, Zhang G, Zhang Y. DEMO2: Assemble multi-domain protein structures by coupling analogous template alignments with deep-learning inter-domain restraint prediction. Nucleic Acids Res 2022; 50:W235-W245. [PMID: 35536281 PMCID: PMC9252800 DOI: 10.1093/nar/gkac340] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/23/2022] [Revised: 04/13/2022] [Accepted: 04/22/2022] [Indexed: 01/19/2023]
Abstract
Most proteins in nature contain multiple folding units (or domains). The revolutionary success of AlphaFold2 in single-domain structure prediction showed potential to extend deep-learning techniques for multi-domain structure modeling. This work presents a significantly improved method, DEMO2, which integrates analogous template structural alignments with deep-learning techniques for high-accuracy domain structure assembly. Starting from individual domain models, inter-domain spatial restraints are first predicted with deep residual convolutional networks, where full-length structure models are assembled using L-BFGS simulations under the guidance of a hybrid energy function combining deep-learning restraints and analogous multi-domain template alignments searched from the PDB. The output of DEMO2 contains deep-learning inter-domain restraints, top-ranked multi-domain structure templates, and up to five full-length structure models. DEMO2 was tested on a large-scale benchmark and the blind CASP14 experiment, where DEMO2 was shown to significantly outperform its predecessor and the state-of-the-art protein structure prediction methods. By integrating with new deep-learning techniques, DEMO2 should help fill the rapidly increasing gap between the improved ability of tertiary structure determination and the high demand for the high-quality multi-domain protein structures. The DEMO2 server is available at https://zhanggroup.org/DEMO/.
Affiliation(s)
- Xiaogen Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Chunxiang Peng
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
42
Li Y, Zhang C, Yu DJ, Zhang Y. Deep learning geometrical potential for high-accuracy ab initio protein structure prediction. iScience 2022; 25:104425. [PMID: 35663033 PMCID: PMC9160776 DOI: 10.1016/j.isci.2022.104425] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/17/2022] [Revised: 05/02/2022] [Accepted: 05/11/2022] [Indexed: 11/22/2022]
Abstract
Ab initio protein structure prediction has been vastly boosted by the modeling of inter-residue contact/distance maps in recent years. We developed a new deep learning model, DeepPotential, which accurately predicts the distribution of a complementary set of geometric descriptors including a novel hydrogen-bonding potential defined by C-alpha atom coordinates. On 154 Free-Modeling/Hard targets from the CASP and CAMEO experiments, DeepPotential demonstrated significant advantage on both geometrical feature prediction and full-length structure construction, with Top-L/5 contact accuracy and TM-score of full-length models 4.1% and 6.7% higher than the best of other deep-learning restraint prediction approaches. Detailed analyses showed that the major contributions to the TM-score/contact-map improvements come from the employment of multi-tasking network architecture and metagenome-based MSA collection assisted with confidence-based MSA selection, where hydrogen-bonding and inter-residue orientation predictions help improve hydrogen-bonding network and secondary structure packing. These results demonstrated new progress in the deep-learning restraint-guided ab initio protein structure prediction.
Highlights: multi-tasking network architecture for multiple inter-residue geometries; novel deep learning model for improved hydrogen-bonding modeling; rapid and high-accuracy ab initio protein structure prediction.
Affiliation(s)
- Yang Li
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 21000, China
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 21000, China
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
43
Zheng W, Wuyun Q, Zhou X, Li Y, Freddolino PL, Zhang Y. LOMETS3: integrating deep learning and profile alignment for advanced protein template recognition and function annotation. Nucleic Acids Res 2022; 50:W454-W464. [PMID: 35420129 PMCID: PMC9252734 DOI: 10.1093/nar/gkac248] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 02/06/2022] [Revised: 03/29/2022] [Accepted: 03/31/2022] [Indexed: 11/25/2022]
Abstract
Deep learning techniques have significantly advanced the field of protein structure prediction. LOMETS3 (https://zhanglab.ccmb.med.umich.edu/LOMETS/) is a new generation meta-server approach to template-based protein structure prediction and function annotation, which integrates newly developed deep learning threading methods. For the first time, we have extended LOMETS3 to handle multi-domain proteins and to construct full-length models with gradient-based optimizations. Starting from a FASTA-formatted sequence, LOMETS3 performs four steps of domain boundary prediction, domain-level template identification, full-length template/model assembly and structure-based function prediction. The output of LOMETS3 contains (i) top-ranked templates from LOMETS3 and its component threading programs, (ii) up to 5 full-length structure models constructed by L-BFGS (limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm) optimization, (iii) the 10 closest Protein Data Bank (PDB) structures to the target, (iv) structure-based functional predictions, (v) domain partition and assembly results, and (vi) the domain-level threading results, including items (i)–(iii) for each identified domain. LOMETS3 was tested in large-scale benchmarks and the blind CASP14 (14th Critical Assessment of Structure Prediction) experiment, where the overall template recognition and function prediction accuracy is significantly beyond its predecessors and other state-of-the-art threading approaches, especially for hard targets without homologous templates in the PDB. Based on the improved developments, LOMETS3 should help significantly advance the capability of broader biomedical community for template-based protein structure and function modelling.
Affiliation(s)
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Qiqige Wuyun
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Xiaogen Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Peter L Freddolino
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
44
Zhou X, Li Y, Zhang C, Zheng W, Zhang G, Zhang Y. Progressive assembly of multi-domain protein structures from cryo-EM density maps. Nat Comput Sci 2022; 2:265-275. [PMID: 35844960 PMCID: PMC9281201 DOI: 10.1038/s43588-022-00232-1] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 04/19/2021] [Accepted: 03/21/2022] [Indexed: 05/20/2023]
Abstract
Progress in cryo-electron microscopy has provided the potential for large-size protein structure determination. However, the success rate for solving multi-domain proteins remains low because of the difficulty in modelling inter-domain orientations. Here we developed domain enhanced modeling using cryo-electron microscopy (DEMO-EM), an automatic method to assemble multi-domain structures from cryo-electron microscopy maps through a progressive structural refinement procedure combining rigid-body domain fitting and flexible assembly simulations with deep-neural-network inter-domain distance profiles. The method was tested on a large-scale benchmark set of proteins containing up to 12 continuous and discontinuous domains with medium- to low-resolution density maps, where DEMO-EM produced models with correct inter-domain orientations (template modeling score (TM-score) >0.5) for 97% of cases and outperformed state-of-the-art methods. DEMO-EM was applied to the severe acute respiratory syndrome coronavirus 2 genome and generated models with average TM-score and root-mean-square deviation of 0.97 and 1.3 Å, respectively, with respect to the deposited structures. These results demonstrate an efficient pipeline that enables automated and reliable large-scale multi-domain protein structure modelling from cryo-electron microscopy maps.
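The TM-score threshold quoted in this abstract can be made concrete with a small sketch. It evaluates the standard TM-score sum for one fixed superposition only; the full metric maximizes this value over superpositions, which is not attempted here. The function names `tm_d0` and `tm_score_fixed` are hypothetical, and the 0.5 Å floor for short chains is an assumption rather than something stated in the cited work.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in angstroms) of the TM-score."""
    # Assumption: short chains fall back to 0.5 A so that d0 stays positive.
    if target_length <= 21:
        return 0.5
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score_fixed(distances, target_length: int) -> float:
    """TM-score sum for a single, already chosen superposition.

    distances: C-alpha distances (angstroms) of the aligned residue pairs.
    target_length: number of residues in the reference structure.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    # Each aligned pair contributes between 0 and 1; unaligned residues add 0.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # 80 of 100 residues aligned, each pair 2.0 A apart after superposition.
    print(round(tm_score_fixed([2.0] * 80, 100), 3))
```

Normalizing by the reference length rather than by the number of aligned pairs is what makes partial alignments score lower, which is why a cutoff such as 0.5 is usable across proteins of different sizes.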
Affiliation(s)
- Xiaogen Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Yang Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
45
Malik A, Banerjee A, Pal A, Mitra P. A sequence space search engine for computational protein design to modulate molecular functionality. J Biomol Struct Dyn 2022; 41:2937-2946. [PMID: 35220920 DOI: 10.1080/07391102.2022.2042386] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/19/2022]
Abstract
De-novo protein design explores the untapped sequence space that is otherwise less discovered during the evolutionary process. This necessitates an efficient sequence space search engine for effective convergence in computational protein design. We propose a greedy simulated annealing-based Monte-Carlo parallel search algorithm for better sequence-structure compatibility probing in protein design. The guidance provided by the evolutionary profile, the greedy approach, and the cooling schedule adopted in the Monte Carlo simulation ensures sufficient exploration and exploitation of the search space, leading to faster convergence. On evaluating the proposed algorithm, we find that a dataset of 76 target scaffolds reports an average root-mean-square-deviation (RMSD) of 1.07 Å and an average TM-Score of 0.93 with the modeled designed protein sequences. High sequence recapitulation of 48.7% (59.4%) observed in the design sequences for all (hydrophobic) solvent-inaccessible residues again establishes the goodness of the proposed algorithm. A high (93.4%) intra-group recapitulation of hydrophobic residues in the solvent-inaccessible region indicates that the proposed protein design algorithm preserves the core residues in the protein and provides alternative residue combinations in the solvent-accessible regions of the target protein. Furthermore, a COFACTOR-based protein functional analysis shows that the design sequences exhibit altered molecular functionality and introduce new molecular functions compared to the target scaffolds. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Ayush Malik
  - Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
- Anupam Banerjee
  - School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
- Abantika Pal
  - Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
- Pralay Mitra
  - Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
46
Xu G, Wang Q, Ma J. OPUS-Rota4: a gradient-based protein side-chain modeling framework assisted by deep learning-based predictors. Brief Bioinform 2022; 23:bbab529. [PMID: 34905769 PMCID: PMC8769891 DOI: 10.1093/bib/bbab529] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/12/2021] [Revised: 10/11/2021] [Accepted: 11/15/2021] [Indexed: 11/13/2022]
Abstract
Accurate protein side-chain modeling is crucial for protein folding and protein design. In the past decades, many successful methods have been proposed to address this issue. However, most of them depend on the discrete samples from the rotamer library, which may limit their accuracy and usage. In this study, we report an open-source toolkit for protein side-chain modeling, named OPUS-Rota4. It consists of three modules: OPUS-RotaNN2, which predicts protein side-chain dihedral angles; OPUS-RotaCM, which measures the distance and orientation information between the side chains of different residue pairs; and OPUS-Fold2, which applies the constraints derived from the first two modules to guide side-chain modeling. OPUS-Rota4 adopts the dihedral angles predicted by OPUS-RotaNN2 as its initial states, and uses OPUS-Fold2 to refine the side-chain conformation with the side-chain contact map constraints derived from OPUS-RotaCM. Therefore, we convert the side-chain modeling problem into a side-chain contact map prediction problem. OPUS-Fold2 is written in Python and TensorFlow 2.4, which makes it user-friendly to include other differentiable energy terms. OPUS-Rota4 also provides a platform in which the side-chain conformation can be dynamically adjusted under the influence of other processes. We apply OPUS-Rota4 to 15 FM predictions submitted by AlphaFold2 in CASP14; the results show that the side chains modeled by OPUS-Rota4 are closer to their native counterparts than those predicted by AlphaFold2 (e.g. the residue-wise RMSD for all residues and core residues are 0.588 and 0.472 for AlphaFold2, and 0.535 and 0.407 for OPUS-Rota4).
Affiliation(s)
- Gang Xu
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
  - Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China
  - Shanghai AI Laboratory, Shanghai 200030, China
- Qinghua Wang
  - Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, United States
- Jianpeng Ma
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
  - Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China
  - Shanghai AI Laboratory, Shanghai 200030, China
47
Ochoa R, Soler MA, Gladich I, Battisti A, Minovski N, Rodriguez A, Fortuna S, Cossio P, Laio A. Computational Evolution Protocol for Peptide Design. Methods Mol Biol 2022; 2405:335-359. [PMID: 35298821 DOI: 10.1007/978-1-0716-1855-4_16] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/14/2023]
Abstract
Computational peptide design is useful for therapeutics, diagnostics, and vaccine development. To select the most promising peptide candidates, the key is describing accurately the peptide-target interactions at the molecular level. We here review a computational peptide design protocol whose key feature is the use of all-atom explicit solvent molecular dynamics for describing the different peptide-target complexes explored during the optimization. We describe the milestones behind the development of this protocol, which is now implemented in an open-source code called PARCE. We provide a basic tutorial to run the code for an antibody fragment design example. Finally, we describe three additional applications of the method to design peptides for different targets, illustrating the broad scope of the proposed approach.
Affiliation(s)
- Rodrigo Ochoa
  - Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia, Medellin, Colombia
- Ivan Gladich
  - Qatar Environment and Energy Research Institute, Hamad Bin Khalifa University, Doha, Qatar
  - SISSA, Trieste, Italy
- Nikola Minovski
  - Department of Chemical and Pharmaceutical Sciences, University of Trieste, Trieste, Italy
  - Theory Department, Laboratory for Cheminformatics, National Institute of Chemistry, Ljubljana, Slovenia
- Alex Rodriguez
  - The Abdus Salam International Centre for Theoretical Physics, Trieste, Italy
- Sara Fortuna
  - Italian Institute of Technology (IIT), Genova, Italy
  - Department of Chemical and Pharmaceutical Sciences, University of Trieste, Trieste, Italy
- Pilar Cossio
  - Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia, Medellin, Colombia
  - Department of Theoretical Biophysics, Max Planck Institute of Biophysics, Frankfurt am Main, Germany
- Alessandro Laio
  - The Abdus Salam International Centre for Theoretical Physics, Trieste, Italy
  - SISSA, Trieste, Italy
48
San Fabián J, Ema I, Omar S, García de la Vega JM. Toward a Computational NMR Procedure for Modeling Dipeptide Side-Chain Conformation. J Chem Inf Model 2021; 61:6012-6023. [PMID: 34762416 PMCID: PMC8715507 DOI: 10.1021/acs.jcim.1c00773] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Abstract
Theoretical relationships between the vicinal spin–spin coupling constants (SSCCs) and the χ1 torsion angles have been studied to predict the conformations of protein side chains. An efficient computational procedure is developed to obtain the conformation of dipeptides through theoretical and experimental SSCCs, Karplus equations, and quantum chemistry methods, and it is applied to three aliphatic hydrophobic residues (Val, Leu, and Ile). Three models are proposed: unimodal-static, trimodal-static-stepped, and trimodal-static-trigonal, where the most important factors are incorporated (coupled nuclei, nature and orientation of the substituents, and local geometric properties). Our results are validated by comparison with NMR and X-ray empirical data described in the literature, obtaining successful results on the 29 residues considered. Using our trimodal residue treatment, it is possible to detect and resolve residues with a simple conformation and those with two or three staggered conformers. In four residues, a deeper analysis explains that they do not have a unique conformation and that the population of each conformation plays an important role.
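For orientation, the generic Karplus relation that links a vicinal coupling constant to a dihedral angle can be written as follows. This is the textbook form, not the specific parameterization of the cited work: the coefficients A, B and C and the phase offset Δ are placeholders whose values depend on the coupled nuclei and the substituents.

```latex
{}^{3}J(\chi_1) = A\cos^{2}(\chi_1 + \Delta) + B\cos(\chi_1 + \Delta) + C
```

Because the cosine terms are periodic, a single measured coupling can correspond to more than one χ1 value, which is one reason why several couplings are usually combined when assigning a side-chain conformer.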
Affiliation(s)
- Jesús San Fabián
  - Departamento de Química Física Aplicada, Facultad de Ciencias, Universidad Autónoma de Madrid, 28049 Madrid, Spain
- Ignacio Ema
  - Departamento de Química Física Aplicada, Facultad de Ciencias, Universidad Autónoma de Madrid, 28049 Madrid, Spain
- Salama Omar
  - Departamento de Química Física Aplicada, Facultad de Ciencias, Universidad Autónoma de Madrid, 28049 Madrid, Spain
49
Zheng W, Li Y, Zhang C, Zhou X, Pearce R, Bell EW, Huang X, Zhang Y. Protein structure prediction using deep learning distance and hydrogen-bonding restraints in CASP14. Proteins 2021; 89:1734-1751. [PMID: 34331351 PMCID: PMC8616857 DOI: 10.1002/prot.26193] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 04/28/2021] [Revised: 07/06/2021] [Accepted: 07/22/2021] [Indexed: 11/10/2022]
Abstract
In this article, we report 3D structure prediction results by two of our best server groups ("Zhang-Server" and "QUARK") in CASP14. These two servers were built based on the D-I-TASSER and D-QUARK algorithms, which integrated four newly developed components into the classical protein folding pipelines, I-TASSER and QUARK, respectively. The new components include: (a) a new multiple sequence alignment (MSA) collection tool, DeepMSA2, which is extended from the DeepMSA program; (b) a contact-based domain boundary prediction algorithm, FUpred, to detect protein domain boundaries; (c) a residual convolutional neural network-based method, DeepPotential, to predict multiple spatial restraints by co-evolutionary features derived from the MSA; and (d) optimized spatial restraint energy potentials to guide the structure assembly simulations. For 37 FM targets, the average TM-scores of the first models produced by D-I-TASSER and D-QUARK were 96% and 112% higher than those constructed by I-TASSER and QUARK, respectively. The data analysis indicates noticeable improvements produced by each of the four new components, especially for the newly added spatial restraints from DeepPotential and the well-tuned force field that combines spatial restraints, threading templates, and generic knowledge-based potentials. However, challenges still exist in the current pipelines. These include difficulties in modeling multi-domain proteins due to low accuracy in inter-domain distance prediction and modeling protein domains from oligomer complexes, as the co-evolutionary analysis cannot distinguish inter-chain and intra-chain distances. Specifically tuning the deep learning-based predictors for multi-domain targets and protein complexes may be helpful to address these issues.
Affiliation(s)
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yang Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, China
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Xiaogen Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Robin Pearce
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Eric W. Bell
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Xiaoqiang Huang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
50
Rudnev VR, Kulikova LI, Nikolsky KS, Malsagova KA, Kopylov AT, Kaysheva AL. Current Approaches in Supersecondary Structures Investigation. Int J Mol Sci 2021; 22:11879. [PMID: 34769310 PMCID: PMC8584461 DOI: 10.3390/ijms222111879] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/21/2021] [Revised: 10/27/2021] [Accepted: 10/29/2021] [Indexed: 11/16/2022]
Abstract
Proteins expressed during the cell cycle determine cell function, topology, and responses to environmental influences. The development and improvement of experimental methods in the field of structural biology provide valuable information about the structure and functions of individual proteins. This work is devoted to the study of supersecondary structures of proteins and determination of their structural motifs, description of experimental methods for their detection, databases, and repositories for storage, as well as methods of molecular dynamics research. The interest in the study of supersecondary structures in proteins is due to their autonomous stability outside the protein globule, which makes it possible to study folding processes, conformational changes in protein isoforms, and aberrant proteins with high productivity.
Affiliation(s)
- Vladimir R. Rudnev
  - Biobanking Group, Branch of Institute of Biomedical Chemistry “Scientific and Education Center”, 109028 Moscow, Russia
  - Institute of Theoretical and Experimental Biophysics, Russian Academy of Sciences, 142290 Pushchino, Russia
- Liudmila I. Kulikova
  - Biobanking Group, Branch of Institute of Biomedical Chemistry “Scientific and Education Center”, 109028 Moscow, Russia
  - Institute of Theoretical and Experimental Biophysics, Russian Academy of Sciences, 142290 Pushchino, Russia
  - Institute of Mathematical Problems of Biology RAS, the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, 142290 Pushchino, Russia
- Kirill S. Nikolsky
  - Biobanking Group, Branch of Institute of Biomedical Chemistry “Scientific and Education Center”, 109028 Moscow, Russia
- Kristina A. Malsagova
  - Biobanking Group, Branch of Institute of Biomedical Chemistry “Scientific and Education Center”, 109028 Moscow, Russia
- Arthur T. Kopylov
  - Biobanking Group, Branch of Institute of Biomedical Chemistry “Scientific and Education Center”, 109028 Moscow, Russia
- Anna L. Kaysheva
  - Biobanking Group, Branch of Institute of Biomedical Chemistry “Scientific and Education Center”, 109028 Moscow, Russia