1. Paquet E, Soleymani F, Viktor HL, Michalowski W. Annealed fractional Lévy-Itō diffusion models for protein generation. Comput Struct Biotechnol J 2024; 23:1641-1653. [PMID: 38680869 PMCID: PMC11047197 DOI: 10.1016/j.csbj.2024.04.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 04/03/2024] [Accepted: 04/04/2024] [Indexed: 05/01/2024]
Abstract
Protein generation has numerous applications in designing therapeutic antibodies and creating new drugs. Still, it is a demanding task due to the inherent complexities of protein structures and the limitations of current generative models. Proteins possess intricate geometry, and sampling their conformational space is challenging due to its high dimensionality. This paper introduces novel Markovian and non-Markovian generative diffusion models based on fractional stochastic differential equations and the Lévy distribution, allowing for a more effective exploration of the conformational space. The approach is applied to a dataset of 40,000 proteins and evaluated in terms of Fréchet distance, fidelity, and diversity, outperforming the state-of-the-art by 25.4%, 35.8%, and 11.8%, respectively.
Affiliation(s)
- Eric Paquet
  - National Research Council, 1200 Montreal Road, Ottawa, ON, K1A 0R6, Canada
  - School of Electrical Engineering and Computer Science, University of Ottawa, ON, K1N 6N5, Canada
- Farzan Soleymani
  - Telfer School of Management, University of Ottawa, ON, K1N 6N5, Canada
- Herna Lydia Viktor
  - School of Electrical Engineering and Computer Science, University of Ottawa, ON, K1N 6N5, Canada
2. Milchevskiy YV, Kravatskaya GI, Kravatsky YV. AAindexNC: Estimating the Physicochemical Properties of Non-Canonical Amino Acids, Including Those Derived from the PDB and PDBeChem Databank. Int J Mol Sci 2024; 25:12555. [PMID: 39684267 DOI: 10.3390/ijms252312555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2024] [Revised: 11/15/2024] [Accepted: 11/18/2024] [Indexed: 12/18/2024]
Abstract
The physicochemical properties of amino acid residues from the AAindex database are widely used as predictors in building models for predicting both protein structures and properties. It should be noted, however, that the AAindex database contains data only for the 20 canonical amino acids. Non-canonical amino acids, while less common, are not rare; the Protein Data Bank includes proteins with more than 1000 distinct non-canonical amino acids. In this study, we propose a method to evaluate the physicochemical properties from the AAindex database for non-canonical amino acids and assess the prediction quality. We implemented our method as a bioinformatics tool and estimated the physicochemical properties of non-canonical amino acids from the PDB, with their chemical composition represented by SMILES encodings obtained from the PDBeChem databank. The bioinformatics tool and resulting database of the estimated properties are freely available on the author's website and available for download via GitHub.
Affiliation(s)
- Yury V Milchevskiy
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia
- Galina I Kravatskaya
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia
- Yury V Kravatsky
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia
  - Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia
3. Moraes Dos Santos L, Gutembergue de Mendonça J, Jerônimo Gomes Lobo Y, Henrique Franca de Lima L, Bruno Rocha G, C de Melo-Minardi R. Deep learning for discriminating non-trivial conformational changes in molecular dynamics simulations of SARS-CoV-2 spike-ACE2. Sci Rep 2024; 14:22639. [PMID: 39349594 PMCID: PMC11443059 DOI: 10.1038/s41598-024-72842-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2024] [Accepted: 09/11/2024] [Indexed: 10/04/2024]
Abstract
Molecular dynamics (MD) simulations produce a substantial volume of high-dimensional data, and traditional methods for analyzing these data pose significant computational demands. Advances in MD simulation analysis combined with deep learning-based approaches have led to the understanding of specific structural changes observed in MD trajectories, including those induced by mutations. In this study, we model the trajectories resulting from MD simulations of the SARS-CoV-2 spike protein-ACE2, specifically the receptor-binding domain (RBD), as interresidue distance maps, and use deep convolutional neural networks to predict the functional impact of point mutations, related to the virus's infectivity and immunogenicity. Our model was successful in predicting mutant types that increase the affinity of the S protein for human receptors and reduce its immunogenicity, both based on MD trajectories (precision = 0.718; recall = 0.800; F1 = 0.757; MCC = 0.488; AUC = 0.800) and their centroids. In an additional analysis, we also obtained a strong positive Pearson's correlation coefficient equal to 0.776, indicating a significant relationship between the average sigmoid probability for the MD trajectories and binding free energy (BFE) changes. Furthermore, we obtained a coefficient of determination of 0.602. Our 2D-RMSD analysis also corroborated predictions for more infectious and immune-evading mutants and revealed fluctuating regions within the receptor-binding motif (RBM), especially in the [Formula: see text] loop. This region presented a significant standard deviation for mutations that enable SARS-CoV-2 to evade the immune response, with RMSD values of 5 Å in the simulation. This methodology offers an efficient alternative to identify potential strains of SARS-CoV-2, which may be linked to more infectious and immune-evading mutations.
Using clustering and deep learning techniques, our approach leverages information from the ensemble of MD trajectories to recognize a broad spectrum of multiple conformational patterns characteristic of mutant types. This represents a strategic advantage in identifying emerging variants, bypassing the need for long MD simulations. Furthermore, the present work tends to contribute substantially to the field of computational biology and virology, particularly to accelerate the design and optimization of new therapeutic agents and vaccines, offering a proactive stance against the constantly evolving threat of COVID-19 and potential future pandemics.
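As a quick consistency check on the metrics quoted in this abstract, the sketch below recomputes two derived quantities: the harmonic mean of the reported precision and recall (the F1 score) and the square of the reported Pearson coefficient. It assumes, which the abstract does not state, that the coefficient of determination comes from a simple linear fit, in which case it equals the squared Pearson coefficient. The function names are illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


def r_squared_from_pearson(r: float) -> float:
    """R^2 of a simple linear fit (assumption: one predictor, least squares)."""
    return r * r


if __name__ == "__main__":
    # Values quoted in the abstract above.
    print(round(f1_score(0.718, 0.800), 3))        # 0.757
    print(round(r_squared_from_pearson(0.776), 3))  # 0.602
```

Both results agree with the figures in the abstract (0.757 and 0.602), which suggests the reported numbers are internally consistent under these assumptions.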
Affiliation(s)
- Lucas Moraes Dos Santos
  - Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Yan Jerônimo Gomes Lobo
  - Department of Exact and Biological Sciences, Federal University of São João Del Rei, São João del Rei, Minas Gerais, Brazil
- Gerd Bruno Rocha
  - Department of Chemistry, Federal University of Paraíba, João Pessoa, Paraíba, Brazil
- Raquel C de Melo-Minardi
  - Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
4. Rahman J, Newton MAH, Hasan MAM, Sattar A. A stacked meta-ensemble for protein inter-residue distance prediction. Comput Biol Med 2022; 148:105824. [PMID: 35863250 DOI: 10.1016/j.compbiomed.2022.105824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 06/21/2022] [Accepted: 07/03/2022] [Indexed: 11/25/2022]
Abstract
Predicted inter-residue distances are a key factor behind the recent success in high-quality protein structure prediction (PSP). However, predicting both short and long distance values together is challenging. Consequently, existing PSP methods mostly use predicted short distances. In this paper, we use a stacked meta-ensemble method to combine deep learning models trained for different ranges of real-valued distances. On five benchmark sets of proteins, our proposed inter-residue distance prediction method improves mean Local Distance Difference Test (LDDT) scores by at least 5% over existing methods. Moreover, using a real-valued distance based conformational search algorithm, we also show that predicted long distances help obtain significantly better protein conformations than when only predicted short distances are used. Our method is named meta-ensemble for distance prediction (MDP) and its program is available from https://gitlab.com/mahnewton/mdp.
Affiliation(s)
- Julia Rahman
  - School of Information and Communication Technology, Griffith University, Queensland, Australia
- M A Hakim Newton
  - Institute of Integrated and Intelligent Systems, Griffith University, Queensland, Australia; School of Information and Physical Sciences, The University of Newcastle, New South Wales, Australia
- Abdul Sattar
  - School of Information and Communication Technology, Griffith University, Queensland, Australia; Institute of Integrated and Intelligent Systems, Griffith University, Queensland, Australia
5. Detecting Transient Trapping from a Single Trajectory: A Structural Approach. Entropy 2021; 23:e23081044. [PMID: 34441183 PMCID: PMC8394669 DOI: 10.3390/e23081044] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/15/2021] [Revised: 07/30/2021] [Accepted: 08/04/2021] [Indexed: 11/21/2022]
Abstract
In this article, we introduce a new method to detect transient trapping events within a single particle trajectory, thus allowing the explicit accounting of changes in the particle’s dynamics over time. Our method is based on new measures of a smoothed recurrence matrix. The newly introduced set of measures takes into account both the spatial and temporal structure of the trajectory. Therefore, it is adapted to study short-lived trapping domains that are not visited by multiple trajectories. Contrary to most existing methods, it does not rely on using a window, sliding along the trajectory, but rather investigates the trajectory as a whole. This method provides useful information to study intracellular and plasma membrane compartmentalisation. Additionally, this method is applied to single particle trajectory data of β2-adrenergic receptors, revealing that receptor stimulation results in increased trapping of receptors in defined domains, without changing the diffusion of free receptors.
6. Wang L, Liu J, Xia Y, Xu J, Zhou X, Zhang G. Distance-guided protein folding based on generalized descent direction. Brief Bioinform 2021; 22:6341661. [PMID: 34355233 DOI: 10.1093/bib/bbab296] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/17/2021] [Revised: 06/30/2021] [Accepted: 07/12/2021] [Indexed: 12/25/2022]
Abstract
Advances in the prediction of inter-residue distances for a protein sequence have improved the accuracy with which correct protein folds can be predicted from distance information. Here, we propose a distance-guided protein folding algorithm based on generalized descent direction, named GDDfold, which achieves effective structural perturbation and potential minimization in two stages. In the global stage, a random-based direction is designed using evolutionary knowledge, which guides the conformation population to cross potential barriers and explore conformational space rapidly over a large range. In the local stage, the locally rugged potential landscape can be explored with the aid of a conjugate-based direction integrated into a specific search strategy, which can improve the exploitation ability. GDDfold is tested on 347 proteins of a benchmark set, 24 template-free modeling (FM) targets of CASP13 and 20 FM targets of CASP14. Results show that GDDfold correctly folds [template modeling (TM) score ≥ 0.5] 316 out of 347 proteins, where 65 proteins have TM scores that are greater than 0.8, and significantly outperforms Rosetta-dist (distance-assisted fragment assembly method) and L-BFGSfold (distance geometry optimization method). On CASP FM targets, GDDfold is comparable with five state-of-the-art full-version methods, namely, Quark, RaptorX, Rosetta, MULTICOM and trRosetta in the CASP 13 and 14 server groups.
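The fold-correctness criterion used here (TM score of at least 0.5) can be illustrated with the standard TM-score formula. The following is a minimal sketch, not GDDfold's own code: it evaluates the score for one fixed superposition, given per-residue distances between aligned Cα atoms, whereas the full TM-score additionally maximizes over superpositions. The function name and inputs are illustrative assumptions.

```python
import numpy as np


def tm_score_fixed_superposition(distances, target_length):
    """TM-score for a single, already-computed superposition.

    distances: distances (in Å) between aligned residue pairs.
    target_length: number of residues in the target (reference) protein.
    Assumption: target_length is large enough for the scale d0 to be positive.
    """
    d = np.asarray(distances, dtype=float)
    # Length-dependent distance scale from the standard TM-score definition.
    d0 = 1.24 * np.cbrt(target_length - 15.0) - 1.8
    if d0 <= 0:
        raise ValueError("target_length too small for this d0 formula")
    # Each aligned pair contributes a value in (0, 1]; unaligned residues add 0.
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / target_length)
```

With every aligned distance equal to zero the score is 1, and with every distance equal to the scale d0 it is exactly 0.5, which gives some intuition for why 0.5 is used as the threshold for a correct fold.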
Affiliation(s)
- Liujing Wang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jun Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yuhao Xia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jiakang Xu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiaogen Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Michigan, USA
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
7. Pakhrin SC, Shrestha B, Adhikari B, KC DB. Deep Learning-Based Advances in Protein Structure Prediction. Int J Mol Sci 2021; 22:5553. [PMID: 34074028 PMCID: PMC8197379 DOI: 10.3390/ijms22115553] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 04/02/2021] [Revised: 05/12/2021] [Accepted: 05/18/2021] [Indexed: 12/29/2022]
Abstract
Obtaining an accurate description of protein structure is a fundamental step toward understanding the underpinning of biology. Although recent advances in experimental approaches have greatly enhanced our capabilities to experimentally determine protein structures, the gap between the number of protein sequences and known protein structures is ever increasing. Computational protein structure prediction is one of the ways to fill this gap. Recently, the protein structure prediction field has witnessed a lot of advances due to Deep Learning (DL)-based approaches as evidenced by the success of AlphaFold2 in the most recent Critical Assessment of protein Structure Prediction (CASP14). In this article, we highlight important milestones and progress in the field of protein structure prediction due to DL-based methods as observed in CASP experiments. We describe advances in various steps of the protein structure prediction pipeline, viz. protein contact map prediction, protein distogram prediction, protein real-valued distance prediction, and Quality Assessment/refinement. We also highlight some end-to-end DL-based approaches for protein structure prediction. Additionally, as there have been some recent DL-based advances in protein structure determination using cryo-electron microscopy (cryo-EM), we also highlight some of the important progress in that field. Finally, we provide an outlook and possible future research directions for DL-based approaches in the protein structure prediction arena.
Affiliation(s)
- Subash C. Pakhrin
  - Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260, USA
- Bikash Shrestha
  - Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA
- Badri Adhikari
  - Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA
- Dukka B. KC
  - Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260, USA
8. Milchevskaya V, Nikitin AM, Lukshin SA, Filatov IV, Kravatsky YV, Tumanyan VG, Esipova NG, Milchevskiy YV. Structural coordinates: A novel approach to predict protein backbone conformation. PLoS One 2021; 16:e0239793. [PMID: 34014953 PMCID: PMC8136669 DOI: 10.1371/journal.pone.0239793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2020] [Accepted: 04/14/2021] [Indexed: 11/19/2022]
Abstract
Motivation: Local protein structure is usually described via classifying each peptide to a unique class from a set of pre-defined structures. These classifications may differ in the number of structural classes, the length of peptides, or class attribution criteria. Most methods that predict the local structure of a protein from its sequence first rely on some classification and only then proceed to the 3D conformation assessment. However, most classification methods rely on homologous proteins’ existence, unavoidably lose information by attributing a peptide to a single class or suffer from a suboptimal choice of the representative classes. Results: To alleviate the above challenges, we propose a method that constructs a peptide’s structural representation from the sequence, reflecting its similarity to several basic representative structures. For 5-mer peptides and 16 representative structures, we achieved the Q16 classification accuracy of 67.9%, which is higher than what is currently reported in the literature. Our prediction method does not utilize information about protein homologues but relies only on the amino acids’ physicochemical properties and the resolved structures’ statistics. We also show that the 3D coordinates of a peptide can be uniquely recovered from its structural coordinates, and show the required conditions under various geometric constraints.
Affiliation(s)
- Vladislava Milchevskaya
  - Institute of Medical Statistics and Bioinformatics, Faculty of Medicine, University of Cologne, Cologne, Germany
  - * E-mail: (VM); (YVM)
- Ivan V. Filatov
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Yury V. Milchevskiy
  - Engelhardt Institute of Molecular Biology, Moscow, Russia
  - * E-mail: (VM); (YVM)
9. McGehee AJ, Bhattacharya S, Roche R, Bhattacharya D. PolyFold: An interactive visual simulator for distance-based protein folding. PLoS One 2020; 15:e0243331. [PMID: 33270805 PMCID: PMC7714222 DOI: 10.1371/journal.pone.0243331] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/14/2020] [Accepted: 11/18/2020] [Indexed: 11/18/2022]
Abstract
Recent advances in distance-based protein folding have led to a paradigm shift in protein structure prediction. Through sufficiently precise estimation of the inter-residue distance matrix for a protein sequence, it is now feasible to predict the correct folds for new proteins much more accurately than ever before. Despite the exciting progress, a dedicated visualization system that can dynamically capture the distance-based folding process is still lacking. Most molecular visualizers typically provide only a static view of a folded protein conformation, but do not capture the folding process. Even among the selected few graphical interfaces that do adopt a dynamic perspective, none of them are distance-based. Here we present PolyFold, an interactive visual simulator for dynamically capturing the distance-based protein folding process through real-time rendering of a distance matrix and its compatible spatial conformation as it folds in an intuitive and easy-to-use interface. PolyFold integrates highly convergent stochastic optimization algorithms with on-demand customizations and interactive manipulations to maximally satisfy the geometric constraints imposed by a distance matrix. PolyFold is capable of simulating the complex process of protein folding even on modest personal computers, thus making it accessible to the general public for fostering citizen science. Open source code of PolyFold is freely available for download at https://github.com/Bhattacharya-Lab/PolyFold. It is implemented in cross-platform Java and binary executables are available for macOS, Linux, and Windows.
Affiliation(s)
- Andrew J. McGehee
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
- Sutanu Bhattacharya
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
- Rahmatullah Roche
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
- Debswapna Bhattacharya
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
  - Department of Biological Sciences, Auburn University, Auburn, AL, United States of America
10. Xu J, Wang S. Analysis of distance-based protein structure prediction by deep learning in CASP13. Proteins 2019; 87:1069-1081. [PMID: 31471916 DOI: 10.1002/prot.25810] [Citation(s) in RCA: 97] [Impact Index Per Article: 16.2] [Received: 04/29/2019] [Revised: 07/24/2019] [Accepted: 08/27/2019] [Indexed: 12/30/2022]
Abstract
This paper reports the CASP13 results of distance-based contact prediction, threading, and folding methods implemented in three RaptorX servers, which are built upon the powerful deep convolutional residual neural network (ResNet) method initiated by us for contact prediction in CASP12. On the 32 CASP13 FM (free-modeling) targets with a median multiple sequence alignment (MSA) depth of 36, RaptorX yielded the best contact prediction among 46 groups and almost the best 3D structure modeling among all server groups without time-consuming conformation sampling. In particular, RaptorX achieved top L/5, L/2, and L long-range contact precision of 70%, 58%, and 45%, respectively, and predicted correct folds (TMscore > 0.5) for 18 of 32 targets. Further, RaptorX predicted correct folds for all FM targets with >300 residues (T0950-D1, T0969-D1, and T1000-D2) and generated the best 3D models for T0950-D1 and T0969-D1 among all groups. This CASP13 test confirms our previous findings: (a) predicted distance is more useful than contacts for both template-based and free modeling; and (b) structure modeling may be improved by integrating template and coevolutionary information via deep learning. This paper will discuss progress we have made since CASP12, the strength and weakness of our methods, and why deep learning performed much better in CASP13.
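The "top L/5, L/2, and L long-range contact precision" figures quoted above follow a common evaluation recipe: rank the predicted long-range residue pairs by confidence, keep the top L/k, and report the fraction that are true contacts. The sketch below illustrates that recipe under assumptions the abstract does not spell out: long-range is taken to mean a sequence separation of at least 24 residues, and the true contact map is supplied as a boolean matrix (commonly derived from a Cβ distance cutoff of 8 Å). The function name and parameters are hypothetical.

```python
import numpy as np


def top_lk_precision(prob, true_contact, k=5, min_sep=24):
    """Precision of the top L/k predicted long-range contacts.

    prob: (L, L) array of predicted contact probabilities.
    true_contact: (L, L) boolean array of native contacts.
    k: divisor, so that L // k pairs are evaluated (k=1 gives top L).
    min_sep: minimum sequence separation |i - j| (assumed long-range cutoff).
    """
    prob = np.asarray(prob, dtype=float)
    true_contact = np.asarray(true_contact, dtype=bool)
    L = prob.shape[0]
    # All residue pairs (i, j) with j - i >= min_sep.
    i, j = np.triu_indices(L, k=min_sep)
    # Sort pairs by descending predicted probability.
    order = np.argsort(-prob[i, j], kind="stable")
    n_top = max(1, L // k)
    top = order[:n_top]
    return float(true_contact[i[top], j[top]].mean())
```

Calling the function with `k=5`, `k=2`, and `k=1` would give the three precision values of the kind reported in the abstract.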
Affiliation(s)
- Jinbo Xu
  - Toyota Technological Institute at Chicago, Chicago, Illinois
- Sheng Wang
  - Toyota Technological Institute at Chicago, Chicago, Illinois
11.
Abstract
Direct coupling analysis (DCA) for protein folding has made very good progress, but it is not effective for proteins that lack many sequence homologs, even when coupled with time-consuming conformation sampling with fragments. We show that we can accurately predict the interresidue distance distribution of a protein by deep learning, even for proteins with ∼60 sequence homologs. Using only the geometric constraints given by the resulting distance matrix, we may construct 3D models without involving extensive conformation sampling. Our method successfully folded 21 of the 37 CASP12 hard targets with a median family size of 58 effective sequence homologs within 4 h on a Linux computer with 20 central processing units. In contrast, DCA-predicted contacts cannot be used to fold any of these hard targets in the absence of extensive conformation sampling, and the best CASP12 group folded only 11 of them by integrating DCA-predicted contacts into fragment-based conformation sampling. Rigorous experimental validation in CASP13 shows that our distance-based folding server successfully folded 17 of 32 hard targets (with a median family size of 36 sequence homologs) and obtained 70% precision on the top L/5 long-range predicted contacts. The latest experimental validation in CAMEO shows that our server predicted correct folds for 2 membrane proteins while all of the other servers failed. These results demonstrate that it is now feasible to predict the correct fold for many more proteins that lack similar structures in the Protein Data Bank, even on a personal computer.
12. Li XT, Xu SG, Yang XB, Zhao YJ. An intrinsic representation of atomic structure: From clusters to periodic systems. J Chem Phys 2017; 147:144106. [DOI: 10.1063/1.4997292] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022]
Affiliation(s)
- Xiao-Tian Li
  - Department of Physics and School of Materials Science and Engineering, South China University of Technology, Guangzhou, Guangdong 510640, China
- Shao-Gang Xu
  - Department of Physics and School of Materials Science and Engineering, South China University of Technology, Guangzhou, Guangdong 510640, China
- Xiao-Bao Yang
  - Department of Physics and School of Materials Science and Engineering, South China University of Technology, Guangzhou, Guangdong 510640, China
  - Key Laboratory of Advanced Energy Storage Materials of Guangdong Province, South China University of Technology, Guangzhou, Guangdong 510640, China
- Yu-Jun Zhao
  - Department of Physics and School of Materials Science and Engineering, South China University of Technology, Guangzhou, Guangdong 510640, China
  - Key Laboratory of Advanced Energy Storage Materials of Guangdong Province, South China University of Technology, Guangzhou, Guangdong 510640, China
13. Eschweiler JD, Frank AT, Ruotolo BT. Coming to Grips with Ambiguity: Ion Mobility-Mass Spectrometry for Protein Quaternary Structure Assignment. J Am Soc Mass Spectrom 2017; 28:1991-2000. [PMID: 28752478 PMCID: PMC5693686 DOI: 10.1007/s13361-017-1757-1] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 04/18/2017] [Revised: 07/04/2017] [Accepted: 07/05/2017] [Indexed: 05/21/2023]
Abstract
Multiprotein complexes are central to our understanding of cellular biology, as they play critical roles in nearly every biological process. Despite many impressive advances associated with structural characterization techniques, large and highly-dynamic protein complexes are too often refractory to analysis by conventional, high-resolution approaches. To fill this gap, ion mobility-mass spectrometry (IM-MS) methods have emerged as a promising approach for characterizing the structures of challenging assemblies due in large part to the ability of these methods to characterize the composition, connectivity, and topology of large, labile complexes. In this Critical Insight, we present a series of bioinformatics studies aimed at assessing the information content of IM-MS datasets for building models of multiprotein structure. Our computational data highlights the limits of current coarse-graining approaches, and compelled us to develop an improved workflow for multiprotein topology modeling, which we benchmark against a subset of the multiprotein complexes within the PDB. This improved workflow has allowed us to ascertain both the minimal experimental restraint sets required for generation of high-confidence multiprotein topologies, and quantify the ambiguity in models where insufficient IM-MS information is available. We conclude by projecting the future of IM-MS in the context of protein quaternary structure assignment, where we predict that a more complete knowledge of the ultimate information content and ambiguity within such models will undoubtedly lead to applications for a broader array of challenging biomolecular assemblies.
Affiliation(s)
- Aaron T Frank
  - Department of Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA
- Brandon T Ruotolo
  - Department of Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA
14. Pereira J, Lamzin VS. A distance geometry-based description and validation of protein main-chain conformation. IUCrJ 2017; 4:657-670. [PMID: 28989721 PMCID: PMC5619857 DOI: 10.1107/s2052252517008466] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 03/03/2017] [Accepted: 06/07/2017] [Indexed: 06/07/2023]
Abstract
Understanding the protein main-chain conformational space forms the basis for the modelling of protein structures and for the validation of models derived from structural biology techniques. Presented here is a novel idea for a three-dimensional distance geometry-based metric to account for the fine details of protein backbone conformations. The metrics are computed for dipeptide units, defined as blocks of Cα(i-1)-O(i-1)-Cα(i)-O(i)-Cα(i+1) atoms, by obtaining the eigenvalues of their Euclidean distance matrices. These were computed for ∼1.3 million dipeptide units collected from nonredundant good-quality structures in the Protein Data Bank and subjected to principal component analysis. The resulting new Euclidean orthogonal three-dimensional space (DipSpace) allows a probabilistic description of protein backbone geometry. The three axes of the DipSpace describe the local extension of the dipeptide unit structure, its twist and its bend. By using a higher-dimensional metric, the method is efficient for the identification of Cα atoms in an unlikely or unusual geometrical environment, and its use for both local and overall validation of protein models is demonstrated. It is also shown, for the example of trypsin proteases, that the detection of unusual conformations that are conserved among the structures of this protein family may indicate geometrically strained residues of potentially functional importance.
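The core step described in this abstract, taking the eigenvalues of the Euclidean distance matrix of a five-atom unit, can be sketched as follows. This is an illustration only and not the authors' implementation: the coordinates are made up, the function name is hypothetical, and the use of plain (unsquared) distances is an assumption, since the abstract does not say which form of the matrix is used. The subsequent principal component analysis over many units is not shown.

```python
import numpy as np


def distance_matrix_eigenvalues(coords):
    """Eigenvalues (ascending) of the pairwise Euclidean distance matrix.

    coords: (n_atoms, 3) array of Cartesian coordinates in Å.
    Assumption: plain distances are used, not squared distances.
    """
    x = np.asarray(coords, dtype=float)
    # Pairwise difference vectors, shape (n, n, 3).
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # The matrix is symmetric, so eigvalsh applies and returns real values.
    return np.linalg.eigvalsh(dist)


# Made-up coordinates standing in for a five-atom dipeptide unit.
unit = [
    [0.0, 0.0, 0.0],
    [1.2, 1.9, 0.3],
    [3.8, 0.0, 0.0],
    [5.0, 1.9, -0.4],
    [7.6, 0.1, 0.2],
]
eig = distance_matrix_eigenvalues(unit)
```

Because the eigenvalues depend only on interatomic distances, they are unchanged by rotating or translating the unit. The diagonal of a distance matrix is zero, so the five eigenvalues always sum to zero.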
Collapse
Affiliation(s)
- Joana Pereira
- European Molecular Biology Laboratory, c/o DESY, Notkestrasse 85, 22607 Hamburg, Germany
| | - Victor S. Lamzin
- European Molecular Biology Laboratory, c/o DESY, Notkestrasse 85, 22607 Hamburg, Germany
| |
Collapse
|
15
|
Li XT, Yang XB, Zhao YJ. Geometrical eigen-subspace framework based molecular conformation representation for efficient structure recognition and comparison. J Chem Phys 2017; 146:154108. [DOI: 10.1063/1.4981212] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Affiliation(s)
- Xiao-Tian Li
- Department of Physics and School of Materials Science and Engineering, South China University of Technology, Guangzhou, Guangdong 510640, China
| | - Xiao-Bao Yang
- Department of Physics and School of Materials Science and Engineering, South China University of Technology, Guangzhou, Guangdong 510640, China
- Key Laboratory of Advanced Energy Storage Materials of Guangdong Province, South China University of Technology, Guangzhou, Guangdong 510640, China
| | - Yu-Jun Zhao
- Department of Physics and School of Materials Science and Engineering, South China University of Technology, Guangzhou, Guangdong 510640, China
- Key Laboratory of Advanced Energy Storage Materials of Guangdong Province, South China University of Technology, Guangzhou, Guangdong 510640, China
|
16
|
Peterson L, Jamroz M, Kolinski A, Kihara D. Predicting Real-Valued Protein Residue Fluctuation Using FlexPred. Methods Mol Biol 2017; 1484:175-186. [PMID: 27787827 DOI: 10.1007/978-1-4939-6406-2_13] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
Abstract
The conventional view of a protein structure as static provides only a limited picture. There is increasing evidence that protein dynamics are often vital to protein function including interaction with partners such as other proteins, nucleic acids, and small molecules. Considering flexibility is also important in applications such as computational protein docking and protein design. While residue flexibility is partially indicated by experimental measures such as the B-factor from X-ray crystallography and ensemble fluctuation from nuclear magnetic resonance (NMR) spectroscopy as well as computational molecular dynamics (MD) simulation, these techniques are resource-intensive. In this chapter, we describe the web server and stand-alone version of FlexPred, which rapidly predicts absolute per-residue fluctuation from a three-dimensional protein structure. On a set of 592 nonredundant structures, comparing the fluctuations predicted by FlexPred to the observed fluctuations in MD simulations showed an average correlation coefficient of 0.669 and an average root mean square error of 1.07 Å. FlexPred is available at http://kiharalab.org/flexPred/ .
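The two evaluation figures quoted above (a correlation coefficient and a root mean square error between predicted and observed per-residue fluctuations) can be computed as in the following sketch. It assumes the Pearson correlation coefficient; the helper names and the numbers are hypothetical and do not come from FlexPred or its benchmark.

```python
import numpy as np

def pearson_r(pred, obs):
    """Pearson correlation coefficient between two equal-length vectors."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.corrcoef(pred, obs)[0, 1])

def rmse(pred, obs):
    """Root mean square error between two equal-length vectors."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

# Made-up per-residue fluctuations (Å) for a five-residue toy example.
predicted = np.array([0.8, 1.1, 2.4, 1.9, 0.7])
observed = np.array([0.9, 1.0, 2.0, 2.2, 0.6])

print(pearson_r(predicted, observed), rmse(predicted, observed))
```

Averaging these two quantities over every structure in a test set would give summary values of the kind reported in the abstract.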
Affiliation(s)
- Lenna Peterson
- Department of Biological Sciences, College of Science, Purdue University, 915 W. State Street, West Lafayette, IN, 47907-2054, USA
| | - Michal Jamroz
- Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Pasteura 1, Warszawa, 02-093, Poland
| | - Andrzej Kolinski
- Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Pasteura 1, Warszawa, 02-093, Poland
| | - Daisuke Kihara
- Department of Biological Sciences, College of Science, Purdue University, 915 W. State Street, West Lafayette, IN, 47907-2054, USA; Department of Computer Science, College of Science, Purdue University, 305 N. University Street, West Lafayette, IN, 47907-2107, USA.
|
17
|
Babicki S, Arndt D, Marcu A, Liang Y, Grant JR, Maciejewski A, Wishart DS. Heatmapper: web-enabled heat mapping for all. Nucleic Acids Res 2016; 44:W147-53. [PMID: 27190236 PMCID: PMC4987948 DOI: 10.1093/nar/gkw419] [Citation(s) in RCA: 1589] [Impact Index Per Article: 176.6] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2016] [Accepted: 05/04/2016] [Indexed: 11/26/2022] Open
Abstract
Heatmapper is a freely available web server that allows users to interactively visualize their data in the form of heat maps through an easy-to-use graphical interface. Unlike existing non-commercial heat map packages, which either lack graphical interfaces or are specialized for only one or two kinds of heat maps, Heatmapper is a versatile tool that allows users to easily create a wide variety of heat maps for many different data types and applications. More specifically, Heatmapper allows users to generate, cluster and visualize: (i) expression-based heat maps from transcriptomic, proteomic and metabolomic experiments; (ii) pairwise distance maps; (iii) correlation maps; (iv) image overlay heat maps; (v) latitude and longitude heat maps and (vi) geopolitical (choropleth) heat maps. Heatmapper offers a number of simple and intuitive customization options for facile adjustments to each heat map's appearance and plotting parameters. Heatmapper also allows users to interactively explore their numeric data values by hovering their cursor over each heat map cell, or by using a searchable/sortable data table view. Heat map data can be easily uploaded to Heatmapper in text, Excel or tab delimited formatted tables and the resulting heat map images can be easily downloaded in common formats including PNG, JPG and PDF. Heatmapper is designed to appeal to a wide range of users, including molecular biologists, structural biologists, microbiologists, epidemiologists, environmental scientists, agriculture/forestry scientists, fish and wildlife biologists, climatologists, geologists, educators and students. Heatmapper is available at http://www.heatmapper.ca.
Affiliation(s)
- Sasha Babicki
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
| | - David Arndt
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
| | - Ana Marcu
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
| | - Yongjie Liang
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
| | - Jason R Grant
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
| | - Adam Maciejewski
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
| | - David S Wishart
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada; Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada; National Institute for Nanotechnology, 11421 Saskatchewan Drive, Edmonton, AB T6G 2M9, Canada
|
18
|
Ernst M, Sittel F, Stock G. Contact- and distance-based principal component analysis of protein dynamics. J Chem Phys 2015; 143:244114. [DOI: 10.1063/1.4938249] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
|
19
|
dos Santos RN, Morcos F, Jana B, Andricopulo AD, Onuchic JN. Dimeric interactions and complex formation using direct coevolutionary couplings. Sci Rep 2015; 5:13652. [PMID: 26338201 PMCID: PMC4559900 DOI: 10.1038/srep13652] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2015] [Accepted: 07/13/2015] [Indexed: 11/09/2022] Open
Abstract
We develop a procedure to characterize the association of protein structures into homodimers using coevolutionary couplings extracted from Direct Coupling Analysis (DCA) in combination with Structure Based Models (SBM). Identification of dimerization contacts using DCA is more challenging than intradomain contacts since direct couplings are mixed with monomeric contacts. Therefore a systematic way to extract dimerization signals has been elusive. We provide evidence that the prediction of homodimeric complexes is possible with high accuracy for all the cases we studied which have rich sequence information. For the most accurate conformations of the structurally diverse dimeric complexes studied the mean and interfacial RMSDs are 1.95Å and 1.44Å, respectively. This methodology is also able to identify distinct dimerization conformations as for the case of the family of response regulators, which dimerize upon activation. The identification of dimeric complexes can provide interesting molecular insights in the construction of large oligomeric complexes and be useful in the study of aggregation related diseases like Alzheimer's or Parkinson's.
Affiliation(s)
- Ricardo N. dos Santos
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005-1827
- Laboratório de Química Medicinal e Computacional, Instituto de Física de São Carlos, Universidade de São Paulo, São Paulo, São Carlos, 13563-120, Brazil
| | - Faruck Morcos
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005-1827
| | - Biman Jana
- Department of Physical Chemistry, Indian Association for the Cultivation of Science, Jadavpur, Kolkata-700032, India
| | - Adriano D. Andricopulo
- Laboratório de Química Medicinal e Computacional, Instituto de Física de São Carlos, Universidade de São Paulo, São Paulo, São Carlos, 13563-120, Brazil
| | - José N. Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005-1827
|
20
|
Scott WRP, Straus SK. Determining and visualizing flexibility in protein structures. Proteins 2015; 83:820-6. [PMID: 25663079 DOI: 10.1002/prot.24776] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2014] [Revised: 12/29/2014] [Accepted: 01/26/2015] [Indexed: 11/09/2022]
Abstract
How to compare the structures of an ensemble of protein conformations is a fundamental problem in structural biology. As has been previously observed, the widely used RMSD measure due to Kabsch, in which a rigid-body superposition minimizing the least-squares positional deviations is performed, has its drawbacks when comparing and visualizing a set of flexible protein structures. Here, we develop a method, fleximatch, of protein structure comparison that takes flexibility into account. Based on a distance matrix measure of flexibility, a weighted superposition of distance matrices rather than of atomic coordinates is performed. Subsequently, this allows a consistent determination of (a) a superposition of structures for visualization, (b) a partitioning of the protein structure into rigid molecular components (core atoms), and (c) an atomic mobility measure. The method is suitable for highlighting both particularly flexible and rigid parts of a protein from structures derived from NMR, X-ray diffraction or molecular simulation.
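The idea of measuring flexibility from distance matrices rather than from superposed coordinates can be sketched as follows. This is a simplified illustration, not the fleximatch algorithm: it takes the standard deviation of each interatomic distance across an ensemble as the flexibility measure and averages it per atom, with no weighting scheme. The function names and the toy ensemble are hypothetical.

```python
import numpy as np

def distance_matrices(ensemble):
    """Pairwise distance matrix for every model; input shape (n_models, n_atoms, 3)."""
    ensemble = np.asarray(ensemble, dtype=float)
    diff = ensemble[:, :, None, :] - ensemble[:, None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def atomic_mobility(ensemble):
    """Per-atom mean of the across-model standard deviation of its distances."""
    spread = distance_matrices(ensemble).std(axis=0)  # (n_atoms, n_atoms)
    n_atoms = spread.shape[0]
    # The diagonal is zero, so divide by the number of partner atoms.
    return spread.sum(axis=1) / (n_atoms - 1)

# Toy ensemble: three atoms stay fixed relative to each other,
# the fourth atom moves along z from model to model.
base = np.array([[0.0, 0.0, 0.0],
                 [1.5, 0.0, 0.0],
                 [0.0, 1.5, 0.0],
                 [0.0, 0.0, 1.5]])
ensemble = np.stack([base, base, base])
ensemble[1, 3, 2] = 2.5
ensemble[2, 3, 2] = 3.5

mobility = atomic_mobility(ensemble)
print(mobility)
```

Atoms with low values behave as a rigid core, while the moving atom stands out. No superposition step is needed, because distances are unaffected by how each model is oriented.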
Affiliation(s)
- Walter R P Scott
- Chemistry Department, University of British Columbia, Vancouver, British Columbia, V6T 1Z1, Canada
|
21
|
Abstract
Modularity is known as one of the most important features of protein's robust and efficient design. The architecture and topology of proteins play a vital role by providing necessary robust scaffolds to support organism's growth and survival in constant evolutionary pressure. These complex biomolecules can be represented by several layers of modular architecture, but it is pivotal to understand and explore the smallest biologically relevant structural component. In the present study, we have developed a component-based method, using protein's secondary structures and their arrangements (i.e. patterns) in order to investigate its structural space. Our result on all-alpha protein shows that the known structural space is highly populated with limited set of structural patterns. We have also noticed that these frequently observed structural patterns are present as modules or "building blocks" in large proteins (i.e. higher secondary structure content). From structural descriptor analysis, observed patterns are found to be within similar deviation; however, frequent patterns are found to be distinctly occurring in diverse functions e.g. in enzymatic classes and reactions. In this study, we are introducing a simple approach to explore protein structural space using combinatorial- and graph-based geometry methods, which can be used to describe modularity in protein structures. Moreover, analysis indicates that protein function seems to be the driving force that shapes the known structure space.
Affiliation(s)
- Taushif Khan
- School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
| | - Indira Ghosh
- School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
|
22
|
Bouvier G, Desdouits N, Ferber M, Blondel A, Nilges M. An automatic tool to analyze and cluster macromolecular conformations based on self-organizing maps. Bioinformatics 2014; 31:1490-2. [PMID: 25543048 DOI: 10.1093/bioinformatics/btu849] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2014] [Accepted: 12/21/2014] [Indexed: 11/12/2022]
Abstract
MOTIVATION Sampling the conformational space of biological macromolecules generates large sets of data with considerable complexity. Data-mining techniques, such as clustering, can extract meaningful information. Among them, the self-organizing maps (SOMs) algorithm has shown great promise; in particular since its computation time rises only linearly with the size of the data set. Whereas SOMs are generally used with few neurons, we investigate here their behavior with large numbers of neurons. RESULTS We present here a python library implementing the full SOM analysis workflow. Large SOMs can readily be applied on heavy data sets. Coupled with visualization tools they have very interesting properties. Descriptors for each conformation of a trajectory are calculated and mapped onto a 3D landscape, the U-matrix, reporting the distance between neighboring neurons. To delineate clusters, we developed the flooding algorithm, which hierarchically identifies local basins of the U-matrix from the global minimum to the maximum. AVAILABILITY AND IMPLEMENTATION The python implementation of the SOM library is freely available on github: https://github.com/bougui505/SOM. CONTACT michael.nilges@pasteur.fr or guillaume.bouvier@pasteur.fr SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
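The U-matrix mentioned in this abstract reports, for each neuron of a trained map, how far its weight vector lies from those of its neighbours. The sketch below computes one common variant (mean distance to the four grid neighbours on a non-periodic map) with plain NumPy. It does not use the API of the authors' SOM library, and the function name, grid size and weights are assumptions made for illustration.

```python
import numpy as np

def u_matrix(weights):
    """Mean distance from each neuron to its 4-connected neighbours.

    weights has shape (rows, cols, dim) with rows >= 2 and cols >= 2.
    """
    w = np.asarray(weights, dtype=float)
    rows, cols, _ = w.shape
    total = np.zeros((rows, cols))
    count = np.zeros((rows, cols))
    # Distances between vertically adjacent neurons.
    dv = np.linalg.norm(w[1:] - w[:-1], axis=-1)
    total[1:] += dv
    total[:-1] += dv
    count[1:] += 1
    count[:-1] += 1
    # Distances between horizontally adjacent neurons.
    dh = np.linalg.norm(w[:, 1:] - w[:, :-1], axis=-1)
    total[:, 1:] += dh
    total[:, :-1] += dh
    count[:, 1:] += 1
    count[:, :-1] += 1
    return total / count

# Toy 4 x 4 map with one-dimensional weights: left half 0, right half 1.
weights = np.zeros((4, 4, 1))
weights[:, 2:, 0] = 1.0
um = u_matrix(weights)
print(um)
```

In this toy map the U-matrix is zero inside each half and raised along the boundary between them, so the two flat regions correspond to two clusters separated by a ridge. A basin-finding step such as the flooding algorithm described in the abstract would operate on a landscape of this kind.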
Affiliation(s)
- Guillaume Bouvier
- Institut Pasteur, Unité de Bioinformatique Structurale; CNRS UMR 3528; Département de Biologie Structurale et Chimie; F-75015, Paris, France
| | - Nathan Desdouits
- Institut Pasteur, Unité de Bioinformatique Structurale; CNRS UMR 3528; Département de Biologie Structurale et Chimie; F-75015, Paris, France
| | - Mathias Ferber
- Institut Pasteur, Unité de Bioinformatique Structurale; CNRS UMR 3528; Département de Biologie Structurale et Chimie; F-75015, Paris, France
| | - Arnaud Blondel
- Institut Pasteur, Unité de Bioinformatique Structurale; CNRS UMR 3528; Département de Biologie Structurale et Chimie; F-75015, Paris, France
| | - Michael Nilges
- Institut Pasteur, Unité de Bioinformatique Structurale; CNRS UMR 3528; Département de Biologie Structurale et Chimie; F-75015, Paris, France
|
23
|
Bouvier G, Duclert-Savatier N, Desdouits N, Meziane-Cherif D, Blondel A, Courvalin P, Nilges M, Malliavin TE. Functional Motions Modulating VanA Ligand Binding Unraveled by Self-Organizing Maps. J Chem Inf Model 2014; 54:289-301. [DOI: 10.1021/ci400354b] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- Guillaume Bouvier
- Département de Biologie Structurale et Chimie, Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3825, 25, rue du Dr Roux, 75015 Paris, France
| | - Nathalie Duclert-Savatier
- Département de Biologie Structurale et Chimie, Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3825, 25, rue du Dr Roux, 75015 Paris, France
| | - Nathan Desdouits
- Département de Biologie Structurale et Chimie, Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3825, 25, rue du Dr Roux, 75015 Paris, France
| | - Djalal Meziane-Cherif
- Institut Pasteur, Unité des Agents Antibactériens, 25, rue du Dr Roux, 75015 Paris, France
| | - Arnaud Blondel
- Département de Biologie Structurale et Chimie, Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3825, 25, rue du Dr Roux, 75015 Paris, France
| | - Patrice Courvalin
- Institut Pasteur, Unité des Agents Antibactériens, 25, rue du Dr Roux, 75015 Paris, France
| | - Michael Nilges
- Département de Biologie Structurale et Chimie, Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3825, 25, rue du Dr Roux, 75015 Paris, France
| | - Thérèse E. Malliavin
- Département de Biologie Structurale et Chimie, Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3825, 25, rue du Dr Roux, 75015 Paris, France
|
24
|
Ren Z. Reverse engineering the cooperative machinery of human hemoglobin. PLoS One 2013; 8:e77363. [PMID: 24312167 PMCID: PMC3842276 DOI: 10.1371/journal.pone.0077363] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2013] [Accepted: 08/30/2013] [Indexed: 11/25/2022] Open
Abstract
Hemoglobin transports molecular oxygen from the lungs to all human tissues for cellular respiration. Its α2β2 tetrameric assembly undergoes cooperative binding and releasing of oxygen for superior efficiency and responsiveness. Over the past decades, hundreds of hemoglobin structures were determined under a wide range of conditions for investigation of the molecular mechanism of cooperativity. Based on a joint analysis of hemoglobin structures in the Protein Data Bank (Ren, companion article), here I present a reverse engineering approach to elucidate how two subunits within each dimer reciprocate identical motions that achieve intradimer cooperativity, how ligand-induced structural signals from two subunits are integrated to drive quaternary rotation, and how the structural environment at the oxygen binding sites alters their binding affinity. This mechanical model reveals the intricate design that achieves the cooperative mechanism and has previously been masked by inconsistent structural fluctuations. A number of competing theories on hemoglobin cooperativity and broader protein allostery are reconciled and unified.
Affiliation(s)
- Zhong Ren
- Center for Advanced Radiation Sources, The University of Chicago, Argonne, Illinois, United States of America
- Renz Research, Inc., Westmont, Illinois, United States of America
|
25
|
Reaction trajectory revealed by a joint analysis of protein data bank. PLoS One 2013; 8:e77141. [PMID: 24244274 PMCID: PMC3823880 DOI: 10.1371/journal.pone.0077141] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2013] [Accepted: 08/29/2013] [Indexed: 11/19/2022] Open
Abstract
Structural motions along a reaction pathway hold the secret about how a biological macromolecule functions. If each static structure were considered as a snapshot of the protein molecule in action, a large collection of structures would constitute a multidimensional conformational space of an enormous size. Here I present a joint analysis of hundreds of known structures of human hemoglobin in the Protein Data Bank. By applying singular value decomposition to distance matrices of these structures, I demonstrate that this large collection of structural snapshots, derived under a wide range of experimental conditions, arrange orderly along a reaction pathway. The structural motions along this extensive trajectory, including several helical transformations, arrive at a reverse engineered mechanism of the cooperative machinery (Ren, companion article), and shed light on pathological properties of the abnormal homotetrameric hemoglobins from α-thalassemia. This method of meta-analysis provides a general approach to structural dynamics based on static protein structures in this post genomics era.
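Applying singular value decomposition to distance matrices of many structures, as this abstract describes, can be sketched in a few lines. The version below is a minimal sketch under stated assumptions, not the procedure used in the paper: each structure's upper-triangle distances are flattened into one row, the rows are mean-centred, and `numpy.linalg.svd` is applied. The helper names and the toy structures are hypothetical.

```python
import numpy as np

def distance_vector(coords):
    """Upper-triangle interatomic distances of one structure as a flat vector."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(coords), k=1)
    return dist[iu]

def svd_of_structures(structures):
    """SVD of the mean-centred (n_structures x n_distances) data matrix."""
    X = np.array([distance_vector(s) for s in structures])
    X = X - X.mean(axis=0)
    return np.linalg.svd(X, full_matrices=False)

# Toy "snapshots": three fixed atoms plus one atom sliding along x.
structures = [
    np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [2.0 + t, 0.0, 0.0]])
    for t in np.linspace(0.0, 1.0, 6)
]
U, S, Vt = svd_of_structures(structures)
scores = U[:, 0] * S[0]  # position of each snapshot along the first component
print(S)
print(scores)
```

Since the toy snapshots differ by a single motion, the first singular value dominates and the scores order the snapshots along that motion. This is the sense in which a collection of static structures can line up along a pathway.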
|
26
|
Miri L, Bouvier G, Kettani A, Mikou A, Wakrim L, Nilges M, Malliavin TE. Stabilization of the integrase-DNA complex by Mg2+ ions and prediction of key residues for binding HIV-1 integrase inhibitors. Proteins 2013; 82:466-78. [DOI: 10.1002/prot.24412] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2013] [Revised: 07/18/2013] [Accepted: 08/14/2013] [Indexed: 01/02/2023]
Affiliation(s)
- Lamia Miri
- Laboratoire de Virologie; Institut Pasteur du Maroc; Casablanca 20360 Morocco
- Unité de modélisation moléculaire et d'ingénierie des biomolécules, Laboratoire de recherche sur les lipoprotéines et l'athérosclérose; Unité Associée au CNRST-URAC34, Faculté des Sciences Ben M'Sik; Casablanca Morocco
| | - Guillaume Bouvier
- Unité de Bioinformatique Structurale; UMR 3528 CNRS, Institut Pasteur; Paris 75724 France
| | - Anass Kettani
- Unité de modélisation moléculaire et d'ingénierie des biomolécules, Laboratoire de recherche sur les lipoprotéines et l'athérosclérose; Unité Associée au CNRST-URAC34, Faculté des Sciences Ben M'Sik; Casablanca Morocco
| | - Afaf Mikou
- Laboratoire de Catalyse et environnement; Faculté des Sciences Ain Chock; Casablanca Morocco
| | - Lahcen Wakrim
- Laboratoire de Virologie; Institut Pasteur du Maroc; Casablanca 20360 Morocco
| | - Michael Nilges
- Unité de Bioinformatique Structurale; UMR 3528 CNRS, Institut Pasteur; Paris 75724 France
| | - Thérèse E. Malliavin
- Unité de Bioinformatique Structurale; UMR 3528 CNRS, Institut Pasteur; Paris 75724 France
|
27
|
Heinke F, Schildbach S, Stockmann D, Labudde D. eProS--a database and toolbox for investigating protein sequence-structure-function relationships through energy profiles. Nucleic Acids Res 2012; 41:D320-6. [PMID: 23161695 PMCID: PMC3531212 DOI: 10.1093/nar/gks1079] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Gaining information about structural and functional features of newly identified proteins is often a difficult task. This information is crucial for understanding sequence–structure–function relationships of target proteins and, thus, essential in comprehending the mechanisms and dynamics of the molecular systems of interest. Using protein energy profiles is a novel approach that can contribute in addressing such problems. An energy profile corresponds to the sequence of energy values that are derived from a coarse-grained energy model. Energy profiles can be computed from protein structures or predicted from sequences. As shown, correspondences and dissimilarities in energy profiles can be applied for investigations of protein mechanics and dynamics. We developed eProS (energy profile suite, freely available at http://bioservices.hs-mittweida.de/Epros/), a database that provides ∼76 000 pre-calculated energy profiles as well as a toolbox for addressing numerous problems of structure biology. Energy profiles can be browsed, visualized, calculated from an uploaded structure or predicted from sequence. Furthermore, it is possible to align energy profiles of interest or compare them with all entries in the eProS database to identify significantly similar energy profiles and, thus, possibly relevant structural and functional relationships. Additionally, annotations and cross-links from numerous sources provide a broad view of potential biological correspondences.
Affiliation(s)
- Florian Heinke
- Department of Mathematics, University of Applied Sciences Mittweida, Mittweida, Saxony, Technikumplatz 17, D-09648, Germany.
|
28
|
|
29
|
Atilgan AR, Atilgan C. Local motifs in proteins combine to generate global functional moves. Brief Funct Genomics 2012; 11:479-88. [PMID: 22811517 DOI: 10.1093/bfgp/els027] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Literature on the topological properties of folded proteins that has emerged as a field in its own right in the past decade is reviewed. Physics-based construction of coarse-grained models of proteins from knowledge of all-atom coordinates of the average structure is discussed. Once a network is thus obtained with the node and link information, local motifs provide a plethora of information on protein function. The hierarchical structure of the proteins manifested in the interrelations of local motifs is emphasized. Motifs are also related to modularity of the structure, and they quantify shifts in the landscapes upon conformational changes induced by, e.g. ligand binding. Redundancy emerges as a balance between local and global network descriptors and is related to the collectivity of the protein motions. Introducing weight on links followed by sequential removal of least cohesive contacts allows interactions in proteins to be represented as the superposition of essential and redundant sets. Lack of the former makes the network non-functional, while the latter ensures robust functioning under a wide range of perturbation scenarios.
Affiliation(s)
- Ali Rana Atilgan
- Faculty of Engineering and Natural Sciences, Sabanci University, 34956 Istanbul, Turkey
|
30
|
De Ruvo M, Giuliani A, Paci P, Santoni D, Di Paola L. Shedding light on protein-ligand binding by graph theory: the topological nature of allostery. Biophys Chem 2012; 165-166:21-9. [PMID: 22464849 DOI: 10.1016/j.bpc.2012.03.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2012] [Revised: 03/02/2012] [Accepted: 03/02/2012] [Indexed: 11/17/2022]
Abstract
Allostery is a very important feature of proteins; we propose a mesoscopic approach to allosteric mechanisms elucidation, based on protein contact matrices. The application of graph theory methods to the characterization of the allosteric process and, more broadly, to obtain the conformational changes upon binding, reveals key features of the protein function. The proposed method highlights the leading role played by topological over geometrical changes in allosteric transitions. Topological invariants were able to discriminate between true allosteric motions and generic protein motions upon binding.
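A protein contact matrix of the kind this abstract builds on can be derived from coordinates by thresholding pairwise distances, after which simple graph descriptors such as node degree follow directly. The sketch below is illustrative only: the 8 Å cutoff, the function name and the toy coordinates are assumptions, and the abstract does not state which cutoff or topological invariants the authors used.

```python
import numpy as np

def contact_matrix(coords, cutoff=8.0):
    """Binary contact matrix: 1 where two distinct points lie within cutoff (Å)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Exclude self-contacts on the diagonal.
    contacts = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    return contacts.astype(int)

# Toy chain: four residues placed 5 Å apart on a line (hypothetical values).
coords = np.array([[0.0, 0.0, 0.0],
                   [5.0, 0.0, 0.0],
                   [10.0, 0.0, 0.0],
                   [15.0, 0.0, 0.0]])
contacts = contact_matrix(coords)
degree = contacts.sum(axis=1)  # number of contacts per residue
print(contacts)
print(degree)
```

Comparing such matrices, or graph descriptors computed from them, between the unbound and bound structures is one way to separate topological changes from purely geometrical ones.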
Affiliation(s)
- Micol De Ruvo
- Faculty of Engineering, Università CAMPUS BioMedico, Via A. del Portillo, 21, 00128 Roma, Italy
|
31
|
Atilgan C, Okan OB, Atilgan AR. Network-based models as tools hinting at nonevident protein functionality. Annu Rev Biophys 2012; 41:205-25. [PMID: 22404685 DOI: 10.1146/annurev-biophys-050511-102305] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Network-based models of proteins are popular tools employed to determine dynamic features related to the folded structure. They encompass all topological and geometric computational approaches idealizing proteins as directly interacting nodes. Topology makes use of neighborhood information of residues, and geometry includes relative placement of neighbors. Coarse-grained approaches efficiently predict alternative conformations because of inherent collectivity in the protein structure. Such collectivity is moderated by topological characteristics that also tune neighborhood structure: That rich residues have richer neighbors secures robustness toward random loss of interactions/nodes due to environmental fluctuations/mutations. Geometry conveys the additional information of force balance to network models, establishing the local shape of the energy landscape. Here, residue and/or bond perturbations are critically evaluated to suggest new experiments, as network-based computational techniques prove useful in capturing domain movements and conformational shifts resulting from environmental alterations. Evolutionarily conserved residues are optimally connected, defining a subnetwork that may be utilized for further coarsening.
Affiliation(s)
- Canan Atilgan
- Faculty of Engineering and Natural Sciences, Sabanci University, 34956 Istanbul, Turkey
|
32
|
Jamroz M, Kolinski A, Kihara D. Structural features that predict real-value fluctuations of globular proteins. Proteins 2012; 80:1425-35. [PMID: 22328193 DOI: 10.1002/prot.24040] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2011] [Revised: 01/03/2012] [Accepted: 01/11/2012] [Indexed: 12/20/2022]
Abstract
It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics (MD) trajectories of nonhomologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real value of residue fluctuations using the support vector regression (SVR). It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in MD trajectories. Moreover, SVR that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson's correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed in predictions by the Gaussian network model (GNM). An advantage of the developed method over the GNMs is that the former predicts the real value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient practical way to predict fluctuations of proteins using easily computed static structural features of proteins.
Affiliation(s)
- Michal Jamroz
- Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warszawa, Poland
|
33
|
Cossio P, Laio A, Pietrucci F. Which similarity measure is better for analyzing protein structures in a molecular dynamics trajectory? Phys Chem Chem Phys 2011; 13:10421-5. [DOI: 10.1039/c0cp02675a] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
|
34
|
Pierri CL, Parisi G, Porcelli V. Computational approaches for protein function prediction: a combined strategy from multiple sequence alignment to molecular docking-based virtual screening. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2010; 1804:1695-712. [PMID: 20433957 DOI: 10.1016/j.bbapap.2010.04.008] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/19/2010] [Revised: 03/04/2010] [Accepted: 04/14/2010] [Indexed: 12/12/2022]
Abstract
The functional characterization of proteins represents a daily challenge for biochemical, medical and computational sciences. Although finally proved on the bench, the function of a protein can be successfully predicted by computational approaches that drive the further experimental assays. Current methods for comparative modeling allow the construction of accurate 3D models for proteins of unknown structure, provided that a crystal structure of a homologous protein is available. Binding regions can be proposed by using binding site predictors, data inferred from homologous crystal structures, and data provided from a careful interpretation of the multiple sequence alignment of the investigated protein and its homologs. Once the location of a binding site has been proposed, chemical ligands that have a high likelihood of binding can be identified by using ligand docking and structure-based virtual screening of chemical libraries. Most docking algorithms allow building a list sorted by energy of the lowest energy docking configuration for each ligand of the library. In this review the state-of-the-art of computational approaches in 3D protein comparative modeling and in the study of protein-ligand interactions is provided. Furthermore a possible combined/concerted multistep strategy for protein function prediction, based on multiple sequence alignment, comparative modeling, binding region prediction, and structure-based virtual screening of chemical libraries, is described by using suitable examples. As practical examples, Abl-kinase molecular modeling studies, HPV-E6 protein multiple sequence alignment analysis, and some other model docking-based characterization reports are briefly described to highlight the importance of computational approaches in protein function prediction.
Affiliation(s)
- Ciro Leonardo Pierri
- Department of Pharmaco-Biology, Laboratory of Biochemistry and Molecular Biology, University of Bari, Via E. Orabona, 4 - 70125 Bari, Italy.
|