1
Jiang Y, Wang R, Feng J, Jin J, Liang S, Li Z, Yu Y, Ma A, Su R, Zou Q, Ma Q, Wei L. Explainable Deep Hypergraph Learning Modeling the Peptide Secondary Structure Prediction. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2023; 10:e2206151. [PMID: 36794291 PMCID: PMC10104664 DOI: 10.1002/advs.202206151] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 10/24/2022] [Revised: 01/20/2023] [Indexed: 06/18/2023]
Abstract
Accurately predicting peptide secondary structures remains a challenging task due to the lack of discriminative information in short peptides. This study proposes PHAT, a deep hypergraph learning framework for the prediction of peptide secondary structures and the exploration of downstream tasks. The framework includes a novel interpretable deep hypergraph multi-head attention network that uses residue-based reasoning for structure prediction. The algorithm can incorporate sequential semantic information from a large-scale biological corpus and structural semantic information from multi-scale structural segmentation, leading to better accuracy and interpretability even with extremely short peptides. The interpretable models are able to highlight the reasoning of structural feature representations and the classification of secondary substructures. The importance of secondary structures in peptide tertiary structure reconstruction and downstream functional analysis is further demonstrated, highlighting the versatility of the models. To facilitate the use of the model, an online server has been established, accessible via http://inner.wei-group.net/PHAT/. The work is expected to assist in the design of functional peptides and contribute to the advancement of structural biology research.
Affiliation(s)
- Yi Jiang
  - School of Software, Shandong University, Jinan, Shandong 250101, China
  - Joint SDU‐NTU Centre for Artificial Intelligence Research (C‐FAIR), Shandong University, Jinan, Shandong 250101, China
- Ruheng Wang
  - School of Software, Shandong University, Jinan, Shandong 250101, China
  - Joint SDU‐NTU Centre for Artificial Intelligence Research (C‐FAIR), Shandong University, Jinan, Shandong 250101, China
- Jiuxin Feng
  - School of Software, Shandong University, Jinan, Shandong 250101, China
  - Joint SDU‐NTU Centre for Artificial Intelligence Research (C‐FAIR), Shandong University, Jinan, Shandong 250101, China
- Junru Jin
  - School of Software, Shandong University, Jinan, Shandong 250101, China
  - Joint SDU‐NTU Centre for Artificial Intelligence Research (C‐FAIR), Shandong University, Jinan, Shandong 250101, China
- Sirui Liang
  - School of Software, Shandong University, Jinan, Shandong 250101, China
  - Joint SDU‐NTU Centre for Artificial Intelligence Research (C‐FAIR), Shandong University, Jinan, Shandong 250101, China
- Zhongshen Li
  - School of Software, Shandong University, Jinan, Shandong 250101, China
  - Joint SDU‐NTU Centre for Artificial Intelligence Research (C‐FAIR), Shandong University, Jinan, Shandong 250101, China
- Yingying Yu
  - School of Software, Shandong University, Jinan, Shandong 250101, China
  - Joint SDU‐NTU Centre for Artificial Intelligence Research (C‐FAIR), Shandong University, Jinan, Shandong 250101, China
- Anjun Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Ran Su
  - College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China
- Qin Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Leyi Wei
  - School of Software, Shandong University, Jinan, Shandong 250101, China
  - Joint SDU‐NTU Centre for Artificial Intelligence Research (C‐FAIR), Shandong University, Jinan, Shandong 250101, China
2
Rozano L, Mukuka YM, Hane JK, Mancera RL. Ab Initio Modelling of the Structure of ToxA-like and MAX Fungal Effector Proteins. Int J Mol Sci 2023; 24:ijms24076262. [PMID: 37047233 PMCID: PMC10094246 DOI: 10.3390/ijms24076262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 03/09/2023] [Accepted: 03/21/2023] [Indexed: 03/29/2023]
Abstract
Pathogenic fungal diseases in crops are mediated by the release of effector proteins that facilitate infection. Characterising the structure of these fungal effectors is vital to understanding their virulence mechanisms and interactions with their hosts, which is crucial in the breeding of plant cultivars for disease resistance. Several effectors have been identified and validated experimentally; however, their lack of sequence conservation often impedes the identification and prediction of their structure using sequence similarity approaches. Structural similarity has, nonetheless, been observed within fungal effector protein families, creating interest in validating the use of computational methods to predict their tertiary structure from their sequence. To validate this method, we used Rosetta ab initio modelling to predict the structures of members of the ToxA-like and MAX effector families for which experimental structures are known. An optimised approach was then used to predict the structures of phenotypically validated effectors lacking known structures. Rosetta was found to successfully predict the structure of fungal effectors in the ToxA-like and MAX families, as well as phenotypically validated but structurally unconfirmed effector sequences. Interestingly, potential new effector structural families were identified on the basis of comparisons with structural homologues and the identification of associated protein domains.
3
Pan Q, Nguyen TB, Ascher DB, Pires DEV. Systematic evaluation of computational tools to predict the effects of mutations on protein stability in the absence of experimental structures. Brief Bioinform 2022; 23:bbac025. [PMID: 35189634 PMCID: PMC9155634 DOI: 10.1093/bib/bbac025] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 07/19/2021] [Revised: 01/13/2022] [Accepted: 01/30/2022] [Indexed: 12/26/2022]
Abstract
Changes in protein sequence can have dramatic effects on how proteins fold, their stability and dynamics. Over the last 20 years, pioneering methods have been developed to try to estimate the effects of missense mutations on protein stability, leveraging the growing availability of protein 3D structures. These, however, have been developed and validated using experimentally derived structures and biophysical measurements. A large proportion of protein structures remain to be experimentally elucidated and, while many studies have based their conclusions on predictions made using homology models, there has been no systematic evaluation of the reliability of these tools in the absence of experimental structural data. We have, therefore, systematically investigated the performance and robustness of ten widely used structural methods when presented with homology models built using templates at a range of sequence identity levels (from 15% to 95%) and contrasted performance with sequence-based tools as a baseline. We found that performance does deteriorate on homology models built using templates with sequence identity below 40%, where sequence-based tools might become preferable. This was most marked for mutations in solvent-exposed residues and stabilizing mutations. As structure prediction tools improve, the reliability of these predictors is expected to follow; however, we strongly suggest that these factors be taken into consideration when interpreting results from structure-based predictors of mutation effects on protein stability.
Affiliation(s)
- Qisheng Pan
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, Victoria 3052, Australia
- Thanh Binh Nguyen
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, Victoria 3052, Australia
- David B Ascher
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, Victoria 3052, Australia
  - Department of Biochemistry, University of Cambridge, 80 Tennis Ct Rd, Cambridge CB2 1GA, UK
- Douglas E V Pires
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, Victoria 3052, Australia
  - School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria 3053, Australia
4
Abstract
For two decades, Rosetta has consistently been at the forefront of protein structure prediction. While it has become a very large package comprising programs, scripts, and tools for different types of macromolecular modelling, such as ligand docking, protein-protein docking, protein design, and loop modelling, it started as the implementation of an algorithm for ab initio protein structure prediction. The term 'Rosetta' first appeared in the literature twenty years ago to describe that algorithm and its contribution to the third edition of the community-wide Critical Assessment of techniques for protein Structure Prediction (CASP3). Just as the Rosetta stone allowed the deciphering of the ancient Egyptian civilisation, David Baker and his co-workers have been contributing to deciphering 'the second half of the genetic code'. Although the focus of Baker's team has expanded to de novo protein design in the past few years, Rosetta's 'fame' is associated with its fragment-assembly protein structure prediction approach. Following a presentation of the main concepts underpinning its foundation, especially sequence-structure correlation and usage of fragments, we review the main stages of its development and highlight the milestones it has achieved in terms of protein structure prediction, particularly in CASP.
Affiliation(s)
- Jad Abbass
  - Department of Computer Science, Lebanese International University, Bekaa, Lebanon
- Jean-Christophe Nebel
  - Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE, United Kingdom
5
Abbass J, Nebel JC. Enhancing fragment-based protein structure prediction by customising fragment cardinality according to local secondary structure. BMC Bioinformatics 2020; 21:170. [PMID: 32357827 PMCID: PMC7195757 DOI: 10.1186/s12859-020-3491-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/30/2019] [Accepted: 04/13/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND Whenever suitable template structures are not available, fragment-based protein structure prediction becomes the only practical alternative, as pure ab initio techniques require massive computational resources even for very small proteins. However, the inaccuracy of their energy functions and their stochastic nature require the generation of a large number of decoys to adequately explore the solution space, limiting their usage to small proteins. Taking advantage of the uneven complexity of the sequence-structure relationship of short fragments, we adjusted the fragment insertion process by customising the number of available fragment templates according to the expected complexity of the predicted local secondary structure. Whereas the number of fragments is kept to its default value for coil regions, important and dramatic reductions are proposed for beta sheet and alpha helical regions, respectively.
RESULTS The evaluation of our fragment selection approach was conducted using an enhanced version of the popular Rosetta fragment-based protein structure prediction tool. It was modified so that the number of fragment candidates used in Rosetta could be adjusted based on the local secondary structure. Compared to Rosetta's standard predictions, our strategy delivered improved first models, +24% and +6% in terms of GDT, when using 2000 and 20,000 decoys, respectively, while significantly reducing the number of fragment candidates. Furthermore, our enhanced version of Rosetta is able to deliver with 2000 decoys a performance equivalent to that produced by standard Rosetta using 20,000 decoys. We hypothesise that, as the fragment insertion process focuses on the most challenging regions, such as coils, fewer decoys are needed to explore conformation spaces satisfactorily.
CONCLUSIONS Taking advantage of the high accuracy of sequence-based secondary structure predictions, we showed the value of that information to customise the number of candidates used during the fragment insertion process of fragment-based protein structure prediction. Experiments conducted using standard Rosetta showed that, when using the recommended number of decoys, i.e. 20,000, our strategy produces better results. Alternatively, similar results can be achieved using only 2000 decoys. Consequently, we recommend the adoption of this strategy to either significantly improve model quality or reduce processing times by a factor of 10.
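The idea of customising fragment cardinality by predicted secondary structure can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function names, the one-letter state codes and every numeric cap are hypothetical placeholders, chosen only so that coil keeps the default, strand is reduced and helix is reduced most, as the abstract describes.

```python
# Illustrative sketch only: all cap values are hypothetical placeholders,
# not the numbers used in the study.
DEFAULT_CANDIDATES = 200  # assumed default number of fragment candidates per position

# Hypothetical caps per predicted secondary-structure state:
# 'C' (coil) keeps the default, 'E' (strand) is reduced, 'H' (helix) is reduced most.
CANDIDATE_CAPS = {"C": DEFAULT_CANDIDATES, "E": 100, "H": 20}


def candidates_per_position(predicted_ss: str) -> list[int]:
    """Return how many fragment candidates to keep at each residue position."""
    # Unknown states fall back to the default so nothing is pruned by accident.
    return [CANDIDATE_CAPS.get(state, DEFAULT_CANDIDATES) for state in predicted_ss]


def trim_library(library: list[list[str]], predicted_ss: str) -> list[list[str]]:
    """Keep only the top-ranked candidates at each position.

    `library[i]` is assumed to be ranked best-first for position i.
    """
    caps = candidates_per_position(predicted_ss)
    return [ranked[:cap] for ranked, cap in zip(library, caps)]
```

For example, `candidates_per_position("CHE")` yields one cap per residue, which `trim_library` then applies to a ranked candidate list for each position.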
Affiliation(s)
- Jad Abbass
  - Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE UK
  - Department of Computer Science, Lebanese International University, Bekaa, Lebanon
- Jean-Christophe Nebel
  - Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE UK
6
Martin OA, Vorobjev Y, Scheraga HA, Vila JA. Outline of an experimental design aimed to detect a protein A mirror image in solution. PEERJ PHYSICAL CHEMISTRY 2019; 1. [PMID: 34079958 DOI: 10.7717/peerj-pchem.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/20/2022]
Abstract
There is abundant theoretical evidence indicating that a mirror image of protein A may occur during the protein folding process. However, whether such a mirror image exists in solution is an unresolved issue. Here we provide an outline of an experimental design aimed at detecting the mirror image of protein A in solution. The proposal is based on computational simulations indicating that a mutant of protein A, namely Q10H, could be used to detect the mirror-image conformation in solution. Our results indicate that, for the Q10H mutant, the native conformation of protein A should have a pKa of ≈6.2, while the mirror-image conformation should have a pKa of ≈7.3. Naturally, if the whole population is in the native state for the Q10H mutant, the pKa should be ≈6.2, while, if it is all in the mirror-image state, it would be ≈7.3, and, if it is a mixture, the pKa should be larger than 6.2, presumably in proportion to the mirror population. In addition, evidence is provided indicating that the tautomeric distribution of H10 must also change between the native and mirror conformations. Although this may not be completely relevant for the purpose of determining whether the protein A mirror image exists in solution, it could provide valuable information to validate the pKa findings. We hope this proposal will foster experimental work on this problem, either by direct application of our proposed experimental design or by serving as inspiration and motivation for other experiments.
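The "in proportion to the mirror population" reading can be made concrete with a small sketch. It assumes a simple linear, population-weighted mixing rule between the two pKa values quoted in the abstract. That rule is an assumption made here for illustration (the abstract only says "presumably in proportion"), and the function names are hypothetical.

```python
PKA_NATIVE = 6.2  # pKa quoted in the abstract for the native Q10H conformation
PKA_MIRROR = 7.3  # pKa quoted in the abstract for the mirror-image conformation


def apparent_pka(mirror_fraction: float) -> float:
    """Population-weighted pKa under the assumed linear mixing rule."""
    if not 0.0 <= mirror_fraction <= 1.0:
        raise ValueError("mirror_fraction must lie in [0, 1]")
    return (1.0 - mirror_fraction) * PKA_NATIVE + mirror_fraction * PKA_MIRROR


def mirror_fraction_from_pka(observed_pka: float) -> float:
    """Invert the assumed linear rule, clamping the result to [0, 1]."""
    fraction = (observed_pka - PKA_NATIVE) / (PKA_MIRROR - PKA_NATIVE)
    return min(1.0, max(0.0, fraction))
```

Under this assumption, a measured pKa halfway between the two limits would correspond to roughly equal native and mirror populations. A real titration curve for a two-state mixture need not follow this linear rule exactly.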
Affiliation(s)
- Osvaldo A Martin
  - Instituto de Matemática Aplicada San Luis, UNSL-CONICET, San Luis, Argentina
- Yury Vorobjev
  - Institute of Chemical Biology and Fundamental Medicine, Siberian Branch of the Russian Academy of Science, Novosibirsk, Russia
- Harold A Scheraga
  - Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, New York, United States
- Jorge A Vila
  - Instituto de Matemática Aplicada San Luis, UNSL-CONICET, San Luis, Argentina
  - Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, New York, United States
7
Baiesi M, Orlandini E, Seno F, Trovato A. Sequence and structural patterns detected in entangled proteins reveal the importance of co-translational folding. Sci Rep 2019; 9:8426. [PMID: 31182755 PMCID: PMC6557820 DOI: 10.1038/s41598-019-44928-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 02/22/2019] [Accepted: 05/23/2019] [Indexed: 11/09/2022]
Abstract
Proteins must fold quickly to acquire their biologically functional three-dimensional native structures. Hence, these are mainly stabilized by local contacts, while intricate topologies such as knots are rare. Here, we reveal the existence of specific patterns adopted by protein sequences and structures to deal with backbone self-entanglement. A large scale analysis of the Protein Data Bank shows that loops significantly intertwined with another chain portion are typically closed by weakly bound amino acids. Why is this energetic frustration maintained? A possible picture is that entangled loops are formed only toward the end of the folding process to avoid kinetic traps. Consistently, these loops are more frequently found to be wrapped around a portion of the chain on their N-terminal side, the one translated earlier at the ribosome. Finally, these motifs are less abundant in natural native states than in simulated protein-like structures, yet they appear in 32% of proteins, which in some cases display an amazingly complex intertwining.
Affiliation(s)
- Marco Baiesi
  - Department of Physics and Astronomy, University of Padova, Via Marzolo 8, I-35131, Padova, Italy
  - INFN, Sezione di Padova, Via Marzolo 8, I-35131, Padova, Italy
- Enzo Orlandini
  - Department of Physics and Astronomy, University of Padova, Via Marzolo 8, I-35131, Padova, Italy
  - INFN, Sezione di Padova, Via Marzolo 8, I-35131, Padova, Italy
- Flavio Seno
  - Department of Physics and Astronomy, University of Padova, Via Marzolo 8, I-35131, Padova, Italy
  - INFN, Sezione di Padova, Via Marzolo 8, I-35131, Padova, Italy
- Antonio Trovato
  - Department of Physics and Astronomy, University of Padova, Via Marzolo 8, I-35131, Padova, Italy
  - INFN, Sezione di Padova, Via Marzolo 8, I-35131, Padova, Italy
8
Kc DB. Recent advances in sequence-based protein structure prediction. Brief Bioinform 2018; 18:1021-1032. [PMID: 27562963 DOI: 10.1093/bib/bbw070] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/24/2016] [Indexed: 11/13/2022]
Abstract
The most accurate characterizations of the structure of proteins are provided by structural biology experiments. However, because of the high cost and labor-intensive nature of structural experiments, the gap between the number of protein sequences and solved structures is widening rapidly. The development of computational methods to accurately model protein structures from sequences is becoming increasingly important to the biological community. In this article, we highlight important progress in the field of protein structure prediction, especially in free modeling (FM) methods, which generate structure models without using homologous templates. We also provide a short synopsis of recent advances in FM approaches, as demonstrated in the recent Critical Assessment of Structure Prediction (CASP) competition, as well as recent trends and the outlook for FM approaches in protein structure prediction.
9
An Energy Landscape Treatment of Decoy Selection in Template-Free Protein Structure Prediction. COMPUTATION 2018. [DOI: 10.3390/computation6020039] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
10
Álvarez Ó, Fernández-Martínez JL, Fernández-Brillet C, Cernea A, Fernández-Muñiz Z, Kloczkowski A. Principal component analysis in protein tertiary structure prediction. J Bioinform Comput Biol 2018; 16:1850005. [PMID: 29566640 DOI: 10.1142/s0219720018500051] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
We discuss applicability of principal component analysis (PCA) for protein tertiary structure prediction from amino acid sequence. The algorithm presented in this paper belongs to the category of protein refinement models and involves establishing a low-dimensional space where the sampling (and optimization) is carried out via particle swarm optimizer (PSO). The reduced space is found via PCA performed for a set of low-energy protein models previously found using different optimization techniques. A high frequency term is added into this expansion by projecting the best decoy into the PCA basis set and calculating the residual model. This term is aimed at providing high frequency details in the energy optimization. The goal of this research is to analyze how the dimensionality reduction affects the prediction capability of the PSO procedure. For that purpose, different proteins from the Critical Assessment of Techniques for Protein Structure Prediction experiments were modeled. In all the cases, both the energy of the best decoy and the distance to the native structure have decreased. Our analysis also shows how the predicted backbone structure of native conformation and of alternative low energy states varies with respect to the PCA dimensionality. Generally speaking, the reconstruction can be successfully achieved with 10 principal components and the high frequency term. We also provide a computational analysis of protein energy landscape for the inverse problem of reconstructing structure from the reduced number of principal components, showing that the dimensionality reduction alleviates the ill-posed character of this high-dimensional energy optimization problem. The procedure explained in this paper is very fast and allows testing different PCA expansions. Our results show that PSO improves the energy of the best decoy used in the PCA when the adequate number of PCA terms is considered.
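To give a feel for sampling in a reduced space with a particle swarm optimizer, here is a minimal, generic PSO sketch over a 10-dimensional coefficient vector, mirroring the "10 principal components" mentioned above. It is not the authors' procedure: the PCA step and the high-frequency term are not shown, `toy_energy` is a hypothetical stand-in for a protein energy function, and the inertia and acceleration constants are assumed textbook-style defaults.

```python
import random


def toy_energy(coeffs):
    """Hypothetical stand-in for an energy function: a shifted quadratic bowl."""
    return sum((c - 1.0) ** 2 for c in coeffs)


def pso_minimize(energy, dim, n_particles=20, n_iters=100, seed=0,
                 inertia=0.7, c_cog=1.5, c_soc=1.5):
    """Minimise `energy` over a `dim`-dimensional vector with a basic PSO.

    Returns (best position, best energy, best energy of the initial swarm).
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]          # best position seen by each particle
    best_val = [energy(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # best position seen by the swarm
    start_val = g_val

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to own best, pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_cog * r1 * (best_pos[i][d] - pos[i][d])
                             + c_soc * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = energy(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val, start_val
```

In a setting like the one described, each coordinate would be a PCA coefficient and `energy` would reconstruct a backbone from those coefficients before scoring it. Calling `pso_minimize(toy_energy, dim=10)` simply shows the swarm's best energy never getting worse than its starting value.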
Affiliation(s)
- Óscar Álvarez
  - Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007 Oviedo, Spain
- Juan Luis Fernández-Martínez
  - Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007 Oviedo, Spain
- Celia Fernández-Brillet
  - Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007 Oviedo, Spain
- Ana Cernea
  - Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007 Oviedo, Spain
- Zulima Fernández-Muñiz
  - Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007 Oviedo, Spain
- Andrzej Kloczkowski
  - Battelle Center for Mathematical Medicine, Nationwide Children's Hospital, Columbus, OH, USA
  - Department of Pediatrics, The Ohio State University, Columbus, OH, USA
11
Li B, Fooksa M, Heinze S, Meiler J. Finding the needle in the haystack: towards solving the protein-folding problem computationally. Crit Rev Biochem Mol Biol 2018; 53:1-28. [PMID: 28976219 PMCID: PMC6790072 DOI: 10.1080/10409238.2017.1380596] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 05/16/2017] [Revised: 08/22/2017] [Accepted: 09/13/2017] [Indexed: 12/22/2022]
Abstract
Prediction of protein tertiary structures from amino acid sequence and understanding the mechanisms of how proteins fold, collectively known as "the protein folding problem," has been a grand challenge in molecular biology for over half a century. Theories have been developed that provide us with an unprecedented understanding of protein folding mechanisms. However, computational simulation of protein folding is still difficult, and prediction of protein tertiary structure from amino acid sequence is an unsolved problem. Progress toward a satisfying solution has been slow due to challenges in sampling the vast conformational space and deriving sufficiently accurate energy functions. Nevertheless, several techniques and algorithms have been adopted to overcome these challenges, and the last two decades have seen exciting advances in enhanced sampling algorithms, computational power and tertiary structure prediction methodologies. This review aims at summarizing these computational techniques, specifically conformational sampling algorithms and energy approximations that have been frequently used to study protein-folding mechanisms or to de novo predict protein tertiary structures. We hope that this review can serve as an overview on how the protein-folding problem can be studied computationally and, in cases where experimental approaches are prohibitive, help the researcher choose the most relevant computational approach for the problem at hand. We conclude with a summary of current challenges faced and an outlook on potential future directions.
Affiliation(s)
- Bian Li
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Michaela Fooksa
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
  - Chemical and Physical Biology Graduate Program, Vanderbilt University, Nashville, TN, USA
- Sten Heinze
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Jens Meiler
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
12
Voshol GP, Vijgenboom E, Punt PJ. The discovery of novel LPMO families with a new Hidden Markov model. BMC Res Notes 2017; 10:105. [PMID: 28222763 PMCID: PMC5320794 DOI: 10.1186/s13104-017-2429-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 07/01/2016] [Accepted: 02/15/2017] [Indexed: 12/20/2022]
Abstract
Background Renewable biopolymers, such as cellulose, starch and chitin, are highly resistant to enzymatic degradation. Therefore, there is a need to upgrade current degradation processes by including novel enzymes. Lytic polysaccharide mono-oxygenases (LPMOs) can disrupt recalcitrant biopolymers, thereby enhancing hydrolysis by conventional enzymes. However, novel LPMO families are difficult to identify using existing methods. Therefore, we developed a novel profile Hidden Markov model (HMM) and used it to mine genomes of ascomycetous fungi for novel LPMOs.
Results We constructed a structural alignment and verified that the alignment was correct. In the alignment we identified several known conserved features, such as the histidine brace and the N/Q/E-X-F/Y motif, and previously unidentified conserved proline and glycine residues. These residues are distal from the active site, suggesting a role in structure rather than activity. The multiple protein alignment was subsequently used to build a profile Hidden Markov model. This model was initially tested on manually curated datasets and proved to be both sensitive (no false negatives) and specific (no false positives). In some of the genomes analyzed we identified an as yet unknown LPMO family. This new family is mostly confined to the phyla of Ascomycota and Basidiomycota and the class of Oomycota. Genomic clustering indicated that at least some members might be involved in the degradation of β-glucans, while transcriptomic data suggested that others are possibly involved in the degradation of pectin.
Conclusions The newly developed profile Hidden Markov model was successfully used to mine fungal genomes for a novel family of LPMOs. However, the model is not limited to bacterial and fungal genomes. This is illustrated by the fact that the model was also able to identify another new LPMO family in Drosophila melanogaster. Furthermore, the Hidden Markov model was used to verify the more distant BLAST hits from the new fungal family of LPMOs, which belong to the Bivalves, Stony corals and Sea anemones. This Hidden Markov model (Additional file 3) will therefore help the broader scientific community in identifying other as yet unknown LPMOs.
Affiliation(s)
- Gerben P Voshol
  - Molecular Microbiology and Health, Institute of Biology Leiden, Leiden University, Leiden, The Netherlands
- Erik Vijgenboom
  - Molecular Microbiology and Health, Institute of Biology Leiden, Leiden University, Leiden, The Netherlands
- Peter J Punt
  - Molecular Microbiology and Health, Institute of Biology Leiden, Leiden University, Leiden, The Netherlands
  - Dutch DNA Biotech B.V., Utrechtseweg 48, 3703HE, Zeist, The Netherlands
13
Piwowar M, Matczyńska E, Malawski M, Szapieniec T, Roterman-Konieczna I. Genetic traces of never born proteins. BIO-ALGORITHMS AND MED-SYSTEMS 2017. [DOI: 10.1515/bams-2017-0006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
The presented results cover issues related to proteins that were "never born in nature". The paper focuses on identifying stretches of genetic information that encode protein sequences not known to exist in nature. The aim of the work was to find traces of "never born proteins" (NBP) throughout completely sequenced genomes, including regions not expected to carry genetic information. The analyses searched the genetic material of species from different levels of the evolutionary tree, from yeast through plant organisms up to the human genome. Statistical details are presented, such as sequence frequencies, their length, percent identity and similarity of alignments, as well as the E value of the sequences found. Computations were performed on a gLite-based grid environment. The results of the analyses showed that the NBP genetic record in the genomes of the studied organisms is absent at a significant level in terms of identity of contents and length of the sequences found. Most of the sequences found and considered to be similar do not exceed 50% of the length of the NBP output sequences, which confirms that the genetic record of proteins is not accidental, both in terms of the composition of gene sequences and as regards the place of recording in the genomes of living organisms.
|
14
|
Critical Features of Fragment Libraries for Protein Structure Prediction. PLoS One 2017; 12:e0170131. [PMID: 28085928 PMCID: PMC5235372 DOI: 10.1371/journal.pone.0170131] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 08/29/2016] [Accepted: 12/29/2016] [Indexed: 11/19/2022] Open
Abstract
The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. In particular, we analyze the effect of using secondary structure prediction to guide fragment selection, of different fragment sizes, and of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than longer ones. Libraries composed of multiple fragment lengths generate even better structures, with longer fragments proving more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction.
|
15
|
Yang Y, Heffernan R, Paliwal K, Lyons J, Dehzangi A, Sharma A, Wang J, Sattar A, Zhou Y. SPIDER2: A Package to Predict Secondary Structure, Accessible Surface Area, and Main-Chain Torsional Angles by Deep Neural Networks. Methods Mol Biol 2017; 1484:55-63. [PMID: 27787820 DOI: 10.1007/978-1-4939-6406-2_6] [Citation(s) in RCA: 101] [Impact Index Per Article: 14.4] [Indexed: 05/13/2023]
Abstract
Predicting one-dimensional structure properties has played an important role in improving the prediction of protein three-dimensional structures and functions. The most commonly predicted properties are secondary structure and accessible surface area (ASA), representing local and nonlocal structural characteristics, respectively. Secondary structure prediction is further complemented by prediction of continuous main-chain torsional angles. Here we describe a newly developed method, SPIDER2, that utilizes three iterations of deep learning neural networks to improve the prediction accuracy of several structural properties simultaneously. For an independent test set of 1199 proteins, SPIDER2 achieves 82% accuracy for secondary structure prediction, 0.76 for the correlation coefficient between predicted and actual solvent accessible surface area, 19° and 30° for mean absolute errors of backbone φ and ψ angles, respectively, and 8° and 32° for mean absolute errors of Cα-based θ and τ angles, respectively. The method provides state-of-the-art, all-in-one accurate prediction of local structure and solvent accessible surface area. The method is implemented as a webserver along with a standalone package, both available on our website: http://sparks-lab.org.
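The mean absolute errors quoted for the φ, ψ, θ and τ angles are averages of per-residue angular differences. The abstract does not say how the 360° periodicity is handled, so the sketch below assumes the common convention of wrapping each difference into [-180°, 180°) before taking its absolute value. The function name `angular_mae` is hypothetical and is not part of SPIDER2.

```python
def angular_mae(predicted, actual):
    """Mean absolute error between two lists of angles in degrees.

    Assumption: differences are wrapped into [-180, 180) so that 179 and
    -179 count as 2 degrees apart rather than 358.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, a in zip(predicted, actual):
        diff = (p - a + 180.0) % 360.0 - 180.0  # wrapped signed difference
        total += abs(diff)
    return total / len(predicted)


# Two residues: errors of 10 and 20 degrees average to 15.
print(angular_mae([10.0, -60.0], [20.0, -40.0]))
```

Without the wrap, a prediction of 179° for a true angle of -179° would be scored as a 358° error, which would inflate the reported averages.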
Affiliation(s)
- Yuedong Yang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast Campus, Science 1 (G24) 2.10, Parklands Drive, Southport, QLD, 4222, Australia
- Rhys Heffernan
- Signal Processing Laboratory, School of Engineering, Griffith University, Brisbane, QLD, Australia
- Kuldip Paliwal
- Signal Processing Laboratory, School of Engineering, Griffith University, Brisbane, QLD, Australia
- James Lyons
- Signal Processing Laboratory, School of Engineering, Griffith University, Brisbane, QLD, Australia
- Abdollah Dehzangi
- Department of Psychiatry, Medical Research Center, University of Iowa, Iowa City, IA, USA
- Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD, Australia
- School of Engineering and Physics, University of the South Pacific, Private Mail Bag, Laucala Campus, Suva, Fiji
- Jihua Wang
- Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Dezhou University, Dezhou, Shandong, China
- Abdul Sattar
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD, Australia
- National ICT Australia (NICTA), Brisbane, QLD, Australia
- Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Gold Coast Campus, Science 1 (G24) 2.10, Parklands Drive, Southport, QLD, 4222, Australia
|
16
|
Leelananda SP, Lindert S. Computational methods in drug discovery. Beilstein J Org Chem 2016; 12:2694-2718. [PMID: 28144341 PMCID: PMC5238551 DOI: 10.3762/bjoc.12.267] [Citation(s) in RCA: 280] [Impact Index Per Article: 35.0] [Received: 09/01/2016] [Accepted: 11/22/2016] [Indexed: 12/11/2022] Open
Abstract
The process for drug discovery and development is challenging, time consuming and expensive. Computer-aided drug discovery (CADD) tools can act as a virtual shortcut, assisting in the expedition of this long process and potentially reducing the cost of research and development. Today CADD has become an effective and indispensable tool in therapeutic development. The human genome project has made available a substantial amount of sequence data that can be used in various drug discovery projects. Additionally, increasing knowledge of biological structures, as well as increasing computer power have made it possible to use computational methods effectively in various phases of the drug discovery and development pipeline. The importance of in silico tools is greater than ever before and has advanced pharmaceutical research. Here we present an overview of computational methods used in different facets of drug discovery and highlight some of the recent successes. In this review, both structure-based and ligand-based drug discovery methods are discussed. Advances in virtual high-throughput screening, protein structure prediction methods, protein-ligand docking, pharmacophore modeling and QSAR techniques are reviewed.
Affiliation(s)
- Sumudu P Leelananda
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, OH 43210, USA
- Steffen Lindert
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, OH 43210, USA
|
17
|
Heffernan R, Dehzangi A, Lyons J, Paliwal K, Sharma A, Wang J, Sattar A, Zhou Y, Yang Y. Highly accurate sequence-based prediction of half-sphere exposures of amino acid residues in proteins. Bioinformatics 2015; 32:843-9. [DOI: 10.1093/bioinformatics/btv665] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 06/30/2015] [Accepted: 11/07/2015] [Indexed: 11/14/2022] Open
Abstract
Motivation: Solvent exposure of amino acid residues of proteins plays an important role in understanding and predicting protein structure, function and interactions. Solvent exposure can be characterized by several measures including solvent accessible surface area (ASA), residue depth (RD) and contact numbers (CN). More recently, an orientation-dependent contact number called half-sphere exposure (HSE) was introduced by separating the contacts within upper and down half spheres defined according to the Cα-Cβ (HSEβ) vector or neighboring Cα-Cα vectors (HSEα). HSEα calculated from protein structures was found to better describe the solvent exposure over ASA, CN and RD in many applications. Thus, a sequence-based prediction is desirable, as most proteins do not have experimentally determined structures. To our best knowledge, there is no method to predict HSEα and only one method to predict HSEβ.
Results: This study developed a novel method for predicting both HSEα and HSEβ (SPIDER-HSE) that achieved a consistent performance for 10-fold cross validation and two independent tests. The correlation coefficients between predicted and measured HSEβ (0.73 for upper sphere, 0.69 for down sphere and 0.76 for contact numbers) for the independent test set of 1199 proteins are significantly higher than existing methods. Moreover, predicted HSEα has a higher correlation coefficient (0.46) to the stability change by residue mutants than predicted HSEβ (0.37) and ASA (0.43). The results, together with its easy Cα-atom-based calculation, highlight the potential usefulness of predicted HSEα for protein structure prediction and refinement as well as function prediction.
Availability and implementation: The method is available at http://sparks-lab.org.
Contact: yuedong.yang@griffith.edu.au or yaoqi.zhou@griffith.edu.au
Supplementary information: Supplementary data are available at Bioinformatics online.
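The HSEβ measure described in the Motivation paragraph can be sketched directly from its definition: for each residue, neighbouring Cα atoms within a contact radius are split by whether they fall on the side of the Cα-Cβ vector (up) or on the opposite side (down). This is an illustrative sketch only. The function name and the 13 Å default radius are assumptions that do not come from the abstract, and it computes HSE from known coordinates rather than predicting it from sequence as SPIDER-HSE does.

```python
import math


def half_sphere_exposure(ca_coords, cb_coords, radius=13.0):
    """Return a list of (up, down) contact counts, one pair per residue.

    ca_coords, cb_coords: lists of (x, y, z) tuples of equal length.
    radius: contact radius in angstroms (13.0 is an assumed default).
    """
    counts = []
    for i, (ca_i, cb_i) in enumerate(zip(ca_coords, cb_coords)):
        # Side-chain direction: the Calpha -> Cbeta vector of residue i.
        side = tuple(b - a for a, b in zip(ca_i, cb_i))
        up = down = 0
        for j, ca_j in enumerate(ca_coords):
            if j == i:
                continue
            to_j = tuple(c - a for a, c in zip(ca_i, ca_j))
            if math.sqrt(sum(c * c for c in to_j)) > radius:
                continue  # outside the contact sphere
            # Non-negative projection: neighbour is in the side-chain half.
            if sum(s * t for s, t in zip(side, to_j)) >= 0.0:
                up += 1
            else:
                down += 1
        counts.append((up, down))
    return counts


# Four residues on the z axis, all side chains pointing along +z.
ca = [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0), (0.0, 0.0, -5.0), (0.0, 0.0, 20.0)]
cb = [(0.0, 0.0, 1.5), (0.0, 0.0, 6.5), (0.0, 0.0, -3.5), (0.0, 0.0, 21.5)]
print(half_sphere_exposure(ca, cb))
```

The sum `up + down` for a residue is its contact number (CN), which is why the abstract reports correlations for the upper sphere, the down sphere and contact numbers together.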
Affiliation(s)
- Rhys Heffernan
- Signal Processing Laboratory, School of Engineering, Griffith University, Brisbane, Australia
- Abdollah Dehzangi
- Signal Processing Laboratory, School of Engineering, Griffith University, Brisbane, Australia
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
- Medical Research Center (MRC), Department of Psychiatry, University of Iowa, Iowa City, USA
- James Lyons
- Signal Processing Laboratory, School of Engineering, Griffith University, Brisbane, Australia
- Kuldip Paliwal
- Signal Processing Laboratory, School of Engineering, Griffith University, Brisbane, Australia
- Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
- School of Engineering and Physics, University of the South Pacific, Private Mail Bag, Laucala Campus, Suva, Fiji
- Jihua Wang
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, Shandong 253023, China
- Abdul Sattar
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
- National ICT Australia (NICTA), Brisbane, Australia
- Yaoqi Zhou
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, Shandong 253023, China
- Institute for Glycomics and School of Information and Communication Technique, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
- Yuedong Yang
- Institute for Glycomics and School of Information and Communication Technique, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
|
18
|
Zhang Y, Sagui C. Secondary structure assignment for conformationally irregular peptides: comparison between DSSP, STRIDE and KAKSI. J Mol Graph Model 2014; 55:72-84. [PMID: 25424660 DOI: 10.1016/j.jmgm.2014.10.005] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 08/07/2014] [Accepted: 10/08/2014] [Indexed: 11/25/2022]
Abstract
Secondary structure assignment codes were built to explore the regularities associated with the periodic motifs of proteins, such as those in backbone dihedral angles or in hydrogen bonds between backbone atoms. Precise structure assignment is challenging because real-life secondary structures are susceptible to bending, twist, fraying and other deformations that can distance them from their geometrical prototypes. Although results from codes such as DSSP and STRIDE converge in well-ordered structures, the agreement between the secondary structure assignments is known to deteriorate as the conformations become more distorted. Conformationally irregular peptides therefore offer a great opportunity to explore the differences between these codes. This is especially important for unfolded proteins and intrinsically disordered proteins, which are known to exhibit residual and/or transient secondary structure whose characterization is challenging. In this work, we have carried out Molecular Dynamics simulations of (relatively) disordered peptides, specifically gp41(659-671) (ELLELDKWASLWN), the homopeptide polyasparagine (N₁₈), and polyasparagine dimers. We have analyzed the resulting conformations with DSSP and STRIDE, based on hydrogen-bond patterns (and dihedral angles for STRIDE), and KAKSI, based on α-carbon distances; and carefully characterized the differences in structural assignments. The full-sequence Segment Overlap (SOV) scores, which quantify the agreement between two secondary structure assignments, vary from 70% for gp41(659-671) (STRIDE as reference) to 49% for N₁₈ (DSSP as reference). Major differences are observed in turns, in the distinction between α and 3₁₀ helices, and in short parallel-sheet segments.
Affiliation(s)
- Yuan Zhang
- Department of Physics, North Carolina State University, Raleigh, NC 27695, United States; Center for High Performance Simulations (CHiPS), North Carolina State University, Raleigh, NC 27695, United States
- Celeste Sagui
- Department of Physics, North Carolina State University, Raleigh, NC 27695, United States; Center for High Performance Simulations (CHiPS), North Carolina State University, Raleigh, NC 27695, United States
|
19
|
Hoffmann F, Vancea I, Kamat SG, Strodel B. Protein structure prediction: assembly of secondary structure elements by basin-hopping. Chemphyschem 2014; 15:3378-90. [PMID: 25056272 DOI: 10.1002/cphc.201402247] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/15/2014] [Indexed: 12/30/2022]
Abstract
The prediction of protein tertiary structure from primary structure remains a challenging task. One possible approach to this problem is the application of basin-hopping global optimization combined with an all-atom force field. In this work, the efficiency of basin-hopping is improved by introducing an approach that derives tertiary structures from the secondary structure assignments of individual residues. This approach is termed secondary-to-tertiary basin-hopping and benchmarked for three miniproteins: trpzip, trp-cage and ER-10. For each of the three miniproteins, the secondary-to-tertiary basin-hopping approach successfully and reliably predicts their three-dimensional structure. When it is applied to larger proteins, correctly folded structures are obtained. It can be concluded that the assembly of secondary structure elements using basin-hopping is a promising tool for de novo protein structure prediction.
Affiliation(s)
- Falk Hoffmann
- Institute of Complex Systems: Structural Biochemistry, Forschungszentrum Jülich, 52425 Jülich (Germany)
|
20
|
Xu B, Wang Y, Liang H, Li G. Structural Based Strategy for Predicting Transcription Factor Binding Sites. Bio Protoc 2013. [DOI: 10.21769/bioprotoc.794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022] Open
|
21
|
Karakaş M, Woetzel N, Staritzbichler R, Alexander N, Weiner BE, Meiler J. BCL::Fold--de novo prediction of complex and large protein topologies by assembly of secondary structure elements. PLoS One 2012; 7:e49240. [PMID: 23173050 PMCID: PMC3500284 DOI: 10.1371/journal.pone.0049240] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 03/27/2012] [Accepted: 10/07/2012] [Indexed: 01/10/2023] Open
Abstract
Computational de novo protein structure prediction is limited to small proteins of simple topology. The present work explores an approach to extend beyond the current limitations through assembling protein topologies from idealized α-helices and β-strands. The algorithm performs a Monte Carlo Metropolis simulated annealing folding simulation. It optimizes a knowledge-based potential that analyzes radius of gyration, β-strand pairing, secondary structure element (SSE) packing, amino acid pair distance, amino acid environment, contact order, secondary structure prediction agreement and loop closure. Discontinuation of the protein chain favors sampling of non-local contacts and thereby creation of complex protein topologies. The folding simulation is accelerated through exclusion of flexible loop regions further reducing the size of the conformational search space. The algorithm is benchmarked on 66 proteins with lengths between 83 and 293 amino acids. For 61 out of these proteins, the best SSE-only models obtained have an RMSD100 below 8.0 Å and recover more than 20% of the native contacts. The algorithm assembles protein topologies with up to 215 residues and a relative contact order of 0.46. The method is tailored to be used in conjunction with low-resolution or sparse experimental data sets which often provide restraints for regions of defined secondary structure.
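The Monte Carlo Metropolis simulated annealing scheme named in the abstract follows a generic pattern that can be sketched on a toy problem. The code below is not BCL::Fold: the energy function, move set, cooling schedule and all names are hypothetical stand-ins, chosen only to show the acceptance rule and the gradual lowering of temperature.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: always accept downhill moves; accept uphill
    moves with probability exp(-delta_e / temperature)."""
    if delta_e <= 0.0:
        return True
    if temperature <= 0.0:
        return False
    return rng.random() < math.exp(-delta_e / temperature)


def simulated_annealing(energy, propose, state, t_start, t_end, steps, seed=0):
    """Minimise `energy` by Metropolis sampling under geometric cooling."""
    rng = random.Random(seed)
    current, current_e = state, energy(state)
    best, best_e = current, current_e
    for step in range(steps):
        # Geometric interpolation from t_start down to t_end (assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = propose(current, rng)
        candidate_e = energy(candidate)
        if metropolis_accept(candidate_e - current_e, t, rng):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


# Toy usage: a one-dimensional quadratic "energy" with its minimum at x = 3.
best_x, best_energy = simulated_annealing(
    energy=lambda x: (x - 3.0) ** 2,
    propose=lambda x, rng: x + rng.uniform(-0.5, 0.5),
    state=0.0, t_start=1.0, t_end=0.001, steps=5000,
)
print(best_x, best_energy)
```

In a folding simulation the state would be a placement of secondary structure elements and the energy a knowledge-based potential; the acceptance rule itself stays the same.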
Affiliation(s)
- Mert Karakaş
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Nils Woetzel
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Rene Staritzbichler
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Nathan Alexander
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Brian E. Weiner
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
- Jens Meiler
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
|
22
|
Gniewek P, Kolinski A, Jernigan RL, Kloczkowski A. Elastic network normal modes provide a basis for protein structure refinement. J Chem Phys 2012; 136:195101. [PMID: 22612113 DOI: 10.1063/1.4710986] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 01/13/2023] Open
Abstract
It is well recognized that thermal motions of atoms in the protein native state, the fluctuations about the minimum of the global free energy, are well reproduced by simple elastic network models (ENMs) such as the anisotropic network model (ANM). Elastic network models represent protein dynamics as vibrations of a network of nodes (usually represented by positions of the heavy atoms, or by the Cα atoms only for coarse-grained representations) in which spatially close nodes are connected by harmonic springs. These models provide a reliable representation of the fluctuational dynamics of proteins and RNA, and explain various conformational changes in protein structures, including those important for ligand binding. In the present paper, we study the problem of protein structure refinement by analyzing thermal motions of proteins in non-native states. We represent the conformational space close to the native state by a set of decoys generated by the I-TASSER protein structure prediction server utilizing template-free modeling. The protein substates are selected by hierarchical structure clustering. The main finding is that thermal motions for some substates overlap significantly with the deformations necessary to reach the native state. Additionally, more mobile residues yield higher overlaps with the required deformations than do the less mobile ones. These findings suggest that structural refinement of poorly resolved protein models can be significantly enhanced by reduction of the conformational space to the motions imposed by the dominant normal modes.
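The "overlap" between a normal mode and the deformation needed to reach the native state can be illustrated with a short sketch. The abstract does not give the formula, so the code assumes the normalised projection commonly used in normal mode analysis, |m·d| / (|m| |d|), where m is the mode vector and d the displacement from the decoy to the native structure. The function name is hypothetical.

```python
import math


def mode_overlap(mode, deformation):
    """Overlap between a normal mode and a target deformation.

    Both arguments are lists of per-atom (dx, dy, dz) displacements.
    Returns |mode . deformation| / (|mode| |deformation|), a value in [0, 1].
    """
    if len(mode) != len(deformation):
        raise ValueError("mode and deformation must cover the same atoms")
    dot = sum(m * d for mv, dv in zip(mode, deformation) for m, d in zip(mv, dv))
    norm_m = math.sqrt(sum(m * m for mv in mode for m in mv))
    norm_d = math.sqrt(sum(d * d for dv in deformation for d in dv))
    if norm_m == 0.0 or norm_d == 0.0:
        raise ValueError("a zero-length vector has no defined overlap")
    return abs(dot) / (norm_m * norm_d)


# A mode that points exactly along the required deformation has overlap 1.
mode = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
target = [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
print(mode_overlap(mode, target))
```

An overlap near 1 means that moving along that single mode would carry the decoy most of the way toward the native structure; an overlap near 0 means the mode is orthogonal to the required change.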
Affiliation(s)
- Pawel Gniewek
- Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
|
23
|
Strunk T, Wolf M, Brieg M, Klenin K, Biewer A, Tristram F, Ernst M, Kleine PJ, Heilmann N, Kondov I, Wenzel W. SIMONA 1.0: an efficient and versatile framework for stochastic simulations of molecular and nanoscale systems. J Comput Chem 2012; 33:2602-13. [PMID: 22886395 DOI: 10.1002/jcc.23089] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 06/26/2012] [Revised: 07/24/2012] [Accepted: 07/25/2012] [Indexed: 11/05/2022]
Abstract
Molecular simulation methods have increasingly contributed to our understanding of molecular and nanoscale systems. However, the family of Monte Carlo techniques has taken a backseat to molecular dynamics based methods, which is also reflected in the number of available simulation packages. Here, we report the development of a generic, versatile simulation package for stochastic simulations and demonstrate its application to protein conformational change, protein-protein association, small-molecule protein docking, and simulation of the growth of nanoscale clusters of organic molecules. Simulation of molecular and nanoscale systems (SIMONA) is easy to use for standard simulations via a graphical user interface and highly parallel both via MPI and the use of graphical processors. It is also extendable to many additional simulation types. Because it is freely available to academic users, we hope it will enable a large community of researchers in the life and materials sciences to use and extend SIMONA in the future. SIMONA is available for download at http://int.kit.edu/nanosim/simona.
Affiliation(s)
- T Strunk
- Institute of Nanotechnology, Karlsruhe Institute of Technology, PO Box 3640, 76021 Karlsruhe, Germany
|
24
|
Vishnepolsky B, Managadze G, Grigolava M, Pirtskhalava M. Evaluation performance of substitution matrices, based on contacts between residue terminal groups. J Biomol Struct Dyn 2012; 30:180-90. [PMID: 22702729 DOI: 10.1080/07391102.2012.677769] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/28/2022]
Abstract
Sequence alignment is a standard method for the estimation of the evolutionary, structural, and functional relationships among amino acid sequences. The quality of alignments depends on the similarity matrix used. Statistical contact potentials (CPs) contain information on contact propensities among residues in native protein structures. Substitution matrices (SMs) based on CPs are applicable for the comparison of distantly related sequences. Here, contact between amino acids was estimated on the basis of the distances between side-chain terminal groups (SCTGs), which are defined as the group of side-chain heavy atoms with fixed distances between them. In this paper, two new types of CPs and similarity matrices have been constructed: one based on a fixed cutoff distance obtained from geometric characteristics of the SCTGs (TGC1), the other a distance-dependent potential (TGC2). These matrices are compared with other popular SMs. The performance of the matrices was evaluated by comparing sequence alignments with structural alignments. The obtained results show that TGC2 has the best performance among contact-based matrices, but on the whole, contact-based matrices have slightly lower performance than other SMs except at the level of fold similarity.
Affiliation(s)
- Boris Vishnepolsky
- Life Science Research Centre, Laboratory of Bioinformatics, 14 Gotua St, Tbilisi, 0160, Georgia.
|
25
|
Maftei M, Tian X, Manea M, Exner TE, Schwanzar D, von Arnim CAF, Przybylski M. Interaction structure of the complex between neuroprotective factor humanin and Alzheimer's β-amyloid peptide revealed by affinity mass spectrometry and molecular modeling. J Pept Sci 2012; 18:373-82. [PMID: 22522311 DOI: 10.1002/psc.2404] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 07/03/2011] [Revised: 01/19/2012] [Accepted: 01/20/2012] [Indexed: 02/02/2023]
Abstract
Humanin (HN) is a linear 24-aa peptide recently detected in human Alzheimer's disease (AD) brain. HN specifically inhibits neuronal cell death in vitro induced by β-amyloid (Aβ) peptides and by amyloid precursor protein and its gene mutations in familial AD, thereby representing a potential therapeutic lead structure for AD; however, its molecular mechanism of action is not well understood. We report here the identification of the binding epitopes between HN and Aβ(1-40) and characterization of the interaction structure through a molecular modeling study. Wild-type HN and HN-sequence mutations were synthesized by SPPS and the HPLC-purified peptides characterized by MALDI-MS. The interaction epitopes between HN and Aβ(1-40) were identified by affinity-MS using proteolytic epitope excision and extraction, followed by elution and mass spectrometric characterization of the affinity-bound peptides. The affinity-MS analyses revealed HN(5-15) as the epitope sequence of HN, whereas Aβ(17-28) was identified as the Aβ interaction epitope. The epitopes and binding sites were ascertained by ELISA of the complex of HN peptides with immobilized Aβ(1-40) and by ELISA with Aβ(1-40) and Aβ-partial sequences as ligands to immobilized HN. The specificity and affinity of the HN-Aβ interaction were characterized by direct ESI-MS of the HN-Aβ(1-40) complex and by bioaffinity analysis using a surface acoustic wave biosensor, providing a K_D of the complex of 610 nM. A molecular dynamics simulation of the HN-Aβ(1-40) complex was consistent with the binding specificity and shielding effects of the HN and Aβ interaction epitopes. These results indicate a specific strong association of HN and Aβ(1-40) polypeptide and provide a molecular basis for understanding the neuroprotective function of HN.
Affiliation(s)
- Madalina Maftei
- Laboratory of Analytical Chemistry and Biopolymer Structure Analysis, Department of Chemistry, University of Konstanz, 78457, Konstanz, Germany
|
26
|
Du S, Harano Y, Kinoshita M, Sakurai M. A scoring function based on solvation thermodynamics for protein structure prediction. Biophysics (Nagoya-shi) 2012; 8:127-38. [PMID: 27493529 PMCID: PMC4629643 DOI: 10.2142/biophysics.8.127] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/27/2012] [Accepted: 07/31/2012] [Indexed: 12/01/2022] Open
Abstract
We predict protein structure using our recently developed free energy function for describing protein stability, which is focused on solvation thermodynamics. The function is combined with the current most reliable sampling methods, i.e., fragment assembly (FA) and comparative modeling (CM). The prediction is tested using 11 small proteins for which high-resolution crystal structures are available. For 8 of these proteins, sequence similarities are found in the database, and the prediction is performed with CM. Fairly accurate models with average Cα root mean square deviation (RMSD) ∼ 2.0 Å are successfully obtained for all cases. For the rest of the target proteins, we perform the prediction following FA protocols. For 2 cases, we obtain predicted models with an RMSD ∼ 3.0 Å as the best-scored structures. For the other case, the RMSD remains larger than 7 Å. For all the 11 target proteins, our scoring function identifies the experimentally determined native structure as the best structure. Starting from the predicted structure, replica exchange molecular dynamics is performed to further refine the structures. However, we are unable to improve its RMSD toward the experimental structure. The exhaustive sampling by coarse-grained normal mode analysis around the native structures reveals that our function has a linear correlation with RMSDs < 3.0 Å. These results suggest that the function is quite reliable for the protein structure prediction while the sampling method remains one of the major limiting factors in it. The aspects through which the methodology could further be improved are discussed.
Affiliation(s)
- Shiqiao Du
- Center for Biological Resources and Informatics, Tokyo Institute of Technology, Yokohama 226-8501, Japan
- Yuichi Harano
- Institute for Protein Research, Osaka University, Suita, Osaka 565-0871, Japan
- Masahiro Kinoshita
- Institute of Advanced Energy, Kyoto University, Uji, Kyoto 611-0011, Japan
- Minoru Sakurai
- Center for Biological Resources and Informatics, Tokyo Institute of Technology, Yokohama 226-8501, Japan
|
27
|
Gniewek P, Kolinski A, Jernigan RL, Kloczkowski A. How noise in force fields can affect the structural refinement of protein models? Proteins 2011; 80:335-41. [PMID: 22223184 DOI: 10.1002/prot.23240] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/20/2011] [Revised: 10/19/2011] [Accepted: 10/30/2011] [Indexed: 12/27/2022]
Abstract
Structural refinement of predicted models of biological macromolecules using atomistic or coarse-grained molecular force fields with various degrees of error is investigated. The goal of this analysis is to estimate the probability of designing an effective structural refinement that is based on computations of conformational energies using a force field, starts from a structure predicted from the sequence (using template-based or template-free modeling), and brings the structure into closer proximity to the native state. It is widely believed that it should be possible to develop such a successful structure refinement algorithm by applying an iterative procedure with stochastic sampling and an appropriate energy function, which assesses the quality (correctness) of protein decoys. Here, noise in an artificially introduced scoring function is analyzed for a model of an ideal sampling scheme, where the underlying distribution of RMSDs is assumed to be Gaussian. Sampling of the conformational space is performed by random generation of RMSD values. We demonstrate that whenever the random noise in a force field exceeds some level, it is impossible to obtain reliable structural refinement. The magnitude of the noise above which structural refinement is, on average, impossible depends strongly on the quality of the sampling scheme and the size of the protein. Finally, possible strategies to overcome the intrinsic limitations of force fields and so support the development of successful refinement algorithms are discussed.
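The thought experiment in this abstract (Gaussian-distributed decoy RMSDs, a score corrupted by random noise, selection of the best-scored decoy) can be mimicked with a toy simulation. This is a loose sketch and not the authors' protocol: the starting RMSD of 4.0, the spread of 1.0, the 20 decoys per round, the additive Gaussian noise model and all function names are assumptions made for illustration.

```python
import random


def refine_once(start_rmsd, spread, n_decoys, noise_sigma, rng):
    """One round of idealised refinement.

    Decoy RMSDs are drawn from a Gaussian centred on start_rmsd (abs() keeps
    them non-negative); each decoy is scored as its true RMSD plus Gaussian
    noise; the best-scored decoy is returned as (selected_rmsd, improved).
    """
    rmsds = [abs(rng.gauss(start_rmsd, spread)) for _ in range(n_decoys)]
    scores = [r + rng.gauss(0.0, noise_sigma) for r in rmsds]
    selected = rmsds[scores.index(min(scores))]
    return selected, selected < start_rmsd


def success_rate(noise_sigma, trials=500, seed=0):
    """Fraction of rounds in which the selected decoy beats the start."""
    rng = random.Random(seed)
    wins = sum(refine_once(4.0, 1.0, 20, noise_sigma, rng)[1] for _ in range(trials))
    return wins / trials


# Compare a noise-free score with increasingly noisy ones.
for sigma in (0.0, 1.0, 50.0):
    print(sigma, success_rate(sigma))
```

With a noise-free score the lowest-RMSD decoy is always chosen, so nearly every round improves on the start. Once the noise is much larger than the spread of RMSDs, selection is effectively random and the success rate falls toward one half, which is the qualitative behaviour the abstract describes.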
Affiliation(s)
- Pawel Gniewek
- Faculty of Chemistry, Laboratory of Theory of Biopolymers, University of Warsaw, Warsaw, Poland; Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, Iowa
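The idealized experiment described in the abstract above (Gaussian-distributed RMSDs, a scoring function corrupted by random noise) can be illustrated with a short simulation. This is a minimal sketch, not the authors' protocol: it assumes the noise-free score equals the RMSD itself, that the force-field error is additive Gaussian noise, and that refinement amounts to picking the best-scored decoy. The function name `mean_selected_rmsd` and all default parameter values are hypothetical.

```python
import numpy as np

def mean_selected_rmsd(noise_sd, n_decoys=1000, n_trials=200,
                       rmsd_mean=4.0, rmsd_sd=1.0, seed=0):
    """Average true RMSD of the decoy that a noisy score ranks best.

    Illustrative assumptions (not taken from the paper): the ideal score
    is the RMSD itself and the scoring error is additive Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    # Ideal sampling scheme: RMSD values are drawn directly from a Gaussian.
    rmsd = np.abs(rng.normal(rmsd_mean, rmsd_sd, size=(n_trials, n_decoys)))
    # Noisy scoring function: true quality plus random error.
    score = rmsd + rng.normal(0.0, noise_sd, size=rmsd.shape)
    best = np.argmin(score, axis=1)  # decoy preferred by the noisy score
    return float(rmsd[np.arange(n_trials), best].mean())

if __name__ == "__main__":
    for sd in (0.0, 1.0, 5.0, 50.0):
        print(f"noise sd = {sd:5.1f}  mean selected RMSD = {mean_selected_rmsd(sd):.2f}")
```

As the noise standard deviation grows relative to the spread of RMSDs, the selected decoy drifts back toward the mean of the sampled distribution, which is the qualitative behavior the abstract reports.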
|
28
|
Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, Zecchina R, Sander C. Protein 3D structure computed from evolutionary sequence variation. PLoS One 2011; 6:e28766. [PMID: 22163331 PMCID: PMC3233603 DOI: 10.1371/journal.pone.0028766] [Citation(s) in RCA: 739] [Impact Index Per Article: 56.8] [Received: 11/10/2011] [Accepted: 11/14/2011] [Indexed: 11/19/2022] Open
Abstract
The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.
Affiliation(s)
- Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, United States of America.
|
29
|
Scoring function based on weighted residue network. Int J Mol Sci 2011; 12:8773-86. [PMID: 22272103 PMCID: PMC3257100 DOI: 10.3390/ijms12128773] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 09/27/2011] [Revised: 11/04/2011] [Accepted: 11/28/2011] [Indexed: 11/17/2022] Open
Abstract
Molecular docking is an important method for the research of protein-protein interaction and recognition. A protein can be considered as a network when the residues are treated as its nodes. With the contact energy between residues as link weight, a weighted residue network is constructed in this paper. Two weighted parameters (strength and weighted average nearest neighbors' degree) are introduced into this model at the same time. The stability of a protein is characterized by its strength. The global topological properties of the protein-protein complex are reflected by the weighted average nearest neighbors' degree. Based on this weighted network model and these two parameters, a new docking scoring function is proposed in this paper. The scoring and ranking for 42 systems' bound and unbounded docking results are performed with this new scoring function. Comparing the results obtained from this new scoring function with that from the pair potentials scoring function, we found that this new scoring function has a similar performance to the pair potentials on some items, and this new scoring function can get a better success rate. The calculation of this new scoring function is easy, and the result of its scoring and ranking is acceptable. This work can help us better understand the mechanisms of protein-protein interactions and recognition.
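The two network parameters named in the abstract above, strength and weighted average nearest neighbors' degree, have standard definitions in weighted-network analysis: s_i = Σ_j w_ij and k_nn,i^w = (1/s_i) Σ_j w_ij k_j, where k_j is the number of neighbors of node j. A minimal sketch follows; it assumes a symmetric matrix of non-negative link weights (for example, magnitudes of residue contact energies), and the toy matrix is hypothetical. How the paper combines these quantities into a docking score is not shown here.

```python
import numpy as np

def strength_and_wnn_degree(w):
    """Node strength and weighted average nearest-neighbors' degree.

    w: symmetric matrix of non-negative link weights (0 means no contact).
    Standard weighted-network definitions:
      s_i     = sum_j w_ij
      k_i     = number of neighbors of node i
      knn_w_i = (1 / s_i) * sum_j w_ij * k_j
    """
    w = np.asarray(w, dtype=float)
    strength = w.sum(axis=1)
    degree = (w > 0).sum(axis=1)
    # Isolated nodes (strength 0) get a value of 0 instead of a division error.
    with np.errstate(divide="ignore", invalid="ignore"):
        knn_w = np.where(strength > 0, (w @ degree) / strength, 0.0)
    return strength, knn_w

if __name__ == "__main__":
    # Hypothetical three-residue network: residue 0 contacts residues 1 and 2.
    toy = [[0, 2, 1],
           [2, 0, 0],
           [1, 0, 0]]
    s, knn = strength_and_wnn_degree(toy)
    print(s, knn)
```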
|
30
|
Sette P, Mu R, Dussupt V, Jiang J, Snyder G, Smith P, Xiao TS, Bouamr F. The Phe105 loop of Alix Bro1 domain plays a key role in HIV-1 release. Structure 2011; 19:1485-95. [PMID: 21889351 PMCID: PMC3195861 DOI: 10.1016/j.str.2011.07.016] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 04/28/2011] [Revised: 07/08/2011] [Accepted: 07/19/2011] [Indexed: 01/07/2023]
Abstract
Alix and cellular paralogs HD-PTP and Brox contain N-terminal Bro1 domains that bind ESCRT-III CHMP4. In contrast to HD-PTP and Brox, expression of the Bro1 domain of Alix alleviates HIV-1 release defects that result from interrupted access to ESCRT. In an attempt to elucidate this functional discrepancy, we solved the crystal structures of the Bro1 domains of HD-PTP and Brox. They revealed typical "boomerang" folds they share with the Bro1 Alix domain. However, they each contain unique structural features that may be relevant to their specific function(s). In particular, phenylalanine residue in position 105 (Phe105) of Alix belongs to a long loop that is unique to its Bro1 domain. Concurrently, mutation of Phe105 and surrounding residues at the tip of the loop compromise the function of Alix in HIV-1 budding without affecting its interactions with Gag or CHMP4. These studies identify a new functional determinant in the Bro1 domain of Alix.
Affiliation(s)
- Paola Sette
- Laboratory of Molecular Microbiology, Structural Immunobiology Unit, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, 20892, MD, USA
- Ruiling Mu
- Laboratory of Immunology, Structural Immunobiology Unit, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, 20892, MD, USA
- Vincent Dussupt
- Laboratory of Molecular Microbiology, Structural Immunobiology Unit, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, 20892, MD, USA
- Jiansheng Jiang
- Laboratory of Immunology, Structural Immunobiology Unit, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, 20892, MD, USA
- Greg Snyder
- Laboratory of Immunology, Structural Immunobiology Unit, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, 20892, MD, USA
- Patrick Smith
- Laboratory of Immunology, Structural Immunobiology Unit, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, 20892, MD, USA
- Tsan Sam Xiao
- Laboratory of Immunology, Structural Immunobiology Unit, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, 20892, MD, USA
- Corresponding authors. Laboratory of Molecular Microbiology, NIAID, NIH, 4 Center Dr, Bethesda, MD, 20892, Phone: 301 496 4099, Fax: 301 402 0226. Laboratory of Immunology, NIAID, NIH, 4 Center Dr, Bethesda, MD, 20892, Phone: 301 402 9782, Fax: 301 480 1291.
- Fadila Bouamr
- Laboratory of Molecular Microbiology, Structural Immunobiology Unit, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, 20892, MD, USA
- Corresponding authors. Laboratory of Molecular Microbiology, NIAID, NIH, 4 Center Dr, Bethesda, MD, 20892, Phone: 301 496 4099, Fax: 301 402 0226. Laboratory of Immunology, NIAID, NIH, 4 Center Dr, Bethesda, MD, 20892, Phone: 301 402 9782, Fax: 301 480 1291.
|
31
|
Hoque MT, Chetty M, Lewis A, Sattar A. Twin removal in genetic algorithms for protein structure prediction using low-resolution model. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:234-245. [PMID: 21071811 DOI: 10.1109/tcbb.2009.34] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 05/30/2023]
Abstract
This paper presents the impact of twins and the measures for their removal from the population of a genetic algorithm (GA) when applied to effective conformational searching. It is conclusively shown that a twin removal strategy for a GA provides considerably enhanced performance when investigating solutions to complex ab initio protein structure prediction (PSP) problems in a low-resolution model. Without twin removal, GA crossover and mutation operations can become ineffectual as generations lose their ability to produce significant differences, which can lead to the solution stalling. The paper relaxes the definition of chromosomal twins in the removal strategy to encompass not only identical but also highly correlated chromosomes within the GA population, with empirical results consistently exhibiting significant improvements in solving PSP problems.
Affiliation(s)
- Md Tamjidul Hoque
- Griffith University, Nathan campus, 170 Kessels Road, Nathan, Brisbane, Qld 4111, Australia.
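The relaxed twin definition described in the abstract above (identical or highly correlated chromosomes) can be sketched as a simple population filter. This is an illustrative stand-in, not the paper's procedure: similarity is measured here as the fraction of matching genes between equal-length chromosomes, and the names `remove_twins` and `twin_threshold` as well as the move-string encoding are hypothetical.

```python
def remove_twins(population, twin_threshold=1.0):
    """Drop chromosomes that are twins of one already kept.

    Two equal-length chromosomes count as twins when the fraction of
    matching genes is >= twin_threshold. A threshold of 1.0 removes only
    identical copies; lower values also remove near-duplicates.
    """
    def similarity(a, b):
        return sum(x == y for x, y in zip(a, b)) / len(a)

    kept = []
    for chrom in population:
        # Keep the chromosome only if it is sufficiently different from all kept ones.
        if all(similarity(chrom, other) < twin_threshold for other in kept):
            kept.append(chrom)
    return kept

if __name__ == "__main__":
    # Hypothetical lattice-model chromosomes encoded as move strings.
    pop = ["LRFFL", "LRFFL", "LRFFR", "RRLLF"]
    print(remove_twins(pop, 1.0))  # identical twins only
    print(remove_twins(pop, 0.8))  # relaxed definition
```

In a GA loop, a filter like this would typically run after crossover and mutation, with removed twins replaced by fresh random chromosomes to keep the population size constant; that replacement step is omitted here.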
|
32
|
Hu X, Hu H, Beratan DN, Yang W. A gradient-directed Monte Carlo approach for protein design. J Comput Chem 2010; 31:2164-8. [PMID: 20186860 DOI: 10.1002/jcc.21506] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/07/2022]
Abstract
We develop a new global optimization strategy, gradient-directed Monte Carlo (GDMC) sampling, to optimize protein sequence for a target structure using RosettaDesign. GDMC significantly improves the sampling of sequence space, compared to the classical Monte Carlo search protocol, for a fixed backbone conformation as well as for the simultaneous optimization of sequence and structure. As such, GDMC sampling enhances the efficiency of protein design.
Affiliation(s)
- Xiangqian Hu
- Department of Chemistry, French Family Science Center, Duke University, Durham, North Carolina 27708-0346, USA
|
33
|
Hirst SJ, Alexander N, McHaourab HS, Meiler J. RosettaEPR: an integrated tool for protein structure determination from sparse EPR data. J Struct Biol 2010; 173:506-14. [PMID: 21029778 DOI: 10.1016/j.jsb.2010.10.013] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.9] [Received: 06/03/2010] [Revised: 10/19/2010] [Accepted: 10/21/2010] [Indexed: 11/17/2022]
Abstract
Site-directed spin labeling electron paramagnetic resonance (SDSL-EPR) is often used for the structural characterization of proteins that elude other techniques, such as X-ray crystallography and nuclear magnetic resonance (NMR). However, high-resolution structures are difficult to obtain due to uncertainty in the spin label location and sparseness of experimental data. Here, we introduce RosettaEPR, which has been designed to improve de novo high-resolution protein structure prediction using sparse SDSL-EPR distance data. The "motion-on-a-cone" spin label model is converted into a knowledge-based potential, which was implemented as a scoring term in Rosetta. RosettaEPR increased the fractions of correctly folded models ( [Formula: see text] <7.5Å) and models accurate at medium resolution ( [Formula: see text] <3.5Å) by 25%. The correlation of score and model quality increased from 0.42 when using no restraints to 0.51 when using bounded restraints and again to 0.62 when using RosettaEPR. This allowed for the selection of accurate models by score. After full-atom refinement, RosettaEPR yielded a 1.7Å model of T4-lysozyme, thus indicating that atomic detail models can be achieved by combining sparse EPR data with Rosetta. While these results indicate RosettaEPR's potential utility in high-resolution protein structure prediction, they are based on a single example. In order to affirm the method's general performance, it must be tested on a larger and more versatile dataset of proteins.
Affiliation(s)
- Stephanie J Hirst
- Center for Structural Biology, Vanderbilt University, Nashville, TN 37212, USA
|
34
|
|
35
|
Pierri CL, Parisi G, Porcelli V. Computational approaches for protein function prediction: a combined strategy from multiple sequence alignment to molecular docking-based virtual screening. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2010; 1804:1695-712. [PMID: 20433957 DOI: 10.1016/j.bbapap.2010.04.008] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.2] [Received: 01/19/2010] [Revised: 03/04/2010] [Accepted: 04/14/2010] [Indexed: 12/12/2022]
Abstract
The functional characterization of proteins represents a daily challenge for biochemical, medical and computational sciences. Although finally proved on the bench, the function of a protein can be successfully predicted by computational approaches that drive the further experimental assays. Current methods for comparative modeling allow the construction of accurate 3D models for proteins of unknown structure, provided that a crystal structure of a homologous protein is available. Binding regions can be proposed by using binding site predictors, data inferred from homologous crystal structures, and data provided from a careful interpretation of the multiple sequence alignment of the investigated protein and its homologs. Once the location of a binding site has been proposed, chemical ligands that have a high likelihood of binding can be identified by using ligand docking and structure-based virtual screening of chemical libraries. Most docking algorithms allow building a list sorted by energy of the lowest energy docking configuration for each ligand of the library. In this review the state-of-the-art of computational approaches in 3D protein comparative modeling and in the study of protein-ligand interactions is provided. Furthermore a possible combined/concerted multistep strategy for protein function prediction, based on multiple sequence alignment, comparative modeling, binding region prediction, and structure-based virtual screening of chemical libraries, is described by using suitable examples. As practical examples, Abl-kinase molecular modeling studies, HPV-E6 protein multiple sequence alignment analysis, and some other model docking-based characterization reports are briefly described to highlight the importance of computational approaches in protein function prediction.
Affiliation(s)
- Ciro Leonardo Pierri
- Department of Pharmaco-Biology, Laboratory of Biochemistry and Molecular Biology, University of Bari, Via E. Orabona, 4 - 70125 Bari, Italy.
|
36
|
Kaufmann KW, Lemmon GH, Deluca SL, Sheehan JH, Meiler J. Practically useful: what the Rosetta protein modeling suite can do for you. Biochemistry 2010; 49:2987-98. [PMID: 20235548 PMCID: PMC2850155 DOI: 10.1021/bi902153g] [Citation(s) in RCA: 282] [Impact Index Per Article: 20.1] [Indexed: 12/28/2022]
Abstract
The objective of this review is to enable researchers to use the software package Rosetta for biochemical and biomedicinal studies. We provide a brief review of the six most frequent research problems tackled with Rosetta. For each of these six tasks, we provide a tutorial that illustrates a basic Rosetta protocol. The Rosetta method was originally developed for de novo protein structure prediction and is regularly one of the best performers in the community-wide biennial Critical Assessment of Structure Prediction. Predictions for protein domains with fewer than 125 amino acids regularly have a backbone root-mean-square deviation of better than 5.0 Å. More impressively, there are several cases in which Rosetta has been used to predict structures with atomic level accuracy better than 2.5 Å. In addition to de novo structure prediction, Rosetta also has methods for molecular docking, homology modeling, determining protein structures from sparse experimental NMR or EPR data, and protein design. Rosetta has been used to accurately design a novel protein structure, predict the structure of protein−protein complexes, design altered specificity protein−protein and protein−DNA interactions, and stabilize proteins and protein complexes. Most recently, Rosetta has been used to solve the X-ray crystallographic phase problem.
Affiliation(s)
- Kristian W Kaufmann
- Department of Chemistry, Vanderbilt University, 7330 Stevenson Center, Station B 351822, Nashville, Tennessee 37235, USA
|
37
|
Maupetit J, Derreumaux P, Tufféry P. A fast method for large-scale de novo peptide and miniprotein structure prediction. J Comput Chem 2010; 31:726-38. [PMID: 19569182 DOI: 10.1002/jcc.21365] [Citation(s) in RCA: 93] [Impact Index Per Article: 6.6] [Indexed: 12/12/2022]
Abstract
Although peptides have many biological and biomedical implications, an accurate method predicting their equilibrium structural ensembles from amino acid sequences and suitable for large-scale experiments is still missing. We introduce a new approach, PEP-FOLD, to the de novo prediction of peptides and miniproteins. It first predicts, in the terms of a Hidden Markov Model-derived structural alphabet, a limited number of local conformations at each position of the structure. It then performs their assembly using a greedy procedure driven by a coarse-grained energy score. On a benchmark of 52 peptides with 9-23 amino acids, PEP-FOLD generates lowest-energy conformations within 2.8 and 2.3 Å Cα root-mean-square deviation from the full nuclear magnetic resonance (NMR) structures and the NMR rigid cores, respectively, outperforming previous approaches. For 13 miniproteins with 27-49 amino acids, PEP-FOLD reaches an accuracy of 3.6 and 4.6 Å Cα root-mean-square deviation for the most-native and lowest-energy conformations, using the nonflexible regions identified by NMR. PEP-FOLD simulations are fast (a few minutes only), therefore opening the door to in silico large-scale rational design of new bioactive peptides and miniproteins.
Affiliation(s)
- Julien Maupetit
- MTi, INSERM UMR-S973 and RPBS, Université Paris Diderot - Paris 7, 5 rue Marie-Andrée Lagroua Weill-Halle, 75205 Paris, Cedex 13, France
|
38
|
Menke M, Berger B, Cowen L. Markov random fields reveal an N-terminal double beta-propeller motif as part of a bacterial hybrid two-component sensor system. Proc Natl Acad Sci U S A 2010; 107:4069-74. [PMID: 20147619 PMCID: PMC2819974 DOI: 10.1073/pnas.0909950107] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 11/18/2022] Open
Abstract
The recent explosion in newly sequenced bacterial genomes is outpacing the capacity of researchers to try to assign functional annotation to all the new proteins. Hence, computational methods that can help predict structural motifs provide increasingly important clues in helping to determine how these proteins might function. We introduce a Markov Random Field approach tailored for recognizing proteins that fold into mainly beta-structural motifs, and apply it to build recognizers for the beta-propeller shapes. As an application, we identify a potential class of hybrid two-component sensor proteins, that we predict contain a double-propeller domain.
Affiliation(s)
- Matt Menke
- Tufts University, Medford, MA 02155; and
- Massachusetts Institute of Technology, Cambridge, MA 02139
- Bonnie Berger
- Massachusetts Institute of Technology, Cambridge, MA 02139
|
39
|
Feng Y, Kloczkowski A, Jernigan RL. Potentials 'R' Us web-server for protein energy estimations with coarse-grained knowledge-based potentials. BMC Bioinformatics 2010; 11:92. [PMID: 20163737 PMCID: PMC3098114 DOI: 10.1186/1471-2105-11-92] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Received: 07/31/2009] [Accepted: 02/17/2010] [Indexed: 11/13/2022] Open
Abstract
Background: Knowledge-based potentials have been widely used in the last 20 years for fold recognition, protein structure prediction from amino acid sequence, ligand binding, protein design, and many other purposes. However, these are generally not readily accessible online. Results: Our new knowledge-based potential server makes available many of these potentials for easy use to automatically compute the energies of protein structures or models supplied. Our web server for protein energy estimation uses four-body potentials, short-range potentials, and 23 different two-body potentials. Users can select potentials according to their needs and preferences. Files containing the coordinates of protein atoms in the PDB format can be uploaded as input. The results will be returned to the user's email address. Conclusions: Our Potentials 'R' Us server is an easily accessible, freely available tool with a web interface that collects all existing and future protein coarse-grained potentials and computes energies of multiple structural models.
Affiliation(s)
- Yaping Feng
- Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA 50011-0320, USA
|
40
|
Zhou T, Shu N, Hovmöller S. A novel method for accurate one-dimensional protein structure prediction based on fragment matching. Bioinformatics 2009; 26:470-7. [DOI: 10.1093/bioinformatics/btp679] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022] Open
|
41
|
He Y, Xiao Y, Liwo A, Scheraga HA. Exploring the parameter space of the coarse-grained UNRES force field by random search: selecting a transferable medium-resolution force field. J Comput Chem 2009; 30:2127-35. [PMID: 19242966 DOI: 10.1002/jcc.21215] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.0] [Indexed: 11/10/2022]
Abstract
We explored the energy-parameter space of our coarse-grained UNRES force field for large-scale ab initio simulations of protein folding, to obtain good initial approximations for hierarchical optimization of the force field with new virtual-bond-angle bending and side-chain-rotamer potentials which we recently introduced to replace the statistical potentials. 100 sets of energy-term weights were generated randomly, and good sets were selected by carrying out replica-exchange molecular dynamics simulations of two peptides with a minimal alpha-helical and a minimal beta-hairpin fold, respectively: the tryptophan cage (PDB code: 1L2Y) and tryptophan zipper (PDB code: 1LE1). Eight sets of parameters produced native-like structures of these two peptides. These eight sets were tested on two larger proteins: the engrailed homeodomain (PDB code: 1ENH) and FBP WW domain (PDB code: 1E0L); two sets were found to produce native-like conformations of these proteins. These two sets were tested further on a larger set of nine proteins with alpha or alpha + beta structure and found to locate native-like structures of most of them. These results demonstrate that, in addition to finding reasonable initial starting points for optimization, an extensive search of parameter space is a powerful method to produce a transferable force field.
Affiliation(s)
- Yi He
- Biomolecular Physics and Modeling Group, School of Physics, Huazhong University of Science and Technology, Wuhan 430074, China
|
42
|
Pokarowski P, Kloczkowski A, Nowakowski S, Pokarowska M, Jernigan RL, Kolinski A. Ideal amino acid exchange forms for approximating substitution matrices. Proteins 2009; 69:379-93. [PMID: 17623859 DOI: 10.1002/prot.21509] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/11/2022]
Abstract
We have analyzed 29 published substitution matrices (SMs) and five statistical protein contact potentials (CPs) for comparison. We find that popular, 'classical' SMs obtained mainly from sequence alignments of globular proteins are mostly correlated by at least a value of 0.9. The BLOSUM62 is the central element of this group. A second group includes SMs derived from alignments of remote homologs or transmembrane proteins. These matrices correlate better with classical SMs (0.8) than among themselves (0.7). A third group consists of intermediate links between SMs and CPs: matrices and potentials that exhibit mutual correlations of at least 0.8. Next, we show that SMs can be approximated with a correlation of 0.9 by expressions c(0) + x(i)x(j) + y(i)y(j) + z(i)z(j), 1 ≤ i, j ≤ 20, where c(0) is a constant and the vectors (x(i)), (y(i)), (z(i)) correlate highly with hydrophobicity, molecular volume and coil preferences of amino acids, respectively. The present paper is the continuation of our work (Pokarowski et al., Proteins 2005;59:49-57), where similar approximations were used to derive ideal amino acid interaction forms from CPs. Both approximations allow us to understand general trends in amino acid similarity and can help improve multiple sequence alignments using the fast Fourier transform (MAFFT), fast threading or other methods based on alignments of physicochemical profiles of protein sequences. The use of this approximation in sequence alignments instead of a classical SM yields results that differ by less than 5%. Intermediate links between SMs and CPs, new formulas for approximating these matrices, and the highly significant dependence of classical SMs on coil preferences are new findings.
Affiliation(s)
- Piotr Pokarowski
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, Warsaw University, 02-097 Warsaw, Poland.
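The approximation form quoted in the abstract above, c(0) + x(i)x(j) + y(i)y(j) + z(i)z(j), is a constant plus a rank-3 symmetric term. One way such a form can be obtained is sketched below using an eigendecomposition; this is a stand-in chosen for illustration, since the abstract does not state how the authors fitted their vectors. The choice of c(0) as the matrix mean, the clipping of negative eigenvalues, and the function name `rank3_form` are all assumptions.

```python
import numpy as np

def rank3_form(sm):
    """Approximate a symmetric matrix as c0 + x_i*x_j + y_i*y_j + z_i*z_j.

    Illustrative method: take c0 as the matrix mean (an assumption) and
    build x, y, z from the three largest eigenpairs of sm - c0.
    """
    sm = np.asarray(sm, dtype=float)
    c0 = sm.mean()
    vals, vecs = np.linalg.eigh(sm - c0)   # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:3]       # indices of the three largest
    lam = np.clip(vals[top], 0.0, None)    # the form needs non-negative weights
    xyz = vecs[:, top] * np.sqrt(lam)      # columns are the vectors x, y, z
    approx = c0 + xyz @ xyz.T
    return c0, xyz, approx

if __name__ == "__main__":
    # Synthetic 20 x 20 symmetric matrix standing in for a substitution matrix.
    rng = np.random.default_rng(0)
    a = rng.normal(size=(20, 20))
    sm = (a + a.T) / 2
    c0, xyz, approx = rank3_form(sm)
    r = np.corrcoef(sm.ravel(), approx.ravel())[0, 1]
    print(f"c0 = {c0:.3f}, correlation with original = {r:.3f}")
```

For a real substitution matrix, the resulting columns could then be compared with hydrophobicity, volume, and coil-preference scales, as the abstract describes; no such data are included here.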
|
43
|
Wei ZJ, Hong GY, Wei HY, Jiang ST, Lu C. Molecular characters and expression analysis of the gene encoding eclosion hormone from the Asian corn borer, Ostrinia furnacalis. ACTA ACUST UNITED AC 2009; 19:301-7. [PMID: 17852339 DOI: 10.1080/10425170701605849] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
Abstract
Using rapid amplification of cDNA ends (RACE), the cDNA encoding eclosion hormone (EH) was cloned from the brain of Ostrinia furnacalis. The full Osf-EH cDNA is 986 bp and contains a 267 bp open reading frame encoding an 88 amino acid preprohormone, which includes a hydrophobic 26 amino acid signal peptide and a 62 amino acid mature peptide. The mature Osf-EH shows high identity with Manduca sexta (95.2%), Helicoverpa armigera (91.9%) and Bombyx mori (85.5%), but low identity with Tribolium castaneum (63.6%), Drosophila melanogaster (56.5%) and Apis mellifera (54.8%). Using the HMMSTR Prediction Server, the 3D structure of Osf-EH was modeled. There are four beta-turns and three alpha-helices predicted in Osf-EH, with the pattern of beta-beta-alpha-alpha-beta-beta-alpha. Northern blot analysis indicated a 1.0 kb transcript present only in the brain. The Osf-EH mRNA cannot be detected in other neural tissues, such as the suboesophageal ganglion, thoracic ganglion and abdominal ganglion, or in non-neural tissues, such as the midgut, fat body and epidermis. The Osf-EH mRNA content in the brain was measured using the combined method of quantitative RT-PCR and Southern blotting, and reached its highest level the day before the molt.
Affiliation(s)
- Zhao-Jun Wei
- Department of Biotechnology, Hefei University of Technology, Hefei, People's Republic of China.
|
44
|
Khatib F, Rohl CA, Karplus K. Pokefind: a novel topological filter for use with protein structure prediction. Bioinformatics 2009; 25:i281-8. [PMID: 19478000 PMCID: PMC2687952 DOI: 10.1093/bioinformatics/btp198] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Our focus has been on detecting topological properties that are rare in real proteins, but occur more frequently in models generated by protein structure prediction methods such as Rosetta. We previously created the Knotfind algorithm, successfully decreasing the frequency of knotted Rosetta models during CASP6. We observed an additional class of knot-like loops that appeared to be equally un-protein-like and yet do not contain a mathematical knot. These topological features are commonly referred to as slip-knots and are caused by the same mechanisms that result in knotted models. Slip-knots are undetectable by the original Knotfind algorithm. We have generalized our algorithm to detect them, and analyzed CASP6 models built using the Rosetta loop modeling method. RESULTS: After analyzing known protein structures in the PDB, we found that slip-knots do occur in certain proteins, but are rare and fall into a small number of specific classes. Our group used this new Pokefind algorithm to distinguish between these rare real slip-knots and the numerous classes of slip-knots that we discovered in Rosetta models and models submitted by the various CASP7 servers. The goal of this work is to improve future models created by protein structure prediction methods. Both algorithms are able to detect un-protein-like features that current metrics such as GDT are unable to identify, so these topological filters can also be used as additional assessment tools.
Affiliation(s)
- Firas Khatib
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064, USA.
|
45
|
Maupetit J, Derreumaux P, Tuffery P. PEP-FOLD: an online resource for de novo peptide structure prediction. Nucleic Acids Res 2009; 37:W498-503. [PMID: 19433514 PMCID: PMC2703897 DOI: 10.1093/nar/gkp323] [Citation(s) in RCA: 282] [Impact Index Per Article: 18.8] [Indexed: 12/28/2022] Open
Abstract
Rational peptide design and large-scale prediction of peptide structure from sequence remain a challenge for chemical biologists. We present PEP-FOLD, an online service aimed at de novo modelling of 3D conformations for peptides between 9 and 25 amino acids in aqueous solution. Using a hidden Markov model-derived structural alphabet (SA) of 27 four-residue letters, PEP-FOLD first predicts the SA letter profiles from the amino acid sequence and then assembles the predicted fragments by a greedy procedure driven by a modified version of the OPEP coarse-grained force field. Starting from an amino acid sequence, PEP-FOLD performs a series of 50 simulations and returns the most representative conformations identified in terms of energy and population. Using a benchmark of 25 peptides with 9–23 amino acids, and considering the reproducibility of the runs, we find that, on average, PEP-FOLD locates lowest energy conformations differing by 2.6 Å Cα root mean square deviation from the full NMR structures. PEP-FOLD can be accessed at http://bioserv.rpbs.univ-paris-diderot.fr/PEP-FOLD
Affiliation(s)
- Julien Maupetit
- MTi, INSERM UMR-S 973, - Paris 7, 35 rue H. Brion, F75205, Paris, France
|
46
|
Brunette TJ, Brock O. Guiding conformation space search with an all-atom energy potential. Proteins 2008; 73:958-72. [PMID: 18536015 DOI: 10.1002/prot.22123] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Indexed: 11/11/2022]
Abstract
The most significant impediment for protein structure prediction is the inadequacy of conformation space search. Conformation space is too large and the energy landscape too rugged for existing search methods to consistently find near-optimal minima. To alleviate this problem, we present model-based search, a novel conformation space search method. Model-based search uses highly accurate information obtained during search to build an approximate, partial model of the energy landscape. Model-based search aggregates information in the model as it progresses, and in turn uses this information to guide exploration toward regions most likely to contain a near-optimal minimum. We validate our method by predicting the structure of 32 proteins, ranging in length from 49 to 213 amino acids. Our results demonstrate that model-based search is more effective at finding low-energy conformations in high-dimensional conformation spaces than existing search methods. The reduction in energy translates into structure predictions of increased accuracy.
Affiliation(s)
- T J Brunette
- Robotics and Biology Laboratory, Department of Computer Science, University of Massachusetts Amherst, Amherst, Massachusetts 01003-9264, USA
|
47
|
Maisuradze GG, Liwo A, Scheraga HA. Principal component analysis for protein folding dynamics. J Mol Biol 2008; 385:312-29. [PMID: 18952103 DOI: 10.1016/j.jmb.2008.10.018] [Citation(s) in RCA: 266] [Impact Index Per Article: 16.6] [Received: 06/17/2008] [Revised: 09/01/2008] [Accepted: 10/03/2008] [Indexed: 12/01/2022]
Abstract
Protein folding is considered here by studying the dynamics of the folding of the triple beta-strand WW domain from the Formin-binding protein 28. Starting from the unfolded state and ending either in the native or nonnative conformational states, trajectories are generated with the coarse-grained united residue (UNRES) force field. The effectiveness of principal components analysis (PCA), an already established mathematical technique for finding global, correlated motions in atomic simulations of proteins, is evaluated here for coarse-grained trajectories. The problems related to PCA and their solutions are discussed. The folding and nonfolding of proteins are examined with free-energy landscapes. Detailed analyses of many folding and nonfolding trajectories at different temperatures show that PCA is very efficient for characterizing the general folding and nonfolding features of proteins. It is shown that the first principal component captures and describes in detail the dynamics of a system. Anomalous diffusion in the folding/nonfolding dynamics is examined by the mean-square displacement (MSD) and the fractional diffusion and fractional kinetic equations. The collisionless (or ballistic) behavior of a polypeptide undergoing Brownian motion along the first few principal components is accounted for.
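To make the PCA step mentioned in this abstract concrete, the sketch below extracts the first principal component from a toy set of trajectory frames and projects each frame onto it. This is a generic illustration, not the authors' procedure: the function names, the power-iteration solver, and the two-coordinate toy data are assumptions chosen to keep the example dependency-free.

```python
import math


def first_principal_component(frames, iterations=200):
    """Return (mean, unit component, eigenvalue) for a list of coordinate vectors.

    The leading eigenvector of the covariance matrix is found by power
    iteration. Assumption: the start vector is not orthogonal to it.
    """
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(dim)]
    centered = [[f[i] - mean[i] for i in range(dim)] for f in frames]
    # Sample covariance matrix of the centered coordinates.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    vec = [1.0 / (i + 1) for i in range(dim)]  # deliberately asymmetric start
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break  # zero variance: no direction to find
        vec = [x / norm for x in nxt]
    length = math.sqrt(sum(x * x for x in vec))
    vec = [x / length for x in vec]
    # Rayleigh quotient gives the variance captured along the component.
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(dim))
                     for i in range(dim))
    return mean, vec, eigenvalue


def project(frame, mean, component):
    """Coordinate of one frame along the principal component."""
    return sum((x - m) * c for x, m, c in zip(frame, mean, component))


# Toy "trajectory": four frames with two coordinates each (made-up numbers).
frames = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
mean, component, eigenvalue = first_principal_component(frames)
projections = [project(f, mean, component) for f in frames]
print(component, eigenvalue, projections)
```

For real trajectories, each frame would hold all (fitted) coordinates of one snapshot, and a full eigendecomposition from a numerical library would normally replace the power iteration.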
Affiliation(s)
- Gia G Maisuradze
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, USA
|
48
|
Zhang H, Zhang T, Chen K, Shen S, Ruan J, Kurgan L. Sequence based residue depth prediction using evolutionary information and predicted secondary structure. BMC Bioinformatics 2008; 9:388. [PMID: 18803867 PMCID: PMC2567998 DOI: 10.1186/1471-2105-9-388] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Received: 04/10/2008] [Accepted: 09/20/2008] [Indexed: 11/29/2022]
Abstract
Background: Residue depth allows determining how deeply a given residue is buried, in contrast to the solvent accessibility that differentiates between buried and solvent-exposed residues. When compared with the solvent accessibility, the depth allows studying deep-level structures and functional sites, and formation of the protein folding nucleus. Accurate prediction of residue depth would provide valuable information for fold recognition, prediction of functional sites, and protein design. Results: A new method, RDPred, for the real-value depth prediction from protein sequence is proposed. RDPred combines information extracted from the sequence, PSI-BLAST scoring matrices, and secondary structure predicted with PSIPRED. Three-fold/ten-fold cross validation based tests performed on three independent, low-identity datasets show that the distance based depth (computed using MSMS) predicted by RDPred is characterized by 0.67/0.67, 0.66/0.67, and 0.64/0.65 correlation with the actual depth, by the mean absolute errors equal to 0.56/0.56, 0.61/0.60, and 0.58/0.57, and by the mean relative errors equal to 17.0%/16.9%, 18.2%/18.1%, and 17.7%/17.6%, respectively. The mean absolute and the mean relative errors are shown to be statistically significantly better when compared with a method recently proposed by Yuan and Wang [Proteins 2008; 70:509–516]. The results show that three-fold cross validation underestimates the variability of the prediction quality when compared with the results based on the ten-fold cross validation. We also show that the hydrophilic and flexible residues are predicted more accurately than hydrophobic and rigid residues. Similarly, the charged residues that include Lys, Glu, Asp, and Arg are the most accurately predicted. Our analysis reveals that evolutionary information encoded using PSSM is characterized by stronger correlation with the depth for hydrophilic amino acids (AAs) and aliphatic AAs when compared with hydrophobic AAs and aromatic AAs.
Finally, we show that the secondary structure of coils and strands is useful in depth prediction, in contrast to helices that have relatively uniform distribution over the protein depth. Application of the predicted residue depth to prediction of buried/exposed residues shows consistent improvements in detection rates of both buried and exposed residues when compared with the competing method. Finally, we contrasted the prediction performance among distance based (MSMS and DPX) and volume based (SADIC) depth definitions. We found that the distance based indices are harder to predict due to the more complex nature of the corresponding depth profiles. Conclusion: The proposed method, RDPred, provides statistically significantly better predictions of residue depth when compared with the competing method. The predicted depth can be used to provide improved prediction of both buried and exposed residues. The prediction of exposed residues has implications in characterization/prediction of interactions with ligands and other proteins, while the prediction of buried residues could be used in the context of folding predictions and simulations.
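The abstract evaluates real-value depth predictions with three measures: correlation with the actual depth, mean absolute error, and mean relative error. The sketch below shows one common way to compute them. It is illustrative only: the function names and toy values are hypothetical, Pearson correlation is assumed, and the relative error is assumed to be |predicted − actual| / actual, which may differ from the definition used in the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over all residues."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mean_relative_error(predicted, actual):
    """Average of |predicted - actual| / actual (assumed definition; actual > 0)."""
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)


# Toy per-residue depths (made-up numbers, not data from the paper).
actual = [2.0, 4.0, 5.0, 8.0]
predicted = [2.5, 3.5, 5.5, 7.0]

r = pearson(predicted, actual)
mae = mean_absolute_error(predicted, actual)
mre = mean_relative_error(predicted, actual)
print(f"r={r:.3f}  MAE={mae:.3f}  MRE={100 * mre:.1f}%")
```

In a cross-validation setting such as the three-fold and ten-fold tests described above, these measures would be computed on the held-out residues of each fold and then summarized.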
Affiliation(s)
- Hua Zhang
- College of Mathematical Science and LPMC, Nankai University, Tianjin, PR China.
|
49
|
Abstract
MOTIVATION: The 3D structure of a protein sequence can be assembled from the substructures corresponding to small segments of this sequence. For each small sequence segment, there are only a few likely substructures. We call them the 'structural alphabet' for this segment. Classical approaches such as ROSETTA used sequence profile and secondary structure information to predict structural fragments. In contrast, we utilize more structural information, such as solvent accessibility and contact capacity, for finding structural fragments. RESULTS: An integer linear programming technique is applied to derive the best combination of these sequence and structural information items. This approach generates significantly more accurate and succinct structural alphabets, with more than 50% improvement over the previous accuracies. With these novel structural alphabets, we are able to construct more accurate protein structures than state-of-the-art ab initio protein structure prediction programs such as ROSETTA. We are also able to reduce the size of Kolodny's library by a factor of 8, at the same accuracy. AVAILABILITY: The online FRazor server is under construction.
Affiliation(s)
- Shuai Cheng Li
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ont. N2L 3G1, Canada.
|
50
|
Indarte M, Madura JD, Surratt CK. Dopamine transporter comparative molecular modeling and binding site prediction using the LeuT(Aa) leucine transporter as a template. Proteins 2008; 70:1033-46. [PMID: 17847094 DOI: 10.1002/prot.21598] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.2] [Indexed: 11/06/2022]
Abstract
Pharmacological and behavioral studies indicate that binding of cocaine and the amphetamines by the dopamine transporter (DAT) protein is principally responsible for initiating the euphoria and addiction associated with these drugs. The lack of an X-ray crystal structure for the DAT or any other member of the neurotransmitter:sodium symporter (NSS) family has hindered understanding of psychostimulant recognition at the atomic level; structural information has been obtained largely from mutagenesis and biophysical studies. The recent publication of a crystal structure for the bacterial leucine transporter LeuT(Aa), a distantly related NSS family homolog, provides for the first time a template for three-dimensional comparative modeling of NSS proteins. A novel computational modeling approach using the capabilities of the Molecular Operating Environment program MOE 2005.06 in conjunction with other comparative modeling servers generated the LeuT(Aa)-directed DAT model. Probable dopamine and amphetamine binding sites were identified within the DAT model using multiple docking approaches. Binding sites for the substrate ligands (dopamine and amphetamine) overlapped substantially with the analogous region of the LeuT(Aa) crystal structure for the substrate leucine. The docking predictions implicated DAT side chains known to be critical for high affinity ligand binding and suggest novel mutagenesis targets in elucidating discrete substrate and inhibitor binding sites. The DAT model may guide DAT ligand QSAR studies, and rational design of novel DAT-binding therapeutics.
Affiliation(s)
- Martín Indarte
- Division of Pharmaceutical Sciences, Duquesne University, Pittsburgh, Pennsylvania 15282, USA.
|