1. Gogoi CR, Rahman A, Saikia B, Baruah A. Protein Dihedral Angle Prediction: The State of the Art. ChemistrySelect 2023. [DOI: 10.1002/slct.202203427]
Affiliation(s)
- Aziza Rahman
- Department of Chemistry, Dibrugarh University, Dibrugarh, Assam, India
- Bondeepa Saikia
- Department of Chemistry, Dibrugarh University, Dibrugarh, Assam, India
- Anupaul Baruah
- Department of Chemistry, Dibrugarh University, Dibrugarh, Assam, India
2. Villalobos-Alva J, Ochoa-Toledo L, Villalobos-Alva MJ, Aliseda A, Pérez-Escamirosa F, Altamirano-Bustamante NF, Ochoa-Fernández F, Zamora-Solís R, Villalobos-Alva S, Revilla-Monsalve C, Kemper-Valverde N, Altamirano-Bustamante MM. Protein Science Meets Artificial Intelligence: A Systematic Review and a Biochemical Meta-Analysis of an Inter-Field. Front Bioeng Biotechnol 2022; 10:788300. [PMID: 35875501 PMCID: PMC9301016 DOI: 10.3389/fbioe.2022.788300] [Received: 10/02/2021] [Accepted: 05/25/2022]
Abstract
Proteins are some of the most fascinating and challenging molecules in the universe, and they pose a big challenge for artificial intelligence. The implementation of machine learning/AI in protein science opens a world of knowledge about the workhorses of the cell and proteome homeostasis, which are essential for making life possible. This opens up epistemic horizons thanks to a coupling of human tacit–explicit knowledge with machine learning power, the benefits of which are already tangible, such as important advances in protein structure prediction. Moreover, the driving force behind the protein processes of self-organization, adjustment, and fitness requires a data space on the order of gigabytes of life data. There are many tasks, such as novel protein design, protein folding pathways, and synthetic metabolic routes, as well as protein-aggregation mechanisms, the pathogenesis of protein misfolding and disease, and proteostasis networks, that are currently unexplored or unrevealed. In this systematic review and biochemical meta-analysis, we aim to contribute to bridging the gap between what we call the binomial of artificial intelligence (AI) and protein science (PS), a growing research enterprise with exciting and promising biotechnological and biomedical applications. We undertake our task by exploring “the state of the art” in AI and machine learning (ML) applications to protein science in the scientific literature to address some critical research questions in this domain, including: What kinds of tasks have already been explored by ML approaches to protein science? What are the most common ML algorithms and databases used? What is the situational diagnosis of the AI–PS inter-field? What do ML processing steps have in common? We also formulate novel questions, such as: Is it possible to discover the rules of protein evolution with the AI–PS binomial? How do protein folding pathways evolve? What are the rules that dictate the folds?
What are the minimal nuclear protein structures? How do protein aggregates form and why do they exhibit different toxicities? What are the structural properties of amyloid proteins? How can we design an effective proteostasis network to deal with misfolded proteins? We are a cross-functional group of scientists from several academic disciplines, and we have conducted the systematic review using a variant of the PICO and PRISMA approaches. The search was carried out in four databases (PubMed, Bireme, OVID, and EBSCO Web of Science), resulting in 144 research articles. After three rounds of quality screening, 93 articles were finally selected for further analysis. A summary of our findings is as follows: regarding AI applications, there are mainly four types: 1) genomics, 2) protein structure and function, 3) protein design and evolution, and 4) drug design. In terms of the ML algorithms and databases used, supervised learning was the most common approach (85%). As for the databases used for the ML models, PDB and UniprotKB/Swissprot were the most common ones (21 and 8%, respectively). Moreover, we identified that approximately 63% of the articles organized their results into three steps, which we labeled pre-process, process, and post-process. A few studies combined data from several databases or created their own databases after the pre-process. Our main finding is that, as of today, there are no research road maps serving as guides to address gaps in our knowledge of the AI–PS binomial. All research efforts to collect, integrate multidimensional data features, and then analyze and validate them are, so far, uncoordinated and scattered throughout the scientific literature without a clear epistemic goal or connection between the studies. 
Therefore, our main contribution to the scientific literature is to offer a road map to help solve problems in drug design, protein structures, design, and function prediction while also presenting the “state of the art” on research in the AI–PS binomial until February 2021. Thus, we pave the way toward future advances in the synthetic redesign of novel proteins and protein networks and artificial metabolic pathways, learning lessons from nature for the welfare of humankind. Many of the novel proteins and metabolic pathways are currently non-existent in nature, nor are they used in the chemical industry or biomedical field.
Affiliation(s)
- Jalil Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Luis Ochoa-Toledo
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Mario Javier Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Atocha Aliseda
- Instituto de Investigaciones Filosóficas, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Fernando Pérez-Escamirosa
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Francine Ochoa-Fernández
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Ricardo Zamora-Solís
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Sebastián Villalobos-Alva
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Cristina Revilla-Monsalve
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Nicolás Kemper-Valverde
- Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
- Myriam M. Altamirano-Bustamante
- Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- *Correspondence: Myriam M. Altamirano-Bustamante
3. Accurate prediction of protein torsion angles using evolutionary signatures and recurrent neural network. Sci Rep 2021; 11:21033. [PMID: 34702851 PMCID: PMC8548351 DOI: 10.1038/s41598-021-00477-2] [Received: 07/02/2021] [Accepted: 09/27/2021]
Abstract
The amino acid sequence of a protein contains all the information necessary to specify its shape, which dictates its biological activities. However, experimentally determining the three-dimensional structure of proteins is challenging and expensive. The backbone torsion angles play a critical role in protein structure prediction, and predicting them accurately can considerably advance tertiary structure prediction by enabling efficient sampling of the large conformational space for low-energy structures. Here we propose, for the first time, evolutionary signatures computed from protein sequence profiles, together with a novel recurrent architecture, termed ESIDEN, that adopts a straightforward recurrent neural network design with a small number of learnable parameters. ESIDEN can capture useful information from both the classic and the new features, benefiting from the different ways in which its recurrent architectures process information. Compared with the widely used classic features, the new features, especially the Ramachandran basin potential, provide statistical and evolutionary information that improves prediction accuracy. On four widely used benchmark datasets, ESIDEN significantly improves the accuracy of torsion angle prediction compared with the best existing methods. As demonstrated in the present study, the predicted angles can be used as structural constraints to accurately infer protein tertiary structures. Moreover, the proposed features could pave the way for improving machine learning-based methods in protein folding, structure prediction, and function prediction. The source code and data are available at the website https://kornmann.bioch.ox.ac.uk/leri/resources/download.html .
4. Modelling and Prediction of Monthly Global Irradiation Using Different Prediction Models. Energies 2021. [DOI: 10.3390/en14082332]
Abstract
Different prediction models (multiple linear regression, support vector machines, artificial neural networks, and random forests) are applied to model the monthly global irradiation (MGI) from different input variables (latitude, longitude, and altitude of the meteorological station, month, and average temperatures, among others) for different areas of Galicia (Spain). The models were trained, validated, and queried using data from three stations, and each best model was checked at two independent stations. The results confirmed that the best methodology is the ANN model, which has the lowest RMSE in the validation and querying phases, 1226 kJ/(m²·day) and 1136 kJ/(m²·day), respectively, and also predicts well for the independent stations (2013 kJ/(m²·day) and 2094 kJ/(m²·day), respectively). Given these good results, it is worthwhile to continue designing artificial neural networks for the analysis of monthly global irradiation.
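The RMSE values quoted above follow the usual root-mean-square error definition. A minimal Python sketch, assuming paired lists of observed and predicted MGI values; the numbers in the example are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between paired observed and predicted values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical MGI values in kJ/(m²·day); not data from the study.
observed = [10000.0, 15000.0, 20000.0]
predicted = [11000.0, 14000.0, 21000.0]
print(rmse(observed, predicted))  # 1000.0
```

Because errors are squared before averaging, RMSE penalizes a few large misses more heavily than the same total error spread evenly across months.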
5. Armstrong DA, Kaas Q, Rosengren KJ. Prediction of disulfide dihedral angles using chemical shifts. Chem Sci 2018; 9:6548-6556. [PMID: 30310586 PMCID: PMC6115640 DOI: 10.1039/c8sc01423j] [Received: 03/28/2018] [Accepted: 07/02/2018]
Abstract
Cystine residues result from the formation of disulfide bonds between pairs of cysteine residues. This cross-linking of the backbone is essential for the structure and activity of peptides and proteins. The conformation of a cystine side chain can be described using five dihedral angles, χ1, χ2, χ3, χ2', and χ1', with cystines favouring certain combinations of these angles. 2D NMR spectroscopy is ideally suited for structure determination of disulfide-rich peptides because of their small size and constrained nature. However, NMR spectroscopy yields only limited information about the cystine side chain conformation, leading to ambiguity in the deduced 3D structures. Resolving accurate structures is important, as disulfide-rich peptides have proven to be promising drug candidates in a number of fields, either as bioactive leads or as scaffolds. Using a database of NMR chemical shifts combined with crystallographic structures, we have developed a method called DISH that uses support vector machines to predict the dihedral angles of cysteine side chains. It predicts χ2 angles with 91% accuracy and improves on existing prediction methods for χ1 angles, with 87% accuracy. For 81% of cysteine residues, DISH correctly predicted both the χ1 and χ2 angles. By revisiting published solution structures of peptides determined using NMR spectroscopy, we assessed the impact of additional cystine dihedral restraints on the quality of 3D models. DISH improved resolution and accuracy, highlighting its potential for improving the understanding of structure-activity relationships and the rational development of peptide drugs.
Affiliation(s)
- David A Armstrong
- The University of Queensland, Faculty of Medicine, School of Biomedical Sciences, Brisbane, Australia
- Quentin Kaas
- The University of Queensland, Institute for Molecular Biosciences, Brisbane, Australia
- K Johan Rosengren
- The University of Queensland, Faculty of Medicine, School of Biomedical Sciences, Brisbane, Australia
6. Gao Y, Wang S, Deng M, Xu J. RaptorX-Angle: real-value prediction of protein backbone dihedral angles through a hybrid method of clustering and deep learning. BMC Bioinformatics 2018; 19:100. [PMID: 29745828 PMCID: PMC5998898 DOI: 10.1186/s12859-018-2065-x]
Abstract
Background: Protein dihedral angles provide a detailed description of protein local conformation. Predicted dihedral angles can be used to narrow down the conformational space of the whole polypeptide chain significantly, thus aiding protein tertiary structure prediction. However, direct angle prediction from sequence alone is challenging. Results: In this article, we present a novel method (named RaptorX-Angle) to predict real-valued angles by combining clustering and deep learning. Tested on a subset of PDB25 and the targets in the two latest Critical Assessment of protein Structure Prediction (CASP) experiments, our method outperforms the existing state-of-the-art method SPIDER2 in terms of Pearson Correlation Coefficient (PCC) and Mean Absolute Error (MAE). Our results also show an approximately linear relationship between the real prediction errors and our estimated bounds; that is, the real prediction error can be well approximated by our estimated bounds. Conclusions: Our study provides an alternative and more accurate prediction of dihedral angles, which may facilitate protein structure prediction and functional study. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2065-x) contains supplementary material, which is available to authorized users.
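RaptorX-Angle is compared with SPIDER2 using the Pearson Correlation Coefficient (PCC). As an illustration only, the sketch below computes a plain PCC between predicted and observed angles. It treats angles as ordinary real numbers and ignores the wrap-around at ±180°, and it is not claimed to be the evaluation code used by the authors; the example angles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Angles are treated as plain real numbers; wrap-around at +/-180 degrees
    is ignored in this simplified sketch.
    """
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length sequences of length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical predicted vs. observed psi angles (degrees).
predicted_psi = [-45.0, -40.0, 130.0, 150.0]
observed_psi = [-42.0, -47.0, 125.0, 160.0]
print(round(pearson(predicted_psi, observed_psi), 3))
```

PCC measures how well predictions track the observed values linearly, while MAE measures their average distance; a method can score well on one and poorly on the other, which is why both are reported.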
Affiliation(s)
- Yujuan Gao
- Center for Quantitative Biology, Peking University, Beijing, China; Toyota Technological Institute at Chicago, 6045 S Kenwood Ave., Chicago, USA
- Sheng Wang
- Toyota Technological Institute at Chicago, 6045 S Kenwood Ave., Chicago, USA
- Minghua Deng
- Center for Quantitative Biology, Peking University, Beijing, China; School of Mathematical Sciences, Beijing, China; Center for Statistical Sciences, Beijing, China
- Jinbo Xu
- Toyota Technological Institute at Chicago, 6045 S Kenwood Ave., Chicago, USA
7. Sabirzyanov FA, Sabirzyanova TA, Rekstina VV, Adzhubei AA, Kalebina TS. C-Terminal sequence is involved in the incorporation of Bgl2p glucanosyltransglycosylase in the cell wall of Saccharomyces cerevisiae. FEMS Yeast Res 2018; 18:4768138. [PMID: 29272386 DOI: 10.1093/femsyr/fox093] [Received: 08/17/2017] [Accepted: 12/18/2017]
Abstract
A cell wall (CW) provides a protective barrier for a yeast cell and is a firm structure that nevertheless changes dynamically during cell growth. Bgl2p is a non-covalently anchored glucanosyltransglycosylase in the CW of the yeast Saccharomyces cerevisiae. The mode of its anchorage is poorly understood, although its association with CW components is tight and resistant to 1-h treatment with 1% SDS at 37°C. To demarcate the potential structural block responsible for the incorporation of Bgl2p into the CW, a bioinformatics analysis of its sequence was performed, and a conserved structural region was identified in the C-terminal region of Bgl2p that is absent from its S. cerevisiae homologues, Scw4p and Scw10p. Deletion of this region disrupted the incorporation of Bgl2p into the CW and led to release of the protein through the CW into the culture medium. Two left-handed polyproline-II helices were identified in the C-terminal region of the structure model of wild-type Bgl2p. These helices potentially formed binding sites, which were absent in the truncated protein. Using immunofluorescence microscopy, we demonstrated that C-truncated Bgl2p was exported into the culture medium and lost its previously described ability to form fibrils. It was also shown that C-terminal truncation of Bgl2p led to a more severe decrease in cell survivability under extreme conditions than BGL2 deletion.
Affiliation(s)
- F A Sabirzyanov
- Department of Molecular Biology, Biological Faculty, Lomonosov Moscow State University, Moscow 119991, Russia
- T A Sabirzyanova
- Department of Molecular Biology, Biological Faculty, Lomonosov Moscow State University, Moscow 119991, Russia
- V V Rekstina
- Department of Molecular Biology, Biological Faculty, Lomonosov Moscow State University, Moscow 119991, Russia
- A A Adzhubei
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
- T S Kalebina
- Department of Molecular Biology, Biological Faculty, Lomonosov Moscow State University, Moscow 119991, Russia
8. Yellapu NK. Molecular Modelling, Dynamics, and Docking of Membrane Proteins. Pharmaceutical Sciences 2017. [DOI: 10.4018/978-1-5225-1762-7.ch029]
Abstract
Computational tools and techniques are now among the most popular and promising means of advancing research rapidly. Molecular modelling studies play a major role in a wide variety of disciplines, especially in proteomics and drug discovery. Molecular dynamics and molecular docking algorithms have now become an essential part of daily research activities in laboratories throughout the world. These strategies are now well established and standardised for studying any specific protein of interest and drug molecule. However, considerable drawbacks remain for membrane proteins, because the currently available tools and methods cannot be applied to them directly. Modelling, dynamics, and docking studies of membrane proteins require special care and attention, as several challenges must be overcome to produce reliable results. This chapter discusses these challenges and solutions for handling membrane proteins.
9. Faraggi E, Kloczkowski A. Accurate Prediction of One-Dimensional Protein Structure Features Using SPINE-X. Methods Mol Biol 2017; 1484:45-53. [PMID: 27787819 DOI: 10.1007/978-1-4939-6406-2_5]
Abstract
Accurate prediction of protein secondary structure and other one-dimensional structure features is essential for accurate sequence alignment, three-dimensional structure modeling, and function prediction. SPINE-X is a software package that predicts secondary structure as well as accessible surface area and the dihedral angles ϕ and ψ. For secondary structure, SPINE-X achieves an accuracy of between 81% and 84%, depending on the dataset and choice of tests. The Pearson correlation coefficient for accessible surface area prediction is 0.75, and the mean absolute errors for the ϕ and ψ dihedral angles are 20° and 33°, respectively. The source code and Linux executables for SPINE-X are available from Research and Information Systems at http://mamiris.com .
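Secondary structure accuracy of this kind is commonly reported as per-residue three-state accuracy (Q3). A minimal sketch, assuming both sequences have already been reduced to the three states H (helix), E (strand), and C (coil); the reduction scheme and the example strings are assumptions, not details taken from SPINE-X.

```python
def q3_accuracy(true_states, predicted_states):
    """Fraction of residues whose three-state label (H, E or C) is correct."""
    if not true_states or len(true_states) != len(predicted_states):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_states, predicted_states))
    return correct / len(true_states)


# Hypothetical 10-residue example: two mismatches out of ten.
print(q3_accuracy("HHHHEEECCC", "HHHCEEECCH"))  # 0.8
```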
Affiliation(s)
- Eshel Faraggi
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN, 46032, USA
- Research and Information Systems, LLC, Indianapolis, IN, USA
- Andrzej Kloczkowski
- Battelle Center for Mathematical Medicine, Nationwide Children's Hospital, Columbus, OH, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
10.
Abstract
More than two decades of research have enabled dihedral angle predictions at an accuracy that makes them an interesting alternative or supplement to secondary structure prediction, providing detailed local structure information for every residue of a protein. The evolution of dihedral angle prediction methods is closely linked to advances in machine learning and other relevant technologies. Consequently, recent improvements in large-scale training of deep neural networks have led to the best method currently available, which achieves a mean absolute error of 19° for phi and 30° for psi. This performance opens interesting perspectives for the application of dihedral angle prediction in the comparison, prediction, and design of protein structures.
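Mean absolute errors for phi and psi have to respect the periodicity of angles: 170° and -170° differ by 20°, not 340°. The sketch below shows one common way of handling this; it is an assumed convention for illustration, not necessarily the exact protocol behind the 19° and 30° figures, and the example angles are hypothetical.

```python
def angular_error(predicted_deg, true_deg):
    """Absolute difference between two angles on the circle, in [0, 180] degrees."""
    diff = abs(predicted_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def angular_mae(predicted, observed):
    """Mean absolute error over paired angles, respecting periodicity."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [angular_error(p, o) for p, o in zip(predicted, observed)]
    return sum(errors) / len(errors)


print(angular_error(170.0, -170.0))                  # 20.0
print(angular_mae([170.0, -60.0], [-170.0, -50.0]))  # 15.0
```

Without the wrap-around step, a single prediction of 170° for a true angle of -170° would contribute 340° of error instead of 20°, inflating the MAE considerably.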
Affiliation(s)
- Olav Zimmermann
- Jülich Supercomputing Centre (JSC), Institute for Advanced Simulation (IAS), Forschungszentrum Jülich GmbH, 52425, Jülich, Germany.
11. Computational Approaches to Identification of Aggregation Sites and the Mechanism of Amyloid Growth. Advances in Experimental Medicine and Biology 2015; 855:213-39. [DOI: 10.1007/978-3-319-17344-3_9]
12. Singh H, Singh S, Raghava GPS. Evaluation of protein dihedral angle prediction methods. PLoS One 2014; 9:e105667. [PMID: 25166857 PMCID: PMC4148315 DOI: 10.1371/journal.pone.0105667] [Received: 12/21/2013] [Accepted: 07/26/2014]
Abstract
Tertiary structure prediction of a protein from its amino acid sequence is one of the major challenges in bioinformatics. The hierarchical approach is one of the persuasive techniques used for predicting protein tertiary structure, especially in the absence of homologous protein structures. In the hierarchical approach, intermediate states such as secondary structure, dihedral angles, and Cα-Cα distance bounds are predicted. These intermediate states are used to restrain the protein backbone and assist its correct folding. In recent years, several methods have been developed for predicting the dihedral angles of a protein, but it is difficult to conclude which method is better than the others. In this study, we benchmarked the performance of the dihedral prediction methods ANGLOR and SPINE X on various datasets, including independent datasets. The TANGLE dihedral prediction method was not benchmarked because its standalone version was unavailable; it was compared with SPINE X and ANGLOR only on the ANGLOR dataset, on which TANGLE had reported its results. SPINE X performed better than ANGLOR and TANGLE, especially in predicting the dihedral angles of glycine and proline residues. The analysis suggested that angle shifting was the foremost reason for the better performance of SPINE X. We further evaluated the methods on the independent ccPDB30 dataset and observed that SPINE X performed better than ANGLOR.
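The abstract does not specify the angle shifting credited for SPINE X's performance. As a hedged illustration of the general idea only, the hypothetical helper below re-expresses an angle on an interval whose wrap-around point lies at a chosen cut value, so that values on either side of ±180° become numerically adjacent. The cut value of -100° is an assumption for this example, not a parameter taken from SPINE X.

```python
def shift_angle(angle_deg, cut_deg=-100.0):
    """Map an angle onto the interval [cut_deg, cut_deg + 360).

    cut_deg is a hypothetical choice: placing the wrap-around point where
    few observed angles fall keeps most values on one side of the cut.
    """
    return (angle_deg - cut_deg) % 360.0 + cut_deg


# psi = -170 and psi = 170 are 20 degrees apart on the circle but 340 apart
# numerically; after shifting they become 190 and 170.
print(shift_angle(-170.0))  # 190.0
print(shift_angle(170.0))   # 170.0
```

A regression model trained on shifted targets no longer sees one population of angles split between the two ends of the numeric range, which is one plausible reason such shifting can reduce errors.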
Affiliation(s)
- Harinder Singh
- Bioinformatics Center, Institute of Microbial Technology, Chandigarh, India
- Sandeep Singh
- Bioinformatics Center, Institute of Microbial Technology, Chandigarh, India
13. Joseph AP, de Brevern AG. From local structure to a global framework: recognition of protein folds. J R Soc Interface 2014; 11:20131147. [PMID: 24740960 DOI: 10.1098/rsif.2013.1147]
Abstract
Protein folding has been a major area of research for many years. Nonetheless, the mechanisms leading to the formation of an active biological fold are still not fully understood. The huge amount of available sequence and structural information provides hints for identifying the putative fold of a given sequence. Indeed, protein structures prefer a limited number of local backbone conformations, some of which are characterized by preferences for certain amino acids. These preferences largely depend on the local structural environment. The prediction of local backbone conformations has become an important factor in correctly identifying the global protein fold. Here, we review developments in the field of local structure prediction, especially their implications for protein fold recognition.
Affiliation(s)
- Agnel Praveen Joseph
- Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
14. Feng Y, Lin H, Luo L. Prediction of protein secondary structure using feature selection and analysis approach. Acta Biotheor 2014; 62:1-14. [PMID: 24052343 DOI: 10.1007/s10441-013-9203-7] [Received: 12/29/2012] [Accepted: 08/24/2013]
Abstract
The prediction of the secondary structure of a protein from its amino acid sequence is an important step towards the prediction of its three-dimensional structure. However, the accuracy of ab initio secondary structure prediction from sequence is currently about 80%, which is still far from satisfactory. In this study, we propose a novel method that uses the binomial distribution to optimize tetrapeptide structural words and uses the increment of diversity with a quadratic discriminant to predict three-state protein secondary structure. A benchmark dataset of 2,640 proteins with less than 25% sequence identity was used to train and test the proposed method. With ten-fold cross-validation, an overall accuracy of 87.8% was achieved in secondary structure prediction. Moreover, the accuracy of predicted secondary structures ranges from 84% to 89% at the residue level. These results suggest that the feature selection technique can detect the optimized tetrapeptide structural words that affect the accuracy of predicted secondary structures.
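Ten-fold cross-validation partitions the benchmark into ten parts, training on nine and testing on the remaining one in turn, so that every protein is tested exactly once. A minimal sketch of the index bookkeeping, assuming the split is made at the level of whole proteins (the abstract does not say how folds were formed); the function name and the seed are hypothetical.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Return k (train_indices, test_indices) pairs covering n_items items.

    Each item (here assumed to be a whole protein) appears in exactly one
    test fold. The function name and the seed are illustrative choices.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # k near-equal folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


splits = k_fold_splits(2640, k=10)
print(len(splits), len(splits[0][1]))  # 10 264
```

Splitting by protein rather than by residue is one reasonable choice: it keeps residues from the same chain out of the training and test sets at the same time.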
15. Guilloux A, Caudron B, Jestin JL. A method to predict edge strands in beta-sheets from protein sequences. Comput Struct Biotechnol J 2013; 7:e201305001. [PMID: 24688737 PMCID: PMC3962219 DOI: 10.5936/csbj.201305001] [Received: 02/22/2013] [Revised: 05/27/2013] [Accepted: 05/30/2013]
Abstract
There is a need for rules allowing three-dimensional structure information to be derived from protein sequences. In this work, consideration of an elementary protein folding step allows protein sub-sequences which optimize folding to be derived for any given protein sequence. Classical mechanics applied to this system and the energy conservation law during the elementary folding step yields an equation whose solutions are taken over the field of rational numbers. This formalism is applied to beta-sheets containing two edge strands and at least two central strands. The number of protein sub-sequences optimized for folding per amino acid in beta-strands is shown in particular to predict edge strands from protein sequences. Topological information on beta-strands and loops connecting them is derived for protein sequences with a prediction accuracy of 75%. The statistical significance of the finding is given. Applications in protein structure prediction are envisioned such as for the quality assessment of protein structure models.
Affiliation(s)
- Antonin Guilloux
- Analyse algébrique, Institut de Mathématiques de Jussieu, Université Pierre et Marie Curie, Paris VI, France
- Bernard Caudron
- Centre d'Informatique pour la Biologie, Institut Pasteur, Paris, France
16. Bezsonov EE, Groenning M, Galzitskaya OV, Gorkovskii AA, Semisotnov GV, Selyakh IO, Ziganshin RH, Rekstina VV, Kudryashova IB, Kuznetsov SA, Kulaev IS, Kalebina TS. Amyloidogenic peptides of yeast cell wall glucantransferase Bgl2p as a model for the investigation of its pH-dependent fibril formation. Prion 2012. [PMID: 23208381 DOI: 10.4161/pri.22992]
Abstract
The pH dependence of the ability of Bgl2p to form fibrils was studied using synthetic peptides with potential amyloidogenic determinants (PADs) predicted in the Bgl2p sequence. Three PADs, FTIFVGV, SWNVLVA, and NAFS, were selected on the basis of a combination of computational algorithms. Peptides AEGFTIFVGV, VDSWNVLVAG, and VMANAFSYWQ, containing these PADs, were synthesized. These peptides were shown to fibrillate at pH values from 3.2 to 5.0. The PAD-containing peptides, except for VDSWNVLVAG, could also fibrillate at pH values from 5.0 to 7.6. We hypothesized that the ability of Bgl2p to form fibrils most likely depends on the coordination of the fibrillation activity of the PAD-containing regions, and that Bgl2p could fibrillate at mildly acidic and neutral pH values and lose the ability to fibrillate as pH increases. Bgl2p was shown to fibrillate at pH 5.0, to form fibrils of various morphologies at neutral pH values, and to lose its fibrillation ability at pH 7.6. These results allowed us to suggest a new, simple approach for isolating Bgl2p from the Saccharomyces cerevisiae cell wall.
Affiliation(s)
- Evgeny E Bezsonov
- Department of Molecular Biology, Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
17
Watanabe H, Elstner M, Steinbrecher T. Rotamer decomposition and protein dynamics: Efficiently analyzing dihedral populations from molecular dynamics. J Comput Chem 2012; 34:198-205. [DOI: 10.1002/jcc.23119] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 05/31/2012] [Revised: 08/10/2012] [Accepted: 08/15/2012] [Indexed: 11/11/2022]
18
Song J, Tan H, Wang M, Webb GI, Akutsu T. TANGLE: two-level support vector regression approach for protein backbone torsion angle prediction from primary sequences. PLoS One 2012; 7:e30361. [PMID: 22319565 PMCID: PMC3271071 DOI: 10.1371/journal.pone.0030361] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 10/05/2011] [Accepted: 12/14/2011] [Indexed: 12/29/2022]
Abstract
Protein backbone torsion angles Phi and Psi are the two rotation angles about the Cα-N bond (Phi) and the Cα-C bond (Psi). Because the linked peptide bonds are planar and rigid, these two angles essentially determine the backbone geometry of proteins. Accordingly, accurate prediction of protein backbone torsion angles from sequence information can assist the prediction of protein structures. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-value torsion angle prediction using a variety of features derived from amino acid sequences, including evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered regions, as well as other global sequence features. When evaluated on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle predictions are 27.8° and 44.6°, respectively, which are 1% and 3% lower, respectively, than those of ANGLOR, one of the state-of-the-art prediction tools. Moreover, TANGLE's predictions are significantly better than those of a random predictor built on an amino acid-specific basis, with p-values < 1.46e-147 and 7.97e-150, respectively, by the Wilcoxon signed rank test. As a complementary approach to current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and in assisting protein fold recognition by applying the predicted torsion angles as restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/.
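TANGLE predicts these angles from sequence; the training targets themselves come from structure. As a minimal sketch, assuming backbone atom coordinates are already available as (x, y, z) tuples, phi of residue i is the torsion C(i-1)-N(i)-Cα(i)-C(i) and psi is N(i)-Cα(i)-C(i)-N(i+1). The residue dictionary layout and the function names below are hypothetical, not part of TANGLE; the sign follows the usual convention (positive for clockwise rotation viewed along the central bond).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees, range [-180, 180], about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def backbone_phi_psi(residues):
    """residues: list of dicts with 'N', 'CA', 'C' coordinates (hypothetical layout).
    Returns one (phi, psi) pair per residue; undefined terminal angles are None."""
    angles = []
    for i, res in enumerate(residues):
        phi = psi = None
        if i > 0:
            phi = dihedral(residues[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i < len(residues) - 1:
            psi = dihedral(res["N"], res["CA"], res["C"], residues[i + 1]["N"])
        angles.append((phi, psi))
    return angles
```

The first residue has no phi and the last has no psi, which is why real-value predictors are usually evaluated on interior residues only.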
Affiliation(s)
- Jiangning Song
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
- Hao Tan
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
- Mingjun Wang
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Geoffrey I. Webb
- Faculty of Information Technology, Monash University, Melbourne, Victoria, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
19
Bayrak CS, Erman B. Predicting most probable conformations of a given peptide sequence in the random coil state. Mol Biosyst 2012; 8:3010-6. [DOI: 10.1039/c2mb25181g] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/21/2022]
20
Schudoma C, Larhlimi A, Walther D. The influence of the local sequence environment on RNA loop structures. RNA 2011; 17:1247-57. [PMID: 21628431 PMCID: PMC3138562 DOI: 10.1261/rna.2550211] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/07/2023]
Abstract
RNA folding is assumed to be a hierarchical process. The secondary structure of an RNA molecule, characterized by base-pairing and stacking interactions between the paired bases, forms first. Subsequently, the RNA molecule adopts an energetically favorable three-dimensional conformation in a structural space determined mainly by the rotational degrees of freedom of the backbone in regions of unpaired nucleotides (loops). The extent to which the backbone conformation of RNA loops also results from interactions within the local sequence context, rather than following global optimization constraints alone, has not yet been addressed. Because most base-stacking interactions are exerted locally, a critical influence of local sequence on local structure appears plausible. Thus, local loop structure ought to be predictable, at least in part, from the local sequence context alone. To test this hypothesis, we used Random Forests on a nonredundant data set of unpaired nucleotides extracted from 97 X-ray structures from the Protein Data Bank (PDB) to predict discrete backbone angle conformations given by the discretized η/θ pseudo-torsional space. Predictions on balanced sets with four to six conformational classes using local sequence information yielded average accuracies of up to 55%, significantly better than expected by chance (17%-25%). Bases close to the central nucleotide appear to be most tightly linked to its conformation. Our results suggest that RNA loop structure does not depend only on long-range base-pairing interactions; instead, the local sequence context appears to exert a significant influence on the formation of local loop structure.
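The chance levels quoted above follow from class balance: with k equally frequent classes, a uniform random guess is right 1/k of the time, so four to six classes give 25% down to about 16.7%. The sketch below shows one way to encode the "local sequence context" of a nucleotide as a one-hot window that a classifier such as a Random Forest could consume. It is an assumption for illustration, not the authors' feature set; the window half-width and the zero padding at chain ends are arbitrary choices, and `one_hot_window` is a hypothetical name.

```python
NUCLEOTIDES = "ACGU"

def one_hot_window(seq, center, half_width):
    """Concatenated one-hot vectors for positions center-half_width .. center+half_width.
    Positions beyond the chain ends (or non-ACGU symbols) are encoded as all zeros."""
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        vec = [0] * len(NUCLEOTIDES)
        if 0 <= pos < len(seq) and seq[pos] in NUCLEOTIDES:
            vec[NUCLEOTIDES.index(seq[pos])] = 1
        features.extend(vec)
    return features

def chance_accuracy(n_classes):
    """Expected accuracy of uniform random guessing on a balanced n-class problem."""
    return 1.0 / n_classes

# Chance level (%) for the four- to six-class settings mentioned in the abstract.
for k in (4, 5, 6):
    print(k, round(100 * chance_accuracy(k), 1))
```

Each window of 2 × half_width + 1 positions yields a fixed-length vector of 4 values per position, which is the kind of input most tree ensembles expect.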
Affiliation(s)
- Christian Schudoma
- Bioinformatics Group, Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam-Golm, Germany.
21
Kountouris P, Hirst JD. Prediction of backbone dihedral angles and protein secondary structure using support vector machines. BMC Bioinformatics 2009; 10:437. [PMID: 20025785 PMCID: PMC2811710 DOI: 10.1186/1471-2105-10-437] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Received: 07/09/2009] [Accepted: 12/22/2009] [Indexed: 11/26/2022]
Abstract
Background: The prediction of the secondary structure of a protein is a critical step in the prediction of its tertiary structure and, potentially, its function. Moreover, the backbone dihedral angles, which are highly correlated with secondary structure, provide crucial information about the local three-dimensional structure.
Results: We predict both the secondary structure and the backbone dihedral angles independently and combine the results in a loop so that each prediction enhances the other. Support vector machines, a state-of-the-art supervised classification technique, achieve a secondary structure prediction accuracy of 80% on a non-redundant set of 513 proteins, significantly higher than other methods on the same dataset. The dihedral angle space is divided into a number of regions using two unsupervised clustering techniques, and the method predicts the region to which a new residue belongs. The performance of our method is comparable to, and in some cases more accurate than, other multi-class dihedral prediction methods.
Conclusions: We have created an accurate predictor of backbone dihedral angles and secondary structure. Our method, called DISSPred, is available online at http://comp.chem.nottingham.ac.uk/disspred/.
Affiliation(s)
- Petros Kountouris
- School of Chemistry, University of Nottingham, University Park, Nottingham NG7 2RD, UK.
22
Helles G, Fonseca R. Predicting dihedral angle probability distributions for protein coil residues from primary sequence using neural networks. BMC Bioinformatics 2009; 10:338. [PMID: 19835576 PMCID: PMC2771020 DOI: 10.1186/1471-2105-10-338] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 04/06/2009] [Accepted: 10/16/2009] [Indexed: 11/10/2022]
Abstract
Background: Predicting the three-dimensional structure of a protein from its amino acid sequence is currently one of the most challenging problems in bioinformatics. The internal structure of helices and sheets is highly recurrent, which helps reduce the search space significantly. However, random coil segments make up nearly 40% of proteins and have no apparent recurrent patterns, which limits the overall accuracy of protein structure prediction methods. Fortunately, previous work has indicated that coil segments are not in fact completely random in structure, and flanking residues do seem to have a significant influence on the dihedral angles adopted by individual amino acids in coil segments. In this work we attempt to predict a probability distribution of these dihedral angles based on the flanking residues. While dihedral angles of coil segments have been predicted before, no previous work, to our knowledge, has presented comparable results for the probability distribution of dihedral angles.
Results: In this paper we develop an artificial neural network that uses an input window of amino acids to predict a dihedral angle probability distribution for the middle residue in the window. The trained neural network shows a significant improvement (4-68%) over baseline statistics in predicting the most probable bin (covering a 30° × 30° area of the dihedral angle space) for all amino acids in the data set. An accuracy comparable to that of secondary structure prediction (≈ 80%) is achieved by considering the 20 bins with the highest output values.
Conclusion: Many different protein structure prediction methods exist, and each uses different tools and auxiliary predictions to help determine the native structure. In this work the sequence is used to predict local, context-dependent dihedral angle propensities in coil regions. This predicted distribution can potentially improve tertiary structure prediction methods that are based on sampling the backbone dihedral angles of individual amino acids. The predicted distribution may also help predict the local structure fragments used in fragment assembly methods.
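A 30° × 30° grid splits the 360° × 360° (phi, psi) map into 12 × 12 = 144 bins, so the "20 bins with the highest output values" criterion asks whether the true bin is among the top 20 of 144 outputs. A minimal sketch of that bookkeeping follows. Placing the grid origin at -180° and flattening bins row-major by phi are assumptions, the function names are hypothetical, and the probability list stands in for the network output described in the abstract.

```python
BIN_WIDTH = 30                           # degrees, as in the abstract
N_BINS_PER_AXIS = 360 // BIN_WIDTH       # 12
N_BINS = N_BINS_PER_AXIS ** 2            # 144

def angle_to_index(angle):
    """Map an angle in degrees onto 0..11; the grid starts at -180 and wraps at 180."""
    shifted = (angle + 180.0) % 360.0
    return int(shifted // BIN_WIDTH)

def bin_index(phi, psi):
    """Flatten the 12 x 12 grid row-major by phi."""
    return angle_to_index(phi) * N_BINS_PER_AXIS + angle_to_index(psi)

def top_k_hit(probabilities, true_bin, k):
    """True if true_bin is among the k bins with the highest predicted probability."""
    ranked = sorted(range(len(probabilities)), key=lambda b: probabilities[b], reverse=True)
    return true_bin in ranked[:k]

print(N_BINS, bin_index(-60, -45))  # 144 52
```

Averaging `top_k_hit` over residues with k = 1 gives the most-probable-bin accuracy, and k = 20 gives the relaxed measure the abstract compares with secondary structure prediction.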
Affiliation(s)
- Glennie Helles
- University of Copenhagen, Department of Computer Science, Universitetsparken 1, 2100 Copenhagen, Denmark.
23
Faraggi E, Xue B, Zhou Y. Improving the prediction accuracy of residue solvent accessibility and real-value backbone torsion angles of proteins by guided-learning through a two-layer neural network. Proteins 2009; 74:847-56. [PMID: 18704931 DOI: 10.1002/prot.22193] [Citation(s) in RCA: 116] [Impact Index Per Article: 7.7] [Indexed: 11/10/2022]
Abstract
This article attempts to increase the prediction accuracy of residue solvent accessibility and real-value backbone torsion angles of proteins through improved learning. Most methods developed for improving the backpropagation algorithm of artificial neural networks are limited to small neural networks. Here, we introduce a guided-learning method suitable for networks of any size. The method uses part of the weights for guiding and the rest for training and optimization. We demonstrate this technique by predicting residue solvent accessibility and real-value backbone torsion angles of proteins. In this application, the guiding factor is designed to satisfy the intuitive condition that, for most residues, the contribution of one residue to the structural properties of another decreases as the sequence separation between the two residues increases. We show that the guided-learning method yields a 2-4% reduction in 10-fold cross-validated mean absolute errors (MAE) for predicting residue solvent accessibility and backbone torsion angles, regardless of the size of the database, the number of hidden layers and the size of the input windows. Combined with a two-layer neural network with a bipolar activation function, this leads to a new method with an MAE of 0.11 for residue solvent accessibility, 36 degrees for psi, and 22 degrees for phi. The method is available as the Real-SPINE 3.0 server at http://sparks.informatics.iupui.edu.
Affiliation(s)
- Eshel Faraggi
- Indiana University School of Informatics, Indiana University-Purdue University, Indianapolis, IN 46202, USA
24
Wu S, Zhang Y. ANGLOR: a composite machine-learning algorithm for protein backbone torsion angle prediction. PLoS One 2008; 3:e3400. [PMID: 18923703 PMCID: PMC2559866 DOI: 10.1371/journal.pone.0003400] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.7] [Received: 08/01/2008] [Accepted: 09/18/2008] [Indexed: 11/20/2022]
Abstract
We developed a composite machine-learning-based algorithm, called ANGLOR, to predict real-value protein backbone torsion angles from amino acid sequences. The input features of ANGLOR include sequence profiles, predicted secondary structure and solvent accessibility. In a large-scale benchmarking test, the mean absolute error (MAE) of the phi/psi prediction is 28°/46°, which is ∼10% lower than that of other software in the literature. The prediction is statistically different from a random predictor (or a purely secondary-structure-based predictor) with p-value < 1.0×10^-300 (or < 1.0×10^-148) by the Wilcoxon signed rank test. For some residues (ILE, LEU, PRO and VAL), and especially for residues in helices and buried regions, the MAE of phi angles is much smaller (10–20°) than in other environments. Thus, although the average accuracy of ANGLOR prediction is still low, the portion of accurately predicted dihedral angles may be useful in assisting protein fold recognition and ab initio 3D structure modeling.
Affiliation(s)
- Sitao Wu
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, Lawrence, Kansas, United States of America
- Yang Zhang
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, Lawrence, Kansas, United States of America
25
Abstract
The backbone structure of a protein is largely determined by the phi and psi torsion angles. Thus, knowing these angles, even approximately, will be very useful for protein-structure prediction. However, in previous work, sequence-based real-value prediction of the psi angle could only achieve a mean absolute error of 54 degrees (83, 35 and 33 degrees for coil, strand, and helix residues, respectively) between predicted and actual angles. Moreover, real-value prediction of the phi angle is not yet available. This article employs a neural-network-based approach that improves psi prediction by taking advantage of angle periodicity, and applies the new method to the prediction of phi angles. The 10-fold cross-validated mean absolute error of the new method is 38 degrees (58, 33 and 22 degrees for coil, strand, and helix, respectively) for psi and 25 degrees (35, 22 and 16 degrees for coil, strand, and helix, respectively) for phi. The real-value predictions are comparable to or more accurate than predictions based on multistate classification of the phi-psi map. More accurate prediction of real-value angles will likely be useful for improving the accuracy of fold recognition and ab initio protein-structure prediction. The Real-SPINE 2.0 server is available at http://sparks.informatics.iupui.edu.
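Periodicity also matters when scoring such predictions: a predicted psi of 170° against an observed -170° is off by 20°, not 340°. The sketch below computes a mean absolute error with that wrap-around. It illustrates the general idea only; the abstract does not state exactly how Real-SPINE 2.0 handles periodicity, and the function names are hypothetical.

```python
def angular_error(predicted, observed):
    """Absolute angular difference in degrees, folded into [0, 180]."""
    d = abs(predicted - observed) % 360.0
    return min(d, 360.0 - d)

def angular_mae(predictions, observations):
    """Mean absolute error over paired angles, respecting 360-degree periodicity."""
    if not predictions or len(predictions) != len(observations):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(angular_error(p, o) for p, o in zip(predictions, observations))
    return total / len(predictions)

print(angular_error(170, -170))  # 20.0, whereas a naive |difference| would give 340
```

Without the fold into [0, 180], a few residues near the ±180° boundary could dominate the reported MAE, which matters most for psi in coil and strand residues.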
Affiliation(s)
- Bin Xue
- Indiana University School of Informatics, Indiana University-Purdue University, Indianapolis, Indiana 46202, USA
26
Zhang W, Liu S, Zhou Y. SP5: improving protein fold recognition by using torsion angle profiles and profile-based gap penalty model. PLoS One 2008; 3:e2325. [PMID: 18523556 PMCID: PMC2391293 DOI: 10.1371/journal.pone.0002325] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.6] [Received: 02/16/2008] [Accepted: 04/28/2008] [Indexed: 11/19/2022]
Abstract
Recognizing the structural fold of a protein is one of the challenges in protein structure prediction. We have developed a series of single (non-consensus) methods (SPARKS, SP(2), SP(3), SP(4)) that are based on weighted matching of two to four sequence- and structure-based profiles. The accuracy and sensitivity of fold recognition improve robustly as the number of matching profiles increases. Here, we introduce a new profile-profile comparison term based on real-value dihedral torsion angles. Together with an updated real-value solvent accessibility profile and a new variable gap-penalty model based on a fractional power of insertion/deletion profiles, the new method (SP(5)) yields a robust improvement over the previous SP methods. There is a 2% absolute increase (5% relative improvement) in alignment accuracy over SP(4) on two independent benchmarks. Moreover, SP(5) achieves a 7% absolute increase (22% relative improvement) in the success rate of recognizing correct structural folds, and a 32% relative improvement in the accuracy of models within the same fold in the Lindahl benchmark. In addition, the modeling accuracy of top-1 ranked models is improved by 12% over SP(4) for the difficult targets in the CASP 7 test set. These results highlight the importance of harnessing predicted structural properties in challenging remote-homolog recognition. The SP(5) server is available at http://sparks.informatics.iupui.edu.
Affiliation(s)
- Wei Zhang
- Indiana University School of Informatics and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana, United States of America
- Institute of Applied Physics and Computational Mathematics, Beijing, People's Republic of China
- Song Liu
- Department of Biostatistics, Center of Excellence in Bioinformatics & Life Sciences, University at Buffalo, State University of New York, Buffalo, New York, United States of America
- Department of Biostatistics, Roswell Park Cancer Institute, Buffalo, New York, United States of America
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology & Biophysics, University at Buffalo, State University of New York, Buffalo, New York, United States of America
- Yaoqi Zhou
- Indiana University School of Informatics and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana, United States of America
27
Zimmermann O, Hansmann UH. Understanding protein folding: small proteins in silico. Biochim Biophys Acta 2008; 1784:252-8. [PMID: 18036571 PMCID: PMC2244683 DOI: 10.1016/j.bbapap.2007.10.010] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 10/12/2007] [Accepted: 10/26/2007] [Indexed: 10/24/2022]
Abstract
Recent improvements in methodology and increased computer power now allow atomistic computer simulations of protein folding. We briefly review several advanced Monte Carlo algorithms that have contributed to this development. Details of folding simulations of three designed mini proteins are shown. Adding global translations and rotations has allowed us to handle multiple chains and to simulate the aggregation of six beta-amyloid fragments. In a different line of research, we have developed several algorithms to predict local features from sequence. In an outlook, we sketch how biasing simulations with such predicted local features could extend the application spectrum of Monte Carlo simulations to structure prediction of larger proteins.
Affiliation(s)
- Olav Zimmermann
- John von Neumann Institut für Computing, Research Centre Jülich, 52425 Jülich, Germany
- Ulrich H.E. Hansmann
- John von Neumann Institut für Computing, Research Centre Jülich, 52425 Jülich, Germany
- Department of Physics, Michigan Technological University, Houghton, MI 49931, U.S.A