1
|
Broz M, Jukič M, Bren U. Naive Prediction of Protein Backbone Phi and Psi Dihedral Angles Using Deep Learning. Molecules 2023; 28:7046. [PMID: 37894526 PMCID: PMC10609058 DOI: 10.3390/molecules28207046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2023] [Revised: 10/06/2023] [Accepted: 10/09/2023] [Indexed: 10/29/2023] Open
Abstract
Protein structure prediction represents a significant challenge in the field of bioinformatics, with the prediction of protein structures using backbone dihedral angles recently achieving significant progress due to the rise of deep neural network research. However, there is a trend in protein structure prediction research to employ increasingly complex neural networks and contributions from multiple models. This study, on the other hand, explores how a single model transparently behaves using sequence data only and what can be expected from the predicted angles. To this end, the current paper presents data acquisition, deep learning model definition, and training toward the final protein backbone angle prediction. The method applies a simple fully connected neural network (FCNN) model that takes only the primary structure of the protein with a sliding window of size 21 as input to predict protein backbone ϕ and ψ dihedral angles. Despite its simplicity, the model shows surprising accuracy for the ϕ angle prediction and somewhat lower accuracy for the ψ angle prediction. Moreover, this study demonstrates that protein secondary structure prediction is also possible with simple neural networks that take in only the protein amino-acid residue sequence, but more complex models are required for higher accuracies.
Collapse
Affiliation(s)
- Matic Broz
- Faculty of Chemistry and Chemical Engineering, University of Maribor, Smetanova ulica 17, SI-2000 Maribor, Slovenia
| | - Marko Jukič
- Faculty of Chemistry and Chemical Engineering, University of Maribor, Smetanova ulica 17, SI-2000 Maribor, Slovenia
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška ulica 8, SI-6000 Koper, Slovenia
- Institute of Environmental Protection and Sensors, Beloruska ulica 7, SI-2000 Maribor, Slovenia
| | - Urban Bren
- Faculty of Chemistry and Chemical Engineering, University of Maribor, Smetanova ulica 17, SI-2000 Maribor, Slovenia
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška ulica 8, SI-6000 Koper, Slovenia
- Institute of Environmental Protection and Sensors, Beloruska ulica 7, SI-2000 Maribor, Slovenia
| |
Collapse
|
2
|
Li S, Yuan L, Ma Y, Liu Y. WG-ICRN: Protein 8-state secondary structure prediction based on Wasserstein generative adversarial networks and residual networks with Inception modules. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:7721-7737. [PMID: 37161169 DOI: 10.3934/mbe.2023333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/11/2023]
Abstract
Protein secondary structure is the basis of studying the tertiary structure of proteins, drug design and development, and the 8-state protein secondary structure can provide more adequate protein information than the 3-state structure. Therefore, this paper proposes a novel method WG-ICRN for predicting protein 8-state secondary structures. First, we use the Wasserstein generative adversarial network (WGAN) to extract protein features in the position-specific scoring matrix (PSSM). The extracted features are combined with PSSM into a new feature set of WG-data, which contains richer feature information. Then, we use the residual network (ICRN) with Inception to further extract the features in WG-data and complete the prediction. Compared with the residual network, ICRN can reduce parameter calculations and increase the width of feature extraction to obtain more feature information. We evaluated the prediction performance of the model using six datasets. The experimental results show that the WGAN has excellent feature extraction capabilities, and ICRN can further improve network performance and improve prediction accuracy. Compared with four popular models, WG-ICRN achieves better prediction performance.
Collapse
Affiliation(s)
- Shun Li
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
| | - Lu Yuan
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
| | - Yuming Ma
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
| | - Yihui Liu
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
| |
Collapse
|
3
|
Yuan L, Ma Y, Liu Y. Ensemble deep learning models for protein secondary structure prediction using bidirectional temporal convolution and bidirectional long short-term memory. Front Bioeng Biotechnol 2023; 11:1051268. [PMID: 36860882 PMCID: PMC9968878 DOI: 10.3389/fbioe.2023.1051268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2022] [Accepted: 02/03/2023] [Indexed: 02/16/2023] Open
Abstract
Protein secondary structure prediction (PSSP) is a challenging task in computational biology. However, existing models with deep architectures are not sufficient and comprehensive for deep long-range feature extraction of long sequences. This paper proposes a novel deep learning model to improve Protein secondary structure prediction. In the model, our proposed bidirectional temporal convolutional network (BTCN) can extract the bidirectional deep local dependencies in protein sequences segmented by the sliding window technique, the bidirectional long short-term memory (BLSTM) network can extract the global interactions between residues, and our proposed multi-scale bidirectional temporal convolutional network (MSBTCN) can further capture the bidirectional multi-scale long-range features of residues while preserving the hidden layer information more comprehensively. In particular, we also propose that fusing the features of 3-state and 8-state Protein secondary structure prediction can further improve the prediction accuracy. Moreover, we also propose and compare multiple novel deep models by combining bidirectional long short-term memory with temporal convolutional network (TCN), reverse temporal convolutional network (RTCN), multi-scale temporal convolutional network (multi-scale bidirectional temporal convolutional network), bidirectional temporal convolutional network and multi-scale bidirectional temporal convolutional network, respectively. Furthermore, we demonstrate that the reverse prediction of secondary structure outperforms the forward prediction, suggesting that amino acids at later positions have a greater impact on secondary structure recognition. Experimental results on benchmark datasets including CASP10, CASP11, CASP12, CASP13, CASP14, and CB513 show that our methods achieve better prediction performance compared to five state-of-the-art methods.
Collapse
Affiliation(s)
- Lu Yuan
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
| | - Yuming Ma
- *Correspondence: Yuming Ma, ; Yihui Liu,
| | - Yihui Liu
- *Correspondence: Yuming Ma, ; Yihui Liu,
| |
Collapse
|
4
|
Yuan L, Ma Y, Liu Y. Protein secondary structure prediction based on Wasserstein generative adversarial networks and temporal convolutional networks with convolutional block attention modules. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:2203-2218. [PMID: 36899529 DOI: 10.3934/mbe.2023102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/18/2023]
Abstract
As an important task in bioinformatics, protein secondary structure prediction (PSSP) is not only beneficial to protein function research and tertiary structure prediction, but also to promote the design and development of new drugs. However, current PSSP methods cannot sufficiently extract effective features. In this study, we propose a novel deep learning model WGACSTCN, which combines Wasserstein generative adversarial network with gradient penalty (WGAN-GP), convolutional block attention module (CBAM) and temporal convolutional network (TCN) for 3-state and 8-state PSSP. In the proposed model, the mutual game of generator and discriminator in WGAN-GP module can effectively extract protein features, and our CBAM-TCN local extraction module can capture key deep local interactions in protein sequences segmented by sliding window technique, and the CBAM-TCN long-range extraction module can further capture the key deep long-range interactions in sequences. We evaluate the performance of the proposed model on seven benchmark datasets. Experimental results show that our model exhibits better prediction performance compared to the four state-of-the-art models. The proposed model has strong feature extraction ability, which can extract important information more comprehensively.
Collapse
Affiliation(s)
- Lu Yuan
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
| | - Yuming Ma
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
| | - Yihui Liu
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
| |
Collapse
|
5
|
Yuan L, Hu X, Ma Y, Liu Y. DLBLS_SS: protein secondary structure prediction using deep learning and broad learning system. RSC Adv 2022; 12:33479-33487. [PMID: 36505696 PMCID: PMC9682407 DOI: 10.1039/d2ra06433b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2022] [Accepted: 11/16/2022] [Indexed: 11/24/2022] Open
Abstract
Protein secondary structure prediction (PSSP) is not only beneficial to the study of protein structure and function but also to the development of drugs. As a challenging task in computational biology, experimental methods for PSSP are time-consuming and expensive. In this paper, we propose a novel PSSP model DLBLS_SS based on deep learning and broad learning system (BLS) to predict 3-state and 8-state secondary structure. We first use a bidirectional long short-term memory (BLSTM) network to extract global features in residue sequences. Then, our proposed SEBTCN based on temporal convolutional networks (TCN) and channel attention can capture bidirectional key long-range dependencies in sequences. We also use BLS to rapidly optimize fused features while further capturing local interactions between residues. We conduct extensive experiments on public test sets including CASP10, CASP11, CASP12, CASP13, CASP14 and CB513 to evaluate the performance of the model. Experimental results show that our model exhibits better 3-state and 8-state PSSP performance compared to five state-of-the-art models.
Collapse
Affiliation(s)
- Lu Yuan
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences) Jinan 250353 China
| | - Xiaopei Hu
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences) Jinan 250353 China
| | - Yuming Ma
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences) Jinan 250353 China
| | - Yihui Liu
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences) Jinan 250353 China
| |
Collapse
|
6
|
Wang Q, Wei J, Zhou Y, Lin M, Ren R, Wang S, Cui S, Li Z. Prior Knowledge Facilitates Low Homologous Protein Secondary Structure Prediction with DSM Distillation. Bioinformatics 2022; 38:3574-3581. [PMID: 35652719 DOI: 10.1093/bioinformatics/btac351] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2021] [Revised: 04/15/2022] [Accepted: 05/30/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Protein secondary structure prediction (PSSP) is one of the fundamental and challenging problems in the field of computational biology. Accurate PSSP relies on sufficient homologous protein sequences to build the multiple sequence alignment (MSA). Unfortunately, many proteins lack homologous sequences, which results in the low quality of MSA and poor performance. In this paper, we propose the novel DSM-Distil to tackle this issue, which takes advantage of the pretrained BERT and exploits the knowledge distillation on the newly designed dynamic scoring matrix (DSM) features. Specifically, we propose the dynamic scoring matrix (DSM) to replace the widely used profile and PSSM features. DSM could automatically dig for the suitable feature for each residue, based on the original profile. Namely, DSM-Distil not only could adapt to the low homologous proteins but also is compatible with high homologous ones. Thanks to the dynamic property, DSM could adapt to the input data much better and achieve higher performance. Moreover, to compensate for low-quality MSA, we propose to generate the pseudo-DSM from a pretrained BERT model and aggregate it with the original DSM by adaptive residue-wise fusion, which helps to build richer and more complete input features. In addition, we propose to supervise the learning of low-quality DSM features by using high-quality ones. To achieve this, a novel teacher-student model is designed to distill the knowledge from proteins with high homologous sequences to that of low ones. Combining all the proposed methods, our model achieves the new state-of-the-art performance for low homologous proteins. RESULTS Compared with the previous state-of-the-art method "Bagging", DSM-Distil achieves an improvement about 5% and 7.3% improvement for proteins with MSA count ≤ 30 and extremely low homologous cases respectively. We also compare DSM-Distil with Alphafold2 which is a state-of-the-art framework for protein structure prediction. DSM-Distil outperforms Alphafold2 by 4.1% on extremely low-quality MSA on 8-state secondary structure prediction. Moreover, we release a large-scale up-to-date test dataset BC40 for low-quality MSA structure prediction evaluation. AVAILABILITY AND IMPLEMENTATION BC40 dataset: https://drive.google.com/drive/folders/15vwRoOjAkhhwfjDk6-YoKGf4JzZXIMCHardCase dataset: https://drive.google.com/drive/folders/1BvduOr2b7cObUHy6GuEWk-aUkKJgzTUvCode: https://github.com/qinwang-ai/DSM-Distil.
Collapse
Affiliation(s)
- Qin Wang
- The Chinese University of Hong Kong(Shenzhen), Shenzhen 51800, China.,The Future Network of Intelligence Institute, Shenzhen 51800, China.,Shenzhen Research Institute of Big Data, Shenzhen 51800, China
| | - Jun Wei
- The Chinese University of Hong Kong(Shenzhen), Shenzhen 51800, China.,The Future Network of Intelligence Institute, Shenzhen 51800, China.,Shenzhen Research Institute of Big Data, Shenzhen 51800, China
| | - Yuzhe Zhou
- The Chinese University of Hong Kong(Shenzhen), Shenzhen 51800, China.,The Future Network of Intelligence Institute, Shenzhen 51800, China.,Shenzhen Research Institute of Big Data, Shenzhen 51800, China
| | | | - Ruobing Ren
- Shanghai Key Laboratory of Metabolic Remodeling and Health, Institute of Metabolism and Integrative Biology, Fudan University, Shanghai 200000, China
| | | | - Shuguang Cui
- The Chinese University of Hong Kong(Shenzhen), Shenzhen 51800, China.,The Future Network of Intelligence Institute, Shenzhen 51800, China.,Shenzhen Research Institute of Big Data, Shenzhen 51800, China
| | - Zhen Li
- The Chinese University of Hong Kong(Shenzhen), Shenzhen 51800, China.,The Future Network of Intelligence Institute, Shenzhen 51800, China.,Shenzhen Research Institute of Big Data, Shenzhen 51800, China
| |
Collapse
|
7
|
Protein secondary structure prediction using a lightweight convolutional network and label distribution aware margin loss. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107771] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/14/2023]
|
8
|
Xu Y, Cheng J. Secondary structure prediction of protein based on multi scale convolutional attention neural networks. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2021; 18:3404-3422. [PMID: 34198392 DOI: 10.3934/mbe.2021170] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
To fully extract the local and long-range information of amino acid sequences and enhance the effective information, this research proposes a secondary structure prediction model of protein based on a multi-scale convolutional attentional neural network. The model uses a multi-channel multi-scale parallel architecture to extract amino acid structure features of different granularity according to the window size. The reconstructed feature maps are obtained via multiple convolutional attention blocks. Then, the reconstructed feature map is fused with the input feature map to obtain the enhanced feature map. Finally, the enhanced feature map is fed to the Softmax classifier for prediction. While the traditional cross-entropy loss cannot effectively solve the problem of non-equilibrium training samples, a modified correlated cross-entropy loss function may alleviate this problem. After numerous comparison and ablation experiments, it is verified that the improved model can indeed effectively extract amino acid sequence feature information, alleviate overfitting, and thus improve the overall prediction accuracy.
Collapse
Affiliation(s)
- Ying Xu
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
| | - Jinyong Cheng
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
| |
Collapse
|
9
|
Uddin MR, Mahbub S, Rahman MS, Bayzid MS. SAINT: self-attention augmented inception-inside-inception network improves protein secondary structure prediction. Bioinformatics 2021; 36:4599-4608. [PMID: 32437517 DOI: 10.1093/bioinformatics/btaa531] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2019] [Revised: 05/10/2020] [Accepted: 05/16/2020] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Protein structures provide basic insight into how they can interact with other proteins, their functions and biological roles in an organism. Experimental methods (e.g. X-ray crystallography and nuclear magnetic resonance spectroscopy) for predicting the secondary structure (SS) of proteins are very expensive and time consuming. Therefore, developing efficient computational approaches for predicting the SS of protein is of utmost importance. Advances in developing highly accurate SS prediction methods have mostly been focused on 3-class (Q3) structure prediction. However, 8-class (Q8) resolution of SS contains more useful information and is much more challenging than the Q3 prediction. RESULTS We present SAINT, a highly accurate method for Q8 structure prediction, which incorporates self-attention mechanism (a concept from natural language processing) with the Deep Inception-Inside-Inception network in order to effectively capture both the short- and long-range interactions among the amino acid residues. SAINT offers a more interpretable framework than the typical black-box deep neural network methods. Through an extensive evaluation study, we report the performance of SAINT in comparison with the existing best methods on a collection of benchmark datasets, namely, TEST2016, TEST2018, CASP12 and CASP13. Our results suggest that self-attention mechanism improves the prediction accuracy and outperforms the existing best alternate methods. SAINT is the first of its kind and offers the best known Q8 accuracy. Thus, we believe SAINT represents a major step toward the accurate and reliable prediction of SSs of proteins. AVAILABILITY AND IMPLEMENTATION SAINT is freely available as an open-source project at https://github.com/SAINTProtein/SAINT.
Collapse
Affiliation(s)
- Mostofa Rafid Uddin
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh.,Department of Computer Science and Engineering, East West University, Dhaka 1212, Bangladesh
| | - Sazan Mahbub
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
| | - M Saifur Rahman
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
| | - Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
| |
Collapse
|
10
|
Kashani-Amin E, Tabatabaei-Malazy O, Sakhteman A, Larijani B, Ebrahim-Habibi A. A Systematic Review on Popularity, Application and Characteristics of Protein Secondary Structure Prediction Tools. Curr Drug Discov Technol 2020; 16:159-172. [PMID: 29493456 DOI: 10.2174/1570163815666180227162157] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2017] [Revised: 02/15/2018] [Accepted: 02/22/2018] [Indexed: 01/22/2023]
Abstract
BACKGROUND Prediction of proteins' secondary structure is one of the major steps in the generation of homology models. These models provide structural information which is used to design suitable ligands for potential medicinal targets. However, selecting a proper tool between multiple Secondary Structure Prediction (SSP) options is challenging. The current study is an insight into currently favored methods and tools, within various contexts. OBJECTIVE A systematic review was performed for a comprehensive access to recent (2013-2016) studies which used or recommended protein SSP tools. METHODS Three databases, Web of Science, PubMed and Scopus were systematically searched and 99 out of the 209 studies were finally found eligible to extract data. RESULTS Four categories of applications for 59 retrieved SSP tools were: (I) prediction of structural features of a given sequence, (II) evaluation of a method, (III) providing input for a new SSP method and (IV) integrating an SSP tool as a component for a program. PSIPRED was found to be the most popular tool in all four categories. JPred and tools utilizing PHD (Profile network from HeiDelberg) method occupied second and third places of popularity in categories I and II. JPred was only found in the two first categories, while PHD was present in three fields. CONCLUSION This study provides a comprehensive insight into the recent usage of SSP tools which could be helpful for selecting a proper tool.
Collapse
Affiliation(s)
- Elaheh Kashani-Amin
- Biosensor Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Ozra Tabatabaei-Malazy
- Non-Communicable Diseases Research Center, Endocrinology and Metabolism Population Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran.,Endocrinology and Metabolism Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Amirhossein Sakhteman
- Department of Medicinal Chemistry, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran.,Medicinal Chemistry and Natural Products Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Bagher Larijani
- Endocrinology and Metabolism Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Azadeh Ebrahim-Habibi
- Biosensor Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
| |
Collapse
|
11
|
Smolarczyk T, Roterman-Konieczna I, Stapor K. Protein Secondary Structure Prediction: A Review of Progress and Directions. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191017104639] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Background:
Over the last few decades, a search for the theory of protein folding has
grown into a full-fledged research field at the intersection of biology, chemistry and informatics.
Despite enormous effort, there are still open questions and challenges, like understanding the rules
by which amino acid sequence determines protein secondary structure.
Objective:
In this review, we depict the progress of the prediction methods over the years and
identify sources of improvement.
Methods:
The protein secondary structure prediction problem is described followed by the discussion
on theoretical limitations, description of the commonly used data sets, features and a review
of three generations of methods with the focus on the most recent advances. Additionally, methods
with available online servers are assessed on the independent data set.
Results:
The state-of-the-art methods are currently reaching almost 88% for 3-class prediction and
76.5% for an 8-class prediction.
Conclusion:
This review summarizes recent advances and outlines further research directions.
Collapse
Affiliation(s)
- Tomasz Smolarczyk
- Institute of Informatics, Silesian University of Technology, Gliwice, Poland
| | - Irena Roterman-Konieczna
- Department of Bioinformatics and Telemedicine, Jagiellonian University Medical College, Krakow, Poland
| | - Katarzyna Stapor
- Institute of Informatics, Silesian University of Technology, Gliwice, Poland
| |
Collapse
|
12
|
Ludwiczak J, Winski A, da Silva Neto AM, Szczepaniak K, Alva V, Dunin-Horkawicz S. PiPred - a deep-learning method for prediction of π-helices in protein sequences. Sci Rep 2019; 9:6888. [PMID: 31053765 PMCID: PMC6499831 DOI: 10.1038/s41598-019-43189-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2018] [Accepted: 04/16/2019] [Indexed: 11/17/2022] Open
Abstract
Canonical π-helices are short, relatively unstable secondary structure elements found in proteins. They comprise seven or more residues and are present in 15% of all known protein structures, often in functionally important regions such as ligand- and ion-binding sites. Given their similarity to α-helices, the prediction of π-helices is a challenging task and none of the currently available secondary structure prediction methods tackle it. Here, we present PiPred, a neural network-based tool for predicting π-helices in protein sequences. By performing a rigorous benchmark we show that PiPred can detect π-helices with a per-residue precision of 48% and sensitivity of 46%. Interestingly, some of the α-helices mispredicted by PiPred as π-helices exhibit a geometry characteristic of π-helices. Also, despite being trained only with canonical π-helices, PiPred can identify 6-residue-long α/π-bulges. These observations suggest an even higher effective precision of the method and demonstrate that π-helices, α/π-bulges, and other helical deformations may impose similar constraints on sequences. PiPred is freely accessible at: https://toolkit.tuebingen.mpg.de/#/tools/quick2d. A standalone version is available for download at: https://github.com/labstructbioinf/PiPred, where we also provide the CB6133, CB513, CASP10, and CASP11 datasets, commonly used for training and validation of secondary structure prediction methods, with correctly annotated π-helices.
Collapse
Affiliation(s)
- Jan Ludwiczak
- Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland.,Laboratory of Bioinformatics, Nencki Institute of Experimental Biology, Pasteura 3, 02-093, Warsaw, Poland
| | - Aleksander Winski
- Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland
| | - Antonio Marinho da Silva Neto
- Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland
| | - Krzysztof Szczepaniak
- Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland
| | - Vikram Alva
- Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany
| | - Stanislaw Dunin-Horkawicz
- Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland.
| |
Collapse
|
13
|
Zhou J, Wang H, Zhao Z, Xu R, Lu Q. CNNH_PSS: protein 8-class secondary structure prediction by convolutional neural network with highway. BMC Bioinformatics 2018; 19:60. [PMID: 29745837 PMCID: PMC5998876 DOI: 10.1186/s12859-018-2067-8] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND Protein secondary structure is the three dimensional form of local segments of proteins and its prediction is an important problem in protein tertiary structure prediction. Developing computational approaches for protein secondary structure prediction is becoming increasingly urgent. RESULTS We present a novel deep learning based model, referred to as CNNH_PSS, by using multi-scale CNN with highway. In CNNH_PSS, any two neighbor convolutional layers have a highway to deliver information from current layer to the output of the next one to keep local contexts. As lower layers extract local context while higher layers extract long-range interdependencies, the highways between neighbor layers allow CNNH_PSS to have ability to extract both local contexts and long-range interdependencies. We evaluate CNNH_PSS on two commonly used datasets: CB6133 and CB513. CNNH_PSS outperforms the multi-scale CNN without highway by at least 0.010 Q8 accuracy and also performs better than CNF, DeepCNF and SSpro8, which cannot extract long-range interdependencies, by at least 0.020 Q8 accuracy, demonstrating that both local contexts and long-range interdependencies are indeed useful for prediction. Furthermore, CNNH_PSS also performs better than GSM and DCRNN which need extra complex model to extract long-range interdependencies. It demonstrates that CNNH_PSS not only cost less computer resource, but also achieves better predicting performance. CONCLUSION CNNH_PSS have ability to extracts both local contexts and long-range interdependencies by combing multi-scale CNN and highway network. The evaluations on common datasets and comparisons with state-of-the-art methods indicate that CNNH_PSS is an useful and efficient tool for protein secondary structure prediction.
Collapse
Affiliation(s)
- Jiyun Zhou
- School Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, Guangdong 518055 China
- Department of Computing, the Hong Kong Polytechnic University, Hung Hom, Hong Kong
| | - Hongpeng Wang
- School Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, Guangdong 518055 China
| | - Zhishan Zhao
- School Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, Guangdong 518055 China
| | - Ruifeng Xu
- School Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, Guangdong 518055 China
| | - Qin Lu
- Department of Computing, the Hong Kong Polytechnic University, Hung Hom, Hong Kong
| |
Collapse
|
14
|
Yang Y, Gao J, Wang J, Heffernan R, Hanson J, Paliwal K, Zhou Y. Sixty-five years of the long march in protein secondary structure prediction: the final stretch? Brief Bioinform 2018; 19:482-494. [PMID: 28040746 PMCID: PMC5952956 DOI: 10.1093/bib/bbw129] [Citation(s) in RCA: 84] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2016] [Revised: 11/15/2016] [Indexed: 11/13/2022] Open
Abstract
Protein secondary structure prediction began in 1951 when Pauling and Corey predicted helical and sheet conformations for protein polypeptide backbone even before the first protein structure was determined. Sixty-five years later, powerful new methods breathe new life into this field. The highest three-state accuracy without relying on structure templates is now at 82-84%, a number unthinkable just a few years ago. These improvements came from increasingly larger databases of protein sequences and structures for training, the use of template secondary structure information and more powerful deep learning techniques. As we are approaching to the theoretical limit of three-state prediction (88-90%), alternative to secondary structure prediction (prediction of backbone torsion angles and Cα-atom-based angles and torsion angles) not only has more room for further improvement but also allows direct prediction of three-dimensional fragment structures with constantly improved accuracy. About 20% of all 40-residue fragments in a database of 1199 non-redundant proteins have <6 Å root-mean-squared distance from the native conformations by SPIDER2. More powerful deep learning methods with improved capability of capturing long-range interactions begin to emerge as the next generation of techniques for secondary structure prediction. The time has come to finish off the final stretch of the long march towards protein secondary structure prediction.
Collapse
Affiliation(s)
- Yuedong Yang
- Insitute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD, Australia
| | - Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
| | - Jihua Wang
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
| | - Rhys Heffernan
- Signal Processing Laboratory, Griffith University, Brisbane, Australia
| | - Jack Hanson
- Signal Processing Laboratory, Griffith University, Brisbane, Australia
| | - Kuldip Paliwal
- Signal Processing Laboratory, Griffith University, Brisbane, Australia
| | - Yaoqi Zhou
- Insitute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD, Australia
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
| |
Collapse
|
15
|
Xie S, Li Z, Hu H. Protein secondary structure prediction based on the fuzzy support vector machine with the hyperplane optimization. Gene 2017; 642:74-83. [PMID: 29104167 DOI: 10.1016/j.gene.2017.11.005] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2017] [Revised: 10/29/2017] [Accepted: 11/02/2017] [Indexed: 11/30/2022]
Abstract
The prediction of the protein secondary structure is a crucial point in bioinformatics and related fields. In the last years, machine learning methods have become a valuable tool, achieving satisfactory results. However, the prediction accuracy needs to be further ameliorated. This paper proposes a new method based on an improved fuzzy support vector machine (FSVM) for the prediction of the secondary structure of proteins. Unlike traditional methods to set the membership function, it firstly constructs an approximate optimal separating hyperplane by iterating the class centers in the feature space. Then sample points close to this hyperplane are assigned with large membership values, while outliers with small membership values according to the K-nearest neighbor. And some sample points with low membership values are removed, reducing the training time and improving the prediction accuracy. To optimize the prediction results, our method also exploits information on sequence-based structural similarity. We used three databases (e.g. RS126, CB513 and data1199) to test this method, showing the achievement of 94.2%, 93.1%, 96.7% Q3 accuracy and 91.7%, 89.7%, 94.1% SOV values for the three datasets, respectively. Overall, our method results are comparable to or often better than commonly used methods (Magnan & Baldi, 2014; Sheng et al., 2016) for secondary structure prediction.
Collapse
Affiliation(s)
- Shangxin Xie
- School of Science, Zhejiang Sci-Tech University, Hangzhou, Zhejiang, 310018, China
| | - Zhong Li
- School of Science, Zhejiang Sci-Tech University, Hangzhou, Zhejiang, 310018, China.
| | - Hailong Hu
- School of Science, Zhejiang Sci-Tech University, Hangzhou, Zhejiang, 310018, China; School of Science, Zhejiang A&F University, Lin'an, Zhejiang 311300, China
| |
Collapse
|
16
|
In silico structural characterization of protein targets for drug development against Trypanosoma cruzi. J Mol Model 2016; 22:244. [DOI: 10.1007/s00894-016-3115-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2016] [Accepted: 09/02/2016] [Indexed: 10/21/2022]
|
17
|
Yaseen A, Nijim M, Williams B, Qian L, Li M, Wang J, Li Y. FLEXc: protein flexibility prediction using context-based statistics, predicted structural features, and sequence information. BMC Bioinformatics 2016; 17 Suppl 8:281. [PMID: 27587065 PMCID: PMC5009531 DOI: 10.1186/s12859-016-1117-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
Background The fluctuation of atoms around their average positions in protein structures provides important information regarding protein dynamics. This flexibility of protein structures is associated with various biological processes. Predicting flexibility of residues from protein sequences is significant for analyzing the dynamic properties of proteins which will be helpful in predicting their functions. Results In this paper, an approach of improving the accuracy of protein flexibility prediction is introduced. A neural network method for predicting flexibility in 3 states is implemented. The method incorporates sequence and evolutionary information, context-based scores, predicted secondary structures and solvent accessibility, and amino acid properties. Context-based statistical scores are derived, using the mean-field potentials approach, for describing the different preferences of protein residues in flexibility states taking into consideration their amino acid context. The 7-fold cross validated accuracy reached 61 % when context-based scores and predicted structural states are incorporated in the training process of the flexibility predictor. Conclusions Incorporating context-based statistical scores with predicted structural states are important features to improve the performance of predicting protein flexibility, as shown by our computational results. Our prediction method is implemented as web service called “FLEXc” and available online at: http://hpcr.cs.odu.edu/flexc.
Collapse
Affiliation(s)
- Ashraf Yaseen
- Department of Electrical Engineering & Computer Science, Texas A&M University-Kingsville, Kingsville, TX, 78363, USA.
| | - Mais Nijim
- Department of Electrical Engineering & Computer Science, Texas A&M University-Kingsville, Kingsville, TX, 78363, USA
| | - Brandon Williams
- Department of Mathematics & Computer Science, Fisk University, Nashville, TN, 37208, USA
| | - Lei Qian
- Department of Mathematics & Computer Science, Fisk University, Nashville, TN, 37208, USA
| | - Min Li
- School of Information Science and Engineering, Central South University, Changsha, 410083, China
| | - Jianxin Wang
- School of Information Science and Engineering, Central South University, Changsha, 410083, China
| | - Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, VA, 23529, USA
| |
Collapse
|
18
|
The influence of flanking secondary structures on amino Acid content and typical lengths of 3/10 helices. INTERNATIONAL JOURNAL OF PROTEOMICS 2014; 2014:360230. [PMID: 25371821 PMCID: PMC4211214 DOI: 10.1155/2014/360230] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/15/2014] [Revised: 09/19/2014] [Accepted: 09/27/2014] [Indexed: 11/25/2022]
Abstract
We used 3D structures of a highly redundant set of bacterial proteins encoded by genes of high, average, and low GC-content. Four types of connecting bridges—regions situated between any of two major elements of secondary structure (alpha helices and beta strands)—containing a pure random coil were compared with connecting bridges containing 3/10 helices. We included discovered trends in the original “VVTAK Connecting Bridges” algorithm, which is able to predict more probable conformation for a given connecting bridge. The highest number of significant differences in amino acid usage was found between 3/10 helices containing bridges connecting two beta strands (they have increased Phe, Tyr, Met, Ile, Leu, Val, and His usages but decreased usages of Asp, Asn, Gly, and Pro) and those without 3/10 helices. The typical (most common) length of 3/10 helices situated between two beta strands and between beta strand and alpha helix is equal to 5 amino acid residues. The preferred length of 3/10 helices situated between alpha helix and beta strand is equal to 3 residues. For 3/10 helices situated between two alpha helices, both lengths (3 and 5 amino acid residues) are typical.
Collapse
|