1
Ahmed E, Jain R, Schlatzer D, Tavares Pereira Lopes FB, Kiselar J, Lodowski DT, Chance MR, Farquhar ER. Quantitative readout of methionine residue solvent accessibility in E. coli cells using radiolytic hydroxyl radical labeling and mass spectrometry. Biochem Biophys Res Commun 2025; 762:151745. [PMID: 40199130 DOI: 10.1016/j.bbrc.2025.151745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2024] [Revised: 03/18/2025] [Accepted: 04/01/2025] [Indexed: 04/10/2025]
Abstract
Reactive oxygen species play a crucial role in cellular processes, but their effects on protein structure and function in vivo remain challenging to study. Here, we present an approach using synchrotron-based X-ray footprinting methods to probe protein structure, via quantitative LC-coupled mass spectrometry of methionine oxidation (MSOx) in live E. coli. A label-free proteomic analysis identified 2104 proteins from E. coli, with 465 proteins exhibiting MSOx modifications distributed across multiple cellular compartments. Changes in MSOx modification with increasing X-ray dose revealed a correlation between rates of modification and solvent-accessible surface area in vivo for selected proteins responsive to exposure, providing a direct probe of protein structure and its conformational plasticity in the cell. The approach developed here offers a unique in-cell quantitative readout of methionine oxidation and solvent accessibility through radiolytic hydroxyl radical labeling. With this method, the landscape of methionine oxidation in E. coli can be mapped, providing insights into protein behavior under oxidative stress. It represents a first step in developing radiolysis and E. coli as platforms for in vivo protein structure assessment. The potential applications in drug discovery, protein engineering, and systems biology of protein conformations are considerable.
Affiliation(s)
- Ezaz Ahmed
  - Center for Synchrotron Biosciences, Case Western Reserve University, School of Medicine, 10900 Euclid Avenue, Cleveland, OH, 44106, USA; Department of Nutrition, Case Western Reserve University, School of Medicine, 10900 Euclid Ave., Cleveland, OH, 44106, USA
- Rohit Jain
  - Center for Synchrotron Biosciences, Case Western Reserve University, School of Medicine, 10900 Euclid Avenue, Cleveland, OH, 44106, USA; Department of Nutrition, Case Western Reserve University, School of Medicine, 10900 Euclid Ave., Cleveland, OH, 44106, USA
- Daniela Schlatzer
  - Center for Proteomics and Bioinformatics, Case Western Reserve University, School of Medicine, 10900 Euclid Ave., Cleveland, OH, 44106, USA
- Filipa Blasco Tavares Pereira Lopes
  - Department of Nutrition, Case Western Reserve University, School of Medicine, 10900 Euclid Ave., Cleveland, OH, 44106, USA; Center for Proteomics and Bioinformatics, Case Western Reserve University, School of Medicine, 10900 Euclid Ave., Cleveland, OH, 44106, USA
- Janna Kiselar
  - Department of Nutrition, Case Western Reserve University, School of Medicine, 10900 Euclid Ave., Cleveland, OH, 44106, USA; Center for Proteomics and Bioinformatics, Case Western Reserve University, School of Medicine, 10900 Euclid Ave., Cleveland, OH, 44106, USA
- David T Lodowski
  - Department of Nutrition, Case Western Reserve University, School of Medicine, 10900 Euclid Ave., Cleveland, OH, 44106, USA; Center for Proteomics and Bioinformatics, Case Western Reserve University, School of Medicine, 10900 Euclid Ave., Cleveland, OH, 44106, USA
- Mark R Chance
  - Center for Synchrotron Biosciences, Case Western Reserve University, School of Medicine, 10900 Euclid Avenue, Cleveland, OH, 44106, USA; Department of Nutrition, Case Western Reserve University, School of Medicine, 10900 Euclid Ave., Cleveland, OH, 44106, USA; Center for Proteomics and Bioinformatics, Case Western Reserve University, School of Medicine, 10900 Euclid Ave., Cleveland, OH, 44106, USA.
- Erik R Farquhar
  - Center for Synchrotron Biosciences, Case Western Reserve University, School of Medicine, 10900 Euclid Avenue, Cleveland, OH, 44106, USA; Department of Nutrition, Case Western Reserve University, School of Medicine, 10900 Euclid Ave., Cleveland, OH, 44106, USA.
2
Zhang J, Qian J, Zou Q, Zhou F, Kurgan L. Recent Advances in Computational Prediction of Secondary and Supersecondary Structures from Protein Sequences. Methods Mol Biol 2025; 2870:1-19. [PMID: 39543027 DOI: 10.1007/978-1-0716-4213-9_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2024]
Abstract
The secondary structures (SSs) and supersecondary structures (SSSs) underlie the three-dimensional structure of proteins. Prediction of SSs and SSSs from protein sequences is widely used and finds numerous applications in the development of a broad range of other bioinformatics tools. Numerous sequence-based predictors of SS and SSS have been developed and published in recent years. We survey and analyze 45 SS predictors released since 2018, focusing on their inputs, predictive models, scope of prediction, and availability. We also review 32 sequence-based SSS predictors, which primarily focus on predicting coiled coils and beta-hairpins and which include five methods published since 2018. A substantial majority of these predictive tools rely on machine learning models, including a variety of deep neural network architectures. They also frequently use evolutionary sequence profiles. We discuss details of several modern SS and SSS predictors that are currently available to users and were published in higher-impact venues.
Affiliation(s)
- Jian Zhang
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, China.
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China.
- Jingjing Qian
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
- Quan Zou
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Feng Zhou
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
- Lukasz Kurgan
  - Department of Computer Science, College of Engineering, Virginia Commonwealth University, Virginia, VA, USA.
3
Oldfield CJ, Chen K, Kurgan L. Computational Prediction of Secondary and Supersecondary Structures from Protein Sequences. Methods Mol Biol 2019; 1958:73-100. [PMID: 30945214 DOI: 10.1007/978-1-4939-9161-7_4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/12/2022]
Abstract
Many new methods for the sequence-based prediction of the secondary and supersecondary structures have been developed over the last several years. These and older sequence-based predictors are widely applied for the characterization and prediction of protein structure and function. These efforts have produced countless accurate predictors, many of which rely on state-of-the-art machine learning models and evolutionary information generated from multiple sequence alignments. We describe and motivate both types of predictions. We introduce concepts related to the annotation and computational prediction of the three-state and eight-state secondary structure as well as several types of supersecondary structures, such as β hairpins, coiled coils, and α-turn-α motifs. We review 34 predictors focusing on recent tools and provide detailed information for a selected set of 14 secondary structure and 3 supersecondary structure predictors. We conclude with several practical notes for the end users of these predictive methods.
Affiliation(s)
- Christopher J Oldfield
  - Department of Computer Science, College of Engineering, Virginia Commonwealth University, Richmond, VA, USA
- Ke Chen
  - School of Computer Science and Software Engineering, Tianjin Polytechnic University, Tianjin, People's Republic of China
- Lukasz Kurgan
  - Department of Computer Science, College of Engineering, Virginia Commonwealth University, Richmond, VA, USA.
4
Tsubaki M, Shimbo M, Matsumoto Y. Protein Fold Recognition with Representation Learning and Long Short-Term Memory. IPSJ Transactions on Bioinformatics 2017. [DOI: 10.2197/ipsjtbio.10.2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/05/2022]
Affiliation(s)
- Masashi Tsubaki
  - Graduate School of Information Science, Nara Institute of Science and Technology
- Masashi Shimbo
  - Graduate School of Information Science, Nara Institute of Science and Technology
- Yuji Matsumoto
  - Graduate School of Information Science, Nara Institute of Science and Technology
5
Improving protein fold recognition and structural class prediction accuracies using physicochemical properties of amino acids. J Theor Biol 2016; 402:117-28. [PMID: 27164998 DOI: 10.1016/j.jtbi.2016.05.002] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 01/21/2016] [Revised: 04/20/2016] [Accepted: 05/02/2016] [Indexed: 11/24/2022]
Abstract
Predicting the three-dimensional (3-D) structure of a protein is an important task in the field of bioinformatics and biological sciences. However, directly predicting the 3-D structure from the primary structure is hard to achieve. Therefore, predicting the fold or structural class of a protein sequence is generally used as an intermediate step in determining the protein's 3-D structure. For protein fold recognition (PFR) and structural class prediction (SCP), two steps are required - feature extraction step and classification step. Feature extraction techniques generally utilize syntactical-based information, evolutionary-based information and physicochemical-based information to extract features. In this study, we explore the importance of utilizing the physicochemical properties of amino acids for improving PFR and SCP accuracies. For this, we propose a Forward Consecutive Search (FCS) scheme which aims to strategically select physicochemical attributes that will supplement the existing feature extraction techniques for PFR and SCP. An exhaustive search is conducted on all the existing 544 physicochemical attributes using the proposed FCS scheme and a subset of physicochemical attributes is identified. Features extracted from these selected attributes are then combined with existing syntactical-based and evolutionary-based features, to show an improvement in the recognition and prediction performance on benchmark datasets.
6
A Multifeatures Fusion and Discrete Firefly Optimization Method for Prediction of Protein Tyrosine Sulfation Residues. BioMed Research International 2016; 2016:8151509. [PMID: 27034949 PMCID: PMC4806266 DOI: 10.1155/2016/8151509] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/12/2015] [Revised: 01/26/2016] [Accepted: 02/14/2016] [Indexed: 12/21/2022]
Abstract
Tyrosine sulfation is one of the ubiquitous protein posttranslational modifications, in which sulfate groups are added to tyrosine residues. It plays significant roles in various physiological processes in eukaryotic cells. To explore the molecular mechanism of tyrosine sulfation, one prerequisite is to correctly identify possible protein tyrosine sulfation residues. In this paper, a novel method is presented to predict protein tyrosine sulfation residues from primary sequences. By means of informative feature construction, elaborate feature selection, and a parameter optimization scheme, the proposed predictor achieved promising results and outperformed many other state-of-the-art predictors. Using the optimal feature subset, the proposed method achieved a mean MCC of 94.41% on the benchmark dataset and an MCC of 90.09% on the independent dataset. The experimental performance indicates that the proposed method could be effective in identifying important protein posttranslational modifications and that the feature selection scheme would be powerful in research on predicting protein functional residues.
7
Sharma R, Dehzangi A, Lyons J, Paliwal K, Tsunoda T, Sharma A. Predict Gram-Positive and Gram-Negative Subcellular Localization via Incorporating Evolutionary Information and Physicochemical Features Into Chou's General PseAAC. IEEE Trans Nanobioscience 2015; 14:915-26. [DOI: 10.1109/tnb.2015.2500186] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Indexed: 11/08/2022]
8
Wei L, Liao M, Gao X, Zou Q. Enhanced Protein Fold Prediction Method Through a Novel Feature Extraction Technique. IEEE Trans Nanobioscience 2015; 14:649-59. [DOI: 10.1109/tnb.2015.2450233] [Citation(s) in RCA: 81] [Impact Index Per Article: 8.1] [Indexed: 11/06/2022]
9
Saini H, Raicar G, Sharma A, Lal S, Dehzangi A, Lyons J, Paliwal KK, Imoto S, Miyano S. Probabilistic expression of spatially varied amino acid dimers into general form of Chou's pseudo amino acid composition for protein fold recognition. J Theor Biol 2015; 380:291-8. [PMID: 26079221 DOI: 10.1016/j.jtbi.2015.05.030] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 03/01/2015] [Revised: 04/28/2015] [Accepted: 05/21/2015] [Indexed: 11/15/2022]
Abstract
BACKGROUND Identification of the tertiary structure (3D structure) of a protein is a fundamental problem in biology which helps in identifying its functions. Predicting a protein's fold is considered to be an intermediate step for identifying the tertiary structure of a protein. Computational methods have been applied to determine a protein's fold by assembling information from its structural, physicochemical and/or evolutionary properties. METHODS In this study, we propose a feature extraction technique that extracts, from the Position Specific Scoring Matrix (PSSM), probabilistic expressions of amino acid dimers with varying degrees of spatial separation in the primary sequences of proteins. An SVM classifier is used to create a model from the extracted features for fold recognition. RESULTS The performance of the proposed scheme is evaluated against three benchmark datasets, namely the Ding and Dubchak, Extended Ding and Dubchak, and Taguchi and Gromiha datasets. CONCLUSIONS The proposed scheme performed well in the experiments conducted, providing improvements over previously published results in the literature.
Affiliation(s)
- Alok Sharma
  - University of the South Pacific, Fiji; Griffith University, Brisbane, Australia.
- Sunil Lal
  - University of the South Pacific, Fiji.
- Seiya Imoto
  - Human Genome Center, University of Tokyo, Japan.
10
Paliwal KK, Sharma A, Lyons J, Dehzangi A. Improving protein fold recognition using the amalgamation of evolutionary-based and structural based information. BMC Bioinformatics 2014; 15 Suppl 16:S12. [PMID: 25521502 PMCID: PMC4290640 DOI: 10.1186/1471-2105-15-s16-s12] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 11/30/2022] Open
Abstract
Deciphering the three-dimensional structure of a protein sequence is a challenging task in biological science. Protein fold recognition and protein secondary structure prediction are transitional steps in identifying the three-dimensional structure of a protein. For protein fold recognition, evolutionary-based information of amino acid sequences from the position specific scoring matrix (PSSM) has recently been applied with improved results. On the other hand, the SPINE-X predictor has been developed and applied for protein secondary structure prediction. Several reported methods for protein fold recognition have only limited accuracy. In this paper, we have developed a strategy of combining evolutionary-based information (from PSSM) and secondary structure predicted using SPINE-X to improve protein fold recognition. The strategy is based on finding the probabilities of amino acid pairs (AAP). The proposed method has been tested on several protein benchmark datasets and an improvement of 8.9% in recognition accuracy has been achieved. We have achieved, for the first time, over 90% and 75% prediction accuracies for sequence similarity values below 40% and 25%, respectively. We also obtain 90.6% and 77.0% prediction accuracies, respectively, for the Extended Ding and Dubchak and Taguchi and Gromiha benchmark protein fold recognition datasets widely used in the literature.
11
Paliwal KK, Sharma A, Lyons J, Dehzangi A. A tri-gram based feature extraction technique using linear probabilities of position specific scoring matrix for protein fold recognition. IEEE Trans Nanobioscience 2014; 13:44-50. [PMID: 24594513 DOI: 10.1109/tnb.2013.2296050] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Indexed: 11/10/2022]
Abstract
In biological sciences, deciphering the three-dimensional structure of a protein sequence is considered to be an important and challenging task. The identification of protein folds from primary protein sequences is an intermediate step in discovering the three-dimensional structure of a protein. This can be done by utilizing a feature extraction technique to accurately extract all the relevant information, followed by employing a suitable classifier to label an unknown protein. In the past, several feature extraction techniques have been developed, but with only limited recognition accuracy. In this study, we have developed a feature extraction technique based on tri-grams computed directly from Position Specific Scoring Matrices. The effectiveness of the feature extraction technique has been shown on two benchmark datasets. The proposed technique exhibits up to 4.4% improvement in protein fold recognition accuracy compared to the state-of-the-art feature extraction techniques.
12
Lyons J, Biswas N, Sharma A, Dehzangi A, Paliwal KK. Protein fold recognition by alignment of amino acid residues using kernelized dynamic time warping. J Theor Biol 2014; 354:137-45. [DOI: 10.1016/j.jtbi.2014.03.033] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 12/26/2013] [Revised: 03/05/2014] [Accepted: 03/21/2014] [Indexed: 01/21/2023]
13
Saini H, Raicar G, Sharma A, Lal S, Dehzangi A, Ananthanarayanan R, Lyons J, Biswas N, Paliwal KK. Protein Structural Class Prediction via k-Separated Bigrams Using Position Specific Scoring Matrix. Journal of Advanced Computational Intelligence and Intelligent Informatics 2014. [DOI: 10.20965/jaciii.2014.p0474] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/09/2022]
Abstract
Protein structural class prediction (SCP) is an important task in identifying protein tertiary structure and protein functions. In this study, we propose a feature extraction technique to predict secondary structures. The technique utilizes bigram (of adjacent and k-separated amino acids) information derived from the Position Specific Scoring Matrix (PSSM). The technique has shown promising results when evaluated on the benchmark Ding and Dubchak dataset.
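The abstract does not give the bigram formula, so the following is only a minimal sketch of one common reading: for a separation k, entry (m, n) of the feature matrix sums the product of column m at position i and column n at position i + k over the sequence. The function name `k_separated_bigram`, the normalized toy matrix, and the two-column width are illustrative assumptions, not the authors' implementation; a real PSSM has 20 columns and would normally be rescaled first.

```python
def k_separated_bigram(pssm, k=1):
    """Sketch of a k-separated bigram feature matrix (assumed formula).

    pssm is a list of rows, one per residue, each row holding one
    (already normalized) score per amino-acid column.
    Returns B with B[m][n] = sum over i of pssm[i][m] * pssm[i + k][n].
    """
    length = len(pssm)
    n_cols = len(pssm[0]) if pssm else 0
    features = [[0.0] * n_cols for _ in range(n_cols)]
    # Pair every position i with the position k residues downstream.
    for i in range(length - k):
        row_a, row_b = pssm[i], pssm[i + k]
        for m in range(n_cols):
            for n in range(n_cols):
                features[m][n] += row_a[m] * row_b[n]
    return features


# Hypothetical 3-residue, 2-column matrix, used only to show the shapes.
toy = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
adjacent = k_separated_bigram(toy, k=1)   # adjacent residues
skip_one = k_separated_bigram(toy, k=2)   # residues two positions apart
```

Flattening the matrices obtained for several values of k and concatenating them would give one fixed-length feature vector per protein, independent of sequence length, which is what a downstream classifier needs.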
14
Quad-PRE: a hybrid method to predict protein quaternary structure attributes. Computational and Mathematical Methods in Medicine 2014; 2014:715494. [PMID: 24963340 PMCID: PMC4052169 DOI: 10.1155/2014/715494] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/27/2014] [Revised: 04/24/2014] [Accepted: 04/27/2014] [Indexed: 11/17/2022]
Abstract
Protein quaternary structure is very important to biological processes. Predicting its attributes is an essential task in computational biology for the advancement of proteomics. However, existing methods do not consider sufficient properties of amino acids. To this end, we propose a hybrid method, Quad-PRE, to predict protein quaternary structure attributes using the properties of amino acids, predicted secondary structure, predicted relative solvent accessibility, and position-specific scoring matrix profiles and motifs. Empirical evaluation on an independent dataset shows that Quad-PRE achieved a higher overall accuracy of 81.7%, with especially high accuracies of 92.8%, 93.3%, and 90.6% in discriminating trimers, hexamers, and octamers, respectively. Our model also reveals that all six feature sets are important to the prediction, and that a hybrid method is currently an optimal strategy. The results indicate that the proposed method can classify protein quaternary structure attributes effectively.
15
Sequence based prediction of DNA-binding proteins based on hybrid feature selection using random forest and Gaussian naïve Bayes. PLoS One 2014; 9:e86703. [PMID: 24475169 PMCID: PMC3901691 DOI: 10.1371/journal.pone.0086703] [Citation(s) in RCA: 115] [Impact Index Per Article: 10.5] [Received: 09/29/2013] [Accepted: 12/10/2013] [Indexed: 11/22/2022] Open
Abstract
Developing an efficient method for the determination of DNA-binding proteins is highly desirable because of their vital roles in gene regulation, and it would be invaluable for advancing our understanding of protein functions. In this study, we proposed a new method for the prediction of DNA-binding proteins that performs feature ranking using random forest and wrapper-based feature selection using a forward best-first search strategy. The features comprise information from primary sequence, predicted secondary structure, predicted relative solvent accessibility, and position specific scoring matrix. The proposed method, called DBPPred, uses Gaussian naïve Bayes as the underlying classifier since it outperformed five other classifiers: decision tree, logistic regression, k-nearest neighbor, support vector machine with polynomial kernel, and support vector machine with radial basis function. As a result, DBPPred yields the highest average accuracy of 0.791 and average MCC of 0.583 in five-fold cross validation with ten runs on the training benchmark dataset PDB594. Subsequently, blind tests on the independent dataset PDB186 were performed with the proposed model trained on the entire PDB594 dataset and with five other existing methods (iDNA-Prot, DNA-Prot, DNAbinder, DNABIND and DBD-Threader); DBPPred yielded the highest accuracy of 0.769, MCC of 0.538, and AUC of 0.790. Independent tests of DBPPred on a large, completely non-DNA binding protein dataset and on two RNA binding protein datasets also showed improved or comparable quality when compared with the relevant prediction methods. Moreover, we observed that the mean values of the majority of the features selected by the proposed method differ statistically significantly between DNA-binding and non-DNA-binding proteins. All of the experimental results indicate that the proposed DBPPred can be an alternative predictor for large-scale determination of DNA-binding proteins.
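Accuracy and the Matthews correlation coefficient (MCC), the two headline metrics in this abstract, are both functions of a binary confusion matrix. The sketch below uses the standard definitions; the counts in the example are hypothetical and are not taken from the PDB594 or PDB186 experiments.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined 0/0 case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 50 binders and 50 non-binders, 10 errors in each class.
example_acc = accuracy(tp=40, tn=40, fp=10, fn=10)
example_mcc = mcc(tp=40, tn=40, fp=10, fn=10)
```

In this balanced toy case the accuracy is 0.8 while the MCC is 0.6, which illustrates why the two numbers reported for the same classifier can differ substantially: MCC ranges from -1 to 1 and is 0 for chance-level prediction.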
16
Sharma A, Paliwal KK, Dehzangi A, Lyons J, Imoto S, Miyano S. A strategy to select suitable physicochemical attributes of amino acids for protein fold recognition. BMC Bioinformatics 2013; 14:233. [PMID: 23879571 PMCID: PMC3724710 DOI: 10.1186/1471-2105-14-233] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 07/25/2012] [Accepted: 06/20/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Assigning a protein into one of its folds is a transitional step for discovering three dimensional protein structure, which is a challenging task in bimolecular (biological) science. The present research focuses on: 1) the development of classifiers, and 2) the development of feature extraction techniques based on syntactic and/or physicochemical properties. RESULTS Apart from the above two main categories of research, we have shown that the selection of physicochemical attributes of the amino acids is an important step in protein fold recognition and has not been explored adequately. We have presented a multi-dimensional successive feature selection (MD-SFS) approach to systematically select attributes. The proposed method is applied on protein sequence data and an improvement of around 24% in fold recognition has been noted when selecting attributes appropriately. CONCLUSION The MD-SFS has been applied successfully in selecting physicochemical attributes of the amino acids. The selected attributes show improved protein fold recognition performance.
Affiliation(s)
- Alok Sharma
  - Laboratory of DNA Information Analysis, University of Tokyo, Minato-ku, Tokyo, Japan.
17
Yan J, Marcus M, Kurgan L. Comprehensively designed consensus of standalone secondary structure predictors improves Q3 by over 3%. J Biomol Struct Dyn 2013; 32:36-51. [PMID: 23298369 DOI: 10.1080/07391102.2012.746945] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/20/2022]
Abstract
Protein fold is defined by a spatial arrangement of three types of secondary structures (SSs) including helices, sheets, and coils/loops. Current methods that predict SS from sequences rely on complex machine learning-derived models and achieve a three-state accuracy (Q3) of about 82%. Further improvements in predictive quality could be obtained with a consensus-based approach, which has so far received limited attention. We perform a first-of-its-kind comprehensive design of an SS consensus predictor (SScon), in which we consider 12 modern standalone SS predictors and utilize a Support Vector Machine (SVM) to combine their predictions. Using a large benchmark data-set with 10 random training-test splits, we show that a simple, voting-based consensus of carefully selected base methods improves Q3 by 1.9% when compared to the best single predictor. Use of SVM provides an additional 1.4% improvement with the overall Q3 at 85.6% and segment overlap (SOV3) at 83.7%, when compared to 82.3% and 80.9%, respectively, obtained by the best individual methods. We also show strong improvements when the consensus is based on ab-initio methods, with Q3 = 82.3% and SOV3 = 80.7% that match the results from the best template-based approaches. Our consensus reduces the number of significant errors where a helix is confused with a strand, provides particularly good results for short helices and strands, and gives the most accurate estimates of the content of individual SSs in the chain. Case studies are used to visualize the improvements offered by the consensus at the residue level. A web-server and a standalone implementation of SScon are available at http://biomine.ece.ualberta.ca/SSCon/.
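As a rough illustration of the two ideas in this abstract, the sketch below computes Q3 (the percentage of residues whose three-state label is predicted correctly) and forms an unweighted per-residue majority vote over several predictions. It is not the SScon implementation: the paper selects its base predictors carefully and adds an SVM stage, whereas the tie-breaking rule, function names, and toy strings here are assumptions made for the example.

```python
from collections import Counter


def q3(predicted, observed):
    """Three-state accuracy: percentage of residues labeled correctly."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        return 0.0
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def majority_vote(predictions):
    """Per-residue unweighted vote over equal-length H/E/C strings.

    Ties go to the earliest-listed predictor that holds a top-voted state
    (an arbitrary choice for this sketch).
    """
    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        consensus.append(next(s for s in column if counts[s] == best))
    return "".join(consensus)


# Hypothetical 10-residue example: each predictor makes one different error.
observed = "HHHHCCEEEE"
base_predictions = ["HHHCCCEEEE", "HHHHCCCEEE", "HHHHHCEEEE"]
consensus = majority_vote(base_predictions)
```

Because the three toy predictors err at different residues, the vote recovers every position even though each individual prediction scores Q3 = 90%. Real predictors make correlated errors, so gains in practice are far smaller, consistent with the 1.9% improvement the abstract reports for voting.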
Affiliation(s)
- Jing Yan
  - Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
18
Chen K, Kurgan L. Computational prediction of secondary and supersecondary structures. Methods Mol Biol 2012; 932:63-86. [PMID: 22987347 DOI: 10.1007/978-1-62703-065-6_5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022]
Abstract
The sequence-based prediction of the secondary and supersecondary structures enjoys strong interest and finds applications in numerous areas related to the characterization and prediction of protein structure and function. Substantial efforts in these areas over the last three decades resulted in the development of accurate predictors, which take advantage of modern machine learning models and availability of evolutionary information extracted from multiple sequence alignment. In this chapter, we first introduce and motivate both prediction areas and introduce basic concepts related to the annotation and prediction of the secondary and supersecondary structures, focusing on the β hairpin, coiled coil, and α-turn-α motifs. Next, we overview state-of-the-art prediction methods, and we provide details for 12 modern secondary structure predictors and 4 representative supersecondary structure predictors. Finally, we provide several practical notes for the users of these prediction tools.
Affiliation(s)
- Ke Chen
  - Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada