1. Mapes NJ, Rodriguez C, Chowriappa P, Dua S. Local Similarity Matrix for Cysteine Disulfide Connectivity Prediction from Protein Sequences. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1276-1289. [PMID: 30640622 DOI: 10.1109/tcbb.2019.2892441] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
Accurately predicting three-dimensional protein structures from sequences would present us with drug targets, via molecular dynamics, for treating cancer, viral infections, and neurological diseases. These treatments would have a far-reaching impact on our economy, quality of life, and society. The goal of this research was to build a data mining framework to predict cysteine connectivity in proteins from the sequence and oxidation state of cysteines. Accurately predicting the cysteine bonding configuration improves the TM-score, a quantitative measure of protein structure prediction accuracy. We provide state-of-the-art Qp and Qc on the PDBCYS and IVD-54 datasets. Furthermore, we have produced a Local Similarity Matrix that compares favorably, with statistical significance, to the default PSSMs generated by PSI-BLAST. Our Qp values for SP39, PDBCYS, and IVD-54 were 90.6, 80.6, and 68.5, respectively.
2. Mapes NJ, Rodriguez C, Chowriappa P, Dua S. Residue Adjacency Matrix Based Feature Engineering for Predicting Cysteine Reactivity in Proteins. Comput Struct Biotechnol J 2018; 17:90-100. [PMID: 30671196 PMCID: PMC6327741 DOI: 10.1016/j.csbj.2018.12.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 09/06/2018] [Revised: 12/13/2018] [Accepted: 12/20/2018] [Indexed: 01/31/2023] Open
Abstract
Free radicals that form from reactive species of nitrogen and oxygen can react dangerously with cellular components and are involved with the pathogenesis of diabetes, cancer, Parkinson's, and heart disease. Cysteine amino acids, due to their reactive nature, are prone to oxidation by these free radicals. Determining which cysteines oxidize within proteins is crucial to our understanding of these chronic diseases. Wet lab techniques, like differential alkylation, to determine which cysteines oxidize are often expensive and time-consuming. We utilize machine learning as a fast and inexpensive approach to identifying cysteines with oxidative capabilities. We created the original features RAMmod and RAMseq for use in classification. We also incorporated well-known features such as PROPKA, SASA, PSS, and PSSM. Our algorithm requires only the protein sequence to operate; however, we do use template matching by MODELLER to acquire 3D coordinates for additional feature extraction. There was a mean improvement of RAM over N6C by 22.04% MCC. It was statistically significant with a p-value of 0.015. RAM provided a significant increase over PSSM with a p-value of 0.040 and an average 70.09% improvement MCC.
Affiliation(s)
- Pradeep Chowriappa
- Program of Computer Science, College of Engineering and Science, Louisiana Tech University, 305 Wisteria St., Ruston, LA 71272, United States
3. Fedoseev SV, Belikov MY, Ershov OV, Tafeenko VA. Reductive alkylation of disulfides. Synthesis of 2-(alkylsulfanyl)-1H-pyrrole-3-carbonitriles. Russ J Org Chem 2017. [DOI: 10.1134/s1070428016120125] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/23/2022]
4. Yang J, He BJ, Jang R, Zhang Y, Shen HB. Accurate disulfide-bonding network predictions improve ab initio structure prediction of cysteine-rich proteins. Bioinformatics 2015; 31:3773-81. [PMID: 26254435 DOI: 10.1093/bioinformatics/btv459] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 02/11/2015] [Accepted: 08/02/2015] [Indexed: 01/19/2023] Open
Abstract
MOTIVATION Cysteine-rich proteins cover many important families in nature but there are currently no methods specifically designed for modeling the structure of these proteins. The accuracy of disulfide connectivity pattern prediction, particularly for the proteins of higher-order connections, e.g., >3 bonds, is too low to effectively assist structure assembly simulations. RESULTS We propose a new hierarchical order reduction protocol called Cyscon for disulfide-bonding prediction. The most confident disulfide bonds are first identified and bonding prediction is then focused on the remaining cysteine residues based on SVR training. Compared with purely machine learning-based approaches, Cyscon improved the average accuracy of connectivity pattern prediction by 21.9%. For proteins with more than 5 disulfide bonds, Cyscon improved the accuracy by 585% on the benchmark set of PDBCYS. When applied to 158 non-redundant cysteine-rich proteins, Cyscon predictions helped increase (or decrease) the TM-score (or RMSD) of the ab initio QUARK modeling by 12.1% (or 14.4%). This result demonstrates a new avenue to improve the ab initio structure modeling for cysteine-rich proteins. AVAILABILITY AND IMPLEMENTATION http://www.csbio.sjtu.edu.cn/bioinf/Cyscon/ CONTACT zhng@umich.edu or hbshen@sjtu.edu.cn. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jing Yang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Bao-Ji He
- State Key Laboratory of Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China, Department of Computational Medicine and Bioinformatics and
- Richard Jang
- Department of Computational Medicine and Bioinformatics and
- Yang Zhang
- Department of Computational Medicine and Bioinformatics and Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China, Department of Computational Medicine and Bioinformatics and
5. Raimondi D, Orlando G, Vranken WF. An Evolutionary View on Disulfide Bond Connectivities Prediction Using Phylogenetic Trees and a Simple Cysteine Mutation Model. PLoS One 2015; 10:e0131792. [PMID: 26161671 PMCID: PMC4498770 DOI: 10.1371/journal.pone.0131792] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 03/13/2015] [Accepted: 06/07/2015] [Indexed: 01/09/2023] Open
Abstract
Disulfide bonds are crucial for many structural and functional aspects of proteins. They have a stabilizing role during folding, can regulate enzymatic activity and can trigger allosteric changes in the protein structure. Moreover, knowledge of the topology of the disulfide connectivity can be relevant in genomic annotation tasks and can provide long range constraints for ab-initio protein structure predictors. In this paper we describe PhyloCys, a novel unsupervised predictor of disulfide bond connectivity from known cysteine oxidation states. For each query protein, PhyloCys retrieves and aligns homologs with HHblits and builds a phylogenetic tree using ClustalW. A simplified model of cysteine co-evolution is then applied to the tree in order to hypothesize the presence of oxidized cysteines in the inner nodes of the tree, which represent ancestral protein sequences. The tree is then traversed from the leaves to the root and the putative disulfide connectivity is inferred by observing repeated patterns of tandem mutations between a sequence and its ancestors. A final correction is applied using the Edmonds-Gabow maximum weight perfect matching algorithm. The evolutionary approach applied in PhyloCys results in disulfide bond predictions equivalent to Sephiroth, another approach that takes whole sequence information into account, and is 26-29% better than state-of-the-art methods based on cysteine covariance patterns in multiple sequence alignments, while requiring one order of magnitude fewer homologous sequences (10^3 instead of 10^4), thus extending its range of applicability. The software described in this article and the datasets used are available at http://ibsquare.be/phylocys.
Affiliation(s)
- Daniele Raimondi
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Department of Structural Biology, VIB, Brussels, Belgium
- Machine Learning Group, ULB, Brussels, Belgium
- Gabriele Orlando
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Department of Structural Biology, VIB, Brussels, Belgium
- Machine Learning Group, ULB, Brussels, Belgium
- Wim F. Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Department of Structural Biology, VIB, Brussels, Belgium
6. Murad W, Singh R. The MS2DB++ Webserver: Disulfide Bond Determination Through Evidence Combination. IEEE Trans Nanobioscience 2013; 12:340-2. [DOI: 10.1109/tnb.2013.2289391] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
7. Lin HH, Hsu JC, Hsu YN, Pan RH, Chen YF, Tseng LY. Disulfide connectivity prediction based on structural information without a prior knowledge of the bonding state of cysteines. Comput Biol Med 2013; 43:1941-8. [PMID: 24209939 DOI: 10.1016/j.compbiomed.2013.09.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/24/2013] [Revised: 09/07/2013] [Accepted: 09/10/2013] [Indexed: 10/26/2022]
Abstract
Previous studies predicted the disulfide bonding patterns of cysteines using a prior knowledge of their bonding states. In this study, we propose a method that is based on the ensemble support vector machine (SVM), with the structural features of cysteines extracted without any prior knowledge of their bonding states. This method is useful for improving the predictive performance of disulfide bonding patterns. For comparison, the proposed method was tested with the same dataset SPX that was adopted in previous studies. The experimental results demonstrate that bridge classification and disulfide connectivity predictions achieve 96.5% and 89.2% accuracy, respectively, using the ensemble SVM model, which outperforms the traditional method (51.5% and 51.0%, respectively) and the model that is based on a single-kernel SVM classifier (94.6% and 84.4%, respectively). For protein chain and residue classifications, the sensitivity, specificity, and accuracy of ensemble and single-kernel SVM approaches are better than those of the traditional methods. The predictive performances of the ensemble SVM and single-kernel models are identical, indicating that the ensemble model can converge to the single-kernel model for some applications.
Affiliation(s)
- Hsuan-Hung Lin
- Department of Management Information System, Central Taiwan University of Science and Technology, Taichung 40601, Taiwan.
8. Becker J, Maes F, Wehenkel L. On the relevance of sophisticated structural annotations for disulfide connectivity pattern prediction. PLoS One 2013; 8:e56621. [PMID: 23533562 PMCID: PMC3574028 DOI: 10.1371/journal.pone.0056621] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/17/2012] [Accepted: 01/14/2013] [Indexed: 12/02/2022] Open
Abstract
Disulfide bridges strongly constrain the native structure of many proteins and predicting their formation is therefore a key sub-problem of protein structure and function inference. Most recently proposed approaches for this prediction problem adopt the following pipeline: first they enrich the primary sequence with structural annotations, second they apply a binary classifier to each candidate pair of cysteines to predict disulfide bonding probabilities and finally, they use a maximum weight graph matching algorithm to derive the predicted disulfide connectivity pattern of a protein. In this paper, we adopt this three step pipeline and propose an extensive study of the relevance of various structural annotations and feature encodings. In particular, we consider five kinds of structural annotations, among which three are novel in the context of disulfide bridge prediction. So as to be usable by machine learning algorithms, these annotations must be encoded into features. For this purpose, we propose four different feature encodings based on local windows and on different kinds of histograms. The combination of structural annotations with these possible encodings leads to a large number of possible feature functions. In order to identify a minimal subset of relevant feature functions among those, we propose an efficient and interpretable feature function selection scheme, designed so as to avoid any form of overfitting. We apply this scheme on top of three supervised learning algorithms: k-nearest neighbors, support vector machines and extremely randomized trees. Our results indicate that the use of only the PSSM (position-specific scoring matrix) together with the CSP (cysteine separation profile) are sufficient to construct a high performance disulfide pattern predictor and that extremely randomized trees reach a disulfide pattern prediction accuracy of on the benchmark dataset SPX, which corresponds to improvement over the state of the art. 
A web-application is available at http://m24.giga.ulg.ac.be:81/x3CysBridges.
Affiliation(s)
- Julien Becker
- Bioinformatics and Modeling, GIGA-Research, University of Liege, Liege, Belgium
- Francis Maes
- Department of Electrical Engineering and Computer Science, Montefiore Institute, University of Liege, Liege, Belgium
- DTAI, Departement Computerwetenschappen, University of Leuven, Leuven, Belgium
- Louis Wehenkel
- Department of Electrical Engineering and Computer Science, Montefiore Institute, University of Liege, Liege, Belgium
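The last step of the three-step pipeline described in the Becker et al. abstract above (deriving a connectivity pattern from pairwise bonding probabilities by maximum weight matching) can be illustrated with a minimal sketch. This is not the authors' implementation: it enumerates all perfect matchings by brute force, which is only practical for a handful of cysteines, and the probabilities, positions, and function names below are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Pair = Tuple[int, int]


def perfect_matchings(cysteines: List[int]) -> Iterator[List[Pair]]:
    """Yield every way to pair up an even-length list of cysteine positions."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in perfect_matchings(remaining):
            yield [(first, partner)] + tail


def best_pattern(cysteines: List[int], pair_prob: Dict[Pair, float]) -> List[Pair]:
    """Return the pairing whose summed pair scores is largest (brute force)."""
    return max(perfect_matchings(cysteines),
               key=lambda m: sum(pair_prob[p] for p in m))


# Hypothetical classifier outputs for four bonded cysteines (sequence positions).
pair_prob = {(5, 20): 0.9, (5, 33): 0.2, (5, 47): 0.1,
             (20, 33): 0.3, (20, 47): 0.2, (33, 47): 0.8}
pattern = best_pattern([5, 20, 33, 47], pair_prob)
print(pattern)
```

For realistic protein sizes a polynomial-time matching algorithm would replace the enumeration; the brute-force version is used here only because it makes the objective explicit.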
9. Singh R, Murad W. Protein disulfide topology determination through the fusion of mass spectrometric analysis and sequence-based prediction using Dempster-Shafer theory. BMC Bioinformatics 2013; 14 Suppl 2:S20. [PMID: 23368815 PMCID: PMC3549834 DOI: 10.1186/1471-2105-14-s2-s20] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022] Open
Abstract
Background Disulfide bonds constitute one of the most important cross-linkages in proteins and significantly influence protein structure and function. At the state-of-the-art, various methodological frameworks have been proposed for identification of disulfide bonds. These include among others, mass spectrometry-based methods, sequence-based predictive approaches, as well as techniques like crystallography and NMR. Each of these frameworks has its advantages and disadvantages in terms of pre-requisites for applicability, throughput, and accuracy. Furthermore, the results from different methods may concur or conflict in parts. Results In this paper, we propose a novel and theoretically rigorous framework for disulfide bond determination based on information fusion from different methods using an extended formulation of Dempster-Shafer theory. A key advantage of our approach is that it can automatically deal with concurring as well as conflicting evidence in a data-driven manner. Using the proposed framework, we have developed a method for disulfide bond determination that combines results from sequence-based prediction and mass spectrometric inference. This method leads to more accurate disulfide bond determination than any of the constituent methods taken individually. Furthermore, experiments indicate that the method improves the accuracy of bond identification as compared to leading extant methods at the state-of-the-art. Finally, the proposed framework is extensible in that results from any number of approaches can be incorporated. Results obtained using this framework can especially be useful in cases where the complexity of the bonding patterns coupled with specificities of the fragmentation pattern or limitations of computational models impair any single method to perform consistently across a diverse set of molecules.
Affiliation(s)
- Rahul Singh
- Department of Computer Science, San Francisco State University, San Francisco, CA 94132, USA.
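The evidence fusion described in the Singh and Murad abstract above builds on Dempster-Shafer theory. The sketch below shows only the classic Dempster rule of combination for two mass functions, not the extended formulation the paper proposes; the two evidence sources and their mass values are made-up illustrations.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Classic Dempster rule: multiply masses, pool by set intersection,
    discard mass that falls on the empty set, and renormalise."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # the two sources contradict each other here
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


BOND = frozenset({"bond"})
NO_BOND = frozenset({"no_bond"})
EITHER = BOND | NO_BOND  # mass left uncommitted

# Hypothetical beliefs about one candidate cysteine pair.
ms_evidence = {BOND: 0.6, EITHER: 0.4}                 # mass spectrometry
seq_evidence = {BOND: 0.5, NO_BOND: 0.3, EITHER: 0.2}  # sequence-based predictor
fused = combine(ms_evidence, seq_evidence)
print(fused)
```

In this toy case the conflicting mass is 0.6 × 0.3 = 0.18, so the remaining products are rescaled by 1 / 0.82.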
10. Lin HH, Tseng LY. Prediction of disulfide bonding pattern based on a support vector machine and multiple trajectory search. Inf Sci (N Y) 2012. [DOI: 10.1016/j.ins.2012.02.035] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]
11. Zhu L, Yang J, Song JN, Chou KC, Shen HB. Improving the accuracy of predicting disulfide connectivity by feature selection. J Comput Chem 2010; 31:1478-85. [PMID: 20127740 DOI: 10.1002/jcc.21433] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 01/23/2023]
Abstract
Disulfide bonds are primary covalent cross-links formed between two cysteine residues in the same or different protein polypeptide chains, which play important roles in the folding and stability of proteins. However, computational prediction of disulfide connectivity directly from protein primary sequences is challenging due to the nonlocal nature of disulfide bonds in the context of sequences, and the number of possible disulfide patterns grows exponentially when the number of cysteine residues increases. In the previous studies, disulfide connectivity prediction was usually performed in high-dimensional feature space, which can cause a variety of problems in statistical learning, such as the dimension disaster, overfitting, and feature redundancy. In this study, we propose an efficient feature selection technique for analyzing the importance of each feature component. On the basis of this approach, we selected the most important features for predicting the connectivity pattern of intra-chain disulfide bonds. Our results have shown that the high-dimensional features contain redundant information, and the prediction performance can be further improved when these high-dimensional features are reduced to a lower but more compact dimensional space. Our results also indicate that the global protein features contribute little to the formation and prediction of disulfide bonds, while the local sequential and structural information play important roles. All these findings provide important insights for structural studies of disulfide-rich proteins.
Affiliation(s)
- Lin Zhu
- Department of Bioinformatics, Institute of Image Processing & Pattern Recognition, Shanghai Jiaotong University, 800 Dongchuan Road, Shanghai 200240, China
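The Zhu et al. abstract above notes that the number of possible disulfide patterns grows very quickly with the number of cysteines. As a small worked illustration, assuming every one of the 2n cysteines takes part in exactly one intra-chain bond, the number of distinct patterns is the number of perfect matchings, (2n − 1)!! = 1 · 3 · 5 · … · (2n − 1). The function name is illustrative.

```python
def count_patterns(n_bonds: int) -> int:
    """Number of ways to pair 2 * n_bonds cysteines into n_bonds disulfide bonds.

    Equals the double factorial (2n - 1)!!: the first cysteine can choose among
    2n - 1 partners, the next unpaired one among 2n - 3, and so on.
    """
    count = 1
    for k in range(1, 2 * n_bonds, 2):
        count *= k
    return count


for n in range(1, 6):
    print(n, count_patterns(n))
```

Under this assumption, two bonds give 3 candidate patterns while five bonds already give 945, which is why exhaustive pattern classification becomes hard for cysteine-rich proteins.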
12. Elumalai P, Wu JW, Liu HL. Current advances in disulfide connectivity predictions. J Taiwan Inst Chem Eng 2010. [DOI: 10.1016/j.jtice.2010.05.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
13. Mello LV, O'Meara H, Rigden DJ, Paterson S. Identification of novel aspartic proteases from Strongyloides ratti and characterisation of their evolutionary relationships, stage-specific expression and molecular structure. BMC Genomics 2009; 10:611. [PMID: 20015380 PMCID: PMC2805697 DOI: 10.1186/1471-2164-10-611] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 04/27/2009] [Accepted: 12/16/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Aspartic proteases are known to play an important role in the biology of nematode parasitism. This role is best characterised in blood-feeding nematodes, where they digest haemoglobin, but they are also likely to play important roles in the biology of nematode parasites that do not feed on blood. In the present work, we investigate the evolution and expression of aspartic proteases in Strongyloides ratti, which permits a unique comparison between parasitic and free-living adult forms within its life-cycle. RESULTS We identified eight transcribed aspartic protease sequences and a further two genomic sequences and compared these to homologues in Caenorhabditis elegans and other nematode species. Phylogenetic analysis demonstrated a complex pattern of gene evolution, such that some S. ratti sequences had a one-to-one correspondence with orthologues of C. elegans but that lineage-specific expansions have occurred for other aspartic proteases in these two nematodes. These gene duplication events may have contributed to the adaptation of the two species to their different lifestyles. Among the set of S. ratti aspartic proteases were two closely-related isoforms that showed differential expression during different life stages: ASP-2A is highly expressed in parasitic females while ASP-2B is predominantly found in free-living adults. Molecular modelling of the ASP-2 isoforms reveals that their substrate specificities are likely to be very similar, but that ASP-2B is more electrostatically negative over its entire molecular surface than ASP-2A. This characteristic may be related to different pH values of the environments in which these two isoforms operate. CONCLUSIONS We have demonstrated that S. ratti provides a powerful model to explore the genetic adaptations associated with parasitic versus free-living life-styles. 
We have discovered gene duplication of aspartic protease genes in Strongyloides and identified a pair of paralogues differentially expressed in either the parasitic or the free-living phase of the nematode life-cycle, consistent with an adaptive role for aspartic proteases in the evolution of nematode parasitism.
Affiliation(s)
- Luciane V Mello
- School of Biological Sciences, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK
- Helen O'Meara
- School of Biological Sciences, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK
- Department of Pharmacology and Therapeutics, University of Liverpool, Ashton Street, Liverpool, L69 3GE, UK
- Daniel J Rigden
- School of Biological Sciences, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK
- Steve Paterson
- School of Biological Sciences, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK
14. Thangudu RR, Manoharan M, Srinivasan N, Cadet F, Sowdhamini R, Offmann B. Analysis on conservation of disulphide bonds and their structural features in homologous protein domain families. BMC Struct Biol 2008; 8:55. [PMID: 19111067 PMCID: PMC2628669 DOI: 10.1186/1472-6807-8-55] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Received: 05/04/2008] [Accepted: 12/26/2008] [Indexed: 11/22/2022]
Abstract
Background Disulphide bridges are well known to play key roles in stability, folding and functions of proteins. Introduction or deletion of disulphides by site-directed mutagenesis have produced varying effects on stability and folding depending upon the protein and location of disulphide in the 3-D structure. Given the lack of complete understanding it is worthwhile to learn from an analysis of extent of conservation of disulphides in homologous proteins. We have also addressed the question of what structural interactions replaces a disulphide in a homologue in another homologue. Results Using a dataset involving 34,752 pairwise comparisons of homologous protein domains corresponding to 300 protein domain families of known 3-D structures, we provide a comprehensive analysis of extent of conservation of disulphide bridges and their structural features. We report that only 54% of all the disulphide bonds compared between the homologous pairs are conserved, even if, a small fraction of the non-conserved disulphides do include cytoplasmic proteins. Also, only about one fourth of the distinct disulphides are conserved in all the members in protein families. We note that while conservation of disulphide is common in many families, disulphide bond mutations are quite prevalent. Interestingly, we note that there is no clear relationship between sequence identity between two homologous proteins and disulphide bond conservation. Our analysis on structural features at the sites where cysteines forming disulphide in one homologue are replaced by non-Cys residues show that the elimination of a disulphide in a homologue need not always result in stabilizing interactions between equivalent residues. Conclusion We observe that in the homologous proteins, disulphide bonds are conserved only to a modest extent. Very interestingly, we note that extent of conservation of disulphide in homologous proteins is unrelated to the overall sequence identity between homologues. 
The non-conserved disulphides are often associated with variable structural features that were recruited to be associated with differentiation or specialisation of protein function.
Affiliation(s)
- Ratna R Thangudu
- Laboratoire de Biochimie et Génétique Moléculaire, Université de La Réunion, BP 7151, 15 avenue René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France.
15. Rubinstein R, Fiser A. Predicting disulfide bond connectivity in proteins by correlated mutations analysis. Bioinformatics 2008; 24:498-504. [PMID: 18203772 DOI: 10.1093/bioinformatics/btm637] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Indexed: 12/21/2022] Open
Abstract
MOTIVATION Prediction of disulfide bond connectivity facilitates structural and functional annotation of proteins. Previous studies suggest that cysteines of a disulfide bond mutate in a correlated manner. RESULTS We developed a method that analyzes correlated mutation patterns in multiple sequence alignments in order to predict disulfide bond connectivity. Proteins with known experimental structures and varying numbers of disulfide bonds, and that spanned various evolutionary distances, were aligned. We observed frequent variation of disulfide bond connectivity within members of the same protein families, and it was also observed that in 99% of the cases, cysteine pairs forming non-conserved disulfide bonds mutated in concert. Our data support the notion that substitution of a cysteine in a disulfide bond prompts the substitution of its cysteine partner and that oxidized cysteines appear in pairs. The method we developed predicts disulfide bond connectivity patterns with accuracies of 73, 69 and 61% for proteins with two, three and four disulfide bonds, respectively.
Affiliation(s)
- Rotem Rubinstein
- Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA.
16. Vincent M, Passerini A, Labbé M, Frasconi P. A simplified approach to disulfide connectivity prediction from protein sequences. BMC Bioinformatics 2008; 9:20. [PMID: 18194539 PMCID: PMC2375136 DOI: 10.1186/1471-2105-9-20] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 09/04/2007] [Accepted: 01/14/2008] [Indexed: 11/17/2022] Open
Abstract
Background Prediction of disulfide bridges from protein sequences is useful for characterizing structural and functional properties of proteins. Several methods based on different machine learning algorithms have been applied to solve this problem and public domain prediction services exist. These methods are however still potentially subject to significant improvements both in terms of prediction accuracy and overall architectural complexity. Results We introduce new methods for predicting disulfide bridges from protein sequences. The methods take advantage of two new decomposition kernels for measuring the similarity between protein sequences according to the amino acid environments around cysteines. Disulfide connectivity is predicted in two passes. First, a binary classifier is trained to predict whether a given protein chain has at least one intra-chain disulfide bridge. Second, a multiclass classifier (implemented by 1-nearest neighbor) is trained to predict connectivity patterns. The two passes can be easily cascaded to obtain connectivity prediction from sequence alone. We report an extensive experimental comparison on several data sets that have been previously employed in the literature to assess the accuracy of cysteine bonding state and disulfide connectivity predictors. Conclusion We reach state-of-the-art results on bonding state prediction with a simple method that classifies chains rather than individual residues. The prediction accuracy reached by our connectivity prediction method compares favorably with respect to all but the most complex other approaches. On the other hand, our method does not need any model selection or hyperparameter tuning, a property that makes it less prone to overfitting and prediction accuracy overestimation.
Affiliation(s)
- Marc Vincent
- Machine Learning and Neural Networks Group, Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze, Via di Santa Marta 3, 50139 Firenze, Italy.
17. Song J, Yuan Z, Tan H, Huber T, Burrage K. Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure. Bioinformatics 2007; 23:3147-54. [PMID: 17942444 DOI: 10.1093/bioinformatics/btm505] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.1] [Indexed: 11/12/2022]
Abstract
MOTIVATION Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing the protein structures and are commonly found in extracytoplasmatic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. RESULTS We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and predicted secondary structure by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, respectively, when averaged on proteins with two to five disulfide bridges using 4-fold cross-validation, measured on the protein and cysteine pair on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. AVAILABILITY The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide
Affiliation(s)
- Jiangning Song
- Advanced Computational Modelling Centre, The University of Queensland, Brisbane, QLD 4072, Australia
|
18
|
Thangudu RR, Sharma P, Srinivasan N, Offmann B. Analycys: A database for conservation and conformation of disulphide bonds in homologous protein domains. Proteins 2007; 67:255-61. [PMID: 17285632 DOI: 10.1002/prot.21318] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
Disulphide bonds in proteins are known to play diverse roles ranging from folding to structure to function. Thorough knowledge of the conservation status and structural state of the disulphide bonds will help in understanding of the differences in homologous proteins. Here we present a database for the analysis of conservation and conformation of disulphide bonds in SCOP structural families. This database has a wide range of applications including mapping of disulphide bond mutation patterns, identification of disulphide bonds important for folding and stabilization, modeling of protein tertiary structures and in protein engineering. The database can be accessed at: http://bioinformatics.univ-reunion.fr/analycys/.
Affiliation(s)
- Ratna R Thangudu
- Laboratoire de Biochimie et Génétique Moléculaire, Université de La Réunion, BP 7151, 15 Avenue René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France
|
19
|
Liu HL, Chen SC. Prediction of disulfide connectivity in proteins with support vector machine. Journal of the Chinese Institute of Chemical Engineers 2007. [DOI: 10.1016/j.jcice.2006.09.002] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
20
|
Ceroni A, Passerini A, Vullo A, Frasconi P. DISULFIND: a disulfide bonding state and cysteine connectivity prediction server. Nucleic Acids Res 2006; 34:W177-81. [PMID: 16844986 PMCID: PMC1538823 DOI: 10.1093/nar/gkl266] [Citation(s) in RCA: 251] [Impact Index Per Article: 13.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
DISULFIND is a server for predicting the disulfide bonding state of cysteines and their disulfide connectivity starting from sequence alone. Optionally, disulfide connectivity can be predicted from sequence and a bonding state assignment given as input. The output is a simple visualization of the assigned bonding state (with confidence degrees) and the most likely connectivity patterns. The server is available at .
Affiliation(s)
- Alessio Ceroni
- Machine Learning and Neural Networks Group, Università degli Studi di Firenze, Dipartimento di Sistemi e Informatica, Via di Santa Marta 3, 50139 Firenze, Italy
- Andrea Passerini
- Machine Learning and Neural Networks Group, Università degli Studi di Firenze, Dipartimento di Sistemi e Informatica, Via di Santa Marta 3, 50139 Firenze, Italy
- Alessandro Vullo
- School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland
- Paolo Frasconi
- Machine Learning and Neural Networks Group, Università degli Studi di Firenze, Dipartimento di Sistemi e Informatica, Via di Santa Marta 3, 50139 Firenze, Italy
- To whom correspondence should be addressed. Tel: +39 0554796362; Fax: +39 0554796363
|
21
|
Chen BJ, Tsai CH, Chan CH, Kao CY. Disulfide connectivity prediction with 70% accuracy using two-level models. Proteins 2006; 64:246-52. [PMID: 16615141 DOI: 10.1002/prot.20972] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
Disulfide bridges stabilize protein structures covalently and play an important role in protein folding. Predicting disulfide connectivity precisely helps towards the solution of protein structure prediction. Previous methods for disulfide connectivity prediction either infer the bonding potential of cysteine pairs or rank alternative disulfide bonding patterns. As a result, these methods encode data according to cysteine pairs (pair-wise) or disulfide bonding patterns (pattern-wise). However, using either encoding scheme alone cannot fully utilize the local and global information of proteins, so the accuracies of previous methods are limited. In this work, we propose a novel two-level framework to predict disulfide connectivity. With this framework, both the pair-wise and pattern-wise encoding schemes are considered. Our models were validated on the datasets derived from SWISS-PROT 39 and 43, and the results demonstrate that our models can combine both local and global information. Compared to previous methods, significant improvements were obtained by our models. Our work may also provide insights to further improvements of disulfide connectivity prediction and increase its applicability in protein structure analysis and prediction.
Affiliation(s)
- Bo-Juen Chen
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Republic of China
|
22
|
Tsai CH, Chen BJ, Chan CH, Liu HL, Kao CY. Improving disulfide connectivity prediction with sequential distance between oxidized cysteines. Bioinformatics 2005; 21:4416-9. [PMID: 16223789 DOI: 10.1093/bioinformatics/bti715] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
SUMMARY: Predicting disulfide connectivity precisely helps towards the solution of protein structure prediction. In this study, a descriptor derived from the sequential distance between oxidized cysteines (denoted as DOC) is proposed. An approach using a support vector machine (SVM) method based on weighted graph matching was further developed to predict the disulfide connectivity pattern in proteins. When DOC was applied, a prediction accuracy of 63% could be achieved for our SVM models, which is significantly higher than the accuracies obtained from previous approaches. The results show that using the non-local descriptor DOC coupled with local sequence profiles significantly improves the prediction accuracy. These improvements demonstrate that DOC, with a proper scaling scheme, is an effective feature for the prediction of disulfide connectivity. The method developed in this work is available at the web server PreCys (prediction of cys-cys linkages of proteins).
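To make the idea of a sequence-distance descriptor concrete, the sketch below computes, for every pair of oxidized cysteines, their separation along the sequence. The abstract does not state the scaling scheme, so dividing by the sequence length is purely an assumption made here for illustration, and the function name `doc_features` is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def doc_features(oxidized_positions: List[int],
                 sequence_length: int) -> Dict[Tuple[int, int], float]:
    """Map each cysteine pair (i, j), i < j, to its scaled sequence separation.

    Scaling by sequence length is an assumed choice, not the paper's scheme.
    """
    ordered = sorted(oxidized_positions)
    return {(i, j): (j - i) / sequence_length
            for i, j in combinations(ordered, 2)}


# Toy protein of length 50 with four oxidized cysteines (1-based positions).
features = doc_features([3, 10, 25, 40], sequence_length=50)
```

Each value would then be one input feature for the corresponding candidate cysteine pair, alongside local sequence-profile features.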
Affiliation(s)
- Chi-Hung Tsai
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan 106
|
23
|
Abstract
Correctly predicting the disulfide bond topology in a protein is of crucial importance for the understanding of protein function and can be of great help for tertiary structure prediction methods. The web server http://clavius.bc.edu/~clotelab/DiANNA/ outputs the disulfide connectivity prediction given a protein sequence as input. The following procedure is performed. First, PSIPRED is run to predict the protein's secondary structure, then PSIBLAST is run against the non-redundant SwissProt to obtain a multiple alignment of the input sequence. The predicted secondary structure and the profile arising from this alignment are used in the training phase of our neural network. Next, cysteine oxidation state is predicted, then each pair of cysteines in the protein sequence is assigned a likelihood of forming a disulfide bond; this is performed by means of a novel architecture (diresidue neural network). Finally, Rothberg's implementation of Gabow's maximum weighted matching algorithm is applied to the diresidue neural network scores in order to produce the final connectivity prediction. Our novel neural network-based approach achieves results that are comparable to, and in some cases better than, the current state-of-the-art methods.
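The final step of this procedure, turning pairwise bond likelihoods into a single connectivity pattern, is a maximum weight matching problem. The sketch below illustrates the idea with an exhaustive search over all perfect pairings, which is only practical for a handful of cysteines; it is a stand-in for illustration and not Gabow's algorithm or Rothberg's implementation. The function name `best_connectivity` and the scores are hypothetical, and an even number of bonded cysteines is assumed.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def best_connectivity(cysteines: List[int],
                      score: Dict[Pair, float]) -> Tuple[float, List[Pair]]:
    """Return (total score, pairs) of the highest-scoring perfect pairing.

    `cysteines` must be sorted and even in length; `score` is keyed by
    (i, j) with i < j. Runs in exponential time, so small inputs only.
    """
    if not cysteines:
        return 0.0, []
    first, rest = cysteines[0], cysteines[1:]
    best_total, best_pairs = float("-inf"), []
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        sub_total, sub_pairs = best_connectivity(remaining, score)
        total = score[(first, partner)] + sub_total
        if total > best_total:
            best_total, best_pairs = total, [(first, partner)] + sub_pairs
    return best_total, best_pairs


# Made-up pair likelihoods for four bonded cysteines at these positions.
scores = {
    (5, 12): 0.2, (5, 30): 0.9, (5, 44): 0.1,
    (12, 30): 0.3, (12, 44): 0.8, (30, 44): 0.4,
}
total, pattern = best_connectivity([5, 12, 30, 44], scores)
```

Here the pairing 5-30 with 12-44 is selected because its summed score exceeds that of the two alternative patterns. A polynomial-time matching algorithm becomes necessary as the number of bonded cysteines grows, since the number of possible patterns rises very quickly.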
Affiliation(s)
- F. Ferrè
- Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
- P. Clote
- Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
- Department of Computer Science (courtesy appointment), Boston College, Chestnut Hill, MA 02467, USA
- To whom correspondence should be addressed. Tel: +1 617 552 1332; Fax: +1 617 552 2011
|