201
Jain T, Boland T, Lilov A, Burnina I, Brown M, Xu Y, Vásquez M. Prediction of delayed retention of antibodies in hydrophobic interaction chromatography from sequence using machine learning. Bioinformatics 2017; 33:3758-3766. [DOI: 10.1093/bioinformatics/btx519] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 04/10/2017] [Accepted: 08/11/2017] [Indexed: 12/16/2022] [Open access]
Affiliation(s)
- Tushar Jain
- Computational Biology, Adimab, Palo Alto, CA, USA
- Todd Boland
- Computational Biology, Adimab, Palo Alto, CA, USA
- Yingda Xu
- Protein Analytics, Adimab, Lebanon, NH, USA
202
Nielsen H. Predicting Subcellular Localization of Proteins by Bioinformatic Algorithms. Curr Top Microbiol Immunol 2017; 404:129-158. [PMID: 26728066 DOI: 10.1007/82_2015_5006] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 02/08/2023]
Abstract
When predicting the subcellular localization of proteins from their amino acid sequences, there are basically three approaches: signal-based, global property-based, and homology-based. Each of these has its advantages and drawbacks, and it is important when comparing methods to know which approach was used. Various statistical and machine learning algorithms are used with all three approaches, and various measures and standards are employed when reporting the performances of the developed methods. This chapter presents a number of available methods for prediction of sorting signals and subcellular localization, but rather than providing a checklist of which predictors to use, it aims to function as a guide for critical assessment of prediction methods.
Affiliation(s)
- Henrik Nielsen
- Department of Systems Biology, Center for Biological Sequence Analysis, Technical University of Denmark, Kemitorvet building 208, 2800, Lyngby, Denmark.
203
204
Feyertag F, Berninsone PM, Alvarez-Ponce D. Secreted Proteins Defy the Expression Level-Evolutionary Rate Anticorrelation. Mol Biol Evol 2017; 34:692-706. [PMID: 28007979 DOI: 10.1093/molbev/msw268] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 12/15/2022] [Open access]
Abstract
The rates of evolution of the proteins of any organism vary across orders of magnitude. A primary factor influencing rates of protein evolution is expression. A strong negative correlation between expression levels and evolutionary rates (the so-called E-R anticorrelation) has been observed in virtually all studied organisms. This effect is currently attributed to the abundance-dependent fitness costs of misfolding and nonspecific protein-protein interactions, among other factors. Secreted proteins are folded in the endoplasmic reticulum, a compartment where chaperones, folding catalysts, and stringent quality control mechanisms promote their correct folding and may reduce the fitness costs of misfolding. In addition, confinement of secreted proteins to the extracellular space may reduce misinteractions and their deleterious effects. We hypothesize that each of these factors (secretory pathway quality control and extracellular location) may weaken the E-R anticorrelation. Indeed, here we show that among human proteins secreted to the extracellular space, rates of evolution do not correlate with protein abundances. This trend is robust to controlling for several potentially confounding factors and is also observed when analyzing protein abundance data for 6 human tissues. In addition, analysis of mRNA abundance data for 32 human tissues shows that the E-R correlation is always less negative, and sometimes nonsignificant, in secreted proteins. Similar observations were made in Caenorhabditis elegans and Escherichia coli, and to a lesser extent in Drosophila melanogaster, Saccharomyces cerevisiae and Arabidopsis thaliana. Our observations contribute to understanding the causes of the E-R anticorrelation.
Affiliation(s)
- Felix Feyertag
- Department of Biology, University of Nevada, Reno, Reno, NV
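The E-R anticorrelation described above is a rank-order association between expression level and evolutionary rate across genes. As a minimal sketch, assuming a Spearman rank correlation (the abstract does not name the statistic the authors used) and hypothetical per-gene inputs `abundance` and `rate` (for example, protein abundance and a dN/dS estimate), the coefficient can be computed with average ranks for ties:

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(abundance: Sequence[float], rate: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(abundance) != len(rate) or len(abundance) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    ra, rr = average_ranks(abundance), average_ranks(rate)
    ma, mr = sum(ra) / len(ra), sum(rr) / len(rr)
    cov = sum((a - ma) * (r - mr) for a, r in zip(ra, rr))
    var_a = sum((a - ma) ** 2 for a in ra)
    var_r = sum((r - mr) ** 2 for r in rr)
    if var_a == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_a * var_r)
```

A strongly negative rho (abundant proteins evolving slowly) is the E-R anticorrelation; the abstract's claim for secreted human proteins is that no such correlation is found. A real analysis would also need a significance test and the controls for confounding factors that the abstract mentions.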
205
Li Z, Wang J, Zhang S, Zhang Q, Wu W. A new hybrid coding for protein secondary structure prediction based on primary structure similarity. Gene 2017; 618:8-13. [DOI: 10.1016/j.gene.2017.03.011] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 08/09/2016] [Revised: 02/02/2017] [Accepted: 03/15/2017] [Indexed: 11/16/2022]
206
Jing X, Dong Q. MQAPRank: improved global protein model quality assessment by learning-to-rank. BMC Bioinformatics 2017; 18:275. [PMID: 28545390 PMCID: PMC5445322 DOI: 10.1186/s12859-017-1691-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 02/12/2017] [Accepted: 05/16/2017] [Indexed: 11/10/2022] [Open access]
Abstract
BACKGROUND Protein structure prediction has made considerable progress over the last few decades, and many candidate models can now be predicted for a given sequence. Consequently, assessing the quality of predicted protein models is one of the key components of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue, which can be roughly divided into three categories: single-model methods, quasi-single-model methods and clustering (or consensus) methods. Although these methods have achieved success at different levels, accurate protein model quality assessment is still an open problem. RESULTS Here, we present MQAPRank, a global protein model quality assessment program based on learning-to-rank. MQAPRank first ranks the decoy models with a single-model method based on a learning-to-rank algorithm, indicating their relative quality for the target protein. It then takes the top five models as references and predicts the quality of each other model as its average GDT_TS score against the reference models. Benchmarked on the CASP11 and 3DRobot datasets, MQAPRank performed better than other leading protein model quality assessment methods. MQAPRank also participated in CASP12 under the group name FDUBio and achieved state-of-the-art performance. CONCLUSIONS MQAPRank provides a convenient and powerful tool for protein model quality assessment with state-of-the-art performance, and it is useful for protein structure prediction and model quality assessment.
Affiliation(s)
- Xiaoyang Jing
- School of Computer Science, Fudan University, Shanghai, 200433 People’s Republic of China
- Qiwen Dong
- School of Data Science and Engineering, East China Normal University, Shanghai, 200062 People’s Republic of China
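The second, consensus stage of MQAPRank can be sketched as follows. This assumes that the first-stage learning-to-rank ordering (`ranked_ids`, best first) and a precomputed pairwise GDT_TS table (`gdt_ts`) are already available; both names are hypothetical, and computing GDT_TS itself (structural superposition) is out of scope. The abstract does not say how the five reference models are scored, so this sketch scores only the remaining models.

```python
from typing import Dict, Sequence


def reference_consensus_scores(
    ranked_ids: Sequence[str],
    gdt_ts: Dict[str, Dict[str, float]],
    n_refs: int = 5,
) -> Dict[str, float]:
    """Score each non-reference model by its mean GDT_TS to the top-ranked references."""
    if len(ranked_ids) <= n_refs:
        raise ValueError("need more models than references")
    refs = list(ranked_ids[:n_refs])  # top models from the single-model ranking
    scores: Dict[str, float] = {}
    for model in ranked_ids[n_refs:]:
        # Mean structural similarity to the reference set stands in for predicted quality.
        scores[model] = sum(gdt_ts[model][ref] for ref in refs) / len(refs)
    return scores
```

The design choice is that the single-model ranking only needs to get the top few models roughly right; the final scores for the remaining models then come from similarity to those references rather than from the single-model scores directly.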
207
Xiong D, Zeng J, Gong H. A deep learning framework for improving long-range residue–residue contact prediction using a hierarchical strategy. Bioinformatics 2017; 33:2675-2683. [DOI: 10.1093/bioinformatics/btx296] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 10/18/2016] [Accepted: 05/02/2017] [Indexed: 12/31/2022] [Open access]
Affiliation(s)
- Dapeng Xiong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
- Jianyang Zeng
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
208
Strimbeck GR. Hiding in plain sight: the F segment and other conserved features of seed plant SKn dehydrins. Planta 2017; 245:1061-1066. [PMID: 28321577 PMCID: PMC5393156 DOI: 10.1007/s00425-017-2679-7] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 02/14/2017] [Accepted: 03/15/2017] [Indexed: 05/13/2023]
Abstract
MAIN CONCLUSION An 11-residue amino acid sequence, DRGLFDFLGKK, is highly conserved in a subset of dehydrins found across the full spectrum of seed plants and here given the name F-segment. An 11-residue amino acid sequence, DRGLFDFLGKK, is highly conserved in identity and polarity in 130 non-redundant dehydrin sequences representing conifers and all major angiosperm groups. This newly described motif is here given the name F segment based on the pair of hydrophobic F residues at the core of the sequence. The majority of dehydrins previously classified as SKn dehydrins contain one F segment N terminal to the S and K segments and can accordingly be reclassified as FSKn dehydrins. A cysteine-containing variant, GCGMFDFLKK, occurs in a few rosid and asterid taxa. The S segment in this and other dehydrin types also includes previously overlooked conserved features, including a KLHR prefix and charged or G residues within and following the characteristic string of S residues. Secondary structure prediction models indicate that the F segment and S segment prefix may form amphipathic helices that could be involved in membrane or protein binding.
Affiliation(s)
- G Richard Strimbeck
- Department of Biology, Norwegian University of Science and Technology, 7491, Trondheim, Norway.
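To illustrate how the 11-residue consensus quoted above could be located in a new dehydrin sequence, here is a minimal ungapped scan that reports every window within a chosen number of mismatches of DRGLFDFLGKK. This is an illustrative assumption, not the author's alignment-based analysis: the mismatch cutoff is arbitrary, and the shorter cysteine-containing variant (GCGMFDFLKK) would need its own pattern.

```python
from typing import List, Tuple

F_SEGMENT = "DRGLFDFLGKK"  # consensus quoted in the abstract


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_f_segments(
    seq: str, motif: str = F_SEGMENT, max_mismatches: int = 2
) -> List[Tuple[int, str, int]]:
    """Return (0-based start, window, mismatches) for windows close to the motif."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - len(motif) + 1):
        window = seq[start:start + len(motif)]
        d = hamming(window, motif)
        if d <= max_mismatches:
            hits.append((start, window, d))
    return hits
```

An ungapped scan is adequate here because the abstract describes a fixed-length motif; a profile or alignment would be needed to capture the polarity-level conservation the author also reports.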
209
Heffernan R, Yang Y, Paliwal K, Zhou Y. Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility. Bioinformatics 2017; 33:2842-2849. [DOI: 10.1093/bioinformatics/btx218] [Citation(s) in RCA: 234] [Impact Index Per Article: 29.3] [Received: 10/27/2016] [Accepted: 04/15/2017] [Indexed: 11/14/2022] [Open access]
Affiliation(s)
- Rhys Heffernan
- Signal Processing Laboratory, Griffith University, Brisbane, QLD, Australia
- Yuedong Yang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, QLD, Australia
- Kuldip Paliwal
- Signal Processing Laboratory, Griffith University, Brisbane, QLD, Australia
- Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, QLD, Australia
210
Peterson LX, Roy A, Christoffer C, Terashi G, Kihara D. Modeling disordered protein interactions from biophysical principles. PLoS Comput Biol 2017; 13:e1005485. [PMID: 28394890 PMCID: PMC5402988 DOI: 10.1371/journal.pcbi.1005485] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 11/07/2016] [Revised: 04/24/2017] [Accepted: 03/29/2017] [Indexed: 12/12/2022] [Open access]
Abstract
Disordered protein-protein interactions (PPIs), those involving a folded protein and an intrinsically disordered protein (IDP), are prevalent in the cell, including important signaling and regulatory pathways. IDPs do not adopt a single dominant structure in isolation but often become ordered upon binding. To aid understanding of the molecular mechanisms of disordered PPIs, it is crucial to obtain the tertiary structure of the PPIs. However, experimental methods have difficulty in solving disordered PPIs and existing protein-protein and protein-peptide docking methods are not able to model them. Here we present a novel computational method, IDP-LZerD, which models the conformation of a disordered PPI by considering the biophysical binding mechanism of an IDP to a structured protein, whereby a local segment of the IDP initiates the interaction and subsequently the remaining IDP regions explore and coalesce around the initial binding site. On a dataset of 22 disordered PPIs with IDPs up to 69 amino acids, successful predictions were made for 21 bound and 18 unbound receptors. The successful modeling provides additional support for biophysical principles. Moreover, the new technique significantly expands the capability of protein structure modeling and provides crucial insights into the molecular mechanisms of disordered PPIs.
Affiliation(s)
- Lenna X. Peterson
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, United States of America
- Amitava Roy
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, United States of America
- Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, Indiana, United States of America
- Bioinformatics and Computational Biosciences Branch, Rocky Mountain Laboratories, NIAID, National Institutes of Health, Hamilton, Montana, United States of America
- Charles Christoffer
- Department of Computer Science, Purdue University, West Lafayette, Indiana, United States of America
- Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, United States of America
- School of Pharmacy, Kitasato University, Tokyo, Japan
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, United States of America
- Department of Computer Science, Purdue University, West Lafayette, Indiana, United States of America
211
Gong H, Zhang H, Zhu J, Wang C, Sun S, Zheng WM, Bu D. Improving prediction of burial state of residues by exploiting correlation among residues. BMC Bioinformatics 2017; 18:70. [PMID: 28361691 PMCID: PMC5374591 DOI: 10.1186/s12859-017-1475-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/27/2022] [Open access]
Abstract
Background Residues in a protein may be buried inside it or exposed to the surrounding solvent. Buried residues usually form hydrophobic cores that maintain the structural integrity of proteins, while exposed residues are closely related to protein function. Thus, accurate prediction of residue solvent accessibility will greatly facilitate our understanding of both protein structure and function. Most state-of-the-art prediction approaches consider the burial state of each residue independently, thus neglecting the correlations among residues. Results In this study, we present a high-order conditional random field model that considers the burial states of all residues in a protein simultaneously. Our approach exploits correlations not only among adjacent residues but also among long-range residues. Experimental results showed that by exploiting the correlation among residues, our approach outperformed state-of-the-art approaches in prediction accuracy. In-depth case studies also showed that the high-order statistical model corrected errors made by the bidirectional recurrent neural network and chain conditional random field models. Conclusions Our methods enable accurate prediction of residue burial states, which should greatly facilitate protein structure prediction and evaluation.
Affiliation(s)
- Hai'e Gong
- Key Lab of Intelligent Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China; School of Computer Science, University of Chinese Academy of Sciences, Beijing, China
- Haicang Zhang
- Key Lab of Intelligent Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China; School of Computer Science, University of Chinese Academy of Sciences, Beijing, China
- Jianwei Zhu
- Key Lab of Intelligent Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China; School of Computer Science, University of Chinese Academy of Sciences, Beijing, China
- Chao Wang
- Key Lab of Intelligent Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China; School of Computer Science, University of Chinese Academy of Sciences, Beijing, China
- Shiwei Sun
- Key Lab of Intelligent Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Wei-Mou Zheng
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, 100190, China.
- Dongbo Bu
- Key Lab of Intelligent Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China.
212
213
Siddaramaiah M, Satyamoorthy K, Rao BSS, Roy S, Chandra S, Mahato KK. Identification of protein secondary structures by laser induced autofluorescence: A study of urea and GnHCl induced protein denaturation. Spectrochim Acta A Mol Biomol Spectrosc 2017; 174:44-53. [PMID: 27875744 DOI: 10.1016/j.saa.2016.11.017] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 06/01/2016] [Revised: 11/11/2016] [Accepted: 11/12/2016] [Indexed: 06/06/2023]
Abstract
In the present study, we interrogated the bulk secondary structures of selected proteins (BSA, HSA, lysozyme, trypsin and ribonuclease A) under urea and GnHCl denaturation using laser induced autofluorescence. The proteins were treated with different concentrations of urea (3 M, 6 M, 9 M) and GnHCl (2 M, 4 M, 6 M), and the corresponding steady-state autofluorescence spectra were recorded under 281 nm pulsed laser excitation. The recorded fluorescence spectra were then interpreted based on the existing PDB structures of the proteins and the Trp solvent accessibility (calculated using the "Scratch protein predictor" at a 30% threshold). Further, the influence of the rigidity and conformation of the indole ring (caused by protein secondary structures) on the intrinsic fluorescence properties of the proteins was evaluated using fluorescence of ANS-HSA complexes, CD spectroscopy and trypsin digestion experiments. The results demonstrated that GnHCl preferentially disrupts helices over β-sheets, whereas urea was more effective at disrupting β-sheets than helices. Conversely, proteins that showed a detectable change in intrinsic fluorescence at lower GnHCl concentrations were rich in helices, whereas those that showed a detectable change at lower urea concentrations were rich in β-sheets. Since high concentrations of GnHCl and urea interfere with secondary structure analysis by circular dichroism spectroscopy, analyzing secondary structures by laser induced autofluorescence should be highly advantageous over existing tools for this purpose.
Affiliation(s)
- Manjunath Siddaramaiah
- Department of Biophysics, School of Life Sciences, Manipal University, Manipal, Karnataka 576104, India
- Bola Sadashiva Satish Rao
- Department of Radiation Biology and Toxicology, School of Life Sciences, Manipal University, Manipal, Karnataka 576104, India
- Suparna Roy
- School of Life Sciences, Manipal University, Manipal, Karnataka 576104, India
- Subhash Chandra
- Department of Biophysics, School of Life Sciences, Manipal University, Manipal, Karnataka 576104, India
- Krishna Kishore Mahato
- Department of Biophysics, School of Life Sciences, Manipal University, Manipal, Karnataka 576104, India.
214
Hajighahramani N, Nezafat N, Eslami M, Negahdaripour M, Rahmatabadi SS, Ghasemi Y. Immunoinformatics analysis and in silico designing of a novel multi-epitope peptide vaccine against Staphylococcus aureus. Infect Genet Evol 2017; 48:83-94. [DOI: 10.1016/j.meegid.2016.12.010] [Citation(s) in RCA: 115] [Impact Index Per Article: 14.4] [Received: 06/19/2016] [Revised: 11/29/2016] [Accepted: 12/09/2016] [Indexed: 12/19/2022]
215
216
Wu W, Wang Z, Cong P, Li T. Accurate prediction of protein relative solvent accessibility using a balanced model. BioData Min 2017; 10:1. [PMID: 28127402 PMCID: PMC5259893 DOI: 10.1186/s13040-016-0121-5] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 03/31/2016] [Accepted: 12/27/2016] [Indexed: 01/19/2023] [Open access]
Abstract
BACKGROUND Protein relative solvent accessibility provides insight into protein structure and function. Its prediction is often the first stage of predicting other protein properties. Recent predictors of relative solvent accessibility discriminate against exposed regions relative to buried regions, resulting in higher prediction accuracy for buried regions than for exposed regions. METHODS Here, we propose a more accurate and balanced predictor of protein relative solvent accessibility. First, we collected known proteins into three subsets according to sequence length and constructed a balanced dataset after reducing redundancy within each subset. Next, we measured the performance associated with different variables and variable combinations to determine the best variable combination. Finally, we constructed a predictor called BMRSA for modelling and prediction; it uses the balanced set as the training set, the position-specific scoring matrix, predicted secondary structure, buried-exposed profile and query sequence length as variables, and the conditional random field as the machine-learning method. RESULTS BMRSA performance on test sets confirmed that our approach improved prediction accuracy relative to state-of-the-art approaches and was balanced between buried and exposed regions. Our method is valuable when higher accuracy in predicting exposed-residue states is required. BMRSA is available at: http://cheminfo.tongji.edu.cn:8080/BMRSA/.
Affiliation(s)
- Wei Wu
- Department of Chemistry, Tongji University, Shanghai, China
- Zhiheng Wang
- Department of Chemistry, Tongji University, Shanghai, China
- Peisheng Cong
- Department of Chemistry, Tongji University, Shanghai, China
- Tonghua Li
- Department of Chemistry, Tongji University, Shanghai, China
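The imbalance BMRSA addresses becomes visible when accuracy is reported separately for buried and exposed residues. A minimal sketch, assuming relative solvent accessibility values already normalized to [0, 1] and a hypothetical 25% buried/exposed cutoff (the abstract does not state the threshold BMRSA uses):

```python
from typing import Dict, List, Sequence


def burial_labels(rsa: Sequence[float], threshold: float = 0.25) -> List[str]:
    """Label each residue 'b' (buried) or 'e' (exposed); the cutoff is an assumption."""
    return ["e" if value >= threshold else "b" for value in rsa]


def per_class_accuracy(true: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Fraction of residues of each true class that were predicted correctly."""
    if len(true) != len(pred):
        raise ValueError("label sequences differ in length")
    result: Dict[str, float] = {}
    for cls in ("b", "e"):
        idx = [i for i, t in enumerate(true) if t == cls]
        if idx:  # skip a class absent from the true labels
            result[cls] = sum(pred[i] == cls for i in idx) / len(idx)
    return result


def balanced_accuracy(true: Sequence[str], pred: Sequence[str]) -> float:
    """Mean of per-class accuracies, so the majority class cannot dominate."""
    acc = per_class_accuracy(true, pred)
    if not acc:
        raise ValueError("no labels to evaluate")
    return sum(acc.values()) / len(acc)
```

A predictor biased toward the buried state can score well on overall accuracy while its exposed-class accuracy stays low; reporting both per-class values, as above, exposes that gap.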
217
Site-specific mapping of the human SUMO proteome reveals co-modification with phosphorylation. Nat Struct Mol Biol 2017; 24:325-336. [DOI: 10.1038/nsmb.3366] [Citation(s) in RCA: 262] [Impact Index Per Article: 32.8] [Received: 09/20/2016] [Accepted: 12/16/2016] [Indexed: 12/18/2022]
218
An Y, Wang J, Li C, Revote J, Zhang Y, Naderer T, Hayashida M, Akutsu T, Webb GI, Lithgow T, Song J. SecretEPDB: a comprehensive web-based resource for secreted effector proteins of the bacterial types III, IV and VI secretion systems. Sci Rep 2017; 7:41031. [PMID: 28112271 PMCID: PMC5253721 DOI: 10.1038/srep41031] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 10/18/2016] [Accepted: 12/14/2016] [Indexed: 12/28/2022] [Open access]
Abstract
Bacteria translocate effector molecules to host cells through highly evolved secretion systems. By definition, the function of these effector proteins is to manipulate host cell biology, and sequence, structural and functional annotations of these proteins will provide a better understanding of how bacterial secretion systems promote bacterial survival and virulence. Here we developed a knowledgebase, termed SecretEPDB (Bacterial Secreted Effector Protein DataBase), for effector proteins of the type III (T3SS), type IV (T4SS) and type VI (T6SS) secretion systems. SecretEPDB provides enriched annotations of these three classes of effector proteins by manually extracting and integrating structural and functional information from currently available databases and the literature. The database is conservative and strictly curated, so that every effector protein entry is supported by experimental evidence that it is secreted by a T3SS, T4SS or T6SS. The annotations in SecretEPDB cover protein characteristics, protein function, protein secondary structure, Pfam domains, metabolic pathways and evolutionary details. We hope that this integrated knowledgebase will serve as a useful resource for biological investigation and for generating new hypotheses in research on bacterial secretion systems.
Affiliation(s)
- Yi An
- College of Information Engineering, Northwest A&F University, Yangling 712100, China; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Jiawei Wang
- School of Electronic and Computer Engineering, Peking University, Beijing 100871, China
- Chen Li
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
- Jerico Revote
- Monash Bioinformatics Platform, Monash University, Melbourne, VIC 3800, Australia
- Yang Zhang
- College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Thomas Naderer
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Morihiro Hayashida
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Trevor Lithgow
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
- Jiangning Song
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia; Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
219
Zhang X, Liu S. RBPPred: predicting RNA-binding proteins from sequence using SVM. Bioinformatics 2016; 33:854-862. [DOI: 10.1093/bioinformatics/btw730] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 06/25/2016] [Accepted: 11/16/2016] [Indexed: 11/13/2022] [Open access]
220
Meng F, Kurgan L. Computational Prediction of Protein Secondary Structure from Sequence. Curr Protoc Protein Sci 2016; 86:2.3.1-2.3.10. [DOI: 10.1002/cpps.19] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 01/22/2023]
Affiliation(s)
- Fanchi Meng
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia
221
Kavianpour H, Vasighi M. Structural classification of proteins using texture descriptors extracted from the cellular automata image. Amino Acids 2016; 49:261-271. [DOI: 10.1007/s00726-016-2354-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 08/17/2016] [Accepted: 10/18/2016] [Indexed: 12/12/2022]
222
Brandes N, Ofer D, Linial M. ASAP: a machine learning framework for local protein properties. Database (Oxford) 2016; 2016:baw133. [PMID: 27694209 PMCID: PMC5045867 DOI: 10.1093/database/baw133] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 06/06/2016] [Revised: 08/08/2016] [Accepted: 08/28/2016] [Indexed: 11/14/2022]
Abstract
Determining residue-level protein properties, such as sites of post-translational modifications (PTMs), is vital to understanding protein function. Experimental methods are costly and time-consuming, while traditional rule-based computational methods fail to annotate sites lacking substantial similarity. Machine learning (ML) methods are becoming fundamental in annotating unknown proteins and their heterogeneous properties. We present ASAP (Amino-acid Sequence Annotation Prediction), a universal ML framework for predicting residue-level properties. ASAP extracts numerous features from raw sequences and supports easy integration of external features such as secondary structure, solvent accessibility, intrinsic disorder or PSSM profiles. These features are then used to train ML classifiers. ASAP can create new classifiers within minutes for a variety of tasks, including PTM prediction (e.g. cleavage sites by convertase, phosphoserine modification). We present a detailed case study for ASAP: CleavePred, an ASAP-based model that predicts protein precursor cleavage sites with state-of-the-art results. Protein cleavage is a PTM shared by a wide variety of proteins with minimal sequence similarity. Current rule-based methods suffer from high false positive rates, making them suboptimal. The high performance of CleavePred makes it suitable for analyzing new proteomes at a genomic scale. The tool is attractive for protein design, mass spectrometry search engines and the discovery of new bioactive peptides from precursors. ASAP functions as a baseline approach for residue-level protein sequence prediction. CleavePred is freely accessible as a web-based application. Both ASAP and CleavePred are open-source with a flexible Python API. Database URL: ASAP and CleavePred source code, webtool and tutorials are available at: https://github.com/ddofer/asap; http://protonet.cs.huji.ac.il/cleavepred.
Affiliation(s)
- Nadav Brandes
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University, Jerusalem 91904, Israel
- Dan Ofer
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University, Jerusalem 91904, Israel
- Michal Linial
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University, Jerusalem 91904, Israel
223
AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES : EUROPEAN CONFERENCE, ECML PKDD ... : PROCEEDINGS. ECML PKDD (CONFERENCE) 2016; 9852:1-16. [PMID: 28884168 PMCID: PMC5584645 DOI: 10.1007/978-3-319-46227-1_1] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 02/01/2023]
Abstract
Deep Convolutional Neural Networks (DCNN) have shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. Widely used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measure for imbalanced data. To achieve this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction, since their label distributions are highly imbalanced, and performs similarly to them on solvent accessibility prediction, which has three equally distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors for these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC.
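As a rough illustration of the pairwise view of AUC described above, the sketch below computes the empirical AUC as the fraction of correctly ordered (positive, negative) score pairs, then replaces the step function with a cubic polynomial to obtain a differentiable surrogate with per-score gradients. The cubic `smooth_step` is our own illustrative choice, assuming scores in [0, 1]; it is not necessarily the polynomial used for DeepCNF, where the gradients would additionally be propagated into the network parameters.

```python
from itertools import product
from typing import List, Sequence, Tuple


def empirical_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    total = 0.0
    for p, n in product(pos, neg):
        if p > n:
            total += 1.0
        elif p == n:
            total += 0.5
    return total / (len(pos) * len(neg))


def smooth_step(d: float) -> float:
    """Cubic approximation of the step 1[d > 0] on [-1, 1]: f(-1)=0, f(0)=0.5, f(1)=1."""
    return 0.5 + 0.75 * d - 0.25 * d ** 3


def smooth_step_grad(d: float) -> float:
    return 0.75 - 0.75 * d ** 2  # non-negative on [-1, 1], so f is monotone there


def surrogate_auc_and_grads(
    pos: Sequence[float], neg: Sequence[float]
) -> Tuple[float, List[float], List[float]]:
    """Differentiable AUC surrogate for scores in [0, 1] and its gradient w.r.t. each score."""
    m = len(pos) * len(neg)
    value = 0.0
    g_pos = [0.0] * len(pos)
    g_neg = [0.0] * len(neg)
    for i, p in enumerate(pos):
        for j, n in enumerate(neg):
            d = p - n  # lies in [-1, 1] when both scores lie in [0, 1]
            value += smooth_step(d) / m
            g = smooth_step_grad(d) / m
            g_pos[i] += g  # raising a positive score helps
            g_neg[j] -= g  # raising a negative score hurts
    return value, g_pos, g_neg


pos, neg = [0.6, 0.4], [0.5, 0.3]
print(empirical_auc(pos, neg))  # 0.75: only the pair (0.4, 0.5) is mis-ordered
```

Because the surrogate is a smooth function of every score, a gradient step that moves positives up and negatives down increases it, which is the property the pairwise formulation exploits.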
224
Sun MA, Zhang Q, Wang Y, Ge W, Guo D. Prediction of redox-sensitive cysteines using sequential distance and other sequence-based features. BMC Bioinformatics 2016; 17:316. [PMID: 27553667 PMCID: PMC4995733 DOI: 10.1186/s12859-016-1185-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 12/01/2015] [Accepted: 08/12/2016] [Indexed: 11/10/2022]
Abstract
Background Reactive oxygen species can modify the structure and function of proteins and may also act as important signaling molecules in various cellular processes. Cysteine thiol groups of proteins are particularly susceptible to oxidation. Meanwhile, their reversible oxidation plays critical roles in redox regulation and signaling. Recently, several computational tools have been developed for predicting redox-sensitive cysteines; however, those methods either focus only on catalytic redox-sensitive cysteines in thiol oxidoreductases or depend heavily on protein structural data, and thus cannot be widely used. Results In this study, we analyzed various sequence-based features potentially related to cysteine redox sensitivity, and identified three types of features for efficient computational prediction of redox-sensitive cysteines. These features are: sequential distance to the nearby cysteines, PSSM profile and predicted secondary structure of flanking residues. After further feature selection using SVM-RFE, we developed Redox-Sensitive Cysteine Predictor (RSCP), an SVM-based classifier for redox-sensitive cysteine prediction using primary sequence only. Using 10-fold cross-validation on the RSC758 dataset, the accuracy, sensitivity, specificity, MCC and AUC were estimated as 0.679, 0.602, 0.756, 0.362 and 0.727, respectively. When evaluated using 10-fold cross-validation on the BALOSCTdb dataset, which has structure information, the model achieved performance comparable to the current structure-based method. Further validation using an independent dataset indicates that it is robust and relatively more accurate for predicting redox-sensitive cysteines from non-enzyme proteins. Conclusions In this study, we developed a sequence-based classifier for predicting redox-sensitive cysteines. The major advantage of this method is that it does not rely on protein structure data, which ensures more extensive application compared to other current implementations.
Accurate prediction of redox-sensitive cysteines not only enhances our understanding of the redox sensitivity of cysteine but may also complement the proteomics approach and facilitate further experimental investigation of important redox-sensitive cysteines. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1185-4) contains supplementary material, which is available to authorized users.
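The figures quoted for RSCP are standard binary-classification metrics. A minimal sketch of how accuracy, sensitivity, specificity and MCC follow from confusion-matrix counts is given below (AUC additionally needs the ranked prediction scores, not just counts). The counts in the usage line are made up for illustration and are not RSCP's.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient from counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # true positive rate on redox-sensitive sites
    specificity = tn / (tn + fp)  # true negative rate on non-sensitive sites
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0  # 0 by convention if undefined
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


print(binary_metrics(tp=60, fp=25, tn=75, fn=40))
```

With these illustrative counts the sketch gives accuracy 0.675, sensitivity 0.6, specificity 0.75 and an MCC of about 0.354. MCC is useful alongside accuracy because it stays informative when the two classes are unbalanced.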
Affiliation(s)
- Ming-An Sun
- State Key Laboratory of Agrobiotechnology and School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, People's Republic of China
- Qing Zhang
- State Key Laboratory of Agrobiotechnology and School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, People's Republic of China
- Yejun Wang
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Shenzhen University Health Science Center, Nanhai Ave 3688, Shenzhen, 518060, People's Republic of China
- Wei Ge
- Centre of Reproduction, Development and Aging, Faculty of Health Sciences, University of Macau, Taipa, Macau, People's Republic of China
- Dianjing Guo
- State Key Laboratory of Agrobiotechnology and School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, People's Republic of China.
225
Jing X, Wang K, Lu R, Dong Q. Sorting protein decoys by machine-learning-to-rank. Sci Rep 2016; 6:31571. [PMID: 27530967 PMCID: PMC4987638 DOI: 10.1038/srep31571] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 04/21/2016] [Accepted: 07/26/2016] [Indexed: 11/18/2022]
Abstract
Much progress has been made in protein structure prediction during the last few decades. Because predicted models can span a broad accuracy spectrum, accurate quality estimation has become one of the key elements of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue, and these methods can be roughly divided into three categories: single-model methods, clustering-based methods and quasi-single-model methods. In this study, we first develop a single-model method, MQAPRank, based on the learning-to-rank algorithm, and then implement a quasi-single-model method, Quasi-MQAPRank. The proposed methods are benchmarked on the 3DRobot and CASP11 datasets. Five-fold cross-validation on the 3DRobot dataset shows that the proposed single-model method outperforms other methods whose outputs are taken as features of the proposed method, and that the quasi-single-model method can further enhance the performance. On the CASP11 dataset, the proposed methods also perform well compared with other leading methods in the corresponding categories. In particular, the Quasi-MQAPRank method achieves considerable performance on the CASP11 Best150 dataset.
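The abstract does not say which learning-to-rank algorithm or which features MQAPRank uses. Purely as an illustration of the idea, the sketch below trains a linear scoring function with a RankNet-style pairwise logistic loss so that decoys with higher true quality (for example GDT-TS) receive higher scores, and evaluates the result with a top-1 loss: the true quality of the best decoy minus that of the top-ranked one. The feature vectors, learning rate and epoch count are hypothetical.

```python
import math
from typing import List, Sequence


def linear_score(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wk * xk for wk, xk in zip(w, x))


def train_pairwise_ranker(features: List[List[float]], quality: Sequence[float],
                          lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Fit w so that w.x orders decoys by true quality, using a pairwise logistic loss."""
    w = [0.0] * len(features[0])
    # every ordered pair (i, j) where decoy i is truly better than decoy j
    pairs = [(i, j) for i in range(len(quality)) for j in range(len(quality))
             if quality[i] > quality[j]]
    for _ in range(epochs):
        for i, j in pairs:
            diff = [a - b for a, b in zip(features[i], features[j])]
            margin = linear_score(w, diff)
            # d/dw of log(1 + exp(-margin)) is -sigmoid(-margin) * diff;
            # sigmoid(-margin) is computed in a numerically stable way
            if margin >= 0:
                e = math.exp(-margin)
                coef = e / (1.0 + e)
            else:
                coef = 1.0 / (1.0 + math.exp(margin))
            w = [wk + lr * coef * dk for wk, dk in zip(w, diff)]  # gradient descent step
    return w


def top1_loss(scores: Sequence[float], quality: Sequence[float]) -> float:
    """Quality of the best decoy minus quality of the decoy ranked first (0 is ideal)."""
    top = max(range(len(scores)), key=lambda k: scores[k])
    return max(quality) - quality[top]


feats = [[0.9, 0.2], [0.5, 0.1], [0.1, 0.3], [0.7, 0.9]]  # hypothetical decoy features
gdt = [0.8, 0.5, 0.2, 0.6]                                 # hypothetical true quality
w = train_pairwise_ranker(feats, gdt)
print(top1_loss([linear_score(w, x) for x in feats], gdt))  # 0.0: best decoy ranked first
```

A pairwise loss only cares about relative order within one target's decoy set, which matches how quality estimates are used to pick a final model.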
Affiliation(s)
- Xiaoyang Jing
- School of Computer Science, Fudan University, Shanghai 200433, People’s Republic of China
- Kai Wang
- College of Animal Science and Technology, Jilin Agricultural University, Changchun 130118, People’s Republic of China
- Ruqian Lu
- School of Computer Science, Fudan University, Shanghai 200433, People’s Republic of China
- Qiwen Dong
- Institute for Data Science and Engineering, East China Normal University, Shanghai 200062, People’s Republic of China
226
Park M, Kim S, Fetterer RH, Dalloul RA. Functional characterization of the turkey macrophage migration inhibitory factor. DEVELOPMENTAL AND COMPARATIVE IMMUNOLOGY 2016; 61:198-207. [PMID: 27062968 DOI: 10.1016/j.dci.2016.04.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/13/2016] [Revised: 04/05/2016] [Accepted: 04/05/2016] [Indexed: 06/05/2023]
Abstract
Macrophage migration inhibitory factor (MIF) is a soluble protein that inhibits the random migration of macrophages and plays a pivotal immunoregulatory role in innate and adaptive immunity. The aim of this study was to clone the turkey MIF (TkMIF) gene, express the active protein, and characterize its basic function. The full-length TkMIF gene was amplified from total RNA extracted from turkey spleen, followed by cloning into a prokaryotic (pET11a) expression vector. Sequence analysis revealed that TkMIF consists of 115 amino acids with a molecular weight of 12.5 kDa. Multiple sequence alignment revealed 100%, 65%, 95% and 92% identity with chicken, duck, eagle and zebra finch MIFs, respectively. Recombinant TkMIF (rTkMIF) was expressed in Escherichia coli and purified through HPLC and endotoxin removal. SDS-PAGE analysis revealed a soluble rTkMIF monomer of approximately 13.5 kDa containing a T7 tag. Western blot analysis showed that anti-chicken MIF (ChMIF) polyclonal antisera detected a monomeric form of TkMIF at approximately 13.5 kDa. Further functional analysis revealed that rTkMIF inhibits migration of both mononuclear cells and splenocytes in a dose-dependent manner, an effect that was abolished by the addition of anti-ChMIF polyclonal antisera. qRT-PCR analysis revealed rTkMIF-induced elevation of pro-inflammatory cytokine transcripts in LPS-stimulated monocytes. rTkMIF also led to increased levels of IFN-γ and IL-17F transcripts in Con A-activated splenocytes, while IL-10 and IL-13 transcripts were decreased. Overall, turkey and chicken MIF have highly similar sequences and comparable biological functions with respect to the migration inhibitory activities of macrophages and enhancement of pro-inflammatory cytokine expression, suggesting that turkey and chicken MIFs would be biologically cross-reactive.
Affiliation(s)
- Myeongseon Park
- Avian Immunobiology Laboratory, Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Sungwon Kim
- Avian Immunobiology Laboratory, Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, VA 24061, USA; The Roslin Institute and R(D)SVS, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, UK
- Raymond H Fetterer
- Animal Parasitic Diseases Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705, USA
- Rami A Dalloul
- Avian Immunobiology Laboratory, Department of Animal and Poultry Sciences, Virginia Tech, Blacksburg, VA 24061, USA.
227
Rahman KS, Chowdhury EU, Sachse K, Kaltenboeck B. Inadequate Reference Datasets Biased toward Short Non-epitopes Confound B-cell Epitope Prediction. J Biol Chem 2016; 291:14585-99. [PMID: 27189949 PMCID: PMC4938180 DOI: 10.1074/jbc.m116.729020] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 03/23/2016] [Revised: 05/03/2016] [Indexed: 11/06/2022]
Abstract
X-ray crystallography has shown that an antibody paratope typically binds 15-22 amino acids (aa) of an epitope, of which 2-5 randomly distributed amino acids contribute most of the binding energy. In contrast, researchers typically use short peptide antigens in antibody binding assays for B-cell epitope mapping. Furthermore, short 6-11-aa epitopes, and in particular non-epitopes, are over-represented in published B-cell epitope datasets that are commonly used to develop approaches for predicting B-cell epitopes from protein antigen sequences. We hypothesized that such peptides of suboptimal length result in weak antibody binding and cause false-negative results. We tested the influence of peptide antigen length on antibody binding by analyzing data on more than 900 peptides used for B-cell epitope mapping of immunodominant proteins of Chlamydia spp. We demonstrate that short 7-12-aa peptides of B-cell epitopes bind antibodies poorly; thus, epitope mapping with short peptide antigens falsely classifies many B-cell epitopes as non-epitopes. We also show, in published datasets of confirmed epitopes and non-epitopes, a direct correlation between the length of peptide antigens and antibody binding. Elimination of short (≤11-aa) epitope/non-epitope sequences improved datasets for the evaluation of in silico B-cell epitope prediction. Achieving up to 86% accuracy, protein disorder tendency is the best indicator of B-cell epitope regions for the chlamydial and published datasets. For B-cell epitope prediction, the most effective approach is plotting the disorder of protein sequences with the IUPred-L scale, followed by antibody reactivity testing of 16-30-aa peptides from peak regions. This strategy overcomes the well-known inaccuracy of in silico B-cell epitope prediction from primary protein sequences.
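The recommended strategy (plot disorder, then test 16-30-aa peptides from peak regions) can be sketched as a small post-processing step. The sketch below assumes per-residue disorder scores have already been computed (for example with IUPred) and treats any run of residues scoring above a threshold as a "peak region"; the 0.5 threshold and the tiling of long runs into consecutive peptides are our own assumptions, not the authors' protocol.

```python
from typing import List, Sequence, Tuple


def disordered_peptides(scores: Sequence[float], threshold: float = 0.5,
                        min_len: int = 16, max_len: int = 30) -> List[Tuple[int, int]]:
    """Return half-open (start, end) peptide windows inside runs with score > threshold.

    Each run is tiled from its start into consecutive peptides of at most max_len
    residues; pieces shorter than min_len (including short runs) are discarded.
    """
    peptides = []
    i, n = 0, len(scores)
    while i < n:
        if scores[i] <= threshold:
            i += 1
            continue
        j = i
        while j < n and scores[j] > threshold:
            j += 1  # extend the run of high-disorder residues [i, j)
        for start in range(i, j, max_len):
            end = min(start + max_len, j)
            if end - start >= min_len:
                peptides.append((start, end))
        i = j
    return peptides


# hypothetical disorder profile: one 40-residue peak and one 10-residue peak
profile = [0.2] * 5 + [0.8] * 40 + [0.1] * 5 + [0.9] * 10
print(disordered_peptides(profile))  # [(5, 35)]
```

Here the 10-residue remainder of the long peak and the short second peak fall below the 16-aa minimum and are dropped; overlapping windows would be an alternative design if full coverage of each peak matters.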
Affiliation(s)
- Kh Shamsur Rahman
- Department of Pathobiology, Auburn University, Auburn, Alabama 36849
- Konrad Sachse
- Federal Institute for Animal Health, D-07743 Jena, Germany
- Bernhard Kaltenboeck
- Department of Pathobiology, Auburn University, Auburn, Alabama 36849
228
König E, Rainer J, Domingues FS. Computational assessment of feature combinations for pathogenic variant prediction. Mol Genet Genomic Med 2016; 4:431-46. [PMID: 27468419 PMCID: PMC4947862 DOI: 10.1002/mgg3.214] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 11/19/2015] [Revised: 02/15/2016] [Accepted: 02/17/2016] [Indexed: 11/11/2022]
Abstract
BACKGROUND Although several methods have been proposed for predicting the effects of genetic variants and their role in disease, it is still a challenge to identify and prioritize pathogenic variants within sequencing studies. METHODS Here, we compare different variant and gene-specific features as well as existing methods and investigate their best combination to explore potential performance gains. RESULTS We found that combining the number of "biological process" Gene Ontology annotations of a gene with the methods PON-P2 and PROVEAN significantly improves prediction of pathogenic variants, outperforming all individual methods. A comprehensive analysis of the Gene Ontology feature suggests that it does not reflect a variant-dependent annotation bias but rather the multifunctional nature of disease genes. Furthermore, we identified a set of difficult variants for which different prediction methods fail. CONCLUSION Existing pathogenicity prediction methods can be further improved.
Affiliation(s)
- Eva König
- Center for Biomedicine, European Academy of Bozen/Bolzano (EURAC), Viale Druso 1, 39100 Bolzano, Italy
- Affiliated Institute of the University of Lübeck, Lübeck, Germany
- Johannes Rainer
- Center for Biomedicine, European Academy of Bozen/Bolzano (EURAC), Viale Druso 1, 39100 Bolzano, Italy
- Affiliated Institute of the University of Lübeck, Lübeck, Germany
- Francisco S. Domingues
- Center for Biomedicine, European Academy of Bozen/Bolzano (EURAC), Viale Druso 1, 39100 Bolzano, Italy
- Affiliated Institute of the University of Lübeck, Lübeck, Germany
229
Bhattacharya D, Cao R, Cheng J. UniCon3D: de novo protein structure prediction using united-residue conformational search via stepwise, probabilistic sampling. Bioinformatics 2016; 32:2791-9. [PMID: 27259540 PMCID: PMC5018369 DOI: 10.1093/bioinformatics/btw316] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 03/13/2016] [Accepted: 05/15/2016] [Indexed: 12/20/2022]
Abstract
MOTIVATION Recent experimental studies have suggested that proteins fold via stepwise assembly of structural units named 'foldons' through the process of sequential stabilization. In parallel, recent computational developments based on probabilistic modeling have shown a promising direction for de novo protein conformational sampling from continuous space. However, existing computational approaches for de novo protein structure prediction often sample protein conformational space randomly, as opposed to the experimentally suggested stepwise sampling. RESULTS Here, we develop a novel generative, probabilistic model that simultaneously captures local structural preferences of backbone and side chain conformational space of polypeptide chains in a united-residue representation, and performs experimentally motivated conditional conformational sampling via stepwise synthesis and assembly of foldon units that minimizes a composite physics and knowledge-based energy function for de novo protein structure prediction. The proposed method, UniCon3D, has been found to (i) sample lower-energy conformations with higher accuracy than traditional random sampling in a small benchmark of 6 proteins; (ii) perform comparably with the top five automated methods on 30 difficult target domains from the 11th Critical Assessment of Protein Structure Prediction (CASP) experiment and on 15 difficult target domains from the 10th CASP experiment; and (iii) outperform two state-of-the-art approaches and a baseline counterpart of UniCon3D that performs traditional random sampling for protein modeling aided by predicted residue-residue contacts on 45 targets from the 10th edition of CASP.
AVAILABILITY AND IMPLEMENTATION Source code, executable versions, manuals and example data of UniCon3D for Linux and OSX are freely available to non-commercial users at http://sysbio.rnet.missouri.edu/UniCon3D/ CONTACT: chengji@missouri.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jianlin Cheng
- Department of Computer Science Informatics Institute, University of Missouri, Columbia, MO 65211, USA
230
Galiez C, Magnan CN, Coste F, Baldi P. VIRALpro: a tool to identify viral capsid and tail sequences. Bioinformatics 2016; 32:1405-7. [PMID: 26733451 PMCID: PMC5860506 DOI: 10.1093/bioinformatics/btv727] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 08/11/2015] [Revised: 11/24/2015] [Accepted: 12/07/2015] [Indexed: 12/13/2022]
Abstract
MOTIVATION Sequence data continue to outpace annotation information, and the problem is further exacerbated when organisms are underrepresented in the annotation databases. This is the case with non-human-pathogenic viruses, which occur frequently in metagenomic projects. Thus, there is a need for tools capable of detecting and classifying viral sequences. RESULTS We describe VIRALpro, a new effective tool for identifying capsid and tail protein sequences, which are the cornerstones of viral sequence annotation and viral genome classification. AVAILABILITY AND IMPLEMENTATION The data, software and corresponding web server are available from http://scratch.proteomics.ics.uci.edu as part of the SCRATCH suite. CONTACT clovis.galiez@inria.fr or pfbaldi@uci.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Clovis Galiez
- INRIA, Campus De Beaulieu, Rennes Cedex, 35042, France
- Christophe N Magnan
- Department of Computer Science and Institute for Genomics and Bioinformatics, University of California, Irvine, Irvine, CA 92697, USA
- Pierre Baldi
- Department of Computer Science and Institute for Genomics and Bioinformatics, University of California, Irvine, Irvine, CA 92697, USA
231
Wang S, Li W, Liu S, Xu J. RaptorX-Property: a web server for protein structure property prediction. Nucleic Acids Res 2016; 44:W430-5. [PMID: 27112573 PMCID: PMC4987890 DOI: 10.1093/nar/gkw306] [Citation(s) in RCA: 367] [Impact Index Per Article: 40.8] [Received: 02/19/2016] [Accepted: 04/12/2016] [Indexed: 11/14/2022]
Abstract
RaptorX-Property (http://raptorx2.uchicago.edu/StructurePropertyPred/predict/) is a web server that predicts structural properties of a protein sequence without using any templates. It outperforms other servers, especially for proteins without close homologs in the PDB or with a very sparse sequence profile (i.e. one that carries little evolutionary information). This server employs a powerful in-house deep learning model, DeepCNF (Deep Convolutional Neural Fields), to predict secondary structure (SS), solvent accessibility (ACC) and disordered regions (DISO). DeepCNF models not only the complex sequence–structure relationship, through a deep hierarchical architecture, but also the interdependency between adjacent property labels. Our experimental results show that, tested on CASP10, CASP11 and other benchmarks, this server can obtain ∼84% Q3 accuracy for 3-state SS, ∼72% Q8 accuracy for 8-state SS, ∼66% Q3 accuracy for 3-state solvent accessibility, and ∼0.89 area under the ROC curve (AUC) for disorder prediction.
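Three-state solvent accessibility is usually obtained by discretizing relative solvent accessibility (RSA), and Q3 is simply the fraction of residues whose predicted state matches the true one. The sketch below assumes cut-offs of 10% and 40% RSA for buried/medium/exposed; thresholds differ between tools, so these values are assumptions rather than a statement of RaptorX-Property's exact definition.

```python
from typing import Sequence


def rsa_to_state(rsa: float, buried_max: float = 0.10, exposed_min: float = 0.40) -> str:
    """Map relative solvent accessibility (fraction 0..1) to B, M or E (assumed cut-offs)."""
    if rsa < buried_max:
        return "B"  # buried
    if rsa > exposed_min:
        return "E"  # exposed
    return "M"      # medium (intermediate)


def q_accuracy(pred: Sequence[str], true: Sequence[str]) -> float:
    """Fraction of residues whose predicted state equals the true state (Q3 for 3 states)."""
    if len(pred) != len(true) or not true:
        raise ValueError("predictions and labels must be non-empty and equally long")
    return sum(p == t for p, t in zip(pred, true)) / len(true)


true_states = "".join(rsa_to_state(x) for x in [0.05, 0.25, 0.55, 0.70])  # "BMEE"
print(q_accuracy("BMME", true_states))  # 0.75: one exposed residue predicted as medium
```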
Affiliation(s)
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, IL, USA; Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Wei Li
- School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Zhejiang, China
- Shiwang Liu
- School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Zhejiang, China
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, USA
232
Biological effect of LOXL1 coding variants associated with pseudoexfoliation syndrome. Exp Eye Res 2016; 146:212-223. [PMID: 26997634 DOI: 10.1016/j.exer.2016.03.013] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 10/14/2015] [Revised: 03/01/2016] [Accepted: 03/13/2016] [Indexed: 01/08/2023]
Abstract
Pseudoexfoliation (PEX) syndrome is a systemic disease involving the extracellular matrix. It increases the risk of glaucoma, an irreversible cause of blindness, and susceptibility to heart disease, stroke and hearing loss. Single nucleotide polymorphisms (SNPs) in the LOXL1 (Lysyl oxidase-like 1) gene are the major known genetic risk factor for PEX syndrome. Two coding SNPs, rs1048861 (G > T; Arg141Leu) and rs3825942 (G > A; Gly153Asp), in the LOXL1 gene are strongly associated with disease risk in multiple populations worldwide. In the present study, we investigated the functional effects of these SNPs on the LOXL1 protein. We show through molecular modelling that positions 141 and 153 are likely surface residues and hence possible recognition sites for protein-protein interactions; the Arg141Leu and Gly153Asp substitutions cause charge changes that would lead to local differences in protein electrostatic potential and, in turn, could modify protein-protein interactions. In RFL-6 rat fetal lung fibroblast cells ectopically expressing the LOXL1 protein variants related to PEX (Arg141_Gly153, Arg141_Asp153 or Leu141_Gly153), immunoprecipitation of the secreted variants showed differences in their processing by endogenous proteins, possibly Bone morphogenetic protein-1 (BMP-1), which cleaves and enzymatically activates LOXL1. Immunofluorescence labelling of the ectopically expressed protein variants in RFL-6 cells showed no significant difference in their tendency to accumulate extracellularly. In conclusion, this is the first report of a biological effect, on the LOXL1 protein, of the PEX-associated coding SNPs in the LOXL1 gene. The findings indicate that the disease-associated coding variants themselves may be involved in the manifestation of PEX syndrome.
233
Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields. Sci Rep 2016; 6:18962. [PMID: 26752681 PMCID: PMC4707437 DOI: 10.1038/srep18962] [Citation(s) in RCA: 273] [Impact Index Per Article: 30.3] [Received: 06/28/2015] [Accepted: 11/26/2015] [Indexed: 12/29/2022]
Abstract
Protein secondary structure (SS) prediction is important for studying protein structure and function. When only sequence (profile) information is used as the input feature, the current best predictors obtain ~80% Q3 accuracy, which has not improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which integrate Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only the complex sequence-structure relationship, through a deep hierarchical architecture, but also the interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF obtains ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disordered regions, and solvent accessibility.
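The Q3 and Q8 figures above are per-residue accuracies over 3 or 8 secondary structure states. A common (but not universal) reduction maps the eight DSSP states to three as H, G, I → H; E, B → E; all others → C. The sketch assumes that mapping, which may differ from the one used in the DeepCNF evaluation, and shows how confusions within one 3-state class count against Q8 but not against Q3.

```python
from typing import Dict

# One common DSSP 8-to-3 reduction (assumed; conventions vary between papers).
EIGHT_TO_THREE: Dict[str, str] = {
    "H": "H", "G": "H", "I": "H",                      # helices
    "E": "E", "B": "E",                                # strands and isolated bridges
    "T": "C", "S": "C", "L": "C", "C": "C", "-": "C",  # everything else is coil
}


def reduce_to_three(ss8: str) -> str:
    return "".join(EIGHT_TO_THREE[s] for s in ss8)


def q_score(pred: str, true: str) -> float:
    """Per-residue accuracy: Q8 on 8-state strings, Q3 on 3-state strings."""
    if len(pred) != len(true) or not true:
        raise ValueError("predictions and labels must be non-empty and equally long")
    return sum(p == t for p, t in zip(pred, true)) / len(true)


true8 = "HHHGGEEBTS"
pred8 = "HHHHHEEETT"
print(q_score(pred8, true8))                                     # Q8 = 0.6
print(q_score(reduce_to_three(pred8), reduce_to_three(true8)))  # Q3 = 1.0
```

In this toy example every error confuses states that share a 3-state class (G vs H, B vs E, S vs T), which is why Q8 is typically much lower than Q3 for the same predictor.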
234
Structure Analysis Uncovers a Highly Diverse but Structurally Conserved Effector Family in Phytopathogenic Fungi. PLoS Pathog 2015; 11:e1005228. [PMID: 26506000 PMCID: PMC4624222 DOI: 10.1371/journal.ppat.1005228] [Citation(s) in RCA: 155] [Impact Index Per Article: 15.5] [Received: 06/18/2015] [Accepted: 09/24/2015] [Indexed: 01/13/2023]
Abstract
Phytopathogenic ascomycete fungi possess huge effector repertoires that are dominated by hundreds of sequence-unrelated small secreted proteins. The molecular function of these effectors and the evolutionary mechanisms that generate this tremendous number of singleton genes are largely unknown. To get a deeper understanding of fungal effectors, we determined by NMR spectroscopy the 3-dimensional structures of the Magnaporthe oryzae effectors AVR1-CO39 and AVR-Pia. Despite a lack of sequence similarity, both proteins have very similar 6-stranded β-sandwich structures that are stabilized in both cases by a disulfide bridge between 2 conserved cysteines located in similar positions of the proteins. Structural similarity searches revealed that AvrPiz-t, another effector from M. oryzae, and ToxB, an effector of the wheat tan spot pathogen Pyrenophora tritici-repentis, have the same structure, suggesting the existence of a family of sequence-unrelated but structurally conserved fungal effectors that we named MAX-effectors (Magnaporthe Avrs and ToxB like). Structure-informed pattern searches strengthened this hypothesis by identifying MAX-effector candidates in a broad range of ascomycete phytopathogens. Strong expansion of the MAX-effector family was detected in M. oryzae and M. grisea, where they seem to be particularly important since they account for 5–10% of the effector repertoire and 50% of the cloned avirulence effectors. Expression analysis indicated that the majority of M. oryzae MAX-effectors are expressed specifically during early infection, suggesting important functions during biotrophic host colonization. We hypothesize that the scenario observed for MAX-effectors can serve as a paradigm for ascomycete effector diversity and that the enormous number of sequence-unrelated ascomycete effectors may in fact belong to a restricted set of structurally conserved effector families.
Fungal plant pathogens are of outstanding economic and ecological importance and cause destructive diseases on many cultivated and wild plants. Effector proteins that are secreted during infection to manipulate the host and to promote disease are a key element in fungal virulence. Phytopathogenic fungi possess huge effector repertoires that are dominated by hundreds of sequence-unrelated small secreted proteins. The molecular functions of this most important class of fungal effectors and the evolutionary mechanisms that generate these tremendous numbers of apparently unrelated proteins are largely unknown. By investigating the 3-dimensional structures of effectors from the rice blast fungus M. oryzae, we discovered an effector family comprising structurally conserved but sequence-unrelated effectors from M. oryzae and the phylogenetically distant wheat pathogen Pyrenophora tritici-repentis that we named MAX-effectors (M. oryzae Avrs and ToxB). Structure-informed searches of whole genome sequence databases suggest that MAX-effectors are present at low frequencies and with a patchy phylogenetic distribution in many ascomycete phytopathogens. They underwent strong lineage-specific expansion in fungi of the Pyriculariae family that contains M. oryzae, where they seem particularly important during biotrophic plant colonization and account for 50% of the cloned Avr effectors and 5–10% of the effector repertoire. Based on our results on the MAX-effectors and the widely accepted concept that fungal effectors evolve according to a birth-and-death model, we propose the hypothesis that the majority of the immense numbers of different ascomycete effectors could in fact belong to a limited set of structurally defined families whose members are phylogenetically related.
235
Xiao F, Shen HB. Prediction Enhancement of Residue Real-Value Relative Accessible Surface Area in Transmembrane Helical Proteins by Solving the Output Preference Problem of Machine Learning-Based Predictors. J Chem Inf Model 2015; 55:2464-74. [DOI: 10.1021/acs.jcim.5b00246] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 01/08/2023]
Affiliation(s)
- Feng Xiao
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
236
Pieszko M, Weir W, Goodhead I, Kinnaird J, Shiels B. ApiAP2 Factors as Candidate Regulators of Stochastic Commitment to Merozoite Production in Theileria annulata. PLoS Negl Trop Dis 2015; 9:e0003933. [PMID: 26273826 PMCID: PMC4537280 DOI: 10.1371/journal.pntd.0003933] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 02/20/2015] [Accepted: 06/25/2015] [Indexed: 02/05/2023]
Abstract
Background Differentiation from one life-cycle stage to the next is critical for survival and transmission of apicomplexan parasites. A number of studies have shown that stage differentiation is a stochastic process and is associated with a point that commits the cell to a changeover in the pattern of gene expression. Studies on differentiation to merozoite production (merogony) in T. annulata postulated that commitment involves a concentration threshold of DNA binding proteins and an auto-regulatory loop. Principal Findings In this study, ApiAP2 DNA binding proteins that show changes in expression level during merogony of T. annulata have been identified. DNA motifs bound by orthologous domains in Plasmodium were found to be enriched in upstream regions of stage-regulated T. annulata genes and validated as targets for the T. annulata AP2 domains by electrophoretic mobility shift assay (EMSA). Two findings were of particular note: the gene in T. annulata encoding the orthologue of the ApiAP2 domain in the AP2-G factor that commits Plasmodium to gametocyte production has an expression profile indicating involvement in transmission of T. annulata to the tick vector; and genes encoding related domains that bind, or are predicted to bind, sequence motifs of the type 5'-(A)CACAC(A) are implicated in differential regulation of gene expression, with one gene (TA11145) likely to be preferentially up-regulated via auto-regulation as the cell progresses to merogony. Conclusions We postulate that the Theileria factor possessing the AP2 domain orthologous to that of Plasmodium AP2-G may regulate gametocytogenesis in a similar manner to AP2-G. In addition, paralogous ApiAP2 factors that recognise 5'-(A)CACAC(A) type motifs could operate in a competitive manner to promote reversible progression towards the point that commits the cell to undergo merogony.
Factors possessing AP2 domains that bind (or are predicted to bind) this motif are present in the vector-borne genera Theileria, Babesia and Plasmodium, and in other Apicomplexa, leading to the proposal that the mechanisms controlling stage differentiation will show a degree of conservation.
The ability of vector-borne apicomplexan parasites (Babesia, Plasmodium and Theileria) to change from one life-cycle stage to the next is critical for establishment of infection and transmission to new hosts. Stage differentiation steps of both Plasmodium and Theileria are known to involve stochastic transition through an intermediate form to a point that commits the cell to generate the next stage in the life-cycle. In this study, we have identified genes encoding ApiAP2 DNA binding proteins in Theileria annulata that are differentially expressed during differentiation from the macroschizont stage, through merozoite production (merogony), to the piroplasm stage. The results provide evidence that the ApiAP2 factor in Theileria that possesses the orthologue of the Plasmodium AP2-G domain may also regulate gametocytogenesis, and that progression to merogony is promoted by the ability of a merozoite DNA binding protein to preferentially up-regulate its own production. In addition, the identification of multiple ApiAP2 DNA binding domains that bind related motifs within and across vector-borne apicomplexan genera leads to the proposal that the mechanisms promoting the transition from asexual to sexual replication will show a degree of conservation.
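The abstract reports that 5'-(A)CACAC(A)-type motifs are enriched in upstream regions of stage-regulated genes but does not state the statistics used. As a minimal illustration of the idea only, not the authors' pipeline, the sketch below counts the CACAC core on both strands and compares the fraction of motif-bearing sequences in a target set against a background set. The function names, the toy sequences, and the naive ratio (no significance test, optional flanking A ignored) are all assumptions.

```python
import re
from typing import List

# DNA complement table (A<->T, C<->G); N maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_core(seq: str, core: str = "CACAC") -> int:
    """Count overlapping matches of `core` on both strands of `seq`.

    The zero-width lookahead lets matches overlap (CACACAC holds two).
    Assumes `core` is not its own reverse complement, so no site is
    counted twice; CACAC (reverse complement GTGTG) satisfies this.
    """
    s = seq.upper()
    forward = len(re.findall(f"(?={core})", s))
    reverse = len(re.findall(f"(?={revcomp(core)})", s))
    return forward + reverse


def fraction_with_motif(seqs: List[str], core: str = "CACAC") -> float:
    """Fraction of sequences with at least one core match on either strand."""
    if not seqs:
        return 0.0
    return sum(1 for s in seqs if count_core(s, core) > 0) / len(seqs)


def naive_enrichment(target: List[str], background: List[str],
                     core: str = "CACAC") -> float:
    """Hypothetical helper: target fraction divided by background fraction."""
    tgt = fraction_with_motif(target, core)
    bg = fraction_with_motif(background, core)
    if bg == 0.0:
        # Ratio undefined; signal "no information" or "only in target".
        return float("nan") if tgt == 0.0 else float("inf")
    return tgt / bg


if __name__ == "__main__":
    # Hypothetical toy upstream sequences, not data from the study.
    regulated = ["TTCACACATT", "GGGTGTGAAA", "AAAAAAAAAA"]
    background = ["TTCACACATT", "AAAAAAAAAA", "CCCCCCCCCC", "GGGGGGGGGG"]
    print(naive_enrichment(regulated, background))
```

A real analysis would use matched background regions and a significance test (for example a hypergeometric or Fisher's exact test) rather than a bare ratio; the ratio is used here only to keep the sketch short.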
Affiliation(s)
- Marta Pieszko
- Institute of Biodiversity Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Bearsden Road, Glasgow, United Kingdom
- William Weir
- Institute of Biodiversity Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Bearsden Road, Glasgow, United Kingdom
- Ian Goodhead
- Institute of Integrative Biology, University of Liverpool, Crown Street, Liverpool, United Kingdom
- Jane Kinnaird
- Institute of Biodiversity Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Bearsden Road, Glasgow, United Kingdom
- Brian Shiels
- Institute of Biodiversity Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Bearsden Road, Glasgow, United Kingdom
237
AcconPred: Predicting Solvent Accessibility and Contact Number Simultaneously by a Multitask Learning Framework under the Conditional Neural Fields Model. Biomed Res Int 2015; 2015:678764. [PMID: 26339631 PMCID: PMC4538422 DOI: 10.1155/2015/678764] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 12/27/2014] [Accepted: 03/11/2015] [Indexed: 12/14/2022]
Abstract
Motivation. The solvent accessibility of protein residues is one of the driving forces of protein folding, while the contact number of protein residues constrains the possible protein conformations. De novo prediction of these properties from protein sequence is important for the study of protein structure and function. Although these two properties are certainly related to each other, it is challenging to exploit this dependency for prediction. Method. We present AcconPred, a method for predicting solvent accessibility and contact number simultaneously, based on a shared-weight multitask learning framework under the CNF (conditional neural fields) model. A multitask framework trained on a collection of related tasks gives more accurate predictions than one trained on a single task. The CNF method not only models the complex relationship between the input features and the predicted labels but also exploits the interdependency among adjacent labels. Results. Trained on a dataset of 5729 monomeric soluble globular proteins, AcconPred reached a three-state accuracy of 0.68 for solvent accessibility and a correlation of 0.75 for contact number. Tested on 105 CASP11 domains for solvent accessibility, AcconPred reached an accuracy of 0.64, outperforming existing methods.
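The Results report a "three-state accuracy" for solvent accessibility and a "correlation" for contact number. The sketch below shows how such figures are commonly computed; it is not AcconPred's evaluation code. It assumes that three-state accuracy is per-residue agreement over three discretized accessibility classes (the labels B = buried, M = intermediate, E = exposed are hypothetical) and that the correlation is Pearson's r between predicted and observed contact numbers.

```python
import math
from typing import Sequence


def three_state_accuracy(pred: Sequence[str], true: Sequence[str]) -> float:
    """Fraction of residues whose predicted state equals the true state."""
    if len(pred) != len(true) or len(true) == 0:
        raise ValueError("pred and true must be non-empty and of equal length")
    return sum(p == t for p, t in zip(pred, true)) / len(true)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical per-residue accessibility labels for a 5-residue fragment.
    print(three_state_accuracy("BBMEE", "BMMEE"))  # 4 of 5 residues match
    # Hypothetical predicted vs. observed contact numbers.
    print(pearson([4.0, 7.5, 2.0, 9.0], [5.0, 7.0, 1.0, 10.0]))
```

Accuracy is a per-residue count, so in a benchmark it is usually pooled over all residues of all test proteins; whether AcconPred pools or averages per protein is not stated in the abstract.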
238
Order and disorder in intermediate filament proteins. FEBS Lett 2015; 589:2464-76. [PMID: 26231765 DOI: 10.1016/j.febslet.2015.07.024] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 05/27/2015] [Revised: 07/21/2015] [Accepted: 07/22/2015] [Indexed: 11/20/2022]
Abstract
Intermediate filaments (IFs), important components of the cytoskeleton, provide a versatile, tunable network of self-assembled proteins. IF proteins contain three distinct domains: an α-helical structured rod domain, flanked by intrinsically disordered head and tail domains. Recent studies demonstrated the functional importance of the disordered domains, which differ in length and amino-acid sequence among the 70 different human IF genes. Here, we investigate the biophysical properties of the disordered domains, and review recent findings on the interactions between them. Our analysis highlights key components governing IF functional roles in the cytoskeleton, where the intrinsically disordered domains dictate protein-protein interactions, supramolecular assembly, and macro-scale order.
239
Jing R, Sun J, Wang Y, Li M. Domain position prediction based on sequence information by using fuzzy mean operator. Proteins 2015; 83:1462-9. [PMID: 26009844 DOI: 10.1002/prot.24833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2015] [Revised: 04/23/2015] [Accepted: 05/17/2015] [Indexed: 11/09/2022]
Abstract
Predicting protein domain regions is useful for studying protein structure and function. In this study, we propose a new method, composed of a fuzzy mean operator and region division, to predict the positions of domains in a target protein from its sequence. The whole sequence is aligned and scored using the fuzzy mean operator, and the domain region positions are then determined by region division. A published benchmark is used for comparison with previous studies. In addition, we generated two extra datasets to examine the stability of the method. On an independent test dataset, our method achieved a prediction accuracy of up to 84.13%. We hope that this method will be useful for related research.
Collapse
Affiliation(s)
- Runyu Jing
- Chemical Information Center (CIC), College of Chemistry, Sichuan University, Chengdu, 610064, China
- Jing Sun
- Chemical Information Center (CIC), College of Chemistry, Sichuan University, Chengdu, 610064, China
- Yuelong Wang
- Chemical Information Center (CIC), College of Chemistry, Sichuan University, Chengdu, 610064, China
- Menglong Li
- Chemical Information Center (CIC), College of Chemistry, Sichuan University, Chengdu, 610064, China
240
Sønderby SK, Sønderby CK, Nielsen H, Winther O. Convolutional LSTM Networks for Subcellular Localization of Proteins. Algorithms for Computational Biology 2015. [DOI: 10.1007/978-3-319-21233-3_6] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Indexed: 01/04/2023]
241
Hayes M, Rougé P, Barre A, Herouet-Guicheney C, Roggen EL. In silico tools for exploring potential human allergy to proteins. ACTA ACUST UNITED AC 2015. [DOI: 10.1016/j.ddmod.2016.06.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 11/16/2022]