Reference Citation Analysis

For: Shamim MTA, Anwaruddin M, Nagarajaram HA. Support Vector Machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs. Bioinformatics 2007;23:3320-7. [PMID: 17989092 DOI: 10.1093/bioinformatics/btm527] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.3] [Indexed: 11/12/2022]

Cited by Other Article(s)

Ali S, Chourasia P, Patterson M. From PDB files to protein features: a comparative analysis of PDB bind and STCRDAB datasets. Med Biol Eng Comput 2024:10.1007/s11517-024-03074-3. [PMID: 38622438 DOI: 10.1007/s11517-024-03074-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 03/13/2024] [Indexed: 04/17/2024]

Ali S, Chourasia P, Patterson M. When Protein Structure Embedding Meets Large Language Models. Genes (Basel) 2023;15:25. [PMID: 38254915 PMCID: PMC10815811 DOI: 10.3390/genes15010025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 12/16/2023] [Accepted: 12/21/2023] [Indexed: 01/24/2024]

Selvaraj MK, Thakur A, Kumar M, Pinnaka AK, Suri CR, Siddhardha B, Elumalai SP. Ion-pumping microbial rhodopsin protein classification by machine learning approach. BMC Bioinformatics 2023;24:29. [PMID: 36707759 PMCID: PMC9881276 DOI: 10.1186/s12859-023-05138-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Accepted: 01/04/2023] [Indexed: 01/28/2023]

Abstract

BACKGROUND

Rhodopsin is a seven-transmembrane protein covalently linked to a retinal chromophore that absorbs photons for energy conversion and intracellular signaling in eukaryotes, bacteria, and archaea. Haloarchaeal rhodopsins are Type-I microbial rhodopsins that elicit various light-driven functions such as proton pumping, chloride pumping and phototaxis. The industrial application of Ion-pumping Haloarchaeal rhodopsins is limited by the lack of full-length rhodopsin sequence-based classifications, which play an important role in Ion-pumping activity. The best-studied Haloarchaeal rhodopsin is the proton-pumping bacteriorhodopsin, which shows promising applications in optogenetics, biosensitized solar cells, security ink, data storage, artificial retinal implants and biohydrogen generation. A low-cost computational approach is therefore required to identify Ion-pumping Haloarchaeal rhodopsin sequences and their subtypes.

RESULTS

This study uses a support vector machine (SVM) technique to identify these Ion-pumping Haloarchaeal rhodopsin proteins. The haloarchaeal ion-pumping rhodopsins, viz. bacteriorhodopsin, halorhodopsin, xanthorhodopsin and sensory rhodopsin, and marine prokaryotic Ion-pumping rhodopsins such as actinorhodopsin and proteorhodopsin, were used to develop methods that accurately identified the ion-pumping haloarchaeal and other Type-I microbial rhodopsins. Using SVM with tenfold cross-validation, we achieved overall maximum accuracies of 97.78%, 97.84% and 97.60% for the amino acid composition, dipeptide composition and hybrid approaches, respectively. Predictive models for each class of rhodopsin performed equally well on an independent data set. Similar results were also achieved with another machine learning technique, random forest, and the predictive models performed equally well under five-fold cross-validation. We also tested the developed methods on the own, blank and BLAST datasets and on annotated whole-genome rhodopsin sequences of PWS haloarchaeal isolates. The developed web server (https://bioinfo.imtech.res.in/servers/rhodopred) can identify Ion-pumping Haloarchaeal rhodopsin proteins and their subtypes. We expect this web tool to be useful for rhodopsin researchers.

CONCLUSION

The overall performance of the developed method shows that it accurately identifies Ion-pumping Haloarchaeal rhodopsins and their subtypes from known and unknown microbial rhodopsin sequences. We expect this study to be useful for optogenetics researchers, molecular biologists and rhodopsin researchers.
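The abstract above compares amino acid composition, dipeptide composition and a hybrid of the two as SVM inputs. The following is a minimal sketch of how such fixed-length feature vectors are commonly computed from a sequence, assuming the standard 20-residue alphabet, skipping non-standard residues, and forming the hybrid by simple concatenation. It is not the authors' code, and the function names are hypothetical.

```python
from itertools import product

# Standard 20-letter amino acid alphabet (assumption: non-standard residues are skipped).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq: str) -> list[float]:
    """Fraction of each of the 20 standard residues (20-dimensional)."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """Fraction of each of the 400 adjacent residue pairs (400-dimensional)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    pairs = 0
    for i in range(len(seq) - 1):
        pair = seq[i : i + 2]
        if pair in counts:  # pairs containing non-standard residues are ignored
            counts[pair] += 1
            pairs += 1
    if pairs == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / pairs for dp in DIPEPTIDES]


def hybrid_features(seq: str) -> list[float]:
    """Hypothetical hybrid: composition vector followed by dipeptide vector (420-dimensional)."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


if __name__ == "__main__":
    features = hybrid_features("MKTAYIAKQR")
    print(len(features))  # 420
```

Normalizing by sequence length (or pair count) keeps proteins of very different lengths on the same scale, which is why composition features are a common choice for SVM input.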

Lahorkar A, Bhosale H, Sane A, Ramakrishnan V, Jayaraman VK. Identification of Phase Separating Proteins With Distributed Reduced Alphabet Representations of Sequences. IEEE/ACM Trans Comput Biol Bioinform 2023;20:410-420. [PMID: 35139023 DOI: 10.1109/tcbb.2022.3149310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]

Villalobos-Alva J, Ochoa-Toledo L, Villalobos-Alva MJ, Aliseda A, Pérez-Escamirosa F, Altamirano-Bustamante NF, Ochoa-Fernández F, Zamora-Solís R, Villalobos-Alva S, Revilla-Monsalve C, Kemper-Valverde N, Altamirano-Bustamante MM. Protein Science Meets Artificial Intelligence: A Systematic Review and a Biochemical Meta-Analysis of an Inter-Field. Front Bioeng Biotechnol 2022;10:788300. [PMID: 35875501 PMCID: PMC9301016 DOI: 10.3389/fbioe.2022.788300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2021] [Accepted: 05/25/2022] [Indexed: 11/23/2022]

Abstract

Proteins are among the most fascinating and challenging molecules in the universe, and they pose a big challenge for artificial intelligence. The implementation of machine learning/AI in protein science gives rise to a world of knowledge adventures in the workhorse of the cell and proteome homeostasis, which are essential for making life possible. This opens up epistemic horizons thanks to a coupling of human tacit–explicit knowledge with machine learning power, the benefits of which are already tangible, such as important advances in protein structure prediction. Moreover, the driving force behind the protein processes of self-organization, adjustment, and fitness requires a space corresponding to gigabytes of life data in its order of magnitude. Many tasks, such as novel protein design, protein folding pathways, and synthetic metabolic routes, as well as protein-aggregation mechanisms, the pathogenesis of protein misfolding and disease, and proteostasis networks, are currently unexplored or unrevealed. In this systematic review and biochemical meta-analysis, we aim to contribute to bridging the gap between what we call binomial artificial intelligence (AI) and protein science (PS), a growing research enterprise with exciting and promising biotechnological and biomedical applications. We undertake our task by exploring “the state of the art” in AI and machine learning (ML) applications to protein science in the scientific literature to address some critical research questions in this domain, including: What kinds of tasks have already been explored by ML approaches in protein science? What are the most common ML algorithms and databases used? What is the situational diagnosis of the AI–PS inter-field? What do ML processing steps have in common? We also formulate novel questions, such as: Is it possible to discover the rules of protein evolution with the binomial AI–PS? How do protein folding pathways evolve? What are the rules that dictate the folds?
What are the minimal nuclear protein structures? How do protein aggregates form, and why do they exhibit different toxicities? What are the structural properties of amyloid proteins? How can we design an effective proteostasis network to deal with misfolded proteins? We are a cross-functional group of scientists from several academic disciplines, and we conducted the systematic review using a variant of the PICO and PRISMA approaches. The search was carried out in four databases (PubMed, Bireme, OVID, and EBSCO Web of Science), resulting in 144 research articles. After three rounds of quality screening, 93 articles were selected for further analysis. Our findings are summarized as follows. Regarding AI applications, there are four main types: 1) genomics, 2) protein structure and function, 3) protein design and evolution, and 4) drug design. In terms of ML algorithms, supervised learning was the most common approach (85%). As for the databases used for the ML models, PDB and UniProtKB/Swiss-Prot were the most common (21% and 8%, respectively). Moreover, approximately 63% of the articles organized their results into three steps, which we labeled pre-process, process, and post-process. A few studies combined data from several databases or created their own databases after the pre-process. Our main finding is that, as of today, there are no research road maps serving as guides to address gaps in our knowledge of the AI–PS binomial. All research efforts to collect and integrate multidimensional data features, and then analyze and validate them, are so far uncoordinated and scattered throughout the scientific literature without a clear epistemic goal or connection between the studies.
Therefore, our main contribution to the scientific literature is to offer a road map to help solve problems in drug design and in protein structure, design, and function prediction, while also presenting the “state of the art” of research in the AI–PS binomial up to February 2021. Thus, we pave the way toward future advances in the synthetic redesign of novel proteins, protein networks, and artificial metabolic pathways, learning lessons from nature for the welfare of humankind. Many of these novel proteins and metabolic pathways do not currently exist in nature, nor are they used in the chemical industry or the biomedical field.

Affiliation(s)

Jalil Villalobos-Alva Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Luis Ochoa-Toledo Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
Mario Javier Villalobos-Alva Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Atocha Aliseda Instituto de Investigaciones Filosóficas, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
Fernando Pérez-Escamirosa Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
Nelly F. Altamirano-Bustamante Instituto Nacional de Pediatría, Mexico City, Mexico
Francine Ochoa-Fernández Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Ricardo Zamora-Solís Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Sebastián Villalobos-Alva Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Cristina Revilla-Monsalve Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Nicolás Kemper-Valverde Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
Myriam M. Altamirano-Bustamante Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
*Correspondence: Myriam M. Altamirano-Bustamante

PupStruct: Prediction of Pupylated Lysine Residues Using Structural Properties of Amino Acids. Genes (Basel) 2020;11:genes11121431. [PMID: 33260770 PMCID: PMC7761138 DOI: 10.3390/genes11121431] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/22/2020] [Revised: 11/23/2020] [Accepted: 11/23/2020] [Indexed: 12/23/2022]

Zhang L, Kong L. A Novel Amino Acid Properties Selection Method for Protein Fold Classification. Protein Pept Lett 2020;27:287-294. [PMID: 32207399 DOI: 10.2174/0929866526666190718151753] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/14/2019] [Revised: 04/17/2019] [Accepted: 06/10/2019] [Indexed: 12/21/2022]

Patil K, Chouhan U. Relevance of Machine Learning Techniques and Various Protein Features in Protein Fold Classification: A Review. Curr Bioinform 2019. [DOI: 10.2174/1574893614666190204154038] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 12/11/2022]

Meng C, Jin S, Wang L, Guo F, Zou Q. AOPs-SVM: A Sequence-Based Classifier of Antioxidant Proteins Using a Support Vector Machine. Front Bioeng Biotechnol 2019;7:224. [PMID: 31620433 PMCID: PMC6759716 DOI: 10.3389/fbioe.2019.00224] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 07/21/2019] [Accepted: 09/03/2019] [Indexed: 01/03/2023]

Identifying anticancer peptides by using a generalized chaos game representation. J Math Biol 2018;78:441-463. [PMID: 30291366 DOI: 10.1007/s00285-018-1279-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 09/25/2017] [Revised: 08/01/2018] [Indexed: 10/28/2022]

Rajendran S, Jothi A. Sequentially distant but structurally similar proteins exhibit fold specific patterns based on their biophysical properties. Comput Biol Chem 2018;75:143-153. [PMID: 29783123 DOI: 10.1016/j.compbiolchem.2018.05.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/25/2017] [Revised: 05/06/2018] [Accepted: 05/07/2018] [Indexed: 11/25/2022]

Kumar R, Kumari B, Kumar M. Prediction of endoplasmic reticulum resident proteins using fragmented amino acid composition and support vector machine. PeerJ 2017;5:e3561. [PMID: 28890846 PMCID: PMC5588793 DOI: 10.7717/peerj.3561] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 04/25/2017] [Accepted: 06/20/2017] [Indexed: 12/15/2022]

Abstract

Background

The endoplasmic reticulum plays an important role in many cellular processes, including protein synthesis, folding and post-translational processing of newly synthesized proteins. It is also the site of quality control for misfolded proteins and the entry point of extracellular proteins into the secretory pathway. Hence, at any given time, the endoplasmic reticulum contains two different cohorts of proteins: (i) proteins involved in endoplasmic reticulum-specific functions, which reside in the lumen of the endoplasmic reticulum and are called endoplasmic reticulum resident proteins, and (ii) proteins that are in the process of moving to the extracellular space. Endoplasmic reticulum resident proteins must therefore somehow be distinguished from newly synthesized secretory proteins, which pass through the endoplasmic reticulum on their way out of the cell. Only about 50% of the proteins used as training data in this study had an endoplasmic reticulum retention signal, which shows that these signals are not necessarily present in all endoplasmic reticulum resident proteins. This also strongly indicates that additional factors play a role in retaining endoplasmic reticulum-specific proteins inside the endoplasmic reticulum.

Methods

This is a support vector machine-based method in which different forms of protein features were used as inputs to the support vector machine to develop the prediction models. Leave-one-out cross-validation was used during training. Maximum performance was obtained with a combination of amino acid compositions of different parts of the proteins.
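Leave-one-out cross-validation is the special case of k-fold cross-validation in which k equals the number of training samples, so each protein is held out exactly once. The following is a minimal, standard-library sketch of k-fold index splitting. The function name and the fixed shuffling seed are hypothetical, and the sketch does not stratify folds by class, which many studies do.

```python
import random


def kfold_indices(
    n_samples: int, k: int, seed: int = 0
) -> list[tuple[list[int], list[int]]]:
    """Split sample indices into k (train, test) pairs.

    Every index appears in exactly one test fold. Setting k = n_samples
    gives leave-one-out cross-validation.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    # Deal shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training set: every index from every other fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    # Leave-one-out on 124 samples (the training-set size reported for ERPred).
    loo = kfold_indices(124, k=124)
    print(len(loo), len(loo[0][1]))  # 124 folds, each testing one sample
```

A model would be trained on each `train` list and scored on the matching `test` list, with the per-fold results pooled into one accuracy figure.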

Results

In this study, we report a novel support vector machine-based method for predicting endoplasmic reticulum resident proteins, named ERPred. During training, we achieved a maximum accuracy of 81.42% with leave-one-out cross-validation. On an independent dataset, ERPred achieved a sensitivity of 72.31% and a specificity of 83.69%. We have also annotated six different proteomes to predict candidate endoplasmic reticulum resident proteins in them. A web server, ERPred, was developed to make the method available to the scientific community and can be accessed at http://proteininformatics.org/mkumar/erpred/index.html.

Discussion

We found that of the 124 proteins in the training dataset, only 66 had endoplasmic reticulum retention signals, which shows that these signals are not an absolute necessity for endoplasmic reticulum resident proteins to remain inside the endoplasmic reticulum. This observation also strongly indicates that additional factors play a role in retaining proteins inside the endoplasmic reticulum. Our proposed predictor, ERPred, is a signal-independent tool: it is tuned to predict endoplasmic reticulum resident proteins even when the query protein does not contain a specific ER-retention signal.
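The ERPred abstract reports sensitivity and specificity on an independent dataset. Both follow directly from confusion-matrix counts: sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The following is a minimal sketch of these formulas. The `ConfusionCounts` type is hypothetical, and the counts in the usage example are illustrative round numbers, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # positives (e.g., ER-resident proteins) predicted positive
    fn: int  # positives predicted negative
    tn: int  # negatives predicted negative
    fp: int  # negatives predicted positive


def sensitivity(c: ConfusionCounts) -> float:
    """True-positive rate: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True-negative rate: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all predictions that are correct."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


if __name__ == "__main__":
    # Illustrative counts only; not taken from the ERPred study.
    counts = ConfusionCounts(tp=80, fn=20, tn=90, fp=10)
    print(f"sensitivity={sensitivity(counts):.2f}")
    print(f"specificity={specificity(counts):.2f}")
    print(f"accuracy={accuracy(counts):.2f}")
```

Reporting sensitivity and specificity separately, rather than accuracy alone, shows whether a predictor favors one class. This matters when resident and non-resident proteins are unevenly represented.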

Tripathi P, Pandey PN. A novel alignment-free method to classify protein folding types by combining spectral graph clustering with Chou's pseudo amino acid composition. J Theor Biol 2017;424:49-54. [DOI: 10.1016/j.jtbi.2017.04.027] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 07/08/2015] [Revised: 04/24/2017] [Accepted: 04/27/2017] [Indexed: 10/19/2022]

Yan K, Xu Y, Fang X, Zheng C, Liu B. Protein fold recognition based on sparse representation based classification. Artif Intell Med 2017;79:1-8. [DOI: 10.1016/j.artmed.2017.03.006] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 12/31/2016] [Revised: 03/06/2017] [Accepted: 03/07/2017] [Indexed: 12/13/2022]

Xia J, Peng Z, Qi D, Mu H, Yang J. An ensemble approach to protein fold classification by integration of template-based assignment and support vector machine classifier. Bioinformatics 2016;33:863-870. [DOI: 10.1093/bioinformatics/btw768] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 08/26/2016] [Accepted: 12/02/2016] [Indexed: 11/14/2022]

Arana-Daniel N, Gallegos AA, López-Franco C, Alanís AY, Morales J, López-Franco A. Support Vector Machines Trained with Evolutionary Algorithms Employing Kernel Adatron for Large Scale Classification of Protein Structures. Evol Bioinform Online 2016;12:285-302. [PMID: 27980384 PMCID: PMC5140013 DOI: 10.4137/ebo.s40912] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/01/2016] [Revised: 10/19/2016] [Accepted: 10/20/2016] [Indexed: 11/05/2022]

Wei L, Zou Q. Recent Progress in Machine Learning-Based Methods for Protein Fold Recognition. Int J Mol Sci 2016;17:ijms17122118. [PMID: 27999256 PMCID: PMC5187918 DOI: 10.3390/ijms17122118] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Received: 11/16/2016] [Revised: 12/03/2016] [Accepted: 12/11/2016] [Indexed: 01/22/2023]

ProFold: Protein Fold Classification with Additional Structural Features and a Novel Ensemble Classifier. Biomed Res Int 2016;2016:6802832. [PMID: 27660761 PMCID: PMC5021882 DOI: 10.1155/2016/6802832] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 06/01/2016] [Revised: 07/15/2016] [Accepted: 08/07/2016] [Indexed: 11/17/2022]

Improving protein fold recognition and structural class prediction accuracies using physicochemical properties of amino acids. J Theor Biol 2016;402:117-28. [PMID: 27164998 DOI: 10.1016/j.jtbi.2016.05.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 01/21/2016] [Revised: 04/20/2016] [Accepted: 05/02/2016] [Indexed: 11/24/2022]

Lyons J, Paliwal KK, Dehzangi A, Heffernan R, Tsunoda T, Sharma A. Protein fold recognition using HMM–HMM alignment and dynamic programming. J Theor Biol 2016;393:67-74. [DOI: 10.1016/j.jtbi.2015.12.018] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 10/26/2015] [Revised: 12/17/2015] [Accepted: 12/18/2015] [Indexed: 10/22/2022]

Feng Z, Hu X, Jiang Z, Song H, Ashraf MA. The recognition of multi-class protein folds by adding average chemical shifts of secondary structure elements. Saudi J Biol Sci 2016;23:189-97. [PMID: 26980999 PMCID: PMC4778582 DOI: 10.1016/j.sjbs.2015.10.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 09/11/2015] [Revised: 10/08/2015] [Accepted: 10/12/2015] [Indexed: 11/28/2022]

Sharma R, Dehzangi A, Lyons J, Paliwal K, Tsunoda T, Sharma A. Predict Gram-Positive and Gram-Negative Subcellular Localization via Incorporating Evolutionary Information and Physicochemical Features Into Chou's General PseAAC. IEEE Trans Nanobioscience 2015;14:915-26. [DOI: 10.1109/tnb.2015.2500186] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Indexed: 11/08/2022]

Lyons J, Dehzangi A, Heffernan R, Yang Y, Zhou Y, Sharma A, Paliwal K. Advancing the Accuracy of Protein Fold Recognition by Utilizing Profiles From Hidden Markov Models. IEEE Trans Nanobioscience 2015. [DOI: 10.1109/tnb.2015.2457906] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Indexed: 11/10/2022]

Wei L, Liao M, Gao X, Zou Q. Enhanced Protein Fold Prediction Method Through a Novel Feature Extraction Technique. IEEE Trans Nanobioscience 2015;14:649-59. [DOI: 10.1109/tnb.2015.2450233] [Citation(s) in RCA: 81] [Impact Index Per Article: 9.0] [Indexed: 11/06/2022]

Confirming robustness of fuzzy support vector machine via ξ–α bound. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2015.03.046] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/21/2022]

Paliwal KK, Sharma A, Lyons J, Dehzangi A. Improving protein fold recognition using the amalgamation of evolutionary-based and structural based information. BMC Bioinformatics 2014;15 Suppl 16:S12. [PMID: 25521502 PMCID: PMC4290640 DOI: 10.1186/1471-2105-15-s16-s12] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]

Paliwal KK, Sharma A, Lyons J, Dehzangi A. A tri-gram based feature extraction technique using linear probabilities of position specific scoring matrix for protein fold recognition. IEEE Trans Nanobioscience 2014;13:44-50. [PMID: 24594513 DOI: 10.1109/tnb.2013.2296050] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Indexed: 11/10/2022]

Lyons J, Biswas N, Sharma A, Dehzangi A, Paliwal KK. Protein fold recognition by alignment of amino acid residues using kernelized dynamic time warping. J Theor Biol 2014;354:137-45. [DOI: 10.1016/j.jtbi.2014.03.033] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 12/26/2013] [Revised: 03/05/2014] [Accepted: 03/21/2014] [Indexed: 01/21/2023]

Feng Z, Hu X. Recognition of 27-class protein folds by adding the interaction of segments and motif information. Biomed Res Int 2014;2014:262850. [PMID: 25136571 PMCID: PMC4127253 DOI: 10.1155/2014/262850] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 11/27/2013] [Accepted: 06/28/2014] [Indexed: 01/31/2023]

Das Roy R, Bhardwaj M, Bhatnagar V, Chakraborty K, Dash D. How do eubacterial organisms manage aggregation-prone proteome? F1000Res 2014;3:137. [PMID: 25339987 PMCID: PMC4193397 DOI: 10.12688/f1000research.4307.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/24/2014] [Indexed: 11/20/2022]

Dehzangi A, Paliwal K, Lyons J, Sharma A, Sattar A. A Segmentation-Based Method to Extract Structural and Evolutionary Features for Protein Fold Recognition. IEEE/ACM Trans Comput Biol Bioinform 2014;11:510-519. [PMID: 26356019 DOI: 10.1109/tcbb.2013.2296317] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 06/05/2023]

Das Roy R, Dash D. Selection of relevant features from amino acids enables development of robust classifiers. Amino Acids 2014;46:1343-51. [PMID: 24604165 DOI: 10.1007/s00726-014-1697-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 07/03/2013] [Accepted: 02/14/2014] [Indexed: 12/30/2022]

Sharma A, Lyons J, Dehzangi A, Paliwal KK. A feature extraction technique using bi-gram probabilities of position specific scoring matrix for protein fold recognition. J Theor Biol 2013;320:41-6. [DOI: 10.1016/j.jtbi.2012.12.008] [Citation(s) in RCA: 118] [Impact Index Per Article: 10.7] [Received: 09/25/2012] [Revised: 12/04/2012] [Accepted: 12/05/2012] [Indexed: 11/26/2022]

Panwar B, Gupta S, Raghava GPS. Prediction of vitamin interacting residues in a vitamin binding protein using evolutionary information. BMC Bioinformatics 2013;14:44. [PMID: 23387468 PMCID: PMC3577447 DOI: 10.1186/1471-2105-14-44] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 07/18/2012] [Accepted: 01/31/2013] [Indexed: 02/07/2023]

Abstract

BACKGROUND

Vitamins are important cofactors in various enzymatic reactions. In the past, many inhibitors have been designed against vitamin-binding pockets in order to inhibit vitamin-protein interactions. Thus, it is important to identify vitamin-interacting residues in a protein. Vitamin-binding pockets can be detected on a protein if its tertiary structure is known, but unfortunately tertiary structures are available for only a limited number of proteins. It is therefore important to develop in silico models for predicting vitamin-interacting residues in a protein from its primary structure.

RESULTS

In this study, we first compared the protein-interacting residues of vitamins with those of other ligands using Two Sample Logo (TSL). ATP, GTP, NAD, FAD and mannose preferred {G,R,K,S,H}, {G,K,T,S,D,N}, {T,G,Y}, {G,Y,W} and {Y,D,W,N,E} residues, respectively, whereas vitamins preferred {Y,F,S,W,T,G,H} residues for their interaction with proteins. Compositional information on preferred and non-preferred residues, along with pattern specificity, was also observed within different vitamin classes: vitamins A, B and B6 preferred {F,I,W,Y,L,V}, {S,Y,G,T,H,W,N,E} and {S,T,G,H,Y,N} interacting residues, respectively. This suggested that the protein-binding patterns of vitamins differ from those of other ligands, and motivated us to develop separate predictors for vitamins and their subclasses. Four prediction modules were developed: (i) vitamin interacting residues (VIRs), (ii) vitamin-A interacting residues (VAIRs), (iii) vitamin-B interacting residues (VBIRs) and (iv) pyridoxal 5′-phosphate (vitamin B6) interacting residues (PLPIRs). We applied various machine learning classifiers, including SVM, BayesNet, NaiveBayes, ComplementNaiveBayes, NaiveBayesMultinomial, RandomForest and IBk, using binary and Position-Specific Scoring Matrix (PSSM) features of protein sequences. Finally, we selected the best-performing SVM modules, which obtained the highest MCCs of 0.53, 0.48, 0.61 and 0.81 for VIRs, VAIRs, VBIRs and PLPIRs, respectively, using PSSM-based evolutionary information. All modules developed in this study were trained and tested on non-redundant datasets and evaluated using five-fold cross-validation. Their performance was also evaluated on balanced and different independent datasets.

CONCLUSIONS

This study demonstrates that it is possible to predict VIRs, VAIRs, VBIRs and PLPIRs from the evolutionary information of protein sequences. To serve the scientific community, we have developed a web server and standalone software, VitaPred (http://crdd.osdd.net/raghava/vitapred/).
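The VitaPred abstract ranks its modules by the Matthews correlation coefficient (MCC). MCC uses all four confusion-matrix counts, so it stays informative when interacting residues are a small minority of all residues. The following is a minimal sketch of the standard formula. Returning 0.0 when a marginal total is zero is a common convention assumed here, and the counts in the usage example are illustrative, not VitaPred data.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when any marginal total is zero (an assumed convention),
    since the formula is otherwise undefined.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Illustrative residue-level counts (not VitaPred data): interacting
    # residues are the minority class, as is typical for binding-site prediction.
    print(round(matthews_corrcoef(tp=60, tn=900, fp=40, fn=30), 3))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction). On an imbalanced residue dataset, plain accuracy can look high even for a predictor that rarely finds the minority class, which is why MCC is the more informative summary here.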

Sui T, Yang Y, Wang X. Sequence-Based Feature Extraction for Type III Effector Prediction. International Journal of Bioscience, Biochemistry and Bioinformatics 2013. [DOI: 10.7763/ijbbb.2013.v3.206] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/01/2022]

Dehzangi A, Paliwal K, Lyons J, Sharma A, Sattar A. Enhancing Protein Fold Prediction Accuracy Using Evolutionary and Structural Features. Pattern Recognition in Bioinformatics 2013. [DOI: 10.1007/978-3-642-39159-0_18] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 12/24/2022]

Zeng J, Kirk BD, Gou Y, Wang Q, Ma J. Genome-wide polycomb target gene prediction in Drosophila melanogaster. Nucleic Acids Res 2012;40:5848-63. [PMID: 22416065 PMCID: PMC3401425 DOI: 10.1093/nar/gks209] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 01/12/2023]

Mohammad TAS, Nagarajaram HA. SVM-based method for protein structural class prediction using secondary structural content and structural information of amino acids. J Bioinform Comput Biol 2011;9:489-502. [PMID: 21776605 DOI: 10.1142/s0219720011005422] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/30/2010] [Revised: 09/15/2010] [Accepted: 01/07/2011] [Indexed: 11/18/2022]

Sequence-based classification using discriminatory motif feature selection. PLoS One 2011;6:e27382. [PMID: 22102890 PMCID: PMC3213122 DOI: 10.1371/journal.pone.0027382] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/06/2011] [Accepted: 10/16/2011] [Indexed: 11/19/2022]

Abstract

Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, this strategy has shortcomings. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is applied to the sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance is assessed on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance, with and without optimization of classifier tuning parameters.
A Python pipeline implementing the approach is available at http://www.epibiostat.ucsf.edu/biostat/sen/dmfs/.
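The three-partition scheme described in this abstract can be sketched in Python as follows. This is an illustrative sketch only, not the authors' pipeline: the split fractions, the function names and the naive k-mer scoring (the absolute difference between the fraction of positive and negative discovery sequences containing each observed k-mer) are assumptions standing in for a real discriminatory motif finder, and any classifier could then be trained on the resulting features.

```python
import random


def three_way_split(records, fractions=(0.3, 0.4, 0.3), seed=0):
    """Shuffle (sequence, label) records into discovery, training and validation partitions."""
    items = list(records)
    random.Random(seed).shuffle(items)
    n_disc = int(fractions[0] * len(items))
    n_train = int(fractions[1] * len(items))
    discovery = items[:n_disc]
    training = items[n_disc:n_disc + n_train]
    validation = items[n_disc + n_train:]  # remainder, so no record is lost
    return discovery, training, validation


def discriminatory_kmers(discovery, k=3, top=5):
    """Hypothetical stand-in for a discriminatory motif finder.

    Scores every k-mer observed in the discovery partition by the absolute
    difference between the fraction of positive and of negative sequences
    containing it, and returns the `top` highest-scoring k-mers.
    """
    pos = [s for s, y in discovery if y == 1]
    neg = [s for s, y in discovery if y == 0]

    def frac(kmer, seqs):
        return sum(kmer in s for s in seqs) / len(seqs) if seqs else 0.0

    candidates = {s[i:i + k] for s, _ in discovery for i in range(len(s) - k + 1)}
    scores = {m: abs(frac(m, pos) - frac(m, neg)) for m in candidates}
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores, key=lambda m: (-scores[m], m))[:top]


def featurize(sequence, motifs):
    """Presence/absence vector over the selected motifs (any sequence length)."""
    return [int(m in sequence) for m in motifs]


if __name__ == "__main__":
    data = [("ACGTTTGA", 1), ("TTGACCGT", 1), ("CCCCAAAA", 0), ("AAAACCCC", 0)] * 5
    discovery, training, validation = three_way_split(data)
    motifs = discriminatory_kmers(discovery, k=3, top=3)     # uses discovery partition only
    X_train = [featurize(s, motifs) for s, _ in training]    # fit a classifier on these
    X_valid = [featurize(s, motifs) for s, _ in validation]  # assess performance on these
    print(motifs, len(X_train), len(X_valid))
```

Because the features are presence indicators over a small motif set, unaligned sequences of unequal length all map to vectors of the same size, which mirrors the universality point made in the abstract.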


Zhao N, Pang B, Shyu CR, Korkin D. Feature-based classification of native and non-native protein-protein interactions: Comparing supervised and semi-supervised learning approaches. Proteomics 2011;11:4321-30. [PMID: 22002942 DOI: 10.1002/pmic.201100217] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 04/25/2011] [Revised: 07/26/2011] [Accepted: 08/18/2011] [Indexed: 12/12/2022]

Muda HM, Saad P, Othman RM. Remote protein homology detection and fold recognition using two-layer support vector machine classifiers. Comput Biol Med 2011;41:687-99. [PMID: 21704312 DOI: 10.1016/j.compbiomed.2011.06.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 07/10/2009] [Revised: 03/16/2011] [Accepted: 06/05/2011] [Indexed: 02/07/2023]

Zhao N, Pang B, Shyu CR, Korkin D. Structural similarity and classification of protein interaction interfaces. PLoS One 2011;6:e19554. [PMID: 21589874 PMCID: PMC3093400 DOI: 10.1371/journal.pone.0019554] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 02/02/2011] [Accepted: 04/05/2011] [Indexed: 11/25/2022]

Yang JY, Chen X. Improving taxonomy-based protein fold recognition by using global and local features. Proteins 2011;79:2053-64. [DOI: 10.1002/prot.23025] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Received: 12/12/2010] [Revised: 02/05/2011] [Accepted: 03/03/2011] [Indexed: 11/05/2022]

Day R, Lennox KP, Dahl DB, Vannucci M, Tsai JW. Characterizing the regularity of tetrahedral packing motifs in protein tertiary structure. Bioinformatics 2010;26:3059-66. [PMID: 21047817 DOI: 10.1093/bioinformatics/btq573] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]

iFC²: an integrated web-server for improved prediction of protein structural class, fold type, and secondary structure content. Amino Acids 2010;40:963-73. [PMID: 20730460 DOI: 10.1007/s00726-010-0721-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/09/2010] [Accepted: 08/06/2010] [Indexed: 10/19/2022]

Jain P, Hirst JD. Automatic structure classification of small proteins using random forest. BMC Bioinformatics 2010;11:364. [PMID: 20594334 PMCID: PMC2916923 DOI: 10.1186/1471-2105-11-364] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 02/25/2010] [Accepted: 07/01/2010] [Indexed: 11/29/2022]

Yang Y, Lu BL. Protein subcellular multi-localization prediction using a min-max modular support vector machine. Int J Neural Syst 2010;20:13-28. [PMID: 20180250 DOI: 10.1142/s0129065710002206] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 02/04/2023]

Yang Y, Zhao J, Morgan RL, Ma W, Jiang T. Computational prediction of type III secreted proteins from gram-negative bacteria. BMC Bioinformatics 2010;11 Suppl 1:S47. [PMID: 20122221 PMCID: PMC3009519 DOI: 10.1186/1471-2105-11-s1-s47] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Indexed: 01/29/2023]

Abstract

Background

The type III secretion system (T3SS) is a specialized protein delivery system in gram-negative bacteria that injects proteins (called effectors) directly into the eukaryotic host cytosol and facilitates bacterial infection. For many plant and animal pathogens, the T3SS is indispensable for disease development. The T3SS has recently also been found in rhizobia, where it plays a crucial role in the nodulation process. Although a great deal of effort has been made to understand type III secretion, the precise mechanism underlying the secretion and translocation process is not yet fully understood. In particular, no defined secretion and translocation signals have been identified in the type III secreted effectors (T3SEs), which makes identifying these important virulence factors notoriously challenging. The availability of a large number of sequenced genomes of plant- and animal-associated bacteria calls for efficient and effective bioinformatics methods for identifying T3SEs.

Results

We have developed a machine learning method based on N-terminal amino acid sequences to predict novel type III effectors in the plant pathogen Pseudomonas syringae and in rhizobial microsymbionts. The features extracted for the learning model (classifier) include amino acid composition, secondary structure and solvent accessibility information. The method achieved a precision of over 90% on P. syringae in a cross-validation study. Combined with a promoter screen for type III-specific promoters, this classifier, trained on the P. syringae data, was applied to predict novel T3SEs from the genomic sequences of four rhizobial strains. This yielded 57 candidate type III secreted proteins, 17 of which are confirmed effectors.
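As a rough illustration of the composition-based part of the feature set described above, the Python sketch below computes the amino acid composition of an N-terminal segment and the precision metric reported for the cross-validation. The segment length of 50 residues and the function names are hypothetical assumptions, not values or code from the study; the secondary structure and solvent accessibility features, which need external predictors, are omitted.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def n_terminal_composition(sequence, length=50):
    """Relative frequency of each standard residue in the first `length` residues.

    `length=50` is a hypothetical default, not a value taken from the study.
    Non-standard letters are ignored when normalising.
    """
    segment = sequence[:length].upper()
    counts = [segment.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def precision(tp, fp):
    """Fraction of predicted effectors that are true effectors (0.0 if none predicted)."""
    return tp / (tp + fp) if tp + fp else 0.0


if __name__ == "__main__":
    comp = n_terminal_composition("MSNISVSGSRPLSQSPAQA", length=50)
    print(len(comp), round(sum(comp), 6))
    print(precision(tp=9, fp=1))
```

The resulting 20-dimensional vector can be concatenated with other per-protein features before training any standard classifier.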

Conclusion

Our experimental results demonstrate that a machine learning method based on N-terminal amino acid sequences, combined with a promoter screen, could be a very effective computational approach for predicting novel type III effectors in gram-negative bacteria. Our method and data are available to the public upon request.


Chen P, Liu C, Burge L, Mahmood M, Southerland W, Gloster C. Protein fold classification with genetic algorithms and feature selection. J Bioinform Comput Biol 2009;7:773-88. [PMID: 19785045 DOI: 10.1142/s0219720009004321] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 11/04/2008] [Revised: 01/21/2009] [Accepted: 03/23/2009] [Indexed: 11/18/2022]

Dong Q, Zhou S, Guan J. A new taxonomy-based protein fold recognition approach based on autocross-covariance transformation. Bioinformatics 2009;25:2655-62. [DOI: 10.1093/bioinformatics/btp500] [Citation(s) in RCA: 150] [Impact Index Per Article: 10.0] [Indexed: 11/13/2022]