201
|
Ju Z, Cao JZ, Gu H. iLM-2L: A two-level predictor for identifying protein lysine methylation sites and their methylation degrees by incorporating K-gap amino acid pairs into Chou's general PseAAC. J Theor Biol 2015; 385:50-7. [DOI: 10.1016/j.jtbi.2015.07.030] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 05/15/2015] [Revised: 07/06/2015] [Accepted: 07/23/2015] [Indexed: 10/23/2022]
|
202
|
Ahmad S, Kabir M, Hayat M. Identification of Heat Shock Protein families and J-protein types by incorporating Dipeptide Composition into Chou's general PseAAC. Comput Methods Programs Biomed 2015; 122:165-174. [PMID: 26233307 DOI: 10.1016/j.cmpb.2015.07.005] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 04/20/2015] [Revised: 06/21/2015] [Accepted: 07/13/2015] [Indexed: 06/04/2023]
Abstract
Heat Shock Proteins (HSPs) are essential for cell growth and viability and are found in all living organisms. HSPs manage the folding and unfolding of proteins, control the quality of newly synthesized proteins, and protect cellular homeostatic processes from environmental stress. On the basis of functionality, HSPs are categorized into six major families: (i) HSP20 or sHSP, (ii) HSP40 or J-protein types, (iii) HSP60 or GroEL/ES, (iv) HSP70, (v) HSP90 and (vi) HSP100. Identification of HSP families and sub-families through conventional approaches is expensive and laborious. It is therefore highly desirable to establish an automatic, robust and accurate computational method for predicting HSPs quickly and reliably. In this regard, a computational model is developed for the prediction of HSP families. In this model, protein sequences are formulated using three discrete methods: Split Amino Acid Composition, Pseudo Amino Acid Composition, and Dipeptide Composition. Several learning algorithms are evaluated to choose the best one for a high-throughput computational model. The leave-one-out test is applied to assess the performance of the proposed model. The empirical results show that the support vector machine achieved quite promising results using the Dipeptide Composition feature space. The proposed model achieves 90.7% accuracy on the HSPs dataset and 97.04% accuracy on J-protein types, which are higher than those of existing methods in the literature.
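The Dipeptide Composition named in this abstract is commonly computed as the normalized frequency of each of the 400 ordered pairs of adjacent residues. The following is a minimal sketch under that common definition, not the authors' implementation; the function name `dipeptide_composition`, the example sequence, and the choice to skip pairs containing non-standard letters are assumptions made for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def dipeptide_composition(sequence):
    """Return a 400-D vector of adjacent-residue pair frequencies.

    Hypothetical helper: pairs that contain a non-standard letter are
    skipped, which is an assumption and not taken from the paper.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # No valid pair (empty or one-letter sequence): return all zeros.
        return [0.0] * len(pairs)
    return [counts[p] / total for p in pairs]

# Toy sequence, made up for this example.
vector = dipeptide_composition("MKVLAAGIVK")
print(len(vector))  # 400
```

A vector of this kind can then be passed to any classifier, such as the support vector machine mentioned in the abstract.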
Affiliation(s)
- Saeed Ahmad
  - Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
- Muhammad Kabir
  - Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
- Maqsood Hayat
  - Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
|
203
|
Fayaz SM, Rajanikant GK. Modelling the molecular mechanism of protein-protein interactions and their inhibition: CypD-p53 case study. Mol Divers 2015; 19:931-43. [PMID: 26170095 DOI: 10.1007/s11030-015-9612-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/05/2015] [Accepted: 07/01/2015] [Indexed: 02/06/2023]
Abstract
Cyclophilin D (CypD) is an important regulatory protein involved in mitochondrial membrane permeability transition and cell death. Further, the mitochondrial CypD-p53 axis is an important contributor to necroptosis, a form of programmed necrosis, involved in various cardiovascular and neurological disorders. The CypD ligand, Cyclosporin A (CsA), was identified as an inhibitor of this interaction. In this study, using computational methods, we have attempted to model the CypD-p53 interaction in order to delineate their mode of binding and also to disclose the molecular mechanism, by means of which CsA interferes with this interaction. It was observed that p53 binds at the CsA-binding site of CypD. The knowledge obtained from this modelling was employed to identify novel CypD inhibitors through structure-based methods. Further, the identified compounds were tested by a similar strategy, adopted during the modelling process. This strategy could be applied to study the mechanism of protein-protein interaction (PPI) inhibition and to identify novel PPI inhibitors.
Affiliation(s)
- S M Fayaz
  - School of Biotechnology, National Institute of Technology Calicut, Calicut, 673601, India
- G K Rajanikant
  - School of Biotechnology, National Institute of Technology Calicut, Calicut, 673601, India
|
204
|
Liu B, Fang L, Wang S, Wang X, Li H, Chou KC. Identification of microRNA precursor with the degenerate K-tuple or Kmer strategy. J Theor Biol 2015; 385:153-9. [DOI: 10.1016/j.jtbi.2015.08.025] [Citation(s) in RCA: 131] [Impact Index Per Article: 13.1] [Received: 07/16/2015] [Revised: 08/21/2015] [Accepted: 08/24/2015] [Indexed: 10/23/2022]
|
205
|
Jia J, Liu Z, Xiao X, Liu B, Chou KC. Identification of protein-protein binding sites by incorporating the physicochemical properties and stationary wavelet transforms into pseudo amino acid composition. J Biomol Struct Dyn 2015; 34:1946-61. [PMID: 26375780 DOI: 10.1080/07391102.2015.1095116] [Citation(s) in RCA: 88] [Impact Index Per Article: 8.8] [Indexed: 10/23/2022]
Abstract
With the explosive growth of protein sequences entering protein data banks in the post-genomic era, it is highly desirable to develop automated methods for rapidly and effectively identifying protein-protein binding sites (PPBSs) based on sequence information alone. To address this problem, we proposed a predictor called iPPBS-PseAAC, in which each amino acid residue site of the proteins concerned was treated as a 15-tuple peptide segment generated by sliding a window along the protein chains with its center aligned with the target residue. The working peptide segment is further formulated by a general form of pseudo amino acid composition via the following procedures: (1) it is converted into a numerical series via the physicochemical properties of amino acids; (2) the numerical series is subsequently converted into a 20-D feature vector by means of the stationary wavelet transform technique. Formed by many individual "Random Forest" classifiers, the operation engine that runs the prediction is a two-layer ensemble classifier, with the 1st layer voting out the best training data-set from many bootstrap systems and the 2nd layer voting out the most relevant one from seven physicochemical properties. Cross-validation tests indicate that the new predictor is very promising, meaning that many important key features, which are deeply hidden in complicated protein sequences, can be extracted via the wavelet transform approach, quite consistent with the fact that many important biological functions of proteins can be elucidated through their low-frequency internal motions. The web server of iPPBS-PseAAC is accessible at http://www.jci-bioinfo.cn/iPPBS-PseAAC, by which users can easily acquire their desired results without the need to follow the complicated mathematical equations involved.
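The sliding-window step described in this abstract (one 15-residue segment per site, centered on the target residue) can be sketched as below. This is an illustration only: the function name `residue_windows` and the padding of chain ends with a dummy letter `X` are assumptions, since the abstract does not say how residues near the termini are handled.

```python
def residue_windows(sequence, size=15, pad="X"):
    """Return one window per residue, centered on that residue.

    Hypothetical helper. Chain ends are padded with `pad` so that every
    residue gets a full-length window; the padding scheme is an
    assumption and is not specified in the abstract.
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd so it has a center")
    half = size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]

# Toy chain, made up for this example.
chain = "ACDEFGHIKLMNPQRSTVWY"
windows = residue_windows(chain)
print(windows[0])  # XXXXXXXACDEFGHI
```

Each window would then be mapped to a numerical series through a physicochemical property scale before any wavelet transform is applied; that later step is not shown here.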
Affiliation(s)
- Jianhua Jia
  - Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China
- Zi Liu
  - Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China
- Xuan Xiao
  - Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China
  - Gordon Life Science Institute, Boston, MA 02478, USA
- Bingxiang Liu
  - Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China
- Kuo-Chen Chou
  - Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
  - Gordon Life Science Institute, Boston, MA 02478, USA
|
206
|
Liu B, Fang L, Long R, Lan X, Chou KC. iEnhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition. Bioinformatics 2015; 32:362-9. [PMID: 26476782 DOI: 10.1093/bioinformatics/btv604] [Citation(s) in RCA: 274] [Impact Index Per Article: 27.4] [Received: 09/25/2015] [Accepted: 10/12/2015] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Enhancers are short regulatory DNA elements. They can be bound by proteins (activators) to activate transcription of a gene, and hence play a critical role in promoting gene transcription in eukaryotes. With the avalanche of DNA sequences generated in the post-genomic age, it is a challenging task to develop computational methods for the timely identification of enhancers from extremely complicated DNA sequences. Although some efforts have been made in this regard, they were limited to identifying whether a query DNA element is an enhancer or not. According to their distinct levels of biological activity and regulatory effects on target genes, however, enhancers should be further classified into strong and weak ones. RESULTS: In view of this, a two-layer predictor called 'iEnhancer-2L' was proposed by formulating DNA elements with the 'pseudo k-tuple nucleotide composition', into which six DNA local parameters were incorporated. To the best of our knowledge, it is the first computational predictor ever established for identifying not only enhancers, but also their strength. Rigorous cross-validation tests have indicated that iEnhancer-2L holds very high potential to become a useful tool for genome analysis. AVAILABILITY AND IMPLEMENTATION: For the convenience of most experimental scientists, a web server for the two-layer predictor was established at http://bioinformatics.hitsz.edu.cn/iEnhancer-2L/, by which users can easily get their desired results without the need to go through the mathematical details. CONTACT: bliu@gordonlifescience.org, bliu@insun.hit.edu.cn, xlan@stanford.edu, kcchou@gordonlifescience.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Bin Liu
  - School of Computer Science and Technology, Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
  - Computational Biology, Gordon Life Science Institute, Belmont, MA 02478, USA
- Ren Long
  - School of Computer Science and Technology
- Xun Lan
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Kuo-Chen Chou
  - Computational Biology, Gordon Life Science Institute, Belmont, MA 02478, USA
  - Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
|
207
|
Wan S, Mak MW, Kung SY. mLASSO-Hum: A LASSO-based interpretable human-protein subcellular localization predictor. J Theor Biol 2015; 382:223-34. [DOI: 10.1016/j.jtbi.2015.06.042] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 04/24/2015] [Revised: 06/25/2015] [Accepted: 06/26/2015] [Indexed: 02/03/2023]
|
208
|
Rodríguez DC, Ocampo M, Reyes C, Arévalo-Pinzón G, Munoz M, Patarroyo MA, Patarroyo ME. Cell-Peptide Specific Interaction Can Inhibit Mycobacterium tuberculosis H37Rv Infection. J Cell Biochem 2015; 117:946-58. [DOI: 10.1002/jcb.25379] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 06/05/2015] [Accepted: 09/14/2015] [Indexed: 11/10/2022]
Affiliation(s)
- Deisy Carolina Rodríguez
  - Fundacion Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia
  - Universidad del Rosario, Bogotá, Colombia
- Marisol Ocampo
  - Fundacion Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia
  - Universidad del Rosario, Bogotá, Colombia
- Cesar Reyes
  - Fundacion Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia
  - Universidad del Rosario, Bogotá, Colombia
- Gabriela Arévalo-Pinzón
  - Fundacion Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia
  - Universidad del Rosario, Bogotá, Colombia
- Marina Munoz
  - Fundacion Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia
  - Universidad del Rosario, Bogotá, Colombia
- Manuel Alfonso Patarroyo
  - Fundacion Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia
  - Universidad del Rosario, Bogotá, Colombia
- Manuel Elkin Patarroyo
  - Fundacion Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia
  - Universidad Nacional de Colombia, Bogotá, Colombia
|
209
|
Kabir M, Iqbal M, Ahmad S, Hayat M. iTIS-PseKNC: Identification of Translation Initiation Site in human genes using pseudo k-tuple nucleotides composition. Comput Biol Med 2015; 66:252-7. [PMID: 26433457 DOI: 10.1016/j.compbiomed.2015.09.010] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 09/13/2015] [Accepted: 09/14/2015] [Indexed: 10/23/2022]
Abstract
Translation is an essential genetic process for understanding the mechanism of gene expression. Due to the large number of protein sequences generated in the post-genomic era, conventional methods are unable to identify the Translation Initiation Site (TIS) in human genes in a timely and accurate manner. It is thus highly desirable to develop an automatic and accurate computational model for the identification of TIS. Considerable improvements have been achieved in developing computational models; however, the development of accurate and reliable automated systems for TIS identification in human genes is still a challenging task. In this connection, we propose iTIS-PseKNC, a novel protocol for the identification of TIS. Three sequence representation methods, namely dinucleotide composition, pseudo-dinucleotide composition and trinucleotide composition, have been used to extract numerical descriptors. Support Vector Machine (SVM), K-nearest neighbor and Probabilistic Neural Network are assessed for their performance using the constructed descriptors. The proposed model iTIS-PseKNC achieved 99.40% accuracy using the jackknife test. The experimental results validated the superior performance of iTIS-PseKNC over the existing methods reported in the literature. It is anticipated that the iTIS-PseKNC predictor will be useful for basic research studies.
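The dinucleotide and trinucleotide compositions named in this abstract are instances of plain k-tuple (k-mer) frequencies with k = 2 and k = 3. The sketch below computes only that plain composition; the pseudo components, which additionally use physicochemical parameters, are not shown. The function name `kmer_composition` and the toy sequence are assumptions for illustration.

```python
from itertools import product

def kmer_composition(dna, k):
    """Return normalized frequencies of all 4**k overlapping k-mers.

    Hypothetical helper: windows containing letters other than A, C, G, T
    are skipped, which is an assumption and not taken from the paper.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = dna.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}

# Toy sequence, made up for this example.
di = kmer_composition("ATGGCCATG", 2)   # 16 dinucleotide features
tri = kmer_composition("ATGGCCATG", 3)  # 64 trinucleotide features
print(di["AT"])  # 0.25
```

The feature dimension grows as 4**k, which is one reason small values of k are typical for descriptors of this kind.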
Affiliation(s)
- Muhammad Kabir
  - Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
- Muhammad Iqbal
  - Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
- Saeed Ahmad
  - Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
- Maqsood Hayat
  - Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
|
210
|
Ganguly B, Tewari K, Singh R. Homology modeling, functional annotation and comparative genomics of outer membrane protein H of Pasteurella multocida. J Theor Biol 2015; 386:18-24. [PMID: 26362105 DOI: 10.1016/j.jtbi.2015.08.028] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/15/2015] [Revised: 08/29/2015] [Accepted: 08/31/2015] [Indexed: 11/18/2022]
Abstract
Pasteurella multocida is an important pathogen of animals and humans. Outer Membrane Protein (Omp) H is a major conserved protein in the envelope of P. multocida and has been commonly targeted as a protective antigen. However, not much is known about its structure and function due to the difficulties that are typically associated with obtaining sufficient amounts of purified prokaryotic transmembrane proteins. The present work aims to study OmpH using an in silico approach and to consolidate the findings in light of existing experimental evidence. Our study describes the first 3D model of the P. multocida OmpH obtained through a combination of several in silico modeling approaches. From our results, OmpH of P. multocida could be classified as a homotrimeric, 16-stranded, β-barrel porin involved in the non-specific transport of small, hydrophilic molecules, serving an essential osmoregulatory function. Moreover, very small homologous sequences could be identified in the host proteome, strengthening the probability of a successful OmpH-based vaccine against the pathogen with remote chances of cross-reaction to host proteins.
Affiliation(s)
- Bhaskar Ganguly
  - Animal Biotechnology Center, Department of Veterinary Physiology and Biochemistry, College of Veterinary and Animal Sciences, G. B. Pant University of Agriculture and Technology, Pantnagar 263145, India
  - Department of Veterinary Microbiology, College of Veterinary and Animal Sciences, G. B. Pant University of Agriculture and Technology, Pantnagar 263145, India
- Kamal Tewari
  - Department of Veterinary Microbiology, College of Veterinary and Animal Sciences, G. B. Pant University of Agriculture and Technology, Pantnagar 263145, India
- Rashmi Singh
  - Department of Veterinary Microbiology, College of Veterinary and Animal Sciences, G. B. Pant University of Agriculture and Technology, Pantnagar 263145, India
|
211
|
Zhao X, Ning Q, Chai H, Ai M, Ma Z. PGlcS: Prediction of protein O-GlcNAcylation sites with multiple features and analysis. J Theor Biol 2015; 380:524-9. [DOI: 10.1016/j.jtbi.2015.06.026] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 01/22/2015] [Revised: 06/01/2015] [Accepted: 06/02/2015] [Indexed: 10/23/2022]
|
212
|
Iqbal S, Mishra A, Hoque MT. Improved prediction of accessible surface area results in efficient energy function application. J Theor Biol 2015; 380:380-91. [DOI: 10.1016/j.jtbi.2015.06.012] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 04/10/2015] [Revised: 05/15/2015] [Accepted: 06/02/2015] [Indexed: 01/16/2023]
|
213
|
Kabir M, Hayat M. iRSpot-GAEnsC: identifing recombination spots via ensemble classifier and extending the concept of Chou’s PseAAC to formulate DNA samples. Mol Genet Genomics 2015; 291:285-96. [DOI: 10.1007/s00438-015-1108-5] [Citation(s) in RCA: 95] [Impact Index Per Article: 9.5] [Received: 06/25/2015] [Accepted: 08/19/2015] [Indexed: 10/23/2022]
|
214
|
Ali F, Hayat M. Classification of membrane protein types using Voting Feature Interval in combination with Chou's Pseudo Amino Acid Composition. J Theor Biol 2015; 384:78-83. [PMID: 26297889 DOI: 10.1016/j.jtbi.2015.07.034] [Citation(s) in RCA: 93] [Impact Index Per Article: 9.3] [Received: 05/08/2015] [Revised: 07/15/2015] [Accepted: 07/29/2015] [Indexed: 12/11/2022]
Abstract
Membrane proteins are a major constituent of the cell and perform numerous crucial functions. These functions are mostly determined by the membrane protein's type. Initially, membrane protein types were classified through traditional methods, and reasonable results were obtained. However, owing to the rapid growth of protein sequences in databases, classification through conventional methods is very difficult or sometimes impossible, because it is laborious and time-consuming. Therefore, a new, powerful discriminating model is indispensable for classifying membrane protein types with high precision. In this work, a quite promising classification model with effective discriminating power for membrane protein types is developed. In our classification model, salient features of protein sequences are extracted via Pseudo Amino Acid Composition. Five classification algorithms were utilized. Among these, Voting Feature Interval obtained outstanding performance on all three datasets. The accuracy of the proposed model is 93.9% on dataset S1, 89.33% on S2 and 86.9% on S3, applying the 10-fold cross-validation test. The success rates reveal that the proposed model outperforms other existing models in the literature so far and will play a substantial role in the fields of drug design and the pharmaceutical industry.
Affiliation(s)
- Farman Ali
  - Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
- Maqsood Hayat
  - Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
|
215
|
Zheng W, Ruan J, Hu G, Wang K, Hanlon M, Gao J. Analysis of Conformational B-Cell Epitopes in the Antibody-Antigen Complex Using the Depth Function and the Convex Hull. PLoS One 2015; 10:e0134835. [PMID: 26244562 PMCID: PMC4526569 DOI: 10.1371/journal.pone.0134835] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 02/27/2015] [Accepted: 07/14/2015] [Indexed: 01/05/2023] Open
Abstract
The prediction of conformational B-cell epitopes plays an important role in immunoinformatics. Several computational methods have been proposed on the basis of discrimination determined by the solvent-accessible surface between epitopes and non-epitopes, but the performance of existing methods is far from satisfactory. In this paper, depth functions and the k-th surface convex hull are used to analyze epitopes and exposed non-epitopes. On each layer of the protein, we compute relative solvent accessibility and four different types of depth functions, i.e., Chakravarty depth, DPX, half-sphere exposure and half space depth, to analyze the location of epitopes on different layers of the proteins. We found that conformational B-cell epitopes are rich in charged residues Asp, Glu, Lys, Arg, His; aliphatic residues Gly, Pro; non-charged residues Asn, Gln; and aromatic residue Tyr. Conformational B-cell epitopes are rich in coils. Conservation of epitopes is not significantly lower than that of exposed non-epitopes. The average depths (obtained by four methods) for epitopes are significantly lower than those of non-epitopes on the surface using the Wilcoxon rank sum test. Epitopes are more likely to be located in the outer layer of the convex hull of a protein. On the benchmark dataset, the cumulative 10th convex hull covers 84.6% of exposed residues on the protein surface area, and nearly 95% of epitope sites. These findings may be helpful in building a predictor for epitopes.
Affiliation(s)
- Wei Zheng
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People’s Republic of China
- Jishou Ruan
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People’s Republic of China
  - State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin, People’s Republic of China
- Gang Hu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People’s Republic of China
- Kui Wang
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People’s Republic of China
- Michelle Hanlon
  - Department of Physical Sciences, Grant MacEwan University, Alberta, Canada
- Jianzhao Gao
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, People’s Republic of China
|
216
|
Liu Y, Munteanu CR, Fernández Blanco E, Tan Z, Santos Del Riego A, Pazos A. Prediction of Nucleotide Binding Peptides Using Star Graph Topological Indices. Mol Inform 2015; 34:736-41. [PMID: 27491034 DOI: 10.1002/minf.201500064] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/29/2015] [Accepted: 07/06/2015] [Indexed: 01/14/2023]
Abstract
The nucleotide binding proteins are involved in many important cellular processes, such as transmission of genetic information or energy transfer and storage. Therefore, the screening of new peptides for this biological function is an important research topic. The current study proposes a mixed methodology to obtain the first classification model that is able to predict new nucleotide binding peptides, using only the amino acid sequence. Thus, the methodology uses a Star graph molecular descriptor of the peptide sequences and the Machine Learning technique for the best classifier. The best model represents a Random Forest classifier based on two features of the embedded and non-embedded graphs. The performance of the model is excellent, considering similar models in the field, with an Area Under the Receiver Operating Characteristic Curve (AUROC) value of 0.938 and true positive rate (TPR) of 0.886 (test subset). The prediction of new nucleotide binding peptides with this model could be useful for drug target studies in drug development.
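The two metrics quoted in this abstract, AUROC and true positive rate, can be computed from labels and classifier scores as sketched below. This is a generic illustration and not the authors' evaluation code: the AUROC uses the standard pairwise (Mann-Whitney) formulation, and the function names, toy labels, toy scores, and the 0.5 decision threshold are all assumptions.

```python
def true_positive_rate(labels, predictions):
    """TPR = TP / (TP + FN), with 1 marking the positive class."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("no positive samples")
    return tp / (tp + fn)

def auroc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data, made up for this example.
labels = [1, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.5, 0.2, 0.8]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

tpr = true_positive_rate(labels, predictions)
auc = auroc(labels, scores)
```

The pairwise form is quadratic in the number of samples, which is acceptable for a small test subset; a sort-based rank computation would be the usual choice for larger data.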
Affiliation(s)
- Yong Liu
  - Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruna, Campus de Elviña s/n, 15071, A Coruña, Spain, phone/fax: +34-981167000/+34-981167160
  - Faculty of Veterinary Medicine and Animal Science, Autonomous University of the State of Mexico, Toluca, 50090, México
  - Key Laboratory of Subtropical Agro-ecological Engineering, Institute of Subtropical Agriculture, the Chinese Academy of Sciences, Changsha, Hunan, 410125, P. R. China
- Cristian R Munteanu
  - Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruna, Campus de Elviña s/n, 15071, A Coruña, Spain, phone/fax: +34-981167000/+34-981167160
- Enrique Fernández Blanco
  - Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruna, Campus de Elviña s/n, 15071, A Coruña, Spain, phone/fax: +34-981167000/+34-981167160
- Zhiliang Tan
  - Key Laboratory of Subtropical Agro-ecological Engineering, Institute of Subtropical Agriculture, the Chinese Academy of Sciences, Changsha, Hunan, 410125, P. R. China
- Antonino Santos Del Riego
  - Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruna, Campus de Elviña s/n, 15071, A Coruña, Spain, phone/fax: +34-981167000/+34-981167160
- Alejandro Pazos
  - Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruna, Campus de Elviña s/n, 15071, A Coruña, Spain, phone/fax: +34-981167000/+34-981167160
|
217
|
Chen W, Lin H, Chou KC. Pseudo nucleotide composition or PseKNC: an effective formulation for analyzing genomic sequences. Mol Biosyst 2015; 11:2620-34. [DOI: 10.1039/c5mb00155b] [Citation(s) in RCA: 262] [Impact Index Per Article: 26.2] [Indexed: 11/21/2022]
Abstract
With the avalanche of DNA/RNA sequences generated in the post-genomic age, it is urgent to develop automated methods for analyzing the relationship between the sequences and their functions.
Affiliation(s)
- Wei Chen
  - Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000
- Hao Lin
  - Gordon Life Science Institute, Boston, USA
  - Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics
- Kuo-Chen Chou
  - Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000
|