1
Malik A, Kamli MR, Sabir JSM, Rather IA, Phan LT, Kim CB, Manavalan B. APLpred: A machine learning-based tool for accurate prediction and characterization of asparagine peptide lyases using sequence-derived optimal features. Methods 2024; 229:133-146. [PMID: 38944134 DOI: 10.1016/j.ymeth.2024.05.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Revised: 05/08/2024] [Accepted: 05/19/2024] [Indexed: 07/01/2024] Open
Abstract
Asparagine peptide lyase (APL) is among the seven groups of proteases, also known as proteolytic enzymes, which are classified according to their catalytic residue. APLs are synthesized as precursors or propeptides that undergo self-cleavage through autoproteolytic reaction. At present, APLs are grouped into 10 families belonging to six different clans of proteases. Recognizing their critical roles in many biological processes including virus maturation, and virulence, accurate identification and characterization of APLs is indispensable. Experimental identification and characterization of APLs is laborious and time-consuming. Here, we developed APLpred, a novel support vector machine (SVM) based predictor that can predict APLs from the primary sequences. APLpred was developed using Boruta-based optimal features derived from seven encodings and subsequently trained using five machine learning algorithms. After evaluating each model on an independent dataset, we selected APLpred (an SVM-based model) due to its consistent performance during cross-validation and independent evaluation. We anticipate APLpred will be an effective tool for identifying APLs. This could aid in designing inhibitors against these enzymes and exploring their functions. The APLpred web server is freely available at https://procarb.org/APLpred/.
Affiliation(s)
- Adeel Malik
  - Institute of Intelligence Informatics Technology, Sangmyung University, Seoul 03016, Republic of Korea
- Majid Rasool Kamli
  - Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Jamal S M Sabir
  - Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia; Center of Excellence in Bionanoscience Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Irfan A Rather
  - Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia; Center of Excellence in Bionanoscience Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Le Thi Phan
  - Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
- Chang-Bae Kim
  - Department of Biotechnology, Sangmyung University, Seoul 03016, Republic of Korea
- Balachandran Manavalan
  - Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea

2
Jia P, Zhang F, Wu C, Li M. A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond. Brief Bioinform 2024; 25:bbae162. [PMID: 38739759 PMCID: PMC11089422 DOI: 10.1093/bib/bbae162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 02/17/2024] [Accepted: 03/31/2024] [Indexed: 05/16/2024] Open
Abstract
Proteins interact with diverse ligands to perform a large number of biological functions, such as gene expression and signal transduction. Accurate identification of these protein-ligand interactions is crucial to the understanding of molecular mechanisms and the development of new drugs. However, traditional biological experiments are time-consuming and expensive. With the development of high-throughput technologies, an increasing amount of protein data is available. In the past decades, many computational methods have been developed to predict protein-ligand interactions. Here, we review a comprehensive set of over 160 protein-ligand interaction predictors, which cover protein-protein, protein-nucleic acid, protein-peptide and protein-other ligands (nucleotide, heme, ion) interactions. We have carried out a comprehensive analysis of the above four types of predictors from several significant perspectives, including their inputs, feature profiles, models, availability, etc. The current methods primarily rely on protein sequences, especially utilizing evolutionary information. The significant improvement in predictions is attributed to deep learning methods. Additionally, sequence-based pretrained models and structure-based approaches are emerging as new trends.
Affiliation(s)
- Pengzhen Jia
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Fuhao Zhang
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
  - College of Information Engineering, Northwest A&F University, No. 3 Taicheng Road, Yangling, Shaanxi 712100, China
- Chaojin Wu
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Min Li
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China

3
Firoz A, Malik A, Ali HM, Akhter Y, Manavalan B, Kim CB. PRR-HyPred: A two-layer hybrid framework to predict pattern recognition receptors and their families by employing sequence encoded optimal features. Int J Biol Macromol 2023; 234:123622. [PMID: 36773859 DOI: 10.1016/j.ijbiomac.2023.123622] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/08/2022] [Revised: 02/03/2023] [Accepted: 02/06/2023] [Indexed: 02/12/2023]
Abstract
Pattern recognition receptors (PRRs) recognize distinct features on the surface of pathogens or damaged cells and play key roles in the innate immune system. PRRs are divided into various families, including Toll-like receptors, retinoic acid-inducible gene-I-like receptors, nucleotide oligomerization domain-like receptors, and C-type lectin receptors. As these are implicated in host health and several diseases, their accurate identification is indispensable for their functional characterization and targeted therapeutic approaches. Here, we construct PRR-HyPred, a novel two-layer hybrid framework in which the first layer predicts whether a given sequence is PRR or non-PRR using a support vector machine, and in the second, the predicted PRR sequence is assigned to a specific family using a random forest-based classifier. Based on a 10-fold cross-validation test, PRR-HyPred achieved 83.4% accuracy in the first layer and 95% in the second, with Matthews correlation coefficient values of 0.639 and 0.816, respectively. This is the first study that can simultaneously predict and classify PRRs into specific families. PRR-HyPred is available as a web portal at https://procarb.org/PRRHyPred/. We hope that it could be a valuable tool for the large-scale prediction and classification of PRRs and subsequently facilitate future studies.
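The two-layer control flow described in this abstract can be sketched in a few lines of Python. This is only an illustration of the cascade idea, not the PRR-HyPred implementation: `two_layer_predict`, `toy_layer1`, `toy_layer2`, the feature vectors, and the family labels are all hypothetical stand-ins for the trained support vector machine and random forest models.

```python
from typing import Callable, Optional, Sequence

# Hypothetical type aliases: any trained binary or multi-class model
# could be wrapped as a plain callable like these.
BinaryModel = Callable[[Sequence[float]], bool]
FamilyModel = Callable[[Sequence[float]], str]


def two_layer_predict(
    features: Sequence[float],
    layer1: BinaryModel,
    layer2: FamilyModel,
) -> Optional[str]:
    """Return a family label, or None when layer 1 rejects the sequence."""
    if not layer1(features):
        return None  # predicted non-member: layer 2 is never consulted
    return layer2(features)


# Toy stand-in models, for illustration only.
def toy_layer1(x: Sequence[float]) -> bool:
    return sum(x) > 1.0


def toy_layer2(x: Sequence[float]) -> str:
    return "family_A" if x[0] >= x[1] else "family_B"


accepted = two_layer_predict([0.9, 0.4], toy_layer1, toy_layer2)
rejected = two_layer_predict([0.1, 0.2], toy_layer1, toy_layer2)
print(accepted, rejected)
```

One consequence of this design is that errors in the first layer propagate: a sequence wrongly rejected by layer 1 can never receive a family label, which is why the two layers are typically evaluated separately.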
Affiliation(s)
- Ahmad Firoz
  - Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia; Princess Dr. Najla Bint Saud Al-Saud Center for Excellence Research in Biotechnology, King Abdulaziz University, Jeddah, Saudi Arabia
- Adeel Malik
  - Institute of Intelligence Informatics Technology, Sangmyung University, Seoul, 03016, Republic of Korea
- Hani Mohammed Ali
  - Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia; Princess Dr. Najla Bint Saud Al-Saud Center for Excellence Research in Biotechnology, King Abdulaziz University, Jeddah, Saudi Arabia
- Yusuf Akhter
  - Department of Biotechnology, Babasaheb Bhimrao Ambedkar University, Vidya Vihar, Raebareli Road, Lucknow, Uttar Pradesh, 226025, India
- Balachandran Manavalan
  - Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
- Chang-Bae Kim
  - Department of Biotechnology, Sangmyung University, Seoul, 03016, Republic of Korea

4
Yamaguchi S, Nakashima H, Moriwaki Y, Terada T, Shimizu K. Prediction of protein mononucleotide binding sites using AlphaFold2 and machine learning. Comput Biol Chem 2022; 100:107744. [DOI: 10.1016/j.compbiolchem.2022.107744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Revised: 07/12/2022] [Accepted: 07/22/2022] [Indexed: 11/26/2022]

5
Malik A, Subramaniyam S, Kim CB, Manavalan B. SortPred: The first machine learning based predictor to identify bacterial sortases and their classes using sequence-derived information. Comput Struct Biotechnol J 2021; 20:165-174. [PMID: 34976319 PMCID: PMC8703055 DOI: 10.1016/j.csbj.2021.12.014] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/01/2021] [Revised: 12/08/2021] [Accepted: 12/09/2021] [Indexed: 12/12/2022] Open
Abstract
Sortase enzymes are cysteine transpeptidases that embellish the surface of Gram-positive bacteria with various proteins, thereby allowing these microorganisms to interact with their neighboring environment. It is known that several of their substrates can cause pathological implications, so researchers have focused on the development of sortase inhibitors. Currently, six different classes of sortases (A-F) are recognized. However, with the extensive application of bacterial genome sequencing projects, the number of potential sortases in the public databases has exploded, presenting considerable challenges in annotating these sequences. It is very laborious and time-consuming to characterize these sortase classes experimentally. Therefore, this study developed the first machine-learning-based two-layer predictor called SortPred, where the first layer predicts the sortase from the given sequence and the second layer predicts their class from the predicted sortase. To develop SortPred, we constructed an original benchmarking dataset and investigated 31 feature descriptors, primarily on five feature encoding algorithms. Afterward, each of these descriptors was trained using a random forest classifier and its robustness was evaluated with an independent dataset. Finally, we selected the final model independently for both layers depending on the performance consistency between cross-validation and independent evaluation. SortPred is expected to be an effective tool for identifying bacterial sortases, which in turn may aid in designing sortase inhibitors and exploring their functions. The SortPred webserver and a standalone version are freely accessible at: https://procarb.org/sortpred.
Affiliation(s)
- Adeel Malik
  - Institute of Intelligence Informatics Technology, Sangmyung University, Seoul 03016, Republic of Korea
- Chang-Bae Kim
  - Department of Biotechnology, Sangmyung University, Seoul 03016, Republic of Korea

6
Yang YH, Wang JS, Yuan SS, Liu ML, Su W, Lin H, Zhang ZY. A Survey for Predicting ATP Binding Residues of Proteins Using Machine Learning Methods. Curr Med Chem 2021; 29:789-806. [PMID: 34514982 DOI: 10.2174/0929867328666210910125802] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/24/2021] [Revised: 06/29/2021] [Accepted: 07/04/2021] [Indexed: 11/22/2022]
Abstract
Protein-ligand interactions are necessary for the majority of protein functions. Adenosine-5'-triphosphate (ATP) is one such ligand that plays a vital role as a coenzyme in providing energy for cellular activities, catalyzing biological reactions, and signaling. Knowing the ATP binding residues of proteins is helpful for the annotation of protein function and for drug design. However, due to the huge influx of protein sequences into databases in the post-genome era, experimentally identifying ATP binding residues is cost-ineffective and time-consuming. To address this problem, computational methods have been developed to predict ATP binding residues. In this review, we briefly summarize the application of machine learning methods in detecting ATP binding residues of proteins. We expect this review will be helpful for further research.
Affiliation(s)
- Yu-He Yang
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Jia-Shu Wang
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shi-Shi Yuan
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Meng-Lu Liu
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Su
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zhao-Yue Zhang
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China

7
Ahmad S, Prathipati P, Tripathi LP, Chen YA, Arya A, Murakami Y, Mizuguchi K. Integrating sequence and gene expression information predicts genome-wide DNA-binding proteins and suggests a cooperative mechanism. Nucleic Acids Res 2019; 46:54-70. [PMID: 29186632 PMCID: PMC5758906 DOI: 10.1093/nar/gkx1166] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/03/2016] [Accepted: 11/15/2017] [Indexed: 12/29/2022] Open
Abstract
DNA-binding proteins (DBPs) perform diverse biological functions ranging from transcription to pathogen sensing. Machine learning methods can not only identify DBPs de novo but also provide insights into their DNA-recognition dynamics. However, it remains unclear whether available methods that can accurately predict DNA-binding sites in known DBPs can also identify novel DBPs. Moreover, sequence information is blind to the cellular- and disease-specific contexts of DBP activities, whereas the under-utilized knowledge from public gene expression data offers great promise. To address these issues, we have developed novel methods for predicting DBPs by integrating sequence and gene expression-derived features and applied them to explore human, mouse and Arabidopsis proteomes. While our sequence-based models outperformed the gene expression-based ones, some proteins with weaker DBP-like sequence features were correctly predicted by gene expression-based features, suggesting that these proteins acquire a tangible DBP functionality in a conducive gene expression environment. Analysis of motif enrichment among the co-expressed genes of the top 100 candidate DBPs from hitherto unannotated genes provides further avenues to explore their functional associations.
Affiliation(s)
- Shandar Ahmad
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India; Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-asagi, Ibaraki, Osaka 5670085, Japan
- Philip Prathipati
  - Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-asagi, Ibaraki, Osaka 5670085, Japan
- Lokesh P Tripathi
  - Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-asagi, Ibaraki, Osaka 5670085, Japan
- Yi-An Chen
  - Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-asagi, Ibaraki, Osaka 5670085, Japan
- Ajay Arya
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
- Yoichi Murakami
  - Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-asagi, Ibaraki, Osaka 5670085, Japan
- Kenji Mizuguchi
  - Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-asagi, Ibaraki, Osaka 5670085, Japan

8
Aulsebrook ML, Starck M, Grace MR, Graham B, Thordarson P, Pal R, Tuck KL. Interaction of Nucleotides with a Trinuclear Terbium(III)-Dizinc(II) Complex: Efficient Sensitization of Terbium Luminescence by Guanosine Monophosphate and Application to Real-Time Monitoring of Phosphodiesterase Activity. Inorg Chem 2018; 58:495-505. [PMID: 30561998 DOI: 10.1021/acs.inorgchem.8b02731] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 11/28/2022]
Abstract
An in-depth study of the interaction of a trinuclear terbium(III)-dizinc(II) complex with an array of nucleotides differing in the type of nucleobase and number of phosphate groups, as well as cyclic versus acyclic variants, is presented. The study examined the nature of the interaction and the efficiency at which guanine was able to sensitize terbium(III) luminescence. Competitive binding and titration studies were performed to help establish the nature/mode of the interactions. These established that (1) interaction occurs by the coordination of phosphate groups to zinc(II) (in addition to uridine in the case of uridine monophosphate), (2) acyclic nucleotides bind more strongly than cyclic counterparts because of their higher negative charge, (3) guanine-containing nucleotides are able to sensitize terbium(III) luminescence with the efficiency of sensitization following the order guanosine monophosphate (GMP) > guanosine diphosphate > guanosine triphosphate because of the mode of binding, and (4) nucleoside monophosphates bind to a single zinc(II) ion, whereas di- and triphosphates appear to bind in a bridging mode between two host molecules. Furthermore, it has been shown that guanine is a sensitizer of terbium(III) luminescence. On the basis of the ability of GMP to effectively sensitize terbium(III)-based luminescence while cyclic GMP (cGMP) does not, the complex has been utilized to monitor the catalytic conversion of cGMP to GMP by a phosphodiesterase enzyme in real time using time-gated luminescence on a benchtop fluorimeter. The complex has the potential to find broad application in monitoring the activity of enzymes that process nucleotides (co)substrates, including high-throughput drug-screening programs.
Affiliation(s)
- Matthieu Starck
  - Department of Chemistry, Durham University, Durham DH1 3LE, U.K.
- Michael R Grace
  - School of Chemistry, Monash University, Clayton, Victoria 3800, Australia
- Bim Graham
  - Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria 3052, Australia
- Pall Thordarson
  - School of Chemistry, the Australian Centre for Nanomedicine and the ARC Centre of Excellence in Convergent Bio-Nano Science and Technology, University of New South Wales, Sydney, New South Wales 2052, Australia
- Robert Pal
  - Department of Chemistry, Durham University, Durham DH1 3LE, U.K.
- Kellie L Tuck
  - School of Chemistry, Monash University, Clayton, Victoria 3800, Australia

9
Kopra K, Seppälä T, Rabara D, Abreu-Blanco M, Kulmala S, Holderfield M, Härmä H. Label-Free Time-Gated Luminescent Detection Method for the Nucleotides with Varying Phosphate Content. SENSORS 2018; 18:s18113989. [PMID: 30453509 PMCID: PMC6264117 DOI: 10.3390/s18113989] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/03/2018] [Revised: 11/11/2018] [Accepted: 11/12/2018] [Indexed: 01/26/2023]
Abstract
A new label-free molecular probe for luminescent nucleotide detection in neutral aqueous solution is presented. Phosphate-containing molecules, such as nucleotides, play a vital role in cell metabolism, energy economy, and various signaling processes. Thus, the monitoring of nucleotide concentration and nucleotide-related enzymatic reactions is of high importance. A two-component lanthanide complex, formed from a Tb(III) ion carrier and a light-harvesting antenna, readily distinguishes nucleotides containing different numbers of phosphates and enables direct detection of enzymatic reactions converting nucleotide triphosphate (NTP) to nucleotide di/monophosphate or the opposite. The developed sensor enables the detection of enzymatic activity with low nanomolar sensitivity, as highlighted with K-Ras and apyrase enzymes in their hydrolysis assays performed in a 384-well plate format compatible with high-throughput screening.
Affiliation(s)
- Kari Kopra
  - Materials Chemistry and Chemical Analysis, University of Turku, Vatselankatu 2, 20500 Turku, Finland
- Tanja Seppälä
  - Materials Chemistry and Chemical Analysis, University of Turku, Vatselankatu 2, 20500 Turku, Finland
- Dana Rabara
  - NCI-RAS Initiative, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Frederick, MD 21702, USA
- Maria Abreu-Blanco
  - NCI-RAS Initiative, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Frederick, MD 21702, USA
- Sakari Kulmala
  - Laboratory of Analytical Chemistry, Department of Chemistry, Aalto University, P.O. Box 16100, 00076 Aalto, Finland
- Matthew Holderfield
  - NCI-RAS Initiative, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Frederick, MD 21702, USA
- Harri Härmä
  - Materials Chemistry and Chemical Analysis, University of Turku, Vatselankatu 2, 20500 Turku, Finland

10
Santhosh R, Satheesh SN, Gurusaran M, Michael D, Sekar K, Jeyakanthan J. NIMS: a database on nucleobase compounds and their interactions in macromolecular structures. J Appl Crystallogr 2016. [DOI: 10.1107/s1600576716006208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
The intense exploration of nucleotide-binding protein structures has created a whirlwind in the field of structural biology and bioinformatics. This has led to the conception and birth of NIMS. This database is a collection of detailed data on the nucleobases, nucleosides and nucleotides, along with their analogues as well as the protein structures to which they bind. Interaction details such as the interacting residues and all associated values have been made available. As a pioneering step, the diffraction precision index for protein structures, the atomic uncertainty for each atom, and the computed errors on the interatomic distances and angles are available in the database. Apart from the above, provision has been made to visualize the three-dimensional structures of both ligands and protein–ligand structures and their interactions in Jmol as well as JSmol. One of the salient features of NIMS is that it has been interfaced with a user-friendly and query-based efficient search engine. It was conceived and developed with the aim of serving a significant section of researchers working in the area of protein and nucleobase complexes. NIMS is freely available online at http://iris.physics.iisc.ernet.in/nims and it is hoped that it will prove to be an invaluable asset.

11
Yu DJ, Hu J, Li QM, Tang ZM, Yang JY, Shen HB. Constructing query-driven dynamic machine learning model with application to protein-ligand binding sites prediction. IEEE Trans Nanobioscience 2015; 14:45-58. [PMID: 25730499 DOI: 10.1109/tnb.2015.2394328] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/11/2022]
Abstract
We are facing an era in which annotated biological data are generated rapidly and continuously. How to effectively incorporate newly annotated data into the learning step is crucial for enhancing the performance of a bioinformatics prediction model. Although machine-learning-based methods have been extensively used for dealing with various biological problems, existing approaches usually train static prediction models based on fixed training datasets. Static approaches have several disadvantages, such as low scalability and impracticality when the training dataset is huge. In view of this, we propose a dynamic learning framework for constructing query-driven prediction models. The key difference between the proposed framework and the existing approaches is that the training set for the machine learning algorithm of the proposed framework is dynamically generated according to the query input, as opposed to training a general model regardless of queries in traditional static methods. Accordingly, a query-driven predictor based on the smaller set of data specifically selected from the entire annotated base dataset is applied to the query. This way of constructing the dynamic model allows the annotated base dataset to be updated flexibly, and using the most relevant core subset as the training set gives the constructed model better generalization ability on the query, showing a "part could be better than all" phenomenon. According to the new framework, we have implemented a dynamic protein-ligand binding sites predictor called OSML (On-site model for ligand binding sites prediction). Computer experiments on 10 different ligand types of three hierarchically organized levels show that OSML outperforms most existing predictors. The results indicate that the current dynamic framework is a promising future direction for bridging the gap between the rapidly accumulated annotated biological data and the effective machine-learning-based predictors.
The OSML web server and datasets are freely available for academic use at: http://www.csbio.sjtu.edu.cn/bioinf/OSML/.
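The core idea of building the training set from the query can be illustrated with a short Python sketch. The abstract does not say how the relevant subset is chosen, so the selection rule here (the `k` nearest annotated samples by Euclidean distance) is purely an assumption, and `select_training_subset` and the toy `base` dataset are hypothetical names and values rather than anything taken from OSML.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label)


def select_training_subset(
    query: Sequence[float], base: List[Sample], k: int
) -> List[Sample]:
    """Pick the k annotated samples closest to the query.

    Assumption: Euclidean distance is used as the relevance measure;
    the original method may rank samples differently.
    """
    ranked = sorted(base, key=lambda item: math.dist(query, item[0]))
    return ranked[:k]


# Toy annotated base dataset (values are illustrative only).
base = [
    ((0.0, 0.0), 0),
    ((0.1, 0.2), 1),
    ((5.0, 5.0), 0),
    ((0.2, 0.1), 1),
]

# A per-query model would then be trained on `subset` only.
subset = select_training_subset((0.0, 0.1), base, k=2)
print(subset)
```

Because the subset is rebuilt for every query, new annotated samples can simply be appended to `base` without retraining a global model, which is the flexibility the abstract emphasizes.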

12
Forlani G, Makarova KS, Ruszkowski M, Bertazzini M, Nocek B. Evolution of plant δ(1)-pyrroline-5-carboxylate reductases from phylogenetic and structural perspectives. FRONTIERS IN PLANT SCIENCE 2015; 6:567. [PMID: 26284089 PMCID: PMC4522605 DOI: 10.3389/fpls.2015.00567] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 04/30/2015] [Accepted: 07/09/2015] [Indexed: 05/23/2023]
Abstract
Proline plays a crucial role in cell growth and stress responses, and its accumulation is essential for the tolerance of adverse environmental conditions in plants. Two routes are used to biosynthesize proline in plants. The main route uses glutamate as a precursor, while in the other route proline is derived from ornithine. The terminal step of both pathways, the conversion of δ(1)-pyrroline-5-carboxylate (P5C) to L-proline, is catalyzed by P5C reductase (P5CR) using NADH or NADPH as a cofactor. Since P5CRs are important housekeeping enzymes, they are conserved across all domains of life and appear to be relatively unaffected throughout evolution. However, global analysis of these enzymes unveiled significant functional diversity in the preference for cofactors (NADPH vs. NADH), variation in metal dependence and the differences in the oligomeric state. In our study we investigated evolutionary patterns through phylogenetic and structural analysis of P5CR representatives from all kingdoms of life, with emphasis on the plant species. We also attempted to correlate local sequence/structure variation among the functionally and structurally characterized members of the family.
Affiliation(s)
- Giuseppe Forlani
  - Department of Life Science and Biotechnology, University of Ferrara, Ferrara, Italy
- Kira S. Makarova
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Milosz Ruszkowski
  - Synchrotron Radiation Research Section, Macromolecular Crystallography Laboratory, National Cancer Institute, Argonne National Laboratory, Argonne, IL, USA
- Michele Bertazzini
  - Department of Life Science and Biotechnology, University of Ferrara, Ferrara, Italy
- Boguslaw Nocek
  - The Bioscience Division, Argonne National Laboratory, Argonne, IL, USA

13
Usha S, Selvaraj S. Structure-wise discrimination of adenine and guanine by proteins on the basis of their nonbonded interactions. J Biomol Struct Dyn 2014; 33:1474-92. [DOI: 10.1080/07391102.2014.958759] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/22/2022]

14
Hu J, He X, Yu DJ, Yang XB, Yang JY, Shen HB. A new supervised over-sampling algorithm with application to protein-nucleotide binding residue prediction. PLoS One 2014; 9:e107676. [PMID: 25229688 PMCID: PMC4168127 DOI: 10.1371/journal.pone.0107676] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 05/09/2014] [Accepted: 08/09/2014] [Indexed: 12/21/2022] Open
Abstract
Protein-nucleotide interactions are ubiquitous in a wide variety of biological processes. Accurately identifying interaction residues solely from protein sequences is useful for both protein function annotation and drug design, especially in the post-genomic era, as large volumes of protein data have not been functionally annotated. Protein-nucleotide binding residue prediction is a typical imbalanced learning problem, where binding residues are far fewer in number than non-binding residues. Alleviating the severity of class imbalance has been demonstrated to be a promising means of improving the prediction performance of a machine-learning-based predictor for class imbalance problems. However, little attention has been paid to the negative impact of class imbalance on protein-nucleotide binding residue prediction. In this study, we propose a new supervised over-sampling algorithm that synthesizes additional minority class samples to address class imbalance. The experimental results from protein-nucleotide interaction datasets demonstrate that the proposed supervised over-sampling algorithm can relieve the severity of class imbalance and help to improve prediction performance. Based on the proposed over-sampling algorithm, a predictor, called TargetSOS, is implemented for protein-nucleotide binding residue prediction. Cross-validation tests and independent validation tests demonstrate the effectiveness of TargetSOS. The web-server and datasets used in this study are freely available at http://www.csbio.sjtu.edu.cn/bioinf/TargetSOS/.
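To make the notion of synthesizing minority-class samples concrete, the following Python sketch generates new points by interpolating between randomly chosen pairs of existing minority samples. This is a generic, unsupervised interpolation scheme swapped in for illustration; it is not the supervised over-sampling algorithm proposed in the paper, whose details the abstract does not give. The function name `oversample_minority`, the `seed` parameter, and the toy data are assumptions.

```python
import random
from typing import List, Tuple

Point = Tuple[float, ...]


def oversample_minority(
    minority: List[Point], n_new: int, seed: int = 0
) -> List[Point]:
    """Create n_new synthetic points on segments between minority samples.

    Assumption: plain random interpolation (no supervision step), and the
    minority set holds at least two samples.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)  # two distinct minority samples
        t = rng.random()                # position along the segment a -> b
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


# Toy minority class, e.g. feature vectors of binding residues.
minority = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0)]
new_points = oversample_minority(minority, n_new=5)
print(len(minority) + len(new_points))
```

Synthetic points should be added to the training folds only; generating them before splitting the data would leak information into the validation set.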
Affiliation(s)
- Jun Hu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Xue He
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
  - Changshu Institute, Nanjing University of Science and Technology, Changshu, Jiangsu, China
  - * E-mail: (DJY); (HBS)
- Xi-Bei Yang
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
  - School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China
- Jing-Yu Yang
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
  - * E-mail: (DJY); (HBS)
|
15
|
Yu DJ, Hu J, Yan H, Yang XB, Yang JY, Shen HB. Enhancing protein-vitamin binding residues prediction by multiple heterogeneous subspace SVMs ensemble. BMC Bioinformatics 2014; 15:297. [PMID: 25189131 PMCID: PMC4261549 DOI: 10.1186/1471-2105-15-297] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 03/23/2014] [Accepted: 08/18/2014] [Indexed: 11/10/2022] Open
Abstract
Background: Vitamins are typical ligands that play critical roles in various metabolic processes. The accurate identification of the vitamin-binding residues solely based on a protein sequence is of significant importance for the functional annotation of proteins, especially in the post-genomic era, when large volumes of protein sequences are accumulating quickly without being functionally annotated.
Results: In this paper, a new predictor called TargetVita is designed and implemented for predicting protein-vitamin binding residues using protein sequences. In TargetVita, features derived from the position-specific scoring matrix (PSSM), predicted protein secondary structure, and vitamin binding propensity are combined to form the original feature space; then, several feature subspaces are selected by performing different feature selection methods. Finally, based on the selected feature subspaces, heterogeneous SVMs are trained and then ensembled for performing prediction.
Conclusions: The experimental results obtained with four separate vitamin-binding benchmark datasets demonstrate that the proposed TargetVita is superior to the state-of-the-art vitamin-specific predictor, and an average improvement of 10% in terms of the Matthews correlation coefficient (MCC) was achieved over independent validation tests. The TargetVita web server and the datasets used are freely available for academic use at http://csbio.njust.edu.cn/bioinf/TargetVita or http://www.csbio.sjtu.edu.cn/bioinf/TargetVita.
Electronic supplementary material: The online version of this article (doi:10.1186/1471-2105-15-297) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, China.
|
16
|
ATP synthase: the right size base model for nanomotors in nanomedicine. ScientificWorldJournal 2014; 2014:567398. [PMID: 24605056 PMCID: PMC3925597 DOI: 10.1155/2014/567398] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 08/29/2013] [Accepted: 12/05/2013] [Indexed: 11/17/2022] Open
Abstract
Nanomedicine results from nanotechnology, in which minute, precise, molecular-scale nanomotors can be used to treat disease conditions. Many such biological nanomotors are found and operate in living systems and could be used for therapeutic purposes. The question is how to build nanomachines that are compatible with living systems and can safely operate inside the body. Here we propose that it is of paramount importance to have a workable base model for developing nanomotors for use in nanomedicine. The base model must not only satisfy the basic requirements of size, number, and speed but must also allow for molecular modulation. Universal occurrence and the capacity for molecular modulation of the catalytic site are vital for a perfect base model. In this review we will provide a detailed discussion on ATP synthase as one of the most suitable base models in the development of nanomotors. We will also describe how the capabilities of molecular modulation can improve catalytic and motor function of the enzyme to generate a catalytically improved and controllable ATP synthase, which in turn will help in building a superior nanomotor. For comparison, several other biological nanomotors will be described, as well as their applications for nanotechnology.
|
17
|
Yu DJ, Hu J, Tang ZM, Shen HB, Yang J, Yang JY. Improving protein-ATP binding residues prediction by boosting SVMs with random under-sampling. Neurocomputing 2013. [DOI: 10.1016/j.neucom.2012.10.012] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Indexed: 01/07/2023]
|
18
|
Selisko B, Potisopon S, Agred R, Priet S, Varlet I, Thillier Y, Sallamand C, Debart F, Vasseur JJ, Canard B. Molecular basis for nucleotide conservation at the ends of the dengue virus genome. PLoS Pathog 2012; 8:e1002912. [PMID: 23028313 PMCID: PMC3441707 DOI: 10.1371/journal.ppat.1002912] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.8] [Received: 07/06/2012] [Accepted: 08/03/2012] [Indexed: 12/02/2022] Open
Abstract
The dengue virus (DV) is an important human pathogen from the Flavivirus genus, whose genome- and antigenome RNAs start with the strictly conserved sequence pppAG. The RNA-dependent RNA polymerase (RdRp), a product of the NS5 gene, initiates RNA synthesis de novo, i.e., without the use of a pre-existing primer. Very little is known about the mechanism of this de novo initiation and how conservation of the starting adenosine is achieved. The polymerase domain NS5PolDV of NS5, upon initiation on viral RNA templates, synthesizes mainly dinucleotide primers that are then elongated in a processive manner. We show here that NS5PolDV contains a specific priming site for adenosine 5′-triphosphate as the first transcribed nucleotide. Remarkably, in the absence of any RNA template the enzyme is able to selectively synthesize the dinucleotide pppAG when Mn2+ is present as catalytic ion. The T794 to A799 priming loop is essential for initiation and provides at least part of the ATP-specific priming site. The H798 loop residue is of central importance for the ATP-specific initiation step. In addition to ATP selection, NS5PolDV ensures the conservation of the 5′-adenosine by strongly discriminating against viral templates containing an erroneous 3′-end nucleotide in the presence of Mg2+. In the presence of Mn2+, NS5PolDV is remarkably able to generate and elongate the correct pppAG primer on these erroneous templates. This can be regarded as a genomic/antigenomic RNA end repair mechanism. These conservational mechanisms, mediated by the polymerase alone, may extend to other RNA virus families having RdRps initiating RNA synthesis de novo.
The 5′- and 3′-ends of RNA virus genomes have evolved towards efficient replication, translation, and escape from defense mechanisms of the host cell. Little is known about how RNA viruses conserve or restore the correct ends of their genomes. The Flavivirus genus of positive-strand RNA viruses contains important human pathogens such as yellow fever virus, West Nile virus, Japanese encephalitis virus and dengue virus (DV). The Flavivirus genome ends are strictly conserved as 5′-AG…CU-3′. We demonstrate here the primary role of the DV polymerase in the conservation of the first and last genomic residue. We show that DV polymerase contains an ATP-specific priming site, which imposes a strong preference for the de novo synthesis of a dinucleotide primer starting with an ATP. Furthermore, the polymerase is able to indirectly correct erroneous sequences by producing the correct primer in the absence of template and on templates containing incorrect nucleotides at the 3′-end. The correct primer is productively elongated on either correct or incorrect templates. Our findings provide a direct demonstration of the implication of a viral RNA polymerase in the conservation and repair of genome ends. Other polymerases from other RNA virus families are likely to employ similar mechanisms.
Affiliation(s)
- Barbara Selisko
  - Aix-Marseille Université, CNRS, AFMB UMR 7257, 163, Marseille, France.
|
19
|
Zhang YN, Yu DJ, Li SS, Fan YX, Huang Y, Shen HB. Predicting protein-ATP binding sites from primary sequence through fusing bi-profile sampling of multi-view features. BMC Bioinformatics 2012; 13:118. [PMID: 22651691 PMCID: PMC3424114 DOI: 10.1186/1471-2105-13-118] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Received: 12/05/2011] [Accepted: 05/31/2012] [Indexed: 12/23/2022] Open
Abstract
Background: Adenosine-5′-triphosphate (ATP) is one of the multifunctional nucleotides and plays an important role in cell biology as a coenzyme interacting with proteins. Revealing the binding sites between protein and ATP is important for understanding the functionality of the proteins and the mechanisms of the protein-ATP complex.
Results: In this paper, we propose a novel framework for predicting the proteins’ functional residues, through which they can bind with ATP molecules. The new prediction protocol combines sequence evolutionary information with bi-profile sampling of multi-view sequential features and sequence-derived structural features. The hypothesis behind this strategy is that a single-view feature can represent only part of the knowledge about the target, and that multiple sources of descriptors can be complementary.
Conclusions: Prediction performances evaluated by both 5-fold and leave-one-out jackknife cross-validation tests on two benchmark datasets, consisting of 168 and 227 non-homologous ATP binding proteins respectively, demonstrate the efficacy of the proposed protocol. Our experimental results also reveal that the residue structural characteristics of real protein-ATP binding sites are significantly different from those of normal ones; for example, the binding residues do not show high solvent accessibility propensities, and binding tends to occur at the junctions between different secondary structure segments. Furthermore, tests with multiple ratios between positive and negative samples show that performance is affected by imbalanced training datasets. Increasing the dataset scale is also demonstrated to be useful for improving prediction performance.
Affiliation(s)
- Ya-Nan Zhang
  - Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
|
20
|
Chen K, Mizianty MJ, Kurgan L. Prediction and analysis of nucleotide-binding residues using sequence and sequence-derived structural descriptors. Bioinformatics 2011; 28:331-41. [PMID: 22130595 DOI: 10.1093/bioinformatics/btr657] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.1] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Nucleotides are multifunctional molecules that are essential for numerous biological processes. They serve as sources for chemical energy, participate in the cellular signaling and they are involved in the enzymatic reactions. The knowledge of the nucleotide-protein interactions helps with annotation of protein functions and finds applications in drug design.
RESULTS: We propose a novel ensemble of accurate high-throughput predictors of binding residues from the protein sequence for ATP, ADP, AMP, GTP and GDP. Empirical tests show that our NsitePred method significantly outperforms existing predictors and approaches based on sequence alignment and residue conservation scoring. The NsitePred accurately finds more binding residues and binding sites and it performs particularly well for the sites with residues that are clustered close together in the sequence. The high predictive quality stems from the usage of novel, comprehensive and custom-designed inputs that utilize information extracted from the sequence, evolutionary profiles, several sequence-predicted structural descriptors and sequence alignment. Analysis of the predictive model reveals several sequence-derived hallmarks of nucleotide-binding residues; they are usually conserved and flanked by less conserved residues, and they are associated with certain arrangements of secondary structures and amino acid pairs in the specific neighboring positions in the sequence.
AVAILABILITY: http://biomine.ece.ualberta.ca/nSITEpred/
CONTACT: lkurgan@ece.ualberta.ca
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ke Chen
  - School of Computer Science and Software Engineering, Tianjin Polytechnic University, Hedong District, Tianjin 300160, PR of China
|