1
Esmaili F, Pourmirzaei M, Ramazi S, Shojaeilangari S, Yavari E. A Review of Machine Learning and Algorithmic Methods for Protein Phosphorylation Site Prediction. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:1266-1285. [PMID: 37863385 PMCID: PMC11082408 DOI: 10.1016/j.gpb.2023.03.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/16/2022] [Revised: 01/16/2023] [Accepted: 03/23/2023] [Indexed: 10/22/2023]
Abstract
Post-translational modifications (PTMs) play key roles in extending the functional diversity of proteins and, as a result, in regulating diverse cellular processes in prokaryotic and eukaryotic organisms. Phosphorylation is a vital PTM that occurs in most proteins and plays a significant role in many biological processes. Disorders in the phosphorylation process lead to multiple diseases, including neurological disorders and cancers. The purpose of this review is to organize the body of knowledge associated with phosphorylation site (p-site) prediction to facilitate future research in this field. First, we comprehensively review all related databases and introduce all steps of dataset creation, data preprocessing, and method evaluation in p-site prediction. Next, we investigate p-site prediction methods, which are divided into two computational groups: algorithmic and machine learning (ML). We further show that there are two main ML approaches to p-site prediction, conventional methods and end-to-end deep learning methods, and give an overview of both. Moreover, this review introduces the feature extraction techniques most widely used in p-site prediction. Finally, we create three test sets from new proteins in the 2022 release of the database of protein post-translational modifications (dbPTM), based on general and human species. Evaluating online p-site prediction tools on the proteins newly added in the dbPTM 2022 release, which are absent from the dbPTM 2019 release, reveals their limitations: the actual performance of these tools on unseen proteins is notably lower than the results reported in their respective research papers.
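The evaluation protocol above depends on testing only on proteins that a tool could not have seen during training. The sketch below illustrates that idea in minimal Python. It assumes each release is an in-memory mapping from a hypothetical accession to its sequence and known phosphosite positions, which is not dbPTM's actual file format. The window size and the 'X' padding are illustrative choices and are not the review's settings.

```python
# Sketch: candidate S/T/Y sites from proteins present in a newer release only.
# Assumption: a release is a dict mapping a (hypothetical) accession to
# (sequence, set of 1-based phosphosite positions); not dbPTM's real format.
from typing import Dict, List, Set, Tuple

Release = Dict[str, Tuple[str, Set[int]]]


def new_protein_test_set(old: Release, new: Release,
                         window: int = 7) -> List[Tuple[str, int, str, int]]:
    """Return (accession, position, peptide, label) for every S/T/Y residue
    in proteins present in `new` but absent from `old`.

    Each peptide is centred on the candidate residue, with `window` residues
    on each side; positions beyond the termini are padded with 'X'.
    """
    samples = []
    for acc in sorted(new.keys() - old.keys()):  # unseen proteins only
        seq, sites = new[acc]
        padded = "X" * window + seq + "X" * window
        for pos, residue in enumerate(seq, start=1):
            if residue not in "STY":
                continue
            centre = pos - 1 + window  # index of this residue inside `padded`
            peptide = padded[centre - window:centre + window + 1]
            samples.append((acc, pos, peptide, 1 if pos in sites else 0))
    return samples


# Toy data: only P2 is new, so only its S/T/Y sites are returned.
old = {"P1": ("MSTK", {2})}
new = {"P1": ("MSTK", {2}), "P2": ("AYSA", {2})}
print(new_protein_test_set(old, new, window=2))
```

Here the set difference on accessions is the whole "unseen protein" filter. A stricter protocol might also remove new proteins that are highly similar in sequence to old ones, and this sketch does not attempt that.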
Affiliation(s)
- Farzaneh Esmaili
- Department of Information Technology, Tarbiat Modares University, Tehran 14115-111, Iran
- Mahdi Pourmirzaei
- Department of Information Technology, Tarbiat Modares University, Tehran 14115-111, Iran
- Shahin Ramazi
- Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran 14115-111, Iran
- Seyedehsamaneh Shojaeilangari
- Biomedical Engineering Group, Department of Electrical Engineering and Information Technology, Iranian Research Organization for Science and Technology (IROST), Tehran 33535-111, Iran
- Elham Yavari
- Department of Information Technology, Tarbiat Modares University, Tehran 14115-111, Iran

2
Naseer S, Hussain W, Khan YD, Rasool N. iPhosS(Deep)-PseAAC: Identification of Phosphoserine Sites in Proteins Using Deep Learning on General Pseudo Amino Acid Compositions. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1703-1714. [PMID: 33242308 DOI: 10.1109/tcbb.2020.3040747] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 06/11/2023]
Abstract
Among all PTMs, protein phosphorylation is pivotal for various pathological and physiological processes. About 30 percent of eukaryotic proteins undergo phosphorylation, leading to various changes in conformation, function, stability, localization, and so forth. In eukaryotic proteins, phosphorylation occurs on serine (S), threonine (T), and tyrosine (Y) residues. Among these, serine phosphorylation is of particular importance, as it is associated with various important biological processes, including energy metabolism, signal transduction pathways, cell cycling, and apoptosis. Its identification is therefore important; however, in vitro, ex vivo, and in vivo identification can be laborious, time-consuming, and costly. There is a pressing need for an efficient and accurate computational model to help researchers and biologists identify these sites easily. Herein, we propose a novel predictor for the identification of phosphoserine (PhosS) sites in proteins that integrates Chou's Pseudo Amino Acid Composition (PseAAC) with deep features. We used well-known deep neural networks (DNNs) both to learn a feature representation of peptide sequences and to perform classification. Among the different DNNs, the Convolutional Neural Network (CNN) based model achieved the best score, making it the best-performing model for phosphoserine prediction. Based on these results, we conclude that the proposed model can identify PhosS sites efficiently and accurately, which can help scientists understand the mechanism of this modification in proteins.
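Several entries in this list build on Chou's PseAAC. The sketch below is a minimal type-I PseAAC in Python. It uses a single property (Kyte-Doolittle hydropathy) in place of the three standardized properties of Chou's original formulation. The lag `lam` and weight `w` are illustrative defaults, not values taken from iPhosS(Deep)-PseAAC.

```python
# Sketch of type-I pseudo amino acid composition (PseAAC).
# Assumption: one property (Kyte-Doolittle hydropathy) stands in for the
# three properties of Chou's original formulation.
import math
from typing import List

AA = "ACDEFGHIKLMNPQRSTVWY"
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

# Standardise the property over the 20 amino acids (mean 0, std 1).
_mean = sum(KD.values()) / 20
_std = math.sqrt(sum((v - _mean) ** 2 for v in KD.values()) / 20)
H = {a: (v - _mean) / _std for a, v in KD.items()}


def pseaac(seq: str, lam: int = 3, w: float = 0.05) -> List[float]:
    """Return a (20 + lam)-dimensional type-I PseAAC vector."""
    seq = "".join(c for c in seq.upper() if c in H)  # drop non-standard residues
    L = len(seq)
    if L <= lam:
        raise ValueError("sequence must be longer than lam")
    freqs = [seq.count(a) / L for a in AA]  # normalized composition
    # theta_j: mean squared property difference between residues j apart.
    thetas = [sum((H[seq[i + j]] - H[seq[i]]) ** 2 for i in range(L - j)) / (L - j)
              for j in range(1, lam + 1)]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


vec = pseaac("MSSPQAPEDGQGSPKLA", lam=3)
print(len(vec))  # 20 composition terms + 3 sequence-order terms
```

The weight `w` trades off composition against sequence-order information. Because of the shared denominator, the vector always sums to 1.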

3
Yu L, Qiu W, Lin W, Cheng X, Xiao X, Dai J. HGDTI: predicting drug-target interaction by using information aggregation based on heterogeneous graph neural network. BMC Bioinformatics 2022; 23:126. [PMID: 35413800 PMCID: PMC9004085 DOI: 10.1186/s12859-022-04655-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/01/2021] [Accepted: 03/28/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND In new drug discovery research, traditional wet-lab experiments take a long time. Predicting drug-target interaction (DTI) in silico can greatly narrow the search space of candidate medications. An effective algorithmic model may better reveal potential connections between drugs and targets in a bioinformatics network composed of drugs, proteins, and other related data. RESULTS In this work, we developed a heterogeneous graph neural network model, named HGDTI, which includes a learning phase for network node embedding and a training phase for DTI classification. The method first obtains the molecular fingerprint information of drugs and the pseudo amino acid composition information of proteins, then extracts the initial features of nodes through a Bi-LSTM, and uses an attention mechanism to aggregate heterogeneous neighbors. In several comparative experiments, the overall performance of HGDTI significantly exceeds that of other state-of-the-art DTI prediction models, and negative sampling is employed to further improve the predictive power of the model. In addition, we demonstrated the robustness of HGDTI through heterogeneous network content reduction tests, and its rationality through further comparative experiments. These results indicate that HGDTI can utilize heterogeneous information to capture the embeddings of drugs and targets and provide assistance for drug development. CONCLUSIONS HGDTI, based on a heterogeneous graph neural network model, can utilize heterogeneous information to capture the embeddings of drugs and targets and provide assistance for drug development. For the convenience of related researchers, a user-friendly web server has been established at http://bioinfo.jcu.edu.cn/hgdti.
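The abstract says HGDTI uses attention to aggregate heterogeneous neighbors but does not give the exact scoring function. The sketch below therefore uses generic dot-product attention, purely as an illustration. It is not HGDTI's formulation, and the toy two-dimensional embeddings are hypothetical.

```python
# Sketch: attention-weighted aggregation of a node's neighbour embeddings.
# Assumption: plain dot-product scoring; HGDTI's actual formulation may differ.
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(query: Vector, neighbours: List[Vector]) -> Vector:
    """Weight each neighbour by softmax(query . neighbour) and sum them."""
    scores = [sum(q * n for q, n in zip(query, nb)) for nb in neighbours]
    weights = softmax(scores)
    dim = len(query)
    return [sum(w * nb[k] for w, nb in zip(weights, neighbours)) for k in range(dim)]


# A node embedding attends more strongly to the neighbour it is most similar to.
print(aggregate([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

In a heterogeneous graph, this step could be run separately for each neighbor type, such as drugs, proteins, or side effects. The per-type results would then be combined, possibly with a second attention layer.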
Affiliation(s)
- Liyi Yu
- School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Wangren Qiu
- School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Weizhong Lin
- School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Xiang Cheng
- School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Xuan Xiao
- School of Information Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Jiexia Dai
- School of Foreign Languages, Jingdezhen University, Jingdezhen, China

4
Qiu WR, Guan MY, Wang QK, Lou LL, Xiao X. Identifying Pupylation Proteins and Sites by Incorporating Multiple Methods. Front Endocrinol (Lausanne) 2022; 13:849549. [PMID: 35557849 PMCID: PMC9088680 DOI: 10.3389/fendo.2022.849549] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/06/2022] [Accepted: 03/07/2022] [Indexed: 11/20/2022]
Abstract
Pupylation is an important post-translational modification in proteins and plays a key role in the cellular function of microorganisms. Accurate prediction of pupylation proteins and their specific sites is of great significance for the study of basic biological processes and the development of related drugs, since it would greatly reduce experimental costs and improve work efficiency. In this work, we first constructed a model for identifying pupylation proteins. To improve this model, a KNN scoring matrix model based on functional domain GO annotation and a Word Embedding model were used to extract features, and Random Under-sampling (RUS) and the Synthetic Minority Over-sampling Technique (SMOTE) were applied to balance the dataset. Finally, the balanced datasets were input into Extreme Gradient Boosting (XGBoost). In 10-fold cross-validation, the accuracy (ACC), Matthews correlation coefficient (MCC), and area under the ROC curve (AUC) were 95.23%, 0.8100, and 0.9864, respectively. For the pupylation site prediction model, six feature encodings (i.e., TPC, AAI, One-hot, PseAAC, CKSAAP, and Word Embedding) were used to extract protein sequence features, and the chi-square test was employed for feature selection. Rigorous 10-fold cross-validation indicated very high accuracy, outperforming existing counterparts. Finally, for the convenience of researchers, PUP-PS-Fuse has been established at https://bioinfo.jcu.edu.cn/PUP-PS-Fuse, with a backup at http://121.36.221.79/PUP-PS-Fuse/.
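SMOTE, used above to balance the dataset, creates each synthetic minority sample at a random point on the segment between a minority sample and one of its k nearest minority neighbors. The plain-Python sketch below illustrates the idea. The values of `k` and `seed` are illustrative assumptions, and random under-sampling (RUS) of the majority class is not shown.

```python
# Sketch of SMOTE (Synthetic Minority Over-sampling Technique).
# Assumptions: Euclidean distance, k = 3 and a fixed seed for illustration.
import math
import random
from typing import List

Vector = List[float]


def smote(minority: List[Vector], n_new: int, k: int = 3, seed: int = 0) -> List[Vector]:
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(x, m))
        nb = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


print(smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_new=3))
```

Because every synthetic point lies between two real minority samples, SMOTE never leaves the convex hull of the minority class.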
Affiliation(s)
- Xuan Xiao
- Correspondence: Wang-Ren Qiu; Xuan Xiao

5
Ge F, Hu J, Zhu YH, Arif M, Yu DJ. TargetMM: Accurate Missense Mutation Prediction by Utilizing Local and Global Sequence Information with Classifier Ensemble. Comb Chem High Throughput Screen 2022; 25:38-52. [PMID: 33280588 DOI: 10.2174/1386207323666201204140438] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/01/2020] [Revised: 10/22/2020] [Accepted: 10/26/2020] [Indexed: 11/22/2022]
Abstract
AIM AND OBJECTIVE Missense mutations (MMs) may lead to various human diseases by disabling proteins. Accurate prediction of MMs is important and challenging for both protein function annotation and drug design. Although several computational methods have achieved acceptable success rates, there is still room to further improve MM prediction performance. MATERIALS AND METHODS In the present study, we designed a new feature extraction method that considers the degree of impact of residues within the microenvironment of the mutation site. Stringent cross-validation and independent tests on benchmark datasets were performed to evaluate the efficacy of the proposed feature extraction method. Furthermore, three heterogeneous prediction models were trained and then ensembled for the final prediction. By combining this feature representation method with the classifier ensemble technique, we developed a novel MM predictor, called TargetMM, for distinguishing pathogenic mutations from neutral ones. RESULTS Statistical comparisons demonstrate that TargetMM outperforms prior advanced methods on the independent test data. The source code and benchmark datasets of TargetMM are freely available at https://github.com/sera616/TargetMM.git for academic use.
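The abstract does not state how TargetMM combines its three models. The sketch below illustrates one common way to ensemble heterogeneous classifiers: soft voting, which averages predicted probabilities with optional weights. This is offered only as an example of the technique and not as TargetMM's actual scheme. The three "models" are hypothetical stubs that return fixed probabilities.

```python
# Sketch: soft-voting ensemble of heterogeneous classifiers.
# Assumption: averaging probabilities; TargetMM's real combination rule is not
# given in the abstract. m1..m3 are hypothetical stand-ins for trained models.
from typing import Callable, List, Optional, Sequence

Model = Callable[[Sequence[float]], float]  # maps features to P(pathogenic)


def soft_vote(models: List[Model], x: Sequence[float],
              weights: Optional[List[float]] = None,
              threshold: float = 0.5) -> int:
    """Return 1 (pathogenic) if the weighted mean probability reaches
    `threshold`, else 0 (neutral)."""
    if weights is None:
        weights = [1.0] * len(models)
    prob = sum(w * m(x) for w, m in zip(weights, models)) / sum(weights)
    return int(prob >= threshold)


def m1(x: Sequence[float]) -> float:
    return 0.9


def m2(x: Sequence[float]) -> float:
    return 0.4


def m3(x: Sequence[float]) -> float:
    return 0.3


features = [0.1, 0.2]
print(soft_vote([m1, m2, m3], features))                     # mean 0.533 -> 1
print(soft_vote([m1, m2, m3], features, weights=[1, 2, 2]))  # mean 0.46  -> 0
```

With the weights shifted, the same three outputs flip the decision. This is why ensemble weights are usually tuned on validation data rather than fixed by hand.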
Affiliation(s)
- Fang Ge
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Jun Hu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yi-Heng Zhu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Muhammad Arif
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China

6
Alzahrani E, Alghamdi W, Ullah MZ, Khan YD. Identification of stress response proteins through fusion of machine learning models and statistical paradigms. Sci Rep 2021; 11:21767. [PMID: 34741132 PMCID: PMC8571424 DOI: 10.1038/s41598-021-99083-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/18/2021] [Accepted: 09/13/2021] [Indexed: 11/08/2022]
Abstract
Proteins are a vital component of cells that perform physiological functions to ensure the smooth operation of bodily functions. Identifying a protein's function requires a detailed understanding of its structure. Stress proteins are essential mediators of several responses to cellular stress and are categorized based on their structural characteristics. These proteins are conserved across many eukaryotic and prokaryotic lineages and carry out varied, crucial functional activities inside a cell. In vivo, ex vivo, and in vitro identification of stress proteins is time-consuming and costly. This study aims to identify stress protein sequences with the aid of mathematical modelling and machine learning methods, to supplement the aforementioned wet-lab methods. The Random Forest model showed remarkable results, with 91.1% accuracy, while models based on a neural network and a support vector machine achieved 87.7% and 47.0% accuracy, respectively. Based on these evaluation results, we concluded that the random-forest-based classifier surpassed all other predictors and is suitable for the practical identification of stress proteins. A live web server is available at http://biopred.org/stressprotiens , and the web server code is available at https://github.com/abdullah5naveed/SRP_WebServer.git.
Affiliation(s)
- Ebraheem Alzahrani
- Department of Mathematics, Faculty of Science, King Abdulaziz University, P. O. Box 80203, Jeddah, 21589, Saudi Arabia
- Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, P. O. Box 80221, Jeddah, 21589, Saudi Arabia
- Malik Zaka Ullah
- Department of Mathematics, Faculty of Science, King Abdulaziz University, P. O. Box 80203, Jeddah, 21589, Saudi Arabia
- Yaser Daanial Khan
- Department of Computer Science, University of Management and Technology, Lahore, 54770, Pakistan

7
Alghamdi W, Alzahrani E, Ullah MZ, Khan YD. 4mC-RF: Improving the prediction of 4mC sites using composition and position relative features and statistical moment. Anal Biochem 2021; 633:114385. [PMID: 34571005 DOI: 10.1016/j.ab.2021.114385] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/17/2020] [Revised: 09/09/2021] [Accepted: 09/13/2021] [Indexed: 01/28/2023]
Abstract
N4-methylcytosine (4mC) is an important epigenetic modification produced enzymatically by DNA methyltransferases. 4mC sites exist in prokaryotes and eukaryotes and play a vital role in regulating gene expression, DNA replication, and the cell cycle. Efficient and accurate prediction of 4mC sites is important for gaining insight into the biological properties and functions of 4mC. Therefore, a sequence-based predictor, 4mC-RF, is proposed for identifying 4mC sites by integrating statistical moments with position- and composition-dependent features. Relative and absolute position-based features are computed to extract optimal features. The popular machine learning classifier Random Forest was used to train the model. Validation through self-consistency, 10-fold cross-validation, independent set testing, and jackknife testing yielded accuracies of 95.1%, 95.2%, 97.0%, and 94.7%, respectively. The proposed model achieves the highest prediction accuracies compared with existing models. The developed 4mC-RF model was subsequently deployed as a web server. A more accurate predictor of 4mC sites helps experimental scientists obtain results faster, more efficiently, and more cost-effectively.
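Statistical moments recur throughout the Khan group entries in this list. In that approach the sequence is laid out as a 2-D matrix, and raw and central moments are computed over it. The sketch below shows raw moments M_pq and centroid-based central moments mu_pq under illustrative assumptions. The first assumption is a hypothetical nucleotide coding of A=1, C=2, G=3, T=4. The second is that the sequence fills a ceil(sqrt(L)) x ceil(sqrt(L)) matrix row by row, with unused cells set to 0. The third is that indices start at 1. Hahn moments, which these papers also use, are not shown.

```python
# Sketch: raw and central moments of a sequence laid out as a 2-D matrix.
# Assumptions: hypothetical coding A=1, C=2, G=3, T=4; row-major fill of a
# ceil(sqrt(L)) x ceil(sqrt(L)) matrix padded with 0; 1-based indices.
import math
from typing import Dict, List

CODE = {"A": 1, "C": 2, "G": 3, "T": 4}


def to_matrix(seq: str) -> List[List[int]]:
    n = math.ceil(math.sqrt(len(seq)))
    vals = [CODE[c] for c in seq.upper()] + [0] * (n * n - len(seq))
    return [vals[r * n:(r + 1) * n] for r in range(n)]


def moments(seq: str, max_order: int = 2) -> Dict[str, float]:
    cells = [(i, j, v) for i, row in enumerate(to_matrix(seq), start=1)
             for j, v in enumerate(row, start=1)]

    def raw(p: int, q: int) -> float:
        return sum(i ** p * j ** q * v for i, j, v in cells)

    m00 = raw(0, 0)
    xbar, ybar = raw(1, 0) / m00, raw(0, 1) / m00  # centroid of the matrix
    feats = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):  # all orders with p + q <= max_order
            feats[f"M{p}{q}"] = raw(p, q)
            feats[f"mu{p}{q}"] = sum((i - xbar) ** p * (j - ybar) ** q * v
                                     for i, j, v in cells)
    return feats


print(moments("ACGT"))
```

Raw moments depend on where the values sit in the matrix. Central moments are measured about the centroid, which is why mu10 and mu01 always vanish.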
Affiliation(s)
- Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, P. O. Box 80221, Jeddah 21589, Saudi Arabia.
- Ebraheem Alzahrani
- Department of Mathematics, Faculty of Science, King Abdulaziz University, P. O. Box 80203, Jeddah 21589, Saudi Arabia.
- Malik Zaka Ullah
- Department of Mathematics, Faculty of Science, King Abdulaziz University, P. O. Box 80203, Jeddah 21589, Saudi Arabia.
- Yaser Daanial Khan
- Department of Computer Science, University of Management and Technology, Lahore 54770, Pakistan.

8
Akmal MA, Hussain W, Rasool N, Khan YD, Khan SA, Chou KC. Using CHOU'S 5-Steps Rule to Predict O-Linked Serine Glycosylation Sites by Blending Position Relative Features and Statistical Moment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2045-2056. [PMID: 31985438 DOI: 10.1109/tcbb.2020.2968441] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 06/10/2023]
Abstract
Glycosylation of proteins in eukaryotic cells is an important and complicated post-translational modification because of its pivotal role in, and association with, crucial physiological functions of most proteins. Identifying glycosylation sites in a polypeptide chain is not easy because of multiple impediments, and analytical identification of these sites is expensive and laborious. There is a pressing need for a reliable computational method to determine such sites precisely, which can help researchers save time and effort. Herein, we propose a novel predictor, iGlycoS-PseAAC, that integrates Chou's Pseudo Amino Acid Composition (PseAAC) with relative/absolute position-based features. In the self-consistency test on the benchmark dataset, the model predicts O-linked glycosylation at serine sites with 98.8 percent accuracy. The overall accuracy achieved through 10-fold cross-validation, combining the positive and negative results, is 97.2 percent, and the overall accuracy achieved through the jackknife test, aggregating all prediction results, is 96.195 percent. Thus, the proposed predictor can predict O-linked glycosylated serine sites efficiently and accurately, and the overall results show that iGlycoS-PseAAC is more accurate than existing tools.
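One common form of the "absolute position-based features" mentioned above is the accumulative absolute position incidence vector (AAPIV). For each amino acid, AAPIV sums the 1-based positions at which that residue occurs. Its reverse counterpart (RAPIV) does the same on the reversed sequence. The sketch below is a minimal version that applies no normalization. The exact variant used in iGlycoS-PseAAC may differ.

```python
# Sketch: accumulative absolute position incidence vectors (AAPIV / RAPIV).
# Assumption: raw sums of 1-based positions, no normalisation; the variant in
# iGlycoS-PseAAC may differ.
from typing import List

AA = "ACDEFGHIKLMNPQRSTVWY"


def aapiv(seq: str) -> List[int]:
    """For each amino acid in AA order, sum the positions where it occurs."""
    totals = {a: 0 for a in AA}
    for pos, res in enumerate(seq.upper(), start=1):
        if res in totals:  # ignore non-standard residues
            totals[res] += pos
    return [totals[a] for a in AA]


def rapiv(seq: str) -> List[int]:
    """Same as aapiv, computed on the reversed sequence."""
    return aapiv(seq[::-1])


print(aapiv("SAS"))  # A at position 2, S at positions 1 and 3 (sum 4)
```

Composition alone cannot distinguish "SAG" from "GAS". AAPIV and RAPIV can, because they record where each residue sits relative to both ends of the peptide.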

9
Awais M, Hussain W, Khan YD, Rasool N, Khan SA, Chou KC. iPhosH-PseAAC: Identify Phosphohistidine Sites in Proteins by Blending Statistical Moments and Position Relative Features According to the Chou's 5-Step Rule and General Pseudo Amino Acid Composition. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:596-610. [PMID: 31144645 DOI: 10.1109/tcbb.2019.2919025] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Indexed: 06/09/2023]
Abstract
Protein phosphorylation is one of the key mechanisms in prokaryotes and eukaryotes and is responsible for various biological functions such as protein degradation, intracellular localization, a multitude of cellular processes, molecular association, cytoskeletal dynamics, and enzymatic inhibition/activation. Phosphohistidine (PhosH) plays a key role in a number of biological processes, ranging from central metabolism to signalling, in eukaryotes and bacteria. Identification of phosphohistidine sites in a protein sequence is therefore crucial, but experimental identification can be expensive, time-consuming, and laborious. To address this problem, we propose a novel computational model, iPhosH-PseAAC, for predicting phosphohistidine sites in a given protein sequence using pseudo amino acid composition (PseAAC), statistical moments, and position-relative features. The results of the proposed predictor are validated through self-consistency testing, 10-fold cross-validation, and jackknife testing. Self-consistency validation gave 100 percent accuracy, 10-fold cross-validation gave 94.26 percent, and jackknife testing gave 97.07 percent. Thus, the proposed iPhosH-PseAAC model shows a strong ability to predict PhosH sites in given proteins.
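The three validation schemes reported above differ only in how the data are split. Self-consistency trains and tests on the full set. k-fold cross-validation holds out each of k disjoint folds in turn. The jackknife test is leave-one-out, which is k-fold with k equal to the number of samples. The sketch below generates the index splits. Any training function you would plug in, such as the `train_and_score` mentioned in the comment, is hypothetical.

```python
# Sketch: index splits for k-fold cross-validation and the jackknife test.
# Usage idea (train_and_score is hypothetical):
#   accs = [train_and_score(tr, te) for tr, te in kfold_indices(len(data), 10)]
import random
from typing import Iterator, List, Tuple


def kfold_indices(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    for f in range(k):
        test = idx[f::k]  # every k-th shuffled index: fold sizes differ by at most 1
        held = set(test)
        train = [i for i in idx if i not in held]
        yield train, test


def jackknife_indices(n: int) -> Iterator[Tuple[List[int], List[int]]]:
    # Leave-one-out: each sample is held out exactly once, so no shuffling is needed.
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


print([len(test) for _, test in kfold_indices(10, 3)])  # [4, 3, 3]
```

Self-consistency scores are usually the highest of the three, as in the 100 percent figure above. This is because the model is evaluated on the very samples it was trained on.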

10
Khan YD, Alzahrani E, Alghamdi W, Ullah MZ. Sequence-based Identification of Allergen Proteins Developed by Integration of PseAAC and Statistical Moments via 5-Step Rule. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200424085947] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 01/24/2023]
Abstract
Background:
Allergens are antigens that can stimulate an atopic type I human hypersensitivity reaction through an immunoglobulin E (IgE) response. Some proteins are naturally more allergenic than others. The challenge for toxicologists is to identify the properties that allow proteins to cause allergic sensitization and allergic diseases. The identification of allergen proteins is therefore a critical task. Because experimental identification of protein functions is laborious and costly, computer scientists have proposed various methods in computational biology and bioinformatics based on data science approaches.
Objectives:
Herein, we report a novel predictor for the identification of allergen proteins.
Methods:
For feature extraction, statistical moments and various position-based features have been incorporated into Chou’s pseudo amino acid composition (PseAAC) and used to train a neural network.
Results:
The predictor was validated through 10-fold cross-validation and jackknife testing, which gave accuracies of 99.43% and 99.87%, respectively.
Conclusions:
Thus, the proposed predictor can help predict allergen proteins efficiently and accurately and can provide baseline data for the discovery of new drugs and biomarkers.
Affiliation(s)
- Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, C II Johar Town, Lahore 54770, Pakistan
- Ebraheem Alzahrani
- Department of Mathematics, Faculty of Science, King Abdulaziz University, P.O. Box 80203, Jeddah 21589, Saudi Arabia
- Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, P.O. Box 80221, Jeddah, Saudi Arabia
- Malik Zaka Ullah
- Department of Mathematics, Faculty of Science, King Abdulaziz University, P.O. Box 80203, Jeddah 21589, Saudi Arabia

11
Naseer S, Hussain W, Khan YD, Rasool N. Optimization of serine phosphorylation prediction in proteins by comparing human engineered features and deep representations. Anal Biochem 2020; 615:114069. [PMID: 33340540 DOI: 10.1016/j.ab.2020.114069] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 04/13/2020] [Revised: 11/15/2020] [Accepted: 12/14/2020] [Indexed: 02/01/2023]
Abstract
Deep representations can replace human-engineered representations, which are constrained by certain limitations. To predict protein post-translational modification (PTM) sites, the research community applies different feature extraction techniques to pseudo amino acid compositions (PseAAC). Serine phosphorylation is one of the most important PTMs, as it is the most frequently occurring and is important for various biological functions. Creating efficient representations from large protein sequences to predict PTM sites is a time- and resource-intensive task. In this study, we propose, implement, and evaluate the use of deep learning to learn effective protein data representations from PseAAC for data-driven PTM detection systems, and we compare them with two human-engineered representations. The comparisons are performed by training an XGBoost-based classifier on each representation. The best scores were achieved by the RNN-LSTM-based and CNN-based deep representations, with accuracies of 81.1% and 78.3%, respectively, while the human-engineered representations scored 77.3% and 74.9%. Based on these results, we conclude that deep features are a promising replacement for feature engineering in identifying PhosS sites efficiently and accurately, which can help scientists understand the mechanism of this modification in proteins.
Affiliation(s)
- Sheraz Naseer
- Department of Computer Science, University of Management and Technology, Lahore, Pakistan.
- Waqar Hussain
- National Center of Artificial Intelligence, Punjab University College of Information Technology, University of the Punjab, Lahore, Pakistan; Center for Professional & Applied Studies, Lahore, Pakistan
- Yaser Daanial Khan
- Department of Computer Science, University of Management and Technology, Lahore, Pakistan
- Nouman Rasool
- Center for Professional & Applied Studies, Lahore, Pakistan

12
Liu GH, Zhang BW, Qian G, Wang B, Mao B, Bichindaritz I. Bioimage-Based Prediction of Protein Subcellular Location in Human Tissue with Ensemble Features and Deep Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1966-1980. [PMID: 31107658 DOI: 10.1109/tcbb.2019.2917429] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 06/09/2023]
Abstract
Prediction of protein subcellular location has become an active research topic because it has proven useful for understanding disease mechanisms and for novel drug design. With the rapid development of automated microscopic imaging technology in recent years, bioimage-based classification methods for protein subcellular location have attracted considerable attention, because images can describe protein distribution intuitively and in detail. In the current study, a prediction method for protein subcellular location is proposed based on multi-view image features extracted from three different views: four texture features of the original image; global and local features of the protein, extracted from the protein channel images after color segmentation; and global features of DNA, extracted from the DNA channel image. The extracted features were then combined to improve subcellular localization performance, and comparing different feature combinations under the same classifier yielded the best ensemble features. In this work, a classifier based on stacked auto-encoders and random forest is also put forward, combining the deep network with traditional statistical classification methods to improve prediction results. Stringent cross-validation and independent validation tests on the benchmark dataset demonstrated the efficacy of the proposed method.

13
Amanat S, Ashraf A, Hussain W, Rasool N, Khan YD. Identification of Lysine Carboxylation Sites in Proteins by Integrating Statistical Moments and Position Relative Features via General PseAAC. Curr Bioinform 2020. [DOI: 10.2174/1574893614666190723114923] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Indexed: 01/13/2023]
Abstract
Background:
Carboxylation is one of the most biologically important post-translational modifications and occurs on lysine, arginine, and glutamine residues of a protein. Among these three, covalent attachment of a carboxyl group to the lysine side chain is the most frequent and biologically important type of carboxylation. To study such biological functions, it is essential to correctly determine the lysine sites susceptible to carboxylation.
Objective:
Herein, we present a machine-learning-based computational model for predicting carboxylysine sites.
Methods:
Various position- and composition-relative features have been incorporated into PseAAC to construct feature vectors, and a neural network is employed as the classifier. The model is validated by jackknife, cross-validation, self-consistency, and independent testing.
Results:
The self-consistency test showed that the model has 99.76% Acc, 99.76% Sn, 99.76% Sp, and 0.99 MCC. Jackknife validation gave 97.07% Acc, while 10-fold cross-validation gave 95.16% Acc.
Conclusion:
Independent dataset testing gave 94.3% accuracy, which illustrates that the proposed model performs better than the existing model PreLysCar; however, the accuracy may be improved further in the future as the number of known carboxylysine sites in proteins increases.
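The entries in this list mostly report the same four confusion-matrix metrics: accuracy (Acc), sensitivity (Sn), specificity (Sp), and the Matthews correlation coefficient (MCC). The small helper below computes them. The zero-denominator guards return 0.0, which is a convention chosen for this sketch rather than one prescribed by the cited papers.

```python
# Sketch: Acc, Sn, Sp and MCC from a binary confusion matrix.
import math
from typing import Dict


def metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn) if tp + fn else 0.0  # sensitivity: recall on positives
    sp = tn / (tn + fp) if tn + fp else 0.0  # specificity: recall on negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


print(metrics(tp=50, tn=40, fp=10, fn=0))
```

MCC stays informative when the classes are imbalanced, whereas Acc alone can look high simply because the majority class dominates. This is one reason MCC is usually reported alongside Acc.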
Affiliation(s)
- Saba Amanat
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Adeel Ashraf
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Waqar Hussain
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Nouman Rasool
- Department of Life Sciences, School of Science, University of Management and Technology, Lahore, Pakistan
- Yaser D. Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan

14

15
Zheng L, Huang S, Mu N, Zhang H, Zhang J, Chang Y, Yang L, Zuo Y. RAACBook: a web server of reduced amino acid alphabet for sequence-dependent inference by using Chou's five-step rule. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2019:5650975. [PMID: 31802128 PMCID: PMC6893003 DOI: 10.1093/database/baz131] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 06/20/2019] [Revised: 10/16/2019] [Accepted: 10/17/2019] [Indexed: 12/12/2022]
Abstract
By reducing the amino acid alphabet, protein complexity can be significantly simplified, which could improve computational efficiency, decrease information redundancy, and reduce the chance of overfitting. Although several reduced alphabets have been proposed, different classification rules can produce distinct results in protein sequence analysis. Thus, a systematic framework for reduced alphabets is urgently needed. In this work, we constructed a comprehensive web server called RAACBook for protein sequence analysis and machine learning applications by integrating reduced alphabets. The web server contains three parts: (i) 74 types of reduced amino acid alphabet were manually extracted to generate 673 reduced amino acid clusters (RAACs) for dealing with specific protein problems, and users can easily select the desired RAACs from a multilayer browser tool. (ii) An online tool was developed to analyze the primary sequences of proteins. The tool produces K-tuple reduced amino acid compositions defined by three correlation parameters (K-tuple, g-gap, λ-correlation). The results are visualized as sequence alignments, merged RAA compositions, feature distributions, and logos of reduced sequences. (iii) A machine learning server is provided to train protein classification models based on K-tuple RAACs. The optimal model can be selected according to evaluation indexes (ROC, AUC, MCC, etc.). In conclusion, RAACBook provides a powerful and user-friendly service for protein sequence analysis and computational proteomics. RAACBook is freely available at http://bioinfor.imu.edu.cn/raacbook. Database URL: http://bioinfor.imu.edu.cn/raacbook
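The K-tuple reduced amino acid composition described above can be sketched as follows: map each residue to its cluster, then count k-tuples over the reduced string. This is a minimal sketch under stated assumptions. The two-cluster hydrophobic/polar reduction is hypothetical and is not one of RAACBook's 673 RAACs, the gap convention (consecutive tuple members `g` residues apart) is our reading of "g-gap", and λ-correlation is not covered.

```python
from collections import Counter
from itertools import product

# Hypothetical two-cluster reduction: hydrophobic "h" vs. polar/other "p".
# NOT one of RAACBook's curated RAACs; it only illustrates the idea.
HYDROPHOBIC = set("AVLIMFWC")
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


def reduce_sequence(seq: str) -> str:
    """Map each of the 20 standard residues to its reduced-alphabet cluster."""
    seq = seq.upper()
    unknown = set(seq) - STANDARD
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return "".join("h" if aa in HYDROPHOBIC else "p" for aa in seq)


def ktuple_gap_composition(seq: str, k: int = 2, g: int = 0, alphabet: str = "hp") -> dict:
    """Normalized frequencies of all k-tuples over the reduced alphabet.

    Consecutive tuple members are g residues apart (g = 0 gives ordinary
    contiguous k-mers); this gap convention is an assumption of the sketch.
    """
    reduced = reduce_sequence(seq)
    step = g + 1
    span = (k - 1) * step  # distance from the first to the last tuple member
    counts = Counter(reduced[i:i + span + 1:step] for i in range(len(reduced) - span))
    total = sum(counts.values())
    return {
        "".join(t): (counts["".join(t)] / total if total else 0.0)
        for t in product(alphabet, repeat=k)
    }


if __name__ == "__main__":
    print(ktuple_gap_composition("AVGS", k=2, g=0))
```

With the full 20-letter alphabet a 2-tuple composition has 400 dimensions; with this two-cluster reduction it has 4, which illustrates the dimensionality saving the abstract refers to.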
Affiliation(s)
- Lei Zheng
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
- Shenghui Huang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
- Nengjiang Mu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
- Haoyue Zhang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
- Jiayu Zhang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
- Yu Chang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
- Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Baojian Road No.157, Harbin 150081, China
- Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China

16
Ahmad W, Arafat E, Taherzadeh G, Sharma A, Dipta SR, Dehzangi A, Shatabda S. Mal-Light: Enhancing Lysine Malonylation Sites Prediction Problem Using Evolutionary-based Features. IEEE Access 2020; 8:77888-77902. [PMID: 33354488 PMCID: PMC7751949 DOI: 10.1109/access.2020.2989713]
Abstract
Post-translational modification (PTM) is an important biological process with a tremendous impact on the function of proteins in both eukaryotic and prokaryotic cells. During the past decades, a wide range of PTMs has been identified. Among them, malonylation is a recently identified PTM that plays a vital role in a wide range of biological interactions. In addition, this modification potentially plays a role in energy metabolism in different species, including Homo sapiens. Identifying PTM sites using experimental methods is time-consuming and costly. Hence, there is a demand for fast and cost-effective computational methods. In this study, we propose a new machine learning method, called Mal-Light, to address this problem. To build this model, we extract local evolutionary-based information according to the interaction of neighboring amino acids using a bi-peptide based method. We then use the Light Gradient Boosting Machine (LightGBM) as our classifier to predict malonylation sites. Our results demonstrate that Mal-Light significantly improves malonylation site prediction performance compared to previous studies in the literature. Using Mal-Light, we achieve a Matthews correlation coefficient (MCC) of 0.74 and 0.60, an accuracy of 86.66% and 79.51%, a sensitivity of 78.26% and 67.27%, and a specificity of 95.05% and 91.75% for Homo sapiens and Mus musculus proteins, respectively. Mal-Light is implemented as an online predictor, which is publicly available at: (http://brl.uiu.ac.bd/MalLight/).
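The "bi-peptide" evolutionary features mentioned above capture how scores at neighboring positions of a position-specific scoring matrix (PSSM) interact. A common formulation of this idea is the bigram PSSM, which sums products of scores at adjacent positions. The sketch below assumes that formulation and a PSSM window already extracted around the candidate lysine; Mal-Light's exact window size, normalization and downstream LightGBM settings may differ, and `bigram_pssm` is a hypothetical name.

```python
from typing import List

Matrix = List[List[float]]


def bigram_pssm(pssm: Matrix) -> List[float]:
    """Flattened bigram features B[m][n] = sum_i P[i][m] * P[i+1][n].

    `pssm` holds one row per residue in the window (typically 20 columns,
    one per amino acid), so 20 columns yield 400 features.
    """
    if len(pssm) < 2:
        raise ValueError("need at least two positions")
    n_cols = len(pssm[0])
    if any(len(row) != n_cols for row in pssm):
        raise ValueError("all rows must have the same number of columns")
    b = [[0.0] * n_cols for _ in range(n_cols)]
    for row, next_row in zip(pssm, pssm[1:]):  # adjacent positions i and i+1
        for m in range(n_cols):
            for n in range(n_cols):
                b[m][n] += row[m] * next_row[n]
    return [value for b_row in b for value in b_row]


if __name__ == "__main__":
    toy = [[1, 2], [3, 4], [5, 6]]  # toy 2-column "PSSM" for illustration only
    print(bigram_pssm(toy))  # -> [18.0, 22.0, 26.0, 32.0]
```

The resulting fixed-length vector would then be passed to a classifier, which in Mal-Light is LightGBM according to the abstract.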
Affiliation(s)
- Wakil Ahmad
- Department of Computer Science and Engineering, United International University, United City, Madani Avenue, Dhaka 1212, Bangladesh
- Easin Arafat
- Department of Computer Science and Engineering, United International University, United City, Madani Avenue, Dhaka 1212, Bangladesh
- Ghazaleh Taherzadeh
- Institute for Bioscience and Biotechnology Research, University of Maryland, College Park, MD, 20742, USA
- Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD-4111, Australia
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Kanagawa, Japan
- School of Engineering and Physics, Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji
- CREST, JST, Tokyo, 102-8666, Japan
- Shubhashis Roy Dipta
- Department of Computer Science and Engineering, United International University, United City, Madani Avenue, Dhaka 1212, Bangladesh
- Abdollah Dehzangi
- Department of Computer Science, Morgan State University, Baltimore, MD, 21251, USA
- Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, United City, Madani Avenue, Dhaka 1212, Bangladesh

17
Cui BL, Ding Y. Accurate Identification of Human Phosphorylated Proteins by Ensembling Supervised Kernel Self-organizing Maps. Mol Inform 2020; 39:e1900141. [PMID: 31994832 DOI: 10.1002/minf.201900141] [Received: 12/20/2019] [Accepted: 12/20/2019]
Abstract
Protein phosphorylation is a vital physiological process that plays a critical role in controlling survival, differentiation, cell growth, metabolism, and apoptosis. Accurately identifying whether a protein will be phosphorylated solely from its sequence is especially useful for both basic research and drug development. In this study, a new predictor specifically designed for the prediction of human phosphorylated proteins is proposed. The proposed method first trains two supervised kernel self-organizing maps (SKSOMs): one is trained with features from the physicochemical composition view of the protein, while the other is trained with features from the evolutionary information view. The two trained SKSOMs are then ensembled to perform the final prediction. Rigorous computational experiments show that the proposed method achieves an ACC of 78.75% and an MCC of 0.561, which are 6.96% and 12.5% higher, respectively, than those of the state-of-the-art predictor. Overall, the study demonstrates a new, sensitive avenue for identifying human phosphorylated proteins that could readily be extended to recognize phosphorylated proteins in other species.
Affiliation(s)
- Bei-Liang Cui
- Network Information Center, Nanjing TECH University, Nanjing, 211816, P. R. China
- Yong Ding
- Information Center, Nanjing Polytechnic Institute, Nanjing, 210084, P. R. China

18
Zheng H, Yang H, Gong D, Mai L, Qiu X, Chen L, Su X, Wei R, Zeng Z. Progress in the Mechanism and Clinical Application of Cilostazol. Curr Top Med Chem 2020; 19:2919-2936. [PMID: 31763974 DOI: 10.2174/1568026619666191122123855] [Received: 05/27/2019] [Revised: 07/27/2019] [Accepted: 08/02/2019]
Abstract
Cilostazol is a unique platelet inhibitor that has been used clinically for more than 20 years. As a phosphodiesterase type III inhibitor, cilostazol reversibly inhibits platelet aggregation, induces vasodilation, and has antiproliferative effects; it is widely used in the treatment of peripheral arterial disease and cerebrovascular disease and in percutaneous coronary intervention, among other applications. This article briefly reviews the pharmacological mechanisms and clinical applications of cilostazol.
Affiliation(s)
- Huilei Zheng
- Department of Medical Examination & Health Management, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China; Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China; Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China
- Hua Yang
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China; Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China; Department of Critical Care Medicine, Second People's Hospital of Nanning, Nanning, Guangxi, China
- Danping Gong
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China; Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China; Elderly Cardiology Ward, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Lanxian Mai
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China; Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China; Disciplinary Construction Office, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Xiaoling Qiu
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China; Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China
- Lidai Chen
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China; Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China
- Xiaozhou Su
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China; Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China
- Ruoqi Wei
- Department of Computer Science and Engineering, University of Bridgeport, 126 Park Ave, Bridgeport, CT 06604, United States
- Zhiyu Zeng
- Guangxi Key Laboratory of Precision Medicine in Cardio-cerebrovascular Diseases Control and Prevention, Nanning, Guangxi, China; Guangxi Clinical Research Center for Cardio-cerebrovascular Diseases, Nanning, Guangxi, China; Elderly Cardiology Ward, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China

19
Shao Y, Chou KC. pLoc_Deep-mEuk: Predict Subcellular Localization of Eukaryotic Proteins by Deep Learning. Natural Science 2020. [DOI: 10.4236/ns.2020.126034]

20
Qiu WR, Xu A, Xu ZC, Zhang CH, Xiao X. Identifying Acetylation Protein by Fusing Its PseAAC and Functional Domain Annotation. Front Bioeng Biotechnol 2019; 7:311. [PMID: 31867311 PMCID: PMC6908504 DOI: 10.3389/fbioe.2019.00311] [Received: 09/12/2019] [Accepted: 10/22/2019]
Abstract
Acetylation is a post-translational modification (PTM) that introduces an acetyl group into an organic compound. Correctly identifying acetylated proteins helps in understanding the mechanism of acetylation in biological systems. Although many acetylation sites have been identified by high-throughput experimental studies via mass spectrometry, many acetylation sites remain to be discovered. Computational methods have proven powerful for identifying acetylation sites, usually reducing experimental cost and improving effectiveness and efficiency. An approach that can distinguish acetylated proteins from non-acetylated ones would therefore be a meaningful and effective method for this problem. Here, we propose a novel computational method for identifying acetylated proteins by extracting features from sequence conservation information via a grey system model, together with KNN scores based on functional domain annotation and subcellular localization information. We performed 5-fold cross-validation on three datasets, together with extensive feature analysis and the Relief feature selection algorithm. The obtained accuracies are all satisfactory: as the mean performance, the accuracy is 77.10%, the Matthews correlation coefficient is 0.5457, and the AUC value is 0.8389. This work may provide useful insights for related experimental validation and further studies of other PTM processes. For the convenience of related researchers, a web server named “iACetyP” was established and is accessible at http://www.jci-bioinfo.cn/iAcetyP.
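The abstract above extracts features "via a grey system model" without naming the model. A widely used grey model is GM(1,1), whose two fitted coefficients, the development coefficient `a` and the grey input `b`, can summarize a numeric series (for example, one column of conservation scores along a protein) as two features. The sketch assumes GM(1,1) fitted by least squares; which series iACetyP feeds into the model, and which grey-model variant it actually uses, are not stated here. `gm11_coefficients` is a hypothetical name.

```python
from itertools import accumulate
from typing import Sequence, Tuple


def gm11_coefficients(x0: Sequence[float]) -> Tuple[float, float]:
    """Fit GM(1,1) and return (a, b).

    x1 is the accumulated (cumulative-sum) series of x0, z1 is the mean of
    adjacent x1 values, and (a, b) solve x0(k) + a * z1(k) = b by least squares.
    """
    if len(x0) < 3:
        raise ValueError("GM(1,1) needs at least 3 observations")
    x1 = list(accumulate(x0))
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = list(x0[1:])
    n = len(z1)
    s_z, s_zz = sum(z1), sum(z * z for z in z1)
    s_y, s_zy = sum(y), sum(z * v for z, v in zip(z1, y))
    # Normal equations for y = a * (-z) + b, solved in closed form (2x2 system).
    det = n * s_zz - s_z * s_z
    if det == 0:
        raise ValueError("degenerate series; coefficients are undefined")
    a = (s_z * s_y - n * s_zy) / det
    b = (s_zz * s_y - s_z * s_zy) / det
    return a, b


if __name__ == "__main__":
    # A constant series has no growth: a is 0 and b equals the constant.
    print(gm11_coefficients([1.0, 1.0, 1.0, 1.0]))  # -> (0.0, 1.0)
```

A negative `a` indicates a growing series and a positive `a` a decaying one, so the pair compresses the trend of an arbitrarily long series into a fixed-length feature.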
Affiliation(s)
- Wang-Ren Qiu
- School of Information and Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China; School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Ao Xu
- School of Information and Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Zhao-Chun Xu
- School of Information and Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Chun-Hua Zhang
- School of Information and Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China
- Xuan Xiao
- School of Information and Engineering, Jingdezhen Ceramic Institute, Jingdezhen, China

21
Qiu W, Xu C, Xiao X, Xu D. Computational Prediction of Ubiquitination Proteins Using Evolutionary Profiles and Functional Domain Annotation. Curr Genomics 2019; 20:389-399. [PMID: 32476995 PMCID: PMC7235393 DOI: 10.2174/1389202919666191014091250] [Received: 04/26/2019] [Revised: 07/14/2019] [Accepted: 08/29/2019]
Abstract
Background: Ubiquitination, as a post-translational modification, is a crucial biological process in cell signaling, apoptosis, and localization. Identification of ubiquitinated proteins is of fundamental importance for understanding the molecular mechanisms in biological systems and diseases. Although high-throughput experimental studies using mass spectrometry have identified many ubiquitinated proteins and ubiquitination sites, the vast majority of ubiquitinated proteins remain undiscovered, even in well-studied model organisms. Objective: To reduce experimental costs, computational methods have been introduced to predict ubiquitination sites, but their accuracy is unsatisfactory. Predicting whether a protein can be ubiquitinated would help in predicting ubiquitination sites. However, all existing computational methods predict only ubiquitination sites. Methods: In this study, the first computational method for predicting ubiquitinated proteins without relying on ubiquitination site prediction was developed. The method extracts features from sequence conservation information through a grey system model, as well as from functional domain annotation and subcellular localization. Results: Together with feature analysis and application of the Relief feature selection algorithm, 5-fold cross-validation on three datasets achieved a high accuracy of 90.13%, with a Matthews correlation coefficient of 80.34%. The predictions on an independent test dataset achieved an accuracy of 87.71% and a Matthews correlation coefficient of 75.43%, better than the predictions from the best available ubiquitination site prediction tool. Conclusion: Our study may guide experimental design and provide useful insights for studying the mechanisms and modulation of ubiquitination pathways. The code is available at: https://github.com/Chunhuixu/UBIPredic_QWRCHX
Affiliation(s)
- Wangren Qiu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen 333046, China
- Chunhui Xu
- Informatics Institute, University of Missouri, Columbia, MO 65201, USA
- Xuan Xiao
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen 333046, China
- Dong Xu
- Informatics Institute, University of Missouri, Columbia, MO 65201, USA; Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO 65201, USA

22
Malebary SJ, Rehman MSU, Khan YD. iCrotoK-PseAAC: Identify lysine crotonylation sites by blending position relative statistical features according to the Chou's 5-step rule. PLoS One 2019; 14:e0223993. [PMID: 31751380 PMCID: PMC6874067 DOI: 10.1371/journal.pone.0223993] [Received: 07/08/2019] [Accepted: 10/02/2019]
Abstract
Among the different post-translational modifications (PTMs), one of the most important is lysine crotonylation in proteins. Its importance in various diseases and essential biological processes should not be underestimated. A key step toward finding the hidden mechanisms of crotonylation and its occurrence sites is to fully understand the mechanism behind this biological process. In previously reported studies, researchers have used different techniques, such as position weight matrices (PWM), support vector machines (SVM), k-nearest neighbors (KNN), and many others. However, the highest prediction accuracy achieved was not very high. To address this, we propose an improved predictor for lysine crotonylation sites named iCrotoK-PseAAC, in which we have incorporated various position and composition relative features along with statistical moments into PseAAC. Self-consistency testing gave 100% accuracy, while 10-fold cross-validation gave 99.0% accuracy. Based on validation and comparison of the models, it is concluded that iCrotoK-PseAAC is more accurate than previously proposed models.
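The statistical moments incorporated into PseAAC above are computed on a two-dimensional arrangement of the sequence. The sketch below shows raw and central moments under several assumptions: residues are encoded as integers 1 to 20 in alphabetical one-letter order (a hypothetical encoding), the sequence is laid row-major into the smallest square matrix with zero padding, and indices are 0-based. The cited predictors also use Hahn moments, which this sketch omits; the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical integer encoding (1..20, alphabetical one-letter order);
# the cited predictors may encode residues differently.
CODE = {aa: i + 1 for i, aa in enumerate(ALPHABET)}

Moments = Dict[Tuple[int, int], float]


def sequence_matrix(seq: str) -> List[List[int]]:
    """Lay the encoded sequence row-major into the smallest square matrix, zero-padded."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    side = math.isqrt(len(s) - 1) + 1  # ceil(sqrt(len(s))) in exact integer arithmetic
    values = [CODE[aa] for aa in s] + [0] * (side * side - len(s))  # KeyError on non-standard residues
    return [values[r * side:(r + 1) * side] for r in range(side)]


def raw_and_central_moments(seq: str, order: int = 3) -> Tuple[Moments, Moments]:
    """Raw moments M[p, q] and central moments mu[p, q] for all p + q <= order.

    Row index i and column index j are 0-based in this sketch.
    """
    mat = sequence_matrix(seq)
    cells = [(i, j, v) for i, row in enumerate(mat) for j, v in enumerate(row)]
    pairs = [(p, q) for p in range(order + 1) for q in range(order + 1 - p)]
    raw = {(p, q): float(sum(i ** p * j ** q * v for i, j, v in cells)) for p, q in pairs}
    x_bar = raw[(1, 0)] / raw[(0, 0)]  # centroid along rows
    y_bar = raw[(0, 1)] / raw[(0, 0)]  # centroid along columns
    central = {
        (p, q): sum((i - x_bar) ** p * (j - y_bar) ** q * v for i, j, v in cells)
        for p, q in pairs
    }
    return raw, central


if __name__ == "__main__":
    raw, central = raw_and_central_moments("MKALV", order=2)
    print(raw[(0, 0)], central[(1, 1)])
```

Central moments are invariant to where the mass of the matrix sits, while raw moments retain positional information; concatenating both gives a fixed-length vector regardless of sequence length.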
Affiliation(s)
- Sharaf Jameel Malebary
- Department of Information Technology, King Abdul Aziz University, Rabigh, Kingdom of Saudi Arabia
- Muhammad Safi ur Rehman
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan

23
Lan J, Liu Z, Liao C, Merkler DJ, Han Q, Li J. A Study for Therapeutic Treatment against Parkinson's Disease via Chou's 5-steps Rule. Curr Top Med Chem 2019; 19:2318-2333. [PMID: 31629395 DOI: 10.2174/1568026619666191019111528] [Received: 05/22/2019] [Revised: 08/05/2019] [Accepted: 08/22/2019]
Abstract
The enzyme L-DOPA decarboxylase (DDC), also called aromatic-L-amino-acid decarboxylase, catalyzes the biosynthesis of dopamine, serotonin, and trace amines. Its deficiency or perturbations in its expression result in severe motor dysfunction or a range of neurodegenerative and psychiatric disorders. A DDC substrate, L-DOPA, combined with an inhibitor of the enzyme, is still the most effective treatment for the symptoms of Parkinson's disease. In this review, we provide an update on the structures, functions, and inhibitors of DDC, particularly with regard to the treatment of Parkinson's disease. This information will provide insight into the pharmacological treatment of Parkinson's disease.
Affiliation(s)
- Jianqiang Lan
- Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Life and Pharmaceutical Sciences, Hainan University, Haikou, Hainan 570228, China
- Zhongqiang Liu
- Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Life and Pharmaceutical Sciences, Hainan University, Haikou, Hainan 570228, China
- Chenghong Liao
- Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Life and Pharmaceutical Sciences, Hainan University, Haikou, Hainan 570228, China
- David J Merkler
- Department of Chemistry, University of South Florida, Tampa, FL, 33620, United States
- Qian Han
- Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Life and Pharmaceutical Sciences, Hainan University, Haikou, Hainan 570228, China
- Jianyong Li
- Department of Biochemistry, Virginia Tech, Blacksburg, VA 24061, United States

24
Chou KC. Proposing Pseudo Amino Acid Components is an Important Milestone for Proteome and Genome Analyses. Int J Pept Res Ther 2019. [DOI: 10.1007/s10989-019-09910-7]

25

26
Du X, Diao Y, Liu H, Li S. MsDBP: Exploring DNA-Binding Proteins by Integrating Multiscale Sequence Information via Chou’s Five-Step Rule. J Proteome Res 2019; 18:3119-3132. [DOI: 10.1021/acs.jproteome.9b00226]
Affiliation(s)
- Xiuquan Du
- The School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
- Yanyu Diao
- The School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
- Heng Liu
- Department of Gastroenterology, The First Affiliated Hospital of Anhui Medical University, Hefei, Anhui, China
- Shuo Li
- Department of Medical Imaging, Western University, London, ON N6A 3K7, Canada

27
Niu B, Liang C, Lu Y, Zhao M, Chen Q, Zhang Y, Zheng L, Chou KC. Glioma stages prediction based on machine learning algorithm combined with protein-protein interaction networks. Genomics 2019; 112:837-847. [PMID: 31150762 DOI: 10.1016/j.ygeno.2019.05.024] [Received: 04/03/2019] [Accepted: 05/25/2019]
Abstract
BACKGROUND Glioma is the most lethal cancer of the nervous system. Recent studies have made great efforts to study the occurrence and development of glioma, but the molecular mechanisms remain unclear. This study was designed to reveal the molecular mechanisms of glioma based on protein-protein interaction networks combined with machine learning methods. Key differentially expressed genes (DEGs) were screened and selected using protein-protein interaction (PPI) networks. RESULTS As a result, 19 genes were selected between grade I and grade II, 21 genes between grade II and grade III, and 20 genes between grade III and grade IV. Five machine learning methods were then employed to predict glioma stages based on the selected key genes. After comparison, a Complement Naive Bayes classifier was employed to build the prediction model for grade II-III, with an accuracy of 72.8%, and Random Forest was employed to build the prediction models for grade I-II and grade III-IV, with accuracies of 97.1% and 83.2%, respectively. Finally, the selected genes were analyzed by PPI networks, Gene Ontology (GO) terms, and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and the results improve our understanding of the biological functions of the selected DEGs involved in glioma growth. We expect that the key genes identified have guiding significance for understanding the occurrence of gliomas or, at the very least, that they are useful for tumor researchers. CONCLUSION Machine learning combined with PPI networks and GO and KEGG analyses of selected DEGs improves our understanding of the biological functions involved in glioma growth.
Affiliation(s)
- Bing Niu
- School of Life Sciences, Shanghai University, Shanghai 200444, China; Gordon Life Science Institute, Boston, MA 02478, USA.
- Chaofeng Liang
- Department of Neurosurgery, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Yi Lu
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- Manman Zhao
- School of Life Sciences, Shanghai University, Shanghai 200444, China
- Qin Chen
- School of Life Sciences, Shanghai University, Shanghai 200444, China.
- Yuhui Zhang
- Renji Hospital, Medical School, Shanghai Jiaotong University, 160 Pujian Rd, New Pudong District, Shanghai 200127, China; Changhai Hospital, Second Military Medical University, Shanghai 200433, China.
- Linfeng Zheng
- Department of Radiology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200080, China; Department of Radiology, Shanghai First People's Hospital, Baoshan Branch, Shanghai 200940, China.
- Kuo-Chen Chou
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; Gordon Life Science Institute, Boston, MA 02478, USA.

28
Messerli MA, Sarkar A. Advances in Electrochemistry for Monitoring Cellular Chemical Flux. Curr Med Chem 2019; 26:4984-5002. [PMID: 31057100 DOI: 10.2174/0929867326666190506111629] [Received: 09/28/2018] [Revised: 03/06/2019] [Accepted: 03/12/2019]
Abstract
The transport of organic and inorganic molecules, along with inorganic ions, across the plasma membrane results in chemical fluxes that reflect cellular function in healthy and diseased states. Measuring these chemical fluxes enables the characterization of protein function and transporter stoichiometry, the characterization of single cells and of embryo viability prior to implantation, and the screening of pharmaceutical agents. Electrochemical sensors have emerged as sensitive, non-invasive tools for measuring chemical fluxes immediately outside cells, in the boundary layer, and are capable of monitoring a diverse range of transported analytes, including inorganic ions, gases, neurotransmitters, hormones, and pharmaceutical agents. Used on their own or in combination with other methods, these sensors continue to expand our understanding of the function of rare cells and small tissues. Advances in sensor construction and detection strategies continue to improve sensitivity under physiological conditions, diversify analyte detection, and increase throughput. These advances are discussed in the context of addressing technical challenges in measuring chemical flux in the boundary layer of cells and measuring the resulting changes in chemical concentration in the bulk media.
Affiliation(s)
- Mark A Messerli
- Department of Biology and Microbiology, South Dakota State University, Brookings, SD. United States
- Anyesha Sarkar
- Department of Biology and Microbiology, South Dakota State University, Brookings, SD. United States

29
Barukab O, Khan YD, Khan SA, Chou KC. iSulfoTyr-PseAAC: Identify Tyrosine Sulfation Sites by Incorporating Statistical Moments via Chou's 5-steps Rule and Pseudo Components. Curr Genomics 2019; 20:306-320. [PMID: 32030089 PMCID: PMC6983959 DOI: 10.2174/1389202920666190819091609] [Received: 05/15/2019] [Revised: 08/04/2019] [Accepted: 08/06/2019]
Abstract
BACKGROUND Amino acid residues in proteins undergo post-translational modification (PTM) during protein synthesis, a process of chemical and physical change in an amino acid that in turn alters the behavioral properties of proteins. Tyrosine sulfation is a ubiquitous post-translational modification that is known to be associated with the regulation of various biological functions and pathological processes. Thus, its identification is necessary to understand its mechanism. Experimental determination through site-directed mutagenesis and high-throughput mass spectrometry is a costly and time-consuming process; thus, a reliable computational model is required for the identification of sulfotyrosine sites. METHODOLOGY In this paper, we present a computational model for the prediction of sulfotyrosine sites, named iSulfoTyr-PseAAC, in which feature vectors are constructed using statistical moments of protein amino acid sequences and various position/composition relative features. These features are incorporated into PseAAC. The model is validated by jackknife, cross-validation, self-consistency, and independent testing. RESULTS Accuracy determined through validation was 93.93% for the jackknife test, 95.16% for cross-validation, 94.3% for self-consistency, and 94.3% for independent testing. CONCLUSION The proposed model performs better than existing predictors; however, the accuracy can be improved further in the future as the number of known sulfotyrosine sites in proteins increases.
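The validation schemes named in this abstract differ only in how samples are assigned to training and testing. The sketch below expresses them as index splits; it assumes shuffled, near-equal folds for k-fold cross-validation, and all function names are hypothetical. Independent testing uses a separately collected dataset that is never touched during training, so it needs no split function.

```python
import random
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def kfold_splits(n: int, k: int, seed: int = 0) -> Iterator[Split]:
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # near-equal fold sizes
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield sorted(train), sorted(test)


def jackknife_splits(n: int) -> Iterator[Split]:
    """Jackknife (leave-one-out): every sample is the test set exactly once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def self_consistency_split(n: int) -> Split:
    """Self-consistency: train and test on the same samples."""
    return list(range(n)), list(range(n))


if __name__ == "__main__":
    for train, test in kfold_splits(10, 5):
        print(len(train), test)
```

Because self-consistency tests a model on the data it was trained on, its scores are typically optimistic, which is why jackknife, cross-validation and independent-test results are the more informative figures in abstracts such as this one.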
Affiliation(s)
- Sher Afzal Khan
- Address correspondence to this author at the Department of Information Technology, Faculty of Computing and Information Technology in Rabigh, King Abdulaziz University, P.O. Box 344, Rabigh, 21911, Saudi Arabia; and Department of Computer Sciences, Abdul Wali Khan University, Mardan, Pakistan; E-mail:
30
Ilyas S, Hussain W, Ashraf A, Khan YD, Khan SA, Chou KC. iMethylK_pseAAC: Improving Accuracy of Lysine Methylation Sites Identification by Incorporating Statistical Moments and Position Relative Features into General PseAAC via Chou's 5-steps Rule. Curr Genomics 2019; 20:275-292. [PMID: 32030087 PMCID: PMC6983956 DOI: 10.2174/1389202920666190809095206] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 05/08/2019] [Revised: 07/02/2019] [Accepted: 07/26/2019] [Indexed: 02/04/2023]
Abstract
BACKGROUND Methylation is one of the most important post-translational modifications in the human body; it usually occurs on lysine, one of the most intensely modified residues. It plays a dynamic role in numerous biological processes, such as the regulation of gene expression, the regulation of protein function and RNA processing. Identifying lysine methylation sites is therefore an important challenge, as some experimental procedures are time-consuming. OBJECTIVE Herein, we propose a computational predictor named iMethylK_pseAAC to identify lysine methylation sites. METHODS First, we constructed feature vectors based on PseAAC using position and composition relative features and statistical moments. A neural network is trained on the extracted features. The performance of the proposed method is then validated using cross-validation and jackknife testing. RESULTS The objective evaluation of the predictor showed an accuracy of 96.7% for self-consistency, 91.61% for 10-fold cross-validation and 93.42% for jackknife testing. CONCLUSION It is concluded that iMethylK_pseAAC outperforms counterpart lysine methylation site predictors such as iMethyl_pseACC, BPB_pPMS and PMeS.
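This abstract, like the one before it, reports self-consistency, k-fold cross-validation and jackknife results side by side. A minimal Python sketch of how the three protocols partition a dataset, assuming contiguous, unshuffled folds so the output is deterministic: self-consistency trains and tests on the same samples, k-fold holds out each of k disjoint folds in turn, and the jackknife is the special case k = n. The function names are hypothetical.

```python
def kfold_indices(n_samples, k):
    """Yield (train, test) index lists for k-fold cross-validation.

    Folds are contiguous and unshuffled so the output is deterministic.
    With k == n_samples this is the jackknife (leave-one-out) test.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def self_consistency_indices(n_samples):
    """Self-consistency: the model is tested on exactly the samples it was trained on."""
    all_idx = list(range(n_samples))
    return all_idx, all_idx
```

In practice, samples are usually shuffled (and often stratified by class) before splitting. Because self-consistency holds nothing out, its score is an optimistic upper bound rather than an estimate of performance on unseen proteins.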
Affiliation(s)
- Yaser Daanial Khan
- Address correspondence to this author at the Department of Computer Science, School of Systems and Technology, University of Management and Technology, P.O. Box 10033, C-II, Johar Town, Lahore, Pakistan; Tel: +923054440271; E-mail:
31
Pan Q, Guo Y, Guo L, Liao S, Zhao C, Wang S, Liu HF. Mechanistic Insights of Chemicals and Drugs as Risk Factors for Systemic Lupus Erythematosus. Curr Med Chem 2019; 27:5175-5188. [PMID: 30947650 DOI: 10.2174/0929867326666190404140658] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/08/2018] [Revised: 03/25/2019] [Accepted: 03/27/2019] [Indexed: 12/21/2022]
Abstract
Systemic Lupus Erythematosus (SLE) is a chronic and relapsing heterogeneous autoimmune disease that primarily affects women of reproductive age. Genetic and environmental risk factors are involved in the pathogenesis of SLE, and susceptibility genes have recently been identified. However, as gene therapy is far from clinical application, further investigation of environmental risk factors could reveal important therapeutic approaches. We systematically explored two groups of environmental risk factors: chemicals (including silica, solvents, pesticides, hydrocarbons, heavy metals, and particulate matter) and drugs (including procainamide, hydralazine, quinidine, D-penicillamine, isoniazid, and methyldopa). Furthermore, the mechanisms underlying these risk factors, such as genetic factors, epigenetic changes, and disrupted immune tolerance, were explored. This review identifies novel risk factors and their underlying mechanisms. Practicable measures for the management of these risk factors will benefit SLE patients and provide potential therapeutic strategies.
Affiliation(s)
- Qingjun Pan
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
- Yun Guo
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
- Linjie Guo
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
- Shuzhen Liao
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
- Chunfei Zhao
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
- Sijie Wang
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
- Hua-Feng Liu
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
32
Han Q, Yang C, Lu J, Zhang Y, Li J. Metabolism of Oxalate in Humans: A Potential Role Kynurenine Aminotransferase/Glutamine Transaminase/Cysteine Conjugate Beta-lyase Plays in Hyperoxaluria. Curr Med Chem 2019; 26:4944-4963. [PMID: 30907303 DOI: 10.2174/0929867326666190325095223] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 09/28/2018] [Revised: 02/17/2019] [Accepted: 02/22/2019] [Indexed: 11/22/2022]
Abstract
Hyperoxaluria, excessive urinary oxalate excretion, is a significant health problem worldwide. Disrupted oxalate metabolism has been implicated in hyperoxaluria; accordingly, an enzymatic disturbance in oxalate biosynthesis can result in primary hyperoxaluria. Alanine glyoxylate aminotransferase-1 and glyoxylate reductase, enzymes involved in the metabolism of glyoxylate (the precursor of oxalate), have been related to primary hyperoxalurias. Some studies suggest that other enzymes, such as glycolate oxidase and alanine glyoxylate aminotransferase-2, might be associated with primary hyperoxaluria as well, but the evidence for a definitive link between the clinical cases and gene mutations is not strong. Some idiopathic hyperoxalurias remain whose etiologies require further study. Some aminotransferases, particularly kynurenine aminotransferases, can convert glyoxylate to glycine. Based on the biochemical and structural characteristics, expression levels and subcellular localization of some aminotransferases, a number of them appear able to catalyze the transamination of glyoxylate to glycine more efficiently than alanine glyoxylate aminotransferase-1. The aim of this minireview is to explore other underlying causes of primary hyperoxaluria and to stimulate research toward a comprehensive understanding of the mechanisms leading to the disease. Herein, we reviewed all aminotransferases in the liver for their functions in glyoxylate metabolism. In particular, kynurenine aminotransferase-I and III are discussed in detail with regard to their biochemical and structural characteristics, cellular localization, and enzyme inhibition. Kynurenine aminotransferase-III is, so far, the most efficient putative mitochondrial enzyme to transaminate glyoxylate to glycine in mammalian livers; it might be an interesting enzyme to examine in the etiology of primary hyperoxaluria and should be carefully investigated for its involvement in oxalate metabolism.
Affiliation(s)
- Qian Han
- Key Laboratory of Tropical Biological Resources of Ministry of Education, Hainan University, Haikou, Hainan 570228, China
- Cihan Yang
- Key Laboratory of Tropical Biological Resources of Ministry of Education, Hainan University, Haikou, Hainan 570228, China
- Jun Lu
- Central South University Xiangya School of Medicine Affiliated Haikou People's Hospital, Haikou, Hainan 570208, China
- Yinai Zhang
- Central South University Xiangya School of Medicine Affiliated Haikou People's Hospital, Haikou, Hainan 570208, China
- Jianyong Li
- Department of Biochemistry, Virginia Tech, Blacksburg, VA 24061, United States
33
SPalmitoylC-PseAAC: A sequence-based model developed via Chou's 5-steps rule and general PseAAC for identifying S-palmitoylation sites in proteins. Anal Biochem 2019; 568:14-23. [DOI: 10.1016/j.ab.2018.12.019] [Citation(s) in RCA: 93] [Impact Index Per Article: 15.5] [Received: 11/29/2018] [Revised: 12/19/2018] [Accepted: 12/22/2018] [Indexed: 02/06/2023]
34
Ahmad A, Shatabda S. EPAI-NC: Enhanced prediction of adenosine to inosine RNA editing sites using nucleotide compositions. Anal Biochem 2019; 569:16-21. [DOI: 10.1016/j.ab.2019.01.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 11/02/2018] [Revised: 01/03/2019] [Accepted: 01/11/2019] [Indexed: 01/24/2023]
35
Kabir M, Ahmad S, Iqbal M, Hayat M. iNR-2L: A two-level sequence-based predictor developed via Chou's 5-steps rule and general PseAAC for identifying nuclear receptors and their families. Genomics 2019; 112:276-285. [PMID: 30779939 DOI: 10.1016/j.ygeno.2019.02.006] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 12/02/2018] [Revised: 01/09/2019] [Accepted: 02/07/2019] [Indexed: 12/25/2022]
Abstract
Nuclear receptor proteins (NRPs) play a vital role in regulating gene expression. With the rapid growth of known NRPs in the post-genomic era, it is highly desirable to identify NRPs and their sub-families accurately from their primary sequences. Several conventional methods have been used to discriminate NRPs and their sub-families, but they did not achieve considerable results. Accordingly, a new two-level computational model, "iNR-2L", is developed. Two discrete methods, namely Dipeptide Composition and Tripeptide Composition, were used to formulate NRP sequences. Further, both descriptor spaces were merged to construct a hybrid space. Furthermore, the minimum redundancy maximum relevance (mRMR) feature selection technique was employed to select salient features and to reduce noise and redundancy. The experimental outcomes showed that the proposed model iNR-2L achieved outstanding results. It is anticipated that the proposed computational model might be a practical and effective tool for academia and the research community.
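iNR-2L encodes each sequence with dipeptide and tripeptide composition and merges the two descriptor spaces into a hybrid space. A minimal Python sketch of that encoding follows; overlapping k-mers, normalization by the number of k-mer windows and the handling of non-standard residues are our assumptions rather than details from the abstract, and the subsequent mRMR feature selection step is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(seq, k):
    """Frequencies of all 20**k k-mers over overlapping windows of seq.

    Counts are divided by the number of windows (len(seq) - k + 1); windows
    containing non-standard residues count toward the denominator but are
    not assigned to any k-mer.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    for i in range(windows):  # empty range when seq is shorter than k
        sub = seq[i:i + k]
        if sub in counts:
            counts[sub] += 1
    return [counts[m] / windows if windows > 0 else 0.0 for m in kmers]


def hybrid_features(seq):
    """Dipeptide (400) + tripeptide (8000) composition: an 8400-dimensional hybrid vector."""
    return kmer_composition(seq, 2) + kmer_composition(seq, 3)
```

The tripeptide block alone contributes 8000 mostly zero dimensions for a typical protein, which is why a feature selection step such as mRMR is applied to the merged space.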
Affiliation(s)
- Muhammad Kabir
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China.
- Saeed Ahmad
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
- Muhammad Iqbal
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
- Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.
36
Jiang QX. Structural Variability in the RLR-MAVS Pathway and Sensitive Detection of Viral RNAs. Med Chem 2019; 15:443-458. [PMID: 30569868 PMCID: PMC6858087 DOI: 10.2174/1573406415666181219101613] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/05/2018] [Revised: 10/23/2018] [Accepted: 12/12/2018] [Indexed: 12/25/2022]
Abstract
Cells need high-sensitivity detection of non-self molecules in order to fight pathogens. These cellular sensors are thus of significant medicinal importance, especially for treating novel emerging pathogens. RIG-I-like receptors (RLRs) are intracellular sensors for viral RNAs (vRNAs). Their active forms activate the mitochondrial antiviral signaling protein (MAVS) and trigger downstream immune responses against viral infection. Functional and structural studies of the RLR-MAVS signaling pathway in the past few years have uncovered significant supramolecular variability, revealing different aspects of the functional signaling pathway. Here I discuss the molecular events of the RLR-MAVS pathway from the angle of detecting a single copy or a very low copy number of vRNAs in the presence of non-specific competition from cytosolic RNAs, and review key structural variability in the RLR/vRNA complexes, the MAVS helical polymers, and the adapter-mediated interactions between the active RLR/vRNA complex and the inactive MAVS in triggering the initiation of MAVS filaments. These structural variations may not be mutually exclusive, but may instead reflect the adaptation of the signaling pathway to different conditions or to different levels of sensitivity in its response to exogenous vRNAs.
Affiliation(s)
- Qiu-Xing Jiang
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, United States
37
Bao W, Yang B, Li Z, Zhou Y. LAIPT: Lysine Acetylation Site Identification with Polynomial Tree. Int J Mol Sci 2018; 20:ijms20010113. [PMID: 30597947 PMCID: PMC6337602 DOI: 10.3390/ijms20010113] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 10/16/2018] [Revised: 11/30/2018] [Accepted: 12/05/2018] [Indexed: 11/16/2022]
Abstract
Post-translational modification plays a key role in biology. Experimental identification methods are time-consuming and expensive, and computational methods can overcome these shortcomings and limitations. In this article, we propose a lysine acetylation site identification method with polynomial tree (LAIPT), which uses a polynomial style to represent the relationships among amino-acid residues in peptide segments. This polynomial style was enriched by the physical and chemical properties of amino-acid residues. These reconstructed features were then input into the employed classification model, the flexible neural tree. Finally, several evaluation measures were employed to test the model's performance.
Affiliation(s)
- Wenzheng Bao
- School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou 221018, China.
- Bin Yang
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China.
- Zhengwei Li
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China.
- Yong Zhou
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China.
38
Cao M, Chen G, Yu J, Shi S. Computational prediction and analysis of species-specific fungi phosphorylation via feature optimization strategy. Brief Bioinform 2018; 21:595-608. [DOI: 10.1093/bib/bby122] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 08/02/2018] [Revised: 11/16/2018] [Accepted: 11/22/2018] [Indexed: 11/12/2022]
Abstract
Protein phosphorylation is a reversible and ubiquitous post-translational modification that primarily occurs at serine, threonine and tyrosine residues and regulates a variety of biological processes. In this paper, we first briefly summarize the current progress in the computational prediction of eukaryotic protein phosphorylation sites, which has mainly focused on animals and plants, especially humans, and to a lesser extent on fungi. Because the number of identified fungal phosphorylation sites has greatly increased across a wide variety of organisms while their roles in pathological physiology remain largely unknown, more attention has been paid to the identification of fungi-specific phosphorylation. Here, experimental fungal phosphorylation site data were collected, and most of the sites were classified into different types, encoded with various features and trained via a two-step feature optimization method. A novel method for the prediction of species-specific fungi phosphorylation, PreSSFP, was developed; it can identify fungal phosphorylation in seven species for specific serine, threonine and tyrosine residues (http://computbiol.ncu.edu.cn/PreSSFP). We also critically evaluated the performance of PreSSFP and compared it with other existing tools. The satisfactory results showed that PreSSFP is a robust predictor. Feature analyses revealed some significant differences among the seven species. Species-specific prediction using the two-step feature optimization method to mine important features for training could considerably improve prediction performance. We anticipate that our study provides a new lead for future computational analysis of fungal phosphorylation.
Affiliation(s)
- Man Cao
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
- Guodong Chen
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
- Jialin Yu
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
- Shaoping Shi
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
39
Chen W, Liang X, Nong Z, Li Y, Pan X, Chen C, Huang L. The Multiple Applications and Possible Mechanisms of the Hyperbaric Oxygenation Therapy. Med Chem 2018; 15:459-471. [PMID: 30569869 DOI: 10.2174/1573406415666181219101328] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 09/21/2018] [Revised: 10/23/2018] [Accepted: 12/12/2018] [Indexed: 12/18/2022]
Abstract
Hyperbaric Oxygenation Therapy (HBOT) is used as an adjunctive method for multiple diseases. The method is routine and non-invasive, and it delivers 100% pure oxygen (O2) at above-normal atmospheric pressure in a specialized chamber. Oxygen deficiency is well known to induce a series of adverse events, so the ability of HBOT to deliver pressurized O2 to prevent anoxia-induced injury appears significant. In recent years, HBOT has shown a degree of therapeutic efficacy, and it is thought to be beneficial in angiogenesis, tissue ischemia and hypoxia, nervous system disease, diabetic complications, malignancies, carbon monoxide (CO) poisoning and chronic radiation-induced injury. Both single and combination HBOT have been applied in previous studies, and this manuscript reviews the current applications and possible mechanisms of HBOT. The applicability and validity of HBOT for clinical treatment remain controversial, even though it is regarded as an adjunct to conventional medical treatment with many other clinical benefits. Receiving pressurized O2 also has negative side effects, such as oxidative stress injury, DNA damage, cellular metabolic effects, activation of coagulation, endothelial dysfunction, acute neurotoxicity and pulmonary toxicity. It is therefore imperative to weigh the advantages and disadvantages of HBOT comprehensively in order to obtain a satisfactory therapeutic outcome.
Affiliation(s)
- Wan Chen
- Department of Emergency, the People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, Guangxi 530021, China
- Xingmei Liang
- Department of Pharmacy, Guangxi Medical College, Nanning, Guangxi 530021, China
- Zhihuan Nong
- Department of Pharmacology, Guangxi Institute of Chinese Medicine and Pharmaceutical Science, Nanning 530022, China
- Yaoxuan Li
- Department of Neurology, the People's Hospital of Guangxi Zhuang Autonomous Region, Nanning 530022, China
- Xiaorong Pan
- Department of Hyperbaric oxygen, the People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, Guangxi 530021, China
- Chunxia Chen
- Department of Hyperbaric oxygen, the People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, Guangxi 530021, China
- Luying Huang
- Department of Respiratory Medicine, the People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, Guangxi 530021, China
40
Jin W, Li QZ, Zuo YC, Cao YN, Zhang LQ, Hou R, Su WX. Relationship Between DNA Methylation in Key Region and the Differential Expressions of Genes in Human Breast Tumor Tissue. DNA Cell Biol 2018; 38:49-62. [PMID: 30346835 DOI: 10.1089/dna.2018.4276] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 01/27/2023]
Abstract
Breast cancer has a high mortality rate among women. Aberrant DNA methylation plays a crucial role in the occurrence and progression of breast carcinoma. By comparing DNA methylation in tumor and normal breast tissue, we calculate and analyze the distributions of hyper- and hypomethylation sites in different functional regions. The results indicate that enhancer regions are often hypomethylated in breast cancer. CpG islands (CGIs) are mainly hypermethylated, while the regions flanking CGIs (shores and shelves) are more easily hypomethylated. Hypomethylation in the gene body region is related to the upregulation of gene expression, and hypomethylation of enhancer regions is closely associated with the upregulation of gene expression in breast cancer. Key hypomethylation sites in enhancer regions and key hypermethylation sites in CGIs are found to regulate key genes, respectively the oncogenes ESR1 and ERBB2 and the tumor suppressor genes FBLN2, CEBPA, and FAT4. This suggests that recognizing the methylation status of these genes will be useful for the diagnosis of breast cancer.
Affiliation(s)
- Wen Jin
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
- Qian-Zhong Li
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China; The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Inner Mongolia University, Hohhot, China
- Yong-Chun Zuo
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Inner Mongolia University, Hohhot, China
- Yan-Ni Cao
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
- Lu-Qiang Zhang
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
- Rui Hou
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
- Wen-Xia Su
- College of Science, Inner Mongolia Agricultural University, Hohhot, China
41
Butt AH, Rasool N, Khan YD. Predicting membrane proteins and their types by extracting various sequence features into Chou's general PseAAC. Mol Biol Rep 2018; 45:2295-2306. [PMID: 30238411 DOI: 10.1007/s11033-018-4391-5] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 08/29/2018] [Accepted: 09/14/2018] [Indexed: 11/30/2022]
Abstract
Membrane proteins (MPs) are considered crucial for many biological functions, and for this reason they are regarded as attractive targets for many pharmaceutical agents. It is therefore important to predict MPs accurately using effective and efficient computational models (CMs). Annotation of MPs using in vitro analytical techniques is time-consuming and expensive, and in some cases it can prove intractable. In this scenario, automated prediction and annotation of MPs through CM-based techniques appear useful. Based on computational intelligence and a feature set built from statistical moments, an MP prediction framework is proposed. Furthermore, the previously used dataset has been enhanced by incorporating new MPs from the latest release of UniProtKB. Rigorous experimentation shows that using statistical moments with a multilayer neural network trained by back-propagation yields more thorough results.
Affiliation(s)
- Ahmad Hassan Butt
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, C-II, Johar Town, P.O. Box 10033, Lahore, 54770, Pakistan.
- Nouman Rasool
- Department of Life Sciences, School of Science, University of Management and Technology, C-II, Johar Town, P.O. Box 10033, Lahore, 54770, Pakistan
- Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, C-II, Johar Town, P.O. Box 10033, Lahore, 54770, Pakistan
42
Cai L, Huang T, Su J, Zhang X, Chen W, Zhang F, He L, Chou KC. Implications of Newly Identified Brain eQTL Genes and Their Interactors in Schizophrenia. MOLECULAR THERAPY. NUCLEIC ACIDS 2018; 12:433-442. [PMID: 30195780 PMCID: PMC6041437 DOI: 10.1016/j.omtn.2018.05.026] [Citation(s) in RCA: 60] [Impact Index Per Article: 8.6] [Received: 03/11/2018] [Revised: 05/19/2018] [Accepted: 05/30/2018] [Indexed: 12/21/2022]
Abstract
Schizophrenia (SCZ) is a devastating genetic mental disorder. Identification of SCZ risk genes in the brain is helpful for understanding this disease. Thus, we first used the minimum Redundancy-Maximum Relevance (mRMR) approach to integrate the genome-wide sequence analysis results on SCZ and the expression quantitative trait locus (eQTL) data from ten brain tissues to identify the genes related to SCZ. Second, we adopted the variance inflation factor regression algorithm to identify their interacting genes in the brain. Third, using multiple analysis methods, we explored and validated their roles. By means of the aforementioned procedures, we have found that (1) the cerebellum may play a crucial role in the pathogenesis of SCZ and (2) ITIH4 may be utilized as a clinical biomarker for the diagnosis of SCZ. These interesting findings may stimulate novel strategies for developing new drugs against SCZ. It has not escaped our notice that the approach reported here is of use for studying many other genomic diseases as well.
Affiliation(s)
- Lei Cai
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Genetics and Development, Shanghai Mental Health Center, Shanghai Jiaotong University, Shanghai 200240, China; Gordon Life Science Institute, Boston, MA 02478, USA; Shanghai Center for Women and Children's Health, Shanghai 200062, China.
- Tao Huang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Genetics and Development, Shanghai Mental Health Center, Shanghai Jiaotong University, Shanghai 200240, China; Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Jingjing Su
- Department of Neurology, Shanghai Ninth People's Hospital, Shanghai Jiaotong University School of Medicine, Shanghai 200011, China
- Xinxin Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Genetics and Development, Shanghai Mental Health Center, Shanghai Jiaotong University, Shanghai 200240, China
- Wenzhong Chen
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Genetics and Development, Shanghai Mental Health Center, Shanghai Jiaotong University, Shanghai 200240, China
- Fuquan Zhang
- Department of Psychiatry, Wuxi Mental Health Center, Nanjing Medical University, Wuxi 214015, China
- Lin He
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Genetics and Development, Shanghai Mental Health Center, Shanghai Jiaotong University, Shanghai 200240, China; Shanghai Center for Women and Children's Health, Shanghai 200062, China.
- Kuo-Chen Chou
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Collaborative Innovation Center for Genetics and Development, Shanghai Mental Health Center, Shanghai Jiaotong University, Shanghai 200240, China; Gordon Life Science Institute, Boston, MA 02478, USA; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Faculty of Computing and Information Technology in Rabigh, King Abdulaziz University, Jeddah 21589, Saudi Arabia
43
Ning Q, Zhao X, Bao L, Ma Z, Zhao X. Detecting Succinylation sites from protein sequences using ensemble support vector machine. BMC Bioinformatics 2018; 19:237. [PMID: 29940836 PMCID: PMC6016146 DOI: 10.1186/s12859-018-2249-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 09/26/2017] [Accepted: 06/14/2018] [Indexed: 12/14/2022]
Abstract
Background Lysine succinylation is a new kind of post-translational modification that plays a key role in protein conformation regulation and cellular function control. To understand the mechanism of succinylation in depth, it is necessary to identify succinylation sites in proteins accurately. However, traditional experimental approaches are labor-intensive and time-consuming. Computational prediction methods have been proposed in recent years and are popular because of their convenience and high speed. In this study, we developed a new method to predict succinylation sites in proteins that combines multiple features, including amino acid composition, binary encoding, physicochemical properties and grey pseudo amino acid composition, with a feature selection scheme (information gain). The model was then trained using an SVM (support vector machine) and an ensemble learning algorithm. Results The method achieved an accuracy of 89.14% and an MCC (Matthews correlation coefficient) of 0.79 using 10-fold cross-validation on the training dataset, and an accuracy of 84.5% and an MCC of 0.2 on the independent dataset. Conclusions The conclusions of this study can help in understanding the succinylation mechanism. These results suggest that our method is very promising for predicting succinylation sites. The source code and data of this paper are freely available at https://github.com/ningq669/PSuccE. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2249-4) contains supplementary material, which is available to authorized users.
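This study reports both accuracy and the Matthews correlation coefficient (MCC). On its independent set, accuracy stays at 84.5% while MCC falls to 0.2; a gap of this kind commonly arises when negative sites far outnumber positive ones, because accuracy rewards correct negatives while MCC balances all four confusion-matrix cells. A minimal Python sketch of both metrics from confusion-matrix counts; the counts in the usage line are invented for illustration and are not taken from the paper.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, Matthews correlation coefficient) for a binary confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # A common convention: report MCC as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return accuracy, mcc


# Hypothetical imbalanced test set: 10 positive sites, 90 negative sites.
acc, mcc = accuracy_and_mcc(tp=5, tn=80, fp=10, fn=5)
print(f"accuracy={acc:.2f}, MCC={mcc:.2f}")  # prints: accuracy=0.85, MCC=0.33
```

Here the classifier finds only half of the positive sites, yet accuracy looks high because the 90 negatives dominate; MCC exposes the weaker performance on the minority class.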
Affiliation(s)
- Qiao Ning
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
- Xiaosa Zhao
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
- Lingling Bao
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
- Zhiqiang Ma
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
- Xiaowei Zhao
- Key Laboratory of Intelligent Information Processing of Jilin Universities, Northeast Normal University, Changchun, 130117, China
44
Shi S, Wang L, Cao M, Chen G, Yu J. Proteomic analysis and prediction of amino acid variations that influence protein posttranslational modifications. Brief Bioinform 2018; 20:1597-1606. [DOI: 10.1093/bib/bby036] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/15/2018] [Revised: 03/07/2018] [Indexed: 12/18/2022]
Abstract
Accumulating studies have indicated that amino acid variations, by changing the type of residue at the target sites or at key flanking positions, can directly or indirectly influence protein posttranslational modifications (PTMs) and have a detrimental effect on protein function. Computational mutation analysis can greatly reduce the experimental work required. To increase the utilization of current computational resources, we first provide an overview of the computational prediction of amino acid variations that influence protein PTMs and of their functional analysis. We also discuss the challenges that will be faced in developing novel in silico approaches. The development of better methods for analyzing mutations that affect protein PTMs will help to facilitate the development of personalized precision medicine.
Affiliation(s)
- Shaoping Shi
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
- Lina Wang
- Department of Science, Nanchang Institute of Technology, Nanchang, Jiangxi 330031, China
- Man Cao
- Department of Mathematics, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
- Guodong Chen
- Department of Mathematics, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
- Jialin Yu
- Department of Mathematics, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
45
Liu B, Weng F, Huang DS, Chou KC. iRO-3wPseKNC: identify DNA replication origins by three-window-based PseKNC. Bioinformatics 2018; 34:3086-3093. [DOI: 10.1093/bioinformatics/bty312] [Citation(s) in RCA: 96] [Impact Index Per Article: 13.7] [Received: 03/14/2018] [Accepted: 04/18/2018] [Indexed: 12/16/2022]
Affiliation(s)
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Gordon Life Science Institute, Belmont, MA, USA
- Fan Weng
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- De-Shuang Huang
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China
- Kuo-Chen Chou
- Gordon Life Science Institute, Belmont, MA, USA
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
46
iRNAm5C-PseDNC: identifying RNA 5-methylcytosine sites by incorporating physical-chemical properties into pseudo dinucleotide composition. Oncotarget 2018; 8:41178-41188. [PMID: 28476023 PMCID: PMC5522291 DOI: 10.18632/oncotarget.17104] [Citation(s) in RCA: 153] [Impact Index Per Article: 21.9] [Received: 01/18/2017] [Accepted: 03/15/2017] [Indexed: 01/24/2023]
Abstract
Occurring at cytosine (C) of RNA, 5-methylcytosine (m5C) is an important post-transcriptional modification (PTCM). The modification plays significant roles in biological processes by regulating RNA metabolism in both eukaryotes and prokaryotes. However, it may also cause cancers and other major diseases. Given an uncharacterized RNA sequence that contains many C residues, can we identify which of them can carry an m5C modification and which cannot? This is undoubtedly a crucial problem, particularly with the explosive growth of RNA sequences in the postgenomic age. Unfortunately, no user-friendly web-server has so far been developed to address this problem. To meet the increasingly high demand from experimental scientists working in drug development, we have developed a new predictor called iRNAm5C-PseDNC by incorporating ten types of physical-chemical properties into pseudo dinucleotide composition via the auto/cross-covariance approach. Rigorous jackknife tests show that its anticipated accuracy is quite high. For the convenience of experimental scientists, a user-friendly web-server for the predictor has been provided at http://www.jci-bioinfo.cn/iRNAm5C-PseDNC along with a step-by-step user guide, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. It has not escaped our notice that the approach presented here can also be used to deal with many other problems in genome analysis.
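The auto/cross-covariance approach mentioned above can be sketched for dinucleotide properties as follows. The formulas follow the commonly used dinucleotide-based auto-covariance (DAC) and cross-covariance (DCC) definitions; the function names are hypothetical, the property tables passed as `prop`/`props` are placeholders the caller must supply, and the paper's ten property scales and lag settings are not reproduced here.

```python
from __future__ import annotations

from statistics import fmean


def dinucleotide_values(seq: str, prop: dict[str, float]) -> list[float]:
    """Property value of each overlapping dinucleotide R_i R_{i+1} in seq."""
    return [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]


def dcc(seq: str, prop1: dict[str, float], prop2: dict[str, float], lag: int) -> float:
    """Dinucleotide cross-covariance between two properties at a given lag.

    With prop1 == prop2 this reduces to the auto-covariance (DAC).
    """
    v1 = dinucleotide_values(seq, prop1)
    v2 = dinucleotide_values(seq, prop2)
    n = len(v1) - lag  # number of dinucleotide pairs that are `lag` apart
    if lag < 1 or n < 1:
        raise ValueError("lag must be >= 1 and smaller than the number of dinucleotides")
    m1, m2 = fmean(v1), fmean(v2)
    return sum((v1[i] - m1) * (v2[i + lag] - m2) for i in range(n)) / n


def dac(seq: str, prop: dict[str, float], lag: int) -> float:
    """Dinucleotide auto-covariance of a single property at a given lag."""
    return dcc(seq, prop, prop, lag)


def ac_cc_features(seq: str, props: list[dict[str, float]], max_lag: int) -> list[float]:
    """DAC for every property plus DCC for every ordered pair of distinct
    properties, concatenated for lags 1..max_lag."""
    feats: list[float] = []
    for lag in range(1, max_lag + 1):
        feats += [dac(seq, p, lag) for p in props]
        feats += [dcc(seq, p, q, lag)
                  for a, p in enumerate(props)
                  for b, q in enumerate(props) if a != b]
    return feats
```

For P properties and a maximum lag of L, the vector has L · P² entries (P auto-covariances plus P(P − 1) cross-covariances per lag), which is why the number of properties strongly drives the feature dimension.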
47
Tahir M, Jan B, Hayat M, Shah SU, Amin M. Efficient computational model for classification of protein localization images using Extended Threshold Adjacency Statistics and Support Vector Machines. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2018; 157:205-215. [PMID: 29477429 DOI: 10.1016/j.cmpb.2018.01.021] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/13/2017] [Revised: 01/02/2018] [Accepted: 01/24/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND AND OBJECTIVE Discriminative and informative feature extraction is the core requirement for accurate and efficient classification of protein subcellular localization images, which in turn can make drug development more effective. The objective of this paper is to propose a novel modification of the Threshold Adjacency Statistics (TAS) technique that enhances its discriminative power. METHODS In this work, we utilized Threshold Adjacency Statistics from a novel perspective to enhance its discrimination power and efficiency. To this end, we utilized seven threshold ranges to produce seven distinct feature spaces, which are then used to train seven SVMs. The final prediction is obtained through a majority voting scheme. The proposed ETAS-SubLoc system is tested on two benchmark datasets using 5-fold cross-validation. RESULTS We observed that our novel use of the TAS technique improved the discriminative power of the classifier. The ETAS-SubLoc system achieved 99.2% accuracy, 99.3% sensitivity and 99.1% specificity on the Endogenous dataset, outperforming the classical Threshold Adjacency Statistics technique. Similarly, 91.8% accuracy, 96.3% sensitivity and 91.6% specificity were achieved on the Transfected dataset. CONCLUSIONS Simulation results validated the effectiveness of ETAS-SubLoc, which provides superior prediction performance compared to the existing technique. The proposed methodology aims to support the pharmaceutical industry as well as the research community in better drug design and innovation in the fields of bioinformatics and computational biology. The implementation code for replicating the experiments presented in this paper is available at: https://drive.google.com/file/d/0B7IyGPObWbSqRTRMcXI2bG5CZWs/view?usp=sharing.
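Threshold Adjacency Statistics and the majority-voting step described above can be sketched as follows. This assumes a simplified TAS in which each threshold range is given directly on a grayscale image scaled to [0, 1], and each range yields a 9-bin histogram of how many of the 8 neighbours of every in-range pixel are themselves in range. The seven ranges in `RANGES` and all function names are hypothetical placeholders, and the SVM training is omitted.

```python
from __future__ import annotations

from collections import Counter


def tas_histogram(image: list[list[float]], lo: float, hi: float) -> list[float]:
    """9-bin Threshold Adjacency Statistics for one threshold range.

    A pixel is 'on' when lo <= value <= hi. For every 'on' pixel, count how
    many of its 8 neighbours are also 'on' (0..8), then normalise the
    histogram by the number of 'on' pixels.
    """
    rows, cols = len(image), len(image[0])
    on = [[lo <= image[r][c] <= hi for c in range(cols)] for r in range(rows)]
    hist = [0] * 9
    for r in range(rows):
        for c in range(cols):
            if not on[r][c]:
                continue
            n = sum(on[rr][cc]
                    for rr in range(max(r - 1, 0), min(r + 2, rows))
                    for cc in range(max(c - 1, 0), min(c + 2, cols))
                    if (rr, cc) != (r, c))
            hist[n] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * 9


def majority_vote(labels: list[str]) -> str:
    """Most frequent label; ties go to the label that appears first."""
    counts = Counter(labels)
    best = max(counts.values())
    return next(label for label in labels if counts[label] == best)


# Hypothetical threshold ranges (the paper's seven ranges are not reproduced here).
RANGES = [(0.1, 1.0), (0.2, 1.0), (0.3, 1.0), (0.4, 1.0),
          (0.5, 1.0), (0.6, 1.0), (0.7, 1.0)]


def etas_features(image: list[list[float]]) -> list[list[float]]:
    """One 9-bin feature vector per threshold range, i.e. one per classifier."""
    return [tas_histogram(image, lo, hi) for lo, hi in RANGES]
```

In an ETAS-style pipeline, each of the seven vectors from `etas_features` would train its own SVM, and `majority_vote` would combine the seven predicted labels for a test image.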
Affiliation(s)
- Muhammad Tahir
- College of Computing and Informatics, Saudi Electronic University, Al-Madinah Branch, Saudi Arabia
- Bismillah Jan
- Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences, Islamabad, Pakistan; Department of Computer Science, National University of Computer and Emerging Sciences, Peshawar Campus, Pakistan
- Maqsood Hayat
- Department of Computer Science, University College of Sciences, Shanker, Abdul Wali Khan University, Mardan, Pakistan
- Shakir Ullah Shah
- Department of Computer Science, National University of Computer and Emerging Sciences, Peshawar Campus, Pakistan
- Muhammad Amin
- Department of Computer Science, National University of Computer and Emerging Sciences, Peshawar Campus, Pakistan
48
Qiu WR, Sun BQ, Xiao X, Xu ZC, Chou KC. iHyd-PseCp: Identify hydroxyproline and hydroxylysine in proteins by incorporating sequence-coupled effects into general PseAAC. Oncotarget 2018; 7:44310-44321. [PMID: 27322424 PMCID: PMC5190098 DOI: 10.18632/oncotarget.10027] [Citation(s) in RCA: 141] [Impact Index Per Article: 20.1] [Received: 04/20/2016] [Accepted: 05/29/2016] [Indexed: 12/30/2022]
Abstract
Protein hydroxylation is a posttranslational modification (PTM) in which a CH group in a Pro (P) or Lys (K) residue is converted into a COH group, or a hydroxyl group (−OH) is converted into an organic compound. Closely associated with cellular signaling activities, this type of PTM is also involved in some major diseases, such as stomach cancer and lung cancer. Therefore, from the perspectives of both basic research and drug development, we face a challenging problem: for an uncharacterized protein sequence containing many P or K residues, which ones can be hydroxylated, and which ones cannot? With the explosive growth of protein sequences in the post-genomic age, the problem has become even more urgent. To address it, we have developed a predictor called iHyd-PseCp by incorporating sequence-coupled information into the general pseudo amino acid composition (PseAAC) and using the “Random Forest” algorithm to perform the calculation. Rigorous jackknife tests indicated that the new predictor remarkably outperformed the existing state-of-the-art prediction method for the same purpose. For the convenience of experimental scientists, a user-friendly web-server for iHyd-PseCp has been established at http://www.jci-bioinfo.cn/iHyd-PseCp, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved.
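The pseudo amino acid composition that iHyd-PseCp builds on can be illustrated with Chou's classic type-1 PseAAC: 20 composition terms plus `lam` sequence-order correlation factors computed from standardized residue properties. This is a sketch of the general idea only, not the sequence-coupled variant the paper introduces; the property scales passed in `props`, the weight `w = 0.05`, `lam = 3` and the function names are illustrative assumptions.

```python
from __future__ import annotations

from statistics import fmean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardise(prop: dict[str, float]) -> dict[str, float]:
    """Z-score a property scale over the 20 standard amino acids.

    Raises ZeroDivisionError if all 20 values are identical.
    """
    vals = [prop[a] for a in AMINO_ACIDS]
    mu, sd = fmean(vals), pstdev(vals)
    return {a: (prop[a] - mu) / sd for a in AMINO_ACIDS}


def pseaac(seq: str, props: list[dict[str, float]],
           lam: int = 3, w: float = 0.05) -> list[float]:
    """Type-1 PseAAC vector of length 20 + lam whose entries sum to 1."""
    seq = seq.upper()
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    scales = [standardise(p) for p in props]

    def pair_corr(a: str, b: str) -> float:
        # Mean squared difference of the standardised properties of two residues.
        return sum((h[b] - h[a]) ** 2 for h in scales) / len(scales)

    # theta_k: average correlation between residues k positions apart.
    thetas = [sum(pair_corr(seq[i], seq[i + k]) for i in range(len(seq) - k)) / (len(seq) - k)
              for k in range(1, lam + 1)]
    freqs = [seq.count(a) / len(seq) for a in AMINO_ACIDS]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

The weight `w` controls how much the sequence-order terms contribute relative to plain composition; with `w = 0` the vector collapses to ordinary amino acid composition followed by zeros.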
Affiliation(s)
- Wang-Ren Qiu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China; Department of Computer Science and Bond Life Science Center, University of Missouri, Columbia, MO, USA
- Bi-Qian Sun
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
- Xuan Xiao
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China; Gordon Life Science Institute, Boston, MA, USA
- Zhao-Chun Xu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia; Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
49
Chen W, Feng P, Yang H, Ding H, Lin H, Chou KC. iRNA-AI: identifying the adenosine to inosine editing sites in RNA sequences. Oncotarget 2018; 8:4208-4217. [PMID: 27926534 PMCID: PMC5354824 DOI: 10.18632/oncotarget.13758] [Citation(s) in RCA: 199] [Impact Index Per Article: 28.4] [Received: 10/04/2016] [Accepted: 11/23/2016] [Indexed: 01/14/2023]
Abstract
Catalyzed by adenosine deaminase acting on RNA (ADAR), adenosine-to-inosine (A-to-I) editing in RNA is not only involved in various important biological processes but is also closely associated with a series of major diseases. Therefore, knowledge about A-to-I editing sites in RNA is crucially important for both basic research and drug development. Given an uncharacterized RNA sequence that contains many adenosine (A) residues, can we identify which of them can undergo A-to-I editing and which cannot? Unfortunately, no computational method has so far been developed to address this important problem based on RNA sequence information alone. To fill this gap, we have proposed a predictor called iRNA-AI by incorporating the chemical properties of nucleotides and their sliding occurrence density distribution along an RNA sequence into the general form of pseudo nucleotide composition (PseKNC). Rigorous jackknife and independent dataset tests have shown that the performance of the proposed predictor is quite promising. For the convenience of experimental scientists, a user-friendly web-server for iRNA-AI has been established at http://lin.uestc.edu.cn/server/iRNA-AI/, by which users can easily get their desired results without the need to go through the mathematical details.
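Combining nucleotide chemical properties with an occurrence density, as described above, is often implemented as a four-dimensional per-position encoding. The sketch assumes the widely used nucleotide chemical property (NCP) values (ring structure, functional group, hydrogen-bond strength) and a cumulative density measured from the start of the sequence; whether iRNA-AI uses exactly these values, and how it folds them into PseKNC, is not stated in the abstract. The function name is hypothetical.

```python
from __future__ import annotations

# Assumed NCP encoding: (ring structure: purine=1,
#                        functional group: amino=1,
#                        hydrogen bonding: weak=1)
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}


def ncp_nd_encode(seq: str) -> list[tuple[float, ...]]:
    """Encode each nucleotide as its 3 chemical properties plus its
    cumulative occurrence density d_i = (count of N_i in seq[:i]) / i."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    seen: dict[str, int] = {}
    encoded = []
    for i, nt in enumerate(seq, start=1):
        seen[nt] = seen.get(nt, 0) + 1
        encoded.append((*NCP[nt], seen[nt] / i))
    return encoded
```

A window of n nucleotides centred on a candidate adenosine thus becomes a 4n-dimensional vector that a downstream classifier can consume.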
Affiliation(s)
- Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, China; Gordon Life Science Institute, Belmont, Massachusetts, United States of America
- Pengmian Feng
- Hebei Province Key Laboratory of Occupational Health and Safety for Coal Industry, School of Public Health, North China University of Science and Technology, Tangshan, China
- Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Gordon Life Science Institute, Belmont, Massachusetts, United States of America
- Kuo-Chen Chou
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Gordon Life Science Institute, Belmont, Massachusetts, United States of America
50
iOri-Human: identify human origin of replication by incorporating dinucleotide physicochemical properties into pseudo nucleotide composition. Oncotarget 2018; 7:69783-69793. [PMID: 27626500 PMCID: PMC5342515 DOI: 10.18632/oncotarget.11975] [Citation(s) in RCA: 157] [Impact Index Per Article: 22.4] [Received: 07/06/2016] [Accepted: 09/06/2016] [Indexed: 02/07/2023]
Abstract
The initiation of replication is an extremely important process in the DNA life cycle. Given an uncharacterized DNA sequence, can we identify where its origin of replication (ORI) is located? This is undoubtedly a fundamental problem in genome analysis. In particular, with the rapid development of genome sequencing technology, which produces huge amounts of sequence data, computational methods for rapidly and effectively identifying the ORIs in these genomes are highly desirable. Unfortunately, existing computational methods, such as sequence alignment or k-mer strategies, can hardly achieve decent success rates. To address this problem, we developed a predictor called “iOri-Human”. Rigorous jackknife tests have shown that its overall accuracy and stability in identifying human ORIs are over 75% and 50%, respectively. In the predictor, 96 physicochemical properties of the 16 possible constituent dinucleotides have been incorporated through pseudo nucleotide composition (an extension of pseudo amino acid composition) to reflect both the global and the local sequence patterns in DNA. Moreover, a user-friendly web-server for iOri-Human has been established at http://lin.uestc.edu.cn/server/iOri-Human.html, by which users can easily get their desired results without the need to go through the complicated mathematics involved.
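Pseudo dinucleotide composition, the general idea behind iOri-Human's feature vector, can be sketched in its type-1 form: 16 normalized dinucleotide frequencies capture local composition, and `lam` correlation factors computed from standardized dinucleotide properties capture longer-range sequence order. The property tables, `lam = 3`, `w = 0.05` and the function names are illustrative assumptions; the paper's 96 property values and parameter choices are not reproduced here.

```python
from __future__ import annotations

from itertools import product
from statistics import fmean, pstdev

DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]  # AA, AC, ..., TT


def standardise_dinuc(prop: dict[str, float]) -> dict[str, float]:
    """Z-score a dinucleotide property over the 16 dinucleotides.

    Raises ZeroDivisionError if all 16 values are identical.
    """
    vals = [prop[d] for d in DINUCLEOTIDES]
    mu, sd = fmean(vals), pstdev(vals)
    return {d: (prop[d] - mu) / sd for d in DINUCLEOTIDES}


def psednc(seq: str, props: list[dict[str, float]],
           lam: int = 3, w: float = 0.05) -> list[float]:
    """Type-1 PseDNC vector of length 16 + lam whose entries sum to 1."""
    seq = seq.upper()
    dinucs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if len(dinucs) <= lam:
        raise ValueError("sequence too short for the chosen lam")
    scales = [standardise_dinuc(p) for p in props]

    def pair_corr(d1: str, d2: str) -> float:
        # Mean squared difference of the standardised properties of two dinucleotides.
        return sum((h[d1] - h[d2]) ** 2 for h in scales) / len(scales)

    # theta_j: average correlation between dinucleotides j positions apart.
    thetas = [sum(pair_corr(dinucs[i], dinucs[i + j]) for i in range(len(dinucs) - j))
              / (len(dinucs) - j)
              for j in range(1, lam + 1)]
    freqs = [dinucs.count(d) / len(dinucs) for d in DINUCLEOTIDES]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

Passing many property tables in `props` only changes how `pair_corr` averages differences; the output length stays 16 + `lam` regardless of how many properties are used.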