1. Suleman MT, Alturise F, Alkhalifah T, Khan YD. m1A-Ensem: accurate identification of 1-methyladenosine sites through ensemble models. BioData Min 2024; 17:4. [PMID: 38360720 PMCID: PMC10868122 DOI: 10.1186/s13040-023-00353-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 12/31/2023] [Indexed: 02/17/2024] [Open access]
Abstract
BACKGROUND: 1-methyladenosine (m1A) is a variant of methyladenosine that carries a methyl substituent at the 1st position and has a prominent role in RNA stability and human metabolites. OBJECTIVE: Traditional approaches, such as mass spectrometry and site-directed mutagenesis, have proved to be time-consuming and complicated. METHODOLOGY: The present research focused on the identification of m1A sites within RNA sequences using novel feature development mechanisms. The obtained features were used to train ensemble models, including blending, boosting, and bagging. Independent testing and k-fold cross-validation were then performed on the trained ensemble models. RESULTS: The proposed model outperformed the preexisting predictors and revealed optimized scores based on major accuracy metrics. CONCLUSION: For research purposes, a user-friendly web server of the proposed model can be accessed through https://taseersuleman-m1a-ensem1.streamlit.app/.
Affiliation(s)
- Muhammad Taseer Suleman
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, 54770, Pakistan
- Fahad Alturise
  - Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Tamim Alkhalifah
  - Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, 54770, Pakistan

2. Suleman MT, Khan YD. PseU-pred: An ensemble model for accurate identification of pseudouridine sites. Anal Biochem 2023:115247. [PMID: 37437648 DOI: 10.1016/j.ab.2023.115247] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/09/2023] [Revised: 06/25/2023] [Accepted: 07/08/2023] [Indexed: 07/14/2023]
Abstract
Pseudouridine (ψ) is reported to occur frequently in all types of RNA. This uridine modification has been shown to be essential for processes such as RNA stability and stress response. It is also linked to a few human diseases, such as prostate cancer and anemia. A few laboratory techniques, such as Pseudo-seq and N3-CMC-enriched Pseudouridine sequencing (CeU-Seq), are used for detecting ψ sites. However, these are laborious and drawn-out methods. The ready availability of sequencing data has enabled the development of computationally intelligent models for improving ψ site identification methods. The proposed work provides a prediction model for the identification of ψ sites through popular ensemble methods such as stacking, bagging, and boosting. Features were obtained through a novel feature extraction mechanism with the assimilation of statistical moments, and these features were used to train the ensemble models. The cross-validation test and independent set test were used to evaluate the precision of the trained models. The proposed model outperformed the preexisting predictors and revealed 87% accuracy, 0.90 specificity, 0.85 sensitivity, and a 0.75 Matthews correlation coefficient. A web server has been built and is publicly available to researchers at https://taseersuleman-y-test-pseu-pred-c2wmtj.streamlit.app/.
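The four scores reported in this abstract (accuracy, specificity, sensitivity, and the Matthews correlation coefficient) are all functions of the binary confusion matrix. The sketch below shows how they relate, in plain Python. The function name `binary_metrics` and the example counts are illustrative assumptions; they are not taken from the paper or its code.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, sensitivity, specificity, and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    accuracy = (tp + tn) / total
    # Sensitivity: fraction of true modification sites that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-sites that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC: correlation between predicted and true labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for a balanced test set of 200 samples (assumed, not from the paper).
scores = binary_metrics(tp=85, tn=90, fp=10, fn=15)
print(scores)
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which is why it is commonly reported alongside accuracy when the positive and negative classes may be imbalanced.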
Affiliation(s)
- Muhammad Taseer Suleman
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, 54770, Pakistan
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, 54770, Pakistan

3. Attique M, Alkhalifah T, Alturise F, Khan YD. DeepBCE: Evaluation of deep learning models for identification of immunogenic B-cell epitopes. Comput Biol Chem 2023; 104:107874. [PMID: 37126975 DOI: 10.1016/j.compbiolchem.2023.107874] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/16/2023] [Revised: 04/17/2023] [Accepted: 04/20/2023] [Indexed: 05/03/2023]
Abstract
B-Cell epitopes (BCEs) can identify and bind with receptor proteins (antigens) to initiate an immune response against pathogens. Understanding antigen-antibody binding interactions has many applications in biotechnology and biomedicine, including designing antibodies, therapeutics, and vaccines. Lab-based experimental identification of these proteins is time-consuming and challenging. Computational techniques have been proposed to discover BCEs, but most lack significant accomplishments. This work uses classical and deep learning models (DLMs) with sequence-based features to predict immunity stimulator BCEs from proteomics sequences. The proposed convolutional neural network-based model outperforms other models with an accuracy (ACC) of 0.878, an F-measure of 0.871, and an area under the receiver operating characteristic curve (AUC) of 0.945. The proposed strategy achieves 58.7% better results on average than other state-of-the-art approaches based on the Matthews Correlation Coefficient (MCC) results. The established model is accessible through a web application located at http://deeplbcepred.pythonanywhere.com.
Affiliation(s)
- Muhammad Attique
  - Department of Computer Science, University of Management and Technology, Lahore 54000, Pakistan
  - Department of Information Technology, University of Gujrat, Gujrat 50700, Pakistan
- Tamim Alkhalifah
  - Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Fahad Alturise
  - Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Yaser Daanial Khan
  - Department of Computer Science, University of Management and Technology, Lahore 54000, Pakistan

4. Grasso S, Dabene V, Hendriks MMW, Zwartjens P, Pellaux R, Held M, Panke S, van Dijl JM, Meyer A, van Rij T. Signal Peptide Efficiency: From High-Throughput Data to Prediction and Explanation. ACS Synth Biol 2023; 12:390-404. [PMID: 36649479 PMCID: PMC9942255 DOI: 10.1021/acssynbio.2c00328] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 06/19/2022] [Indexed: 01/18/2023]
Abstract
The passage of proteins across biological membranes via the general secretory (Sec) pathway is a universally conserved process with critical functions in cell physiology and important industrial applications. Proteins are directed into the Sec pathway by a signal peptide at their N-terminus. Estimating the impact of physicochemical signal peptide features on protein secretion levels has not been achieved so far, partially due to the extreme sequence variability of signal peptides. To elucidate relevant features of the signal peptide sequence that influence secretion efficiency, an evaluation of ∼12,000 different designed signal peptides was performed using a novel miniaturized high-throughput assay. The results were used to train a machine learning model, and a post-hoc explanation of the model is provided. By describing each signal peptide with a selection of 156 physicochemical features, it is now possible to both quantify feature importance and predict the protein secretion levels directed by each signal peptide. Our analyses allow the detection and explanation of the relevant signal peptide features influencing the efficiency of protein secretion, generating a versatile tool for the de novo design and in silico evaluation of signal peptides.
Affiliation(s)
- Stefano Grasso
  - Department of Medical Microbiology, University of Groningen, University Medical Center Groningen, Hanzeplein 1, Groningen 9700 RB, The Netherlands
  - DSM Biotechnology Center, Alexander Fleminglaan 1, Delft 2613 AX, Netherlands
- Valentina Dabene
  - Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
  - FGen AG, Hochbergerstrasse 60C, Basel 4057, Switzerland
- Priscilla Zwartjens
  - DSM Biotechnology Center, Alexander Fleminglaan 1, Delft 2613 AX, Netherlands
- René Pellaux
  - FGen AG, Hochbergerstrasse 60C, Basel 4057, Switzerland
- Martin Held
  - Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
- Sven Panke
  - Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, Basel 4058, Switzerland
- Jan Maarten van Dijl
  - Department of Medical Microbiology, University of Groningen, University Medical Center Groningen, Hanzeplein 1, Groningen 9700 RB, The Netherlands
- Andreas Meyer
  - FGen AG, Hochbergerstrasse 60C, Basel 4057, Switzerland
- Tjeerd van Rij
  - DSM Biotechnology Center, Alexander Fleminglaan 1, Delft 2613 AX, Netherlands

5. Ali Z, Alturise F, Alkhalifah T, Khan YD. IGPred-HDnet: Prediction of Immunoglobulin Proteins Using Graphical Features and the Hierarchal Deep Learning-Based Approach. Comput Intell Neurosci 2023; 2023:2465414. [PMID: 36744119 PMCID: PMC9891831 DOI: 10.1155/2023/2465414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 09/16/2022] [Accepted: 10/12/2022] [Indexed: 01/26/2023]
Abstract
Motivation. Immunoglobulin proteins (IGP) (also called antibodies) are glycoproteins that act as B-cell receptors against external or internal antigens like viruses and bacteria. IGPs play a significant role in diverse cellular processes ranging from adhesion to cell recognition. IGP identification via the in-silico approach is faster and more cost-effective than wet-lab technological methods. Methods. In this study, we developed an intelligent theoretical deep learning framework, "IGPred-HDnet", for the discrimination of IGPs and non-IGPs. Three types of promising descriptors, namely feature extraction based on graphical and statistical features (FEGS), amphiphilic pseudo-amino acid composition (Amp-PseAAC), and dipeptide composition (DPC), are used to extract the graphical, physicochemical, and sequential features. Next, the extracted attributes are evaluated through machine learning classifiers, i.e., decision tree (DT), support vector machine (SVM), k-nearest neighbour (KNN), and hierarchical deep network (HDnet). The proposed predictor IGPred-HDnet was trained and tested using 10-fold cross-validation and an independent test. Results and Conclusion. The success rates of IGPred-HDnet on the training and independent datasets (Dtrain and Dtest) are ACC = 98.00% and 99.10%, and MCC = 0.958 and 0.980, respectively, where ACC is accuracy and MCC is the Matthews correlation coefficient. The empirical outcomes demonstrate that the IGPred-HDnet model, using the novel FEGS feature and the HDnet algorithm, achieved superior predictions on both datasets compared with other existing computational models. We hope this research will provide great insights into the large-scale identification of IGPs and assist pharmaceutical companies in new drug design.
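Of the three descriptors named in this abstract, dipeptide composition (DPC) is the simplest to illustrate: it maps a protein sequence to a 400-dimensional vector holding the relative frequency of each ordered pair of adjacent amino acids. The following is a minimal sketch of the standard DPC definition; the names `AMINO_ACIDS`, `DIPEPTIDES`, and `dipeptide_composition`, and the choice to skip pairs containing non-standard residues, are assumptions made here and are not taken from the IGPred-HDnet implementation.

```python
from itertools import product
from typing import List

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in a fixed order that defines the feature layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence: str) -> List[float]:
    """Return the 400-dimensional dipeptide composition of a protein sequence."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    valid = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # Assumption: pairs containing non-standard residues (e.g. X) are skipped.
        if pair in counts:
            counts[pair] += 1
            valid += 1
    if valid == 0:
        return [0.0] * len(DIPEPTIDES)
    # Normalize so the frequencies of all counted dipeptides sum to 1.
    return [counts[d] / valid for d in DIPEPTIDES]


features = dipeptide_composition("MKTAYIAKQR")  # hypothetical sequence
print(len(features))
```

Because the vector length is fixed regardless of sequence length, DPC features can be fed directly to classifiers such as the decision tree, SVM, and KNN models mentioned above.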
Affiliation(s)
- Zakir Ali
  - Department of Computer Science, School of Science and Technology, University of Management and Technology, Lahore, Pakistan
- Fahad Alturise
  - Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Tamim Alkhalifah
  - Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Yaser Daanial Khan
  - Department of Computer Science, School of Science and Technology, University of Management and Technology, Lahore, Pakistan

6. Suleman MT, Alturise F, Alkhalifah T, Khan YD. iDHU-Ensem: Identification of dihydrouridine sites through ensemble learning models. Digit Health 2023; 9:20552076231165963. [PMID: 37009307 PMCID: PMC10064468 DOI: 10.1177/20552076231165963] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/29/2022] [Accepted: 03/09/2023] [Indexed: 04/04/2023] [Open access]
Abstract
Background: Dihydrouridine (D) is one of the most significant uridine modifications that have a prominent occurrence in eukaryotes. The folding and conformational flexibility of transfer RNA (tRNA) can be attained through this modification. Objective: The modification also triggers lung cancer in humans. The identification of D sites was carried out through conventional laboratory methods; however, those were costly and time-consuming. The readiness of RNA sequences helps in the identification of D sites through computationally intelligent models. However, the most challenging part is turning these biological sequences into distinct vectors. Methods: The current research proposed novel feature extraction mechanisms and the identification of D sites in tRNA sequences using ensemble models. The ensemble models were then subjected to evaluation using k-fold cross-validation and independent testing. Results: The results revealed that the stacking ensemble model outperformed all the ensemble models by revealing 0.98 accuracy, 0.98 specificity, 0.97 sensitivity, and 0.92 Matthews Correlation Coefficient. The proposed model, iDHU-Ensem, was also compared with pre-existing predictors using an independent test. The accuracy scores have shown that the proposed model in this research study performed better than the available predictors. Conclusion: The current research contributed towards the enhancement of D site identification capabilities through computationally intelligent methods. A web-based server, iDHU-Ensem, was also made available for the researchers at https://taseersuleman-idhu-ensem-idhu-ensem.streamlit.app/.
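The k-fold cross-validation mentioned in this abstract partitions the training samples into k disjoint folds and uses each fold once as the validation set while training on the remaining folds. The sketch below shows only the index bookkeeping, in plain Python; the function name `k_fold_indices`, its parameters, and the default values are assumptions made for illustration and do not come from the iDHU-Ensem code.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for held_out, validation_idx in enumerate(folds):
        train_idx = [
            idx
            for fold_no, fold in enumerate(folds)
            if fold_no != held_out
            for idx in fold
        ]
        yield train_idx, validation_idx


# Hypothetical usage: 10 samples split into 5 folds.
for train_idx, validation_idx in k_fold_indices(10, k=5):
    print(len(train_idx), len(validation_idx))
```

An independent test, by contrast, uses samples that are set aside before any of these folds are formed, so they never influence training or model selection.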
Affiliation(s)
- Muhammad Taseer Suleman
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Fahad Alturise
  - Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Tamim Alkhalifah
  - Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan

7. Suleman MT, Khan YD. m1A-pred: Prediction of Modified 1-methyladenosine Sites in RNA Sequences through Artificial Intelligence. Comb Chem High Throughput Screen 2022; 25:2473-2484. [PMID: 35718969 DOI: 10.2174/1386207325666220617152743] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/20/2022] [Revised: 04/06/2022] [Accepted: 04/11/2022] [Indexed: 01/27/2023]
Abstract
BACKGROUND: The modification of nucleotides, such as the addition of methyl groups, is known as post-transcriptional modification (PTM). 1-methyladenosine (m1A) is a type of PTM formed by adding a methyl group to the nitrogen at the 1st position of the adenosine base. Many human disorders are associated with m1A, which is widely found in ribosomal RNA and transfer RNA. OBJECTIVE: Conventional methods such as mass spectrometry and site-directed mutagenesis have proved to be laborious and burdensome. Systematic identification of modified sites from RNA sequences is gaining much attention nowadays. Consequently, an extreme gradient boost predictor, m1A-pred, is developed in this study for the prediction of modified m1A sites. METHODS: The current study involves the extraction of position- and composition-based properties within nucleotide sequences. The extraction of features helps in the development of the feature vector. Statistical moments were employed for dimensionality reduction in the obtained features. RESULTS: Through a series of experiments using different computational models and evaluation methods, it was revealed that the proposed predictor, m1A-pred, proved to be the most robust and accurate model for the identification of modified sites. AVAILABILITY AND IMPLEMENTATION: To enhance the research on m1A sites, a user-friendly server was also developed, which was the final phase of this research.
Affiliation(s)
- Muhammad Taseer Suleman
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan

8. Akmal MA, Hassan MA, Muhammad S, Khurshid KS, Mohamed A. An analytical study on the identification of N-linked glycosylation sites using machine learning model. PeerJ Comput Sci 2022; 8:e1069. [PMID: 36262138 PMCID: PMC9575850 DOI: 10.7717/peerj-cs.1069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Accepted: 07/25/2022] [Indexed: 06/16/2023]
Abstract
N-linked glycosylation is the most common type of glycosylation; it plays a significant role in identifying various diseases such as type I diabetes and cancer and helps in drug development. Most proteins cannot perform their biological and psychological functionalities without undergoing such modification. Therefore, it is essential to identify such sites by computational techniques because of experimental limitations. This study aims to analyze and synthesize the progress made in discovering N-linked sites using machine learning methods. It also explores the performance of currently available tools to predict such sites. Almost seventy research articles published in recognized journals of the N-linked glycosylation field have been shortlisted after a rigorous filtering process. The findings of the studies have been reported based on multiple aspects: publication channel, feature set construction method, training algorithm, and performance evaluation. Moreover, the literature survey has developed a taxonomy of N-linked sequence identification. Our study focuses on the performance evaluation criteria, and the importance of N-linked glycosylation motivates us to discover resources that use computational methods instead of experimental methods, given their limitations.
Affiliation(s)
- Muhammad Aizaz Akmal
  - Department of Computer Science, University of Engineering and Technology, KSK, Lahore, Punjab, Pakistan
- Muhammad Awais Hassan
  - Department of Computer Science, University of Engineering and Technology, Lahore, Punjab, Pakistan
- Shoaib Muhammad
  - Department of Computer Science, University of Engineering and Technology, Lahore, Punjab, Pakistan
- Khaldoon S. Khurshid
  - Department of Computer Science, University of Engineering and Technology, Lahore, Punjab, Pakistan

9. Li Y, Li X, Liu Y, Yao Y, Huang G. MPMABP: A CNN and Bi-LSTM-Based Method for Predicting Multi-Activities of Bioactive Peptides. Pharmaceuticals (Basel) 2022; 15:707. [PMID: 35745625 PMCID: PMC9231127 DOI: 10.3390/ph15060707] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/22/2022] [Revised: 05/23/2022] [Accepted: 05/30/2022] [Indexed: 12/30/2022] [Open access]
Abstract
Bioactive peptides are typically small functional peptides with 2-20 amino acid residues and play versatile roles in metabolic and biological processes. Bioactive peptides are multi-functional, so it is vastly challenging to accurately detect all their functions simultaneously. We proposed a convolutional neural network (CNN) and bi-directional long short-term memory (Bi-LSTM)-based deep learning method (called MPMABP) for recognizing multi-activities of bioactive peptides. The MPMABP stacked five CNNs at different scales and used a residual network to prevent information loss. The empirical results showed that the MPMABP is superior to the state-of-the-art methods. Analysis of the distribution of amino acids indicated that lysine preferentially appears in anti-cancer peptides, leucine in anti-diabetic peptides, and proline in anti-hypertensive peptides. The method and analysis are beneficial for recognizing multi-activities of bioactive peptides.
Affiliation(s)
- You Li
  - School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China
- Xueyong Li
  - School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China
- Yuewu Liu
  - College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
- Yuhua Yao
  - School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
- Guohua Huang
  - School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China

10. Naseer S, Ali RF, Fati SM, Muneer A. Computational identification of 4-carboxyglutamate sites to supplement physiological studies using deep learning. Sci Rep 2022; 12:128. [PMID: 34996975 PMCID: PMC8741832 DOI: 10.1038/s41598-021-03895-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/28/2021] [Accepted: 12/03/2021] [Indexed: 01/23/2023] [Open access]
Abstract
In biological systems, glutamic acid is a crucial amino acid which is used in protein biosynthesis. Carboxylation of glutamic acid is a significant post-translational modification which plays an important role in blood coagulation by activating prothrombin to thrombin. Contrariwise, 4-carboxyglutamate is also found to be involved in diseases including plaque atherosclerosis, osteoporosis, mineralized heart valves, and bone resorption, and serves as a biomarker for the onset of these diseases. Owing to the pathophysiological significance of 4-carboxyglutamate, its identification is important to better understand pathophysiological systems. The wet lab identification of prospective 4-carboxyglutamate sites is costly, laborious, and time-consuming due to the inherent difficulties of in vivo, ex vivo, and in vitro experiments. To supplement these experiments, we proposed, implemented, and evaluated a different approach to develop 4-carboxyglutamate site predictors using pseudo amino acid compositions (PseAAC) and deep neural networks (DNNs). Our approach does not require any feature extraction and employs deep neural networks to learn feature representations of peptide sequences and to perform classification thereof. The proposed approach is validated using standard performance evaluation metrics. Among the different deep neural networks, the convolutional neural network-based predictor achieved the best scores on the independent dataset, with an accuracy of 94.7%, an AUC score of 0.91, and an F1-score of 0.874, which shows the promise of the proposed approach. The iCarboxE-Deep server is deployed at https://share.streamlit.io/sheraz-n/carboxyglutamate/app.py.
Affiliation(s)
- Sheraz Naseer
  - Department of Computer Science, University of Management and Technology, Lahore, 54770, Pakistan
- Rao Faizan Ali
  - Department of Computer Science, University of Management and Technology, Lahore, 54770, Pakistan
  - Computer and Information Sciences Department, Universiti Teknologi PETRONAS, 32610, Seri Iskandar, Malaysia
- Suliman Mohamed Fati
  - College of Computer and Information Sciences, Prince Sultan University, Riyadh, 11586, Saudi Arabia
- Amgad Muneer
  - Computer and Information Sciences Department, Universiti Teknologi PETRONAS, 32610, Seri Iskandar, Malaysia

11. Alzahrani E, Alghamdi W, Ullah MZ, Khan YD. Identification of stress response proteins through fusion of machine learning models and statistical paradigms. Sci Rep 2021; 11:21767. [PMID: 34741132 PMCID: PMC8571424 DOI: 10.1038/s41598-021-99083-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/18/2021] [Accepted: 09/13/2021] [Indexed: 11/08/2022] [Open access]
Abstract
Proteins are a vital component of cells that perform physiological functions to ensure smooth operations of bodily functions. Identification of a protein's function involves a detailed understanding of the structure of proteins. Stress proteins are essential mediators of several responses to cellular stress and are categorized based on their structural characteristics. These proteins are found to be conserved across many eukaryotic and prokaryotic lineages and demonstrate varied crucial functional activities inside a cell. The in vivo, ex vivo, and in vitro identification of stress proteins is a time-consuming and costly task. This study is aimed at the identification of stress protein sequences with the aid of mathematical modelling and machine learning methods to supplement the aforementioned wet lab methods. The model developed using Random Forest showed remarkable results with 91.1% accuracy, while models based on neural network and support vector machine showed 87.7% and 47.0% accuracy, respectively. Based on the evaluation results, it was concluded that the random forest-based classifier surpassed all other predictors and is suitable for use in practical applications for the identification of stress proteins. A live web server is available at http://biopred.org/stressprotiens, while the web server code is available at https://github.com/abdullah5naveed/SRP_WebServer.git.
Affiliation(s)
- Ebraheem Alzahrani
  - Department of Mathematics, Faculty of Science, King Abdulaziz University, P. O. Box 80203, Jeddah, 21589, Saudi Arabia
- Wajdi Alghamdi
  - Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, P. O. Box 80221, Jeddah, 21589, Saudi Arabia
- Malik Zaka Ullah
  - Department of Mathematics, Faculty of Science, King Abdulaziz University, P. O. Box 80203, Jeddah, 21589, Saudi Arabia
- Yaser Daanial Khan
  - Department of Computer Science, University of Management and Technology, Lahore, 54770, Pakistan

12. Siraj A, Lim DY, Tayara H, Chong KT. UbiComb: A Hybrid Deep Learning Model for Predicting Plant-Specific Protein Ubiquitylation Sites. Genes (Basel) 2021; 12:717. [PMID: 34064731 PMCID: PMC8151217 DOI: 10.3390/genes12050717] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/15/2021] [Revised: 05/06/2021] [Accepted: 05/07/2021] [Indexed: 12/11/2022] [Open access]
Abstract
Protein ubiquitylation is an essential post-translational modification process that performs a critical role in a wide range of biological functions, even a degenerative role in certain diseases, and is consequently used as a promising target for the treatment of various diseases. Owing to the significant role of protein ubiquitylation, these sites can be identified by enzymatic approaches, mass spectrometry analysis, and combinations of multidimensional liquid chromatography and tandem mass spectrometry. However, these large-scale experimental screening techniques are time consuming, expensive, and laborious. To overcome the drawbacks of experimental methods, machine learning and deep learning-based predictors were considered for prediction in a timely and cost-effective manner. In the literature, several computational predictors have been published across species; however, predictors are species-specific because of the unclear patterns in different species. In this study, we proposed a novel approach for predicting plant ubiquitylation sites using a hybrid deep learning model by utilizing convolutional neural network and long short-term memory. The proposed method uses the actual protein sequence and physicochemical properties as inputs to the model and provides more robust predictions. The proposed predictor achieved the best result with accuracy values of 80% and 81% and F-scores of 79% and 82% on the 10-fold cross-validation and an independent dataset, respectively. Moreover, we also compared the testing of the independent dataset with popular ubiquitylation predictors; the results demonstrate that our model significantly outperforms the other methods in prediction classification results.
Affiliation(s)
- Arslan Siraj
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Korea
- Dae Yeong Lim
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Korea
  - Corresponding author
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Korea
  - Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Korea
  - Corresponding author

13. iAmideV-Deep: Valine Amidation Site Prediction in Proteins Using Deep Learning and Pseudo Amino Acid Compositions. Symmetry (Basel) 2021. [DOI: 10.3390/sym13040560] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 02/01/2023] [Open access]
Abstract
Amidation is an important post-translational modification where a peptide ends with an amide group (–NH2) rather than a carboxyl group (–COOH). These amidated peptides are less sensitive to proteolytic degradation and have an extended half-life in the bloodstream. Amides are used in different industries like pharmaceuticals, natural products, and biologically active compounds. The in vivo, ex vivo, and in vitro identification of amidation sites is a costly and time-consuming but important task to study the physicochemical properties of amidated peptides. A less costly and efficient alternative is to supplement wet lab experiments with accurate computational models. Hence, an urgent need exists for efficient and accurate computational models to easily identify amidated sites in peptides. In this study, we present a new predictor, based on deep neural networks (DNN) and Pseudo Amino Acid Compositions (PseAAC), to learn efficient, task-specific, and effective representations for valine amidation site identification. Well-known DNN architectures are used in this contribution to learn peptide sequence representations and classify peptide chains. Of all the different DNN-based predictors developed in this study, the convolutional neural network-based model showed the best performance, surpassing all other DNN-based models and previously reported contributions. The proposed model will supplement in vivo methods and help scientists to determine valine amidation very efficiently and accurately, which in turn will enhance understanding of valine amidation in different biological processes.