Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Chaudhari M, Thapa N, Roy K, Newman RH, Saigo H, B K C D. DeepRMethylSite: a deep learning based approach for prediction of arginine methylation sites in proteins. Mol Omics 2021;16:448-454. [PMID: 32555810 DOI: 10.1039/d0mo00025f] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]

For:	Chaudhari M, Thapa N, Roy K, Newman RH, Saigo H, B K C D. DeepRMethylSite: a deep learning based approach for prediction of arginine methylation sites in proteins. Mol Omics 2021;16:448-454. [PMID: 32555810 DOI: 10.1039/d0mo00025f] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]

Number

Cited by Other Article(s)

Gu L, Chen T, Li J, Huang YA, Du Z, Leung VCM, Chen J. Hybrid Bayesian Optimization-Based Graphical Discovery for Methylation Sites Prediction. IEEE J Biomed Health Inform 2024;28:1917-1926. [PMID: 37801389 DOI: 10.1109/jbhi.2023.3322560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/08/2023]

Khandelwal M, Kumar Rout R. DeepPRMS: advanced deep learning model to predict protein arginine methylation sites. Brief Funct Genomics 2024:elae001. [PMID: 38267081 DOI: 10.1093/bfgp/elae001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2024] [Revised: 11/17/2023] [Accepted: 01/03/2024] [Indexed: 01/26/2024] Open

Khandelwal M, Rout RK. PRMxAI: protein arginine methylation sites prediction based on amino acid spatial distribution using explainable artificial intelligence. BMC Bioinformatics 2023;24:376. [PMID: 37794362 PMCID: PMC10548713 DOI: 10.1186/s12859-023-05491-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2023] [Accepted: 09/21/2023] [Indexed: 10/06/2023] Open

Abstract

BACKGROUND

Protein methylation, a post-translational modification, is crucial in regulating various cellular functions. Arginine methylation is required to understand crucial biochemical activities and biological functions, like gene regulation, signal transduction, etc. However, some experimental methods, including Chip-Chip, mass spectrometry, and methylation-specific antibodies, exist for the prediction of methylated proteins. These experimental methods are expensive and tedious. As a result, computational methods based on machine learning play an efficient role in predicting arginine methylation sites.

RESULTS

In this research, a novel method called PRMxAI has been proposed to predict arginine methylation sites. The proposed PRMxAI extract sequence-based features, such as dipeptide composition, physicochemical properties, amino acid composition, and information theory-based features (Arimoto, Havrda-Charvat, Renyi, and Shannon entropy), to represent the protein sequences into numerical format. Various machine learning algorithms are implemented to select the better classifier, such as Decision trees, Naive Bayes, Random Forest, Support vector machines, and K-nearest neighbors. The random forest algorithm is selected as the underlying classifier for the PRMxAI model. The performance of PRMxAI is evaluated by employing 10-fold cross-validation, and it yields 87.17% and 90.40% accuracy on mono-methylarginine and di-methylarginine data sets, respectively. This research also examines the impact of various features on both data sets using explainable artificial intelligence.

CONCLUSIONS

The proposed PRMxAI shows the effectiveness of the features for predicting arginine methylation sites. Additionally, the SHapley Additive exPlanation method is used to interpret the predictive mechanism of the proposed model. The results indicate that the proposed PRMxAI model outperforms other state-of-the-art predictors.

Collapse

Pakhrin SC, Pokharel S, Pratyush P, Chaudhari M, Ismail HD, Kc DB. LMPhosSite: A Deep Learning-Based Approach for General Protein Phosphorylation Site Prediction Using Embeddings from the Local Window Sequence and Pretrained Protein Language Model. J Proteome Res 2023;22:2548-2557. [PMID: 37459437 DOI: 10.1021/acs.jproteome.2c00667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/05/2023]

Wang H, Yao Z, Luo R, Liu J, Wang Z, Zhang G. LaCOme: Learning the latent convolutional patterns among transcriptomic features to improve classifications. Gene 2023;862:147246. [PMID: 36736509 DOI: 10.1016/j.gene.2023.147246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2022] [Revised: 12/22/2022] [Accepted: 01/27/2023] [Indexed: 02/04/2023]

Abstract

OMIC is a novel approach that analyses entire genetic or molecular profiles in humans and other organisms. It involves identifying and quantifying biological molecules that contribute to a species' structure, function, and dynamics. Finding the secrets of OMIC is like deciphering the biochemical code, but building data-driven models to mine the hidden phenotypic trait information has been a research hotspot. Transcriptome analysis is a popular biological technology for characterizing living systems' overall health, including cells and tissues. Individual transcript expression levels are known to be correlated with those of other transcripts. Nevertheless, most computational studies do not fully exploit these inter-feature correlations. Differential expression analyses, for example, assume that the expression levels of the transcripts are independent. Thus, we propose extracting these inter-feature correlations using the convolutional neural network (CNN) and transforming the transcriptomic features into a new space of convolutional transcriptomic (LaCOme) features. On most transcriptomic datasets in use, a series of comprehensive experiments have demonstrated that engineered LaCOme features outperform the original transcriptomic features in classification performances. Based on experimental results, OMIC data from biological samples could be further enriched using CNN to enhance computational analysis results. Also, feature rough screening can be used to extract valuable information from OMIC, regardless of the algorithm used to select features. It may always be better to create a novel feature than to keep the original. Furthermore, we investigated the feasibility of the feature construction method through cross-validation and independent verification, hoping to develop a more efficient and effective method.

Collapse

Zhou H, Tan W, Shi S. DeepGpgs: a novel deep learning framework for predicting arginine methylation sites combined with Gaussian prior and gated self-attention mechanism. Brief Bioinform 2023;24:7000314. [PMID: 36694944 DOI: 10.1093/bib/bbad018] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2022] [Revised: 12/26/2022] [Accepted: 01/04/2023] [Indexed: 01/26/2023] Open

Khandelwal M, Kumar Rout R, Umer S, Mallik S, Li A. Multifactorial feature extraction and site prognosis model for protein methylation data. Brief Funct Genomics 2023;22:20-30. [PMID: 36310537 DOI: 10.1093/bfgp/elac034] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2022] [Revised: 09/23/2022] [Accepted: 09/28/2022] [Indexed: 01/24/2023] Open

Zhao J, Jiang H, Zou G, Lin Q, Wang Q, Liu J, Ma L. CNNArginineMe: A CNN structure for training models for predicting arginine methylation sites based on the One-Hot encoding of peptide sequence. Front Genet 2022;13:1036862. [PMID: 36324513 PMCID: PMC9618650 DOI: 10.3389/fgene.2022.1036862] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2022] [Accepted: 10/04/2022] [Indexed: 11/30/2022] Open

Nguyen-Vo TH, Trinh QH, Nguyen L, Nguyen-Hoang PU, Rahardja S, Nguyen BP. iPromoter-Seqvec: identifying promoters using bidirectional long short-term memory and sequence-embedded features. BMC Genomics 2022;23:681. [PMID: 36192696 PMCID: PMC9531353 DOI: 10.1186/s12864-022-08829-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2022] [Accepted: 08/08/2022] [Indexed: 11/30/2022] Open

Abstract

Background

Promoters, non-coding DNA sequences located at upstream regions of the transcription start site of genes/gene clusters, are essential regulatory elements for the initiation and regulation of transcriptional processes. Furthermore, identifying promoters in DNA sequences and genomes significantly contributes to discovering entire structures of genes of interest. Therefore, exploration of promoter regions is one of the most imperative topics in molecular genetics and biology. Besides experimental techniques, computational methods have been developed to predict promoters. In this study, we propose iPromoter-Seqvec – an efficient computational model to predict TATA and non-TATA promoters in human and mouse genomes using bidirectional long short-term memory neural networks in combination with sequence-embedded features extracted from input sequences. The promoter and non-promoter sequences were retrieved from the Eukaryotic Promoter database and then were refined to create four benchmark datasets.

Results

The area under the receiver operating characteristic curve (AUCROC) and the area under the precision-recall curve (AUCPR) were used as two key metrics to evaluate model performance. Results on independent test sets showed that iPromoter-Seqvec outperformed other state-of-the-art methods with AUCROC values ranging from 0.85 to 0.99 and AUCPR values ranging from 0.86 to 0.99. Models predicting TATA promoters in both species had slightly higher predictive power compared to those predicting non-TATA promoters. With a novel idea of constructing artificial non-promoter sequences based on promoter sequences, our models were able to learn highly specific characteristics discriminating promoters from non-promoters to improve predictive efficiency.

Conclusions

iPromoter-Seqvec is a stable and robust model for predicting both TATA and non-TATA promoters in human and mouse genomes. Our proposed method was also deployed as an online web server with a user-friendly interface to support research communities. Links to our source codes and web server are available at https://github.com/mldlproject/2022-iPromoter-Seqvec.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12864-022-08829-6.

Collapse

Ali SD, Tayara H, Chong KT. Interpretable machine learning identification of arginine methylation sites. Comput Biol Med 2022;147:105767. [DOI: 10.1016/j.compbiomed.2022.105767] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2022] [Revised: 06/12/2022] [Accepted: 06/18/2022] [Indexed: 11/25/2022]

Deep Learning-Based Advances In Protein Posttranslational Modification Site and Protein Cleavage Prediction. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022;2499:285-322. [PMID: 35696087 DOI: 10.1007/978-1-0716-2317-6_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]

Lumbanraja FR, Mahesworo B, Cenggoro TW, Sudigyo D, Pardamean B. SSMFN: a fused spatial and sequential deep learning model for methylation site prediction. PeerJ Comput Sci 2021;7:e683. [PMID: 34541311 PMCID: PMC8409337 DOI: 10.7717/peerj-cs.683] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2021] [Accepted: 07/30/2021] [Indexed: 05/12/2023]

Abstract

BACKGROUND

Conventional in vivo methods for post-translational modification site prediction such as spectrophotometry, Western blotting, and chromatin immune precipitation can be very expensive and time-consuming. Neural networks (NN) are one of the computational approaches that can predict effectively the post-translational modification site. We developed a neural network model, namely the Sequential and Spatial Methylation Fusion Network (SSMFN), to predict possible methylation sites on protein sequences.

METHOD

We designed our model to be able to extract spatial and sequential information from amino acid sequences. Convolutional neural networks (CNN) is applied to harness spatial information, while long short-term memory (LSTM) is applied for sequential data. The latent representation of the CNN and LSTM branch are then fused. Afterwards, we compared the performance of our proposed model to the state-of-the-art methylation site prediction models on the balanced and imbalanced dataset.

RESULTS

Our model appeared to be better in almost all measurement when trained on the balanced training dataset. On the imbalanced training dataset, all of the models gave better performance since they are trained on more data. In several metrics, our model also surpasses the PRMePred model, which requires a laborious effort for feature extraction and selection.

CONCLUSION

Our models achieved the best performance across different environments in almost all measurements. Also, our result suggests that the NN model trained on a balanced training dataset and tested on an imbalanced dataset will offer high specificity and low sensitivity. Thus, the NN model for methylation site prediction should be trained on an imbalanced dataset. Since in the actual application, there are far more negative samples than positive samples.

Collapse

Islam MKB, Rahman J, Hasan MAM, Ahmad S. predForm-Site: Formylation site prediction by incorporating multiple features and resolving data imbalance. Comput Biol Chem 2021;94:107553. [PMID: 34384997 DOI: 10.1016/j.compbiolchem.2021.107553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2020] [Revised: 06/22/2021] [Accepted: 07/28/2021] [Indexed: 10/20/2022]

Chaudhari M, Thapa N, Ismail H, Chopade S, Caragea D, Köhn M, Newman RH, Kc DB. DTL-DephosSite: Deep Transfer Learning Based Approach to Predict Dephosphorylation Sites. Front Cell Dev Biol 2021;9:662983. [PMID: 34249915 PMCID: PMC8264445 DOI: 10.3389/fcell.2021.662983] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2021] [Accepted: 05/20/2021] [Indexed: 11/17/2022] Open

Thapa N, Chaudhari M, Iannetta AA, White C, Roy K, Newman RH, Hicks LM, Kc DB. A deep learning based approach for prediction of Chlamydomonas reinhardtii phosphorylation sites. Sci Rep 2021;11:12550. [PMID: 34131195 PMCID: PMC8206365 DOI: 10.1038/s41598-021-91840-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2021] [Accepted: 05/28/2021] [Indexed: 11/23/2022] Open

Abstract

Protein phosphorylation, which is one of the most important post-translational modifications (PTMs), is involved in regulating myriad cellular processes. Herein, we present a novel deep learning based approach for organism-specific protein phosphorylation site prediction in Chlamydomonas reinhardtii, a model algal phototroph. An ensemble model combining convolutional neural networks and long short-term memory (LSTM) achieves the best performance in predicting phosphorylation sites in C. reinhardtii. Deemed Chlamy-EnPhosSite, the measured best AUC and MCC are 0.90 and 0.64 respectively for a combined dataset of serine (S) and threonine (T) in independent testing higher than those measures for other predictors. When applied to the entire C. reinhardtii proteome (totaling 1,809,304 S and T sites), Chlamy-EnPhosSite yielded 499,411 phosphorylated sites with a cut-off value of 0.5 and 237,949 phosphorylated sites with a cut-off value of 0.7. These predictions were compared to an experimental dataset of phosphosites identified by liquid chromatography-tandem mass spectrometry (LC–MS/MS) in a blinded study and approximately 89.69% of 2,663 C. reinhardtii S and T phosphorylation sites were successfully predicted by Chlamy-EnPhosSite at a probability cut-off of 0.5 and 76.83% of sites were successfully identified at a more stringent 0.7 cut-off. Interestingly, Chlamy-EnPhosSite also successfully predicted experimentally confirmed phosphorylation sites in a protein sequence (e.g., RPS6 S245) which did not appear in the training dataset, highlighting prediction accuracy and the power of leveraging predictions to identify biologically relevant PTM sites. These results demonstrate that our method represents a robust and complementary technique for high-throughput phosphorylation site prediction in C. reinhardtii. It has potential to serve as a useful tool to the community. Chlamy-EnPhosSite will contribute to the understanding of how protein phosphorylation influences various biological processes in this important model microalga.

Collapse

Di Blasi R, Blyuss O, Timms JF, Conole D, Ceroni F, Whitwell HJ. Non-Histone Protein Methylation: Biological Significance and Bioengineering Potential. ACS Chem Biol 2021;16:238-250. [PMID: 33411495 DOI: 10.1021/acschembio.0c00771] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]

Arafat ME, Ahmad MW, Shovan S, Dehzangi A, Dipta SR, Hasan MAM, Taherzadeh G, Shatabda S, Sharma A. Accurately Predicting Glutarylation Sites Using Sequential Bi-Peptide-Based Evolutionary Features. Genes (Basel) 2020;11:E1023. [PMID: 32878321 PMCID: PMC7565944 DOI: 10.3390/genes11091023] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2020] [Revised: 08/19/2020] [Accepted: 08/27/2020] [Indexed: 02/07/2023] Open