1. Ezoe A, Shimada Y, Sawada R, Douke A, Shibata T, Kadowaki M, Yamanishi Y. Pathway-based prediction of the therapeutic effects and mode of action of custom-made multiherbal medicines. Mol Inform 2024; 43:e202400108. PMID: 39404192; DOI: 10.1002/minf.202400108. Received: 03/27/2024; Revised: 06/23/2024; Accepted: 06/24/2024; Indexed: 11/14/2024.
Abstract
Multiherbal medicines are traditionally used as personalized medicines with custom combinations of crude drugs; however, the mechanisms of multiherbal medicines are unclear. In this study, we developed a novel pathway-based method to predict therapeutic effects and the mode of action of custom-made multiherbal medicines using machine learning. This method considers disease-related pathways as therapeutic targets and evaluates the comprehensive influence of constituent compounds on their potential target proteins in the disease-related pathways. Our proposed method enabled us to comprehensively predict new indications of 194 Kampo medicines for 87 diseases. Using Kampo-induced transcriptomic data, we demonstrated that Kampo constituent compounds stimulated the disease-related proteins and a customized Kampo formula enhanced the efficacy compared with an existing Kampo formula. The proposed method will be useful for discovering effective Kampo medicines and optimizing custom-made multiherbal medicines in practice.
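The abstract describes scoring a formula by the combined influence of its constituent compounds on target proteins inside disease-related pathways. The toy sketch below illustrates only that general idea and is not the authors' machine-learning method: it assumes each compound comes with a set of predicted target proteins, pools the targets of all compounds in a formula, and scores each pathway by the fraction of its member proteins that are hit. The function name `pathway_coverage` and all compound, protein and pathway names are hypothetical placeholders.

```python
from typing import Dict, Set


def pathway_coverage(
    formula: Dict[str, Set[str]],
    pathways: Dict[str, Set[str]],
) -> Dict[str, float]:
    """Score each pathway by the fraction of its proteins hit by the formula.

    formula:  compound -> predicted target proteins (hypothetical inputs)
    pathways: disease-related pathway -> member proteins (hypothetical inputs)
    """
    # Pool the predicted targets of every constituent compound in the formula.
    targets: Set[str] = set().union(*formula.values())
    scores: Dict[str, float] = {}
    for name, members in pathways.items():
        # Empty pathways get a score of 0 rather than raising ZeroDivisionError.
        scores[name] = len(members & targets) / len(members) if members else 0.0
    return scores


if __name__ == "__main__":
    pathways = {"pathway_X": {"P1", "P2", "P3", "P4"}, "pathway_Y": {"P5", "P6"}}
    base = {"compound_A": {"P1", "P2"}, "compound_B": {"P2", "P5"}}
    # A "customized" formula: the base formula plus one extra hypothetical compound.
    custom = {**base, "compound_C": {"P3", "P6"}}
    print(pathway_coverage(base, pathways))    # {'pathway_X': 0.5, 'pathway_Y': 0.5}
    print(pathway_coverage(custom, pathways))  # {'pathway_X': 0.75, 'pathway_Y': 1.0}
```

Adding a compound can only raise (never lower) coverage in this toy score, which is one simple way to picture why a customized formula might act on more of a disease pathway than an existing one; the published method weighs compound-protein effects with a trained model rather than counting hits.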
Affiliation(s)
- Akihiro Ezoe
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, 820-8502, Japan
  - Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science (CSRS), Yokohama, Kanagawa, 230-0045, Japan
- Yuki Shimada
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, 820-8502, Japan
- Ryusuke Sawada
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, 820-8502, Japan
  - Department of Pharmacology, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Kita-ku, Okayama, 700-8558, Japan
- Akihiro Douke
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, 820-8502, Japan
- Tomokazu Shibata
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, 820-8502, Japan
- Makoto Kadowaki
  - Research Center for Pre-Disease Science, University of Toyama, Sugitani, Toyama, 930-0194, Japan
- Yoshihiro Yamanishi
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, 820-8502, Japan
  - Graduate School of Informatics, Nagoya University, Chikusa, Nagoya, 464-8601, Japan
2. Alidoost M, Wilson JL. Preclinical side effect prediction through pathway engineering of protein interaction network models. CPT Pharmacometrics Syst Pharmacol 2024; 13:1180-1200. PMID: 38736280; PMCID: PMC11247120; DOI: 10.1002/psp4.13150. Received: 11/14/2023; Revised: 03/01/2024; Accepted: 04/08/2024; Indexed: 05/14/2024. Open access.
Abstract
Modeling tools aim to predict potential drug side effects, although their performance is imperfect. Specifically, protein-protein interaction models predict drug effects from proteins surrounding drug targets, but they tend to overpredict drug phenotypes and require well-defined pathway phenotypes. In this study, we used PathFX, a protein-protein interaction tool, to predict side effects for active ingredient-side effect pairs extracted from drug labels. We observed limited performance and therefore defined new pathway phenotypes through pathway engineering, using network-based and gene expression-based approaches. Overall, we discovered a trade-off between sensitivity and specificity and demonstrated a way to limit overprediction for side effects with sufficient true positive examples. We compared our predictions with animal models and demonstrated similar performance metrics, suggesting that protein-protein interaction models do not need perfect evaluation metrics to be useful. Pathway engineering, through the inclusion of true positive examples and omics measurements, emerges as a promising approach to enhance the utility of protein interaction network models for drug effect prediction.
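The sensitivity/specificity trade-off mentioned above can be made concrete with a small sketch. It assumes a fixed universe of candidate (active ingredient, side effect) pairs, a set of known true pairs (for example, from drug labels) and a set of predicted pairs; it does not use or reproduce PathFX itself, and all drug and side-effect names are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (active ingredient, side effect); all names are hypothetical


def sensitivity_specificity(
    predicted: Set[Pair], known: Set[Pair], universe: Set[Pair]
) -> Tuple[float, float]:
    """Return (TP / (TP + FN), TN / (TN + FP)) over a fixed universe of candidate pairs."""
    negatives = universe - known
    tp = len(predicted & known)
    fn = len(known - predicted)
    fp = len(predicted & negatives)
    tn = len(negatives - predicted)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    universe = {(d, e) for d in ("drug_A", "drug_B") for e in ("nausea", "rash", "cough")}
    known = {("drug_A", "nausea"), ("drug_B", "rash")}
    narrow = {("drug_A", "nausea")}
    broad = {("drug_A", "nausea"), ("drug_A", "rash"), ("drug_B", "rash"), ("drug_B", "cough")}
    print(sensitivity_specificity(narrow, known, universe))  # (0.5, 1.0)
    print(sensitivity_specificity(broad, known, universe))   # (1.0, 0.5)
```

Predicting more pairs (the "broad" set) recovers every known side effect but pays for it with false positives, which is the overprediction pattern the abstract describes.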
Affiliation(s)
- Mohammadali Alidoost
  - Department of Bioengineering, University of California, Los Angeles, California, USA
- Jennifer L Wilson
  - Department of Bioengineering, University of California, Los Angeles, California, USA
3. Waseem T, Rajput TA, Mushtaq MS, Babar MM, Rajadas J. Computational biology approaches for drug repurposing. Prog Mol Biol Transl Sci 2024; 205:91-109. PMID: 38789189; DOI: 10.1016/bs.pmbts.2024.03.018. Indexed: 05/26/2024.
Abstract
The drug discovery and development (DDD) process relies heavily on data available in various forms to generate hypotheses for novel drug design. The complex and heterogeneous nature of biological data makes it difficult to use directly or to extract meaningful information from. Computational biology techniques have given us opportunities to better understand biological systems by refining and organizing large amounts of data into actionable, systematic views. Drug repurposing has been used to overcome the lengthy timelines and high costs associated with traditional drug development. It involves discovering new uses for already approved drugs that have established safety and efficacy profiles, so that they need to go through fewer development phases. Drug repurposing through computational biology thus provides a systematic approach to drug development that overcomes the constraints of traditional processes. This chapter covers the basics, approaches and tools of computational biology that can be used to develop repurposing profiles for already approved drug molecules.
Affiliation(s)
- Tanya Waseem
  - Shifa College of Pharmaceutical Sciences, Shifa Tameer-e-Millat University, Islamabad, Pakistan
- Tausif Ahmed Rajput
  - Shifa College of Pharmaceutical Sciences, Shifa Tameer-e-Millat University, Islamabad, Pakistan
- Mustafeez Mujtaba Babar
  - Shifa College of Pharmaceutical Sciences, Shifa Tameer-e-Millat University, Islamabad, Pakistan
  - Advanced Drug Delivery and Regenerative Biomaterials Laboratory, Cardiovascular Institute and Pulmonary and Critical Care Medicine, Stanford University School of Medicine, Stanford University, Palo Alto, CA, United States
- Jayakumar Rajadas
  - Advanced Drug Delivery and Regenerative Biomaterials Laboratory, Cardiovascular Institute and Pulmonary and Critical Care Medicine, Stanford University School of Medicine, Stanford University, Palo Alto, CA, United States
4. Gao Z, Winhusen TJ, Gorenflo M, Ghitza UE, Davis PB, Kaelber DC, Xu R. Repurposing ketamine to treat cocaine use disorder: integration of artificial intelligence-based prediction, expert evaluation, clinical corroboration and mechanism of action analyses. Addiction 2023; 118:1307-1319. PMID: 36792381; PMCID: PMC10631254; DOI: 10.1111/add.16168. Received: 08/26/2022; Accepted: 01/25/2023; Indexed: 02/17/2023.
Abstract
BACKGROUND AND AIMS: Cocaine use disorder (CUD) is a significant public health issue for which there is no Food and Drug Administration (FDA)-approved medication. Drug repurposing looks for new, cost-effective uses of approved drugs. This study presents an integrated strategy to identify FDA-approved drugs that could be repurposed to treat CUD.
DESIGN: Our drug repurposing strategy, which is being implemented in the National Drug Abuse Treatment Clinical Trials Network (CTN), combines artificial intelligence (AI)-based drug prediction, expert panel review, clinical corroboration and mechanism of action analysis. Based on AI-based prediction and expert knowledge, ketamine was ranked as the top candidate for clinical corroboration via electronic health record (EHR) evaluation of CUD patient cohorts prescribed ketamine for anesthesia or depression, compared with matched controls who received non-ketamine anesthesia or antidepressants/midazolam. Genetic and pathway enrichment analyses were performed to understand ketamine's potential mechanisms of action in the context of CUD.
SETTING: The study used TriNetX to access EHRs from more than 90 million patients worldwide. Genetic- and functional-level analyses used the DisGeNET, Search Tool for Interactions of Chemicals (STITCH) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases.
PARTICIPANTS: After propensity score matching, a total of 7742 CUD patients who received anesthesia (3871 ketamine-exposed and 3871 anesthetic-controlled) and 7910 CUD patients with depression (3955 ketamine-exposed and 3955 antidepressant-controlled) were identified.
MEASUREMENTS: The outcome of the EHR analysis was a diagnosis of CUD remission within 1 year of drug prescription.
FINDINGS: Patients with CUD prescribed ketamine for anesthesia had a significantly higher rate of CUD remission than matched individuals prescribed other anesthetics [hazard ratio (HR) = 1.98, 95% confidence interval (CI) = 1.42-2.78]. Similarly, CUD patients prescribed ketamine for depression had a significantly higher CUD remission rate than matched patients prescribed antidepressants or midazolam (HR = 4.39, 95% CI = 2.89-6.68). The mechanism of action analysis revealed that ketamine directly targets multiple CUD-associated genes (BDNF, CNR1, DRD2, GABRA2, GABRB3, GAD1, OPRK1, OPRM1, SLC6A3, SLC6A4) and pathways implicated in neuroactive ligand-receptor interaction, cAMP signaling and cocaine abuse/dependence.
CONCLUSIONS: Ketamine appears to be a promising candidate for repurposing to treat cocaine use disorder.
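The two hazard ratios can be cross-checked against their confidence intervals. Assuming Wald-type intervals that are symmetric on the log scale (the usual construction for Cox models, although the abstract does not state how the intervals were computed), each point estimate should be close to the geometric mean of its bounds: √(1.42 × 2.78) ≈ 1.99 and √(2.89 × 6.68) ≈ 4.39, consistent with the reported 1.98 and 4.39 up to rounding. A minimal sketch of that check, which also recovers the implied standard error of log(HR); the helper name `implied_hr_and_se` is hypothetical:

```python
import math
from typing import Tuple


def implied_hr_and_se(lower: float, upper: float, z: float = 1.96) -> Tuple[float, float]:
    """Back out a ratio estimate and its log-scale standard error from a 95% CI.

    Assumes a Wald interval that is symmetric on the log scale:
    log(HR) +/- z * SE.
    """
    log_l, log_u = math.log(lower), math.log(upper)
    hr = math.exp((log_l + log_u) / 2)  # geometric mean of the two bounds
    se = (log_u - log_l) / (2 * z)      # half-width on the log scale divided by z
    return hr, se


if __name__ == "__main__":
    # Reported values from the abstract: (label, HR, lower 95% bound, upper 95% bound).
    for label, reported, lo, hi in [
        ("ketamine vs other anesthetics", 1.98, 1.42, 2.78),
        ("ketamine vs antidepressants/midazolam", 4.39, 2.89, 6.68),
    ]:
        hr, se = implied_hr_and_se(lo, hi)
        print(f"{label}: reported HR {reported}, implied HR {hr:.2f}, SE(log HR) {se:.3f}")
```

A large gap between the reported and implied estimates would suggest a typo or a non-Wald interval; here both agree to within the two-decimal rounding of the abstract.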
Affiliation(s)
- Zhenxiang Gao
  - Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- T. John Winhusen
  - Center for Addiction Research, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Maria Gorenflo
  - Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
  - Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Udi E. Ghitza
  - Center for the Clinical Trials Network (CCTN), National Institute on Drug Abuse (NIDA), National Institutes of Health (NIH), Bethesda, MD, USA
- Pamela B. Davis
  - Center for Community Health Integration, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- David C. Kaelber
  - Center for Clinical Informatics Research and Education, The Metro Health System, Cleveland, OH, USA
- Rong Xu
  - Center for Artificial Intelligence in Drug Discovery, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
5. Koutroumpa NM, Papavasileiou KD, Papadiamantis AG, Melagraki G, Afantitis A. A Systematic Review of Deep Learning Methodologies Used in the Drug Discovery Process with Emphasis on In Vivo Validation. Int J Mol Sci 2023; 24:6573. PMID: 37047543; PMCID: PMC10095548; DOI: 10.3390/ijms24076573. Received: 12/18/2022; Revised: 03/24/2023; Accepted: 03/28/2023; Indexed: 04/05/2023. Open access.
Abstract
The discovery and development of new drugs are extremely long and costly processes. Recent progress in artificial intelligence has had a positive impact on the drug development pipeline. Numerous challenges have been addressed through the growing exploitation of drug-related data and advances in deep learning technology. Several model frameworks have been proposed to enhance the performance of deep learning algorithms in molecular design. However, only a few have had an immediate impact on drug development, since computational results may not be confirmed experimentally. This systematic review summarizes the deep learning architectures that have been used in the drug discovery process and validated with subsequent in vivo experiments. In each study presented, the molecule or peptide generated or identified by the deep learning model was biologically evaluated in animal models. These state-of-the-art studies show that, although artificial intelligence in drug discovery is still in its infancy, it has great potential to accelerate the drug discovery cycle, reduce costs, and contribute to the integration of the 3R (Replacement, Reduction, Refinement) principles. Across the reviewed articles, seven algorithms were identified: recurrent neural networks, specifically long short-term memory networks (LSTM-RNNs); autoencoders (AEs) and their Wasserstein (WAE) and variational (VAE) variants; convolutional neural networks (CNNs); directed message passing neural networks (D-MPNNs); and multitask deep neural networks (MTDNNs). LSTM-RNNs were the most used architectures, taking molecules or peptide sequences as inputs.
Affiliation(s)
- Nikoletta-Maria Koutroumpa
  - Department of ChemoInformatics, NovaMechanics Ltd., Nicosia 1070, Cyprus
  - School of Chemical Engineering, National Technical University of Athens, 157 80 Athens, Greece
  - Division of Data Driven Innovation, Entelos Institute, Larnaca 6059, Cyprus
- Konstantinos D. Papavasileiou
  - Department of ChemoInformatics, NovaMechanics Ltd., Nicosia 1070, Cyprus
  - Division of Data Driven Innovation, Entelos Institute, Larnaca 6059, Cyprus
  - Department of ChemoInformatics, NovaMechanics MIKE., 185 45 Piraeus, Greece
- Anastasios G. Papadiamantis
  - Department of ChemoInformatics, NovaMechanics Ltd., Nicosia 1070, Cyprus
  - Division of Data Driven Innovation, Entelos Institute, Larnaca 6059, Cyprus
- Georgia Melagraki
  - Division of Physical Sciences & Applications, Hellenic Military Academy, 166 73 Vari, Greece
- Antreas Afantitis
  - Department of ChemoInformatics, NovaMechanics Ltd., Nicosia 1070, Cyprus
  - Division of Data Driven Innovation, Entelos Institute, Larnaca 6059, Cyprus
  - Department of ChemoInformatics, NovaMechanics MIKE., 185 45 Piraeus, Greece
6. De Vita S, Chini MG, Bifulco G, Lauro G. Target identification by structure-based computational approaches: Recent advances and perspectives. Bioorg Med Chem Lett 2023; 83:129171. PMID: 36739998; DOI: 10.1016/j.bmcl.2023.129171. Received: 08/05/2022; Revised: 12/15/2022; Accepted: 02/01/2023; Indexed: 02/05/2023.
Abstract
The use of computational techniques in the early stages of drug discovery has recently grown considerably, especially in the target identification step. Finding the biological partner(s) of new or existing synthetic and/or natural compounds by "wet" experimental approaches can be challenging; preliminary in silico screening is therefore increasingly recommended. After a brief overview of some of the best-known target identification techniques, this digest reports recent advances in structure-based computational approaches for target identification, focusing on Inverse Virtual Screening and its recent applications. Future perspectives on the use of these methodologies, alone or combined with other approaches, are also analyzed.
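Inverse Virtual Screening reverses the usual direction of virtual screening: instead of docking many compounds into one protein, a single compound is docked against a panel of protein structures and the proteins are ranked as candidate targets. The sketch below covers only the final ranking step and assumes the docking scores have already been produced by an external docking program; the target names, score values, the cutoff and the helper name `rank_targets` are all hypothetical.

```python
from typing import Dict, List, Tuple


def rank_targets(docking_scores: Dict[str, float], cutoff: float = 0.0) -> List[Tuple[str, float]]:
    """Rank candidate protein targets for a single ligand.

    docking_scores: target name -> docking score for the ligand (hypothetical values;
    more negative means a more favorable predicted binding energy).
    Targets scoring above `cutoff` are discarded before ranking.
    """
    kept = [(target, s) for target, s in docking_scores.items() if s <= cutoff]
    # Best (most negative) score first; Python's sort is stable, so ties keep input order.
    return sorted(kept, key=lambda item: item[1])


if __name__ == "__main__":
    scores = {"target_1": -9.2, "target_2": -6.1, "target_3": -10.4, "target_4": -4.0}
    print(rank_targets(scores, cutoff=-6.0))
    # [('target_3', -10.4), ('target_1', -9.2), ('target_2', -6.1)]
```

Raw docking scores are not always comparable across different proteins, so practical Inverse Virtual Screening pipelines often normalize scores per target before ranking; that step is deliberately left out of this sketch.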
Affiliation(s)
- Simona De Vita
  - Department of Pharmacy, University of Salerno, Via Giovanni Paolo II 132, 84084 Fisciano (SA), Italy
- Maria Giovanna Chini
  - Department of Biosciences and Territory, University of Molise, Contrada Fonte Lappone, 86090 Pesche (IS), Italy
- Giuseppe Bifulco
  - Department of Pharmacy, University of Salerno, Via Giovanni Paolo II 132, 84084 Fisciano (SA), Italy
- Gianluigi Lauro
  - Department of Pharmacy, University of Salerno, Via Giovanni Paolo II 132, 84084 Fisciano (SA), Italy
7. Qin S, Li W, Yu H, Xu M, Li C, Fu L, Sun S, He Y, Lv J, He W, Chen L. Guiding Drug Repositioning for Cancers Based on Drug Similarity Networks. Int J Mol Sci 2023; 24:2244. PMID: 36768566; PMCID: PMC9917231; DOI: 10.3390/ijms24032244. Received: 11/27/2022; Revised: 01/05/2023; Accepted: 01/16/2023; Indexed: 01/24/2023. Open access.
Abstract
Drug repositioning aims to discover novel clinical benefits of existing drugs. It is an effective way to develop drugs for complex diseases such as cancer and may speed up the traditional drug development process. Network-based computational biology approaches, which integrate information from different sources to understand the relationships between biomolecules, have been successfully applied to drug repurposing. In this work, we developed a new strategy for network-based drug repositioning against cancer. Combining the mechanism of action and the clinical efficacy of the drugs, we constructed a cancer-related drug similarity network and quantified a correlation score between each drug and a specific cancer. The top 5% of scoring drugs were reviewed for stability and druggable potential to identify candidate repositionable drugs. Of the 11 potentially repurposable drugs identified for non-small cell lung cancer (NSCLC), 10 were confirmed by clinical trial articles and databases. The targets of these drugs were significantly enriched in cancer-related pathways and significantly associated with NSCLC prognosis. Given that the approach was also applied successfully to colorectal cancer, it offers useful clues and a valuable perspective for drug repurposing in cancer.
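The abstract does not specify how the drug-cancer correlation score is computed from the similarity network, so the sketch below covers only the final selection step: given a score per drug (hypothetical inputs), keep the top 5% for manual review. How the authors rounded the 5% cut-off is not stated; rounding up, so that at least one drug is always kept, is an assumption, and `top_percent` is a hypothetical helper name.

```python
from typing import Dict, List


def top_percent(scores: Dict[str, float], percent: int = 5) -> List[str]:
    """Return the drugs in the top `percent` of correlation scores, best first.

    The count is rounded up with integer arithmetic, so at least one drug is
    returned for any non-empty input (an assumption; the abstract does not say
    how the 5% cut-off was rounded).
    """
    k = -(-len(scores) * percent // 100)  # ceil(len * percent / 100) without floats
    # sorted() stays stable with reverse=True, so tied drugs keep their input order.
    ranked = sorted(scores, key=lambda drug: scores[drug], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # 40 hypothetical drugs with made-up scores 0.00 ... 0.39.
    scores = {f"drug_{i:02d}": i / 100 for i in range(40)}
    print(top_percent(scores))  # ['drug_39', 'drug_38']
```

Integer ceiling division avoids floating-point surprises when the number of drugs times 5% lands exactly on a whole number.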
Affiliation(s)
- Shimei Qin
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Wan Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Hongzheng Yu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Manyi Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Chao Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Lei Fu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Shibin Sun
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yuehan He
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Junjie Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Weiming He
  - Institute of Opto-Electronics, Harbin Institute of Technology, Harbin 150001, China
- Lina Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
  - Corresponding author; Tel.: +86-451-8667-4768
8. Song T, Dai H, Wang S, Wang G, Zhang X, Zhang Y, Jiao L. TransCluster: A Cell-Type Identification Method for single-cell RNA-Seq data using deep learning based on transformer. Front Genet 2022; 13:1038919. PMID: 36303549; PMCID: PMC9592860; DOI: 10.3389/fgene.2022.1038919. Received: 09/07/2022; Accepted: 09/23/2022; Indexed: 11/25/2022. Open access.
Abstract
Recent advances in single-cell RNA sequencing (scRNA-seq) have accelerated the development of techniques to classify thousands of cells through transcriptome profiling. As more scRNA-seq data become available, supervised cell-type classification methods that use well-annotated external source data are becoming more popular than unsupervised clustering algorithms. However, accurate cellular annotation of single-cell transcriptomic data remains a significant challenge. Here, we propose TransCluster, a hybrid network structure that uses linear discriminant analysis and a modified Transformer to enhance feature learning. It is a cell-type identification tool for single-cell transcriptomic maps. It shows high accuracy and robustness on many cell datasets from different human tissues and outperforms other known methods on an external test dataset. To our knowledge, TransCluster is the first attempt to use a Transformer for annotating cell types in scRNA-seq data, and it greatly improves the accuracy of cell-type identification.
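The abstract does not describe how TransCluster modifies the Transformer, so the sketch below shows only the standard scaled dot-product attention at the core of any Transformer layer, softmax(QKᵀ/√d_k)V, written in plain Python on toy inputs. It is not TransCluster's architecture: it omits learned query/key/value projections, multiple heads, and the linear discriminant analysis step the authors combine it with, and the input numbers are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b (zip(*b) iterates over the columns of b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for one attention head, without masking."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose K
    scores = matmul(q, k_t)
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), v)


if __name__ == "__main__":
    # Three toy "tokens" with 2-dimensional features (hypothetical numbers).
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Self-attention: queries, keys and values all come from the same input here;
    # a real Transformer layer would first apply learned projections to x.
    for row in scaled_dot_product_attention(x, x, x):
        print([round(val, 3) for val in row])
```

Each output row is a weighted average of the value rows, with weights set by how strongly that query matches each key; the √d_k scaling keeps the softmax from saturating as the feature dimension grows.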
Affiliation(s)
- Tao Song
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
  - Department of Artificial Intelligence, Faculty of Computer Science, Campus de Montegancedo, Polytechnical University of Madrid, Boadilla Del Monte, Madrid, Spain
  - Corresponding author
- Huanhuan Dai
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Shuang Wang
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
  - Corresponding author
- Gan Wang
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Xudong Zhang
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Ying Zhang
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
- Linfang Jiao
  - College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, China
9. Multi-TransDTI: Transformer for Drug–Target Interaction Prediction Based on Simple Universal Dictionaries with Multi-View Strategy. Biomolecules 2022; 12:644. PMID: 35625572; PMCID: PMC9138327; DOI: 10.3390/biom12050644. Received: 03/31/2022; Revised: 04/19/2022; Accepted: 04/25/2022; Indexed: 01/03/2023. Open access.
Abstract
Prediction of drug–target interactions has always been a crucial link in drug discovery and repositioning, both of which have seen tremendous progress in recent years. Despite many efforts, existing representation learning and feature generation approaches for drugs and proteins remain complicated and high-dimensional. In addition, it is difficult for current methods to extract locally important residues from sequence information while remaining focused on global structure. At the same time, massive datasets are not always easily accessible, which makes learning from small datasets a pressing need. We therefore propose an end-to-end learning model that uses the SUPD and SUDD methods to encode drugs and proteins, which not only omits the complicated feature extraction process but also greatly reduces the dimension of the embedding matrix. We also use a multi-view strategy with a transformer to extract locally important residues of proteins for better representation learning. Finally, we evaluate our model on the BindingDB dataset against several state-of-the-art models using a comprehensive set of metrics. On the full (100%) BindingDB dataset, our AUC, AUPR, ACC, and F1-score reached 90.9%, 89.8%, 84.2%, and 84.3%, respectively, exceeding the average values of the other models by 2.2%, 2.3%, 2.6%, and 2.6%, respectively. Our model also generally outperforms them on the 30% and 50% BindingDB datasets.
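For readers less familiar with the four reported metrics, the sketch below computes them for binary interaction labels. AUC is computed as the probability that a randomly chosen positive pair outscores a randomly chosen negative one; the abstract does not say how AUPR was computed, so average precision is used here as one common estimator (an assumption). The 0.5 decision threshold for ACC and F1, the toy labels and scores, and the function names are all hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels (ACC)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean precision at the rank of each positive (one AUPR estimator; ties not handled)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / sum(y_true)


if __name__ == "__main__":
    y_true = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.2]  # hypothetical interaction probabilities
    y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold
    print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))  # 0.5 0.5
    print(roc_auc(y_true, scores))                             # 0.75
    print(round(average_precision(y_true, scores), 4))         # 0.8333
```

AUC and AUPR depend only on how the scores rank the pairs, whereas ACC and F1 also depend on the chosen threshold, which is why a model can improve on one pair of metrics more than the other.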
10. Wang X, Zhang Z, Zhang C, Meng X, Shi X, Qu P. TransPhos: A Deep-Learning Model for General Phosphorylation Site Prediction Based on Transformer-Encoder Architecture. Int J Mol Sci 2022; 23:4263. PMID: 35457080; PMCID: PMC9029334; DOI: 10.3390/ijms23084263. Received: 03/10/2022; Revised: 04/04/2022; Accepted: 04/09/2022; Indexed: 02/06/2023. Open access.
Abstract
Protein phosphorylation is one of the most critical post-translational modifications of proteins in eukaryotes and is essential for a variety of biological processes. Many attempts have been made to improve the performance of computational predictors of phosphorylation sites; however, most of them rely on extra domain knowledge or feature selection. In this article, we present TransPhos, a novel deep learning-based predictor of phosphorylation sites built from a transformer encoder and densely connected convolutional neural network blocks. Experiments were conducted on the PPA (version 3.0) and Phospho.ELM datasets. The results show that TransPhos performs better than several deep learning models, including convolutional neural networks (CNN), long short-term memory networks (LSTM), recurrent neural networks (RNN) and fully connected neural networks (FCNN), as well as several state-of-the-art prediction tools, including GPS2.1, NetPhos, PPRED, Musite, PhosphoSVM, SKIPHOS, and DeepPhos. Our model achieves good performance on the training datasets for serine (S), threonine (T), and tyrosine (Y), with AUC values of 0.8579, 0.8335, and 0.6953, respectively, in 10-fold cross-validation tests, demonstrating that TransPhos considerably outperforms competing predictors in general protein phosphorylation site prediction.
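The AUC values above come from 10-fold cross-validation. A minimal sketch of how such folds can be generated follows: samples are shuffled once with a fixed seed and split into k disjoint test folds, with each model trained on the remaining k - 1 folds. Whether TransPhos used plain or stratified folds is not stated in the abstract, so plain k-fold splitting here is an assumption, and `k_fold_indices` is a hypothetical helper name.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) for k-fold cross-validation over n samples.

    Each sample appears in exactly one test fold; fold sizes differ by at most one.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # reproducible shuffle before splitting
    # The first n % k folds get one extra sample so that all n samples are used.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    for fold, (train, test) in enumerate(k_fold_indices(23, k=10, seed=1), start=1):
        print(fold, len(train), len(test))
```

A per-residue AUC such as the ones reported would then typically be obtained by scoring each held-out fold and averaging (or pooling) the results; with imbalanced phosphosite data, stratifying folds by class is a common refinement not shown here.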
Affiliation(s)
- Xun Wang
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266555, China
  - State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
  - Corresponding author
- Zhiyuan Zhang
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266555, China
- Chaogang Zhang
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266555, China
- Xiangyu Meng
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266555, China
- Xin Shi
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266555, China
- Peng Qu
  - College of Computer Science and Technology, China University of Petroleum, Qingdao 266555, China