101
Cao R, Wang M, Bin Y, Zheng C. DLFF-ACP: prediction of ACPs based on deep learning and multi-view features fusion. PeerJ 2021; 9:e11906. [PMID: 34414035 PMCID: PMC8344685 DOI: 10.7717/peerj.11906] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/04/2021] [Accepted: 07/14/2021] [Indexed: 01/10/2023]
Abstract
An emerging type of therapeutic agent, anticancer peptides (ACPs), has attracted attention because of its lower risk of toxic side effects. However, identifying ACPs using experimental methods is both time-consuming and laborious. In this study, we developed a new and efficient algorithm that predicts ACPs by fusing multi-view features in a dual-channel deep neural network ensemble model. In the model, one channel used a convolutional neural network (CNN) to automatically extract the potential spatial features of a sequence. The other channel was used to process handcrafted features and extract more effective features from them. Additionally, an effective feature fusion method was explored for the mutual fusion of different features. Finally, we adopted a neural network to predict ACPs based on the fused features. Performance comparisons across the single and fused features showed that the fusion of multi-view features could effectively improve the model's predictive ability. Among these, the fusion of the features extracted by the CNN and the composition of k-spaced amino acid group pairs achieved the best performance. To further validate the performance of our model, we compared it with other existing methods using two independent test sets. The results showed that our model's area under the curve was 0.90, which was higher than that of the other existing methods on the first test set and higher than most of the other existing methods on the second test set. The source code and datasets are available at https://github.com/wame-ng/DLFF-ACP.
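The handcrafted feature named in this abstract, the composition of k-spaced amino acid group pairs, can be illustrated with a minimal sketch. The abstract does not spell out the encoding, so the five residue groups, the `max_gap` parameter and the function name `cksaagp` below are assumptions based on the commonly used form of this descriptor, not the authors' implementation.

```python
from itertools import product

# Five physicochemical residue groups (an assumption: this is the grouping
# commonly used for this descriptor; the abstract does not list the groups).
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}
AA_TO_GROUP = {aa: name for name, aas in GROUPS.items() for aa in aas}
GROUP_PAIRS = [f"{a}.{b}" for a, b in product(GROUPS, repeat=2)]  # 25 ordered pairs


def cksaagp(sequence, max_gap=2):
    """Return group-pair frequencies for residue pairs separated by 0..max_gap residues."""
    features = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(GROUP_PAIRS, 0)
        total = 0
        for i in range(len(sequence) - gap - 1):
            first = AA_TO_GROUP.get(sequence[i])
            second = AA_TO_GROUP.get(sequence[i + gap + 1])
            if first is None or second is None:
                continue  # skip non-standard residues
            counts[f"{first}.{second}"] += 1
            total += 1
        # Normalise by the number of counted pairs; all zeros if the sequence is too short.
        features.extend(counts[p] / total if total else 0.0 for p in GROUP_PAIRS)
    return features


vector = cksaagp("GLFDIVKKVVGALGSL", max_gap=2)  # illustrative sequence only
```

With `max_gap=2` the vector has 3 × 25 = 75 entries, and the 25 entries for each gap sum to 1 whenever at least one pair is counted. In a dual-channel design such as the one described, a vector of this kind would feed the handcrafted-feature channel.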
Affiliation(s)
- Ruifen Cao
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
  - Engineering Research Center of Big Data Application in Private Health Medicine, Fujian Province University, Putian, Fujian, China
- Meng Wang
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
- Yannan Bin
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
  - Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China
- Chunhou Zheng
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
  - Engineering Research Center of Big Data Application in Private Health Medicine, Fujian Province University, Putian, Fujian, China

102
Chen J, Cheong HH, Siu SWI. xDeep-AcPEP: Deep Learning Method for Anticancer Peptide Activity Prediction Based on Convolutional Neural Network and Multitask Learning. J Chem Inf Model 2021; 61:3789-3803. [PMID: 34327990 DOI: 10.1021/acs.jcim.1c00181] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Indexed: 12/30/2022]
Abstract
Cancer is one of the leading causes of death worldwide. Conventional cancer treatment relies on radiotherapy and chemotherapy, but both methods bring severe side effects to patients, as these therapies not only attack cancer cells but also damage normal cells. Anticancer peptides (ACPs) are a promising alternative as therapeutic agents that are efficient and selective against tumor cells. Here, we propose a deep learning method based on convolutional neural networks to predict biological activity (EC50, LC50, IC50, and LD50) against six tumor cells, including breast, colon, cervix, lung, skin, and prostate. We show that models derived with multitask learning achieve better performance than conventional single-task models. In repeated 5-fold cross validation using the CancerPPD data set, the best models with the applicability domain defined obtain an average mean squared error of 0.1758, Pearson's correlation coefficient of 0.8086, and Kendall's correlation coefficient of 0.6156. As a step toward model interpretability, we infer the contribution of each residue in the sequence to the predicted activity by means of feature importance weights derived from the convolutional layers of the model. The present method, referred to as xDeep-AcPEP, will help to identify effective ACPs in rational peptide design for therapeutic purposes. The data, script files for reproducing the experiments, and the final prediction models can be downloaded from http://github.com/chen709847237/xDeep-AcPEP. The web server to directly access this prediction method is at https://app.cbbio.online/acpep/home.
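The three figures reported in this abstract (mean squared error, Pearson's correlation coefficient and Kendall's correlation coefficient) can be computed as in the sketch below. The abstract does not say which Kendall variant was used, so the tau-a form without tie correction is an assumption, and the function names are illustrative.

```python
from math import sqrt


def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted activity."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Assumption: no tie correction is applied (tau-b would differ when ties occur).
    """
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

For example, `kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])` has five concordant pairs and one discordant pair out of six, giving 4/6. Pearson's coefficient rewards linear agreement, while Kendall's coefficient only checks whether the predicted ranking of peptides matches the measured ranking.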
Affiliation(s)
- Jiarui Chen
  - Department of Computer and Information Science, University of Macau, Avenida da Universidade, Taipa, Macau 999078, China
- Hong Hin Cheong
  - Department of Computer and Information Science, University of Macau, Avenida da Universidade, Taipa, Macau 999078, China
- Shirley W I Siu
  - Department of Computer and Information Science, University of Macau, Avenida da Universidade, Taipa, Macau 999078, China
  - School of Pharmaceutical Sciences, Universiti Sains Malaysia, 11800 USM, Penang, Malaysia

103
He W, Wang Y, Cui L, Su R, Wei L. Learning embedding features based on multi-sense-scaled attention architecture to improve the predictive performance of anticancer peptides. Bioinformatics 2021; 37:4684-4693. [PMID: 34323948 DOI: 10.1093/bioinformatics/btab560] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 05/08/2021] [Revised: 07/03/2021] [Accepted: 07/28/2021] [Indexed: 01/10/2023]
Abstract
MOTIVATION Anticancer peptides (ACPs) have recently emerged as effective anticancer drugs in cancer therapy. Machine-learning-based predictors have been developed to identify ACPs and achieve satisfactory performance. However, existing methods suffer from experience-based feature engineering, which not only restricts the representation ability of the models to a certain extent but also lacks adaptivity for different data, limiting the further improvement of the predictive performance and impacting the robustness of the predictive models. To alleviate the above problems, we propose a novel deep-learning-based predictor named ACPred-LAF, in which we propose a novel multi-sense and multi-scaled embedding algorithm to automatically learn and extract context sequential characteristics of ACPs. RESULTS Through the feature comparative analysis, we demonstrate that our learnable and self-adaptive embedding features are better than hand-crafted features in capturing discriminative information, which can effectively benefit the performance improvement for ACP prediction. In addition, benchmarking comparison results demonstrate that our ACPred-LAF outperforms the state-of-the-art methods both on existing benchmark datasets and our newly constructed dataset. Furthermore, we also prove and validate the robustness of the model via the data interference experiment. To avoid potential evaluation bias, here we construct a new ACP benchmark dataset named ACP-Mixed by integrating existing datasets. We expect our newly constructed dataset to be a golden standard benchmark dataset in this field. To facilitate the use of our model, we develop a web server as the implementation of ACPred-LAF. AVAILABILITY Our proposed ACPred-LAF, newly constructed benchmark dataset ACP-Mixed are open source collaborative initiatives available in the GitHub repository (https://github.com/TearsWaiting/ACPred-LAF). 
In addition, a web server implementing ACPred-LAF can be accessed at http://server.malab.cn/ACPred-LAF. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wenjia He
  - School of Software, Shandong University, Jinan, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Yu Wang
  - School of Software, Shandong University, Jinan, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Lizhen Cui
  - School of Software, Shandong University, Jinan, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Ran Su
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Leyi Wei
  - School of Software, Shandong University, Jinan, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China

104
Liang X, Li F, Chen J, Li J, Wu H, Li S, Song J, Liu Q. Large-scale comparative review and assessment of computational methods for anti-cancer peptide identification. Brief Bioinform 2021; 22:bbaa312. [PMID: 33316035 PMCID: PMC8294543 DOI: 10.1093/bib/bbaa312] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 08/18/2020] [Revised: 09/30/2020] [Accepted: 08/25/2020] [Indexed: 12/13/2022]
Abstract
Anti-cancer peptides (ACPs) are known as potential therapeutics for cancer. Due to their unique ability to target cancer cells without affecting healthy cells directly, they have been extensively studied. Many peptide-based drugs are currently evaluated in the preclinical and clinical trials. Accurate identification of ACPs has received considerable attention in recent years; as such, a number of machine learning-based methods for in silico identification of ACPs have been developed. These methods promote the research on the mechanism of ACPs therapeutics against cancer to some extent. There is a vast difference in these methods in terms of their training/testing datasets, machine learning algorithms, feature encoding schemes, feature selection methods and evaluation strategies used. Therefore, it is desirable to summarize the advantages and disadvantages of the existing methods, provide useful insights and suggestions for the development and improvement of novel computational tools to characterize and identify ACPs. With this in mind, we firstly comprehensively investigate 16 state-of-the-art predictors for ACPs in terms of their core algorithms, feature encoding schemes, performance evaluation metrics and webserver/software usability. Then, comprehensive performance assessment is conducted to evaluate the robustness and scalability of the existing predictors using a well-prepared benchmark dataset. We provide potential strategies for the model performance improvement. Moreover, we propose a novel ensemble learning framework, termed ACPredStackL, for the accurate identification of ACPs. ACPredStackL is developed based on the stacking ensemble strategy combined with SVM, Naïve Bayesian, lightGBM and KNN. Empirical benchmarking experiments against the state-of-the-art methods demonstrate that ACPredStackL achieves a comparative performance for predicting ACPs. 
The web server and source code of ACPredStackL are freely available at http://bigdata.biocie.cn/ACPredStackL/ and https://github.com/liangxiaoq/ACPredStackL, respectively.
Affiliation(s)
- Xiao Liang
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
  - Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling, Shaanxi 712100, China
- Fuyi Li
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Centre for Data Science, Monash University, Melbourne, VIC 3800, Australia
  - Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia
- Jinxiang Chen
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Junlong Li
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Hao Wu
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Shuqin Li
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
  - Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling, Shaanxi 712100, China
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Centre for Data Science, Monash University, Melbourne, VIC 3800, Australia
  - ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Quanzhong Liu
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
  - Shaanxi Key Laboratory of Agricultural Information Perception and Intelligent Service, Yangling, Shaanxi 712100, China

105
Wan Y, Wang Z, Lee TY. Incorporating support vector machine with sequential minimal optimization to identify anticancer peptides. BMC Bioinformatics 2021; 22:286. [PMID: 34051755 PMCID: PMC8164238 DOI: 10.1186/s12859-021-03965-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/20/2020] [Accepted: 01/08/2021] [Indexed: 12/09/2022]
Abstract
BACKGROUND Cancer is one of the major causes of death worldwide. To treat cancer, the use of anticancer peptides (ACPs) has attracted increased attention in recent years. ACPs are a unique group of small molecules that can target and kill cancer cells quickly and directly. However, identifying ACPs by wet-lab experiments is time-consuming and labor-intensive. Therefore, it is important to develop computational tools for ACP prediction. Though some ACP prediction tools have been developed recently, their performance is not good enough and most of them do not offer a function to distinguish ACPs from antimicrobial peptides (AMPs). Given that a growing number of studies have shown that some AMPs exhibit anticancer function, this work tries to build a model for distinguishing AMPs from ACPs in addition to a model that predicts ACPs from whole peptides. RESULTS This study chooses amino acid composition, N5C5, k-space and position-specific scoring matrix (PSSM) as features, and analyzes them by machine learning methods, including support vector machine (SVM) and sequential minimal optimization (SMO), to build a model (model 2) for distinguishing ACPs from whole peptides. Another model (model 1) that distinguishes ACPs from AMPs is also developed. Compared with previous models, the models developed in this research show better performance (accuracy: 85.5% for model 1 and 95.2% for model 2). CONCLUSIONS This work utilizes a new feature, PSSM, which contributes to better performance than other features. In addition to SVM, SMO is used in this research for optimizing SVM, and the SMO-optimized models show better performance than non-optimized models. Last but not least, this work provides two different functions: distinguishing ACPs from AMPs and distinguishing ACPs from all peptides. The second SMO-optimized model, which utilizes PSSM as a feature, performs better than all other existing tools.
Affiliation(s)
- Yu Wan
  - School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen, Shenzhen, 518172, Guangdong, People's Republic of China
- Zhuo Wang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Shenzhen, 518172, Guangdong, People's Republic of China
- Tzong-Yi Lee
  - School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen, Shenzhen, 518172, Guangdong, People's Republic of China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Shenzhen, 518172, Guangdong, People's Republic of China

106
Zhao Y, Wang S, Fei W, Feng Y, Shen L, Yang X, Wang M, Wu M. Prediction of Anticancer Peptides with High Efficacy and Low Toxicity by Hybrid Model Based on 3D Structure of Peptides. Int J Mol Sci 2021; 22:5630. [PMID: 34073203 PMCID: PMC8198792 DOI: 10.3390/ijms22115630] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/15/2021] [Revised: 04/30/2021] [Accepted: 05/19/2021] [Indexed: 02/07/2023]
Abstract
Recently, anticancer peptides (ACPs) have emerged as unique and promising therapeutic agents for cancer treatment compared with antibody and small molecule drugs. In addition to experimental methods of ACP discovery, it is also necessary to develop accurate machine learning models for ACP prediction. In this study, features were extracted from the three-dimensional (3D) structure of peptides to develop the model, in contrast to most previous computational models, which are based on sequence information. In order to develop ACPs with more potency, more selectivity and less toxicity, separate models for predicting ACPs, hemolytic peptides and toxic peptides were established from the 3D structure of peptides. Multiple datasets were collected according to whether the peptide sequence was chemically modified. After feature extraction and screening, diverse algorithms were used to build the models. Twelve models with excellent performance (Acc > 90%) on the ACP mixed datasets were used to form a hybrid model to predict the candidate ACPs, and then the optimal models for hemolytic peptides (Acc = 73.68%) and toxic peptides (Acc = 85.5%) were used for safety prediction. Novel ACPs were found by using those models, and five peptides were randomly selected to determine their anticancer activity and toxic side effects in in vitro experiments.
Affiliation(s)
- Min Wang
  - State Key Laboratory of Natural Medicines, School of Life Science and Technology, China Pharmaceutical University, Nanjing 210009, China; (Y.Z.); (S.W.); (W.F.); (Y.F.); (L.S.); (X.Y.)
- Min Wu
  - State Key Laboratory of Natural Medicines, School of Life Science and Technology, China Pharmaceutical University, Nanjing 210009, China; (Y.Z.); (S.W.); (W.F.); (Y.F.); (L.S.); (X.Y.)

107
Charoenkwan P, Chiangjong W, Nantasenamat C, Hasan MM, Manavalan B, Shoombuatong W. StackIL6: a stacking ensemble model for improving the prediction of IL-6 inducing peptides. Brief Bioinform 2021; 22:6271998. [PMID: 33963832 DOI: 10.1093/bib/bbab172] [Citation(s) in RCA: 95] [Impact Index Per Article: 23.8] [Received: 01/15/2021] [Revised: 03/30/2021] [Accepted: 04/10/2021] [Indexed: 12/13/2022]
Abstract
The release of interleukin (IL)-6 is stimulated by antigenic peptides from pathogens as well as by immune cells for activating aggressive inflammation. IL-6 inducing peptides are derived from pathogens and can be used as diagnostic biomarkers for predicting various stages of disease severity as well as being used as IL-6 inhibitors for the suppression of aggressive multi-signaling immune responses. Thus, the accurate identification of IL-6 inducing peptides is of great importance for investigating their mechanism of action as well as for developing diagnostic and immunotherapeutic applications. This study proposes a novel stacking ensemble model (termed StackIL6) for accurately identifying IL-6 inducing peptides. More specifically, StackIL6 was constructed from twelve different feature descriptors derived from three major groups of features (composition-based features, composition-transition-distribution-based features and physicochemical properties-based features) and five popular machine learning algorithms (extremely randomized trees, logistic regression, multi-layer perceptron, support vector machine and random forest). To enhance the utility of baseline models, they were effectively and systematically integrated through a stacking strategy to build the final meta-based model. Extensive benchmarking experiments demonstrated that StackIL6 could achieve significantly better performance than the existing method (IL6PRED) and outperformed its constituent baseline models on both training and independent test datasets, which thereby support its excellent discrimination and generalization abilities. To facilitate easy access to the StackIL6 model, it was established as a freely available web server accessible at http://camt.pythonanywhere.com/StackIL6. It is anticipated that StackIL6 can help to facilitate rapid screening of promising IL-6 inducing peptides for the development of diagnostic and immunotherapeutic applications in the future.
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
- Wararat Chiangjong
  - Pediatric Translational Research Unit, Department of Pediatrics, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok 10400, Thailand
- Chanin Nantasenamat
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Md Mehedi Hasan
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Watshara Shoombuatong
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand

108
Kardani K, Bolhassani A. Antimicrobial/anticancer peptides: bioactive molecules and therapeutic agents. Immunotherapy 2021; 13:669-684. [PMID: 33878901 DOI: 10.2217/imt-2020-0312] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Indexed: 01/10/2023]
Abstract
Antimicrobial peptides (AMPs) have been known as host-defense peptides. These cationic and amphipathic peptides are relatively short (∼5-50 L-amino acids) with molecular weight less than 10 kDa. AMPs have various roles including immunomodulatory, angiogenic and antitumor activities. Anticancer peptides (ACPs) are a main subset of AMPs as a novel therapeutic approach against tumor cells. The physicochemical properties of the ACPs influence their cell penetration, stability and efficiency of targeting. Up to now, several databases and web servers for in silico prediction of AMPs/ACPs have been established prior to the lab analysis. The present review focuses on the recent advancement about AMPs/ACPs activities including their in silico prediction by computational tools and their potential applications as therapeutic agents especially in cancer.
Affiliation(s)
- Kimia Kardani
  - Department of Hepatitis & AIDS, Pasteur Institute of Iran, Tehran, Iran
  - Iranian Comprehensive Hemophilia Care Center, Tehran, Iran
- Azam Bolhassani
  - Department of Hepatitis & AIDS, Pasteur Institute of Iran, Tehran, Iran

109
Charoenkwan P, Chiangjong W, Lee VS, Nantasenamat C, Hasan MM, Shoombuatong W. Improved prediction and characterization of anticancer activities of peptides using a novel flexible scoring card method. Sci Rep 2021; 11:3017. [PMID: 33542286 PMCID: PMC7862624 DOI: 10.1038/s41598-021-82513-9] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 10/12/2020] [Accepted: 01/18/2021] [Indexed: 01/30/2023]
Abstract
As anticancer peptides (ACPs) have attracted great interest for cancer treatment, several approaches based on machine learning have been proposed for ACP identification. Although existing methods have afforded high prediction accuracies, such models use a large number of descriptors together with complex ensemble approaches, which leads to low interpretability and thus poses a challenge for biologists and biochemists. Therefore, it is desirable to develop a simple, interpretable and efficient predictor for accurate ACP identification that also provides the means for the rational design of new anticancer peptides with promising potential for clinical application. Herein, we propose a novel flexible scoring card method (FSCM) making use of propensity scores of local and global sequential information for the development of a sequence-based ACP predictor (named iACP-FSCM) for improving the prediction accuracy and model interpretability. To the best of our knowledge, iACP-FSCM represents the first sequence-based ACP predictor for rationalizing an in-depth understanding of the molecular basis for the enhancement of anticancer activities of peptides via the use of FSCM-derived propensity scores. The independent testing results showed that iACP-FSCM provided accuracies of 0.825 and 0.910 as evaluated on the main and alternative datasets, respectively. Results from comparative benchmarking demonstrated that iACP-FSCM could outperform seven other existing ACP predictors with marked improvements of 7% and 17% for accuracy and MCC, respectively, on the main dataset. Furthermore, iACP-FSCM (0.910) achieved results very comparable to those of the state-of-the-art ensemble model AntiCP2.0 (0.920) as evaluated on the alternative dataset. Comparative results demonstrated that iACP-FSCM was the most suitable choice for ACP identification and characterization considering its simplicity, interpretability and generalizability.
It is highly anticipated that the iACP-FSCM may be a robust tool for the rapid screening and identification of promising ACPs for clinical use.
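The idea of scoring a peptide by propensity scores can be illustrated with a deliberately simplified sketch. The abstract does not give the FSCM formulas, so everything below is an assumption made for illustration: single-residue propensities taken as the difference in amino acid composition between positive and negative sets, a peptide score taken as the mean propensity of its residues, and toy sequences that are not from any dataset. It is not the iACP-FSCM method itself.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences):
    """Relative frequency of each standard amino acid over a set of sequences."""
    counts = Counter(aa for seq in sequences for aa in seq if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def propensity_scores(positives, negatives):
    """Hypothetical propensity: composition in positives minus composition in negatives."""
    pos, neg = composition(positives), composition(negatives)
    return {aa: pos[aa] - neg[aa] for aa in AMINO_ACIDS}


def peptide_score(sequence, scores):
    """Mean propensity of the residues in a peptide; higher suggests the positive class."""
    residues = [aa for aa in sequence if aa in scores]
    return sum(scores[aa] for aa in residues) / len(residues)


# Toy sequences for illustration only (not real ACP data).
scores = propensity_scores(positives=["KKLL", "KLKL"], negatives=["DDEE", "DEGG"])
```

Because each residue carries one readable number, a score table of this kind can be inspected directly, which is the sort of interpretability the abstract emphasises. A classifier would then compare `peptide_score` with a threshold chosen on training data.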
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Wararat Chiangjong
  - Pediatric Translational Research Unit, Department of Pediatrics, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, 10400, Thailand
- Vannajan Sanghiran Lee
  - Department of Chemistry, Centre of Theoretical and Computational Physics, Faculty of Science, University of Malaya, 50603, Kuala Lumpur, Malaysia
- Chanin Nantasenamat
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Md Mehedi Hasan
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
- Watshara Shoombuatong
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand

110
Zhang YP, Zou Q. PPTPP: a novel therapeutic peptide prediction method using physicochemical property encoding and adaptive feature representation learning. Bioinformatics 2020; 36:3982-3987. [PMID: 32348463 DOI: 10.1093/bioinformatics/btaa275] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 12/12/2019] [Revised: 03/31/2020] [Accepted: 04/22/2020] [Indexed: 12/18/2022]
Abstract
MOTIVATION Peptide is a promising candidate for therapeutic and diagnostic development due to its great physiological versatility and structural simplicity. Thus, identifying therapeutic peptides and investigating their properties are fundamentally important. As an inexpensive and fast approach, machine learning-based predictors have shown their strength in therapeutic peptide identification due to excellences in massive data processing. To date, no reported therapeutic peptide predictor can perform high-quality generic prediction and informative physicochemical properties (IPPs) identification simultaneously. RESULTS In this work, Physicochemical Property-based Therapeutic Peptide Predictor (PPTPP), a Random Forest-based prediction method was presented to address this issue. A novel feature encoding and learning scheme were initiated to produce and rank physicochemical property-related features. Besides being capable of predicting multiple therapeutics peptides with high comparability to established predictors, the presented method is also able to identify peptides' informative IPP. Results presented in this work not only illustrated the soundness of its working capacity but also demonstrated its potential for investigating other therapeutic peptides. AVAILABILITY AND IMPLEMENTATION https://github.com/YPZ858/PPTPP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yu P Zhang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, UK
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China

111
Wei L, He W, Malik A, Su R, Cui L, Manavalan B. Computational prediction and interpretation of cell-specific replication origin sites from multiple eukaryotes by exploiting stacking framework. Brief Bioinform 2020; 22:5956930. [PMID: 33152766 DOI: 10.1093/bib/bbaa275] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Received: 08/26/2020] [Revised: 09/14/2020] [Accepted: 09/21/2020] [Indexed: 12/13/2022]
Abstract
Origins of replication sites (ORIs), which refers to the initiative locations of genomic DNA replication, play essential roles in DNA replication process. Detection of ORIs' distribution in genome scale is one of key steps to in-depth understanding their regulation mechanisms. In this study, we presented a novel machine learning-based approach called Stack-ORI encompassing 10 cell-specific prediction models for identifying ORIs from four different eukaryotic species (Homo sapiens, Mus musculus, Drosophila melanogaster and Arabidopsis thaliana). For each cell-specific model, we employed 12 feature encoding schemes that cover nucleic acid composition, position-specific and physicochemical properties information. The optimal feature set was identified from each encoding individually and developed their respective baseline models using the eXtreme Gradient Boosting (XGBoost) classifier. Subsequently, the predicted scores of 12 baseline models are integrated as a novel feature vector to train XGBoost and develop the final model. Extensive experimental results show that Stack-ORI achieves significantly better performance as compared with their baseline models on both training and independent datasets. Interestingly, Stack-ORI consistently outperforms existing predictor in all cell-specific models, not only on training but also on independent test. Moreover, our novel approach provides necessary interpretations that help understanding model success by leveraging the powerful SHapley Additive exPlanation algorithm, thus underlining the most important feature encoding schemes significant for predicting cell-specific ORIs.
Affiliation(s)
- Leyi Wei
- computer science from Xiamen University, China
| | - Wenjia He
- School of Software at Shandong University, China
| | - Adeel Malik
- Institute of Intelligence Informatics Technology, Sangmyung University, Seoul, Republic of Korea
| | - Ran Su
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Lizhen Cui
- School of Software, Shandong University, the Deputy Director of the E-Commerce Research Center and the Director of the Research Center of Software and Data Engineering, Jinan
| | | |
|
112
|
Hasan MM, Schaduangrat N, Basith S, Lee G, Shoombuatong W, Manavalan B. HLPpred-Fuse: improved and robust prediction of hemolytic peptide and its activity by fusing multiple feature representation. Bioinformatics 2020; 36:3350-3356. [PMID: 32145017 DOI: 10.1093/bioinformatics/btaa160] [Citation(s) in RCA: 148] [Impact Index Per Article: 29.6] [Received: 01/13/2020] [Revised: 02/19/2020] [Accepted: 03/03/2020] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION The failure of therapeutic peptides in clinical trials can be attributed to toxicity profiles such as hemolytic activity, which hamper their further progress as drug candidates. The accurate prediction of hemolytic peptides (HLPs) and their activity from given peptides is one of the challenging tasks in immunoinformatics, and is essential for drug development and basic research. Although a few computational methods have been proposed for this purpose, none of them is able to identify HLPs and their activities simultaneously. RESULTS In this study, we proposed a two-layer prediction framework, called HLPpred-Fuse, that can accurately and automatically predict both hemolytic peptides (HLPs or non-HLPs) and HLP activity (high or low). More specifically, a feature representation learning scheme was utilized to generate 54 probabilistic features by integrating six different machine learning classifiers and nine different sequence-based encodings. The 54 probabilistic features were then fused to provide sufficiently converged sequence information, which was used as input to an extremely randomized tree for the development of two final prediction models that independently identify HLPs and their activity. Performance comparisons over empirical cross-validation analysis, an independent test and a case study against state-of-the-art methods demonstrate that HLPpred-Fuse consistently outperformed these methods in the identification of hemolytic activity. AVAILABILITY AND IMPLEMENTATION For the convenience of experimental scientists, a web-based tool has been established at http://thegleelab.org/HLPpred-Fuse. CONTACT glee@ajou.ac.kr or watshara.sho@mahidol.ac.th or bala@ajou.ac.kr. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
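The count of 54 probabilistic features follows from pairing each of the six classifiers with each of the nine encodings (6 × 9 = 54). The short sketch below only illustrates that layout. The classifier and encoding names are hypothetical placeholders (the abstract does not list them), the "models" are fixed-value stand-ins rather than trained classifiers, and the peptide string is arbitrary.

```python
from itertools import product


def build_feature_vector(peptide, models, classifiers, encodings):
    """Return one predicted probability per (classifier, encoding) pair."""
    return [models[(c, e)](peptide) for c, e in product(classifiers, encodings)]


# Hypothetical placeholder names, not taken from the paper.
classifiers = [f"clf_{i}" for i in range(1, 7)]   # six classifiers
encodings = [f"enc_{j}" for j in range(1, 10)]    # nine encodings

# Stand-in "models": each returns a fixed probability so the layout is visible.
# A real pipeline would hold a trained classifier for every pair.
models = {
    (c, e): (lambda peptide, p=(i + 1) / 100: p)
    for i, (c, e) in enumerate(product(classifiers, encodings))
}

# Arbitrary example peptide string, used only to exercise the function.
vec = build_feature_vector("GLFDIVKKVV", models, classifiers, encodings)
print(len(vec))  # number of probabilistic features
```

The resulting vector is what a final model (an extremely randomized tree in the abstract) would take as input.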
Affiliation(s)
- Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan.,Japan Society for the Promotion of Science, Chiyoda-ku, Tokyo 102-0083, Japan
| | - Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Republic of Korea
| | - Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Republic of Korea.,Department of Molecular Science and Technology, Ajou University, Suwon 16499, Republic of Korea
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Balachandran Manavalan
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Republic of Korea
| |
|
113
|
Charoenkwan P, Yana J, Nantasenamat C, Hasan MM, Shoombuatong W. iUmami-SCM: A Novel Sequence-Based Predictor for Prediction and Analysis of Umami Peptides Using a Scoring Card Method with Propensity Scores of Dipeptides. J Chem Inf Model 2020; 60:6666-6678. [DOI: 10.1021/acs.jcim.0c00707] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Indexed: 01/27/2023]
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
| | - Janchai Yana
- Department of Chemistry, Faculty of Science and Technology, Chiang Mai Rajabhat University, Chiang Mai 50300, Thailand
| | - Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Md. Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| |
|
114
|
Yu L, Jing R, Liu F, Luo J, Li Y. DeepACP: A Novel Computational Approach for Accurate Identification of Anticancer Peptides by Deep Learning Algorithm. MOLECULAR THERAPY-NUCLEIC ACIDS 2020; 22:862-870. [PMID: 33230481 PMCID: PMC7658571 DOI: 10.1016/j.omtn.2020.10.005] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 04/02/2020] [Accepted: 10/06/2020] [Indexed: 12/24/2022]
Abstract
Cancer is one of the most dangerous diseases to human health. The accurate prediction of anticancer peptides (ACPs) would be valuable for the development and design of novel anticancer agents. Current deep neural network models have obtained state-of-the-art prediction accuracy for the ACP classification task. However, based on existing studies, it remains unclear which deep learning architecture achieves the best performance. Thus, in this study, we first present a systematic exploration of three important deep learning architectures (convolutional, recurrent, and convolutional-recurrent networks) for distinguishing ACPs from non-ACPs. We find that the recurrent neural network with bidirectional long short-term memory cells is superior to the other architectures. Using the proposed model, we implement a sequence-based deep learning tool (DeepACP) to accurately predict the likelihood of a peptide exhibiting anticancer activity. The results indicate that DeepACP outperforms several existing methods and can be used as an effective tool for the prediction of anticancer peptides. Furthermore, we visualize and interpret the deep learning model. We hope that our strategy can be extended to identify other types of peptides and may assist the development of proteomics and new drugs.
Affiliation(s)
- Lezheng Yu
- School of Chemistry and Materials Science, Guizhou Education University, Guiyang 550018, China
- Corresponding author: Lezheng Yu, School of Chemistry and Materials Science, Guizhou Education University, Guiyang 550018, China.
| | - Runyu Jing
- College of Cybersecurity, Sichuan University, Chengdu 610065, China
| | - Fengjuan Liu
- School of Geography and Resources, Guizhou Education University, Guiyang 550018, China
| | - Jiesi Luo
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, Sichuan, China
- Corresponding author: Jiesi Luo, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, Sichuan, China.
| | - Yizhou Li
- College of Cybersecurity, Sichuan University, Chengdu 610065, China
| |
|
115
|
Lathwal A, Kumar R, Raghava GP. Computer-aided designing of oncolytic viruses for overcoming translational challenges of cancer immunotherapy. Drug Discov Today 2020; 25:1198-1205. [DOI: 10.1016/j.drudis.2020.04.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/25/2020] [Revised: 04/05/2020] [Accepted: 04/15/2020] [Indexed: 12/26/2022]
|
116
|
Liu Q, Chen J, Wang Y, Li S, Jia C, Song J, Li F. DeepTorrent: a deep learning-based approach for predicting DNA N4-methylcytosine sites. Brief Bioinform 2020; 22:5865572. [PMID: 32608476 DOI: 10.1093/bib/bbaa124] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Received: 03/06/2020] [Revised: 05/05/2020] [Accepted: 05/20/2020] [Indexed: 12/27/2022] Open
Abstract
DNA N4-methylcytosine (4mC) is an important epigenetic modification that plays a vital role in regulating DNA replication and expression. However, it is challenging to detect 4mC sites through experimental methods, which are time-consuming and costly. Thus, computational tools that can identify 4mC sites would be very useful for understanding the mechanism of this important type of DNA modification. Several machine learning-based 4mC predictors have been proposed in the past 3 years, although their performance is unsatisfactory. Deep learning is a promising technique for the development of more accurate 4mC site predictions. In this work, we propose a deep learning-based approach, called DeepTorrent, for improved prediction of 4mC sites from DNA sequences. It combines four different feature encoding schemes to encode raw DNA sequences and employs multi-layer convolutional neural networks with an inception module integrated with bidirectional long short-term memory to effectively learn the higher-order feature representations. Dimension reduction and concatenated feature maps from the filters of different sizes are then applied to the inception module. In addition, an attention mechanism and transfer learning techniques are also employed to train the robust predictor. Extensive benchmarking experiments demonstrate that DeepTorrent significantly improves the performance of 4mC site prediction compared with several state-of-the-art methods.
Affiliation(s)
- Quanzhong Liu
- College of Information Engineering, Northwest A&F University
| | - Jinxiang Chen
- College of Information Engineering, Northwest A&F University
| | - Yanze Wang
- College of Information Engineering, Northwest A&F University
| | - Shuqin Li
- College of Information Engineering, Northwest A&F University
| | - Cangzhi Jia
- School of Science, Dalian Maritime University
| | - Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
| | | |
|
117
|
Basith S, Manavalan B, Hwan Shin T, Lee G. Machine intelligence in peptide therapeutics: A next‐generation tool for rapid disease screening. Med Res Rev 2020; 40:1276-1314. [DOI: 10.1002/med.21658] [Citation(s) in RCA: 139] [Impact Index Per Article: 27.8] [Received: 10/18/2019] [Revised: 11/26/2019] [Accepted: 12/16/2019] [Indexed: 12/12/2022]
Affiliation(s)
- Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
| | | | - Tae Hwan Shin
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
| | - Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
| |
|