51
Deng Y, Ma S, Li J, Zheng B, Lv Z. Using the Random Forest for Identifying Key Physicochemical Properties of Amino Acids to Discriminate Anticancer and Non-Anticancer Peptides. Int J Mol Sci 2023; 24:10854. [PMID: 37446031 PMCID: PMC10341712 DOI: 10.3390/ijms241310854] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/08/2023] [Revised: 06/17/2023] [Accepted: 06/26/2023] [Indexed: 07/15/2023]
Abstract
Anticancer peptides (ACPs) represent a promising new therapeutic approach in cancer treatment. They can target cancer cells without affecting healthy tissues or altering normal physiological functions. Machine learning algorithms have increasingly been used to predict peptide sequences with potential ACP activity. This study analyzed four benchmark datasets using a well-established random forest (RF) algorithm. The peptide sequences were converted into 566 physicochemical features extracted from the amino acid index (AAindex) library, which were then subjected to feature selection using four methods: light gradient-boosting machine (LGBM), analysis of variance (ANOVA), chi-squared test (Chi2), and mutual information (MI). The identified features were presented and merged using Venn diagrams, yielding 19 key amino acid physicochemical properties that can be used to predict the likelihood of a peptide sequence functioning as an ACP. Prediction accuracy was quantified with standard performance evaluation metrics. This study aims to enhance the efficiency of designing peptide sequences for cancer treatment.
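The four-selector, Venn-diagram merging step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. `FEAT_A` to `FEAT_E` are hypothetical placeholders for AAindex properties, and the per-method lists stand in for each selector's top-ranked features. Reading the Venn overlap as "chosen by at least `min_votes` selectors" is also an assumption. The `anova_f` helper shows how the ANOVA criterion scores one feature, as a one-way F statistic across classes.

```python
from collections import Counter
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def merge_selections(selections, min_votes):
    """Keep features picked by at least `min_votes` selectors (a Venn overlap)."""
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    return sorted(f for f, c in votes.items() if c >= min_votes)


if __name__ == "__main__":
    # Hypothetical top-ranked features from each of the four selectors.
    selections = {
        "LGBM": ["FEAT_A", "FEAT_B", "FEAT_C"],
        "ANOVA": ["FEAT_A", "FEAT_B", "FEAT_D"],
        "Chi2": ["FEAT_A", "FEAT_C", "FEAT_D"],
        "MI": ["FEAT_A", "FEAT_B", "FEAT_E"],
    }
    print(merge_selections(selections, min_votes=4))  # ['FEAT_A']
    print(merge_selections(selections, min_votes=3))  # ['FEAT_A', 'FEAT_B']
    print(anova_f([1.0, 1.2, 3.0, 3.2], [0, 0, 1, 1]))  # approximately 200.0
```

Lowering `min_votes` moves from the central Venn intersection toward the union, so the threshold controls how conservative the merged property set is.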
Affiliation(s)
- Yiting Deng
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Shuhan Ma
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Jiayu Li
- College of Life Science, Sichuan University, Chengdu 610065, China
- Bowen Zheng
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Zhibin Lv
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
52
Deng H, Ding M, Wang Y, Li W, Liu G, Tang Y. ACP-MLC: A two-level prediction engine for identification of anticancer peptides and multi-label classification of their functional types. Comput Biol Med 2023; 158:106844. [PMID: 37058760 DOI: 10.1016/j.compbiomed.2023.106844] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/07/2023] [Revised: 03/09/2023] [Accepted: 03/30/2023] [Indexed: 04/07/2023]
Abstract
Anticancer peptides (ACPs), a class of short bioactive peptides, are promising candidates in the fight against cancer owing to their high activity, low toxicity, and low likelihood of causing drug resistance. Accurately identifying ACPs and classifying their functional types is important for investigating their mechanisms of action and for developing peptide-based anticancer therapies. Here, we provide a computational tool, called ACP-MLC, that addresses both binary classification and multi-label classification of ACPs for a given peptide sequence. Briefly, ACP-MLC is a two-level prediction engine: the 1st-level model uses a random forest algorithm to predict whether a query sequence is an ACP, and the 2nd-level model uses the binary relevance algorithm to predict which tissue types the sequence might target. Developed and evaluated on high-quality datasets, ACP-MLC yielded an area under the receiver operating characteristic curve (AUC) of 0.888 on the independent test set for the 1st-level prediction. For the 2nd-level prediction on the independent test set, it obtained a Hamming loss of 0.157, subset accuracy of 0.577, macro F1-score of 0.802, and micro F1-score of 0.826. A systematic comparison demonstrated that ACP-MLC outperformed existing binary classifiers and other multi-label learning classifiers for ACP prediction. Finally, we interpreted the important features of ACP-MLC using the SHAP method. User-friendly software and the datasets are available at https://github.com/Nicole-DH/ACP-MLC. We believe that ACP-MLC will be a powerful tool for ACP discovery.
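The 2nd-level metrics quoted above have standard definitions, sketched below in plain Python. Binary relevance appears only as the general idea: one independent yes/no model per label. `label_models` is a hypothetical list of callables, not ACP-MLC's API, and the example label vectors are made up for illustration.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual (sample, label) slots predicted wrongly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    return wrong / total


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose entire label vector is predicted exactly."""
    exact = sum(list(tr) == list(pr) for tr, pr in zip(y_true, y_pred))
    return exact / len(y_true)


def _f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def f1_macro_micro(y_true, y_pred):
    """Macro F1 averages per-label F1; micro F1 pools the counts over all labels."""
    n_labels = len(y_true[0])
    counts = []  # one (tp, fp, fn) triple per label
    for j in range(n_labels):
        pairs = [(tr[j], pr[j]) for tr, pr in zip(y_true, y_pred)]
        tp = sum(t == 1 and p == 1 for t, p in pairs)
        fp = sum(t == 0 and p == 1 for t, p in pairs)
        fn = sum(t == 1 and p == 0 for t, p in pairs)
        counts.append((tp, fp, fn))
    macro = sum(_f1(*c) for c in counts) / n_labels
    tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
    return macro, _f1(tp, fp, fn)


def binary_relevance_predict(label_models, x):
    """Binary relevance: apply one independent yes/no model per label."""
    return [int(bool(model(x))) for model in label_models]


if __name__ == "__main__":
    y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
    y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
    print(hamming_loss(y_true, y_pred))     # 2 wrong slots out of 9
    print(subset_accuracy(y_true, y_pred))  # 1 of 3 rows exactly right
    print(f1_macro_micro(y_true, y_pred))
```

Macro F1 weights rare tissue-type labels as much as common ones, which is why it is usually reported alongside micro F1.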
53
Li Y, Ma D, Chen D, Chen Y. ACP-GBDT: An improved anticancer peptide identification method with gradient boosting decision tree. Front Genet 2023; 14:1165765. [PMID: 37065496 PMCID: PMC10090421 DOI: 10.3389/fgene.2023.1165765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 03/09/2023] [Indexed: 03/31/2023]
Abstract
Cancer is one of the most dangerous diseases in the world, killing millions of people every year. In recent years, drugs composed of anticancer peptides have been used to treat cancer with few side effects. Identifying anticancer peptides has therefore become a focus of research. In this study, an improved anticancer peptide predictor named ACP-GBDT, based on a gradient boosting decision tree (GBDT) and sequence information, is proposed. To encode the peptide sequences in the anticancer peptide dataset, ACP-GBDT uses a merged feature set composed of AAindex and SVMProt-188D descriptors. A GBDT is then trained on these features to build the prediction model. Independent testing and ten-fold cross-validation show that ACP-GBDT can effectively distinguish anticancer peptides from non-anticancer ones. Comparisons on the benchmark dataset show that ACP-GBDT is simpler and more effective than other existing anticancer peptide prediction methods.
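A GBDT builds an additive model by repeatedly fitting small trees to the negative gradient of the loss. The sketch below is a simplified illustration resting on several assumptions, not ACP-GBDT itself. It uses depth-1 trees (stumps) and the logistic loss. Each leaf value is the mean residual, rather than the Newton-step value that production GBDT libraries typically use. Each row of `X` is assumed to be the merged feature vector, i.e. an AAindex-derived vector concatenated with an SVMProt-188D vector.

```python
import math


def _best_stump(X, residuals):
    """Depth-1 regression tree: the split minimizing squared error on residuals."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    if best is None:  # every feature is constant: fall back to a single leaf
        m = sum(residuals) / len(residuals)
        return 0, math.inf, m, m
    return best[1:]


def fit_gbdt(X, y, n_rounds=50, learning_rate=0.3):
    """Boost stumps on the logistic loss; y holds 0/1 labels with both classes present."""
    p0 = sum(y) / len(y)
    base = math.log(p0 / (1 - p0))  # start from the prior log-odds
    F = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # The negative gradient of the log loss with respect to F is y - sigmoid(F).
        residuals = [yi - 1 / (1 + math.exp(-f)) for yi, f in zip(y, F)]
        j, thr, lv, rv = _best_stump(X, residuals)
        stumps.append((j, thr, lv, rv))
        F = [f + learning_rate * (lv if row[j] <= thr else rv) for f, row in zip(F, X)]
    return base, learning_rate, stumps


def predict_proba(model, x):
    """Probability of the positive (anticancer) class for one feature vector."""
    base, lr, stumps = model
    f = base + sum(lr * (lv if x[j] <= thr else rv) for j, thr, lv, rv in stumps)
    return 1 / (1 + math.exp(-f))


if __name__ == "__main__":
    # Toy one-feature data; real rows would be merged AAindex + SVMProt-188D vectors.
    X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
    y = [0, 0, 0, 1, 1, 1]
    model = fit_gbdt(X, y)
    print(predict_proba(model, [0.15]), predict_proba(model, [0.85]))
```

The learning rate shrinks each tree's contribution, trading more boosting rounds for smoother fits.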
Affiliation(s)
- Yanjuan Li
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
- Di Ma
- College of Computer, Hangzhou Dianzi University, Hangzhou, China
- Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
- *Correspondence: Dong Chen; Yu Chen
- Yu Chen
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
54
Charoenkwan P, Chumnanpuen P, Schaduangrat N, Oh C, Manavalan B, Shoombuatong W. PSRQSP: An effective approach for the interpretable prediction of quorum sensing peptide using propensity score representation learning. Comput Biol Med 2023; 158:106784. [PMID: 36989748 DOI: 10.1016/j.compbiomed.2023.106784] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/26/2022] [Revised: 02/07/2023] [Accepted: 03/10/2023] [Indexed: 03/14/2023]
Abstract
Quorum sensing peptides (QSPs) are microbial signaling molecules involved in several cellular processes, such as cellular communication, virulence expression, bioluminescence, and swarming, in various bacterial species. Understanding QSPs is essential for identifying novel drug targets for controlling bacterial populations and pathogenicity. In this study, we present a novel computational approach (PSRQSP) for improving the prediction and analysis of QSPs. In PSRQSP, we develop a novel propensity score representation learning (PSR) scheme. Specifically, we used the PSR approach to extract and learn a comprehensive set of estimated propensities of 20 amino acids, 400 dipeptides, and 400 g-gap dipeptides from a pool of scoring card method-based models. Then, to maximize the utility of the propensity scores, we explored a set of optimal propensity scores and combined them to construct a final meta-predictor. Our experimental results showed that combining multiview propensity scores was more beneficial for identifying QSPs than conventional feature descriptors. Moreover, extensive benchmarking experiments on the independent test set demonstrated the predictive capability and effectiveness of PSRQSP. It outperformed conventional ML-based and existing methods, with an accuracy of 94.44% and an AUC of 0.967. PSR-derived propensity scores were also employed to determine the crucial physicochemical properties, giving a better understanding of the functional mechanisms of QSPs. Finally, we constructed an easy-to-use web server for PSRQSP (http://pmlabstack.pythonanywhere.com/PSRQSP). PSRQSP is anticipated to be an efficient computational tool for accelerating the data-driven discovery of potential QSPs for drug discovery and development.
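PSR estimates propensities over 20 amino acid, 400 dipeptide, and 400 g-gap dipeptide descriptors. Those descriptors correspond to simple composition vectors. The sketch below computes only those composition vectors and does not reproduce the scoring-card propensity learning. Two choices are assumptions of this sketch: non-standard residues count toward the denominators but fall into no bin, and very short peptides get a guarded denominator.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def aac(seq):
    """Amino acid composition: 20 relative frequencies."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def gap_dipeptide_composition(seq, gap=0):
    """Relative frequencies of residue pairs separated by `gap` residues.

    gap=0 gives the ordinary 400-D dipeptide composition; gap=g > 0 gives
    the 400-D g-gap dipeptide composition.
    """
    pairs = [seq[i] + seq[i + gap + 1] for i in range(len(seq) - gap - 1)]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for pair in pairs:
        if pair in counts:  # pairs containing non-standard residues fall in no bin
            counts[pair] += 1
    total = len(pairs) or 1  # avoid dividing by zero for very short peptides
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    seq = "ACAC"
    features = aac(seq) + gap_dipeptide_composition(seq) + gap_dipeptide_composition(seq, gap=1)
    print(len(features))  # 820 = 20 + 400 + 400
```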
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Pramote Chumnanpuen
- Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand; Omics Center for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok, 10900, Thailand
- Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Changmin Oh
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
- Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
55
Charoenkwan P, Schaduangrat N, Pham NT, Manavalan B, Shoombuatong W. Pretoria: An effective computational approach for accurate and high-throughput identification of CD8+ T-cell epitopes of eukaryotic pathogens. Int J Biol Macromol 2023; 238:124228. [PMID: 36996953 DOI: 10.1016/j.ijbiomac.2023.124228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 03/11/2023] [Accepted: 03/25/2023] [Indexed: 03/31/2023]
Abstract
T-cells recognize antigenic epitopes present on major histocompatibility complex (MHC) molecules, triggering an adaptive immune response in the host. T-cell epitope (TCE) identification is challenging because of the large number of undetermined proteins found in eukaryotic pathogens, as well as MHC polymorphisms. In addition, conventional experimental approaches for TCE identification are time-consuming and expensive. Thus, computational approaches that can accurately and rapidly identify CD8+ TCEs of eukaryotic pathogens based solely on sequence information may facilitate the discovery of novel CD8+ TCEs in a cost-effective manner. Here, Pretoria (Predictor of CD8+ TCEs of eukaryotic pathogens) is proposed as the first stack-based approach for accurate and large-scale identification of CD8+ TCEs of eukaryotic pathogens. In particular, Pretoria extracts and explores crucial information embedded in CD8+ TCEs by employing a comprehensive set of 12 well-known feature descriptors drawn from multiple groups, including physicochemical properties, composition-transition-distribution, pseudo-amino acid composition, and amino acid composition. These feature descriptors were then used to construct a pool of 144 different machine learning (ML)-based classifiers based on 12 popular ML algorithms. Finally, a feature selection method was used to determine the most important ML classifiers for constructing the stacked model. The experimental results indicated that Pretoria is an accurate and effective computational approach for CD8+ TCE prediction. On the independent test, it outperformed several conventional ML classifiers and the existing method, with an accuracy of 0.866, MCC of 0.732, and AUC of 0.921.
Additionally, to maximize user convenience for high-throughput identification of CD8+ TCEs of eukaryotic pathogens, a user-friendly web server of Pretoria (http://pmlabstack.pythonanywhere.com/Pretoria) was developed and made freely available.
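The independent-test figures reported above (accuracy, MCC, AUC) follow standard definitions, as do the sensitivity and specificity used elsewhere in this list. A minimal Python sketch follows. It is generic metric code, not code from Pretoria. The rank-based AUC used here is the probability that a random positive outscores a random negative, with ties counted as one half. It is one of several equivalent ways to compute the ROC area and runs in quadratic time.

```python
def confusion(y_true, y_pred):
    """Counts (tp, tn, fp, fn) for 0/1 labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def auc(y_true, scores):
    """ROC AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.4, 0.1]))  # 0.875
```

MCC uses all four confusion-matrix cells. For that reason it is often preferred over accuracy when the positive and negative sets are imbalanced.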
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
- Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Nhat Truong Pham
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
- Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
- Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
56
Ghaly G, Tallima H, Dabbish E, Badr ElDin N, Abd El-Rahman MK, Ibrahim MAA, Shoeib T. Anti-Cancer Peptides: Status and Future Prospects. Molecules 2023; 28:1148. [PMID: 36770815 PMCID: PMC9920184 DOI: 10.3390/molecules28031148] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 11/27/2022] [Revised: 12/26/2022] [Accepted: 01/19/2023] [Indexed: 01/26/2023]
Abstract
The dramatic rise in cancer incidence, alongside treatment deficiencies, has made cancer the second-leading cause of death globally. The increasing morbidity and mortality of this disease can be traced back to a number of causes, including treatment-related side effects, drug resistance, inadequate curative treatment and tumor relapse. Recently, anti-cancer bioactive peptides (ACPs) have emerged as a potential therapeutic choice within the pharmaceutical arsenal due to their high penetration, high specificity and fewer side effects. In this contribution, we present a general overview of the literature concerning the conformational structures, modes of action and membrane interaction mechanisms of ACPs, and provide recent examples of their successful use as targeting ligands in cancer treatment. The use of ACPs as diagnostic tools is summarized, and their advantages in these applications are highlighted. This review expounds on the main approaches for peptide synthesis, along with the reconstruction and modification needed to enhance their therapeutic effect. Computational approaches that could predict therapeutic efficacy and suggest ACP candidates for experimental studies are discussed. Future research prospects in this rapidly expanding area are also offered.
Affiliation(s)
- Gehane Ghaly
- Department of Chemistry, The American University in Cairo, New Cairo 11835, Egypt
- Hatem Tallima
- Department of Chemistry, The American University in Cairo, New Cairo 11835, Egypt
- Eslam Dabbish
- Department of Chemistry, The American University in Cairo, New Cairo 11835, Egypt
- Norhan Badr ElDin
- Analytical Chemistry Department, Faculty of Pharmacy, Cairo University, Kasr-El Aini Street, Cairo 11562, Egypt
- Mohamed K. Abd El-Rahman
- Analytical Chemistry Department, Faculty of Pharmacy, Cairo University, Kasr-El Aini Street, Cairo 11562, Egypt
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA 02138, USA
- Mahmoud A. A. Ibrahim
- Computational Chemistry Laboratory, Chemistry Department, Faculty of Science, Minia University, Minia 61519, Egypt
- School of Health Sciences, University of KwaZulu-Natal, Westville, Durban 4000, South Africa
- Tamer Shoeib
- Department of Chemistry, The American University in Cairo, New Cairo 11835, Egypt
- Corresponding author
57
Yu H, Luo X. IPPF-FE: an integrated peptide and protein function prediction framework based on fused features and ensemble models. Brief Bioinform 2023; 24:bbac476. [PMID: 36403184 DOI: 10.1093/bib/bbac476] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/05/2022] [Revised: 09/23/2022] [Accepted: 10/05/2022] [Indexed: 11/21/2022]
Abstract
The prediction of peptide and protein function is important for research and industrial applications, and many machine learning methods have been developed for this purpose. Existing models face many challenges, including the lack of effective and comprehensive features and the limited applicability of each model. Here, we introduce an Integrated Peptide and Protein function prediction Framework based on Fused features and Ensemble models (IPPF-FE), which can accurately capture the relationship between features and labels. The results indicated that IPPF-FE outperformed existing state-of-the-art (SOTA) models on more than 8 different categories of peptide and protein tasks. In addition, t-distributed stochastic neighbor embedding (t-SNE) visualization demonstrated the advantages of IPPF-FE. We anticipate that our method will become a versatile tool for peptide and protein prediction tasks and shed light on the future development of related models. The model is open source and available in the GitHub repository https://github.com/Luo-SynBioLab/IPPF-FE.
Affiliation(s)
- Han Yu
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Xiaozhou Luo
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; University of Chinese Academy of Sciences, Beijing 100049, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
58
Yuan Q, Chen K, Yu Y, Le NQK, Chua MCH. Prediction of anticancer peptides based on an ensemble model of deep learning and machine learning using ordinal positional encoding. Brief Bioinform 2023; 24:bbac630. [PMID: 36642410 DOI: 10.1093/bib/bbac630] [Citation(s) in RCA: 47] [Impact Index Per Article: 23.5] [Received: 08/03/2022] [Revised: 12/01/2022] [Accepted: 12/28/2022] [Indexed: 01/17/2023]
Abstract
Anticancer peptides (ACPs) are peptides that have been shown to have anticancer activity. Using ACPs to prevent cancer could be a viable alternative to conventional cancer treatments because they are safer and display higher selectivity. Because experimental ACP identification is lab-intensive, expensive and lengthy, this study proposes a computational method to predict ACPs from sequence information. The process includes the input of the peptide sequences, feature extraction (ordinal encoding with positional information plus handcrafted features), and finally feature selection. The whole model comprises two modules, one based on deep learning and one on machine learning algorithms. The deep learning module contains two channels: a bidirectional long short-term memory (BiLSTM) network and a convolutional neural network (CNN). Light Gradient Boosting Machine (LightGBM) was used in the machine learning module. Finally, the classification results of these three models were combined by voting in a model ensemble layer. This study provides insights into ACP prediction using a novel method and shows promising performance. It used a benchmark dataset for further exploration and improvement compared with previous studies. Our final model has an accuracy of 0.7895, sensitivity of 0.8153 and specificity of 0.7676, an improvement of at least 2% over state-of-the-art studies in all metrics. Hence, this paper presents a novel method that can potentially predict ACPs more effectively and efficiently. The work and source code are available to the community of researchers and developers at https://github.com/khanhlee/acp-ope/.
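The input-encoding and voting steps can be illustrated as below. This is a hedged sketch with several assumptions: the alphabetical residue-to-integer mapping, the reserved padding code 0, and `max_len=50`. The paper's positional information is only implicit here, since each slot of the vector corresponds to a fixed sequence position. `ensemble_predict` shows hard majority voting over three models' 0/1 outputs, which is one plausible reading of the voting ensemble layer.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical ordinal map: residues get 1..20; 0 is reserved for padding/unknown.
ORDINAL = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def ordinal_encode(seq, max_len=50):
    """Integer-encode residues in order, then truncate or zero-pad to max_len."""
    codes = [ORDINAL.get(aa, 0) for aa in seq[:max_len]]
    return codes + [0] * (max_len - len(codes))


def majority_vote(votes):
    """Hard voting: the most common 0/1 prediction among the models."""
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(model_outputs):
    """model_outputs: one list of 0/1 predictions per model (e.g. BiLSTM, CNN, LightGBM)."""
    return [majority_vote(sample_votes) for sample_votes in zip(*model_outputs)]


if __name__ == "__main__":
    print(ordinal_encode("ACD", max_len=5))  # [1, 2, 3, 0, 0]
    print(ensemble_predict([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))  # [1, 0, 1]
```

With an odd number of binary voters, hard voting can never tie, which is one practical reason to ensemble exactly three models.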
Affiliation(s)
- Qitong Yuan
- Institute of Systems Science, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
- Keyi Chen
- Institute of Systems Science, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
- Yimin Yu
- Institute of Systems Science, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, 250 Wuxing St, 106, Taipei, Taiwan; Research Center for Artificial Intelligence in Medicine, Taipei Medical University, 250 Wuxing St, 106, Taipei, Taiwan; Translational Imaging Research Center, Taipei Medical University Hospital, 252 Wuxing St, 110, Taipei, Taiwan
- Matthew Chin Heng Chua
- Institute of Systems Science, National University of Singapore, 25 Heng Mui Keng Terrace, 119615, Singapore, Singapore
59
Guo X, Tiwari P, Zou Q, Ding Y. Subspace projection-based weighted echo state networks for predicting therapeutic peptides. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
60
PSRTTCA: A new approach for improving the prediction and characterization of tumor T cell antigens using propensity score representation learning. Comput Biol Med 2023; 152:106368. [PMID: 36481763 DOI: 10.1016/j.compbiomed.2022.106368] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/27/2022] [Revised: 10/19/2022] [Accepted: 11/25/2022] [Indexed: 11/27/2022]
Abstract
Despite the arsenal of existing cancer therapies, recurrence and new cases of cancer remain a serious health concern that necessitates the development of new and effective treatments. Cancer immunotherapy, which uses the body's immune system to combat cancer, is a promising treatment option. In silico methods for identifying and characterizing tumor T cell antigens (TTCAs) would therefore be useful for better understanding their functional mechanisms. Although a few computational methods for TTCA identification have been developed, their lack of model interpretability is a major drawback. Thus, developing computational methods for the effective identification and characterization of TTCAs is a critical endeavor. This study proposes PSRTTCA, a new machine learning (ML)-based approach for improving the identification and characterization of TTCAs based on their primary sequences. Specifically, we introduce a new propensity score representation learning algorithm that generates various sets of propensity scores of amino acids, dipeptides, and g-gap dipeptides for being TTCAs. To enhance predictive performance, optimal sets of these propensity scores were determined and fed into the final meta-predictor (PSRTTCA). Benchmarking results revealed that PSRTTCA was a more precise and promising tool for the identification and characterization of TTCAs than conventional ML classifiers and existing methods. Furthermore, PSR-derived propensities of amino acids for becoming TTCAs are used to reveal the relationship between TTCAs and their informative physicochemical properties, providing insights into TTCA characteristics. Finally, a user-friendly online computational platform for PSRTTCA is publicly available at http://pmlabstack.pythonanywhere.com/PSRTTCA. The PSRTTCA predictor is anticipated to facilitate community-wide efforts to accelerate the discovery of novel TTCAs for cancer immunotherapy and other clinical applications.
61
Liang Y, Ma X. iACP-GE: accurate identification of anticancer peptides by using gradient boosting decision tree and extra tree. SAR QSAR Environ Res 2023; 34:1-19. [PMID: 36562289 DOI: 10.1080/1062936x.2022.2160011] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/24/2022] [Accepted: 12/12/2022] [Indexed: 06/17/2023]
Abstract
Cancer is one of the main diseases threatening human life, accounting for millions of deaths around the world each year. Traditional physical and chemical methods for cancer treatment are extremely time-consuming, lab-intensive, expensive and inefficient, and are difficult to apply in a high-throughput manner. Hence, developing automated computational methods that enable fast and accurate identification of anticancer peptides (ACPs) is an urgent task. In this paper, we develop a novel model named iACP-GE to identify ACPs. Multiple features are extracted using binary encoding, enhanced grouped amino acid composition, and BLOSUM62 encoding based on the N5C5 sequence, as well as detrended forward moving-average auto-cross correlation analysis based on the physicochemical properties of the 20 natural amino acids. In total, 835 features are obtained for each sample. To avoid information redundancy, a gradient boosting decision tree was adopted as the feature selection strategy. The optimal feature subset is then input to the extra tree classifier. The 5-fold cross-validation accuracies on the ACP740 and ACP240 datasets were 90.54% and 91.25%, respectively. Experimental results indicate that iACP-GE significantly outperforms several existing models on the ACP740 and ACP240 datasets and can be used as an effective tool for the identification of ACPs. The datasets and source code for iACP-GE are available at https://github.com/yunyunliang88/iACP-GE.
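The N5C5 fragment and its binary encoding can be sketched as below. This sketch assumes that N5C5 means the first five residues concatenated with the last five. It also assumes binary encoding is a 20-bit one-hot vector per residue, which gives 200 features per peptide. The abstract does not say how peptides shorter than ten residues are handled, so this sketch simply rejects them.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n5c5(seq, n=5):
    """Join the first n (N-terminal) and last n (C-terminal) residues."""
    if len(seq) < 2 * n:
        raise ValueError(f"peptide needs at least {2 * n} residues, got {len(seq)}")
    return seq[:n] + seq[-n:]


def binary_encode(fragment):
    """Binary (one-hot) encoding: 20 indicator bits per residue."""
    bits = []
    for aa in fragment:
        bits.extend(1 if aa == a else 0 for a in AMINO_ACIDS)
    return bits


if __name__ == "__main__":
    fragment = n5c5("ACDEFGHIKLMN")
    print(fragment)                      # ACDEFIKLMN
    print(len(binary_encode(fragment)))  # 200
```

Fixing both termini gives every peptide the same feature length regardless of its total length.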
Affiliation(s)
- Y Liang
- School of Science, Xi'an Polytechnic University, Xi'an, P. R. China
- X Ma
- School of Science, Xi'an Polytechnic University, Xi'an, P. R. China
62
Wu X, Zeng W, Lin F. GCNCPR-ACPs: a novel graph convolution network method for ACPs prediction. BMC Bioinformatics 2022; 23:560. [PMID: 36564705 PMCID: PMC9789540 DOI: 10.1186/s12859-022-04771-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Accepted: 05/31/2022] [Indexed: 12/25/2022]
Abstract
BACKGROUND Anticancer peptides (ACPs) inhibit and kill tumor cells. Research on ACPs is of great significance for the development of new drugs, and distinguishing ACPs from non-ACPs has become a new research hotspot. RESULTS We propose a new machine learning-based method named GCNCPR-ACPs (a Graph Convolutional Neural Network Method based on collapse pooling and residual network to predict the ACPs). It automatically and accurately predicts ACPs using residual graph convolution networks, differentiable graph pooling, and features extracted from peptide sequence information. The GCNCPR-ACPs method can effectively capture different levels of node attributes for amino acid node representation learning; it uses node2vec and one-hot embedding methods to extract initial amino acid features for ACP prediction. CONCLUSIONS Experimental results of ten-fold cross-validation and independent validation based on different metrics showed that GCNCPR-ACPs significantly outperformed state-of-the-art methods. Specifically, in ten-fold cross-validation, the Matthews correlation coefficient (MCC) and AUC of our predictor were 69.5% and 90%, respectively, 4.3% and 2% higher than those of the other predictors. In the independent test, the MCC and specificity (SP) were 69.6% and 93.9%, respectively, 37.6% and 5.5% higher than those of the other predictors. The overall results showed that the GCNCPR-ACPs method proposed in this paper can effectively predict ACPs.
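For orientation, the standard graph-convolution propagation rule and the differentiable pooling (DiffPool) coarsening step are written out below. These are textbook forms, not necessarily the exact GCNCPR-ACPs layers. The notation is assumed: a peptide is a graph whose nodes are residues, with adjacency matrix $A$, identity matrix $I$, and initial node features $H^{(0)}$ (for example node2vec or one-hot vectors). The residual term $+H^{(l)}$ requires matching layer widths, and the softmax is applied row-wise.

```latex
\begin{aligned}
\tilde{A} &= A + I, \qquad \tilde{D}_{ii} = \textstyle\sum_j \tilde{A}_{ij} \\
H^{(l+1)} &= \sigma\!\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2} H^{(l)} W^{(l)}\right) + H^{(l)} \\
Z &= \mathrm{GNN}_{\mathrm{embed}}(A, H), \qquad S = \mathrm{softmax}\!\left(\mathrm{GNN}_{\mathrm{pool}}(A, H)\right) \\
H' &= S^{\top} Z, \qquad A' = S^{\top} A\, S
\end{aligned}
```

The assignment matrix $S$ softly maps residues to clusters, so $H'$ and $A'$ describe a smaller coarsened graph. Because the mapping is differentiable, it can be trained end to end with the classifier.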
Affiliation(s)
- Xiujin Wu
- School of Informatics, Xiamen University, Xiamen, Fujian, China
- Wenhua Zeng
- School of Informatics, Xiamen University, Xiamen, Fujian, China
- Fan Lin
- School of Informatics, Xiamen University, Xiamen, Fujian, China; Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
63
Ao C, Jiao S, Wang Y, Yu L, Zou Q. Biological Sequence Classification: A Review on Data and General Methods. Research (Wash D C) 2022; 2022:0011. [PMID: 39285948 PMCID: PMC11404319 DOI: 10.34133/research.0011] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 07/25/2022] [Accepted: 10/25/2022] [Indexed: 09/19/2024]
Abstract
With the rapid development of biotechnology, the number of biological sequences has grown exponentially. The continuous expansion of biological sequence data promotes the application of machine learning to biological sequences to construct predictive models for mining biological sequence information. There are many branches of biological sequence classification research. In this review, we mainly focus on machine learning-based classification of the function and modification of biological sequences. Sequence-based prediction and analysis are basic tasks for understanding the biological functions of DNA, RNA, proteins, and peptides. However, hundreds of classification models have been developed for biological sequences, and the wide variety of specific methods can seem bewildering at first glance. Here, we aim to establish a long-term support website (http://lab.malab.cn/~acy/BioseqData/home.html), which provides readers with detailed information on the classification methods and download links to relevant datasets. We briefly introduce the steps to build an effective model framework for biological sequence data. In addition, a brief introduction to single-cell sequencing data analysis methods and their applications in biology is included. Finally, we discuss the current challenges and future perspectives of biological sequence classification research.
Affiliation(s)
- Chunyan Ao
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Shihu Jiao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Yansu Wang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| |
Collapse
|
64
Zhou C, Peng D, Liao B, Jia R, Wu F. ACP_MS: prediction of anticancer peptides based on feature extraction. Brief Bioinform 2022; 23:6793775. [PMID: 36326080 DOI: 10.1093/bib/bbac462] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/30/2022] [Revised: 09/10/2022] [Accepted: 09/27/2022] [Indexed: 11/06/2022]
Abstract
Anticancer peptides (ACPs) are bioactive peptides with antitumor activity and have become the most promising drugs in cancer treatment. Therefore, accurate prediction of ACPs is of great significance to cancer research. In this paper, we developed a more efficient prediction model called ACP_MS. First, the monoMonoKGap method is used to extract features from anticancer peptide sequences and encode them numerically. Then, an AdaBoost model is used to select the most discriminative of these numerical features. Finally, a stochastic gradient descent algorithm is used to identify anticancer peptide sequences. We adopted 7-fold cross-validation and independent test set validation: on the main dataset, accuracy reached 92.653% and 91.597%, respectively, and on the alternate dataset it reached 98.678% and 98.317%, respectively. Compared with other advanced prediction models, ACP_MS improves the identification of anticancer peptide sequences. The data for this model can be downloaded free of charge from https://github.com/Zhoucaimao1998/Zc.
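The abstract names monoMonoKGap as the feature extractor but does not define it. The sketch below illustrates one common reading of this family of encodings: it counts ordered residue pairs (X, Y) that have exactly g residues between them, for g = 1..k. Treat this gap-pair definition as an assumption, because the exact monoMonoKGap formulation used by ACP_MS may differ. The function name `kgap_pair_counts` and the key format are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kgap_pair_counts(seq: str, k: int) -> dict[str, int]:
    """Count residue pairs X..Y with exactly g residues between them, g = 1..k.

    Keys look like 'A_2_C' (A, two intervening residues, then C).
    Pairs that involve non-standard residues are skipped.
    """
    counts = {f"{x}_{g}_{y}": 0
              for g in range(1, k + 1)
              for x, y in product(AMINO_ACIDS, repeat=2)}
    for g in range(1, k + 1):
        # Position i pairs with position i + g + 1, so both must lie in seq.
        for i in range(len(seq) - g - 1):
            key = f"{seq[i]}_{g}_{seq[i + g + 1]}"
            if key in counts:
                counts[key] += 1
    return counts
```

With k = 3, for example, this yields 3 × 400 = 1,200 candidate count features. A selection step (AdaBoost, in ACP_MS) would then prune them before classification.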
Affiliation(s)
- Caimao Zhou
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China; Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China; School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Dejun Peng
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China; Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China; School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Bo Liao
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China; Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China; School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Ranran Jia
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China; Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China; School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Fangxiang Wu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China; Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China; School of Mathematics and Statistics, Hainan Normal University, Haikou, China
65
Bi XA, Mao Y, Luo S, Wu H, Zhang L, Luo X, Xu L. A novel generation adversarial network framework with characteristics aggregation and diffusion for brain disease classification and feature selection. Brief Bioinform 2022; 23:6762742. [PMID: 36259367 DOI: 10.1093/bib/bbac454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Revised: 09/01/2022] [Accepted: 09/23/2022] [Indexed: 12/14/2022]
Abstract
Imaging genetics provides unique insights into the pathology of complex brain diseases by integrating the characteristics of multi-level medical data. However, most current imaging genetics research performs incomplete data fusion, and effective deep learning methods for jointly analyzing neuroimaging and genetic data are lacking. Therefore, this paper first constructs brain region-gene networks to intuitively represent the association patterns of pathogenetic factors. Second, a novel feature information aggregation model is constructed to accurately describe the information aggregation process among brain region nodes and gene nodes. Finally, a deep learning method called feature information aggregation and diffusion generative adversarial network (FIAD-GAN) is proposed to efficiently classify samples and select features. We focus on improving the generator with the proposed convolution and deconvolution operations, which dramatically improve the interpretability of the deep learning framework. The experimental results indicate that FIAD-GAN not only achieves superior results in various disease classification tasks but also extracts brain regions and genes closely related to Alzheimer's disease (AD). This work provides a novel method for intelligent clinical decision-making. The relevant biomedical discoveries provide a reliable reference and technical basis for the clinical diagnosis, treatment, and pathological analysis of disease.
Affiliation(s)
- Xia-An Bi
- Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, and College of Information Science and Engineering in Hunan Normal University, Changsha, P.R. China
- Yuhua Mao
- Department of Computing, School of Information Science and Engineering, Hunan Normal University, Changsha, China
- Sheng Luo
- Department of Computing, School of Information Science and Engineering, Hunan Normal University, Changsha, China
- Hao Wu
- Department of Computing, School of Information Science and Engineering, Hunan Normal University, Changsha, China
- Lixia Zhang
- School of Information Science and Engineering, Hunan Normal University, Changsha, P.R. China
- Xun Luo
- College of Information Science and Engineering in Hunan Normal University, Changsha, P.R. China
- Luyun Xu
- College of Business in Hunan Normal University, Changsha, P.R. China
66
Schaduangrat N, Anuwongcharoen N, Moni MA, Lio' P, Charoenkwan P, Shoombuatong W. StackPR is a new computational approach for large-scale identification of progesterone receptor antagonists using the stacking strategy. Sci Rep 2022; 12:16435. [PMID: 36180453 PMCID: PMC9525257 DOI: 10.1038/s41598-022-20143-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/18/2022] [Accepted: 09/09/2022] [Indexed: 11/24/2022]
Abstract
Progesterone receptors (PRs) are implicated in various cancers, since their presence or absence can determine clinical outcomes. Progesterone overstimulation can facilitate oncogenesis; thus, its modulation through PR inhibition is urgently needed. To address this issue, a novel stacked ensemble learning approach (termed StackPR) is presented for fast, accurate, and large-scale identification of PR antagonists using only SMILES notation, without the need for 3D structural information. We employed six popular machine learning (ML) algorithms (i.e., logistic regression, partial least squares, k-nearest neighbor, support vector machine, extremely randomized trees, and random forest) coupled with twelve conventional molecular descriptors to create 72 baseline models. Then, a genetic algorithm in conjunction with the self-assessment-report approach was used to select m of the 72 baseline models, from which the final meta-predictor was built using the stacking strategy and a tenfold cross-validation test. Experimental results on the independent test dataset show that StackPR achieved impressive predictive performance, with an accuracy of 0.966 and a Matthews correlation coefficient (MCC) of 0.925. In addition, analysis based on the SHapley Additive exPlanations algorithm and molecular docking indicates that aliphatic hydrocarbons and nitrogen-containing substructures were the most important features for PR antagonist activity. Finally, we implemented an online web server for StackPR, which is freely accessible at http://pmlabstack.pythonanywhere.com/StackPR. StackPR is anticipated to be a powerful computational tool for the large-scale identification of unknown PR antagonist candidates for follow-up experimental validation.
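StackPR reports accuracy and the Matthews correlation coefficient. Both can be computed from the four counts of a binary confusion matrix. Below is a minimal sketch with hypothetical function names. Returning 0.0 when MCC is undefined (any row or column of the confusion matrix is empty) is a common convention and is not something the abstract specifies.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, in [-1, 1]; 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

For example, take 45 true positives, 45 true negatives, 5 false positives, and 5 false negatives. These counts are made up for illustration and are not StackPR's results. They give an accuracy of 0.9 and an MCC of (45·45 − 5·5) / 50² = 0.8.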
Affiliation(s)
- Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Nuttapat Anuwongcharoen
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Mohammad Ali Moni
- Artificial Intelligence & Digital Health Data Science, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD, 4072, Australia
- Pietro Lio'
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
67
Charoenkwan P, Schaduangrat N, Lio’ P, Moni MA, Shoombuatong W, Manavalan B. Computational prediction and interpretation of druggable proteins using a stacked ensemble-learning framework. iScience 2022; 25:104883. [PMID: 36046193 PMCID: PMC9421381 DOI: 10.1016/j.isci.2022.104883] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 04/24/2022] [Revised: 07/08/2022] [Accepted: 08/02/2022] [Indexed: 11/22/2022]
Abstract
Discovery of potential drugs requires rapid and precise identification of drug targets. Although traditional experimental methods can accurately identify drug targets, they are time-consuming and unsuitable for high-throughput screening. Computational approaches based on machine learning (ML) algorithms can expedite the prediction of druggable proteins; however, the performance of existing computational methods remains unsatisfactory. This study proposes a computational tool, SPIDER, to improve the accuracy of druggable protein prediction. SPIDER combines feature descriptors that cover several aspects of a sequence, including physicochemical properties, compositional information, and composition-transition-distribution information, with well-known ML algorithms to construct the final meta-predictor. On the independent test dataset, SPIDER predicted druggable proteins more precisely and robustly than the baseline models and existing methods. A web server was established and made freely available online.
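Composition-transition-distribution (CTD) descriptors first recode each residue into one of three property groups and then summarize the recoded sequence. The sketch below computes only the composition (C) and transition (T) parts for a single property. The distribution (D) part, which records where each group first, 25%, 50%, 75%, and last occurs, is omitted. The hydrophobicity grouping shown is one that many CTD implementations use, but it is assumed here: the abstract does not say which properties SPIDER uses. The function name is hypothetical.

```python
# Three-group hydrophobicity split (assumed for illustration).
GROUPS = {aa: 1 for aa in "RKEDQN"}          # polar
GROUPS.update({aa: 2 for aa in "GASTPHY"})   # neutral
GROUPS.update({aa: 3 for aa in "CLVIMFW"})   # hydrophobic


def ctd_composition_transition(seq: str) -> dict[str, float]:
    """Return CTD composition (C1-C3) and transition (T12, T13, T23) features.

    Non-standard residues are dropped before encoding.
    """
    encoded = [GROUPS[aa] for aa in seq.upper() if aa in GROUPS]
    if not encoded:
        raise ValueError("sequence has no standard residues")
    n = len(encoded)
    # Composition: share of residues falling in each group.
    feats = {f"C{g}": encoded.count(g) / n for g in (1, 2, 3)}
    # Transition: share of adjacent positions that switch between two groups,
    # counting both directions (1->2 and 2->1 both add to T12).
    pairs = list(zip(encoded, encoded[1:]))
    for a, b in ((1, 2), (1, 3), (2, 3)):
        hits = sum(1 for x, y in pairs if {x, y} == {a, b})
        feats[f"T{a}{b}"] = hits / len(pairs) if pairs else 0.0
    return feats
```

Repeating this for several properties (charge, polarity, and so on), each with its own three-group split, builds up the full descriptor vector.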
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
- Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Pietro Lio’
- Department of Computer Science and Technology, University of Cambridge, Cambridge CB3 0FD, UK
- Mohammad Ali Moni
- Artificial Intelligence & Digital Health, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD 4072, Australia
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
68
Zhu L, Ye C, Hu X, Yang S, Zhu C. ACP-check: An anticancer peptide prediction model based on bidirectional long short-term memory and multi-features fusion strategy. Comput Biol Med 2022; 148:105868. [PMID: 35868046 DOI: 10.1016/j.compbiomed.2022.105868] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/07/2022] [Revised: 06/14/2022] [Accepted: 07/09/2022] [Indexed: 11/16/2022]
Abstract
Anticancer peptides are an emerging class of anticancer drugs that have become an effective alternative to chemotherapy and targeted therapy because they cause fewer side effects and less drug resistance. Traditional biological experiments for identifying anticancer peptides are time-consuming and complicated, which hinders large-scale, rapid, and effective identification. In this paper, we propose ACP-check, a model based on a bidirectional long short-term memory network and multi-feature fusion. ACP-check uses a bidirectional long short-term memory network to extract time-dependent features from peptide sequences and combines them with amino acid sequence features: the binary profile feature, dipeptide composition, composition of k-spaced amino acid group pairs, amino acid composition, and sequence-order-coupling number. To verify the model's performance, we selected six benchmark datasets: ACPred-Fuse, ACPred-FL, ACP240, ACP740, and the main and alternate datasets of AntiCP2.0. ACP-check obtains Matthews correlation coefficients of 0.37, 0.82, 0.80, 0.75, 0.56, and 0.86 on the six datasets, respectively, an improvement of 2%-86% over existing state-of-the-art anticancer peptide prediction methods. Its prediction accuracies are 0.91, 0.91, 0.90, 0.87, 0.78, and 0.93, respectively, increases of 1%-49%. Overall, the comparison experiments show that ACP-check can accurately identify anticancer peptides from sequence-level information. The code and data are available at http://www.cczubio.top/ACP-check/.
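Two of the hand-crafted features listed above, amino acid composition (AAC) and dipeptide composition (DPC), have simple standard definitions. AAC is the share of each of the 20 residues in a sequence. DPC is the share of each of the 400 ordered pairs among its L − 1 adjacent residue pairs. A minimal sketch, assuming sequences use only the 20 standard one-letter codes. The residue ordering and function names are illustrative choices, not taken from ACP-check.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # fixed feature order (an assumption)
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def aac(seq: str) -> list[float]:
    """Amino acid composition: share of each of the 20 standard residues."""
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def dpc(seq: str) -> list[float]:
    """Dipeptide composition: share of each of the 400 adjacent residue pairs."""
    if len(seq) < 2:
        raise ValueError("need at least two residues")
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    return [pairs[dp] / (len(seq) - 1) for dp in DIPEPTIDES]
```

Concatenating `aac(seq) + dpc(seq)` gives a fixed 420-dimensional vector for any peptide length. This fixed length is what lets such features be fused with learned sequence representations.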
Affiliation(s)
- Lun Zhu
- School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China
- Chenyang Ye
- School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China
- Xuemei Hu
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Sen Yang
- School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China; Changzhou No.2 People's Hospital, the Affiliated Hospital of Nanjing Medical University, Changzhou, 213164, China
- Chenyang Zhu
- School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China
69
Zou H, Yang F, Yin Z. Integrating multiple sequence features for identifying anticancer peptides. Comput Biol Chem 2022; 99:107711. [DOI: 10.1016/j.compbiolchem.2022.107711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2021] [Revised: 05/16/2022] [Accepted: 05/29/2022] [Indexed: 11/03/2022]
70
Liang Y, Wu Y, Zhang Z, Liu N, Peng J, Tang J. Hyb4mC: a hybrid DNA2vec-based model for DNA N4-methylcytosine sites prediction. BMC Bioinformatics 2022; 23:258. [PMID: 35768759 PMCID: PMC9241225 DOI: 10.1186/s12859-022-04789-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 12/10/2021] [Accepted: 06/10/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND DNA N4-methylcytosine (4mC) is part of the restriction-modification system, which regulates biological processes such as the initiation of DNA replication, mismatch repair, and transposon inactivation. However, detecting 4mC sites experimentally is time-consuming and expensive. Moreover, because the number of 4mC samples differs greatly among species, robust multi-species 4mC site prediction is challenging. Hence, it is of great significance to develop effective computational tools to identify 4mC sites. RESULTS This work proposes Hyb4mC, a flexible deep learning-based framework for predicting 4mC sites. Hyb4mC adopts the DNA2vec method for sequence embedding, which captures more efficient and comprehensive information than sequence-based feature methods. Two different subnets, Hyb_Caps and Hyb_Conv, are then used for further analysis. Hyb_Caps is composed of a capsule neural network and can generalize from fewer samples. Hyb_Conv combines an attention mechanism with a text convolutional neural network for further feature learning. CONCLUSIONS Extensive benchmark tests have shown that Hyb4mC can significantly improve 4mC site prediction compared with recently proposed methods.
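DNA2vec-style embeddings assign a learned vector to each overlapping k-mer, so the first step is to tokenize a sequence into k-mers and look up their vectors. The sketch shows only that tokenize-and-look-up step. The value of k, the toy vector table, and the function names are hypothetical. Training the embeddings, or loading pretrained DNA2vec vectors, is out of scope.

```python
def kmers(seq: str, k: int) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    seq = seq.upper()
    if k <= 0 or k > len(seq):
        return []
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def embed(seq: str, k: int, table: dict[str, list[float]],
          dim: int) -> list[list[float]]:
    """Map each k-mer to a copy of its vector; unknown k-mers become zeros.

    `table` stands in for pretrained k-mer vectors (hypothetical here).
    The result is a (number of k-mers) x dim matrix that a downstream
    network, such as a convolutional or capsule subnet, could consume.
    """
    return [list(table.get(km, [0.0] * dim)) for km in kmers(seq, k)]
```

A 41-nt window with k = 3, for instance, becomes a 39 × dim matrix. The overlap between neighbouring k-mers preserves local sequence context.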
Affiliation(s)
- Ying Liang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Yanan Wu
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Zequn Zhang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Niannian Liu
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Jun Peng
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Jianjun Tang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
71
Chen X, Huang J, He B. AntiDMPpred: a web service for identifying anti-diabetic peptides. PeerJ 2022; 10:e13581. [PMID: 35722269 PMCID: PMC9205309 DOI: 10.7717/peerj.13581] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/03/2022] [Accepted: 05/23/2022] [Indexed: 01/17/2023]
Abstract
Diabetes mellitus (DM) is a chronic metabolic disease that has become a major threat to human health globally, causing great economic and social hardship. Oral administration of anti-diabetic peptide drugs has become a novel route for diabetes therapy. Numerous bioactive peptides have demonstrated potential anti-diabetic properties and are promising alternative treatments to prevent and manage diabetes. Computational prediction of anti-diabetic peptides can help promote peptide-based drug discovery in the search for new, effective therapeutic peptide agents for diabetes treatment. Here, we used random forest to develop a computational model, named AntiDMPpred, for predicting anti-diabetic peptides. We first constructed a benchmark dataset of 236 anti-diabetic and 236 non-anti-diabetic peptides. Four types of sequence-derived descriptors were used to represent the peptide sequences. We then combined four machine learning methods and six feature scoring methods to select non-redundant features, which were fed into diverse machine learning classifiers to train the models. In nested five-fold cross-validation, AntiDMPpred reached an accuracy of 77.12% and an area under the receiver operating characteristic curve (AUCROC) of 0.8193. This performance is satisfactory and surpasses the other classifiers implemented in the study. The web service is freely accessible at http://i.uestc.edu.cn/AntiDMPpred/cgi-bin/AntiDMPpred.pl. We hope AntiDMPpred will improve the discovery of anti-diabetic bioactive peptides.
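In nested cross-validation, an inner loop on the outer training folds handles model or feature selection, and each outer test fold is used only once, for scoring. This keeps the reported accuracy and AUCROC free of selection bias. Below is a minimal, unstratified index-splitting sketch using only the standard library. The function names and seeding scheme are hypothetical. The study may well have used stratified folds, which keep the 1:1 class ratio within every fold.

```python
import random


def kfold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle 0..n-1 with a fixed seed and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def nested_cv_splits(n: int, outer_k: int = 5, inner_k: int = 5, seed: int = 0):
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation.

    Inner splits partition only the outer training indices, so anything
    chosen in the inner loop never sees the outer test fold.
    """
    outer = kfold_indices(n, outer_k, seed)
    for i, outer_test in enumerate(outer):
        outer_train = [j for f, fold in enumerate(outer) if f != i for j in fold]
        inner_splits = []
        for pos in kfold_indices(len(outer_train), inner_k, seed + i + 1):
            inner_test = [outer_train[p] for p in pos]
            held_out = set(inner_test)
            inner_train = [j for j in outer_train if j not in held_out]
            inner_splits.append((inner_train, inner_test))
        yield outer_train, outer_test, inner_splits
```

In a typical loop, the inner splits pick features or hyperparameters, and the model is then refit on `outer_train`. Only predictions on `outer_test` contribute to the reported metrics. For the 472-peptide benchmark, `nested_cv_splits(472)` would produce five outer folds of 94 or 95 peptides each.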
Affiliation(s)
- Xue Chen
- Medical College, Guizhou University, Guiyang, China
- Jian Huang
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Bifang He
- Medical College, Guizhou University, Guiyang, China
72
Charoenkwan P, Schaduangrat N, Lio' P, Moni MA, Manavalan B, Shoombuatong W. NEPTUNE: A novel computational approach for accurate and large-scale identification of tumor homing peptides. Comput Biol Med 2022; 148:105700. [PMID: 35715261 DOI: 10.1016/j.compbiomed.2022.105700] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/14/2022] [Revised: 05/31/2022] [Accepted: 06/04/2022] [Indexed: 11/16/2022]
Abstract
Tumor homing peptides (THPs) play a crucial role in recognizing and specifically binding to cancer cells. Although experimental approaches can precisely identify THPs, they are usually time-consuming, labor-intensive, and costly. In contrast, computational approaches can identify THPs from sequence information alone, which gives them great potential for large-scale identification of THPs. Herein, we propose NEPTUNE, a novel computational approach for the accurate and large-scale identification of THPs from sequence information. Specifically, we constructed various baseline models by coupling multiple feature encoding schemes with six popular machine learning algorithms. We then comprehensively assessed the effects of these baseline models on THP prediction. The probabilistic information generated by the optimal baseline models was fed into a support vector machine-based classifier to construct the final meta-predictor (NEPTUNE). Cross-validation and independent tests demonstrated that NEPTUNE outperformed both its constituent baseline models and existing methods for THP prediction. Moreover, we used the SHapley Additive exPlanations (SHAP) method to improve the interpretability of NEPTUNE and to elucidate the most important features for identifying THPs. Finally, we implemented an online web server for NEPTUNE, which is available at http://pmlabstack.pythonanywhere.com/NEPTUNE. NEPTUNE could be beneficial for the large-scale identification of unknown THP candidates for follow-up experimental validation.
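The core of this kind of stacking is turning each baseline model's output into a probabilistic feature for the meta-classifier. Each row should be scored out of fold, so the meta-classifier never sees probabilities from a model trained on that same row. The sketch shows only that step. Substitutions, all hypothetical: a toy nearest-centroid model replaces the six ML algorithms, and column subsets ("views") stand in for feature encodings. The SVM meta-classifier is not included.

```python
import math
from typing import Callable, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]  # returns an estimate of P(class = 1)


def fit_centroid(X: list[Row], y: list[int]) -> Model:
    """Toy baseline: P(1) rises as a row moves closer to the class-1 centroid.

    Both classes must be present in X.
    """
    def centroid(rows: list[Row]) -> list[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0 = centroid([x for x, t in zip(X, y) if t == 0])
    c1 = centroid([x for x, t in zip(X, y) if t == 1])

    def predict(x: Row) -> float:
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)

    return predict


def out_of_fold_probabilities(X, y, views, k=5):
    """Return an n x len(views) matrix of out-of-fold P(1) estimates.

    Each view is a list of column indices standing in for one feature encoding.
    Row i is only scored by models trained on folds that exclude row i.
    """
    n = len(X)
    folds = [list(range(start, n, k)) for start in range(k)]  # interleaved folds
    meta = [[0.0] * len(views) for _ in range(n)]
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        for v, cols in enumerate(views):
            def project(row, cols=cols):
                return [row[c] for c in cols]
            model = fit_centroid([project(X[i]) for i in train],
                                 [y[i] for i in train])
            for i in test:
                meta[i][v] = model(project(X[i]))
    return meta
```

The returned matrix, one column per baseline model, is what a meta-classifier would be trained on. In NEPTUNE's design that meta-classifier is an SVM. Computing the features out of fold avoids the optimistic bias that in-sample probabilities would introduce.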
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Pietro Lio'
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Mohammad Ali Moni
- Artificial Intelligence & Digital Health, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD, 4072, Australia
- Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
73
Li Y, Li X, Liu Y, Yao Y, Huang G. MPMABP: A CNN and Bi-LSTM-Based Method for Predicting Multi-Activities of Bioactive Peptides. Pharmaceuticals (Basel) 2022; 15:707. [PMID: 35745625 PMCID: PMC9231127 DOI: 10.3390/ph15060707] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/22/2022] [Revised: 05/23/2022] [Accepted: 05/30/2022] [Indexed: 12/30/2022]
Abstract
Bioactive peptides are typically small functional peptides of 2-20 amino acid residues that play versatile roles in metabolic and biological processes. Because bioactive peptides are multi-functional, accurately detecting all of their functions simultaneously is very challenging. We proposed MPMABP, a deep learning method based on a convolutional neural network (CNN) and a bidirectional long short-term memory network (Bi-LSTM), for recognizing multiple activities of bioactive peptides. MPMABP stacks five CNNs at different scales and uses a residual network to prevent information loss. The empirical results showed that MPMABP is superior to state-of-the-art methods. Analysis of amino acid distributions indicated that lysine tends to appear in anticancer peptides, leucine in anti-diabetic peptides, and proline in anti-hypertensive peptides. The method and analysis are useful for recognizing multiple activities of bioactive peptides.
Affiliation(s)
- You Li
- School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China
- Xueyong Li
- School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China
- Yuewu Liu
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
- Yuhua Yao
- School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
- Guohua Huang
- School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China
74
Charoenkwan P, Schaduangrat N, Hasan MM, Moni MA, Lió P, Shoombuatong W. Empirical comparison and analysis of machine learning-based predictors for predicting and analyzing of thermophilic proteins. EXCLI JOURNAL 2022; 21:554-570. [PMID: 35651661 PMCID: PMC9150013 DOI: 10.17179/excli2022-4723] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/28/2022] [Accepted: 02/21/2022] [Indexed: 12/15/2022]
Abstract
Thermophilic proteins (TPPs) are critical for basic research and the food industry because they maintain a thermodynamically stable fold at extremely high temperatures. Rapid identification of novel TPPs from protein sequences using computational models is therefore highly desirable. Over the last few decades, a number of computational methods, especially machine learning (ML)-based methods, have been developed for in silico prediction of TPPs. It is therefore worthwhile to revisit these methods and summarize their advantages and disadvantages, so that new computational approaches can achieve more accurate prediction of TPPs. With this goal in mind, we comprehensively investigate fourteen state-of-the-art TPP predictors in terms of dataset size, feature encoding schemes, feature selection strategies, ML algorithms, evaluation strategies, and web server/software usability. To the best of our knowledge, this article is the first comprehensive review of ML-based methods for in silico prediction of TPPs. These predictors can be classified into two groups according to the interpretability of the ML algorithms they employ: computational black-box methods and computational white-box methods. We also conducted a comparative study of several currently available TPP predictors on two benchmark datasets. Finally, we provide future perspectives for the design and development of new computational models for TPP prediction. We hope that this comprehensive review will help researchers select the TPP predictor most suitable for their purposes and will provide useful perspectives for developing more effective and accurate TPP predictors.
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand, 50200
- Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
- Md Mehedi Hasan
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Mohammad Ali Moni
- School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, the University of Queensland, St Lucia, QLD 4072, Australia
- Pietro Lió
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
75
Feng G, Yao H, Li C, Liu R, Huang R, Fan X, Ge R, Miao Q. ME-ACP: Multi-view neural networks with ensemble model for identification of anticancer peptides. Comput Biol Med 2022; 145:105459. [DOI: 10.1016/j.compbiomed.2022.105459] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/15/2022] [Revised: 03/22/2022] [Accepted: 03/24/2022] [Indexed: 12/26/2022]
76
Yan K, Lv H, Guo Y, Chen Y, Wu H, Liu B. TPpred-ATMV: therapeutic peptide prediction by adaptive multi-view tensor learning model. Bioinformatics 2022; 38:2712-2718. [PMID: 35561206 DOI: 10.1093/bioinformatics/btac200] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 01/15/2022] [Revised: 03/17/2022] [Accepted: 04/06/2022] [Indexed: 11/12/2022]
Abstract
MOTIVATION Therapeutic peptide prediction is important for discovering efficient therapeutic peptides and for drug development. Researchers have developed several computational methods to identify different types of therapeutic peptides. However, these methods focus on specific types of therapeutic peptides and cannot predict the full range of types. Moreover, using different properties to predict therapeutic peptides remains challenging. RESULTS In this study, we propose TPpred-ATMV, an adaptive multi-view tensor learning framework for predicting different types of therapeutic peptides. TPpred-ATMV constructs class and probability information from various sequence features. We constructed a latent subspace among the multi-view features and built an auto-weighted multi-view tensor learning model to exploit the high correlation among these features. Experimental results showed that TPpred-ATMV is better than, or highly comparable with, other state-of-the-art methods for predicting eight types of therapeutic peptides. AVAILABILITY AND IMPLEMENTATION The code of TPpred-ATMV is available at: https://github.com/cokeyk/TPpred-ATMV. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Hongwu Lv
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Yichen Guo
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Yongyong Chen
- Bio-Computing Research Center, Harbin Institute of Technology, Shenzhen 518055, China
- Hao Wu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
77
Charoenkwan P, Ahmed S, Nantasenamat C, Quinn JMW, Moni MA, Lio' P, Shoombuatong W. AMYPred-FRL is a novel approach for accurate prediction of amyloid proteins by using feature representation learning. Sci Rep 2022; 12:7697. [PMID: 35546347 PMCID: PMC9095707 DOI: 10.1038/s41598-022-11897-z] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 09/11/2021] [Accepted: 05/03/2022] [Indexed: 12/13/2022]
Abstract
Amyloid proteins can form insoluble fibril aggregates that have important pathogenic effects in many tissues. Such amyloidoses are prominently associated with common diseases such as type 2 diabetes, Alzheimer's disease, and Parkinson's disease. There are many types of amyloid proteins, and some proteins form amyloid aggregates when in a misfolded state. It is difficult to identify such amyloid proteins and their pathogenic properties, but developing effective bioinformatics tools offers a new and effective approach. While several machine learning (ML)-based models for in silico identification of amyloid proteins have been proposed, their predictive performance is limited. In this study, we present AMYPred-FRL, a novel meta-predictor that uses a feature representation learning approach to achieve more accurate amyloid protein identification. AMYPred-FRL combines six well-known ML algorithms (extremely randomized trees, extreme gradient boosting, k-nearest neighbor, logistic regression, random forest, and support vector machine) with ten different sequence-based feature descriptors to generate 60 probabilistic features (PFs), in contrast to state-of-the-art methods that rely on a single feature-based approach. A logistic regression recursive feature elimination (LR-RFE) method was used to find the optimal number m of the 60 PFs in order to improve predictive performance. Finally, following the meta-predictor approach, the 20 selected PFs were fed into a logistic regression model to create the final hybrid model (AMYPred-FRL). Both cross-validation and independent tests showed that AMYPred-FRL achieved better predictive performance than its constituent baseline models. In an extensive independent test, AMYPred-FRL achieved an accuracy of 0.873 and an MCC of 0.710, outperforming existing methods by 5.5% and 16.1%, respectively.
To expedite high-throughput prediction, a user-friendly web server of AMYPred-FRL is freely available at http://pmlabstack.pythonanywhere.com/AMYPred-FRL. It is anticipated that AMYPred-FRL will be a useful tool in helping researchers to identify new amyloid proteins.
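The meta-predictor idea in this abstract (probabilities from baseline models reused as features for a final logistic regression) can be sketched as follows. This is a minimal illustration under stated assumptions, not the AMYPred-FRL code: the two "baseline models" are hypothetical lambdas standing in for trained descriptor/algorithm pairs, the LR-RFE selection step is omitted, and the weights and bias are made-up rather than fitted.

```python
import math
from typing import Callable, Sequence

# Hypothetical type: a trained baseline model (one descriptor + one ML algorithm)
# that maps a protein sequence to its predicted probability of being amyloid.
BaselineModel = Callable[[str], float]

def probabilistic_features(sequence: str, baselines: Sequence[BaselineModel]) -> list[float]:
    """One PF per baseline model; 6 algorithms x 10 descriptors would give 60 PFs."""
    return [model(sequence) for model in baselines]

def meta_predict(pf: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic-regression meta-predictor applied to an (already selected) PF vector."""
    z = bias + sum(w * p for w, p in zip(weights, pf))
    return 1.0 / (1.0 + math.exp(-z))

# Toy stand-ins for trained baseline models (illustrative only, not real predictors).
baselines = [
    lambda s: s.count("V") / max(len(s), 1),  # crude valine fraction
    lambda s: s.count("I") / max(len(s), 1),  # crude isoleucine fraction
]
pf = probabilistic_features("KLVFFAEDVGSNK", baselines)
print(pf, meta_predict(pf, weights=[2.0, 1.0], bias=-0.5))
```

In a real pipeline each PF would come from a fitted classifier (ideally via out-of-fold predictions), and the meta-level weights would be learned on the selected PF subset.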
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Saeed Ahmed
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Julian M W Quinn
- Bone Biology Division, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, NSW, 2010, Australia
- Mohammad Ali Moni
- Artificial Intelligence and Digital Health Data Science, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD, 4072, Australia
- Pietro Lio'
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
78
Multi-channel CNN based anticancer peptides identification. Anal Biochem 2022; 650:114707. [PMID: 35568159 DOI: 10.1016/j.ab.2022.114707] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/24/2021] [Revised: 01/27/2022] [Accepted: 04/27/2022] [Indexed: 11/20/2022]
Abstract
Cancer is one of the most dangerous diseases in the world and often leads to misery and death. Current treatments include different kinds of anticancer therapy, each with its own side effects. Because of their physicochemical properties, anticancer peptides (ACPs) have opened a new path for treating this deadly disease, so a well-performing methodology for identifying novel anticancer peptides is of great importance in the fight against cancer. In addition to laboratory techniques, various machine learning and deep learning methods have been developed in recent years for this task. Although these models have shown reasonable predictive ability, there is still room for improvement in performance and for exploring new types of algorithms. In this work, we propose a novel multi-channel convolutional neural network (CNN) for identifying anticancer peptides from protein sequences. We collected data from existing state-of-the-art methods and applied binary encoding for data preprocessing. We employed k-fold cross-validation to train our models on benchmark datasets and compared their performance on independent datasets. The comparison indicated the superiority of our models on various evaluation metrics. We believe our work can be a valuable asset in finding novel anticancer peptides. A user-friendly web server for academic use is publicly available at: http://103.99.176.239/iacp-cnn/.
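The abstract mentions binary encoding as preprocessing but gives no details. Assuming it means the common one-hot scheme (one 20-dimensional indicator vector per residue, padded to a fixed length so sequences can be batched into a CNN), a minimal sketch looks like this; the residue order and `max_len=50` are assumptions, not values from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, in an assumed fixed order
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def binary_encode(peptide: str, max_len: int = 50) -> list[list[int]]:
    """One-hot encode a peptide into a max_len x 20 matrix (truncate or zero-pad)."""
    matrix = []
    for residue in peptide.upper()[:max_len]:
        row = [0] * len(AMINO_ACIDS)
        if residue in INDEX:  # non-standard residues (e.g. X) stay as all-zero rows
            row[INDEX[residue]] = 1
        matrix.append(row)
    while len(matrix) < max_len:  # zero-pad short peptides to a fixed length
        matrix.append([0] * len(AMINO_ACIDS))
    return matrix

encoded = binary_encode("GLFDIVKKVV", max_len=12)
print(len(encoded), len(encoded[0]))  # matrix shape: rows x columns
```

Each row then acts as one position along the sequence axis, and the 20 columns act as input channels for the convolution.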
79
Chen Q, Yang C, Xie Y, Wang Y, Li X, Wang K, Huang J, Yan W. GM-Pep: A High Efficiency Strategy to De Novo Design Functional Peptide Sequences. J Chem Inf Model 2022; 62:2617-2629. [PMID: 35533298 DOI: 10.1021/acs.jcim.2c00089] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
Abstract
Although peptides are regarded as ideal therapeutic agents, only a small proportion of marketed drugs are peptides. In the past decade, pharmacists have paid great attention to the development of peptide therapeutics. Except for a few approved chemically or rationally designed peptides, most attempts have failed due to unsatisfactory efficacy or safety. Fortunately, computational methods such as artificial intelligence have been used to accelerate the discovery of therapeutic peptides by predicting the activity, toxicity, and absorption, distribution, metabolism, and excretion of polypeptides. Usually, a specific biological activity of a peptide can be accurately determined by an interest-oriented binary classifier built from a positive set and a negative set that has not been experimentally validated, regardless of other characteristics; this makes it challenging to evaluate the research object comprehensively in the early stages of drug research and development. Herein, we propose an integrated method (GM-Pep) that contains a conditional variational autoencoder (CVAE) and a multiclassifier trained on positive samples (Deep-Multiclassifier) to effectively generate a single bioactive peptide sequence without toxicity and referential side effects. The results showed that our Deep-Multiclassifier model achieved a sequence accuracy of up to 96.41% [toxicity (94.48%), antifungal (96.58%), antihypertensive (97.18%), and antibacterial (96.91%)]. The properties of Deep-Multiclassifier and CVAE were validated using 12 newly synthesized antibacterial peptides and through comparison with random peptides. The source code and data sets are available at https://github.com/TimothyChen225/GM-Pep.
Affiliation(s)
- Qushuo Chen
- The Institute of Pharmacology, Key Laboratory of Preclinical Study for New Drugs of Gansu Province, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, China
- Changyan Yang
- The Institute of Pharmacology, Key Laboratory of Preclinical Study for New Drugs of Gansu Province, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, China
- Yihao Xie
- The Institute of Pharmacology, Key Laboratory of Preclinical Study for New Drugs of Gansu Province, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, China
- Yuqiang Wang
- School of Stomatology, Lanzhou University, Lanzhou, Gansu 730000, China
- Xiaoxu Li
- School of Computer and Communication, Lanzhou University of Technology, Lanzhou, Gansu 730050, China
- Kairong Wang
- The Institute of Pharmacology, Key Laboratory of Preclinical Study for New Drugs of Gansu Province, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, China
- Jinqi Huang
- Department of Hematology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong 524000, China
- Wenjin Yan
- The Institute of Pharmacology, Key Laboratory of Preclinical Study for New Drugs of Gansu Province, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, China
80
Ali H, Unar A, Zubair M, Dil S, Ullah F, Khan I, Hussain A, Shi Q. In silico analysis of a novel pathogenic variant c.7G > A in C14orf39 gene identified by WES in a Pakistani family with azoospermia. Mol Genet Genomics 2022; 297:719-730. [PMID: 35305148 DOI: 10.1007/s00438-022-01876-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/28/2021] [Accepted: 02/20/2022] [Indexed: 11/25/2022]
Abstract
Infertility is a multifactorial disorder that affects approximately 12% of couples of childbearing age worldwide. Few studies have examined the genetic causes of infertility in depth. The synaptonemal complex (SC), which is essential for the progression of meiosis, is a conserved tripartite structure that binds homologous chromosomes together and is thus required for fertility. This study investigated the genetic causes of infertility in a Pakistani consanguineous family with two patients suffering from non-obstructive azoospermia (NOA). We performed whole-exome sequencing, followed by Sanger sequencing, and identified a novel pathogenic variant (c.7G > A [p.D3N]) in the SC coding gene C14orf39, which co-segregated recessively with NOA. In silico analysis revealed that the p.D3N mutation abolished the charge of the wild-type residue, which may result in loss of interactions with other molecules and residues, and reduced protein stability. The novel variant generated the mutant protein C14ORF39D3N, and homozygous mutations in C14orf39 resulted in NOA. The transcriptome profile of C14ORF39 shows that it is specifically expressed in early brain development, suggesting that further research is required to study other functions of C14ORF39 beyond its role in the germline. This research highlights the conserved role of C14orf39/SIX6OS1 in SC assembly and its indispensable role in facilitating genetic diagnosis in patients with infertility, which may enable the development of future treatments.
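The notation c.7G > A [p.D3N] links a coding-DNA change to a protein change. As a worked sketch: coding position 7 lies in codon (7 - 1) // 3 + 1 = 3, at its first base, and a G>A change at the first base of either aspartate codon (GAT or GAC) gives an asparagine codon (AAT or AAC). The wild-type codon used below (GAC) is an assumption for illustration; the abstract does not state which Asp codon the gene carries, and the codon table is deliberately limited to the two residues involved.

```python
# Only the codons needed for this example (DNA alphabet).
CODON_TO_AA = {
    "GAT": "D", "GAC": "D",  # aspartate
    "AAT": "N", "AAC": "N",  # asparagine
}

def locate_cdna_position(c_pos: int) -> tuple[int, int]:
    """Map a 1-based coding (c.) position to (codon number, position within codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1

def apply_snv(codon: str, pos_in_codon: int, ref: str, alt: str) -> str:
    """Substitute one base in a codon after checking that the reference base matches."""
    if codon[pos_in_codon - 1] != ref:
        raise ValueError(f"reference base {ref} not found at codon position {pos_in_codon}")
    return codon[: pos_in_codon - 1] + alt + codon[pos_in_codon:]

codon_no, offset = locate_cdna_position(7)   # c.7 -> codon 3, first base
mutant = apply_snv("GAC", offset, "G", "A")  # assumed wild-type codon GAC
print(f"codon {codon_no}: GAC ({CODON_TO_AA['GAC']}) -> {mutant} ({CODON_TO_AA[mutant]})")
```

The same arithmetic explains why the protein-level description is "D3N": residue 3 changes from D (Asp, negatively charged) to N (Asn, uncharged), consistent with the loss of charge reported by the in silico analysis.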
Affiliation(s)
- Haider Ali
- First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, CAS Center for Excellence in Molecular Cell Science, Collaborative Innovation Center of Genetics and Development, University of Science and Technology of China, Hefei, 230027, China
- Ahsanullah Unar
- First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, CAS Center for Excellence in Molecular Cell Science, Collaborative Innovation Center of Genetics and Development, University of Science and Technology of China, Hefei, 230027, China
- Muhammad Zubair
- First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, CAS Center for Excellence in Molecular Cell Science, Collaborative Innovation Center of Genetics and Development, University of Science and Technology of China, Hefei, 230027, China
- Sobia Dil
- First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, CAS Center for Excellence in Molecular Cell Science, Collaborative Innovation Center of Genetics and Development, University of Science and Technology of China, Hefei, 230027, China
- Farman Ullah
- Center of Biotechnology and Microbiology, University of Swat, Swat, 19120, Pakistan
- Ihsan Khan
- First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, CAS Center for Excellence in Molecular Cell Science, Collaborative Innovation Center of Genetics and Development, University of Science and Technology of China, Hefei, 230027, China
- Ansar Hussain
- First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, CAS Center for Excellence in Molecular Cell Science, Collaborative Innovation Center of Genetics and Development, University of Science and Technology of China, Hefei, 230027, China
- Qinghua Shi
- First Affiliated Hospital of USTC, Hefei National Laboratory for Physical Sciences at Microscale, The CAS Key Laboratory of Innate Immunity and Chronic Disease, School of Basic Medical Sciences, Division of Life Sciences and Medicine, CAS Center for Excellence in Molecular Cell Science, Collaborative Innovation Center of Genetics and Development, University of Science and Technology of China, Hefei, 230027, China
81
Wu X, Zeng W, Lin F, Xu P, Li X. Anticancer Peptide Prediction via Multi-Kernel CNN and Attention Model. Front Genet 2022; 13:887894. [PMID: 35571059 PMCID: PMC9092594 DOI: 10.3389/fgene.2022.887894] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/02/2022] [Accepted: 03/25/2022] [Indexed: 11/13/2022]
Abstract
Background: Modern lifestyles mean that people are more likely to develop some form of cancer. Because anticancer peptides can effectively kill cancer cells and play an important role in fighting cancer, they have attracted increasing research interest. Methods: This study presents a tool, called ACP-MCAM, that identifies anticancer peptides using a multi-kernel CNN and attention model. The model can automatically learn adaptive embeddings and the contextual sequence features of ACPs. In addition, to improve interpretability and integrity, we visualized the model. Results: Benchmark comparisons show that ACP-MCAM significantly outperforms several state-of-the-art models. Different encoding schemes have different impacts on the performance of the model. We also studied the optimization of the method's parameters. Conclusion: ACP-MCAM integrates a multi-kernel CNN with a self-attention mechanism and outperforms previous models in identifying anticancer peptides. We expect this work to provide new research ideas for anticancer peptide prediction and to promote the development of the interdisciplinary field of artificial intelligence and biomedicine.
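The "multi-kernel" part of such a CNN means running convolution filters of several widths over the same sequence, so that motifs of different lengths are captured, then pooling each feature map and concatenating the results. A deliberately simplified sketch of that idea follows: a single input channel, one hand-picked filter per width, and global max pooling. The real ACP-MCAM model uses learned embeddings, many learned filters, and a self-attention layer, none of which are reproduced here; the per-residue values are hypothetical.

```python
def conv1d(signal: list[float], kernel: list[float]) -> list[float]:
    """'Valid' 1D cross-correlation, the operation CNN convolution layers compute."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def multi_kernel_features(signal: list[float], kernels: list[list[float]]) -> list[float]:
    """Convolve with kernels of different widths, apply ReLU, global max-pool each map."""
    features = []
    for kernel in kernels:
        activations = [max(0.0, v) for v in conv1d(signal, kernel)]
        # A sequence shorter than the kernel yields no windows; emit 0.0 in that case.
        features.append(max(activations, default=0.0))
    return features

# Hypothetical per-residue values (e.g. one physicochemical scale) for a short peptide.
signal = [0.5, -1.0, 1.8, 0.3, -0.7, 1.2]
kernels = [[1.0, 1.0], [0.5, 0.0, -0.5], [0.2] * 5]  # widths 2, 3, 5 (hand-picked, not learned)
print(multi_kernel_features(signal, kernels))
```

The output has one value per kernel regardless of peptide length, which is what lets variable-length peptides feed a fixed-size classifier head.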
Affiliation(s)
- Xiujin Wu
- School of Informatics, Xiamen University, Xiamen, China
- Wenhua Zeng
- School of Informatics, Xiamen University, Xiamen, China
- Fan Lin
- School of Informatics, Xiamen University, Xiamen, China
- Boston Children’s Hospital, Boston, MA, United States
- Peng Xu
- Chongqing Michong Technology Co., Ltd., Chongqing, China
82
Jiao S, Chen Z, Zhang L, Zhou X, Shi L. ATGPred-FL: sequence-based prediction of autophagy proteins with feature representation learning. Amino Acids 2022; 54:799-809. [PMID: 35286461 DOI: 10.1007/s00726-022-03145-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2021] [Accepted: 01/28/2022] [Indexed: 11/26/2022]
Abstract
Autophagy plays an important role in biological evolution and is regulated by many autophagy proteins. Accurate identification of autophagy proteins is crucial for revealing their biological functions. Because experimental methods are expensive and labor-intensive, there is an urgent need for automated, accurate, and reliable sequence-based computational tools that can identify novel autophagy proteins among numerous proteins and peptides. For this purpose, a new predictor named ATGPred-FL was proposed for the efficient identification of autophagy proteins. We investigated various sequence-based feature descriptors and adopted a feature learning method to generate corresponding, more informative probability features. Then, a two-step, accuracy-based feature selection strategy was used to remove irrelevant and redundant features, yielding the most discriminative 14-dimensional feature set. The final predictor was built using a support vector machine classifier, which performed favorably on both the training and testing sets, with accuracies of 94.40% and 90.50%, respectively. ATGPred-FL is the first machine learning predictor of autophagy-related (ATG) proteins based on protein primary sequences. We envision that ATGPred-FL will be an effective and useful tool for autophagy protein identification. It is freely available at http://lab.malab.cn/~acy/ATGPred-FL, and the source code and datasets are accessible at https://github.com/jiaoshihu/ATGPred.
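The abstract does not spell out its two-step, accuracy-based feature selection. One common variant, assumed here, is (1) rank features by the score each achieves on its own, then (2) walk down that ranking and keep a feature only if adding it improves the score of the current subset. In practice the score would be something like the cross-validated accuracy of an SVM; below, a hypothetical toy scoring function (additive gains plus a redundancy penalty) stands in so the selection logic can be seen in isolation.

```python
from typing import Callable, Sequence

def two_step_selection(features: Sequence[str],
                       score: Callable[[list[str]], float]) -> list[str]:
    """Step 1: rank features by their individual score.
    Step 2: sequential forward selection along that ranking, keeping a feature
    only if adding it strictly improves the score of the current subset."""
    ranked = sorted(features, key=lambda f: score([f]), reverse=True)
    selected: list[str] = []
    best = float("-inf")
    for feature in ranked:
        candidate = selected + [feature]
        candidate_score = score(candidate)
        if candidate_score > best:
            selected, best = candidate, candidate_score
    return selected

# Hypothetical stand-in for "cross-validated accuracy of a feature subset":
# additive per-feature gains, with a penalty when the redundant pair (a, b) co-occurs.
GAINS = {"a": 0.5, "b": 0.4, "c": 0.1, "d": -0.1}

def toy_score(subset: list[str]) -> float:
    penalty = 0.45 if {"a", "b"} <= set(subset) else 0.0
    return sum(GAINS[f] for f in subset) - penalty

print(two_step_selection(list(GAINS), toy_score))
```

Here feature "b" ranks second on its own but is dropped because it is redundant with "a", which is the behavior the abstract describes as removing "irrelevant and redundant features".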
Affiliation(s)
- Shihu Jiao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Zheng Chen
- School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, 7098 Liuxian Street, Shenzhen, 518055, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No.4 Block 2 North Jianshe Road, Chengdu, 61005, China
- Lichao Zhang
- School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen, 518172, China
- Xun Zhou
- Beidahuang Industry Group General Hospital, Harbin, 150001, China
- Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, No 415, Fengyang Road, Huangpu District, Shanghai, 210000, China
83
Ahmad S, Charoenkwan P, Quinn JMW, Moni MA, Hasan MM, Lio' P, Shoombuatong W. SCORPION is a stacking-based ensemble learning framework for accurate prediction of phage virion proteins. Sci Rep 2022; 12:4106. [PMID: 35260777 PMCID: PMC8904530 DOI: 10.1038/s41598-022-08173-5] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 01/24/2022] [Accepted: 03/03/2022] [Indexed: 12/30/2022]
Abstract
Fast and accurate identification of phage virion proteins (PVPs) would greatly facilitate antibacterial drug discovery and development. Although several research efforts based on machine learning (ML) methods have been made for in silico identification of PVPs, these methods have certain limitations. Therefore, in this study, we propose a new computational approach, termed SCORPION (StaCking-based Predictor fOR Phage VIrion PrOteiNs), to accurately identify PVPs using only protein primary sequences. Specifically, we explored 13 different feature descriptors covering different aspects (i.e., compositional information, composition-transition-distribution information, position-specific information, and physicochemical properties) with 10 popular ML algorithms to construct a pool of optimal baseline models. These optimal baseline models were then used to generate probabilistic features (PFs), which were treated as a new feature vector. Finally, we used a two-step feature selection strategy to determine the optimal PF feature vector and used it to develop a stacked model (SCORPION). Both tenfold cross-validation and independent test results indicate that SCORPION achieves better predictive performance than its constituent baseline models and existing methods. We anticipate that SCORPION will serve as a useful tool for the cost-effective, large-scale screening of new PVPs. The source code and datasets for this work are available for download from the GitHub repository (https://github.com/saeed344/SCORPION).
Affiliation(s)
- Saeed Ahmad
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Julian M W Quinn
- Bone Biology Division, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, NSW, 2010, Australia
- Mohammad Ali Moni
- Faculty of Health and Behavioural Sciences, School of Health and Rehabilitation Sciences, The University of Queensland, St Lucia, QLD, 4072, Australia
- Md Mehedi Hasan
- Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, LA, 70112, USA
- Pietro Lio'
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
84
Wei PJ, Pang ZZ, Jiang LJ, Tan D, Su Y, Zheng CH. Promoter Prediction in Nannochloropsis Based on Densely Connected Convolutional Neural Networks. Methods 2022; 204:38-46. [DOI: 10.1016/j.ymeth.2022.03.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Revised: 03/03/2022] [Accepted: 03/28/2022] [Indexed: 10/18/2022]
85
ACPNet: A Deep Learning Network to Identify Anticancer Peptides by Hybrid Sequence Information. Molecules 2022; 27:molecules27051544. [PMID: 35268644 PMCID: PMC8912097 DOI: 10.3390/molecules27051544] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/03/2022] [Revised: 02/20/2022] [Accepted: 02/23/2022] [Indexed: 12/18/2022]
Abstract
Cancer is one of the most dangerous threats to human health. One of the issues is drug resistance, which leads to side effects after drug treatment. Numerous therapies have attempted to relieve drug resistance. Recently, anticancer peptides have emerged as novel and promising anticancer candidates that can inhibit tumor cell proliferation and migration and suppress the formation of tumor blood vessels, with fewer side effects. However, identifying anticancer peptides through high-throughput biological experiments is costly, laborious, and time-consuming. Therefore, accurately identifying anticancer peptides is a key and indispensable step in anticancer peptide therapy. Although some computational methods have been developed to predict anticancer peptides, their accuracy still needs to be improved. Thus, in this study, we propose a deep learning-based model, called ACPNet, to distinguish anticancer peptides from non-anticancer peptides (non-ACPs). ACPNet uses three types of information in its training process: peptide sequence information, peptide physicochemical properties, and auto-encoding features. ACPNet is a hybrid deep learning network that fuses fully connected networks and recurrent neural networks. Compared with other existing methods on the ACPs82 datasets, ACPNet not only improves Accuracy by 1.2%, F1-score by 2.0%, and Recall by 7.2%, but also achieves balanced performance on the Matthews correlation coefficient. ACPNet was also verified on an independent dataset of 20 proven anticancer peptides, of which only one was predicted as a non-ACP. The comparison and independent validation experiments indicate that ACPNet can accurately distinguish anticancer peptides from non-ACPs.
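The metrics quoted here (Accuracy, Recall, F1-score, Matthews correlation coefficient) all derive from the binary confusion matrix. A minimal sketch using the standard definitions follows; returning 0.0 when a denominator is zero is a common convention, assumed here. Applied to the independent test described above (20 true ACPs, one predicted as non-ACP), it gives a recall of 19/20 = 0.95 and shows why MCC is uninformative on a test set that contains only positives.

```python
import math

def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, recall, F1 and MCC for 0/1 labels (1 = ACP, 0 = non-ACP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any confusion-matrix margin is empty; report 0.0 by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1, "mcc": mcc}

# The independent test described above: 20 true ACPs, one predicted as non-ACP.
print(binary_metrics([1] * 20, [1] * 19 + [0]))
```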
86
Nguyen L, Nguyen Vo TH, Trinh QH, Nguyen BH, Nguyen-Hoang PU, Le L, Nguyen BP. iANP-EC: Identifying Anticancer Natural Products Using Ensemble Learning Incorporated with Evolutionary Computation. J Chem Inf Model 2022; 62:5080-5089. [PMID: 35157472 DOI: 10.1021/acs.jcim.1c00920] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 12/24/2022]
Abstract
Cancer is one of the deadliest diseases, killing millions of people worldwide every year. Research on anticancer medicines continually seeks better and more adaptive agents with fewer side effects. Besides chemically synthesized anticancer compounds, natural products have been scientifically shown to be a highly promising alternative source for anticancer drug discovery. Alongside experimental approaches for finding anticancer drug candidates, computational approaches have been developed to virtually screen for potential anticancer compounds. In this study, we construct an ensemble computational framework, called iANP-EC, using machine learning approaches incorporated with evolutionary computation. Four learning algorithms (k-NN, SVM, RF, and XGB) and four molecular representation schemes are used to build a set of classifiers, from which the four best-performing classifiers are selected to form an ensemble classifier. Particle swarm optimization (PSO) is used to optimize the weights used to combine the four top classifiers. The models are developed using a curated set of 997 compounds collected from the NPACT and CancerHSP databases. The results show that iANP-EC is a stable, robust, and effective framework that achieves an AUC-ROC of 0.9193 and an AUC-PR of 0.8366. A comparative analysis of molecular substructures between natural anticarcinogens and non-anticarcinogens partially unveils several key substructures that drive anticancer activity. We also deploy the proposed ensemble model as an online web server with a user-friendly interface to support the research community in identifying natural products with anticancer activities.
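Using PSO to weight an ensemble, as described above, can be sketched with a plain global-best PSO over non-negative weights. The abstract does not state the objective iANP-EC optimizes; here, as an assumption, the swarm minimizes the log loss of the weighted-average probability on a validation set, and the swarm parameters (`n_particles`, `inertia`, `c1`, `c2`) are generic textbook choices. The probabilities are invented for illustration.

```python
import math
import random

def ensemble_log_loss(weights, probs, labels):
    """Binary cross-entropy of a weighted average of per-classifier probabilities."""
    total = sum(weights)
    loss = 0.0
    for row, y in zip(probs, labels):
        p = sum(w * p_i for w, p_i in zip(weights, row)) / total
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss / len(labels)

def pso_ensemble_weights(probs, labels, n_particles=20, iterations=100,
                         inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over positive ensemble weights; returns (normalized weights, loss)."""
    rng = random.Random(seed)
    dim = len(probs[0])
    # Particle 0 starts at uniform weights, so the result is never worse than plain averaging.
    positions = [[1.0] * dim] + [
        [rng.uniform(0.01, 1.0) for _ in range(dim)] for _ in range(n_particles - 1)
    ]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [pos[:] for pos in positions]
    best_loss = [ensemble_log_loss(pos, probs, labels) for pos in positions]
    g = min(range(n_particles), key=lambda i: best_loss[i])
    global_pos, global_loss = best_pos[g][:], best_loss[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (inertia * velocities[i][d]
                                    + c1 * r1 * (best_pos[i][d] - positions[i][d])
                                    + c2 * r2 * (global_pos[d] - positions[i][d]))
                # Clamp so every weight stays strictly positive.
                positions[i][d] = max(1e-6, positions[i][d] + velocities[i][d])
            loss = ensemble_log_loss(positions[i], probs, labels)
            if loss < best_loss[i]:
                best_pos[i], best_loss[i] = positions[i][:], loss
                if loss < global_loss:
                    global_pos, global_loss = positions[i][:], loss

    total = sum(global_pos)
    return [w / total for w in global_pos], global_loss

# Hypothetical validation-set probabilities from four classifiers (columns) for six compounds.
probs = [
    [0.9, 0.6, 0.5, 0.7],
    [0.8, 0.4, 0.6, 0.6],
    [0.2, 0.5, 0.4, 0.3],
    [0.1, 0.3, 0.6, 0.4],
    [0.7, 0.7, 0.5, 0.8],
    [0.3, 0.2, 0.5, 0.2],
]
labels = [1, 1, 0, 0, 1, 0]
weights, loss = pso_ensemble_weights(probs, labels)
print([round(w, 3) for w in weights], round(loss, 4))
```

Seeding one particle at uniform weights is a small design choice that guarantees the optimized ensemble is at least as good, on this objective, as an unweighted average.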
Affiliation(s)
- Loc Nguyen
- Computational Biology Center, International University - VNU HCMC, Ho Chi Minh City 700000, Vietnam
- Thanh-Hoang Nguyen Vo
- School of Mathematics and Statistics, Victoria University of Wellington, Wellington 6140, New Zealand
- Quang H Trinh
- Computational Biology Center, International University - VNU HCMC, Ho Chi Minh City 700000, Vietnam; School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi 100000, Vietnam
- Bach Hoai Nguyen
- School of Engineering and Computer Science, Victoria University of Wellington, Wellington 6140, New Zealand
- Phuong-Uyen Nguyen-Hoang
- Computational Biology Center, International University - VNU HCMC, Ho Chi Minh City 700000, Vietnam
- Ly Le
- Computational Biology Center, International University - VNU HCMC, Ho Chi Minh City 700000, Vietnam; Vingroup Big Data Institute, Ha Noi 100000, Vietnam
- Binh P Nguyen
- School of Mathematics and Statistics, Victoria University of Wellington, Wellington 6140, New Zealand
87
Manavalan B, Basith S, Lee G. Comparative analysis of machine learning-based approaches for identifying therapeutic peptides targeting SARS-CoV-2. Brief Bioinform 2022; 23:bbab412. [PMID: 34595489 PMCID: PMC8500067 DOI: 10.1093/bib/bbab412] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/01/2021] [Revised: 08/27/2021] [Accepted: 09/07/2021] [Indexed: 01/08/2023]
Abstract
Coronavirus disease 2019 (COVID-19) has impacted public health as well as societal and economic well-being. In the last two decades, various prediction algorithms and tools have been developed for predicting antiviral peptides (AVPs). The current COVID-19 pandemic has underscored the need to develop more efficient and accurate machine learning (ML)-based prediction algorithms for the rapid identification of therapeutic peptides against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Several peptide-based ML approaches, covering anti-coronavirus peptides (ACVPs), IL-6-inducing epitopes and other epitopes targeting SARS-CoV-2, have been implemented in COVID-19 therapeutics. Given the growing interest in the COVID-19 field, it is crucial to systematically compare the performance of existing ML algorithms. Accordingly, we comprehensively evaluated state-of-the-art IL-6 and AVP predictors against coronaviruses in terms of core algorithms, feature encoding schemes, performance evaluation metrics, and software usability. A comprehensive performance assessment was then conducted to evaluate the robustness and scalability of the existing predictors using well-constructed independent validation datasets. Additionally, we discussed the advantages and disadvantages of existing methods, providing useful insights for the development of novel computational tools for characterizing and identifying epitopes or ACVPs. The insights gained from this review are anticipated to provide critical guidance to the scientific community for the rapid design and development of accurate and efficient next-generation in silico tools against SARS-CoV-2.
Affiliation(s)
- Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Korea
- Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Korea
88
He W, Jiang Y, Jin J, Li Z, Zhao J, Manavalan B, Su R, Gao X, Wei L. Accelerating bioactive peptide discovery via mutual information-based meta-learning. Brief Bioinform 2021; 23:6457168. [PMID: 34882225 DOI: 10.1093/bib/bbab499] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 08/19/2021] [Revised: 10/07/2021] [Accepted: 10/30/2021] [Indexed: 12/28/2022]
Abstract
Recently, machine learning methods have been developed to identify various peptide bioactivities. However, owing to the lack of experimentally validated peptides, machine learning methods cannot provide a sufficiently trained model, which easily results in poor generalizability. Furthermore, there is no generic computational framework to predict the bioactivities of different peptides. A natural question, therefore, is whether limited samples can be used to build an effective predictive model for different kinds of peptides. To address this question, we propose Mutual Information Maximization Meta-Learning (MIMML), a novel meta-learning-based predictive model for bioactive peptide discovery. Using only a few samples from various functional peptides, MIMML can sufficiently learn the discriminative information amongst various functions and characterize functional differences. Experimental results show excellent performance of MIMML despite using far fewer training samples than the state-of-the-art methods. We also decipher the latent relationships among different kinds of functions to understand what the meta-model learned to improve a specific task. In summary, this study is a pioneering work in the field of functional peptide mining and provides the first-of-its-kind solution for few-sample learning problems in biological sequence analysis, accelerating the discovery of new functional peptides. The source codes and datasets are available at https://github.com/TearsWaiting/MIMML.
Affiliation(s)
- Wenjia He
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China; BioMap, Beijing, China
- Yi Jiang
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Junru Jin
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Zhongshen Li
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Jiaojiao Zhao
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Ran Su
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Xin Gao
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal, 23955-6900, Saudi Arabia
- Leyi Wei
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
89
Charoenkwan P, Nantasenamat C, Hasan MM, Moni MA, Lio' P, Manavalan B, Shoombuatong W. StackDPPIV: A novel computational approach for accurate prediction of dipeptidyl peptidase IV (DPP-IV) inhibitory peptides. Methods 2021; 204:189-198. [PMID: 34883239 DOI: 10.1016/j.ymeth.2021.12.001] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 11/09/2021] [Revised: 11/30/2021] [Accepted: 12/01/2021] [Indexed: 12/12/2022]
Abstract
The development of efficient and effective bioinformatics tools and pipelines for identifying peptides with dipeptidyl peptidase IV (DPP-IV) inhibitory activity from large-scale protein datasets is of great importance for the discovery and development of promising antidiabetic drugs. In this study, we present a novel stacking-based ensemble learning predictor (termed StackDPPIV) designed for the identification of DPP-IV inhibitory peptides. Unlike the existing method, which relies on a single feature type, we combined five popular machine learning algorithms with ten different feature encodings from multiple perspectives to generate a pool of baseline models. Subsequently, the probabilistic features derived from these baseline models were systematically integrated and used as new feature representations. Finally, to improve predictive performance, a genetic algorithm based on the self-assessment report was used to select a set of informative probabilistic features, and the optimal set was then used to develop the final meta-predictor (StackDPPIV). Experimental results demonstrated that StackDPPIV outperformed its constituent baseline models on both the training and independent datasets. Furthermore, on the independent test StackDPPIV achieved an accuracy of 0.891, an MCC of 0.784 and an AUC of 0.961, which were 9.4%, 19.0% and 11.4% higher, respectively, than those of the existing method. Feature analysis demonstrated that our feature representations had more discriminative ability than conventional feature descriptors, highlighting that the combination of different features was essential for the performance improvement. To implement the proposed predictor, we built a user-friendly online web server at http://pmlabstack.pythonanywhere.com/StackDPPIV.
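The three figures reported above (accuracy, MCC and AUC) are standard binary-classification metrics. As a minimal, stdlib-only sketch of how they are computed from labels and predicted probabilities (the toy data and the 0.5 threshold are assumptions for illustration; this is not the StackDPPIV evaluation code):

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted labels (0/1) that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient computed from the 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and predicted probabilities, invented for illustration only.
    y = [1, 1, 1, 0, 0, 0]
    prob = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
    pred = [int(p >= 0.5) for p in prob]  # threshold probabilities at 0.5
    print(accuracy(y, pred), mcc(y, pred), roc_auc(y, prob))
```

For the toy data this gives an accuracy of 4/6, an MCC of 1/3 and an AUC of 8/9. MCC uses all four confusion-matrix cells, which is why it is often reported alongside accuracy on imbalanced peptide datasets.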
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
- Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Md Mehedi Hasan
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Mohammad Ali Moni
- School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD 4072, Australia
- Pietro Lio'
- Department of Computer Science and Technology, University of Cambridge, Cambridge CB3 0FD, UK
- Balachandran Manavalan
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Republic of Korea
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
90
Charoenkwan P, Nantasenamat C, Hasan MM, Moni MA, Manavalan B, Shoombuatong W. UMPred-FRL: A New Approach for Accurate Prediction of Umami Peptides Using Feature Representation Learning. Int J Mol Sci 2021; 22:ijms222313124. [PMID: 34884927 PMCID: PMC8658322 DOI: 10.3390/ijms222313124] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 10/28/2021] [Revised: 12/01/2021] [Accepted: 12/02/2021] [Indexed: 11/16/2022]
Abstract
Umami ingredients have been identified as important factors in food seasoning and production. Traditional experimental methods for characterizing peptides exhibiting umami sensory properties (umami peptides) are time-consuming, laborious, and costly. It is therefore desirable to develop computational tools for the large-scale screening of available sequences to identify novel peptides with umami sensory properties. Although a computational tool has been developed for this purpose, its predictive performance is still insufficient. In this study, we use a feature representation learning approach to create a novel machine-learning meta-predictor called UMPred-FRL for improved umami peptide identification. We combined six well-known machine learning algorithms (extremely randomized trees, k-nearest neighbor, logistic regression, partial least squares, random forest, and support vector machine) with seven different feature encodings (amino acid composition, amphiphilic pseudo-amino acid composition, dipeptide composition, composition-transition-distribution, and pseudo-amino acid composition) to develop the final meta-predictor. Extensive experimental results demonstrated that UMPred-FRL was effective, performed more accurately on the benchmark dataset than its baseline models, and consistently outperformed the existing method on the independent test dataset. Finally, to aid in the high-throughput identification of umami peptides, the UMPred-FRL web server was established and made freely available online. It is expected that UMPred-FRL will be a powerful tool for the cost-effective large-scale screening of candidate peptides with potential umami sensory properties.
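Two of the encodings listed above, amino acid composition (AAC) and dipeptide composition (DPC), have simple standard definitions. As a minimal sketch, assuming the 20 standard residues in a fixed alphabetical order and overlapping residue pairs for DPC (this is a generic illustration, not the UMPred-FRL feature code):

```python
from itertools import product

# The 20 standard amino acids, one-letter codes, in a fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq: str) -> list[float]:
    """Amino acid composition: relative frequency of each standard residue (20-dim)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dpc(seq: str) -> list[float]:
    """Dipeptide composition: relative frequency of each ordered residue pair
    among the len(seq) - 1 overlapping pairs (400-dim)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    for pair in pairs:
        if pair in counts:  # pairs containing non-standard residues are ignored
            counts[pair] += 1
    total = len(pairs)
    return [counts[dp] / total if total else 0.0 for dp in DIPEPTIDES]


if __name__ == "__main__":
    peptide = "ACDA"  # hypothetical toy sequence
    features = aac(peptide) + dpc(peptide)  # concatenated 420-dim vector
    print(len(features))
```

Fixed-length vectors like these are what allow sequences of different lengths to be fed to the conventional classifiers named in the abstract; the concatenated example prints 420.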
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
- Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Md Mehedi Hasan
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Mohammad Ali Moni
- Artificial Intelligence & Digital Health Data Science, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD 4072, Australia
- Balachandran Manavalan
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Korea
- Correspondence: (B.M.); (W.S.)
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Correspondence: (B.M.); (W.S.)
91
Xu Z, Luo M, Lin W, Xue G, Wang P, Jin X, Xu C, Zhou W, Cai Y, Yang W, Nie H, Jiang Q. DLpTCR: an ensemble deep learning framework for predicting immunogenic peptide recognized by T cell receptor. Brief Bioinform 2021; 22:6355415. [PMID: 34415016 DOI: 10.1093/bib/bbab335] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 05/20/2021] [Revised: 07/25/2021] [Accepted: 07/28/2021] [Indexed: 12/30/2022]
Abstract
Accurate prediction of immunogenic peptides recognized by T cell receptors (TCRs) can greatly benefit vaccine development and cancer immunotherapy. However, identifying immunogenic peptides accurately is still a huge challenge. Most antigen peptides predicted in silico without considering the TCR as a key factor fail to elicit immune responses in vivo. This inevitably leads to costly and time-consuming experimental validation of predicted antigens. Therefore, it is necessary to develop novel computational methods for precisely and effectively predicting immunogenic peptides recognized by TCRs. Here, we describe DLpTCR, a multimodal ensemble deep learning framework for predicting the likelihood of interaction between single/paired chain(s) of TCR and a peptide presented by major histocompatibility complex molecules. To investigate the generality and robustness of the proposed model, COVID-19 and IEDB datasets were constructed for independent evaluation. The DLpTCR model exhibited high predictive power, with an area under the curve (AUC) of up to 0.91 on the COVID-19 data, when predicting the interaction between a peptide and a single TCR chain. Additionally, the DLpTCR model achieved an overall accuracy of 81.03% on the IEDB data when predicting the interaction between a peptide and paired TCR chains. The results demonstrate that DLpTCR has the ability to learn general interaction rules and generalize to antigen peptide recognition by TCRs. A user-friendly webserver is available at http://jianglab.org.cn/DLpTCR/, and a stand-alone software package can be downloaded from https://github.com/jiangBiolab/DLpTCR.
Affiliation(s)
- Zhaochun Xu
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Meng Luo
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Weizhong Lin
- Center for Bioinformatics, Computer Department, Jingdezhen Ceramic Institute, Jingdezhen 333403, China
- Guangfu Xue
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Pingping Wang
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Xiyun Jin
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Chang Xu
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Wenyang Zhou
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Yideng Cai
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Wenyi Yang
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Huan Nie
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Qinghua Jiang
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China; Key Laboratory of Biological Data (Harbin Institute of Technology), Ministry of Education, China
92
Guo Y, Yan K, Lv H, Liu B. PreTP-EL: prediction of therapeutic peptides based on ensemble learning. Brief Bioinform 2021; 22:6359002. [PMID: 34459488 DOI: 10.1093/bib/bbab358] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 06/15/2021] [Revised: 07/27/2021] [Accepted: 08/11/2021] [Indexed: 01/02/2023]
Abstract
Therapeutic peptides are important for understanding the correlation between peptides and their therapeutic and diagnostic potential. Therapeutic peptides can be further divided into different types according to their therapeutic functions, each with different characteristics. Although some computational approaches have been proposed to predict different types of therapeutic peptides, they fail to accurately predict all types of therapeutic peptides. In this study, we propose a predictor called PreTP-EL, which employs an ensemble learning approach to fuse different features and machine learning techniques in order to capture the different characteristics of various therapeutic peptides. Experimental results showed that PreTP-EL outperformed other competing methods. Availability and implementation: A user-friendly web server for the PreTP-EL predictor is available at http://bliulab.net/PreTP-EL.
Affiliation(s)
- Yichen Guo
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Hongwu Lv
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
93
You H, Yu L, Tian S, Ma X, Xing Y, Song J, Wu W. Anti-cancer Peptide Recognition Based on Grouped Sequence and Spatial Dimension Integrated Networks. Interdiscip Sci 2021; 14:196-208. [PMID: 34637113 DOI: 10.1007/s12539-021-00481-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/08/2021] [Revised: 09/05/2021] [Accepted: 09/09/2021] [Indexed: 11/24/2022]
Abstract
The diversity of anti-cancer peptide sequences has made research on them difficult. To effectively predict new anti-cancer peptides, this paper proposes GRCI-Net, a network algorithm for anti-cancer peptide sequence prediction that integrates feature-grouped sequences with spatial-dimension features. The main process is as follows. First, binary structure features and k-mer sparse matrix features were fused and reduced by principal component analysis (PCA) to generate a set of new features. Second, we constructed a new bidirectional long short-term memory (BiLSTM) network. We used traditional convolution and dilated convolution to acquire spatial-dimension features using the memory network's grouped sequence model, which is designed to better handle the diversity of anti-cancer peptide feature sequences and to fully learn the contextual information between features. Finally, the grouped sequence features and spatial-dimension features were fused through two sets of dense layers, anti-cancer peptides were predicted through a sigmoid function, and the approach was verified on two public datasets, ACP740 (accuracy 0.8230) and ACP240 (accuracy 0.8750). The model code and datasets used in this article are available at https://github.com/YouHongfeng101/ACP-DL.
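The abstract contrasts traditional and dilated convolution. As a minimal one-dimensional sketch of the difference, assuming a "valid" (unpadded) convolution in the deep-learning sense, i.e. cross-correlation without a kernel flip (this illustrates the general operation, not the GRCI-Net layers):

```python
def conv1d(x: list[float], kernel: list[float], dilation: int = 1) -> list[float]:
    """'Valid' 1D convolution as used in deep-learning layers (cross-correlation,
    no kernel flip). With dilation d, the kernel taps are applied d positions
    apart, so a k-tap kernel covers (k - 1) * d + 1 input positions."""
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output
    return [
        sum(w * x[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(x) - span + 1)
    ]


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6, 7]  # toy per-position feature values
    print(conv1d(signal, [1, 1, 1]))              # each output sees 3 neighbors
    print(conv1d(signal, [1, 1, 1], dilation=2))  # same 3 weights, span of 5
```

The two calls print `[6, 9, 12, 15, 18]` and `[9, 12, 15]`: with the same three weights, dilation 2 lets each output summarize five consecutive positions, which is how dilated layers widen context along a sequence without adding parameters.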
Affiliation(s)
- Hongfeng You
- College of Information Science and Engineering, Xinjiang University, 666 Shengli Road, Tianshan District, Urumqi, Xinjiang, China
- Long Yu
- Network Center, Xinjiang University, Xinjiang, China
- Shengwei Tian
- School of Software, Xinjiang University, Tianshan District, 666 Shengli Road, Urumqi, Xinjiang, China
- Xiang Ma
- Department of Cardiology, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830011, China
- Yan Xing
- Imaging Center, The First Affiliated Hospital of Xinjiang Medical University, No. 137, LiYuShan South Road, Urumqi, Xinjiang, China
- Jinmiao Song
- College of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang, China
- Weidong Wu
- People's Hospital of Xinjiang Uygur Autonomous Region, Urumqi, Xinjiang, China
94
Malik AA, Chotpatiwetchkul W, Phanus-Umporn C, Nantasenamat C, Charoenkwan P, Shoombuatong W. StackHCV: a web-based integrative machine-learning framework for large-scale identification of hepatitis C virus NS5B inhibitors. J Comput Aided Mol Des 2021; 35:1037-1053. [PMID: 34622387 DOI: 10.1007/s10822-021-00418-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/07/2021] [Accepted: 09/17/2021] [Indexed: 01/07/2023]
Abstract
Fast and accurate identification of inhibitors with potency against HCV NS5B polymerase is currently a challenging task. Although conventional experimental methods are the gold standard for the design and development of new HCV inhibitors, they often require costly investments of time and resources. In this study, we develop a novel machine learning-based meta-predictor (termed StackHCV) for the accurate and large-scale identification of HCV inhibitors. Unlike the existing method, which is based on a single-feature approach, we first constructed a pool of various baseline models by employing a wide range of heterogeneous molecular fingerprints with five popular machine learning algorithms (k-nearest neighbor, multi-layer perceptron, partial least squares, random forest and support vector machine). Second, we integrated these baseline models to develop the final meta-based model by means of the stacking strategy. Extensive benchmarking experiments showed that StackHCV achieved more accurate and stable performance than its constituent baseline models on the training dataset and also outperformed the existing predictor on the independent test dataset. To facilitate the high-throughput identification of HCV inhibitors, we built a web server that can be freely accessed at http://camt.pythonanywhere.com/StackHCV. It is expected that StackHCV could be a useful tool for the fast and precise identification of potential drugs against HCV NS5B, particularly for liver cancer therapy and other clinical applications.
Affiliation(s)
- Aijaz Ahmad Malik
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Warot Chotpatiwetchkul
- Applied Computational Chemistry Research Unit, Department of Chemistry, School of Science, King Mongkut's Institute of Technology Ladkrabang, Bangkok, 10520, Thailand
- Chuleeporn Phanus-Umporn
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
95
Cai L, Wang L, Fu X, Zeng X. Active Semisupervised Model for Improving the Identification of Anticancer Peptides. ACS Omega 2021; 6:23998-24008. [PMID: 34568678 PMCID: PMC8459422 DOI: 10.1021/acsomega.1c03132] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/15/2021] [Indexed: 06/13/2023]
Abstract
Cancer is one of the most dangerous threats to human health. Accurate identification of anticancer peptides (ACPs) is valuable for the development and design of new anticancer agents. However, most machine-learning algorithms have limited ability to identify ACPs, and their accuracy is sensitive to the amount of labeled data. In this paper, we propose a new method, called ACP-ALPM, that combines active learning (AL) with a label propagation (LP) algorithm to address this problem. First, we develop an efficient feature representation method based on various descriptor information and coding information of the peptide sequence. Then, an AL strategy is used to select the most informative data for model training, and a more powerful LP classifier is built through successive iterations. Finally, we evaluate the performance of ACP-ALPM and compare it with that of several state-of-the-art and classic methods; experimental results show that our method significantly outperforms them. In addition, an experimental comparison of random selection and AL on three public datasets shows that the AL strategy is more effective. Notably, a visualization experiment further verified that AL can utilize unlabeled data to improve the performance of the model. We hope that our method can be extended to other types of peptides and provide inspiration for other similar work.
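The abstract pairs an active-learning query step with label propagation. The sketch below shows a generic version of both ingredients: the classic iterative label propagation scheme, in which labeled nodes stay clamped while unlabeled nodes take similarity-weighted averages, and uncertainty sampling as the query strategy. The RBF similarity graph, `sigma=1.0`, 200 iterations, the toy 1-D features and the "oracle" labels are all assumptions; this is not the ACP-ALPM implementation, whose query strategy and features may differ.

```python
import math


def rbf_affinity(points: list[list[float]], sigma: float = 1.0) -> list[list[float]]:
    """Gaussian (RBF) similarity between feature vectors; zero on the diagonal."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return w


def label_propagation(w: list[list[float]], labels: dict[int, int],
                      iters: int = 200) -> list[float]:
    """Two-class label propagation: each unlabeled node repeatedly takes the
    similarity-weighted average score of all nodes, while labeled nodes stay
    clamped to their known label. Returns a score in [0, 1] per node."""
    n = len(w)
    f = [float(labels[i]) if i in labels else 0.5 for i in range(n)]
    for _ in range(iters):
        new = f[:]
        for i in range(n):
            total = sum(w[i])
            if i not in labels and total > 0:
                new[i] = sum(w[i][j] * f[j] for j in range(n)) / total
        f = new
    return f


def most_uncertain(scores: list[float], labels: dict[int, int]) -> int:
    """Uncertainty sampling: pick the unlabeled node whose score is closest to 0.5."""
    unlabeled = [i for i in range(len(scores)) if i not in labels]
    return min(unlabeled, key=lambda i: abs(scores[i] - 0.5))


if __name__ == "__main__":
    # Toy 1-D features: two groups with an ambiguous point (index 3) in the middle.
    points = [[0.0], [0.2], [0.4], [2.0], [3.6], [3.8], [4.0]]
    oracle = [0, 0, 0, 1, 1, 1, 1]  # hypothetical labels, revealed only when queried
    known = {0: 0, 6: 1}            # start with one labeled example per class
    w = rbf_affinity(points)
    for _ in range(2):              # two active-learning rounds
        scores = label_propagation(w, known)
        query = most_uncertain(scores, known)
        known[query] = oracle[query]  # stand-in for asking an annotator
    print(sorted(known))
    print([round(s, 2) for s in label_propagation(w, known)])
```

In the first round the midpoint (index 3) is queried, because propagation leaves it at about 0.5; querying such borderline samples is how active learning tries to spend a limited labeling budget where it changes the propagated labels most.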
Affiliation(s)
- Lijun Cai
- Department of Information Science and Technology, Hunan University, Changsha, Hunan 410000, China
- Li Wang
- Department of Information Science and Technology, Hunan University, Changsha, Hunan 410000, China
- Xiangzheng Fu
- Department of Information Science and Technology, Hunan University, Changsha, Hunan 410000, China
- Xiangxiang Zeng
- Department of Information Science and Technology, Hunan University, Changsha, Hunan 410000, China
96
Lv Z, Cui F, Zou Q, Zhang L, Xu L. Anticancer peptides prediction with deep representation learning features. Brief Bioinform 2021; 22:bbab008. [PMID: 33529337 DOI: 10.1093/bib/bbab008] [Citation(s) in RCA: 87] [Impact Index Per Article: 21.8] [Received: 10/06/2020] [Revised: 12/20/2020] [Accepted: 01/05/2021] [Indexed: 12/13/2022]
Abstract
Anticancer peptides constitute one of the most promising therapeutic agents for combating common human cancers. Using wet experiments to verify whether a peptide displays anticancer characteristics is time-consuming and costly. Hence, in this study, we proposed a computational method named iACP-DRLF (identify anticancer peptides via deep representation learning features), which combines the light gradient boosting machine algorithm with deep representation learning features. Two kinds of sequence embedding technologies were used, namely soft symmetric alignment embedding and unified representation (UniRep) embedding, both of which involve deep neural network models based on long short-term memory networks and their derived networks. The results showed that the use of deep representation learning features greatly improved the ability of the models to discriminate anticancer peptides from other peptides. Also, UMAP (uniform manifold approximation and projection for dimension reduction) and SHAP (Shapley additive explanations) analyses showed that UniRep has an advantage over other features for anticancer peptide identification. The Python script and pretrained models can be downloaded from https://github.com/zhibinlv/iACP-DRLF or from http://public.aibiochem.net/iACP-DRLF/.
Affiliation(s)
- Zhibin Lv
- University of Electronic Science and Technology of China
- Feifei Cui
- University of Electronic Science and Technology of China
- Quan Zou
- Institute of Fundamental and Frontier Sciences at University of Electronic Science and Technology of China
- Lichao Zhang
- School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, China
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, China
97
Jiang M, Zhao B, Luo S, Wang Q, Chu Y, Chen T, Mao X, Liu Y, Wang Y, Jiang X, Wei DQ, Xiong Y. NeuroPpred-Fuse: an interpretable stacking model for prediction of neuropeptides by fusing sequence information and feature selection methods. Brief Bioinform 2021; 22:6350884. [PMID: 34396388 DOI: 10.1093/bib/bbab310] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 04/16/2021] [Revised: 07/01/2021] [Accepted: 07/18/2021] [Indexed: 12/13/2022]
Abstract
Neuropeptides, which act as signaling molecules in the nervous systems of various animals, play crucial roles in a wide range of physiological functions and in hormone regulation. Neuropeptides offer many opportunities for the discovery of new drugs and targets for the treatment of neurological diseases. In recent years, several data-driven computational predictors have been developed for various types of bioactive peptides, but little work has so far addressed neuropeptides. In this work, we developed an interpretable stacking model, named NeuroPpred-Fuse, for the prediction of neuropeptides by fusing a variety of sequence-derived features and feature selection methods. Specifically, we used six types of sequence-derived features to encode the peptide sequences and then combined them. In the first layer, we ensembled three base classifiers and four feature selection algorithms, which complementarily select non-redundant important features. In the second layer, the outputs of the first layer were merged and fed into a logistic regression (LR) classifier to train the model. Moreover, we analyzed the selected features and explained their feasibility. Experimental results show that our model achieved 90.6% accuracy and 95.8% AUC on the independent test set, outperforming the state-of-the-art models. In addition, we exhibited the distribution of the features selected by these tree models and compared the results on the training set with those on the test set. These results show that our model has a certain degree of generalization ability. We therefore expect that our model will provide important advances in the discovery of neuropeptides as new drugs for the treatment of neurological diseases.
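The two-layer design described above, in which first-layer outputs are merged and fed to logistic regression, is a form of stacking. A minimal sketch of the second layer only, assuming the first-layer probabilities have already been computed (in practice they should be out-of-fold predictions, so that the meta-classifier never sees scores produced on its own training labels). The toy probability matrix, learning rate and epoch count are hypothetical, and plain batch gradient descent stands in for whatever solver the authors used:

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x: list[list[float]], y: list[int], lr: float = 0.5,
                 epochs: int = 2000) -> tuple[list[float], float]:
    """Meta-classifier: logistic regression fitted by batch gradient descent on
    the mean log-loss. Row i of x holds the base classifiers' predicted
    probabilities for sample i; y holds the 0/1 labels."""
    n, d = len(x), len(x[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(x, y):
            err = sigmoid(sum(wk * xk for wk, xk in zip(w, row)) + b) - target
            for k in range(d):
                grad_w[k] += err * row[k]
            grad_b += err
        w = [wk - lr * g / n for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: list[float], b: float, row: list[float]) -> float:
    """Meta-level probability that a sample is positive."""
    return sigmoid(sum(wk * xk for wk, xk in zip(w, row)) + b)


if __name__ == "__main__":
    # Hypothetical first-layer outputs: P(neuropeptide) from three base
    # classifiers for six training peptides (ideally out-of-fold predictions).
    meta_x = [[0.9, 0.8, 0.7], [0.8, 0.6, 0.9], [0.7, 0.9, 0.6],
              [0.2, 0.3, 0.1], [0.1, 0.4, 0.3], [0.3, 0.2, 0.2]]
    meta_y = [1, 1, 1, 0, 0, 0]
    w, b = fit_logistic(meta_x, meta_y)
    new_peptide = [0.85, 0.7, 0.8]  # base-model outputs for an unseen peptide
    print(round(predict_proba(w, b, new_peptide), 3))
```

A linear meta-learner like this keeps the second layer interpretable: each learned weight indicates how much the final prediction relies on the corresponding base classifier.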
Affiliation(s)
- Mingming Jiang
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Bowen Zhao
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Shenggan Luo
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Qiankun Wang
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yanyi Chu
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Tianhang Chen
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Xueying Mao
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yatong Liu
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yanjing Wang
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Xue Jiang
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
98
Charoenkwan P, Chiangjong W, Hasan MM, Nantasenamat C, Shoombuatong W. Review and comparative analysis of machine learning-based predictors for predicting and analyzing of anti-angiogenic peptides. Curr Med Chem 2021; 29:849-864. [PMID: 34375178 DOI: 10.2174/0929867328666210810145806] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/22/2021] [Revised: 06/17/2021] [Accepted: 06/22/2021] [Indexed: 11/22/2022]
Abstract
Cancer is one of the leading causes of death worldwide, and angiogenesis, one of the hallmarks of cancer, underlies it. Efforts are under way to discover anti-angiogenic peptides (AAPs), which offer a promising therapeutic route by targeting the formation of new blood vessels. Identifying AAPs is therefore a viable path toward understanding the mechanistic properties relevant to the discovery of new anticancer drugs. Despite the abundance of peptide sequences in public databases, experimental identification of AAPs has progressed very slowly because it is costly and laborious. Because it can make sense of large volumes of data, machine learning (ML) is an attractive technique for peptide-based drug discovery. In this review, we conducted a comprehensive, comparative analysis of ML-based AAP predictors in terms of their feature descriptors, ML algorithms, cross-validation methods, and prediction performance. We also discuss the common framework of these AAP predictors and their inherent weaknesses. In particular, we explore future perspectives for improving prediction accuracy and model interpretability, which may help overcome some of the inherent weaknesses of existing AAP predictors. We anticipate that this review will assist researchers in rapidly screening and identifying promising AAPs for clinical use.
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand
- Wararat Chiangjong
- Pediatric Translational Research Unit, Department of Pediatrics, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok 10400, Thailand
- Md Mehedi Hasan
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, United States
- Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
99
Li Y, Pu F, Wang J, Zhou Z, Zhang C, He F, Ma Z, Zhang J. Machine Learning Methods in Prediction of Protein Palmitoylation Sites: A Brief Review. Curr Pharm Des 2021; 27:2189-2198. [PMID: 33183190 DOI: 10.2174/1381612826666201112142826] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/13/2020] [Accepted: 07/27/2020] [Indexed: 11/22/2022]
Abstract
Protein palmitoylation is a fundamental and reversible post-translational lipid modification that is involved in a series of biological processes. Although many experimental studies have explored the molecular mechanism behind palmitoylation, computational methods have attracted much attention because they predict palmitoylation sites well compared with expensive and time-consuming biochemical experiments. Predicting protein palmitoylation sites helps reveal the biological mechanism of this modification. The application of machine learning methods to predict palmitoylation sites has therefore become a hot topic in bioinformatics and has promoted development in related fields. In this review, we briefly introduced recent developments in predicting protein palmitoylation sites with machine learning-based methods and discussed their benefits and drawbacks. We also provided a perspective on machine learning-based methods for predicting palmitoylation sites. We hope this review can serve as a guide for related fields.
Affiliation(s)
- Yanwen Li
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Feng Pu
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Jingru Wang
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Zhiguo Zhou
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Chunhua Zhang
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Fei He
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Zhiqiang Ma
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Jingbo Zhang
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
100
Nasiri F, Atanaki FF, Behrouzi S, Kavousi K, Bagheri M. CpACpP: In Silico Cell-Penetrating Anticancer Peptide Prediction Using a Novel Bioinformatics Framework. ACS OMEGA 2021; 6:19846-19859. [PMID: 34368571 PMCID: PMC8340416 DOI: 10.1021/acsomega.1c02569] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/17/2021] [Accepted: 07/13/2021] [Indexed: 05/12/2023]
Abstract
Cell-penetrating anticancer peptides (Cp-ACPs) are considered promising candidates in solid tumor and hematologic cancer therapies. Current approaches for the design and discovery of Cp-ACPs rely on expensive high-throughput screening, which often poses multiple obstacles, including instrumentation adaptation and experimental handling. Machine learning (ML) tools developed for peptide activity prediction are therefore of growing interest. In this study, we applied random forest (RF), support vector machine (SVM), and eXtreme gradient boosting (XGBoost) algorithms to predict active Cp-ACPs using an experimentally validated data set. The model, CpACpP, was built from two independent subpredictors, one for cell-penetrating peptides (CPPs) and one for anticancer peptides (ACPs). Various compositional and physicochemical features were combined or selected with a multilayered recursive feature elimination (RFE) method for both data sets. The ACP subclassifiers achieved a mean accuracy (ACC) of 0.98 and an area under the curve (AUC) of ≈0.98, whereas the CPP predictors reached corresponding values of ∼0.94 and ∼0.95 with the hybrid features and independent data sets, respectively. In addition, evaluating Cp-ACP prediction on a series of independent sequences gave accuracies of ∼0.79 and 0.89 with our CPP and ACP classifiers, respectively, outperforming the previously reported ACPred, mACPpred, MLCPP, and CPPred-RF. The consensus-based fusion method also reached an AUC of 0.94 for Cp-ACP prediction (http://cbb1.ut.ac.ir/CpACpP/Index).
Affiliation(s)
- Farid Nasiri
- Peptide Chemistry Laboratory, Department of Biochemistry, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran 14176-14335, Iran
- Fereshteh Fallah Atanaki
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran 14176-14411, Iran
- Saman Behrouzi
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran 14176-14411, Iran
- Kaveh Kavousi
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran 14176-14411, Iran
- Mojtaba Bagheri
- Peptide Chemistry Laboratory, Department of Biochemistry, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran 14176-14335, Iran