Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Wei L, Hu J, Li F, Song J, Su R, Zou Q. Comparative analysis and prediction of quorum-sensing peptides using feature representation learning and machine learning algorithms. Brief Bioinform 2018;21:106-119. [PMID: 30383239 DOI: 10.1093/bib/bby107] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Received: 08/21/2018] [Revised: 09/18/2018] [Accepted: 10/05/2018] [Indexed: 12/11/2022]

Cited by Other Article(s)

Zhou X, Liu G, Cao S, Lv J. Deep Learning for Antimicrobial Peptides: Computational Models and Databases. J Chem Inf Model 2025;65:1708-1717. [PMID: 39927895 DOI: 10.1021/acs.jcim.5c00006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2025]

Kumar N, Du Z, Li Y. pLM4CPPs: Protein Language Model-Based Predictor for Cell Penetrating Peptides. J Chem Inf Model 2025;65:1128-1139. [PMID: 39878455 DOI: 10.1021/acs.jcim.4c01338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2025]

Abstract

Cell-penetrating peptides (CPPs) are short peptides capable of penetrating cell membranes, making them valuable for drug delivery and intracellular targeting. Accurate prediction of CPPs can streamline experimental validation in the lab. This study aims to assess pretrained protein language models (pLMs) for their effectiveness in representing CPPs and develop a reliable model for CPP classification. We evaluated peptide embeddings generated from BEPLER, CPCProt, SeqVec, various ESM variants (ESM, ESM-2 with expanded feature set, ESM-1b, and ESM-1v), ProtT5-XL UniRef50, ProtT5-XL BFD, and ProtBERT. We developed pLM4CPPs, a novel deep learning architecture using convolutional neural networks (CNNs) as the classifier for binary classification of CPPs. pLM4CPPs demonstrated superior performance over existing state-of-the-art CPP prediction models, achieving improvements in accuracy (ACC) by 4.9-5.5%, Matthews correlation coefficient (MCC) by 9.3-10.2%, and sensitivity (Sn) by 14.1-19.6%. Among all the tested models, ESM-1280 and ProtT5-XL BFD demonstrated the highest overall performance on the kelm data set. ESM-1280 achieved an ACC of 0.896, an MCC of 0.796, an Sn of 0.844, and a specificity (Sp) of 0.978. ProtT5-XL BFD exhibited superior performance with an ACC of 0.901, an MCC of 0.802, an Sn of 0.885, and an Sp of 0.917. pLM4CPPs combines predictions from multiple models to provide a consensus on whether a given peptide sequence is classified as a CPP or non-CPP. This approach will enhance prediction reliability by leveraging the strengths of each individual model. A user-friendly web server for bioactivity predictions, along with data sets, is available at https://ry2acnp6ep.us-east-1.awsapprunner.com. The source code and protocol for adapting pLM4CPPs can be accessed on GitHub at https://github.com/drkumarnandan/pLM4CPPs. This platform aims to advance CPP prediction and peptide functionality modeling, aiding researchers in exploring peptide functionality effectively.
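
One simple way to form a consensus of this kind is a majority vote over per-model labels. The sketch below assumes that rule purely for illustration: the abstract does not state how the individual model predictions are combined, and the function name, label strings, and example votes are hypothetical.

```python
from collections import Counter


def consensus_label(votes):
    """Return the majority label among per-model votes ("CPP" or "non-CPP").

    Majority voting is an assumption made for illustration only; ties fall
    back to "non-CPP" as a conservative default.
    """
    counts = Counter(votes)
    if counts["CPP"] > counts["non-CPP"]:
        return "CPP"
    return "non-CPP"


# Hypothetical votes from three embedding-specific classifiers.
example = consensus_label(["CPP", "CPP", "non-CPP"])
print(example)
```

Other combination rules, such as averaging predicted probabilities, would fit the same description equally well.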

Zhang W, Ding Y, Wei L, Guo X, Ni F. Therapeutic peptides identification via kernel risk sensitive loss-based k-nearest neighbor model and multi-Laplacian regularization. Brief Bioinform 2024;25:bbae534. [PMID: 39438076 PMCID: PMC11495874 DOI: 10.1093/bib/bbae534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2024] [Revised: 08/30/2024] [Accepted: 10/08/2024] [Indexed: 10/25/2024]

Yadav AK, Gupta PK, Singh TR. PMTPred: machine-learning-based prediction of protein methyltransferases using the composition of k-spaced amino acid pairs. Mol Divers 2024;28:2301-2315. [PMID: 39033257 DOI: 10.1007/s11030-024-10937-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Accepted: 07/10/2024] [Indexed: 07/23/2024]

Zhang Z, Pan Y, Hussain W, Chen G, Li E. BBSdb, an open resource for bacterial biofilm-associated proteins. Front Cell Infect Microbiol 2024;14:1428784. [PMID: 39149420 PMCID: PMC11324577 DOI: 10.3389/fcimb.2024.1428784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Accepted: 07/11/2024] [Indexed: 08/17/2024]

Xin R, Zhang F, Zheng J, Zhang Y, Yu C, Feng X. SDBA: Score Domain-Based Attention for DNA N4-Methylcytosine Site Prediction from Multiperspectives. J Chem Inf Model 2024;64:2839-2853. [PMID: 37646411 DOI: 10.1021/acs.jcim.3c00688] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 09/01/2023]

Kumar S, Balaya RDA, Kanekar S, Raju R, Prasad TSK, Kandasamy RK. Computational tools for exploring peptide-membrane interactions in gram-positive bacteria. Comput Struct Biotechnol J 2023;21:1995-2008. [PMID: 36950221 PMCID: PMC10025024 DOI: 10.1016/j.csbj.2023.02.051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Revised: 02/27/2023] [Accepted: 02/27/2023] [Indexed: 03/05/2023]

Abstract

The vital cellular functions in Gram-positive bacteria are controlled by signaling molecules known as quorum sensing peptides (QSPs), considered promising therapeutic interventions for bacterial infections. In the bacterial system QSPs bind to membrane-coupled receptors, which then auto-phosphorylate and activate intracellular response regulators. These response regulators induce target gene expression in bacteria. One of the most reliable trends in drug discovery research for virulence-associated molecular targets is the use of peptide drugs or new functionalities. In this perspective, computational methods act as auxiliary aids for biologists, where methodologies based on machine learning and in silico analysis are developed as suitable tools for target peptide identification. Therefore, the development of quick and reliable computational resources to identify or predict these QSPs along with their receptors and inhibitors is receiving considerable attention. The databases such as Quorumpeps and Quorum Sensing of Human Gut Microbes (QSHGM) provide a detailed overview of the structures and functions of QSPs. The tools and algorithms such as QSPpred, QSPred-FL, iQSP, EnsembleQS and PEPred-Suite have been used for the generic prediction of QSPs and feature representation. The availability of compiled key resources for utilizing peptide features based on amino acid composition, positional preferences, and motifs as well as structural and physicochemical properties, including biofilm inhibitory peptides, can aid in elucidating the QSP and membrane receptor interactions in infectious Gram-positive pathogens. Herein, we present a comprehensive survey of diverse computational approaches that are suitable for detecting QSPs and QS interference molecules. This review highlights the utility of these methods for developing potential biomarkers against infectious Gram-positive pathogens.

Ebrahimi Tarki F, Zarrabi M, Abdiali A, Sharbatdar M. Integration of Machine Learning and Structural Analysis for Predicting Peptide Antibiofilm Effects: Advancements in Drug Discovery for Biofilm-Related Infections. Iran J Pharm Res 2023;22:e138704. [PMID: 38450220 PMCID: PMC10916117 DOI: 10.5812/ijpr-138704] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/04/2023] [Revised: 08/22/2023] [Accepted: 08/26/2023] [Indexed: 03/08/2024]

Abstract

Background

The rise of antibiotic resistance has become a major concern, signaling the end of the golden age of antibiotics. Bacterial biofilms, which exhibit high resistance to antibiotics, significantly contribute to the emergence of antibiotic resistance. Therefore, there is an urgent need to discover new therapeutic agents with specific characteristics to effectively combat biofilm-related infections. Studies have shown the promising potential of peptides as antimicrobial agents.

Objectives

This study aimed to establish a cost-effective and streamlined computational method for predicting the antibiofilm effects of peptides. This method can assist in addressing the intricate challenge of designing peptides with strong antibiofilm properties, a task that can be both challenging and costly.

Methods

A positive library, consisting of peptide sequences with antibiofilm activity exceeding 50%, was assembled, along with a negative library containing quorum-sensing peptides. For each peptide sequence, feature vectors were calculated, while considering the primary structure, the order of amino acids, their physicochemical properties, and their distributions. Multiple supervised learning algorithms were used to classify peptides with significant antibiofilm effects for subsequent experimental evaluations.

Results

The computational approach exhibited high accuracy in predicting the antibiofilm effects of peptides, with accuracy, precision, Matthews correlation coefficient (MCC), and F1 score of 99%, 99%, 0.97, and 0.99, respectively. The performance level of this computational approach was comparable to that of previous methods. This study introduced a novel approach by combining the feature space with high antibiofilm activity.

Conclusions

In this study, a reliable and cost-effective method was developed for predicting the antibiofilm effects of peptides using a computational approach. This approach allows for the identification of peptide sequences with substantial antibiofilm activities for further experimental investigations. Accessible source codes and raw data of this study can be found online (hiABF), providing easy access and enabling future updates.

Niu M, Zou Q. SgRNA-RF: Identification of SgRNA On-Target Activity With Imbalanced Datasets. IEEE/ACM Trans Comput Biol Bioinform 2022;19:2442-2453. [PMID: 33979289 DOI: 10.1109/tcbb.2021.3079116] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/12/2023]

Feng C, Wu J, Wei H, Xu L, Zou Q. CRCF: A Method of Identifying Secretory Proteins of Malaria Parasites. IEEE/ACM Trans Comput Biol Bioinform 2022;19:2149-2157. [PMID: 34061749 DOI: 10.1109/tcbb.2021.3085589] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]

Wu X, Zeng W, Lin F, Xu P, Li X. Anticancer Peptide Prediction via Multi-Kernel CNN and Attention Model. Front Genet 2022;13:887894. [PMID: 35571059 PMCID: PMC9092594 DOI: 10.3389/fgene.2022.887894] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/02/2022] [Accepted: 03/25/2022] [Indexed: 11/13/2022]

Zhang H, Zou Q, Ju Y, Song C, Chen D. Distance-based support vector machine to predict DNA N6-methyladenine modification. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220404145517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]

Chen Z, Jiao S, Zhao D, Zou Q, Xu L, Zhang L, Su X. The Characterization of Structure and Prediction for Aquaporin in Tumour Progression by Machine Learning. Front Cell Dev Biol 2022;10:845622. [PMID: 35178393 PMCID: PMC8844512 DOI: 10.3389/fcell.2022.845622] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/30/2021] [Accepted: 01/17/2022] [Indexed: 11/21/2022]

Abstract

Recurrence and new cases of cancer constitute a challenging human health problem. Aquaporins (AQPs) can be expressed in many types of tumours, including the brain, breast, pancreas, colon, skin, ovaries, and lungs, and the histological grade of cancer is positively correlated with AQP expression. Therefore, the identification of aquaporins is an area to explore. Computational tools play an important role in aquaporin identification. In this research, we propose a reliable, accurate and automated sequence predictor, iAQPs-RF, to identify AQPs. In this study, the feature extraction method was 188D (global protein sequence descriptor, GPSD). Six common classifiers, including random forest (RF), NaiveBayes (NB), support vector machine (SVM), XGBoost, logistic regression (LR) and decision tree (DT), were used for AQP classification. The classification results show that the random forest (RF) algorithm is the most suitable machine learning algorithm, and the accuracy was 97.689%. Analysis of Variance (ANOVA) was used to analyse these characteristics. Feature rank based on the ANOVA method and IFS strategy was applied to search for the optimal features. The classification results suggest that the 26th feature (neutral/hydrophobic) and 21st feature (hydrophobic) are the two most powerful and informative features that distinguish AQPs from non-AQPs. Previous studies reported that plasma membrane proteins have hydrophobic characteristics. Aquaporin subcellular localization prediction showed that all aquaporins were plasma membrane proteins with highly conserved transmembrane structures. In addition, the 3D structure of aquaporins was consistent with the localization results. Therefore, these studies confirmed that aquaporins possess hydrophobic properties. Although aquaporins are highly conserved transmembrane structures, the phylogenetic tree shows the diversity of aquaporins during evolution. The PCA showed that positive and negative samples were well separated by 54D features, indicating that the 54D feature can effectively classify aquaporins. The online prediction server is accessible at http://lab.malab.cn/~acy/iAQP.

Kabir M, Nantasenamat C, Kanthawong S, Charoenkwan P, Shoombuatong W. Large-scale comparative review and assessment of computational methods for phage virion proteins identification. EXCLI J 2022;21:11-29. [PMID: 35145365 PMCID: PMC8822302 DOI: 10.17179/excli2021-4411] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/12/2021] [Accepted: 11/29/2021] [Indexed: 12/11/2022]

Zhao Z, Yang W, Zhai Y, Liang Y, Zhao Y. Identify DNA-Binding Proteins Through the Extreme Gradient Boosting Algorithm. Front Genet 2022;12:821996. [PMID: 35154264 PMCID: PMC8837382 DOI: 10.3389/fgene.2021.821996] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/25/2021] [Accepted: 12/07/2021] [Indexed: 12/13/2022]

Manavalan B, Basith S, Lee G. Comparative analysis of machine learning-based approaches for identifying therapeutic peptides targeting SARS-CoV-2. Brief Bioinform 2022;23:bbab412. [PMID: 34595489 PMCID: PMC8500067 DOI: 10.1093/bib/bbab412] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/01/2021] [Revised: 08/27/2021] [Accepted: 09/07/2021] [Indexed: 01/08/2023]

Zhang Z, Gong Y, Gao B, Li H, Gao W, Zhao Y, Dong B. SNAREs-SAP: SNARE Proteins Identification With PSSM Profiles. Front Genet 2022;12:809001. [PMID: 34987554 PMCID: PMC8721734 DOI: 10.3389/fgene.2021.809001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/04/2021] [Accepted: 11/15/2021] [Indexed: 12/20/2022]

Zhang L, Lv Y, Xu L, Zhou M. A Review of DNA Data Storage Technologies Based on Biomolecules. Curr Bioinform 2022. [DOI: 10.2174/1574893616666210813101237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]

Zhou H, Wang H, Ding Y, Tang J. Multivariate Information Fusion for Identifying Antifungal Peptides with Hilbert-Schmidt Independence Criterion. Curr Bioinform 2022. [DOI: 10.2174/1574893616666210727161003] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]

Zhao D, Teng Z, Li Y, Chen D. iAIPs: Identifying Anti-Inflammatory Peptides Using Random Forest. Front Genet 2021;12:773202. [PMID: 34917130 PMCID: PMC8669811 DOI: 10.3389/fgene.2021.773202] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/09/2021] [Accepted: 10/08/2021] [Indexed: 12/25/2022]

Lin X. Genomic Variation Prediction: A Summary From Different Views. Front Cell Dev Biol 2021;9:795883. [PMID: 34901036 PMCID: PMC8656232 DOI: 10.3389/fcell.2021.795883] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/15/2021] [Accepted: 11/11/2021] [Indexed: 12/02/2022]

Jiao S, Zou Q, Guo H, Shi L. iTTCA-RF: a random forest predictor for tumor T cell antigens. J Transl Med 2021;19:449. [PMID: 34706730 PMCID: PMC8554859 DOI: 10.1186/s12967-021-03084-x] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 06/23/2021] [Accepted: 09/16/2021] [Indexed: 12/21/2022]

Abstract

BACKGROUND

Cancer is one of the most serious diseases threatening human health. Cancer immunotherapy represents the most promising treatment strategy due to its high efficacy and selectivity and lower side effects compared with traditional treatment. The identification of tumor T cell antigens is one of the most important tasks for antitumor vaccine development and molecular function investigation. Although several machine learning predictors have been developed to identify tumor T cell antigens, more accurate identification with existing methodology remains challenging.

METHODS

In this study, we used a non-redundant dataset of 592 tumor T cell antigens (positive samples) and 393 non-tumor T cell antigens (negative samples). Four types of feature encoding methods have been studied to build an efficient predictor, including amino acid composition, global protein sequence descriptors and grouped amino acid and peptide composition. To improve the feature representation ability of the hybrid features, we further employed a two-step feature selection technique to search for the optimal feature subset. The final prediction model was constructed using the random forest algorithm.

RESULTS

Finally, the top 263 informative features were selected to train the random forest classifier for detecting tumor T cell antigen peptides. iTTCA-RF provides satisfactory performance, with balanced accuracy, specificity and sensitivity values of 83.71%, 78.73% and 88.69% over tenfold cross-validation as well as 73.14%, 62.67% and 83.61% over independent tests, respectively. The online prediction server was freely accessible at http://lab.malab.cn/~acy/iTTCA.

CONCLUSIONS

We have proven that the proposed predictor iTTCA-RF is superior to the other latest models, and will hopefully become an effective and useful tool for identifying tumor T cell antigens presented in the context of major histocompatibility complex class I.
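
The balanced accuracy values quoted in the results are consistent with the mean of sensitivity and specificity, which is the usual definition of balanced accuracy for a binary classifier. A minimal check, assuming that definition is the one the authors used (the function name is illustrative):

```python
def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy as the mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


# Percent values quoted in the abstract above.
cv = balanced_accuracy(88.69, 78.73)           # tenfold cross-validation
independent = balanced_accuracy(83.61, 62.67)  # independent test
print(round(cv, 2), round(independent, 2))
```

Rounded to two decimals, the two results match the reported 83.71% and 73.14%.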

Liu T, Chen J, Zhang Q, Hippe K, Hunt C, Le T, Cao R, Tang H. The Development of Machine Learning Methods in discriminating Secretory Proteins of Malaria Parasite. Curr Med Chem 2021;29:807-821. [PMID: 34636289 DOI: 10.2174/0929867328666211005140625] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/24/2021] [Revised: 07/28/2021] [Accepted: 08/15/2021] [Indexed: 11/22/2022]

Xue Y, Ye X, Wei L, Zhang X, Sakurai T, Wei L. Better Performance with Transformer: CPPFormer in precise prediction of cell-Penetrating Peptides. Curr Med Chem 2021;29:881-893. [PMID: 34544332 DOI: 10.2174/0929867328666210920103140] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/11/2021] [Revised: 07/28/2021] [Accepted: 08/07/2021] [Indexed: 11/22/2022]

Niu M, Wu J, Zou Q, Liu Z, Xu L. rBPDL: Predicting RNA-Binding Proteins Using Deep Learning. IEEE J Biomed Health Inform 2021;25:3668-3676. [PMID: 33780344 DOI: 10.1109/jbhi.2021.3069259] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022]

Zhang J, Zhang Z, Pu L, Tang J, Guo F. AIEpred: An Ensemble Predictive Model of Classifier Chain to Identify Anti-Inflammatory Peptides. IEEE/ACM Trans Comput Biol Bioinform 2021;18:1831-1840. [PMID: 31985437 DOI: 10.1109/tcbb.2020.2968419] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Indexed: 06/10/2023]

Liang X, Li F, Chen J, Li J, Wu H, Li S, Song J, Liu Q. Large-scale comparative review and assessment of computational methods for anti-cancer peptide identification. Brief Bioinform 2021;22:bbaa312. [PMID: 33316035 PMCID: PMC8294543 DOI: 10.1093/bib/bbaa312] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 08/18/2020] [Revised: 09/30/2020] [Accepted: 08/25/2020] [Indexed: 12/13/2022]

Abstract

Anti-cancer peptides (ACPs) are known as potential therapeutics for cancer. Due to their unique ability to target cancer cells without affecting healthy cells directly, they have been extensively studied. Many peptide-based drugs are currently evaluated in the preclinical and clinical trials. Accurate identification of ACPs has received considerable attention in recent years; as such, a number of machine learning-based methods for in silico identification of ACPs have been developed. These methods promote the research on the mechanism of ACPs therapeutics against cancer to some extent. There is a vast difference in these methods in terms of their training/testing datasets, machine learning algorithms, feature encoding schemes, feature selection methods and evaluation strategies used. Therefore, it is desirable to summarize the advantages and disadvantages of the existing methods, provide useful insights and suggestions for the development and improvement of novel computational tools to characterize and identify ACPs. With this in mind, we firstly comprehensively investigate 16 state-of-the-art predictors for ACPs in terms of their core algorithms, feature encoding schemes, performance evaluation metrics and webserver/software usability. Then, comprehensive performance assessment is conducted to evaluate the robustness and scalability of the existing predictors using a well-prepared benchmark dataset. We provide potential strategies for the model performance improvement. Moreover, we propose a novel ensemble learning framework, termed ACPredStackL, for the accurate identification of ACPs. ACPredStackL is developed based on the stacking ensemble strategy combined with SVM, Naïve Bayesian, lightGBM and KNN. Empirical benchmarking experiments against the state-of-the-art methods demonstrate that ACPredStackL achieves a comparative performance for predicting ACPs. 
The webserver and source code of ACPredStackL are freely available at http://bigdata.biocie.cn/ACPredStackL/ and https://github.com/liangxiaoq/ACPredStackL, respectively.

Hunt C, Montgomery S, Berkenpas JW, Sigafoos N, Oakley JC, Espinosa J, Justice N, Kishaba K, Hippe K, Si D, Hou J, Ding H, Cao R. Recent Progress of Machine Learning in Gene Therapy. Curr Gene Ther 2021;22:132-143. [PMID: 34161210 DOI: 10.2174/1566523221666210622164133] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 02/02/2021] [Revised: 03/15/2021] [Accepted: 04/02/2021] [Indexed: 11/22/2022]

Zeng R, Cheng S, Liao M. 4mCPred-MTL: Accurate Identification of DNA 4mC Sites in Multiple Species Using Multi-Task Deep Learning Based on Multi-Head Attention Mechanism. Front Cell Dev Biol 2021;9:664669. [PMID: 34041243 PMCID: PMC8141656 DOI: 10.3389/fcell.2021.664669] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/05/2021] [Accepted: 03/17/2021] [Indexed: 01/10/2023]

Yang X, Ye X, Li X, Wei L. iDNA-MT: Identification DNA Modification Sites in Multiple Species by Using Multi-Task Learning Based a Neural Network Tool. Front Genet 2021;12:663572. [PMID: 33868390 PMCID: PMC8044371 DOI: 10.3389/fgene.2021.663572] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/03/2021] [Accepted: 03/02/2021] [Indexed: 02/04/2023]

Nilamyani AN, Auliah FN, Moni MA, Shoombuatong W, Hasan MM, Kurata H. PredNTS: Improved and Robust Prediction of Nitrotyrosine Sites by Integrating Multiple Sequence Features. Int J Mol Sci 2021;22:2704. [PMID: 33800121 PMCID: PMC7962192 DOI: 10.3390/ijms22052704] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/21/2021] [Revised: 03/02/2021] [Accepted: 03/03/2021] [Indexed: 12/15/2022]

Huang Q, Zhou W, Guo F, Xu L, Zhang L. 6mA-Pred: identifying DNA N6-methyladenine sites based on deep learning. PeerJ 2021;9:e10813. [PMID: 33604189 PMCID: PMC7866889 DOI: 10.7717/peerj.10813] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/28/2020] [Accepted: 12/30/2020] [Indexed: 01/03/2023]

Recent Advances in Predicting Protein S-Nitrosylation Sites. Biomed Res Int 2021;2021:5542224. [PMID: 33628788 PMCID: PMC7892234 DOI: 10.1155/2021/5542224] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/07/2021] [Revised: 01/24/2021] [Accepted: 01/25/2021] [Indexed: 01/09/2023]

Using a low correlation high orthogonality feature set and machine learning methods to identify plant pentatricopeptide repeat coding gene/protein. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.02.079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]

Dai R, Zhang W, Tang W, Wynendaele E, Zhu Q, Bin Y, De Spiegeleer B, Xia J. BBPpred: Sequence-Based Prediction of Blood-Brain Barrier Peptides with Feature Representation Learning and Logistic Regression. J Chem Inf Model 2021;61:525-534. [PMID: 33426873 DOI: 10.1021/acs.jcim.0c01115] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Indexed: 12/17/2022]

Li Y, Zhang Z, Teng Z, Liu X. PredAmyl-MLP: Prediction of Amyloid Proteins Using Multilayer Perceptron. Comput Math Methods Med 2020;2020:8845133. [PMID: 33294004 PMCID: PMC7700051 DOI: 10.1155/2020/8845133] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 09/14/2020] [Revised: 10/06/2020] [Accepted: 10/31/2020] [Indexed: 01/20/2023]

Wang C, Zhang H, Li Z, Zhou X, Cheng Y, Chen R. White Blood Cell Image Segmentation Based on Color Component Combination and Contour Fitting. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191017102310] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]

Manavalan B, Hasan MM, Basith S, Gosu V, Shin TH, Lee G. Empirical Comparison and Analysis of Web-Based DNA N4-Methylcytosine Site Prediction Tools. Mol Ther Nucleic Acids 2020;22:406-420. [PMID: 33230445 PMCID: PMC7533314 DOI: 10.1016/j.omtn.2020.09.010] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 06/16/2020] [Accepted: 09/11/2020] [Indexed: 12/12/2022]

Bin Y, Zhang W, Tang W, Dai R, Li M, Zhu Q, Xia J. Prediction of Neuropeptides from Sequence Information Using Ensemble Classifier and Hybrid Features. J Proteome Res 2020;19:3732-3740. [DOI: 10.1021/acs.jproteome.0c00276] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/11/2022]

Xu ZC, Feng PM, Yang H, Qiu WR, Chen W, Lin H. iRNAD: a computational tool for identifying D modification sites in RNA sequence. Bioinformatics 2020;35:4922-4929. [PMID: 31077296 DOI: 10.1093/bioinformatics/btz358] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Received: 11/28/2018] [Revised: 03/01/2019] [Accepted: 04/27/2019] [Indexed: 12/19/2022]

Identification of Human Enzymes Using Amino Acid Composition and the Composition of k-Spaced Amino Acid Pairs. Biomed Res Int 2020;2020:9235920. [PMID: 32596396 PMCID: PMC7273372 DOI: 10.1155/2020/9235920] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/14/2020] [Accepted: 04/22/2020] [Indexed: 11/17/2022]

Manavalan B, Basith S, Shin TH, Wei L, Lee G. mAHTPred: a sequence-based meta-predictor for improving the prediction of anti-hypertensive peptides using effective feature representation. Bioinformatics 2020;35:2757-2765. [PMID: 30590410 DOI: 10.1093/bioinformatics/bty1047] [Citation(s) in RCA: 174] [Impact Index Per Article: 34.8] [Received: 11/02/2018] [Revised: 12/05/2018] [Accepted: 12/20/2018] [Indexed: 11/13/2022]

Abstract

MOTIVATION

Cardiovascular disease is the primary cause of death globally, accounting for approximately 17.7 million deaths per year. Hypertension is one of the major risks linked with cardiovascular diseases and other complications. Naturally derived bioactive peptides with antihypertensive activities serve as promising alternatives to pharmaceutical drugs. So far, there has been no comprehensive analysis or assessment of diverse features, nor an implementation of various machine-learning (ML) algorithms, for antihypertensive peptide (AHTP) model construction.

RESULTS

In this study, we applied six different ML algorithms, namely AdaBoost, extremely randomized tree (ERT), gradient boosting (GB), k-nearest neighbor, random forest (RF) and support vector machine (SVM), to 51 feature descriptors derived from eight different feature encodings for the prediction of AHTPs. Because ERT-based models performed consistently better than the other algorithms regardless of feature descriptor, we treated them as baseline predictors. Their predicted probabilities of AHTPs were then used as input features, separately, for four different ML algorithms (ERT, GB, RF and SVM), and the corresponding meta-predictors were developed using a two-step feature selection protocol. Subsequently, integrating the four meta-predictors through an ensemble learning approach improved the balanced prediction performance and model robustness on the independent dataset. Compared with existing methods, mAHTPred showed superior performance, with an overall improvement of approximately 6-7% on both the benchmarking and independent datasets.

AVAILABILITY AND IMPLEMENTATION

The user-friendly online prediction tool, mAHTPred, is freely accessible at http://thegleelab.org/mAHTPred.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
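
The layered design described under RESULTS can be sketched in plain Python. This is an illustration of the data flow only, not the mAHTPred implementation: the baseline and meta models below are hypothetical stand-in callables rather than trained ERT, GB, RF or SVM models, the toy numbers are invented, and simple averaging with a 0.5 cutoff is an assumed way of integrating the four meta-predictors, since the abstract does not state the exact ensemble rule.

```python
from statistics import mean, median
from typing import Callable, List, Sequence, Tuple

# A "model" here is any callable mapping a feature vector to a probability.
Model = Callable[[Sequence[float]], float]


def meta_features(descriptors: Sequence[Sequence[float]],
                  baselines: Sequence[Model]) -> List[float]:
    """Layer 1: one baseline model per descriptor; each predicted
    probability becomes one input feature for the meta-predictors."""
    return [model(x) for model, x in zip(baselines, descriptors)]


def ensemble_probability(meta_x: Sequence[float],
                         meta_models: Sequence[Model]) -> float:
    """Layer 2 plus ensemble: average the meta-predictor outputs
    (averaging is an assumption made for this sketch)."""
    return mean(m(meta_x) for m in meta_models)


def predict(descriptors: Sequence[Sequence[float]],
            baselines: Sequence[Model],
            meta_models: Sequence[Model],
            cutoff: float = 0.5) -> Tuple[float, bool]:
    """Return the ensemble probability and the thresholded label."""
    p = ensemble_probability(meta_features(descriptors, baselines), meta_models)
    return p, p >= cutoff


def clipped_mean(x: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained baseline model."""
    return min(1.0, max(0.0, mean(x)))


# Toy input: three descriptor vectors computed for a single peptide.
descriptors = [[0.9, 0.7], [0.2, 0.4], [1.0, 0.5]]
baselines = [clipped_mean, clipped_mean, clipped_mean]
# Four stand-in meta-predictors (the paper uses ERT, GB, RF and SVM).
meta_models = [mean, max, min, median]

prob, label = predict(descriptors, baselines, meta_models)
print(f"ensemble probability = {prob:.3f}, predicted positive = {label}")
```

In a real pipeline the stand-ins would be replaced by fitted classifiers, and the layer-1 probabilities used to train the meta-predictors would normally come from held-out predictions to limit leakage.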

Liang P, Yang W, Chen X, Long C, Zheng L, Li H, Zuo Y. Machine Learning of Single-Cell Transcriptome Highly Identifies mRNA Signature by Comparing F-Score Selection with DGE Analysis. MOLECULAR THERAPY. NUCLEIC ACIDS 2020;20:155-163. [PMID: 32169803 PMCID: PMC7066034 DOI: 10.1016/j.omtn.2020.02.004] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/12/2019] [Revised: 12/27/2019] [Accepted: 02/05/2020] [Indexed: 12/21/2022]

Liu B, Gao X, Zhang H. BioSeq-Analysis2.0: an updated platform for analyzing DNA, RNA and protein sequences at sequence level and residue level based on machine learning approaches. Nucleic Acids Res 2020;47:e127. [PMID: 31504851 PMCID: PMC6847461 DOI: 10.1093/nar/gkz740] [Citation(s) in RCA: 270] [Impact Index Per Article: 54.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2019] [Revised: 08/07/2019] [Accepted: 08/17/2019] [Indexed: 12/14/2022] Open

Fallah Atanaki F, Behrouzi S, Ariaeenejad S, Boroomand A, Kavousi K. BIPEP: Sequence-based Prediction of Biofilm Inhibitory Peptides Using a Combination of NMR and Physicochemical Descriptors. ACS OMEGA 2020;5:7290-7297. [PMID: 32280870 PMCID: PMC7144140 DOI: 10.1021/acsomega.9b04119] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/03/2019] [Accepted: 03/12/2020] [Indexed: 05/26/2023]

Meng C, Hu Y, Zhang Y, Guo F. PSBP-SVM: A Machine Learning-Based Computational Identifier for Predicting Polystyrene Binding Peptides. Front Bioeng Biotechnol 2020;8:245. [PMID: 32296690 PMCID: PMC7137786 DOI: 10.3389/fbioe.2020.00245] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2020] [Accepted: 03/09/2020] [Indexed: 12/11/2022] Open

iQSP: A Sequence-Based Tool for the Prediction and Analysis of Quorum Sensing Peptides via Chou's 5-Steps Rule and Informative Physicochemical Properties. Int J Mol Sci 2019;21:ijms21010075. [PMID: 31861928 PMCID: PMC6981611 DOI: 10.3390/ijms21010075] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2019] [Revised: 12/13/2019] [Accepted: 12/18/2019] [Indexed: 01/18/2023] Open

Zhang M, Li F, Marquez-Lago TT, Leier A, Fan C, Kwoh CK, Chou KC, Song J, Jia C. MULTiPly: a novel multi-layer predictor for discovering general and specific types of promoters. Bioinformatics 2019;35:2957-2965. [PMID: 30649179 PMCID: PMC6736106 DOI: 10.1093/bioinformatics/btz016] [Citation(s) in RCA: 81] [Impact Index Per Article: 13.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2018] [Revised: 12/09/2018] [Accepted: 01/05/2019] [Indexed: 12/22/2022] Open

AtbPpred: A Robust Sequence-Based Prediction of Anti-Tubercular Peptides Using Extremely Randomized Trees. Comput Struct Biotechnol J 2019;17:972-981. [PMID: 31372196 PMCID: PMC6658830 DOI: 10.1016/j.csbj.2019.06.024] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2019] [Revised: 06/27/2019] [Accepted: 06/28/2019] [Indexed: 01/01/2023] Open

Abstract

Mycobacterium tuberculosis is one of the most dangerous pathogens in humans. It acts as an etiological agent of tuberculosis (TB), infecting almost one-third of the world's population. Owing to the high incidence of multidrug-resistant TB and extensively drug-resistant TB, there is an urgent need for novel and effective alternative therapies. Peptide-based therapy has several advantages, such as diverse mechanisms of action, low immunogenicity, and selective affinity to bacterial cell envelopes. However, the identification of anti-tubercular peptides (AtbPs) via experimentation is laborious and expensive; hence, the development of an efficient computational method is necessary for the prediction of AtbPs prior to both in vitro and in vivo experiments. To this end, we developed a two-layer machine learning (ML)-based predictor called AtbPpred for the identification of AtbPs. In the first layer, we applied a two-step feature selection procedure and identified the optimal feature set individually for nine different feature encodings, whose corresponding models were developed using extremely randomized tree (ERT). In the second layer, the predicted probabilities of AtbPs from the above nine models were used as input features to ERT to develop the final predictor. AtbPpred achieved average accuracies of 88.3% and 87.3% in cross-validation and independent evaluation, respectively, which were ~8.7% and 10.0% higher than those of the state-of-the-art method. Furthermore, we established a user-friendly webserver, which is currently available at http://thegleelab.org/AtbPpred. We anticipate that this predictor could be useful in the high-throughput prediction of AtbPs and also provide mechanistic insights into their functions.

• We developed a novel computational framework for the identification of anti-tubercular peptides using extremely randomized tree.

• AtbPpred displayed superior performance compared to the existing method on both benchmark and independent datasets.

• We constructed a user-friendly web server that implements the proposed AtbPpred method.

Ye M, Wang W, Yao C, Fan R, Wang P. Gene Selection Method for Microarray Data Classification Using Particle Swarm Optimization and Neighborhood Rough Set. Curr Bioinform 2019. [DOI: 10.2174/1574893614666190204150918] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]