1. Akbar S, Raza A, Awan HH, Zou Q, Alghamdi W, Saeed A. pNPs-CapsNet: Predicting Neuropeptides Using Protein Language Models and FastText Encoding-Based Weighted Multi-View Feature Integration with Deep Capsule Neural Network. ACS Omega 2025; 10:12403-12416. [PMID: 40191328 PMCID: PMC11966582 DOI: 10.1021/acsomega.4c11449] [Received: 12/24/2024] [Revised: 02/04/2025] [Accepted: 03/07/2025] [Indexed: 04/09/2025]
Abstract
Neuropeptides (NPs) are critical signaling molecules that are essential in numerous physiological processes and possess significant therapeutic potential. Computational prediction of NPs has emerged as a promising alternative to traditional experimental methods, which are often labor-intensive, time-consuming, and expensive. Recent advances in computational peptide models provide a cost-effective approach to identifying NPs, which are characterized by high selectivity toward target cells and minimal side effects. In this study, we propose a novel deep capsule neural network-based computational model, pNPs-CapsNet, to accurately discriminate NPs from non-NPs. Input samples are numerically encoded using pretrained protein language models, including ESM, ProtBERT-BFD, and ProtT5, to extract attention mechanism-based contextual and semantic features. A differential evolution-based weighted feature integration method is used to construct a multiview vector. Additionally, a two-tier feature selection strategy, comprising MRMD and SHAP analysis, is developed to identify and select optimal features. Finally, the capsule neural network (CapsNet) is trained on the selected optimal feature set. The proposed pNPs-CapsNet model achieved a predictive accuracy of 98.10% and an AUC of 0.98 on the training data. To validate its generalization capability, pNPs-CapsNet was evaluated on independent samples, reaching an accuracy of 95.21% and an AUC of 0.96. pNPs-CapsNet outperforms existing state-of-the-art models, improving predictive accuracy by 4% on the training data set and by 2.5% on the independent data set. The demonstrated efficacy and consistency of pNPs-CapsNet underline its potential as a robust tool for drug discovery and academic research.
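The abstract does not spell out the capsule layer's internals. In the original CapsNet formulation (Sabour et al., 2017), each capsule outputs a vector whose length is read as the probability that its entity is present, and a "squash" non-linearity keeps that length in [0, 1). The plain-Python sketch below assumes pNPs-CapsNet uses this standard squash; it does not cover dynamic routing or the differential-evolution feature weighting, and the input vector is hypothetical.

```python
import math

def squash(s):
    """Standard CapsNet squash: v = (|s|^2 / (1 + |s|^2)) * (s / |s|).
    Keeps the direction of s and maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)  # a zero input stays zero (avoids division by zero)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]

def vector_length(v):
    """Euclidean length; for a capsule output this is read as a class probability."""
    return math.sqrt(sum(x * x for x in v))

# Hypothetical pre-activation vector of an "NP" output capsule.
v = squash([3.0, 4.0])
print(round(vector_length(v), 4))  # 0.9615 (= 25/26)
```

Short vectors are squashed toward zero and long ones toward (but never reaching) one, so the NP and non-NP output capsules can be compared directly by length.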
Affiliation(s)
- Shahid Akbar
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan 23200, Khyber Pakhtunkhwa, Pakistan
- Ali Raza
- Department of Computer Science, Bahria University, Islamabad 44220, Pakistan
- Hamid Hussain Awan
- Department of Computer Science, Rawalpindi Women University, Rawalpindi 46300, Punjab, Pakistan
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, PR China
- Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Aamir Saeed
- Department of Computer Science and IT, University of Engineering and Technology, Jalozai Campus, Peshawar 25000, Pakistan
2. Rahmani R, Kalankesh LR, Ferdousi R. Computational approaches for identifying neuropeptides: A comprehensive review. Molecular Therapy. Nucleic Acids 2025; 36:102409. [PMID: 40171446 PMCID: PMC11960512 DOI: 10.1016/j.omtn.2024.102409] [Indexed: 04/03/2025]
Abstract
Neuropeptides (NPs) are key signaling molecules that interact with G protein-coupled receptors, influencing neuronal activities and developmental pathways, as well as the endocrine and immune systems. They are significant in disease contexts, offering potential therapeutic targets for conditions such as anxiety, neurological disorders, cardiovascular disease, and diabetes. Understanding and detecting NPs is crucial because of their complex functions in health and disease. Historically, identifying NPs via wet lab techniques has been time-consuming and costly. However, integrating computational methods has shown the potential to improve efficiency, accuracy, and cost-effectiveness. Computational techniques, such as artificial intelligence and machine learning, have been extensively researched in recent years for the identification of NPs. This review explores the application of machine learning (ML) techniques in predicting various aspects of NPs, including their sequences, cleavage sites, and precursors. Additionally, it provides insights into databases containing NP metadata and specialized tools used in this domain.
Affiliation(s)
- Roya Rahmani
- Student Research Committee, Tabriz University of Medical Science, Tabriz, Iran
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
- Leila R. Kalankesh
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
- Tabriz University of Medical Sciences, Research Center of Psychiatry and Behavioral Sciences Tabriz, East Azerbaijan, Iran
- Reza Ferdousi
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
3. Petre ML, Kontouli Pertesi AN, Boulioglou OE, Sarantidi E, Korovesi AG, Kozei A, Katsafadou AI, Tsangaris GT, Trichopoulou A, Anagnostopoulos AK. Bioactive Peptides in Greek Goat Colostrum: Relevance to Human Metabolism. Foods 2024; 13:3949. [PMID: 39683021 DOI: 10.3390/foods13233949] [Received: 10/16/2024] [Revised: 11/20/2024] [Accepted: 12/03/2024] [Indexed: 12/18/2024]
Abstract
Colostrum is essential for the survival and development of newborn mammals. This primary source of nourishment during the first days of infant life is rich in functional components conducive to the enhancement of neonate immunity and growth. Compared with mature milk, colostrum has a higher protein and peptide content and is lower in fat and carbohydrates. The functional properties of colostrum are closely linked to the release of bioactive peptides during the gastrointestinal digestion of colostrum proteins. Our study aimed to comprehensively analyze the whey proteome of colostrum from indigenous Greek goats and to examine the influence of bioactive peptides released during digestion on human metabolism. Colostrum and mature milk samples from healthy ewes were subjected to nanoLC-MS/MS analysis, revealing differentially expressed proteins. These proteins were functionally characterized and subjected to in silico digestion. Using machine learning models, we classified the peptide functional groups, while molecular docking assessed the binding affinity of the proposed angiotensin-converting enzyme (ACE)- and dipeptidyl peptidase IV (DPPIV)-inhibitory peptides to their target molecules. A total of 898 proteins were identified in colostrum, 40 of which were overexpressed compared with mature milk. The enzymatic cleavage of upregulated proteins by key gastrointestinal tract proteases and the downstream analysis of peptide sequences identified 117 peptides predicted (with >80% confidence) to impact metabolism, primarily through modulation of the renin-angiotensin system, insulin secretion, and redox pathways. This work advances our understanding of dietary bioactive peptides and their relevance to human metabolism, highlighting the potential health benefits of colostrum consumption.
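The abstract names only "key gastrointestinal tract proteases" for the in silico digestion. As an illustration of what such a step does, the sketch below applies the widely used simplified trypsin rule (cleave after K or R unless the next residue is P) to a hypothetical sequence. Other digestive proteases, missed cleavages, and the downstream machine learning classification are not modeled, and the function name is hypothetical.

```python
import re

def trypsin_digest(protein, min_len=2):
    """In silico trypsin digestion using the simplified rule: cleave on the
    C-terminal side of K or R, except when the next residue is P.
    No missed cleavages are allowed; fragments shorter than min_len are dropped."""
    # Zero-width split position: preceded by K/R and not followed by P.
    fragments = re.split(r"(?<=[KR])(?!P)", protein)
    return [f for f in fragments if len(f) >= min_len]

# Hypothetical sequence, not taken from the study.
print(trypsin_digest("GASKPEERAYKLLR"))
# ['GASKPEER', 'AYK', 'LLR']
```

The K at position 4 is followed by P, so it is not cleaved and stays inside the first fragment.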
Affiliation(s)
- Maria Louiza Petre
- Department of Biotechnology, Biomedical Research Foundation of the Academy of Athens, 11527 Athens, Greece
- Olympia Eirini Boulioglou
- Department of Biotechnology, Biomedical Research Foundation of the Academy of Athens, 11527 Athens, Greece
- Eleana Sarantidi
- Department of Biotechnology, Biomedical Research Foundation of the Academy of Athens, 11527 Athens, Greece
- Athina Kozei
- Department of Biotechnology, Biomedical Research Foundation of the Academy of Athens, 11527 Athens, Greece
- George T Tsangaris
- Department of Biotechnology, Biomedical Research Foundation of the Academy of Athens, 11527 Athens, Greece
- Antonia Trichopoulou
- Department of Biotechnology, Biomedical Research Foundation of the Academy of Athens, 11527 Athens, Greece
- Center for Public Health, Research and Education, Academy of Athens, 11528 Athens, Greece
- Athanasios K Anagnostopoulos
- Department of Biotechnology, Biomedical Research Foundation of the Academy of Athens, 11527 Athens, Greece
- Oncology Unit, 3rd Department of Internal Medicine, "Sotiria" Hospital, Medical School, National Kapodistrian University of Athens, 11527 Athens, Greece
4. Liang Y, Cao M, Zhang S. NeuroPred-ResSE: Predicting neuropeptides by integrating residual block and squeeze-excitation attention mechanism. Anal Biochem 2024; 695:115648. [PMID: 39154878 DOI: 10.1016/j.ab.2024.115648] [Received: 06/10/2024] [Revised: 07/31/2024] [Accepted: 08/15/2024] [Indexed: 08/20/2024]
Abstract
Neuropeptides play crucial roles in regulating neurological function by acting as signaling molecules, which provides new opportunities for developing drugs to treat neurological diseases. It is therefore necessary to develop a rapid and accurate prediction model for neuropeptides. Although a few prediction tools have been developed, there is room to improve prediction accuracy using deep learning approaches. In this paper, we establish the NeuroPred-ResSE model based on a residual block and a squeeze-excitation attention mechanism. First, we extract multiple features: one-hot encoding of the NT5CT5 sequence (the five N-terminal and five C-terminal residues), the dipeptide deviation from expected mean (DDE), and the natural vector. Then, we integrate the residual block and the squeeze-excitation attention mechanism, which can capture and identify the most relevant attribute features. Finally, the accuracies on the training set and test set are 97.16% and 96.60% under 5-fold cross-validation and independent testing, respectively, and the other evaluation metrics also gave satisfactory results. The experimental results show that NeuroPred-ResSE outperforms existing state-of-the-art models and is an effective, intelligent, and robust prediction tool. The datasets and source codes are available at https://github.com/yunyunliang88/NeuroPred-ResSE.
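Of the three feature types, DDE has a compact definition. The sketch below follows the formulation used in common feature-extraction toolkits such as iFeature. The observed dipeptide composition is compared with a theoretical mean derived from codon counts, and the deviation is scaled by a theoretical variance. Whether NeuroPred-ResSE uses exactly this variant is an assumption. The NT5CT5 one-hot and natural-vector features are not shown.

```python
import math

AA = "ACDEFGHIKLMNPQRSTVWY"
# Codons per amino acid in the standard genetic code (61 sense codons in total).
CODONS = {"A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3,
          "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6,
          "T": 4, "V": 4, "W": 1, "Y": 2}

def dde(seq):
    """Return 400 DDE values (dipeptides ordered AA, AC, ..., YY).
    DDE(ij) = (Dc(ij) - Tm(ij)) / sqrt(Tv(ij)), where
      Dc = observed dipeptide composition,
      Tm = (codons_i / 61) * (codons_j / 61)   (theoretical mean),
      Tv = Tm * (1 - Tm) / (L - 1)             (theoretical variance)."""
    seq = seq.upper()
    n_pairs = len(seq) - 1
    if n_pairs < 1:
        raise ValueError("sequence needs at least two residues")
    counts = {a + b: 0 for a in AA for b in AA}
    for i in range(n_pairs):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = sum(counts.values())
    features = []
    for a in AA:
        for b in AA:
            dc = counts[a + b] / total if total else 0.0
            tm = (CODONS[a] / 61) * (CODONS[b] / 61)
            tv = tm * (1 - tm) / n_pairs
            features.append((dc - tm) / math.sqrt(tv))
    return features

vec = dde("YGGFL")  # Leu-enkephalin, used here only as a toy input
print(len(vec))     # 400
```

Dipeptides that appear more often than their codon-based expectation get positive values. Absent dipeptides get negative values, which gives a fixed-length 400-dimensional vector regardless of peptide length.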
Affiliation(s)
- Yunyun Liang
- School of Science, Xi'an Polytechnic University, Xi'an, 710048, PR China
- Mengyi Cao
- School of Science, Xi'an Polytechnic University, Xi'an, 710048, PR China
- Shengli Zhang
- School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
5. Zandawala M, Bilal Amir M, Shin J, Yim WC, Alfonso Yañez Guerra L. Proteome-wide neuropeptide identification using NeuroPeptide-HMMer (NP-HMMer). Gen Comp Endocrinol 2024; 357:114597. [PMID: 39084320 DOI: 10.1016/j.ygcen.2024.114597] [Received: 06/07/2024] [Revised: 07/20/2024] [Accepted: 07/27/2024] [Indexed: 08/02/2024]
Abstract
Neuropeptides are essential neuronal signaling molecules that orchestrate animal behavior and physiology via actions within the nervous system and on peripheral tissues. Due to the small size of biologically active mature peptides, their identification on a proteome-wide scale poses a significant challenge using existing bioinformatics tools like BLAST. To address this, we have developed NeuroPeptide-HMMer (NP-HMMer), a hidden Markov model (HMM)-based tool to facilitate neuropeptide discovery, especially in underexplored invertebrates. NP-HMMer utilizes manually curated HMMs for 46 neuropeptide families, enabling rapid and accurate identification of neuropeptides. Validation of NP-HMMer on Drosophila melanogaster, Daphnia pulex, Tribolium castaneum and Tenebrio molitor demonstrated its effectiveness in identifying known neuropeptides across diverse arthropods. Additionally, we showcase the utility of NP-HMMer by discovering novel neuropeptides in Priapulida and Rotifera, identifying 22 and 19 new peptides, respectively. This tool represents a significant advancement in neuropeptide research, offering a robust method for annotating neuropeptides across diverse proteomes and providing insights into the evolutionary conservation of neuropeptide signaling pathways.
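NP-HMMer's curated profile HMMs are not reproduced here. To show the underlying idea of scoring a family model along a sequence, the sketch below uses a simpler, ungapped stand-in. It builds a position-specific log-odds profile from a few aligned motifs (hypothetical FMRFamide-like C-terminal motifs) against a uniform background. Unlike a profile HMM, it cannot model insertions or deletions. Profile HMM searches themselves are commonly run with the HMMER suite. All names and data below are hypothetical.

```python
import math

AA = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1 / len(AA)  # uniform background frequency (an assumption)

def build_profile(aligned_motifs, pseudocount=1.0):
    """Position-specific log2-odds scores from equal-length aligned motifs."""
    profile = []
    for pos in range(len(aligned_motifs[0])):
        column = [m[pos] for m in aligned_motifs]
        denom = len(column) + pseudocount * len(AA)
        profile.append({a: math.log2(((column.count(a) + pseudocount) / denom) / BACKGROUND)
                        for a in AA})
    return profile

def best_hit(profile, sequence, unknown_score=-10.0):
    """Slide the profile along the sequence; return (best score, start index).
    Residues outside the 20 standard amino acids get unknown_score."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(profile[i].get(res, unknown_score) for i, res in enumerate(window))
        best = max(best, (score, start))
    return best

# Hypothetical FMRFamide-like C-terminal motifs (the trailing G is the amidation donor).
profile = build_profile(["FMRFG", "FLRFG", "YMRFG"])
print(best_hit(profile, "AAAAFMRFGKRAAA"))  # best window starts at index 4
```

A positive best score means the window looks more like the family than like background sequence. A real search would also calibrate scores, for example as E-values, before calling hits.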
Affiliation(s)
- Meet Zandawala
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV 89557, USA; Integrative Neuroscience Program, University of Nevada, Reno, NV 89557, USA; Neurobiology and Genetics, Theodor-Boveri-Institute, Biocenter, Julius-Maximilians-University of Würzburg, Am Hubland, 97074 Würzburg, Germany
- Muhammad Bilal Amir
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV 89557, USA
- Joel Shin
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV 89557, USA
- Won C Yim
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV 89557, USA
- Luis Alfonso Yañez Guerra
- School of Biological Sciences, University of Southampton, University Road, SO17 1BJ Southampton, UK; Institute for Life Sciences, University of Southampton, University Road SO17 1BJ, Southampton, UK
6. Liu D, Lin Z, Jia C. NeuroCNN_GNB: an ensemble model to predict neuropeptides based on a convolution neural network and Gaussian naive Bayes. Front Genet 2023; 14:1226905. [PMID: 37576553 PMCID: PMC10414792 DOI: 10.3389/fgene.2023.1226905] [Received: 05/22/2023] [Accepted: 06/30/2023] [Indexed: 08/15/2023]
Abstract
Neuropeptides contain more chemical information than other classical neurotransmitters and have multiple receptor recognition sites. These characteristics allow neuropeptides to have a correspondingly higher selectivity for nerve receptors and fewer side effects. Traditional experimental methods, such as mass spectrometry and liquid chromatography technology, still need the support of a complete neuropeptide precursor database and the basic characteristics of neuropeptides. Incomplete neuropeptide precursor and information databases will lead to false-positives or reduce the sensitivity of recognition. In recent years, studies have proven that machine learning methods can rapidly and effectively predict neuropeptides. In this work, we have made a systematic attempt to create an ensemble tool based on four convolution neural network models. These baseline models were separately trained on one-hot encoding, AAIndex, G-gap dipeptide encoding and word2vec and integrated using Gaussian Naive Bayes (NB) to construct our predictor designated NeuroCNN_GNB. Both 5-fold cross-validation tests using benchmark datasets and independent tests showed that NeuroCNN_GNB outperformed other state-of-the-art methods. Furthermore, this novel framework provides essential interpretations that aid the understanding of model success by leveraging the powerful Shapley Additive exPlanation (SHAP) algorithm, thereby highlighting the most important features relevant for predicting neuropeptides.
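The abstract says the four CNN base models are "integrated using Gaussian Naive Bayes". One plausible reading, assumed here, is that each base model's predicted probability becomes one input feature to a Gaussian NB meta-classifier. The plain-Python sketch fits per-class means, variances and priors, and predicts by maximum log posterior. The base-model scores are made-up numbers. The fixed variance floor is a simplification: scikit-learn's `GaussianNB`, for example, scales its `var_smoothing` by the largest feature variance.

```python
import math
from statistics import mean, pvariance

def fit_gnb(X, y, var_floor=1e-9):
    """Per-class prior, feature means, and feature variances."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        columns = list(zip(*rows))
        model[c] = {
            "prior": len(rows) / len(y),
            "mean": [mean(col) for col in columns],
            "var": [pvariance(col) + var_floor for col in columns],
        }
    return model

def log_posterior(params, x):
    """log P(c) + sum_i log N(x_i | mean_i, var_i), up to a shared constant."""
    lp = math.log(params["prior"])
    for xi, mu, var in zip(x, params["mean"], params["var"]):
        lp -= 0.5 * math.log(2 * math.pi * var) + (xi - mu) ** 2 / (2 * var)
    return lp

def predict_gnb(model, x):
    """Return the class label with the highest log posterior."""
    return max(model, key=lambda c: log_posterior(model[c], x))

# Hypothetical outputs of four base CNNs (one-hot, AAIndex, g-gap, word2vec) per peptide.
X = [[0.90, 0.80, 0.85, 0.70], [0.95, 0.90, 0.80, 0.90], [0.85, 0.75, 0.90, 0.80],
     [0.10, 0.20, 0.15, 0.30], [0.05, 0.10, 0.20, 0.10], [0.20, 0.15, 0.10, 0.25]]
y = [1, 1, 1, 0, 0, 0]  # 1 = neuropeptide, 0 = non-neuropeptide
model = fit_gnb(X, y)
print(predict_gnb(model, [0.88, 0.82, 0.86, 0.80]))  # 1
```

Because naive Bayes treats the base-model scores as conditionally independent, it is cheap to fit and adds only a handful of parameters on top of the CNNs.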
Affiliation(s)
- Di Liu
- Information Science and Technology College, Dalian Maritime University, Dalian, China
- Zhengkui Lin
- Information Science and Technology College, Dalian Maritime University, Dalian, China
- Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian, China
7. Liu Y, Wang S, Li X, Liu Y, Zhu X. NeuroPpred-SVM: A New Model for Predicting Neuropeptides Based on Embeddings of BERT. J Proteome Res 2023; 22:718-728. [PMID: 36749151 DOI: 10.1021/acs.jproteome.2c00363] [Indexed: 02/08/2023]
Abstract
Neuropeptides play pivotal roles in different physiological processes and are related to various kinds of diseases. Identifying neuropeptides is of great benefit for studying the mechanisms of these physiological processes and for the treatment of neurological disorders. Several state-of-the-art neuropeptide predictors have been developed using a two-layer stacking ensemble algorithm. Although the two-layer stacking ensemble algorithm can improve feature representability, these models are complex and less efficient than models based on a single classifier. In this study, we propose a new model, NeuroPpred-SVM, which predicts neuropeptides from embeddings of Bidirectional Encoder Representations from Transformers (BERT) and other sequential features using a support vector machine (SVM). Our model achieved a cross-validation area under the receiver operating characteristic curve (AUROC) of 0.969 on the training data set and an AUROC of 0.966 on the independent test set. Compared with four other state-of-the-art models (NeuroPIpred, PredNeuroP, NeuroPpred-Fuse, and NeuroPpred-FRL) on the independent test set, our model achieved the highest AUROC, Matthews correlation coefficient, accuracy, and specificity, indicating that it outperforms the existing models. We believe that NeuroPpred-SVM can be a useful tool for identifying neuropeptides with high accuracy and low cost. The data sets and Python code are available at https://github.com/liuyf-a/NeuroPpred-SVM.
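The comparison above rests on AUROC and the Matthews correlation coefficient (MCC). As a reference for how these two metrics are defined, the sketch below computes AUROC as the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties count one half), and MCC from a confusion matrix. The labels and scores are hypothetical, and the BERT embedding and SVM steps are not shown.

```python
import math

def auroc(labels, scores):
    """AUROC as P(score_pos > score_neg), ties counted as 1/2.
    O(P * N) pairwise comparison; fine for small sets."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def mcc(labels, preds):
    """Matthews correlation coefficient from binary labels and predictions."""
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    tn = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 0)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    fn = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]               # hypothetical SVM decision scores
preds = [1 if s >= 0.5 else 0 for s in scores]
print(auroc(labels, scores))  # 0.75
print(mcc(labels, preds))     # 0.0
```

The example shows why both are reported. The ranking is mostly right (AUROC 0.75), yet at a 0.5 threshold the predictions are no better than chance (MCC 0).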
Affiliation(s)
- Yufeng Liu
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
- Shuyu Wang
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
- Xiang Li
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
- Yinbo Liu
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
- Xiaolei Zhu
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
8. Phetsanthad A, Vu NQ, Yu Q, Buchberger AR, Chen Z, Keller C, Li L. Recent advances in mass spectrometry analysis of neuropeptides. Mass Spectrometry Reviews 2023; 42:706-750. [PMID: 34558119 PMCID: PMC9067165 DOI: 10.1002/mas.21734] [Received: 03/09/2021] [Revised: 08/22/2021] [Accepted: 08/28/2021] [Indexed: 05/08/2023]
Abstract
Due to their involvement in numerous biochemical pathways, neuropeptides have been the focus of many recent research studies. Unfortunately, classic analytical methods, such as western blots and enzyme-linked immunosorbent assays, are extremely limited in terms of global investigations, leading researchers to search for more advanced techniques capable of probing the entire neuropeptidome of an organism. With recent technological advances, mass spectrometry (MS) has provided methodology to gain global knowledge of a neuropeptidome on a spatial, temporal, and quantitative level. This review will cover key considerations for the analysis of neuropeptides by MS, including sample preparation strategies, instrumental advances for identification, structural characterization, and imaging; insightful functional studies; and newly developed absolute and relative quantitation strategies. While many discoveries have been made with MS, the methodology is still in its infancy. Many of the current challenges and areas that need development will also be highlighted in this review.
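MS-based identification ultimately rests on matching measured mass-to-charge ratios against values computed from candidate sequences. The sketch below computes the monoisotopic m/z of a peptide ion from standard residue masses. It assumes no modifications other than an optional C-terminal amidation, a common neuropeptide modification with a mass shift of about -0.984 Da. Leu-enkephalin (YGGFL) serves as the example, and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565       # added once for the free N- and C-termini
PROTON = 1.007276       # mass of a proton
AMIDATION = -0.984016   # C-terminal -OH replaced by -NH2

def peptide_mass(seq, amidated=False):
    """Neutral monoisotopic mass of a peptide, optionally C-terminally amidated."""
    mass = sum(RESIDUE_MASS[aa] for aa in seq) + WATER
    return mass + AMIDATION if amidated else mass

def mz(seq, charge=1, amidated=False):
    """m/z of the [M + zH]z+ ion."""
    return (peptide_mass(seq, amidated) + charge * PROTON) / charge

print(round(mz("YGGFL"), 4))  # 556.2766, the [M+H]+ of Leu-enkephalin
```

Higher charge states simply divide the protonated mass by z, which is why doubly charged neuropeptide ions appear at roughly half the singly charged m/z.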
Affiliation(s)
- Ashley Phetsanthad
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
- Nhu Q. Vu
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
- Qing Yu
- School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, WI 53705, USA
- Amanda R. Buchberger
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
- Zhengwei Chen
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
- Caitlin Keller
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
- Lingjun Li
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
- School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, WI 53705, USA
9. Vogt E, Sonderegger L, Chen YY, Segessemann T, Künzler M. Structural and Functional Analysis of Peptides Derived from KEX2-Processed Repeat Proteins in Agaricomycetes Using Reverse Genetics and Peptidomics. Microbiol Spectr 2022; 10:e0202122. [PMID: 36314921 PMCID: PMC9769878 DOI: 10.1128/spectrum.02021-22] [Received: 06/08/2022] [Accepted: 10/06/2022] [Indexed: 12/24/2022]
Abstract
Bioactivities of fungal peptides are of interest for basic research and therapeutic drug development. Some of these peptides are derived from "KEX2-processed repeat proteins" (KEPs), a recently defined class of precursor proteins that contain multiple peptide cores flanked by KEX2 protease cleavage sites. Genome mining has revealed that KEPs are widespread in the fungal kingdom. Their functions are largely unknown. Here, we present the first in-depth structural and functional analysis of KEPs in a basidiomycete. We bioinformatically identified KEP-encoding genes in the genome of the model agaricomycete Coprinopsis cinerea and established a detection protocol for the derived peptides by overexpressing the C. cinerea KEPs in the yeast Pichia pastoris. Using this protocol, which includes peptide extraction and mass spectrometry with data analysis using the search engine Mascot, we confirmed the presence of several KEP-derived peptides in C. cinerea, as well as in the edible mushrooms Lentinula edodes, Pleurotus ostreatus, and Pleurotus eryngii. While CRISPR-mediated knockout of C. cinerea kep genes did not result in any detectable phenotype, knockout of kex genes caused defects in mycelial growth and fruiting body formation. These results suggest that KEP-derived peptides may play a role in the interaction of C. cinerea with the biotic environment and that the KEP-processing KEX proteases target a variety of substrates in agaricomycetes, including some important for mycelial growth and differentiation. IMPORTANCE Two recent bioinformatics studies have demonstrated that KEX2-processed repeat proteins are widespread in the fungal kingdom. However, despite the prevalence of KEPs in fungal genomes, only a few KEP-derived peptides have been detected and studied so far. Here, we present a protocol for the extraction and structural characterization of KEP-derived peptides from fungal culture supernatants and tissues.
The protocol was successfully used to detect several linear and minimally modified KEP-derived peptides in the agaricomycetes C. cinerea, L. edodes, P. ostreatus, and P. eryngii. Our study establishes a new protocol for the targeted search of KEP-derived peptides in fungi, which will hopefully lead to the discovery of more of these interesting fungal peptides and allow a further characterization of KEPs.
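KEX2 is a subtilisin-like protease that cleaves on the C-terminal side of dibasic motifs such as Lys-Arg (KR) and Arg-Arg (RR). The sketch below splits a hypothetical KEP-like precursor at these sites and strips the dibasic pair to list candidate peptide cores. It is illustrative only and does not remove the signal peptide, so the first fragment is typically a leader rather than a core. It also ignores further trimming by other proteases and any modifications the peptidomics protocol would detect.

```python
import re

def kex2_fragments(precursor, min_len=3):
    """Split a precursor after every KR or RR dibasic site and drop the
    dibasic pair itself, returning the remaining fragments in order."""
    # Zero-width split position: preceded by K-R or R-R.
    pieces = re.split(r"(?<=[KR]R)", precursor)
    fragments = []
    for piece in pieces:
        if piece.endswith(("KR", "RR")):
            piece = piece[:-2]
        if len(piece) >= min_len:
            fragments.append(piece)
    return fragments

# Hypothetical precursor: a leader followed by repeated cores, each ending in KR.
print(kex2_fragments("MSFASLKRWPAFQKRWPAFQKRTAIL"))
# ['MSFASL', 'WPAFQ', 'WPAFQ', 'TAIL']
```

The repeated `WPAFQ` cores illustrate the defining KEP feature: a single precursor yields several copies of similar peptides.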
Affiliation(s)
- Eva Vogt
- ETH Zürich, Department of Biology, Institute of Microbiology, Zürich, Switzerland
- Lukas Sonderegger
- ETH Zürich, Department of Biology, Institute of Microbiology, Zürich, Switzerland
- Ying-Yu Chen
- ETH Zürich, Department of Biology, Institute of Microbiology, Zürich, Switzerland
- Tina Segessemann
- ETH Zürich, Department of Biology, Institute of Microbiology, Zürich, Switzerland
- Markus Künzler
- ETH Zürich, Department of Biology, Institute of Microbiology, Zürich, Switzerland
10. Anapindi KDB, Romanova EV, Checco JW, Sweedler JV. Mass Spectrometry Approaches Empowering Neuropeptide Discovery and Therapeutics. Pharmacol Rev 2022; 74:662-679. [PMID: 35710134 PMCID: PMC9553102 DOI: 10.1124/pharmrev.121.000423] [Indexed: 11/22/2022]
Abstract
The discovery of insulin in the early 1900s ushered in the era of research related to peptides acting as hormones and neuromodulators, among other regulatory roles. These essential gene products are found in all organisms, from the most primitive to the most evolved, and carry important biologic information that coordinates complex physiology and behavior; their misregulation has been implicated in a variety of diseases. The evolutionary origins of at least 30 neuropeptide signaling systems have been traced to the common ancestor of protostomes and deuterostomes. With the use of relevant animal models and modern technologies, we can gain mechanistic insight into orthologous and paralogous endogenous peptides and translate that knowledge into medically relevant insights and new treatments. Groundbreaking advances in medicine and basic science influence how signaling peptides are defined today. The precise mechanistic pathways for over 100 endogenous peptides in mammals are now known and have laid the foundation for multiple drug development pipelines. Peptide biologics have become valuable drugs due to their unique specificity and biologic activity, lack of toxic metabolites, and minimal undesirable interactions. This review outlines modern technologies that enable neuropeptide discovery and characterization, and highlights lessons from nature made possible by neuropeptide research in relevant animal models that is being adopted by the pharmaceutical industry. We conclude with a brief overview of approaches/strategies for effective development of peptides as drugs. SIGNIFICANCE STATEMENT: Neuropeptides, an important class of cell-cell signaling molecules, are involved in maintaining a range of physiological functions. Since the discovery of insulin's activity, over 100 bioactive peptides and peptide analogs have been used as therapeutics. 
Because these are complex molecules not easily predicted from a genome and their activity can change with subtle chemical modifications, mass spectrometry (MS) has significantly empowered peptide discovery and characterization. This review highlights contributions of MS-based research towards the development of therapeutic peptides.
Affiliation(s)
- Krishna D B Anapindi
- Department of Chemistry and the Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois (K.D.B.A., E.V.R., J.V.S.) and Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska (J.W.C.)
- Elena V Romanova
- Department of Chemistry and the Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois (K.D.B.A., E.V.R., J.V.S.) and Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska (J.W.C.)
- James W Checco
- Department of Chemistry and the Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois (K.D.B.A., E.V.R., J.V.S.) and Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska (J.W.C.)
- Jonathan V Sweedler
- Department of Chemistry and the Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois (K.D.B.A., E.V.R., J.V.S.) and Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska (J.W.C.)
11. Zhou Y, Xie S, Yang Y, Jiang L, Liu S, Li W, Abagna HB, Ning L, Huang J. SSH2.0: A Better Tool for Predicting the Hydrophobic Interaction Risk of Monoclonal Antibody. Front Genet 2022; 13:842127. [PMID: 35368659 PMCID: PMC8965096 DOI: 10.3389/fgene.2022.842127] [Received: 12/23/2021] [Accepted: 01/31/2022] [Indexed: 01/11/2023]
Abstract
Therapeutic antibodies play a crucial role in the treatment of various diseases. However, the success rate of antibody drug development is low, partly because of unfavourable biophysical properties of antibody drug candidates such as a high aggregation tendency, which is mainly driven by hydrophobic interactions between antibody molecules. Therefore, early screening of the hydrophobic interaction risk of antibody drug candidates is crucial. Experimental screening is laborious, time-consuming, and costly, warranting the development of efficient, high-throughput computational tools for predicting hydrophobic interactions of therapeutic antibodies. In the present study, 131 antibodies with experimental hydrophobic interaction data were used to train a new support vector machine-based ensemble model, termed SSH2.0, to predict the hydrophobic interactions of antibodies. Feature selection was performed on CKSAAGP features using the graph-based algorithm MRMD2.0. Based on the antibody sequence alone, SSH2.0 achieved a sensitivity of 100.00% and an accuracy of 83.97%. This approach eliminates the need for the three-dimensional structure of antibodies and enables rapid screening of therapeutic antibody candidates in the early developmental stage, thereby saving time and cost. In addition, a web server was constructed that is freely available at http://i.uestc.edu.cn/SSH2/.
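SSH2.0 applies MRMD2.0 feature selection to CKSAAGP (composition of k-spaced amino acid group pairs) features. The sketch below computes CKSAAGP with the five physicochemical groups used in common toolkits such as iFeature. The gap range `kmax` is a hypothetical choice because the abstract does not state the value used. The MRMD2.0 step is not shown.

```python
from itertools import product

# Five physicochemical residue groups commonly used for CKSAAGP.
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}
RES2GROUP = {aa: g for g, members in GROUPS.items() for aa in members}
GROUP_PAIRS = [f"{a}.{b}" for a, b in product(GROUPS, repeat=2)]  # 25 pairs

def cksaagp(seq, kmax=2):
    """For each gap k in 0..kmax, the relative frequency of each of the 25
    group pairs (residue i, residue i + k + 1); returns 25 * (kmax + 1) values."""
    features = []
    for k in range(kmax + 1):
        counts = dict.fromkeys(GROUP_PAIRS, 0)
        total = 0
        for i in range(len(seq) - k - 1):
            a, b = seq[i], seq[i + k + 1]
            if a in RES2GROUP and b in RES2GROUP:  # skip non-standard residues
                counts[f"{RES2GROUP[a]}.{RES2GROUP[b]}"] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in GROUP_PAIRS)
    return features

print(len(cksaagp("ACDEFGHIK", kmax=2)))  # 75
```

Grouping residues shrinks the 400 possible dipeptides per gap to 25 group pairs, which keeps the vector small enough for a training set of only 131 antibodies.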
Affiliation(s)
- Yuwei Zhou
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Shiyang Xie
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Yue Yang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Lixu Jiang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Siqi Liu
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Wei Li
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Hamza Bukari Abagna
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Lin Ning
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
- Jian Huang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
12
Jiang M, Zhao B, Luo S, Wang Q, Chu Y, Chen T, Mao X, Liu Y, Wang Y, Jiang X, Wei DQ, Xiong Y. NeuroPpred-Fuse: an interpretable stacking model for prediction of neuropeptides by fusing sequence information and feature selection methods. Brief Bioinform 2021; 22:6350884. [PMID: 34396388 DOI: 10.1093/bib/bbab310] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 04/16/2021] [Revised: 07/01/2021] [Accepted: 07/18/2021] [Indexed: 12/13/2022]
Abstract
Neuropeptides, which act as signaling molecules in the nervous systems of various animals, play crucial roles in a wide range of physiological functions and in hormone regulation. Neuropeptides offer many opportunities for the discovery of new drugs and targets for the treatment of neurological diseases. In recent years, several data-driven computational predictors have been developed for various types of bioactive peptides, but little work has addressed neuropeptides. In this work, we developed an interpretable stacking model, named NeuroPpred-Fuse, for the prediction of neuropeptides by fusing a variety of sequence-derived features and feature selection methods. Specifically, we used six types of sequence-derived features to encode the peptide sequences and then combined them. In the first layer, we ensembled three base classifiers and four feature selection algorithms, which complementarily select non-redundant important features. In the second layer, the outputs of the first layer were merged and fed into a logistic regression (LR) classifier to train the model. Moreover, we analyzed the selected features and explained the rationale for selecting them. Experimental results show that our model achieved 90.6% accuracy and 95.8% AUC on the independent test set, outperforming the state-of-the-art models. In addition, we displayed the distribution of the features selected by the tree models and compared the results on the training set with those on the test set. These results indicate that our model generalizes reasonably well. Therefore, we expect that our model will provide important advances in the discovery of neuropeptides as new drugs for the treatment of neurological diseases.
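The two-layer stacking design can be illustrated with a small, dependency-free sketch. The rows of `meta_X` stand in for first-layer probabilities from three hypothetical base classifiers. A logistic regression fitted by batch gradient descent plays the second-layer role. The toy numbers, learning rate and epoch count are assumptions for illustration only. NeuroPpred-Fuse's own base learners and LR settings are not reproduced here.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Layer 1 output (toy): each row holds the probabilities that three
# hypothetical base classifiers assign to one peptide being a neuropeptide.
meta_X = [
    [0.9, 0.8, 0.7],
    [0.8, 0.9, 0.6],
    [0.7, 0.6, 0.9],
    [0.2, 0.3, 0.1],
    [0.1, 0.2, 0.3],
    [0.3, 0.1, 0.2],
]
meta_y = [1, 1, 1, 0, 0, 0]

# Layer 2: logistic regression over the stacked probabilities.
w, b = fit_logistic_regression(meta_X, meta_y)
print(round(predict_proba(w, b, [0.85, 0.75, 0.8]), 3))
```

In practice, the first-layer probabilities used to train the second layer are usually generated out-of-fold (for example, by cross-validation). This keeps the meta-learner from seeing predictions that the base models made on their own training data.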
Affiliation(s)
- Mingming Jiang
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Bowen Zhao
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Shenggan Luo
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Qiankun Wang
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yanyi Chu
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Tianhang Chen
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Xueying Mao
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yatong Liu
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yanjing Wang
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Xue Jiang
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Dong-Qing Wei
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
13
He B, Yang S, Long J, Chen X, Zhang Q, Gao H, Chen H, Huang J. TUPDB: Target-Unrelated Peptide Data Bank. Interdiscip Sci 2021; 13:426-432. [PMID: 33993461 DOI: 10.1007/s12539-021-00436-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/21/2020] [Revised: 04/29/2021] [Accepted: 05/06/2021] [Indexed: 11/29/2022]
Abstract
The isolation of target-unrelated peptides (TUPs) through biopanning remains a major problem in phage display selection experiments. These TUPs have no actual affinity for the targets of interest, yet they tend to be mistakenly identified as target-binding peptides. Therefore, an information portal for storing TUP data is urgently needed. Here, we present a TUP data bank (TUPDB), a comprehensive, manually curated database of approximately 73 experimentally verified TUPs and 1963 potential TUPs collected from TUPScan, the BDB database, and published research articles. The TUPScan tool has been integrated into TUPDB to facilitate TUP analysis. We believe that TUPDB can help the biopanning community identify and remove TUPs in future reports. The database is of great importance for improving the quality of phage display-based epitope mapping and for promoting the development of vaccines, diagnostics, and therapeutics. The TUPDB database is available at http://i.uestc.edu.cn/tupdb.
Affiliation(s)
- Bifang He
  - School of Medicine, Guizhou University, Guiyang, 550025, China.,Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 611731, China.
- Shanshan Yang
  - School of Medicine, Guizhou University, Guiyang, 550025, China
- Jinjin Long
  - School of Medicine, Guizhou University, Guiyang, 550025, China
- Xue Chen
  - School of Medicine, Guizhou University, Guiyang, 550025, China
- Qianyue Zhang
  - School of Medicine, Guizhou University, Guiyang, 550025, China
- Hui Gao
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
- Heng Chen
  - School of Medicine, Guizhou University, Guiyang, 550025, China.
- Jian Huang
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 611731, China.
14
Hasan MM, Alam MA, Shoombuatong W, Deng HW, Manavalan B, Kurata H. NeuroPred-FRL: an interpretable prediction model for identifying neuropeptide using feature representation learning. Brief Bioinform 2021; 22:6272801. [PMID: 33975333 DOI: 10.1093/bib/bbab167] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 02/07/2021] [Revised: 03/23/2021] [Accepted: 04/09/2021] [Indexed: 12/13/2022]
Abstract
Neuropeptides (NPs) are the most versatile neurotransmitters in the immune systems that regulate various central anxious hormones. An efficient and effective bioinformatics tool for rapid and accurate large-scale identification of NPs is critical in immunoinformatics, which is indispensable for basic research and drug development. Although a few NP prediction tools have been developed, their prediction performance still needs to be improved. In this study, we developed a machine learning-based meta-predictor called NeuroPred-FRL by employing a feature representation learning approach. First, we generated 66 optimal baseline models by employing 11 different encodings, six different classifiers, and a two-step feature selection approach. The NP probability scores predicted by the 66 baseline models were combined into an input feature vector. Second, to enhance the feature representation ability, we applied the two-step feature selection approach to optimize the 66-D probability feature vector and then fed the optimized vector into a random forest classifier to construct the final meta-model (NeuroPred-FRL). Benchmarking experiments based on both cross-validation and independent tests indicate that NeuroPred-FRL achieves superior NP prediction performance compared with other state-of-the-art predictors. We believe that NeuroPred-FRL can serve as a powerful tool for the large-scale identification of NPs, facilitating the characterization of their functional mechanisms and expediting their applications in clinical therapy. Moreover, we interpreted some of the mechanisms of NeuroPred-FRL by leveraging the robust SHapley Additive exPlanations (SHAP) algorithm.
Affiliation(s)
- Md Mehedi Hasan
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.,Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
- Md Ashad Alam
  - Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, 70112 USA
- Watshara Shoombuatong
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Hong-Wen Deng
  - Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, 70112 USA
- Hiroyuki Kurata
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
15
Li Y, Zhang Z, Teng Z, Liu X. PredAmyl-MLP: Prediction of Amyloid Proteins Using Multilayer Perceptron. Comput Math Methods Med 2020; 2020:8845133. [PMID: 33294004 PMCID: PMC7700051 DOI: 10.1155/2020/8845133] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/14/2020] [Revised: 10/06/2020] [Accepted: 10/31/2020] [Indexed: 01/20/2023]
Abstract
Amyloid is generally an insoluble fibrillar protein aggregate; its abnormal deposition is the pathogenic mechanism of various diseases, such as Alzheimer's disease and type II diabetes. Therefore, accurately identifying amyloid is necessary to understand its role in pathology. We proposed a machine learning-based prediction model called PredAmyl-MLP, which consists of three steps: feature extraction, feature selection, and classification. In the feature extraction step, seven feature extraction algorithms and various combinations of them were investigated, and the combination of SVMProt-188D and tripeptide composition (TPC) was selected according to the experimental results. In the feature selection step, maximum relevance maximum distance (MRMD) and binomial distribution (BD) were each used to remove redundant or noisy features, and the appropriate features were selected according to the experimental results. In the classification step, we employed a multilayer perceptron (MLP) to train the prediction model. The 10-fold cross-validation results show that PredAmyl-MLP reached an overall accuracy of 91.59%, outperforming existing methods.
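Tripeptide composition is one of the two encodings selected for PredAmyl-MLP. It can be sketched as the frequency of each of the 20^3 = 8000 possible overlapping tripeptides. This sketch makes two assumptions: it normalizes by the number of windows (L - 2), and it silently skips windows that contain non-standard residues. The SVMProt-188D encoding, MRMD/BD selection and the MLP are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]  # 8000 entries
INDEX = {tp: i for i, tp in enumerate(TRIPEPTIDES)}


def tripeptide_composition(seq: str) -> list[float]:
    """Frequency of each of the 8000 overlapping tripeptides in seq."""
    vec = [0.0] * len(TRIPEPTIDES)
    windows = len(seq) - 2
    if windows <= 0:
        return vec  # too short to contain a tripeptide
    for i in range(windows):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:  # skip windows with non-standard residues
            vec[idx] += 1
    return [c / windows for c in vec]
```

The 8000-dimensional output is very sparse for short peptides. This is one reason a dimensionality-reducing selection step such as MRMD is typically applied afterwards.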
Affiliation(s)
- Yanjuan Li
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Zitong Zhang
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Zhixia Teng
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Xiaoyan Liu
  - College of Computer Science and Technology, Harbin Institute of Technology, Harbin 150040, China
16
Tang Q, Kang J, Yuan J, Tang H, Li X, Lin H, Huang J, Chen W. DNA4mC-LIP: a linear integration method to identify N4-methylcytosine site in multiple species. Bioinformatics 2020; 36:3327-3335. [PMID: 32108866 DOI: 10.1093/bioinformatics/btaa143] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 01/03/2020] [Revised: 02/12/2020] [Accepted: 02/25/2020] [Indexed: 12/17/2022]
Abstract
MOTIVATION DNA N4-methylcytosine (4mC) is a crucial epigenetic modification. However, knowledge of its biological functions is limited. Effective and accurate identification of 4mC sites will help reveal its biological functions and mechanisms. Since experimental methods are costly and inefficient, a number of machine learning-based approaches have been proposed to detect 4mC sites. Although these methods yield acceptable accuracy, there is still room to improve the prediction performance and stability of existing methods in practical applications. RESULTS In this work, we first systematically assessed the existing methods on an independent dataset. We then proposed DNA4mC-LIP, a linear integration method that combines existing predictors to identify 4mC sites in multiple species. The results on the independent dataset demonstrated that DNA4mC-LIP outperformed existing methods for identifying 4mC sites. To facilitate use by the scientific community, a web server for DNA4mC-LIP was developed. We anticipate that DNA4mC-LIP can serve as a powerful computational technique for identifying 4mC sites and can facilitate the interpretation of 4mC mechanisms. AVAILABILITY AND IMPLEMENTATION http://i.uestc.edu.cn/DNA4mC-LIP/. CONTACT hlin@uestc.edu.cn or hj@uestc.edu.cn or chenweiimu@gmail.com. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
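The abstract states only that DNA4mC-LIP combines existing predictors linearly. The sketch below shows one generic way to realize such a combination. Each candidate site gets a score from several predictors, and the scores are combined with non-negative weights that sum to 1. The weights are then chosen by grid search on labeled data. The weight-search procedure, the 0.5 threshold and the toy scores are all assumptions, not the published method.

```python
from itertools import product


def integrate(scores: list[float], weights: list[float]) -> float:
    """Weighted linear combination of per-predictor scores (weights sum to 1)."""
    return sum(w * s for w, s in zip(weights, scores))


def accuracy(weights, score_rows, labels, threshold=0.5):
    correct = sum(
        (integrate(row, weights) >= threshold) == bool(label)
        for row, label in zip(score_rows, labels)
    )
    return correct / len(labels)


def search_weights(score_rows, labels, step=0.1):
    """Grid-search non-negative weights on the simplex that maximize accuracy."""
    n = len(score_rows[0])
    ticks = [round(i * step, 10) for i in range(int(round(1 / step)) + 1)]
    best_w, best_acc = None, -1.0
    for combo in product(ticks, repeat=n - 1):
        last = 1.0 - sum(combo)
        if last < -1e-9:
            continue  # weights would exceed 1
        w = list(combo) + [max(last, 0.0)]
        acc = accuracy(w, score_rows, labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Toy scores from three hypothetical predictors for six candidate sites.
rows = [
    [0.6, 0.9, 0.4],
    [0.7, 0.8, 0.3],
    [0.4, 0.7, 0.6],
    [0.6, 0.2, 0.5],
    [0.3, 0.1, 0.7],
    [0.55, 0.3, 0.4],
]
labels = [1, 1, 1, 0, 0, 0]
best_w, best_acc = search_weights(rows, labels)
```

A grid search over the simplex is only practical for a handful of predictors. With more predictors, a fitted linear model such as logistic regression is the more usual way to learn the weights.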
Affiliation(s)
- Qiang Tang
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Juanjuan Kang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Jiaqing Yuan
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Hua Tang
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Xianhai Li
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Hao Lin
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Jian Huang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Chen
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China.,Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063000, China
17
Li Q, Xu L, Li Q, Zhang L. Identification and Classification of Enhancers Using Dimension Reduction Technique and Recurrent Neural Network. Comput Math Methods Med 2020; 2020:8852258. [PMID: 33133227 PMCID: PMC7591959 DOI: 10.1155/2020/8852258] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/25/2020] [Revised: 09/16/2020] [Accepted: 09/30/2020] [Indexed: 12/21/2022]
Abstract
Enhancers are noncoding fragments of DNA that play an important role in regulating gene transcription. However, because enhancers are widely scattered and positionally variable, their identification and classification are more complex than those of coding genes. Many computational studies have been carried out in this field, but existing prediction models still have deficiencies. In this paper, we combine various feature extraction strategies, dimension reduction, a machine learning model, and a recurrent neural network model to achieve accurate enhancer identification and classification, with accuracies of 76.7% and 84.9%, respectively. The proposed model outperforms previous methods in terms of performance metrics or feature dimensionality, and it may inspire future computational approaches to enhancer prediction.
Affiliation(s)
- Qingwen Li
  - College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Qingyuan Li
  - Forestry and Fruit Tree Research Institute, Wuhan Academy of Agricultural Sciences, Wuhan, China
- Lichao Zhang
  - School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen, China
18
Chen T, Wang X, Chu Y, Wang Y, Jiang M, Wei DQ, Xiong Y. T4SE-XGB: Interpretable Sequence-Based Prediction of Type IV Secreted Effectors Using eXtreme Gradient Boosting Algorithm. Front Microbiol 2020; 11:580382. [PMID: 33072049 PMCID: PMC7541839 DOI: 10.3389/fmicb.2020.580382] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 07/06/2020] [Accepted: 08/21/2020] [Indexed: 12/19/2022]
Abstract
Type IV secreted effectors (T4SEs) can be translocated into the cytosol of host cells via the type IV secretion system (T4SS) and cause diseases. However, experimental approaches to identifying T4SEs are time- and resource-consuming, and existing computational tools based on machine learning techniques have obvious limitations, such as the lack of interpretability of the prediction models. In this study, we proposed a new model, T4SE-XGB, which uses the eXtreme Gradient Boosting (XGBoost) algorithm to accurately identify type IV effectors from optimal protein sequence-derived features. After trying 20 different types of features, we found that feeding all features into XGBoost gave the best 5-fold cross-validation performance, compared with other machine learning methods. Then, the ReliefF algorithm was adopted to obtain the optimal feature set on our dataset, which further improved model performance. T4SE-XGB exhibited the highest predictive performance on the independent test set and outperformed other published prediction tools. Furthermore, the SHAP method was used to interpret the contributions of features to model predictions. The identification of key features can contribute to an improved understanding of the multifactorial contributors to host-pathogen interactions and bacterial pathogenesis. Beyond type IV effector prediction, we believe that the proposed framework can provide instructive guidance for similar studies constructing prediction methods for related biological problems. The data and source code of this study are freely accessible at https://github.com/CT001002/T4SE-XGB.
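T4SE-XGB uses ReliefF for feature selection. The sketch below implements the simpler original binary Relief, which ReliefF extends with k nearest neighbors, multi-class support and noise handling. For every instance, Relief finds the nearest same-class neighbor (hit) and the nearest other-class neighbor (miss). It then rewards features that differ on the miss and penalizes features that differ on the hit. Visiting every instance instead of sampling randomly, and the toy data, are choices made for this illustration only.

```python
def relief(X: list[list[float]], y: list[int]) -> list[float]:
    """Basic binary Relief weights, visiting every instance once."""
    n, d = len(X), len(X[0])
    # Per-feature range used to scale differences into [0, 1].
    ranges = [
        (max(row[j] for row in X) - min(row[j] for row in X)) or 1.0
        for j in range(d)
    ]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / ranges[j]

    def distance(a, b):
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        others = [k for k in range(n) if k != i]
        hits = [k for k in others if y[k] == y[i]]
        misses = [k for k in others if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda k: distance(X[i], X[k]))
        miss = min(misses, key=lambda k: distance(X[i], X[k]))
        for j in range(d):
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n
    return weights


# Toy data: feature 0 separates the classes, feature 1 is noise.
X_toy = [[0.9, 0.3], [0.8, 0.7], [0.85, 0.5], [0.1, 0.4], [0.2, 0.6], [0.15, 0.35]]
y_toy = [1, 1, 1, 0, 0, 0]
weights = relief(X_toy, y_toy)
```

Features are then ranked by weight and the top-scoring subset is kept. The cutoff would normally be tuned by cross-validating the downstream classifier.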
Affiliation(s)
- Tianhang Chen
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Xiangeng Wang
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.,Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
- Yanyi Chu
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.,Peng Cheng Laboratory, Shenzhen, China
- Yanjing Wang
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Mingming Jiang
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Dong-Qing Wei
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.,Peng Cheng Laboratory, Shenzhen, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
19
Wang Y, Kang J, Li N, Zhou Y, Tang Z, He B, Huang J. NeuroCS: A Tool to Predict Cleavage Sites of Neuropeptide Precursors. Protein Pept Lett 2020; 27:337-345. [PMID: 31721688 DOI: 10.2174/0929866526666191112150636] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/01/2019] [Revised: 07/16/2019] [Accepted: 09/24/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND Neuropeptides are a class of bioactive peptides produced from neuropeptide precursors through a series of highly complex processes, and they mediate many aspects of neuronal regulation. Accurate identification of the cleavage sites of neuropeptide precursors is of great significance for neuroscience and brain science. OBJECTIVE With the explosive growth of neuropeptide precursor data, there is a pressing need for bioinformatics methods that predict the cleavage sites of neuropeptide precursors quickly and efficiently. METHODS We processed neuropeptide precursor data from SwissProt and NeuroPedia into two datasets: a training dataset and a testing dataset. Subsequently, six feature extraction schemes were applied to generate different feature sets, and feature selection methods were used to find the optimal subset of each. Thereafter, support vector machines were used to build models for the different feature types. Finally, the performance of the models was evaluated on the independent testing dataset. RESULTS Six models were built with support vector machines. Among them, the enhanced amino acid composition-based model reached the highest accuracy, 91.60%, in 5-fold cross-validation. When evaluated on the independent testing dataset, it also performed well, with an accuracy of 90.37% and an area under the receiver operating characteristic curve of 0.9576. CONCLUSION The developed model performed well. Moreover, for users' convenience, an online web server called NeuroCS was built, which is freely available at http://i.uestc.edu.cn/NeuroCS/dist/index.html#/. NeuroCS can be used to predict the cleavage sites of neuropeptide precursors effectively.
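The enhanced amino acid composition (EAAC) encoding performed best for NeuroCS. It can be sketched as an amino acid composition computed inside each sliding window, with the per-window compositions concatenated along the sequence. The window size of 5 follows a common default (for example, in the iFeature toolkit) and is an assumption here. So is the requirement that all input fragments share the same length, so that their feature vectors align.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def eaac(seq: str, window: int = 5) -> list[float]:
    """Amino acid composition in each sliding window, concatenated."""
    if len(seq) < window:
        raise ValueError("sequence shorter than window")
    features: list[float] = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        # 20 frequencies per window position.
        features.extend(segment.count(aa) / window for aa in AMINO_ACIDS)
    return features
```

For a fixed-length fragment of length L, the vector has (L - window + 1) * 20 entries. Unlike plain amino acid composition, it preserves coarse positional information around the candidate cleavage site.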
Affiliation(s)
- Ying Wang
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.,School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Juanjuan Kang
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.,School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Ning Li
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.,School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Yuwei Zhou
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.,School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Zhongjie Tang
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.,School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Bifang He
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.,School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China.,Medical College, Guizhou University, Guiyang, China
- Jian Huang
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.,School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
20
Tang Q, Nie F, Kang J, Chen W. ncPro-ML: An integrated computational tool for identifying non-coding RNA promoters in multiple species. Comput Struct Biotechnol J 2020; 18:2445-2452. [PMID: 33005306 PMCID: PMC7509369 DOI: 10.1016/j.csbj.2020.09.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/14/2020] [Revised: 08/30/2020] [Accepted: 09/01/2020] [Indexed: 02/07/2023]
Abstract
A computational method for identifying non-coding promoters was proposed for the first time. A high-quality dataset was built to train and test the models for identifying non-coding promoters. A user-friendly web server was developed to recognize non-coding promoters.
The promoter is located near the transcription start site and regulates transcription initiation of the gene. Accurate identification of promoters is essential for understanding the mechanism of gene regulation. Since experimental methods are costly and inefficient, developing efficient and accurate computational tools to identify promoters is necessary. Although a series of methods have been proposed for identifying promoters, none of them can identify the promoters of non-coding RNA (ncRNA). In the present work, a new method called ncPro-ML was proposed to identify ncRNA promoters in Homo sapiens and Mus musculus, in which different sequence encoding schemes were used to convert DNA sequences into feature vectors. To test the effect of sequence length, datasets containing sequences of different lengths were built for each species. The results demonstrated that ncPro-ML achieved the best performance on the dataset with a sequence length of 221 nucleotides for both human and mouse. ncPro-ML also performed satisfactorily in both the independent dataset test and the cross-species test. The results indicate that the proposed predictor can serve as a powerful tool for the discovery of ncRNA promoters. In addition, a web server for ncPro-ML was developed, which can be freely accessed at http://www.bio-bigdata.cn/ncPro-ML/.
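The abstract does not name ncPro-ML's encoding schemes. k-mer nucleotide composition is one common way to turn a fixed-length DNA window, such as the 221-nt sequences mentioned above, into a numeric vector. It is used here purely as an assumed example.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int = 3) -> dict[str, float]:
    """Normalized frequency of every DNA k-mer (4**k entries) in seq."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k].upper()
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
            total += 1
    return {km: (c / total if total else 0.0) for km, c in counts.items()}
```

With k = 3, each sequence maps to a 64-dimensional vector regardless of its length. This makes sequences of different lengths directly comparable when testing the length effect.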
Affiliation(s)
- Qiang Tang
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Fulei Nie
  - Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063210, China
  - School of Public Health, North China University of Science and Technology, Tangshan 063210, China
- Juanjuan Kang
  - Affiliated Foshan Maternity & Child Healthcare Hospital, Southern Medical University (Foshan Maternity & Child Healthcare Hospital), Foshan 528000, China
- Wei Chen
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
  - Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063210, China
  - School of Public Health, North China University of Science and Technology, Tangshan 063210, China
  - Corresponding author: Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China.
21
Bin Y, Zhang W, Tang W, Dai R, Li M, Zhu Q, Xia J. Prediction of Neuropeptides from Sequence Information Using Ensemble Classifier and Hybrid Features. J Proteome Res 2020; 19:3732-3740. [DOI: 10.1021/acs.jproteome.0c00276] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/11/2022]
Affiliation(s)
- Yannan Bin
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
- School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
| | - Wei Zhang
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
| | - Wending Tang
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
| | - Ruyu Dai
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
| | - Menglu Li
- School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
| | - Qizhi Zhu
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
| | - Junfeng Xia
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
- School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
22
SSH: A Tool for Predicting Hydrophobic Interaction of Monoclonal Antibodies Using Sequences. BIOMED RESEARCH INTERNATIONAL 2020; 2020:3508107. [PMID: 32596302 PMCID: PMC7288208 DOI: 10.1155/2020/3508107] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/29/2020] [Revised: 04/28/2020] [Accepted: 05/13/2020] [Indexed: 12/31/2022]
Abstract
Therapeutic antibodies are one of the most important parts of the pharmaceutical industry. They are widely used to treat various diseases, such as autoimmune diseases, cancer, inflammation, and infectious diseases. However, hydrophobicity problems often stall or prolong their development, which makes it more expensive. Hydrophobic interactions can cause problems with half-life, drug administration, and immunogenicity at all stages of antibody drug development. Some of the most widely accepted and used technologies for determining the hydrophobic interactions of antibodies include standup monolayer adsorption chromatography (SMAC), salt-gradient affinity-capture self-interaction nanoparticle spectroscopy (SGAC-SINS), and hydrophobic interaction chromatography (HIC). However, measuring SMAC, SGAC-SINS, and HIC for hundreds of antibody drug candidates is time-consuming and costly. To save time and money, we developed a predictor called SSH. It predicts the hydrophobic interactions of monoclonal antibodies (mAbs) from the antibody sequence alone. In leave-one-out cross-validation, SSH achieved 91.226% accuracy, 96.396% sensitivity (recall), 84.196% specificity, 87.754% precision, a Matthews correlation coefficient (MCC) of 0.828, an F-score of 0.919, and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.961.
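All of the SSH figures above derive from a binary confusion matrix. Below is a minimal Python sketch of the standard formulas. The function `binary_metrics` and the example counts are hypothetical illustrations and are not SSH's code or data. The last lines check that the reported precision and recall agree with the reported F-score.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every class and prediction margin is non-empty.
    """
    sensitivity = tp / (tp + fn)  # recall, true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    precision = tp / (tp + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        # F-score is the harmonic mean of precision and recall
        "f_score": 2 * precision * sensitivity / (precision + sensitivity),
        # Matthews correlation coefficient (0 by convention if a margin is empty)
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical counts, purely for illustration (not SSH's data).
m = binary_metrics(tp=8, fp=2, tn=7, fn=3)

# Consistency check on the figures reported for SSH: the harmonic mean of
# 87.754% precision and 96.396% recall reproduces the 0.919 F-score.
p, r = 0.87754, 0.96396
print(round(2 * p * r / (p + r), 3))  # → 0.919
```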
23
Kang J, Yu S, Lu S, Xu G, Zhu J, Yan N, Luo D, Xu K, Zhang Z, Huang J. Use of a 6-miRNA panel to distinguish lymphoma from reactive lymphoid hyperplasia. Signal Transduct Target Ther 2020; 5:2. [PMID: 32296019 PMCID: PMC6946694 DOI: 10.1038/s41392-019-0097-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/06/2019] [Revised: 11/12/2019] [Accepted: 11/15/2019] [Indexed: 12/16/2022]
Affiliation(s)
- Juanjuan Kang
- Center for Informational Biology, University of Electronic Science and Technology of China, 611731, Chengdu, China
- Sisi Yu
- Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, School of Medicine, University of Electronic Science and Technology of China, 610041, Chengdu, China
- Song Lu
- Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, School of Medicine, University of Electronic Science and Technology of China, 610041, Chengdu, China
- Guohui Xu
- Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, School of Medicine, University of Electronic Science and Technology of China, 610041, Chengdu, China
- Jiang Zhu
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, 611137, Chengdu, China
- Research Center, Chengdu Nuoen Genomics, Ltd., 610041, Chengdu, China
- Na Yan
- Research Center, Chengdu Nuoen Genomics, Ltd., 610041, Chengdu, China
- Delun Luo
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, 611137, Chengdu, China
- Research Center, Chengdu Nuoen Genomics, Ltd., 610041, Chengdu, China
- Kai Xu
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, 611137, Chengdu, China
- Research Center, Chengdu Nuoen Genomics, Ltd., 610041, Chengdu, China
- Zhihui Zhang
- Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, School of Medicine, University of Electronic Science and Technology of China, 610041, Chengdu, China
- Jian Huang
- Center for Informational Biology, University of Electronic Science and Technology of China, 611731, Chengdu, China
24
Malebary SJ, Rehman MSU, Khan YD. iCrotoK-PseAAC: Identify lysine crotonylation sites by blending position relative statistical features according to the Chou's 5-step rule. PLoS One 2019; 14:e0223993. [PMID: 31751380 PMCID: PMC6874067 DOI: 10.1371/journal.pone.0223993] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 07/08/2019] [Accepted: 10/02/2019] [Indexed: 01/22/2023]
Abstract
Among post-translational modifications (PTMs), lysine crotonylation is one of the most important, and its role in various diseases and essential biological processes should not be underestimated. Uncovering the hidden mechanisms of crotonylation and its occurrence sites requires a thorough understanding of this biological process. Previous studies have used techniques such as position weighted matrix (PWM), support vector machine (SVM), k-nearest neighbors (KNN), and many others. However, the best prediction accuracy achieved was still limited. To address this, we propose an improved predictor of lysine crotonylation sites named iCrotoK-PseAAC, which incorporates various position- and composition-relative features, together with statistical moments, into PseAAC. Self-consistency testing gave 100% accuracy, and 10-fold cross-validation gave 99.0% accuracy. Based on validation and comparison with existing models, we conclude that iCrotoK-PseAAC is more accurate than previously proposed models.
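Self-consistency testing scores a model on the same samples it was trained on, whereas k-fold cross-validation holds out each fold once. That difference explains the gap between the 100% and 99.0% results. The sketch below shows one common way to assign stratified folds. The function `stratified_kfold`, the round-robin dealing and the 30/70 toy labels are all assumptions, because the abstract does not say how iCrotoK-PseAAC's folds were drawn.

```python
import random
from collections import defaultdict


def stratified_kfold(labels: list[int], k: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) pairs; each class is spread evenly over k folds.

    Hypothetical helper for illustration, not the iCrotoK-PseAAC protocol.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's shuffled indices round-robin into the k folds.
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)
    for f in range(k):
        test = sorted(folds[f])
        held_out = set(test)
        # Training split: every sample not in the held-out fold.
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, test


labels = [1] * 30 + [0] * 70  # toy labels: 30 positives, 70 negatives
splits = list(stratified_kfold(labels, k=10))
```

Each sample appears in exactly one test fold and never in its own training split. Self-consistency testing lacks this separation.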
Affiliation(s)
- Sharaf Jameel Malebary
- Department of Information Technology, King Abdul Aziz University, Rabigh, Kingdom of Saudi Arabia
- Muhammad Safi ur Rehman
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
25
Jiang L, Yu M, Zhou Y, Tang Z, Li N, Kang J, He B, Huang J. AGONOTES: A Robot Annotator for Argonaute Proteins. Interdiscip Sci 2019; 12:109-116. [PMID: 31741225 DOI: 10.1007/s12539-019-00349-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 06/25/2019] [Revised: 10/06/2019] [Accepted: 10/30/2019] [Indexed: 12/01/2022]
Abstract
The argonaute protein (Ago) exists in almost all organisms. In eukaryotes, it functions as a regulatory system for gene expression; in prokaryotes, it acts as a defense system against invading foreign genomes. The Ago system has been engineered for gene silencing and genome editing and plays an important role in biological studies. As the genomes and proteomes of more microbes become available, computational tools for identifying and annotating argonaute proteins are urgently needed. We introduce AGONOTES (Argonaute Notes), a web service designed specifically for identifying and annotating Ago. AGONOTES uses the BLASTP similarity search algorithm to categorize all submitted proteins into three groups: prokaryotic argonaute protein (pAgo), eukaryotic argonaute protein (eAgo), and non-argonaute protein (non-Ago). Argonaute proteins can then be aligned to the corresponding standard set of Ago sequences using the multiple sequence alignment program MUSCLE. All functional domains of Ago can then be curated from the alignment results and easily visualized through the Bio::Graphic modules in the BioPerl bundle. Compared with existing tools such as CD-Search and databases such as UniProt, AGONOTES showed much better performance in domain annotation, which is fundamental to studying newly identified Ago proteins. AGONOTES can be freely accessed at http://i.uestc.edu.cn/agonotes/. AGONOTES is a user-friendly tool for annotating Ago domains from a proteome or a series of protein sequences.
Affiliation(s)
- Lixu Jiang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 637111, China
- Min Yu
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 637111, China
- Yuwei Zhou
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 637111, China
- Zhongjie Tang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 637111, China
- Ning Li
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 637111, China
- Juanjuan Kang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 637111, China
- Bifang He
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 637111, China
- School of Medicine, Guizhou University, Guiyang, China
- Jian Huang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 637111, China
26
He B, Chen H, Huang J. PhD7Faster 2.0: predicting clones propagating faster from the Ph.D.-7 phage display library by coupling PseAAC and tripeptide composition. PeerJ 2019; 7:e7131. [PMID: 31245183 PMCID: PMC6585900 DOI: 10.7717/peerj.7131] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 12/22/2018] [Accepted: 05/15/2019] [Indexed: 01/08/2023]
Abstract
Selection from phage display libraries enables the isolation of high-affinity ligands for various targets. However, this method also identifies propagation-related target-unrelated peptides (PrTUPs), false-positive hits that appear because of their amplification advantage. In this report, we present PhD7Faster 2.0, a support vector machine (SVM)-based predictor of fast-propagating clones from the Ph.D.-7 phage display library. Feature selection was performed on PseAAC and tripeptide composition features using the incremental feature selection method. In ten-fold cross-validation, PhD7Faster 2.0 achieved an accuracy of 81.84%, a Matthews correlation coefficient of 0.64, and an area under the ROC curve of 0.90. A permutation test with 1,000 shuffles gave p < 0.001. We implemented PhD7Faster 2.0 as a publicly accessible web tool (http://i.uestc.edu.cn/sarotup3/cgi-bin/PhD7Faster.pl) and built standalone graphical user interface and command-line versions for different systems. The standalone PhD7Faster 2.0 can detect PrTUPs in both small and large-scale datasets. This makes PhD7Faster 2.0 an enhanced and powerful tool for scanning and reporting faster-growing clones from the Ph.D.-7 phage display library.
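A label-permutation test asks how often a model trained on randomly shuffled labels does at least as well as the real one. The sketch below assumes the common (hits + 1)/(n + 1) estimate, because the abstract does not state PhD7Faster 2.0's exact procedure. The statistic `mean_gap` and its toy data are hypothetical stand-ins for retraining the SVM and recomputing its cross-validated score on each shuffle. Under either the plain hits/n estimate or the corrected one, p < 0.001 with 1,000 shuffles means that no shuffled labelling matched the observed score.

```python
import random
from typing import Callable, Sequence


def permutation_p_value(
    score_fn: Callable[[Sequence[float], Sequence[int]], float],
    X: Sequence[float],
    y: Sequence[int],
    n_permutations: int = 1000,
    seed: int = 0,
) -> tuple[float, float]:
    """Compare the score on the true labels with scores on shuffled labels."""
    rng = random.Random(seed)
    observed = score_fn(X, y)
    labels = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # destroys any real link between X and the labels
        if score_fn(X, labels) >= observed:
            hits += 1
    # Count the observed labelling itself as one permutation (+1 correction),
    # so the smallest attainable p is 1 / (n_permutations + 1).
    return observed, (hits + 1) / (n_permutations + 1)


def mean_gap(X, y):
    """Toy statistic standing in for a retrained classifier's CV performance."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    return sum(pos) / len(pos) - sum(neg) / len(neg)


X = [0.1, 0.2, 0.3, 0.4, 0.5, 2.1, 2.2, 2.3, 2.4, 2.5]
y = [0] * 5 + [1] * 5
observed, p = permutation_p_value(mean_gap, X, y)
```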
Affiliation(s)
- Bifang He
- School of Medicine, Guizhou University, Guiyang, Guizhou, China
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Heng Chen
- School of Medicine, Guizhou University, Guiyang, Guizhou, China
- Jian Huang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
27
Dzisoo AM, He B, Karikari R, Agoalikum E, Huang J. CISI: A Tool for Predicting Cross-interaction or Self-interaction of Monoclonal Antibodies Using Sequences. Interdiscip Sci 2019; 11:691-697. [PMID: 31119495 DOI: 10.1007/s12539-019-00330-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/01/2018] [Revised: 04/17/2019] [Accepted: 04/26/2019] [Indexed: 01/22/2023]
Abstract
Monoclonal antibodies (mAbs) are a robust class of therapeutic proteins. Their stability, specificity, and high solubility have enabled the successful development and commercialization of antibody-based drugs. Despite these characteristics, mAb projects are often suspended because of self- or cross-interaction of the antibodies. This is one of the main reasons that developing mAbs into drugs is so slow and expensive. CISI stands for cross-interaction or self-interaction of mAbs, which several assays can quantify. Assays such as poly-specificity reagent and cross-interaction chromatography measure cross-interaction. Self-interaction can be assayed by clone self-interaction by biolayer interferometry and by affinity-capture self-interaction nanoparticle spectroscopy. To save time and money, we developed a model called CISI that predicts cross-interaction or self-interaction from tripeptide composition. In leave-one-out cross-validation, it achieved 88.20% accuracy, 90.22% sensitivity, 86.05% specificity, a Matthews correlation coefficient of 0.78, and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.96. CISI is freely available at http://i.uestc.edu.cn/eli/cgi-bin/cisi.pl.
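CISI's input is a tripeptide composition (TPC) vector: the frequency of each of the 20^3 = 8,000 possible tripeptides in a sequence. The minimal sketch below makes three assumptions: it uses overlapping length-3 windows, normalizes by the number of valid windows, and skips windows that contain non-standard residues. The abstract does not specify CISI's exact normalization. The names `tripeptide_composition` and `TRI_INDEX` are illustrative only.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20**3 = 8000 possible tripeptides, in a fixed order.
TRIPEPTIDES = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]
TRI_INDEX = {t: i for i, t in enumerate(TRIPEPTIDES)}


def tripeptide_composition(seq: str) -> list[float]:
    """Frequency of each tripeptide over the overlapping length-3 windows of seq."""
    seq = seq.upper()
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for i in range(len(seq) - 2):
        idx = TRI_INDEX.get(seq[i:i + 3])
        if idx is not None:  # skip windows containing non-standard residues
            counts[idx] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * len(counts)


vec = tripeptide_composition("ACDAC")  # windows: ACD, CDA, DAC
```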
Affiliation(s)
- Anthony Mackitz Dzisoo
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 611731, China
- Bifang He
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 611731, China
- School of Medicine, Guizhou University, Guiyang, 550025, China
- Rita Karikari
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 611731, China
- Elijah Agoalikum
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 611731, China
- Jian Huang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 611731, China
28
NeuroPIpred: a tool to predict, design and scan insect neuropeptides. Sci Rep 2019; 9:5129. [PMID: 30914676 PMCID: PMC6435694 DOI: 10.1038/s41598-019-41538-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 10/12/2018] [Accepted: 03/05/2019] [Indexed: 12/15/2022]
Abstract
Insect neuropeptides and their associated receptors are potential targets for pest control. The present study describes in silico models, developed using natural and modified insect neuropeptides, for predicting and designing new neuropeptides. Amino acid composition analysis revealed a preference for residues C, D, E, F, G, N, S, and Y in insect neuropeptides. Positional residue preference analysis showed that, in natural neuropeptides, residues such as A, N, F, D, P, S, and I are preferred at the N terminus, and residues such as L, R, P, F, N, and G are preferred at the C terminus. Prediction models were developed with various machine learning techniques using input features such as amino acid composition, dipeptide composition, and binary profiles. The dipeptide composition-based SVM model performed best among all models. On NeuroPIpred_DS1, it achieved 86.50% accuracy and an MCC of 0.73 on the training dataset and 83.71% accuracy and an MCC of 0.67 on the validation dataset. On NeuroPIpred_DS2, it achieved 97.47% accuracy and an MCC of 0.95 on the training dataset and 97.93% accuracy and an MCC of 0.96 on the validation dataset. To assist researchers, we created NeuroPIpred as both a standalone tool and a user-friendly web server, available at https://webs.iiitd.edu.in/raghava/neuropipred.
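The amino acid composition (AAC) and dipeptide composition (DPC) features mentioned above map a peptide of any length to fixed-size vectors of 20 and 400 values. The sketch below expresses both as percentages. The percentage scaling, the handling of non-standard residues and the function names `aac` and `dpc` are assumptions rather than details taken from NeuroPIpred. The SVM training step is not shown.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(d) for d in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def aac(seq: str) -> list[float]:
    """Amino acid composition: percentage of each of the 20 standard residues."""
    counts = Counter(seq.upper())
    n = sum(counts[a] for a in AMINO_ACIDS)  # ignore non-standard residues
    return [100 * counts[a] / n if n else 0.0 for a in AMINO_ACIDS]


def dpc(seq: str) -> list[float]:
    """Dipeptide composition: percentage of each of the 400 adjacent residue pairs."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n = sum(counts[d] for d in DIPEPTIDES)  # only pairs of standard residues
    return [100 * counts[d] / n if n else 0.0 for d in DIPEPTIDES]


features = aac("GAGA") + dpc("GAGA")  # 20 + 400 = 420 values
```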
29
Tan JX, Dao FY, Lv H, Feng PM, Ding H. Identifying Phage Virion Proteins by Using Two-Step Feature Selection Methods. Molecules 2018; 23:molecules23082000. [PMID: 30103458 PMCID: PMC6222849 DOI: 10.3390/molecules23082000] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 07/13/2018] [Revised: 07/30/2018] [Accepted: 08/08/2018] [Indexed: 12/31/2022]
Abstract
Accurately identifying phage virion proteins is a key step toward understanding their function, and it also helps clarify the mechanism of bacterial cell lysis. Traditional experimental methods for identifying phage virion proteins are time-consuming and costly, so machine learning methods that identify them accurately and efficiently are urgently needed. In this work, we propose a support vector machine (SVM)-based method that combines multiple sets of optimal g-gap dipeptide compositions. Analysis of variance (ANOVA) and minimal-redundancy-maximal-relevance (mRMR), combined with incremental feature selection (IFS), were applied to select the optimal feature set. In five-fold cross-validation, the proposed method achieved an overall accuracy of 87.95%. We believe that the proposed method will become an efficient and powerful tool for scientists studying phage virion proteins.
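A g-gap dipeptide composition counts residue pairs separated by g intervening residues. With g = 0 it reduces to the ordinary dipeptide composition, and larger g values capture longer-range sequence order. The sketch below computes one 400-dimensional vector per gap and concatenates the vectors for several gaps. The gap values in the example and the `multi_gap_dpc` helper are hypothetical. The ANOVA, mRMR and IFS selection steps described in the abstract are not reproduced.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(d) for d in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def g_gap_dpc(seq: str, g: int) -> list[float]:
    """Frequency of residue pairs (seq[i], seq[i + g + 1]), i.e. g residues apart."""
    seq = seq.upper()
    counts = Counter(seq[i] + seq[i + g + 1] for i in range(len(seq) - g - 1))
    n = sum(counts[d] for d in DIPEPTIDES)  # ignore pairs with non-standard residues
    return [counts[d] / n if n else 0.0 for d in DIPEPTIDES]


def multi_gap_dpc(seq: str, gaps: list[int]) -> list[float]:
    """Hypothetical helper: concatenate the g-gap vectors for several gaps."""
    features: list[float] = []
    for g in gaps:
        features.extend(g_gap_dpc(seq, g))
    return features


features = multi_gap_dpc("MKVLAAGIL", gaps=[0, 1, 2])  # 3 x 400 = 1200 values
```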
Affiliation(s)
- Jiu-Xin Tan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Fu-Ying Dao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lv
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Peng-Mian Feng
- Hebei Province Key Laboratory of Occupational Health and Safety for Coal Industry, School of Public Health, North China University of Science and Technology, Tangshan 063000, China
- Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
30
Lin H, Peng S, Huang J. Special issue on Computational Resources and Methods in Biological Sciences. Int J Biol Sci 2018; 14:807-810. [PMID: 29989106 PMCID: PMC6036761 DOI: 10.7150/ijbs.27554] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/30/2018] [Accepted: 06/03/2018] [Indexed: 12/11/2022]
Abstract
This special issue covers a wide range of topics in computational biology, such as database construction, sequence analysis and function prediction with machine learning methods, disease-related diagnosis, drug-target and drug discovery, and electronic health record system construction.
Affiliation(s)
- Hao Lin
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu 610054, China
- Shaoliang Peng
- School of Computer Science, National University of Defense Technology, Changsha 410073, China
- Jian Huang
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu 610054, China