1
Li R, Yu J, Ye D, Liu S, Zhang H, Lin H, Feng J, Deng K. Conotoxins: Classification, Prediction, and Future Directions in Bioinformatics. Toxins (Basel) 2025; 17:78. [PMID: 39998095 PMCID: PMC11860864 DOI: 10.3390/toxins17020078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2024] [Revised: 01/25/2025] [Accepted: 02/07/2025] [Indexed: 02/26/2025] Open
Abstract
Conotoxins, a diverse family of disulfide-rich peptides derived from the venom of Conus species, have gained prominence in biomedical research due to their highly specific interactions with ion channels, receptors, and neurotransmitter systems. Their pharmacological properties make them valuable molecular tools and promising candidates for therapeutic development. However, traditional conotoxin classification and functional characterization remain labor-intensive, necessitating the increasing adoption of computational approaches. In particular, machine learning (ML) techniques have facilitated advancements in sequence-based classification, functional prediction, and de novo peptide design. This review explores recent progress in applying ML and deep learning (DL) to conotoxin research, comparing key databases, feature extraction techniques, and classification models. Additionally, we discuss future research directions, emphasizing the integration of multimodal data and the refinement of predictive frameworks to enhance therapeutic discovery.
Affiliation(s)
- Kejun Deng
  - The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; (R.L.); (J.Y.); (D.Y.); (S.L.); (H.Z.); (H.L.); (J.F.)
2
Li R, Hasan MM, Wang D. In Silico Conotoxin Studies: Progress and Prospects. Molecules 2024; 29:6061. [PMID: 39770149 PMCID: PMC11677113 DOI: 10.3390/molecules29246061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Revised: 12/14/2024] [Accepted: 12/18/2024] [Indexed: 01/11/2025] Open
Abstract
Cone snails of the genus Conus have evolved to produce structurally distinct and functionally diverse venom peptides for defensive and predatory purposes. This nature-devised delicacy enlightened drug discovery and for decades, the bioactive cone snail venom peptides, known as conotoxins, have been widely explored for their therapeutic potential, yet we know very little about them. With the augmentation of computational algorithms from the realms of bioinformatics and machine learning, in silico strategies have made substantial contributions to facilitate conotoxin studies although still with certain limitations. In this review, we made a bibliometric analysis of in silico conotoxin studies from 2004 to 2024 and then discussed in silico strategies to not only efficiently classify conotoxin superfamilies but also speed up drug discovery from conotoxins, reveal binding modes of known conotoxin-ion channel interactions at a microscopic level and relate the mechanisms of ion channel modulation to its underlying molecular structure. We summarized the current progress of studies in this field and gave an outlook on prospects.
Affiliation(s)
- Ruihan Li
  - Department of Chinese Medicine and Pharmacy, School of Pharmacy, Jiangsu University, Zhenjiang 212013, China
- Md. Mahadhi Hasan
  - Division of Chemistry and Structural Biology, Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia
  - Pharmacy Discipline, Life Science School, Khulna University, Khulna 9208, Bangladesh
- Dan Wang
  - Department of Chinese Medicine and Pharmacy, School of Pharmacy, Jiangsu University, Zhenjiang 212013, China
3
Gillani M, Pollastri G. Protein subcellular localization prediction tools. Comput Struct Biotechnol J 2024; 23:1796-1807. [PMID: 38707539 PMCID: PMC11066471 DOI: 10.1016/j.csbj.2024.04.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 04/11/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024] Open
Abstract
Protein subcellular localization prediction is of great significance in bioinformatics and biological research. Because most proteins do not have experimentally determined localization information, computational prediction methods and tools have been an active research area for more than two decades. Knowledge of the subcellular location of a protein provides valuable information about its functionalities, the functioning of the cell, and other possible interactions with proteins. Fast, reliable, and accurate predictors provide platforms to harness the abundance of sequence data to predict subcellular locations accordingly. During the last decade, there has been a considerable amount of research effort aimed at developing subcellular localization predictors. This paper reviews recent subcellular localization prediction tools in the Eukaryotic, Prokaryotic, and Virus-based categories followed by a detailed analysis. Each predictor is discussed based on its main features, strengths, weaknesses, algorithms used, prediction techniques, and analysis. This review is supported by prediction tool taxonomies that highlight their relevant area and examples for uncomplicated categorization and ease of understanding. These taxonomies help users find suitable tools according to their needs. Furthermore, recent research gaps and challenges are discussed to cover areas that need the utmost attention. This survey provides an in-depth analysis of the most recent prediction tools to facilitate readers and can be considered a quick guide for researchers to identify and explore the recent literature advancements.
Affiliation(s)
- Maryam Gillani
  - School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
- Gianluca Pollastri
  - School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
4
Yao L, Guan J, Xie P, Chung C, Deng J, Huang Y, Chiang Y, Lee T. AMPActiPred: A three-stage framework for predicting antibacterial peptides and activity levels with deep forest. Protein Sci 2024; 33:e5006. [PMID: 38723168 PMCID: PMC11081525 DOI: 10.1002/pro.5006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 04/10/2024] [Accepted: 04/13/2024] [Indexed: 05/13/2024]
Abstract
The emergence and spread of antibiotic-resistant bacteria pose a significant public health threat, necessitating the exploration of alternative antibacterial strategies. Antibacterial peptide (ABP) is a kind of antimicrobial peptide (AMP) that has the potential ability to fight against bacteria infection, offering a promising avenue for developing novel therapeutic interventions. This study introduces AMPActiPred, a three-stage computational framework designed to identify ABPs, characterize their activity against diverse bacterial species, and predict their activity levels. AMPActiPred employed multiple effective peptide descriptors to effectively capture the compositional features and physicochemical properties of peptides. AMPActiPred utilized deep forest architecture, a cascading architecture similar to deep neural networks, capable of effectively processing and exploring original features to enhance predictive performance. In the first stage, AMPActiPred focuses on ABP identification, achieving an Accuracy of 87.6% and an MCC of 0.742 on an elaborate dataset, demonstrating state-of-the-art performance. In the second stage, AMPActiPred achieved an average GMean at 82.8% in identifying ABPs targeting 10 bacterial species, indicating AMPActiPred can achieve balanced predictions regarding the functional activity of ABP across this set of species. In the third stage, AMPActiPred demonstrates robust predictive capabilities for ABP activity levels with an average PCC of 0.722. Furthermore, AMPActiPred exhibits excellent interpretability, elucidating crucial features associated with antibacterial activity. AMPActiPred is the first computational framework capable of predicting targets and activity levels of ABPs. Finally, to facilitate the utilization of AMPActiPred, we have established a user-friendly web interface deployed at https://awi.cuhk.edu.cn/∼AMPActiPred/.
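The GMean and PCC figures quoted in this abstract can be reproduced from predictions with a few lines of arithmetic. The sketch below is illustrative only: it assumes GMean means the geometric mean of sensitivity and specificity (the abstract does not define it) and PCC means the Pearson correlation coefficient. The function names are hypothetical and are not taken from AMPActiPred.

```python
import math


def g_mean(sensitivity: float, specificity: float) -> float:
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    return math.sqrt(sensitivity * specificity)


def pearson_cc(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances are left unnormalized; the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Under this definition GMean penalizes a classifier that does well on only one class, which is presumably why it is reported for the per-species stage.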
Affiliation(s)
- Lantian Yao
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China
- Jiahui Guan
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, China
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, China
- Peilin Xie
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, China
- Chia-Ru Chung
  - Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
- Junyang Deng
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, China
- Yixian Huang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, China
- Ying-Chih Chiang
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, China
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, China
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
  - Center for Intelligent Drug Systems and Smart Bio-devices (IDS2B), National Yang Ming Chiao Tung University, Hsinchu, Taiwan
5
Yao L, Guan J, Li W, Chung CR, Deng J, Chiang YC, Lee TY. Identifying Antitubercular Peptides via Deep Forest Architecture with Effective Feature Representation. Anal Chem 2024; 96:1538-1546. [PMID: 38226973 DOI: 10.1021/acs.analchem.3c04196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/17/2024]
Abstract
Tuberculosis (TB) is a severe disease caused by Mycobacterium tuberculosis that poses a significant threat to human health. The emergence of drug-resistant strains has made the global fight against TB even more challenging. Antituberculosis peptides (ATPs) have shown promising results as a potential treatment for TB. However, conventional wet lab-based approaches to ATP discovery are time-consuming and costly and often fail to discover peptides with desired properties. To address these challenges, we propose a novel machine learning-based framework called ATPfinder that can significantly accelerate the discovery of ATP. Our approach integrates various efficient peptide descriptors and utilizes the deep forest algorithm to construct the model. This neural network-like cascading structure can effectively process and mine features without complex hyperparameter tuning. Our experimental results show that ATPfinder outperforms existing ATP prediction tools, achieving state-of-the-art performance with an accuracy of 89.3% and an MCC of 0.70. Moreover, our framework exhibits better robustness than baseline algorithms commonly used for other sequence analysis tasks. Additionally, the excellent interpretability of our model can assist researchers in understanding the critical features of ATP. Finally, we developed a downloadable desktop application to simplify the use of our framework for researchers. Therefore, ATPfinder can facilitate the discovery of peptide drugs and provide potential solutions for TB treatment. Our framework is freely available at https://github.com/lantianyao/ATPfinder/ (data sets and code) and https://awi.cuhk.edu.cn/dbAMP/ATPfinder.html (software).
Affiliation(s)
- Lantian Yao
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Jiahui Guan
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Wenshuo Li
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Chia-Ru Chung
  - Department of Computer Science and Information Engineering, National Central University, 320317 Taoyuan, Taiwan
- Junyang Deng
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Ying-Chih Chiang
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, 300093 Hsinchu, Taiwan
  - Center for Intelligent Drug Systems and Smart Bio-devices (IDS2B), National Yang Ming Chiao Tung University, 300093 Hsinchu, Taiwan
6
Lee YCJ, Cowan A, Tankard A. Peptide Toxins as Biothreats and the Potential for AI Systems to Enhance Biosecurity. Front Bioeng Biotechnol 2022; 10:860390. [PMID: 35356782 PMCID: PMC8959115 DOI: 10.3389/fbioe.2022.860390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2022] [Accepted: 02/17/2022] [Indexed: 11/16/2022] Open
Abstract
Biological weapons have been used for thousands of years, but recent advances in synthesis technologies have made peptide and protein toxin production more accessible and pose a threat to biosecurity worldwide. Natural toxins such as conotoxins, certain hemolytic compounds, and enterotoxins are peptide agents that can be synthesized in an environment with weak biosecurity measures and rudimentarily weaponized for limited use against smaller targets for lethal or nonlethal effects. Technological advances are changing the threat landscape around biological weapons and potentially facilitating a shift from state sponsored to more micro-level threats stemming from terror cells, insider threats, and lone wolf attacks. Here, we present the reader with an overview of the threat of peptide and protein toxins, provide examples of potent peptide toxins, and introduce capabilities of a proposed biosecurity program utilizing artificial intelligence that unifies commercial nucleotide and peptide synthesis vendors.
7
Akmal MA, Hussain W, Rasool N, Khan YD, Khan SA, Chou KC. Using CHOU'S 5-Steps Rule to Predict O-Linked Serine Glycosylation Sites by Blending Position Relative Features and Statistical Moment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2045-2056. [PMID: 31985438 DOI: 10.1109/tcbb.2020.2968441] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 06/10/2023]
Abstract
Glycosylation of proteins in eukaryote cells is an important and complicated post-translation modification due to its pivotal role and association with crucial physiological functions within most of the proteins. Identification of glycosylation sites in a polypeptide chain is not an easy task due to multiple impediments. Analytical identification of these sites is expensive and laborious. There is a dire need to develop a reliable computational method for precise determination of such sites which can help researchers to save time and effort. Herein, we propose a novel predictor namely iGlycoS-PseAAC by integrating the Chou's Pseudo Amino Acid Composition (PseAAC) and relative/absolute position-based features. The self-consistency results show that the accuracy revealed by the model using the benchmark dataset for prediction of O-linked glycosylation having serine sites is 98.8 percent. The overall accuracy of predictor achieved through 10-fold cross validation by combining the positive and negative results is 97.2 percent. The overall accuracy achieved through Jackknife test is 96.195 percent by aggregating of all the prediction results. Thus the proposed predictor can help in predicting the O-linked glycosylated serine sites in an efficient and accurate way. The overall results show that the accuracy of the iGlycoS-PseAAC is higher than the existing tools.
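The abstract reports three evaluation protocols: self-consistency (testing on the training data), 10-fold cross-validation, and the jackknife test. As a minimal sketch of how the last two partition a dataset, the generator below yields train/test index lists. It is not the authors' code, the function names are hypothetical, and a real evaluation would normally shuffle or stratify the samples first.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def jackknife_indices(n_samples: int):
    """Jackknife (leave-one-out) is k-fold with one sample per fold."""
    return k_fold_indices(n_samples, n_samples)
```

Self-consistency uses no held-out data at all, so it is usually the most optimistic of the three figures, which matches the ordering of the accuracies quoted above.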
8
Zhang ZM, Guan ZX, Wang F, Zhang D, Ding H. Application of Machine Learning Methods in Predicting Nuclear Receptors and their Families. Med Chem 2021; 16:594-604. [PMID: 31584374 DOI: 10.2174/1573406415666191004125551] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/20/2019] [Revised: 06/18/2019] [Accepted: 08/23/2019] [Indexed: 11/22/2022]
Abstract
Nuclear receptors (NRs) are a superfamily of ligand-dependent transcription factors that are closely related to cell development, differentiation, reproduction, homeostasis, and metabolism. According to the alignments of the conserved domains, NRs are classified into the following seven or eight subfamilies: (1) NR1: thyroid hormone-like (thyroid hormone, retinoic acid, RAR-related orphan receptor, peroxisome proliferator activated, vitamin D3-like), (2) NR2: HNF4-like (hepatocyte nuclear factor 4, retinoic acid X, tailless-like, COUP-TF-like, USP), (3) NR3: estrogen-like (estrogen, estrogen-related, glucocorticoid-like), (4) NR4: nerve growth factor IB-like (NGFI-B-like), (5) NR5: fushi tarazu-F1-like (fushi tarazu-F1-like), (6) NR6: germ cell nuclear factor-like (germ cell nuclear factor), and (7) NR0: knirps-like (knirps, knirps-related, embryonic gonad protein, ODR7, trithorax) and DAX-like (DAX, SHP), or, dividing NR0 in two, (7) NR7: knirps-like and (8) NR8: DAX-like. Different NR families have different structural features and functions. Since the function of an NR is closely correlated with the subfamily it belongs to, it is highly desirable to identify NRs and their subfamilies rapidly and effectively. The knowledge acquired is essential for a proper understanding of normal and abnormal cellular mechanisms. With the advent of the post-genomics era, the number of proteins with known sequences has increased explosively. Conventional methods for accurately classifying the family of NRs are experimental, with high cost and low efficiency. This has created a greater need for bioinformatics tools that effectively recognize NRs and their subfamilies for the purpose of understanding their biological function. In this review, we summarized the application of machine learning methods in the prediction of NRs from different aspects. We hope that this review will provide a reference for further research on the classification of NRs and their families.
Affiliation(s)
- Zi-Mei Zhang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zheng-Xing Guan
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Fang Wang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Dan Zhang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Ding
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
9
RAM-PGK: Prediction of Lysine Phosphoglycerylation Based on Residue Adjacency Matrix. Genes (Basel) 2020; 11:genes11121524. [PMID: 33419274 PMCID: PMC7766696 DOI: 10.3390/genes11121524] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/30/2020] [Revised: 12/15/2020] [Accepted: 12/16/2020] [Indexed: 11/29/2022] Open
Abstract
Background: Post-translational modification (PTM) is a biological process that is associated with the modification of the proteome, which results in the alteration of normal cell biology and pathogenesis. There have been numerous PTM reports in recent years, among which lysine phosphoglycerylation has emerged as one of the recent developments. The traditional methods of identifying phosphoglycerylated residues, which are experimental procedures such as mass spectrometry, have proven to be time-consuming and cost-inefficient, despite the abundance of proteins being sequenced in this post-genomic era. Due to these drawbacks, computational techniques are being sought to establish an effective identification system for phosphoglycerylated lysine residues. The development of a predictor for phosphoglycerylation is not a first, but it is necessary because the latest predictor falls short in adequately detecting phosphoglycerylated and non-phosphoglycerylated lysine residues. Results: In this work, we introduce a new predictor named RAM-PGK, which uses sequence-based information relating to amino acid residues to predict phosphoglycerylated and non-phosphoglycerylated sites. A benchmark dataset was employed for this purpose, which contained experimentally identified phosphoglycerylated and non-phosphoglycerylated lysine residues. From the dataset, we extracted the residue adjacency matrix pertaining to each lysine residue in the protein sequences and converted the matrices into feature vectors, which are used to build the phosphoglycerylation predictor. Conclusion: RAM-PGK, which is based on sequential features and support vector machine classifiers, has shown a noteworthy improvement in performance in comparison to some of the recent prediction methods. The performance metrics of the RAM-PGK predictor are: 0.5741 sensitivity, 0.6436 specificity, 0.0531 precision, 0.6414 accuracy, and 0.0824 Matthews correlation coefficient.
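The five metrics listed at the end of this abstract all derive from the four cells of a binary confusion matrix. The following sketch uses the standard textbook definitions. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration, not details from the RAM-PGK paper.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""

    def ratio(num: float, den: float) -> float:
        # Assumed convention: report 0.0 when the denominator is zero.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),     # positive predictive value
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }
```

Precision depends on the class ratio while sensitivity and specificity do not, so a low precision next to moderate sensitivity and specificity is the pattern one would expect when negatives greatly outnumber positives. The abstract itself does not state the class ratio.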
10
Zamani F, Olyaee MH, Khanteymoori A. NCMHap: a novel method for haplotype reconstruction based on Neutrosophic c-means clustering. BMC Bioinformatics 2020; 21:475. [PMID: 33092523 PMCID: PMC7579908 DOI: 10.1186/s12859-020-03775-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2020] [Accepted: 09/22/2020] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND Single individual haplotype problem refers to reconstructing haplotypes of an individual based on several input fragments sequenced from a specified chromosome. Solving this problem is an important task in computational biology and has many applications in the pharmaceutical industry, clinical decision-making, and genetic diseases. It is known that solving the problem is NP-hard. Although several methods have been proposed to solve the problem, it is found that most of them have low performances in dealing with noisy input fragments. Therefore, proposing a method which is accurate and scalable, is a challenging task. RESULTS In this paper, we introduced a method, named NCMHap, which utilizes the Neutrosophic c-means (NCM) clustering algorithm. The NCM algorithm can effectively detect the noise and outliers in the input data. In addition, it can reduce their effects in the clustering process. The proposed method has been evaluated by several benchmark datasets. Comparing with existing methods indicates when NCM is tuned by suitable parameters, the results are encouraging. In particular, when the amount of noise increases, it outperforms the comparing methods. CONCLUSION The proposed method is validated using simulated and real datasets. The achieved results recommend the application of NCMHap on the datasets which involve the fragments with a huge amount of gaps and noise.
Affiliation(s)
- Fatemeh Zamani
  - Department of Computer Engineering, University of Zanjan, Zanjan, Iran
- Mohammad Hossein Olyaee
  - Department of Computer Engineering, Faculty of Engineering, University of Gonabad, Gonabad, Iran
11
A bioinformatic study of antimicrobial peptides identified in the Black Soldier Fly (BSF) Hermetia illucens (Diptera: Stratiomyidae). Sci Rep 2020; 10:16875. [PMID: 33037295 PMCID: PMC7547115 DOI: 10.1038/s41598-020-74017-9] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Received: 05/13/2020] [Accepted: 09/21/2020] [Indexed: 01/02/2023] Open
Abstract
Antimicrobial peptides (AMPs) play a key role in the innate immunity, the first line of defense against bacteria, fungi, and viruses. AMPs are small molecules, ranging from 10 to 100 amino acid residues produced by all living organisms. Because of their wide biodiversity, insects are among the richest and most innovative sources for AMPs. In particular, the insect Hermetia illucens (Diptera: Stratiomyidae) shows an extraordinary ability to live in hostile environments, as it feeds on decaying substrates, which are rich in microbial colonies, and is one of the most promising sources for AMPs. The larvae and the combined adult male and female H. illucens transcriptomes were examined, and all the sequences, putatively encoding AMPs, were analysed with different machine learning-algorithms, such as the Support Vector Machine, the Discriminant Analysis, the Artificial Neural Network, and the Random Forest available on the CAMP database, in order to predict their antimicrobial activity. Moreover, the iACP tool, the AVPpred, and the Antifp servers were used to predict the anticancer, the antiviral, and the antifungal activities, respectively. The related physicochemical properties were evaluated with the Antimicrobial Peptide Database Calculator and Predictor. These analyses allowed to identify 57 putatively active peptides suitable for subsequent experimental validation studies.
12
Chen W, Nie F, Ding H. Recent Advances of Computational Methods for Identifying Bacteriophage Virion Proteins. Protein Pept Lett 2020; 27:259-264. [PMID: 30968770 DOI: 10.2174/0929866526666190410124642] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/20/2019] [Revised: 03/07/2019] [Accepted: 04/01/2019] [Indexed: 01/09/2023]
Abstract
Phage Virion Proteins (PVP) are essential materials of bacteriophage, which participate in a series of biological processes. Accurate identification of phage virion proteins is helpful to understand the mechanism of interaction between the phage and its host bacteria. Since experimental method is labor intensive and time-consuming, in the past few years, many computational approaches have been proposed to identify phage virion proteins. In order to facilitate researchers to select appropriate methods, it is necessary to give a comprehensive review and comparison on existing computational methods on identifying phage virion proteins. In this review, we summarized the existing computational methods for identifying phage virion proteins and also assessed their performances on an independent dataset. Finally, challenges and future perspectives for identifying phage virion proteins were presented. Taken together, we hope that this review could provide clues to researches on the study of phage virion proteins.
Affiliation(s)
- Wei Chen
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063000, China
- Fulei Nie
  - Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063000, China
- Hui Ding
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
13
Xu L, Liang G, Chen B, Tan X, Xiang H, Liao C. A Computational Method for the Identification of Endolysins and Autolysins. Protein Pept Lett 2020; 27:329-336. [PMID: 31577192 DOI: 10.2174/0929866526666191002104735] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/10/2019] [Revised: 06/27/2019] [Accepted: 09/03/2019] [Indexed: 12/21/2022]
Abstract
BACKGROUND Cell lytic enzymes are highly evolved proteins that can destroy the cell structure and kill bacteria. Unlike antibiotics, cell lytic enzymes do not cause serious drug resistance in pathogenic bacteria. The study of cell wall lytic enzymes therefore aims at finding an efficient way of curing bacterial infections. As the problem of drug resistance associated with antibiotic use becomes more serious, cell lytic enzymes are a good choice for curing bacterial infections. Cell lytic enzymes include endolysins and autolysins, which differ in the purpose for which the cell wall is broken. Identifying the type of a cell lytic enzyme is meaningful for the study of cell wall enzymes. OBJECTIVE In this article, our motivation is to predict the type of cell lytic enzyme. Because cell lytic enzymes help kill bacteria, it is meaningful to study their type. However, it is time-consuming to determine the type of a cell lytic enzyme by experimental methods. Thus, an efficient computational method for predicting the type of cell lytic enzyme is proposed in our work. METHODS We propose a computational method for the prediction of endolysins and autolysins. First, a data set containing 27 endolysins and 41 autolysins is built. Then each protein is represented by its tripeptide composition. The features with larger confidence degree are selected. Finally, a support vector machine classifier is trained on the labeled vectors. The learned classifier is used to predict the type of cell lytic enzyme. RESULTS Following the proposed method, the experimental results show that the overall accuracy reaches 97.06% when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% in relative terms ((97.06 - 92.9)/92.9). The performance of our proposed method is stable when the number of selected features is between 40 and 70. The overall accuracy of the tripeptide optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%. The experimental results also demonstrate that the overall accuracy is improved by nearly 18% when using the tripeptide optimal feature set. CONCLUSION The paper proposed an efficient method for identifying endolysins and autolysins. In this paper, a support vector machine is used to predict the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy of identifying the type of cell lytic enzyme. The support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
Affiliation(s)
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Guangmin Liang
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Baowen Chen
- School of Software, Shenzhen Institute of Information Technology, Shenzhen, China
| | - Xu Tan
- School of Software, Shenzhen Institute of Information Technology, Shenzhen, China
| | - Huaikun Xiang
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Changrui Liao
- Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province, College of Optoelectronic Engineering, Shenzhen University, Shenzhen, China
| |
|
14
|
Feng P, Feng L. Recent Advances on Antioxidant Identification Based on Machine Learning Methods. Curr Drug Metab 2020; 21:804-809. [PMID: 32682368 DOI: 10.2174/1389200221666200719001449] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/01/2020] [Revised: 03/17/2020] [Accepted: 05/13/2020] [Indexed: 11/22/2022]
Abstract
Antioxidants are molecules that can prevent damage to cells caused by free radicals. Recent studies have also demonstrated that antioxidants play roles in preventing diseases. However, the number of known molecules with antioxidant activity is very small. Therefore, it is necessary to identify antioxidants from various resources. In the past several years, a series of computational methods have been proposed to identify antioxidants. In this review, we briefly summarize recent advances in computationally identifying antioxidants. The challenges and future perspectives for identifying antioxidants are also discussed. We hope this review will provide insights into research on antioxidant identification.
Affiliation(s)
- Pengmian Feng
- School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
| | - Lijing Feng
- School of Sciences, North China University of Science and Technology, Tangshan 063000, China
| |
|
15
|
Guan ZX, Li SH, Zhang ZM, Zhang D, Yang H, Ding H. A Brief Survey for MicroRNA Precursor Identification Using Machine Learning Methods. Curr Genomics 2020; 21:11-25. [PMID: 32655294 PMCID: PMC7324890 DOI: 10.2174/1389202921666200214125102] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/24/2019] [Revised: 01/24/2020] [Accepted: 01/30/2020] [Indexed: 11/22/2022] Open
Abstract
MicroRNAs (miRNAs), a group of short non-coding RNA molecules, can regulate gene expression. Many diseases are associated with abnormal expression of miRNAs. Therefore, accurate identification of miRNA precursors (pre-miRNAs) is necessary. In the past 10 years, experimental methods, comparative genomics methods, and artificial intelligence methods have been used to identify pre-miRNAs. However, experimental methods and comparative genomics methods have disadvantages, such as being time-consuming. In contrast, machine learning-based methods are a better choice. This review therefore summarizes the current advances in pre-miRNA recognition based on computational methods, including the construction of benchmark datasets, feature extraction methods, prediction algorithms, and the results of the models. We also provide information about the predictors currently available. Finally, we give future perspectives on the identification of pre-miRNAs. The review provides scholars with the background of pre-miRNA identification using machine learning methods, which can help researchers gain a clear understanding of the progress of research in this field.
Affiliation(s)
- Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Shi-Hao Li
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zi-Mei Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Dan Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
16
|
Chou KC. An Insightful 10-year Recollection Since the Emergence of the 5-steps Rule. Curr Pharm Des 2020; 25:4223-4234. [PMID: 31782354 DOI: 10.2174/1381612825666191129164042] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/18/2019] [Accepted: 11/25/2019] [Indexed: 11/22/2022]
Abstract
OBJECTIVE One of the most challenging problems is how to formulate a biological sequence as a vector while still retaining considerable sequence-order information. METHODS To address this problem, the approach of Pseudo Amino Acid Components, or PseAAC, has been developed. RESULTS AND CONCLUSION The 10-year recollection makes it increasingly clear that this proposal has indeed been very powerful.
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, Massachusetts 02478, United States; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| |
|
17
|
Zhang D, Guan ZX, Zhang ZM, Li SH, Dao FY, Tang H, Lin H. Recent Development of Computational Predicting Bioluminescent Proteins. Curr Pharm Des 2020; 25:4264-4273. [PMID: 31696804 DOI: 10.2174/1381612825666191107100758] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/18/2019] [Accepted: 11/04/2019] [Indexed: 12/22/2022]
Abstract
Bioluminescent proteins (BLPs) are widely distributed in living organisms and play a key role in light emission during bioluminescence. Bioluminescence serves various functions, including finding food and protecting organisms from predators. With its routine biotechnological application, bioluminescence is recognized as essential for many medical, commercial and other general technological advances. Therefore, the prediction and characterization of BLPs are significant: they can help reveal more about bioluminescence and promote the development of its applications. Since experimental methods for identifying BLPs are costly and time-consuming, bioinformatics tools have played an important role in the fast and accurate prediction of BLPs by combining sequence information with machine learning methods. In this review, we summarize and compare the application of machine learning methods to the prediction of BLPs from different aspects. We hope that this review will provide insights and inspiration for research on BLPs.
Affiliation(s)
- Dan Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zi-Mei Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Shi-Hao Li
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Fu-Ying Dao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hua Tang
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
18
|
Zheng L, Huang S, Mu N, Zhang H, Zhang J, Chang Y, Yang L, Zuo Y. RAACBook: a web server of reduced amino acid alphabet for sequence-dependent inference by using Chou's five-step rule. Database: The Journal of Biological Databases and Curation 2020; 2019:5650975. [PMID: 31802128 PMCID: PMC6893003 DOI: 10.1093/database/baz131] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 06/20/2019] [Revised: 10/16/2019] [Accepted: 10/17/2019] [Indexed: 12/12/2022]
Abstract
By reducing the amino acid alphabet, protein complexity can be significantly simplified, which can improve computational efficiency, decrease information redundancy and reduce the chance of overfitting. Although some reduced alphabets have been proposed, different classification rules can produce distinct results in protein sequence analysis. Thus, a systematic framework for reduced alphabets is urgently needed. In this work, we constructed a comprehensive web server called RAACBook for protein sequence analysis and machine learning applications by integrating reduced alphabets. The web server contains three parts: (i) 74 types of reduced amino acid alphabet were manually extracted to generate 673 reduced amino acid clusters (RAACs) for dealing with specific protein problems. Users can easily select the desired RAACs from a multilayer browser tool. (ii) An online tool was developed to analyze the primary sequence of a protein. The tool can produce K-tuple reduced amino acid composition by defining three correlation parameters (K-tuple, g-gap, λ-correlation). The results are visualized as a sequence alignment, a merged RAA composition, a feature distribution and a logo of the reduced sequence. (iii) A machine learning server is provided to train protein classification models based on K-tuple RAAC. The optimal model can be selected according to evaluation indexes (ROC, AUC, MCC, etc.). In conclusion, RAACBook presents a powerful and user-friendly service for protein sequence analysis and computational proteomics. RAACBook is freely available at http://bioinfor.imu.edu.cn/raacbook. Database URL: http://bioinfor.imu.edu.cn/raacbook
Affiliation(s)
- Lei Zheng
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| | - Shenghui Huang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| | - Nengjiang Mu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| | - Haoyue Zhang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| | - Jiayu Zhang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| | - Yu Chang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Baojian Road No.157, Harbin 150081, China
| | - Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
| |
|
19
|
Liu T, Tang H. A Brief Survey of Machine Learning Methods in Identification of Mitochondria Proteins in Malaria Parasite. Curr Pharm Des 2020; 26:3049-3058. [PMID: 32156226 DOI: 10.2174/1381612826666200310122324] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/01/2020] [Accepted: 02/10/2020] [Indexed: 11/22/2022]
Abstract
The number of human deaths caused by malaria is increasing day by day. The mitochondrial proteins of the malaria parasite play vital roles in the organism. To develop effective drugs and vaccines against infection, it is necessary to accurately identify the mitochondrial proteins of the malaria parasite. Although biochemical experiments can provide precise details about mitochondrial proteins, they are expensive and time-consuming. In this review, we summarize the machine learning-based methods for identifying mitochondrial proteins in the malaria parasite and compare the construction strategies of these computational methods. Finally, we discuss the future development of algorithmic recognition of mitochondrial proteins.
Affiliation(s)
- Ting Liu
- Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou 646000, China
| | - Hua Tang
- Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou 646000, China
| |
|
20
|
Li H, Du H, Wang X, Gao P, Liu Y, Lin W. Remarks on Computational Method for Identifying Acid and Alkaline Enzymes. Curr Pharm Des 2020; 26:3105-3114. [PMID: 32552636 DOI: 10.2174/1381612826666200617170826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2020] [Accepted: 05/07/2020] [Indexed: 11/22/2022]
Abstract
The catalytic efficiency of enzymes is thousands of times higher than that of ordinary catalysts. Thus, they are widely used in industrial and medical fields. However, enzymes with a protein structure can be destroyed and inactivated at high temperature or in excessively acidic or alkaline environments. It is well known that most enzymes work well at a pH of 6-8, while some special enzymes remain active only in an alkaline environment with pH > 8 or an acidic environment with pH < 6. Therefore, the identification of acidic and alkaline enzymes has become a key task for industrial production. Because of the wide variety of enzymes, determining the acidity or alkalinity of an enzyme by experimental methods is hard work, and in some cases cannot be achieved at all. Converting protein sequences into digital features and building computational models can efficiently and accurately identify the acidity and alkalinity of enzymes. This review summarizes progress on the digital features used to represent proteins and on computational methods for identifying acidic and alkaline enzymes. We hope that this paper will provide convenience, ideas, and guidance for computationally classifying acidic and alkaline enzymes.
Affiliation(s)
- Hongfei Li
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
| | - Haoze Du
- Department of Computer Science, Wake Forest University, Winston-Salem, NC, 27109, United States
| | - Xianfang Wang
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
| | - Peng Gao
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
| | - Yifeng Liu
- School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
| | - Weizhong Lin
- Department of Computer Science, University of Missouri, Columbia, MO, 65211, United States
| |
|
21
|
Chandra A, Sharma A, Dehzangi A, Shigemizu D, Tsunoda T. Bigram-PGK: phosphoglycerylation prediction using the technique of bigram probabilities of position specific scoring matrix. BMC Mol Cell Biol 2019; 20:57. [PMID: 31856704 PMCID: PMC6923822 DOI: 10.1186/s12860-019-0240-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/03/2019] [Accepted: 11/20/2019] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Post-translational modification (PTM) is a biological process in which proteomes are modified in a way that affects normal cell biology and hence pathogenesis. A number of PTMs have been discovered in recent years, and lysine phosphoglycerylation is one of the fairly recent ones. Even with a large number of proteins being sequenced in the post-genomic era, the identification of phosphoglycerylation remains a big challenge because of the cost, time consumption and inefficiency of experimental efforts. To overcome this issue, computational techniques have emerged to identify phosphoglycerylated lysine residues. However, the computational techniques proposed so far are limited in their ability to correctly predict this covalent modification. RESULTS We propose a new predictor in this paper called Bigram-PGK, which uses evolutionary information of amino acids to predict phosphoglycerylated sites. A benchmark dataset containing experimentally labelled sites is employed for this purpose, and profile bigram occurrences are calculated from position specific scoring matrices of amino acids in the protein sequences. The sensitivity, specificity, precision, accuracy, Matthews correlation coefficient and area under the ROC curve of this work are reported to be 0.9642, 0.8973, 0.8253, 0.9193, 0.8330 and 0.9306, respectively. CONCLUSIONS The proposed predictor, based on evolutionary information features and a support vector machine classifier, has shown great potential to effectively predict phosphoglycerylated and non-phosphoglycerylated lysine residues when compared against the existing predictors. The data and software of this work can be acquired from https://github.com/abelavit/Bigram-PGK.
Affiliation(s)
- Abel Chandra
- School of Engineering and Physics, Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji.
| | - Alok Sharma
- School of Engineering and Physics, Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji; Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD, 4111, Australia; Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan; Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan; CREST, JST, Tokyo, 102-8666, Japan
| | - Abdollah Dehzangi
- Department of Computer Science, Morgan State University, Baltimore, MD, USA
| | - Daichi Shigemizu
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan; Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan; CREST, JST, Tokyo, 102-8666, Japan; Medical Genome Center, National Center for Geriatrics and Gerontology, Obu, Aichi, 474-8511, Japan
| | - Tatsuhiko Tsunoda
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan; Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan; CREST, JST, Tokyo, 102-8666, Japan; Laboratory for Medical Science Mathematics, Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, 108-8639, Japan
| |
|
22
|
Zhong W, Zhong B, Zhang H, Chen Z, Chen Y. Identification of Anti-cancer Peptides Based on Multi-classifier System. Comb Chem High Throughput Screen 2019; 22:694-704. [PMID: 31793417 DOI: 10.2174/1386207322666191203141102] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/20/2019] [Revised: 07/18/2019] [Accepted: 07/30/2019] [Indexed: 01/01/2023]
Abstract
AIMS AND OBJECTIVE Cancer is one of the deadliest diseases, taking the lives of millions every year. Traditional methods of treating cancer are expensive and toxic to normal cells. Fortunately, anti-cancer peptides (ACPs) can eliminate this side effect. However, identifying and developing new anti-cancer peptides through experiments takes a lot of time and money. It is therefore necessary to develop a fast and accurate computational model to identify anti-cancer peptides, and machine learning algorithms are a good choice. MATERIALS AND METHODS In our study, a multi-classifier system combining multiple machine learning models was used to predict anti-cancer peptides. The individual learners are built from different feature information and algorithms, and form a multi-classifier system by voting. RESULTS AND CONCLUSION The experiments show that the overall prediction rate of each individual learner is above 80% and that the overall accuracy of the multi-classifier system for anti-cancer peptide prediction can reach 95.93%, which is better than the existing prediction model.
Affiliation(s)
- Wanben Zhong
- School of Computer Science and Technology, Huaqiao University, Xiamen, Fujian, 361021, China
| | - Bineng Zhong
- School of Computer Science and Technology, Huaqiao University, Xiamen, Fujian, 361021, China; Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing University of Science and Technology, Nanjing, 210094, China
| | - Hongbo Zhang
- School of Computer Science and Technology, Huaqiao University, Xiamen, Fujian, 361021, China
| | - Ziyi Chen
- School of Computer Science and Technology, Huaqiao University, Xiamen, Fujian, 361021, China
| | - Yan Chen
- School of Computer Science and Technology, Huaqiao University, Xiamen, Fujian, 361021, China
| |
|
23
|
Ju Z, Wang SY. Identify Lysine Neddylation Sites Using Bi-profile Bayes Feature Extraction via the Chou's 5-steps Rule and General Pseudo Components. Curr Genomics 2019; 20:592-601. [PMID: 32581647 PMCID: PMC7290059 DOI: 10.2174/1389202921666191223154629] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 07/01/2019] [Revised: 10/19/2019] [Accepted: 11/07/2019] [Indexed: 01/06/2023] Open
Abstract
Introduction Neddylation is a highly dynamic and reversible post-translational modification. Abnormal neddylation has previously been shown to be closely related to some human diseases. The detection of neddylation sites is essential for elucidating the regulation mechanisms of protein neddylation. Objective As detecting lysine neddylation sites by traditional experimental methods is often expensive and time-consuming, it is imperative to design computational methods to identify neddylation sites. Methods In this study, a bioinformatics tool named NeddPred is developed to identify underlying protein neddylation sites. Bi-profile Bayes feature extraction is used to encode neddylation sites, and a fuzzy support vector machine model is utilized to overcome the problems of noise and class imbalance in the prediction. Results NeddPred achieved a Matthews correlation coefficient of 0.7082 and an area under the receiver operating characteristic curve of 0.9769. Independent tests show that NeddPred significantly outperforms the existing lysine neddylation site predictor NeddyPreddy. Conclusion Therefore, NeddPred can be a complement to the existing tools for the prediction of neddylation sites. A user-friendly webserver for NeddPred is accessible at 123.206.31.171/NeddPred/.
Affiliation(s)
- Zhe Ju
- College of Science, Shenyang Aerospace University, Shenyang 110136, P.R. China
| | - Shi-Yun Wang
- College of Science, Shenyang Aerospace University, Shenyang 110136, P.R. China
| |
|
24
|
Chou KC. Impacts of Pseudo Amino Acid Components and 5-steps Rule to Proteomics and Proteome Analysis. Curr Top Med Chem 2019; 19:2283-2300. [DOI: 10.2174/1568026619666191018100141] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 08/07/2019] [Revised: 08/18/2019] [Accepted: 08/26/2019] [Indexed: 01/27/2023]
Abstract
Stimulated by the 5-steps rule during the last decade or so, computational proteomics has achieved remarkable progress in the following three areas: (1) protein structural class prediction; (2) protein subcellular location prediction; (3) post-translational modification (PTM) site prediction. The results obtained by these predictions are very useful not only for an in-depth study of the functions of proteins and their biological processes in a cell, but also for developing novel drugs against major diseases such as cancers, Alzheimer’s, and Parkinson’s. Moreover, since the targets to be predicted may have the multi-label feature, two sets of metrics are introduced: one for inspecting the global prediction quality, the other for the local prediction quality. All the predictors covered in this review have a user-friendly web-server, through which the majority of experimental scientists can easily obtain their desired data without the need to go through the complicated mathematics.
Affiliation(s)
- Kuo-Chen Chou
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
| |
|
25
|
Xie NZ, Li JX, Huang RB. Biological Production of (S)-acetoin: A State-of-the-Art Review. Curr Top Med Chem 2019; 19:2348-2356. [PMID: 31648637 DOI: 10.2174/1568026619666191018111424] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/31/2018] [Revised: 08/28/2019] [Accepted: 09/02/2019] [Indexed: 12/24/2022]
Abstract
Acetoin is an important four-carbon compound that has many applications in foods, chemical synthesis, cosmetics, cigarettes, soaps, and detergents. Its stereoisomer (S)-acetoin, a high-value chiral compound, can also be used to synthesize optically active drugs, which could enhance targeting properties and reduce side effects. Recently, considerable progress has been made in the development of biotechnological routes for (S)-acetoin production. In this review, various strategies for biological (S)-acetoin production are summarized, and their constraints and possible solutions are described. Furthermore, future prospects of biological production of (S)-acetoin are discussed.
Affiliation(s)
- Neng-Zhong Xie
- National Engineering Research Center for Non-Food Biorefinery, State Key Laboratory of Non-Food Biomass and Enzyme Technology, Guangxi Key Laboratory of Bio-refinery, Guangxi Biomass Engineering Technology Research Center, Guangxi Academy of Sciences, 98 Daling Road, Nanning, 530007, China
| | - Jian-Xiu Li
- National Engineering Research Center for Non-Food Biorefinery, State Key Laboratory of Non-Food Biomass and Enzyme Technology, Guangxi Key Laboratory of Bio-refinery, Guangxi Biomass Engineering Technology Research Center, Guangxi Academy of Sciences, 98 Daling Road, Nanning, 530007, China
| | - Ri-Bo Huang
- National Engineering Research Center for Non-Food Biorefinery, State Key Laboratory of Non-Food Biomass and Enzyme Technology, Guangxi Key Laboratory of Bio-refinery, Guangxi Biomass Engineering Technology Research Center, Guangxi Academy of Sciences, 98 Daling Road, Nanning, 530007, China; State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Life Science and Technology, Guangxi University, 100 Daxue Road, Nanning, 530004, China
| |
|
26
|
Chou KC. Advances in Predicting Subcellular Localization of Multi-label Proteins and its Implication for Developing Multi-target Drugs. Curr Med Chem 2019; 26:4918-4943. [PMID: 31060481 DOI: 10.2174/0929867326666190507082559] [Citation(s) in RCA: 78] [Impact Index Per Article: 13.0] [Received: 09/28/2018] [Revised: 01/29/2019] [Accepted: 01/31/2019] [Indexed: 12/16/2022]
Abstract
The smallest unit of life is a cell, which contains numerous protein molecules. Most of the functions critical to the cell’s survival are performed by these proteins located in its different organelles, usually called “subcellular locations”. Information on the subcellular localization of a protein can provide useful clues about its function. To reveal the intricate pathways at the cellular level, knowledge of the subcellular localization of proteins in a cell is a prerequisite. Therefore, one of the fundamental goals in molecular cell biology and proteomics is to determine the subcellular locations of proteins in an entire cell. It is also indispensable for prioritizing and selecting the right targets for drug development. Unfortunately, it is both time-consuming and costly to determine the subcellular locations of proteins purely based on experiments. With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop computational methods for rapidly and effectively identifying the subcellular locations of uncharacterized proteins based on their sequence information alone. Considerable progress has been achieved in this regard. This review focuses on those methods that can deal with multi-label proteins, which may simultaneously exist in two or more subcellular location sites. Protein molecules with this characteristic are vitally important for finding multi-target drugs, a current hot trend in drug development. The review also focuses on those methods that have user-friendly web-servers established, so that the majority of experimental scientists can use them to get the desired results without the need to go through the detailed mathematics involved.
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, United States
| |
|
27
|
Kang C. 19F-NMR in Target-based Drug Discovery. Curr Med Chem 2019; 26:4964-4983. [PMID: 31187703 DOI: 10.2174/0929867326666190610160534] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 06/30/2018] [Revised: 08/14/2018] [Accepted: 03/13/2019] [Indexed: 02/06/2023]
Abstract
Solution NMR spectroscopy plays important roles in understanding protein structures, dynamics and protein-protein/ligand interactions. In a target-based drug discovery project, NMR can serve an important function in hit identification and lead optimization. Fluorine is a valuable probe for evaluating protein conformational changes and protein-ligand interactions. Accumulated studies demonstrate that 19F-NMR can play important roles in fragment-based drug discovery (FBDD) and probing protein-ligand interactions. This review summarizes the application of 19F-NMR in understanding protein-ligand interactions and drug discovery. Several examples are included to show the roles of 19F-NMR in confirming identified hits/leads in the drug discovery process. In addition to identifying hits from fluorine-containing compound libraries, 19F-NMR will play an important role in drug discovery by providing a fast and robust way in novel hit identification. This technique can be used for ranking compounds with different binding affinities and is particularly useful for screening competitive compounds when a reference ligand is available.
Affiliation(s)
- CongBao Kang
- Experimental Drug Development Centre (EDDC), Agency for Science, Technology and Research (A*STAR), 10 Biopolis Road, #05-01, Singapore, 138670, Singapore
| |
|
29
|
Yang H, Yang W, Dao FY, Lv H, Ding H, Chen W, Lin H. A comparison and assessment of computational method for identifying recombination hotspots in Saccharomyces cerevisiae. Brief Bioinform 2019; 21:1568-1580. [PMID: 31633777 DOI: 10.1093/bib/bbz123] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 01/13/2019] [Revised: 05/03/2019] [Accepted: 08/19/2019] [Indexed: 12/27/2022] Open
Abstract
Meiotic recombination, which is initiated by double-strand DNA breaks, is one of the most important driving forces of biological evolution. Recombination has important roles in genome diversity and evolution. This review first provides a comprehensive survey of the 15 computational methods developed for identifying recombination hotspots in Saccharomyces cerevisiae. These computational methods are discussed and compared in terms of underlying algorithms, extracted features, predictive capability and practical utility. Subsequently, a more objective benchmark data set was constructed to develop a new predictor, iRSpot-Pse6NC2.0 (http://lin-group.cn/server/iRSpot-Pse6NC2.0). To further demonstrate the generalization ability of these methods, we compared iRSpot-Pse6NC2.0 with existing methods on chromosome XVI of S. cerevisiae. The results of the independent data set test demonstrated that the new predictor is superior to existing tools in the identification of recombination hotspots. iRSpot-Pse6NC2.0 will become an important tool for identifying recombination hotspots.
Affiliation(s)
- Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Wuritu Yang
- Development and Planning Department, Inner Mongolia University, Hohhot 010021, China
| | - Fu-Ying Dao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hao Lv
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611730, China
| | - Hao Lin
- Center for Genomics and Computational Biology, School of Life Sciences, North China University of Science and Technology, Tangshan 063000, China
| |
|
30
|
Jin AH, Muttenthaler M, Dutertre S, Himaya SWA, Kaas Q, Craik DJ, Lewis RJ, Alewood PF. Conotoxins: Chemistry and Biology. Chem Rev 2019; 119:11510-11549. [PMID: 31633928 DOI: 10.1021/acs.chemrev.9b00207] [Citation(s) in RCA: 172] [Impact Index Per Article: 28.7] [Indexed: 02/06/2023]
Abstract
The venom of the marine predatory cone snails (genus Conus) has evolved for prey capture and defense, providing the basis for survival and rapid diversification of the now estimated 750+ species. A typical Conus venom contains hundreds to thousands of bioactive peptides known as conotoxins. These mostly disulfide-rich and well-structured peptides act on a wide range of targets such as ion channels, G protein-coupled receptors, transporters, and enzymes. Conotoxins are of interest to neuroscientists as well as drug developers due to their exquisite potency and selectivity, not just against prey but also mammalian targets, thereby providing a rich source of molecular probes and therapeutic leads. The rise of integrated venomics has accelerated conotoxin discovery with now well over 10,000 conotoxin sequences published. However, their structural and pharmacological characterization lags considerably behind. In this review, we highlight the diversity of new conotoxins uncovered since 2014, their three-dimensional structures and folds, novel chemical approaches to their syntheses, and their value as pharmacological tools to unravel complex biology. Additionally, we discuss challenges and future directions for the field.
Affiliation(s)
- Ai-Hua Jin
- Institute for Molecular Bioscience, The University of Queensland, Brisbane Queensland 4072, Australia
| | - Markus Muttenthaler
- Institute for Molecular Bioscience, The University of Queensland, Brisbane Queensland 4072, Australia.,Institute of Biological Chemistry, Faculty of Chemistry, University of Vienna, 1090 Vienna, Austria
| | - Sebastien Dutertre
- Département des Acides Amines, Peptides et Protéines, Unité Mixte de Recherche 5247, Université Montpellier 2-Centre Nationale de la Recherche Scientifique, Institut des Biomolécules Max Mousseron, Place Eugène Bataillon, 34095 Montpellier Cedex 5, France
| | - S W A Himaya
- Institute for Molecular Bioscience, The University of Queensland, Brisbane Queensland 4072, Australia
| | - Quentin Kaas
- Institute for Molecular Bioscience, The University of Queensland, Brisbane Queensland 4072, Australia
| | - David J Craik
- Institute for Molecular Bioscience, The University of Queensland, Brisbane Queensland 4072, Australia
| | - Richard J Lewis
- Institute for Molecular Bioscience, The University of Queensland, Brisbane Queensland 4072, Australia
| | - Paul F Alewood
- Institute for Molecular Bioscience, The University of Queensland, Brisbane Queensland 4072, Australia
| |
|
31
|
Liang R, Xie J, Zhang C, Zhang M, Huang H, Huo H, Cao X, Niu B. Identifying Cancer Targets Based on Machine Learning Methods via Chou's 5-steps Rule and General Pseudo Components. Curr Top Med Chem 2019; 19:2301-2317. [PMID: 31622219 DOI: 10.2174/1568026619666191016155543] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/14/2019] [Revised: 07/19/2019] [Accepted: 08/26/2019] [Indexed: 01/09/2023]
Abstract
In recent years, the successful implementation of the Human Genome Project has made people realize that genetic, environmental and lifestyle factors should be considered together when studying cancer, owing to the complexity and various forms of the disease. The increasing availability and growth rate of 'big data' derived from various omics open a new window for the study and therapy of cancer. In this paper, we introduce the application of machine learning methods in handling cancer big data, including the use of artificial neural networks, support vector machines, ensemble learning and naïve Bayes classifiers.
Affiliation(s)
- Ruirui Liang
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
| | - Jiayang Xie
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
| | - Chi Zhang
- Foshan Huaxia Eye Hospital, Huaxia Eye Hospital Group, Foshan 528000, China
| | - Mengying Zhang
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
| | - Hai Huang
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
| | - Haizhong Huo
- Department of General Surgery, Shanghai Ninth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai 200011, China
| | - Xin Cao
- Zhongshan Hospital, Institute of Clinical Science, Shanghai Medical College, Fudan University, Shanghai 200032, China
| | - Bing Niu
- School of Life Sciences, Shanghai University, Shanghai, 200444, China
| |
|
32
|
Li SH, Guan ZX, Zhang D, Zhang ZM, Huang J, Yang W, Lin H. Recent Advancement in Predicting Subcellular Localization of Mycobacterial Protein with Machine Learning Methods. Med Chem 2019; 16:605-619. [PMID: 31584379 DOI: 10.2174/1573406415666191004101913] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/01/2019] [Revised: 06/25/2019] [Accepted: 08/23/2019] [Indexed: 01/28/2023]
Abstract
Mycobacterium tuberculosis (MTB) causes tuberculosis (TB), which is reported as one of the most dreadful epidemics. Although many biochemical molecular drugs have been developed to cope with this disease, drug resistance, especially multidrug resistance (MDR) and extensive drug resistance (XDR), poses a huge threat to treatment. Moreover, traditional biochemical experimental methods for tackling TB are time-consuming and costly. Thanks to the enormous amount of genomic and proteomic sequence data now available, TB can be addressed via a sequence-based computational approach, bioinformatics. Studies on predicting the subcellular localization of mycobacterial proteins (MBPs) with high precision and efficiency may help elucidate the biological function of these proteins and thus provide useful insights for protein function annotation as well as drug design. In this review, we report the progress that has been made in the computational prediction of the subcellular localization of MBPs, covering the following aspects: 1) construction of benchmark datasets; 2) methods of feature extraction; 3) techniques of feature selection; 4) application of several published prediction algorithms; 5) the published results; 6) further study on the prediction of the subcellular localization of MBPs.
Affiliation(s)
- Shi-Hao Li
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Dan Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Zi-Mei Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Jian Huang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Wuritu Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.,Development and Planning Department, Inner Mongolia University, Hohhot, P.R. China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| |
|
33
|
Identifying DNase I hypersensitive sites using multi-features fusion and F-score features selection via Chou's 5-steps rule. Biophys Chem 2019; 253:106227. [DOI: 10.1016/j.bpc.2019.106227] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 04/19/2019] [Revised: 07/04/2019] [Accepted: 07/10/2019] [Indexed: 01/12/2023]
|
34
|
Du X, Diao Y, Liu H, Li S. MsDBP: Exploring DNA-Binding Proteins by Integrating Multiscale Sequence Information via Chou’s Five-Step Rule. J Proteome Res 2019; 18:3119-3132. [DOI: 10.1021/acs.jproteome.9b00226] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Indexed: 02/07/2023]
Affiliation(s)
- Xiuquan Du
- The School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
| | - Yanyu Diao
- The School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
| | - Heng Liu
- Department of Gastroenterology, The First Affiliated Hospital of Anhui Medical University, Hefei, Anhui, China
| | - Shuo Li
- Department of Medical Imaging, Western University, London, ON N6A 3K7, Canada
| |
|
35
|
Messerli MA, Sarkar A. Advances in Electrochemistry for Monitoring Cellular Chemical Flux. Curr Med Chem 2019; 26:4984-5002. [PMID: 31057100 DOI: 10.2174/0929867326666190506111629] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/28/2018] [Revised: 03/06/2019] [Accepted: 03/12/2019] [Indexed: 11/22/2022]
Abstract
The transport of organic and inorganic molecules, along with inorganic ions across the plasma membrane results in chemical fluxes that reflect the cellular function in healthy and diseased states. Measurement of these chemical fluxes enables the characterization of protein function and transporter stoichiometry, characterization of a single cell and embryo viability prior to implantation, and screening of pharmaceutical agents. Electrochemical sensors emerge as sensitive and non-invasive tools for measuring chemical fluxes immediately outside the cells in the boundary layer, that are capable of monitoring a diverse range of transported analytes including inorganic ions, gases, neurotransmitters, hormones, and pharmaceutical agents. Used on their own or in combination with other methods, these sensors continue to expand our understanding of the function of rare cells and small tissues. Advances in sensor construction and detection strategies continue to improve sensitivity under physiological conditions, diversify analyte detection, and increase throughput. These advances will be discussed in the context of addressing technical challenges to measuring chemical flux in the boundary layer of cells and measuring the resultant changes to the chemical concentration in the bulk media.
Affiliation(s)
- Mark A Messerli
- Department of Biology and Microbiology, South Dakota State University, Brookings, SD, United States
| | - Anyesha Sarkar
- Department of Biology and Microbiology, South Dakota State University, Brookings, SD, United States
| |
|
36
|
Barukab O, Khan YD, Khan SA, Chou KC. iSulfoTyr-PseAAC: Identify Tyrosine Sulfation Sites by Incorporating Statistical Moments via Chou's 5-steps Rule and Pseudo Components. Curr Genomics 2019; 20:306-320. [PMID: 32030089 PMCID: PMC6983959 DOI: 10.2174/1389202920666190819091609] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 05/15/2019] [Revised: 08/04/2019] [Accepted: 08/06/2019] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Amino acid residues in proteins undergo post-translational modification (PTM) during protein synthesis, a process of chemical and physical change in an amino acid that in turn alters the behavioral properties of proteins. Tyrosine sulfation is a ubiquitous post-translational modification known to be associated with the regulation of various biological functions and pathological processes. Thus, its identification is necessary to understand its mechanism. Experimental determination through site-directed mutagenesis and high-throughput mass spectrometry is costly and time-consuming; thus, a reliable computational model is required for the identification of sulfotyrosine sites. METHODOLOGY In this paper, we present a computational model for the prediction of sulfotyrosine sites, named iSulfoTyr-PseAAC, in which feature vectors are constructed using statistical moments of protein amino acid sequences and various position/composition relative features. These features are incorporated into PseAAC. The model is validated by jackknife, cross-validation, self-consistency and independent testing. RESULTS Accuracy determined through validation was 93.93% for the jackknife test, 95.16% for cross-validation, 94.3% for self-consistency and 94.3% for independent testing. CONCLUSION The proposed model performs better than the existing predictors; however, the accuracy can be improved further in the future, owing to the increasing number of sulfotyrosine sites identified in proteins.
Affiliation(s)
| | | | - Sher Afzal Khan
- Address correspondence to this author at the Department of Information Technology, Faculty of Computing and Information Technology in Rabigh, King Abdulaziz University, P.O. Box 344, Rabigh, 21911, Saudi Arabia; and Department of Computer Sciences, Abdul Wali Khan University, Mardan, Pakistan; E-mail:
| | | |
|
37
|
Ilyas S, Hussain W, Ashraf A, Khan YD, Khan SA, Chou KC. iMethylK_pseAAC: Improving Accuracy of Lysine Methylation Sites Identification by Incorporating Statistical Moments and Position Relative Features into General PseAAC via Chou's 5-steps Rule. Curr Genomics 2019; 20:275-292. [PMID: 32030087 PMCID: PMC6983956 DOI: 10.2174/1389202920666190809095206] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 05/08/2019] [Revised: 07/02/2019] [Accepted: 07/26/2019] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND Methylation is one of the most important post-translational modifications in the human body and usually occurs on lysine, one of the most intensely modified residues. It plays a dynamic role in numerous biological processes, such as regulation of gene expression, regulation of protein function and RNA processing. Identifying lysine methylation sites is therefore an important challenge, as some experimental procedures are time-consuming. OBJECTIVE Herein, we propose a computational predictor named iMethylK_pseAAC to identify lysine methylation sites. METHODS Firstly, we constructed feature vectors based on PseAAC using position and composition relative features and statistical moments. A neural network is trained based on the extracted features. The performance of the proposed method is then validated using cross-validation and jackknife testing. RESULTS The objective evaluation of the predictor showed an accuracy of 96.7% for self-consistency, 91.61% for 10-fold cross-validation and 93.42% for jackknife testing. CONCLUSION It is concluded that iMethylK_pseAAC outperforms its counterparts for identifying lysine methylation sites, such as iMethyl_pseACC, BPB_pPMS and PMeS.
Affiliation(s)
| | | | | | - Yaser Daanial Khan
- Address correspondence to this author at the Department of Computer Science, School of Systems and Technology, University of Management and Technology, P.O. Box 10033, C-II, Johar Town, Lahore, Pakistan; Tel: +923054440271; E-mail:
| | | | | |
|
38
|
Chandra AA, Sharma A, Dehzangi A, Tsunoda T. EvolStruct-Phogly: incorporating structural properties and evolutionary information from profile bigrams for the phosphoglycerylation prediction. BMC Genomics 2019; 19:984. [PMID: 30999859 PMCID: PMC7402405 DOI: 10.1186/s12864-018-5383-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 05/25/2018] [Accepted: 12/17/2018] [Indexed: 01/21/2023] Open
Abstract
Background Post-translational modification (PTM) is a biological process that modifies the proteome, leading to changes in normal cell biology and pathogenesis. In recent times, many PTMs have been reported. Of these modifications, phosphoglycerylation has become a particular subject of interest. The experimental identification of phosphoglycerylated residues remains expensive, inefficient and time-consuming, even with the large number of proteins sequenced in the post-genomic period. Computational methods are therefore anticipated to predict phosphoglycerylated lysines effectively. Even though predictors are available, the ability to detect phosphoglycerylated lysine residues remains inadequate. Results We have introduced a new predictor in this paper, named EvolStruct-Phogly, that uses structural and evolutionary information relating to amino acids to predict phosphoglycerylated lysine residues. Benchmark data containing experimentally identified phosphoglycerylated and non-phosphoglycerylated lysines are employed. We then extracted three types of structural information (accessible surface area of amino acids, backbone torsion angles, and local structure conformations of amino acids) together with profile bigrams of position-specific scoring matrices. Conclusion EvolStruct-Phogly showed a noteworthy improvement in performance compared with previous predictors. The performance metrics obtained are as follows: sensitivity 0.7744, specificity 0.8533, precision 0.7368, accuracy 0.8275, and Matthews correlation coefficient 0.6242. The software package and data of this work can be obtained from https://github.com/abelavit/EvolStruct-Phogly or www.alok-ai-lab.com
Affiliation(s)
| | - Alok Sharma
- School of Engineering & Physics, University of the South Pacific, Suva, Fiji.,Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.,Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia.,CREST, JST, Tokyo, Japan
| | - Abdollah Dehzangi
- Department of Computer Science, Morgan State University, Baltimore, MD, USA
| | - Tatushiko Tsunoda
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.,CREST, JST, Tokyo, Japan.,Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
| |
|
39
|
Pan Q, Guo Y, Guo L, Liao S, Zhao C, Wang S, Liu HF. Mechanistic Insights of Chemicals and Drugs as Risk Factors for Systemic Lupus Erythematosus. Curr Med Chem 2019; 27:5175-5188. [PMID: 30947650 DOI: 10.2174/0929867326666190404140658] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/08/2018] [Revised: 03/25/2019] [Accepted: 03/27/2019] [Indexed: 12/21/2022]
Abstract
Systemic Lupus Erythematosus (SLE) is a chronic and relapsing heterogenous autoimmune disease that primarily affects women of reproductive age. Genetic and environmental risk factors are involved in the pathogenesis of SLE, and susceptibility genes have recently been identified. However, as gene therapy is far from clinical application, further investigation of environmental risk factors could reveal important therapeutic approaches. We systematically explored two groups of environmental risk factors: chemicals (including silica, solvents, pesticides, hydrocarbons, heavy metals, and particulate matter) and drugs (including procainamide, hydralazine, quinidine, D-penicillamine, isoniazid, and methyldopa). Furthermore, the mechanisms underlying risk factors, such as genetic factors, epigenetic change, and disrupted immune tolerance, were explored. This review identifies novel risk factors and their underlying mechanisms. Practicable measures for the management of these risk factors will benefit SLE patients and provide potential therapeutic strategies.
Affiliation(s)
- Qingjun Pan
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
| | - Yun Guo
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
| | - Linjie Guo
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
| | - Shuzhen Liao
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
| | - Chunfei Zhao
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
| | - Sijie Wang
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
| | - Hua-Feng Liu
- Key Laboratory of Prevention and Management of Chronic Kidney Disease of Zhanjiang City, Affiliated Hospital of Guangdong Medical University, 57th South Renmin Road, Zhanjiang 524001, Guangdong, China
| |
Collapse
|
40
|
Mansbach RA, Travers T, McMahon BH, Fair JM, Gnanakaran S. Snails In Silico: A Review of Computational Studies on the Conopeptides. Mar Drugs 2019; 17:E145. [PMID: 30832207 PMCID: PMC6471681 DOI: 10.3390/md17030145] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 01/18/2019] [Revised: 02/21/2019] [Accepted: 02/22/2019] [Indexed: 12/26/2022] Open
Abstract
Marine cone snails are carnivorous gastropods that use peptide toxins called conopeptides both as a defense mechanism and as a means to immobilize and kill their prey. These peptide toxins exhibit a large chemical diversity that enables exquisite specificity and potency for target receptor proteins. This diversity arises in terms of variations both in amino acid sequence and length, and in posttranslational modifications, particularly the formation of multiple disulfide linkages. Most of the functionally characterized conopeptides target ion channels of animal nervous systems, which has led to research on their therapeutic applications. Many facets of the underlying molecular mechanisms responsible for the specificity and virulence of conopeptides, however, remain poorly understood. In this review, we will explore the chemical diversity of conopeptides from a computational perspective. First, we discuss current approaches used for classifying conopeptides. Next, we review different computational strategies that have been applied to understanding and predicting their structure and function, from machine learning techniques for predictive classification to docking studies and molecular dynamics simulations for molecular-level understanding. We then review recent novel computational approaches for rapid high-throughput screening and chemical design of conopeptides for particular applications. We close with an assessment of the state of the field, emphasizing important questions for future lines of inquiry.
Affiliation(s)
- Rachael A Mansbach
- Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
- Timothy Travers
- Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
- Benjamin H McMahon
- Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
- Jeanne M Fair
- Biosecurity and Public Health Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
- S Gnanakaran
- Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
|
41
|
HRGPred: Prediction of herbicide resistant genes with k-mer nucleotide compositional features and support vector machine. Sci Rep 2019; 9:778. [PMID: 30692561 PMCID: PMC6349872 DOI: 10.1038/s41598-018-37309-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/18/2018] [Accepted: 12/03/2018] [Indexed: 02/07/2023] Open
Abstract
Herbicide resistance (HR) is a major concern for agricultural producers as well as environmentalists. Resistance to commonly used herbicides is conferred by mutation(s) in the genes encoding herbicide target sites/proteins (GETS). Identification of these genes through wet-lab experiments is time-consuming and expensive. Thus, a supervised learning-based computational model has been proposed in this study, which is the first of its kind for the prediction of seven classes of GETS. The cDNA sequences of the genes were initially transformed into numeric features based on the k-mer compositions and then supplied as input to the support vector machine. In the proposed SVM-based model, the prediction occurs in two stages: a binary classifier in the first stage discriminates the genes involved in conferring resistance to herbicides from other genes, and a multi-class classifier in the second stage assigns each gene predicted as herbicide resistant to one of the seven resistance classes. Overall classification accuracies were observed to be ~89% and >97% for binary and multi-class classifications, respectively. The proposed model achieved higher accuracy than the homology-based algorithms, viz. BLAST and Hidden Markov Model. In addition, the developed computational model achieved ~87% accuracy when tested with an independent dataset. An online prediction server, HRGPred (http://cabgrid.res.in:8080/hrgpred), has also been established to facilitate the prediction of GETS by the scientific community.
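The k-mer compositional encoding described in this abstract can be illustrated with a minimal sketch. The function name `kmer_composition`, the default k = 2, and the example sequence are assumptions made for illustration and are not taken from the paper; the resulting vector is the kind of numeric input that could be passed to a support vector machine.

```python
from itertools import product

def kmer_composition(seq, k=2, alphabet="ACGT"):
    """Return normalized k-mer frequencies, ordered lexicographically by k-mer."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k along the sequence.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]

# Hypothetical cDNA fragment, used only to show the shape of the output.
features = kmer_composition("ATGCGATACG", k=2)
print(len(features))  # 4**k features; 16 for k = 2
```

Normalizing by the number of valid windows keeps sequences of different lengths comparable, which is one common design choice; raw counts would be an equally plausible variant.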
|
42
|
Jiang QX. Structural Variability in the RLR-MAVS Pathway and Sensitive Detection of Viral RNAs. Med Chem 2019; 15:443-458. [PMID: 30569868 PMCID: PMC6858087 DOI: 10.2174/1573406415666181219101613] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/05/2018] [Revised: 10/23/2018] [Accepted: 12/12/2018] [Indexed: 12/25/2022]
Abstract
Cells need high-sensitivity detection of non-self molecules in order to fight against pathogens. These cellular sensors are thus of significant medicinal importance, especially for treating novel emerging pathogens. RIG-I-like receptors (RLRs) are intracellular sensors for viral RNAs (vRNAs). Their active forms activate mitochondrial antiviral signaling protein (MAVS) and trigger downstream immune responses against viral infection. Functional and structural studies of the RLR-MAVS signaling pathway have revealed significant supramolecular variability in the past few years, exposing different aspects of the functional signaling pathway. Here I will discuss the molecular events of the RLR-MAVS pathway from the angle of detecting a single copy or a very low copy number of vRNAs in the presence of non-specific competition from cytosolic RNAs, and review key structural variability in the RLR/vRNA complexes, the MAVS helical polymers, and the adapter-mediated interactions between the active RLR/vRNA complex and the inactive MAVS in triggering the initiation of the MAVS filaments. These structural variations may not be mutually exclusive, but instead may reflect the adaptation of the signaling pathways to different conditions or to different levels of sensitivity in the response to exogenous vRNAs.
Affiliation(s)
- Qiu-Xing Jiang
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, United States
|
43
|
Jia J, Li X, Qiu W, Xiao X, Chou KC. iPPI-PseAAC(CGR): Identify protein-protein interactions by incorporating chaos game representation into PseAAC. J Theor Biol 2019; 460:195-203. [DOI: 10.1016/j.jtbi.2018.10.021] [Citation(s) in RCA: 78] [Impact Index Per Article: 13.0] [Received: 08/05/2018] [Revised: 09/16/2018] [Accepted: 10/08/2018] [Indexed: 01/11/2023]
|
44
|
Pan Y, Wang S, Zhang Q, Lu Q, Su D, Zuo Y, Yang L. Analysis and prediction of animal toxins by various Chou's pseudo components and reduced amino acid compositions. J Theor Biol 2018; 462:221-229. [PMID: 30452961 DOI: 10.1016/j.jtbi.2018.11.010] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 09/25/2018] [Revised: 11/06/2018] [Accepted: 11/15/2018] [Indexed: 01/19/2023]
Abstract
Animal toxin proteins are disulfide-rich small peptides found in venomous species. They are used as pharmacological tools and therapeutic agents in medicine because of their high target specificity. The successful analysis and prediction of toxin proteins may be of considerable significance for pharmacological and therapeutic research on toxins. In this study, significant differences were found between the toxins and the non-toxins in amino acid compositions and several important biological properties. A random forest was first proposed to predict the animal toxin proteins by selecting 400 pseudo amino acid compositions and the dipeptide compositions of a reduced amino acid alphabet as the input parameters. Based on the dipeptide composition of a reduced amino acid alphabet with 13 reduced amino acids, the best overall accuracy of 85.71% was obtained. These results indicated that our algorithm was an efficient tool for animal toxin prediction.
Affiliation(s)
- Yi Pan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Shiyuan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Qi Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Qianzi Lu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Dongqing Su
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yongchun Zuo
- The State key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China.
- Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
|
45
|
Mei J, Fu Y, Zhao J. Analysis and prediction of ion channel inhibitors by using feature selection and Chou's general pseudo amino acid composition. J Theor Biol 2018; 456:41-48. [DOI: 10.1016/j.jtbi.2018.07.040] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 05/13/2018] [Revised: 07/20/2018] [Accepted: 07/29/2018] [Indexed: 12/23/2022]
|
46
|
Ariaeenejad S, Mousivand M, Moradi Dezfouli P, Hashemi M, Kavousi K, Hosseini Salekdeh G. A computational method for prediction of xylanase enzymes activity in strains of Bacillus subtilis based on pseudo amino acid composition features. PLoS One 2018; 13:e0205796. [PMID: 30346964 PMCID: PMC6197662 DOI: 10.1371/journal.pone.0205796] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 08/22/2017] [Accepted: 10/02/2018] [Indexed: 01/09/2023] Open
Abstract
Xylanases are hydrolytic enzymes that are classified into various glycoside hydrolase (GH) families on the basis of their physicochemical properties, structure, mode of action, and substrate specificities. The purpose of this study is to show that the activity of the members of the xylanase family in the specified pH and temperature conditions can be computationally predicted. The proposed computational regression model was trained and tested with the Pseudo Amino Acid Composition (PseAAC) features extracted solely from the amino acid sequences of enzymes. The xylanases with experimentally determined activities were used as the training dataset to adjust the model parameters. To develop the model, 41 strains of Bacillus subtilis isolated from field soil were screened. From them, 28 strains with the highest halo diameter were selected for further studies. The performance of the model for prediction of xylanase activity was evaluated in three different temperature and pH conditions using stratified cross-validation and jackknife methods. The trained model can be used for determining the activity of newly found xylanases in the specified condition. Such computational models help to scale down the experimental costs and save time by identifying enzymes with appropriate activity for scientific and industrial usage. Our methodology for activity prediction of xylanase enzymes can potentially be applied to the members of other enzyme families. The availability of sufficient experimental data in specified pH and temperature conditions is a prerequisite for training the learning model and achieving high accuracy.
Affiliation(s)
- Shohreh Ariaeenejad
- Department of Systems Biology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREO), Karaj, Iran
- Maryam Mousivand
- Department of Microbial Biotechnology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREO), Karaj, Iran
- Parinaz Moradi Dezfouli
- Department of Microbial Biotechnology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREO), Karaj, Iran
- Maryam Hashemi
- Department of Microbial Biotechnology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREO), Karaj, Iran
- Kaveh Kavousi
- Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Ghasem Hosseini Salekdeh
- Department of Systems Biology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREO), Karaj, Iran
|
47
|
Shao L, Gao H, Liu Z, Feng J, Tang L, Lin H. Identification of Antioxidant Proteins With Deep Learning From Sequence Information. Front Pharmacol 2018; 9:1036. [PMID: 30294271 PMCID: PMC6158654 DOI: 10.3389/fphar.2018.01036] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 05/06/2018] [Accepted: 08/27/2018] [Indexed: 01/26/2023] Open
Abstract
Antioxidant proteins have been found to be closely linked to disease control owing to their ability to eliminate excess free radicals. Because of their medicinal value, the study of identifying antioxidant proteins is on the upsurge. Many machine-learning classifiers have performed poorly owing to the nonlinear and unbalanced nature of biological data. Recently, deep learning techniques showed advantages over many state-of-the-art machine learning methods in various fields. In this study, a deep learning based classifier was proposed to identify antioxidant proteins based on a mixed g-gap dipeptide composition feature vector. The classifier employed a deep autoencoder to extract a nonlinear representation from the raw input. The t-Distributed Stochastic Neighbor Embedding (t-SNE) was used for dimensionality reduction. A support vector machine was finally used for classification. The classifier achieved an F1 score of 0.8842 and an MCC of 0.7409 in 10-fold cross validation. Experimental results show that our proposed method outperformed the traditional machine learning methods and could be a promising tool for antioxidant protein identification. For the convenience of others' scientific research, we have developed a user-friendly web server called IDAod for antioxidant protein identification, which can be accessed freely at http://bigroup.uestc.edu.cn/IDAod/.
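The g-gap dipeptide composition named in this abstract counts pairs of residues separated by g intervening positions. The sketch below is a minimal illustration under stated assumptions: the function names, the gap values (0, 1, 2), and the reading of "mixed" as a concatenation of several gap values are hypothetical and are not confirmed by the abstract.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def g_gap_dipeptide_composition(seq, g=0):
    """Frequencies of residue pairs separated by g intervening residues."""
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    # Pair position i with position i + g + 1; g = 0 gives adjacent dipeptides.
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # ignore non-standard residues such as X
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]

def mixed_g_gap_features(seq, gaps=(0, 1, 2)):
    """Assumed 'mixed' vector: concatenation of compositions for several gaps."""
    vector = []
    for g in gaps:
        vector.extend(g_gap_dipeptide_composition(seq, g))
    return vector
```

With three gap values the vector has 3 × 400 = 1200 entries, which is the sort of high-dimensional input for which a dimensionality-reduction step before classification becomes useful.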
Affiliation(s)
- Lifen Shao
- Center for Informational Biology, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Hui Gao
- Center for Informational Biology, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Zhen Liu
- Center for Informational Biology, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Juan Feng
- Key Laboratory for Neuro-Information of Ministry of Education, Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Lixia Tang
- Key Laboratory for Neuro-Information of Ministry of Education, Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
|
48
|
Using a Classifier Fusion Strategy to Identify Anti-angiogenic Peptides. Sci Rep 2018; 8:14062. [PMID: 30218091 PMCID: PMC6138733 DOI: 10.1038/s41598-018-32443-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/25/2018] [Accepted: 09/07/2018] [Indexed: 12/27/2022] Open
Abstract
Anti-angiogenic peptides perform distinct physiological functions and are potential therapies for angiogenesis-related diseases. Accurate identification of anti-angiogenic peptides may provide significant clues to understand the essential angiogenic homeostasis within tissues and develop antineoplastic therapies. In this study, an ensemble predictor is proposed for anti-angiogenic peptide prediction by fusing an individual classifier with the best sensitivity and another individual one with the best specificity. We investigate predictive capabilities of various feature spaces with respect to the corresponding optimal individual classifiers and ensemble classifiers. The accuracy and Matthews correlation coefficient (MCC) of the ensemble classifier trained by Bi-profile Bayes (BpB) features are 0.822 and 0.649, respectively, which are the best results among the investigated prediction models. Discriminative features are obtained from BpB using the Relief algorithm followed by the Incremental Feature Selection (IFS) method. The sensitivity, specificity, accuracy, and MCC of the ensemble classifier trained by the discriminative features reach up to 0.776, 0.888, 0.832, and 0.668, respectively. Experimental results indicate that the proposed method is far superior to the previous study for anti-angiogenic peptide prediction.
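The four evaluation measures quoted in this abstract are all functions of the confusion matrix, as the sketch below shows. The confusion-matrix counts are hypothetical: they assume a balanced set of 1000 positives and 1000 negatives and were chosen only so that sensitivity and specificity equal the reported 0.776 and 0.888. The abstract does not give the actual counts or class balance.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc

# Hypothetical counts under an assumed balanced dataset (not from the paper).
sn, sp, acc, mcc = binary_metrics(tp=776, fn=224, tn=888, fp=112)
print(round(sn, 3), round(sp, 3), round(acc, 3), round(mcc, 3))
# prints: 0.776 0.888 0.832 0.668
```

Under that balanced-class assumption, the accuracy and MCC computed from the sensitivity and specificity agree with the reported 0.832 and 0.668, so the four figures are mutually consistent.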
|
49
|
Chen W, Ding H, Zhou X, Lin H, Chou KC. iRNA(m6A)-PseDNC: Identifying N6-methyladenosine sites using pseudo dinucleotide composition. Anal Biochem 2018; 561-562:59-65. [PMID: 30201554 DOI: 10.1016/j.ab.2018.09.002] [Citation(s) in RCA: 132] [Impact Index Per Article: 18.9] [Received: 07/26/2018] [Revised: 08/31/2018] [Accepted: 09/03/2018] [Indexed: 01/28/2023]
Abstract
As a prevalent post-transcriptional modification, N6-methyladenosine (m6A) plays key roles in a series of biological processes. Although experimental technologies have been developed and applied to identify m6A sites, they are still cost-ineffective for transcriptome-wide detection of m6A. As good complements to the experimental techniques, some computational methods have been proposed to identify m6A sites. However, their performance remains unsatisfactory. In this study, we first proposed a Euclidean distance-based method to construct a high-quality benchmark dataset. By encoding the RNA sequences using pseudo nucleotide composition, a new predictor called iRNA(m6A)-PseDNC was developed to identify m6A sites in the Saccharomyces cerevisiae genome. The 10-fold cross-validation test demonstrated that the performance of iRNA(m6A)-PseDNC is superior to that of the existing methods. Meanwhile, for the convenience of most experimental scientists, a web server has been established at http://lin-group.cn/server/iRNA(m6A)-PseDNC.php, by which users can easily get their desired results without needing to go through the detailed mathematics. It is anticipated that iRNA(m6A)-PseDNC will become a useful high-throughput tool for identifying m6A sites in the S. cerevisiae genome.
Affiliation(s)
- Wei Chen
- School of Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, 063000, China; Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, 611730, China; Gordon Life Science Institute, Boston, MA, 02478, USA.
- Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China.
- Xu Zhou
- School of Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, 063000, China.
- Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Gordon Life Science Institute, Boston, MA, 02478, USA.
- Kuo-Chen Chou
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Gordon Life Science Institute, Boston, MA, 02478, USA.
|
50
|
Tan JX, Dao FY, Lv H, Feng PM, Ding H. Identifying Phage Virion Proteins by Using Two-Step Feature Selection Methods. Molecules 2018; 23:2000. [PMID: 30103458 PMCID: PMC6222849 DOI: 10.3390/molecules23082000] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 07/13/2018] [Revised: 07/30/2018] [Accepted: 08/08/2018] [Indexed: 12/31/2022] Open
Abstract
Accurate identification of phage virion protein is not only a key step for understanding the function of the phage virion protein but also helpful for further understanding the lysis mechanism of the bacterial cell. Since traditional experimental methods are time-consuming and costly for identifying phage virion proteins, it is extremely urgent to apply machine learning methods to accurately and efficiently identify phage virion proteins. In this work, a support vector machine (SVM) based method was proposed by mixing multiple sets of optimal g-gap dipeptide compositions. The analysis of variance (ANOVA) and the minimal-redundancy-maximal-relevance (mRMR) with an increment feature selection (IFS) were applied to single out the optimal feature set. In the five-fold cross-validation test, the proposed method achieved an overall accuracy of 87.95%. We believe that the proposed method will become an efficient and powerful method for scientists concerning phage virion proteins.
Affiliation(s)
- Jiu-Xin Tan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Fu-Ying Dao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Hao Lv
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Peng-Mian Feng
- Hebei Province Key Laboratory of Occupational Health and Safety for Coal Industry, School of Public Health, North China University of Science and Technology, Tangshan 063000, China.
- Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
|