1
Zhao SH, Tan J, Zhang W, Zhou Y, Ning YQ, Sun Y, Zhao JW, Jiang DM, Li XF. Identification of ice-binding proteins from Raphanus sativus and application in frozen dough. NPJ Sci Food 2025; 9:58. [PMID: 40274810 PMCID: PMC12022278 DOI: 10.1038/s41538-025-00420-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2024] [Accepted: 04/10/2025] [Indexed: 04/26/2025] Open
Abstract
Cryopreservation is a widely employed method for processing and preserving food. However, conventional antifreeze agents often fail to mitigate the mechanical damage caused by ice recrystallization during freeze-thaw cycles. In this study, two ice-binding proteins (IBPs), COR15B and COR47, were identified from Raphanus sativus using bioinformatics and molecular biology techniques. Both IBPs exhibited significant ice recrystallization inhibition (IRI) and ice crystal morphology modification activity. A novel, high-yield Bacillus subtilis expression system was developed for the heterologous production of these IBPs, achieving approximately 50 μg/mL through response surface optimization. These proteins, even when used at thousandths of the ratio, retained their IRI activity. Notably, the heterologously expressed IBPs significantly reduced freeze-induced damage in flour-based products and improved yeast survival and fermentative capacity during repeated freeze-thaw cycles. These results highlight the considerable potential of radish-derived IBPs as cryoprotectants for enhancing food storage stability.
Affiliation(s)
- Shu-Heng Zhao
  - School of Life Sciences, East China Normal University, 500 Dongchuan Rd., Shanghai, 200241, P R China
- Jing Tan
  - School of Life Sciences, East China Normal University, 500 Dongchuan Rd., Shanghai, 200241, P R China
- Wei Zhang
  - School of Life Sciences, East China Normal University, 500 Dongchuan Rd., Shanghai, 200241, P R China
- Yan Zhou
  - School of Life Sciences, East China Normal University, 500 Dongchuan Rd., Shanghai, 200241, P R China
- Yi-Qiu Ning
  - School of Life Sciences, East China Normal University, 500 Dongchuan Rd., Shanghai, 200241, P R China
- Yue Sun
  - School of Life Sciences, East China Normal University, 500 Dongchuan Rd., Shanghai, 200241, P R China
- Jun-Wei Zhao
  - Beijing Institute of Life Science and Technology, Courtyard 7, Yingcai South 1st Street, Future Science City South District, Beiqijia Town, Changping District, Beijing, 102200, PR China
- De-Ming Jiang
  - School of Life Sciences, East China Normal University, 500 Dongchuan Rd., Shanghai, 200241, P R China.
- Xiao-Fang Li
  - School of Life Sciences, East China Normal University, 500 Dongchuan Rd., Shanghai, 200241, P R China.
2
Lv Z, Wei M, Pei H, Peng S, Li M, Jiang L. PTSP-BERT: Predict the thermal stability of proteins using sequence-based bidirectional representations from transformer-embedded features. Comput Biol Med 2025; 185:109598. [PMID: 39708499 DOI: 10.1016/j.compbiomed.2024.109598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2024] [Revised: 12/16/2024] [Accepted: 12/17/2024] [Indexed: 12/23/2024]
Abstract
Thermophilic, mesophilic, and psychrophilic proteins have wide industrial applications, as enzymes with different optimal temperatures are often needed for different purposes. Convenient methods are needed to determine the optimal temperatures for proteins; however, laboratory methods for this purpose are time-consuming and laborious, and existing machine learning methods can only perform binary classification of thermophilic and non-thermophilic proteins, or psychrophilic and non-psychrophilic proteins. Here, we developed a deep learning model, PTSP-BERT, based on protein sequences that can directly perform three-class identification of thermophilic, mesophilic, and psychrophilic proteins. By comparing BERT-bfd with other deep learning models using five-fold cross-validation, we found that BERT-bfd-extracted features achieved the highest accuracy under six classifiers. Furthermore, to improve the model's accuracy, we used SMOTE (synthetic minority oversampling technique) to balance the dataset and light gradient-boosting machine to rank BERT-bfd-extracted features according to their weights. We obtained the best-performing model with five-fold cross-validation accuracy of 89.59 % and independent test accuracy of 85.42 %. The performance of PTSP-BERT is significantly better than that of existing models in the three-class identification task. In order to compare with previous binary classification models, we used PTSP-BERT to perform binary classification tasks of thermophilic and non-thermophilic proteins, and psychrophilic and non-psychrophilic proteins, on an independent test set. PTSP-BERT achieved the highest accuracy on both binary classification tasks, with an accuracy of 93.33 % for thermophilic protein binary classification and 88.33 % for psychrophilic protein binary classification.
The accuracy of the independent test of the model can reach between 89.8 % and 92.9 % after training and optimization of the training set with different sequence similarities, and the prediction accuracy of the new data can exceed 97 %. For the convenience of future researchers, we have uploaded the source code of PTSP-BERT to GitHub.
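The SMOTE balancing step named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function name `smote_sample`, the toy feature vectors, and the value of `k` are assumptions made for illustration. The idea is to synthesize a new minority-class sample by interpolating between an existing minority sample and one of its nearest minority-class neighbours.

```python
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic sample from a list of minority-class vectors.

    Hypothetical helper for illustration; needs at least two minority vectors.
    """
    if rng is None:
        rng = random.Random(0)
    base = rng.choice(minority)
    # Rank the other minority samples by squared Euclidean distance to the base.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    # Interpolate at a random position on the segment base -> neighbour.
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]


# Toy 2-D minority class (assumed values, not embeddings from the paper).
minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
synthetic = smote_sample(minority_class, k=2, rng=random.Random(1))
```

In practice a maintained implementation would normally be preferred over a hand-written one; the sketch only shows why the synthetic points always lie between existing minority samples.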
Affiliation(s)
- Zhibin Lv
  - College of Biomedical Engineering, Sichuan University, Chengdu, 610065, China.
- Mingxuan Wei
  - College of Biomedical Engineering, Sichuan University, Chengdu, 610065, China
- Hongdi Pei
  - Department of Biomedical Engineering, Johns Hopkins University, MD, 21218, USA
- Shiyu Peng
  - College of Biomedical Engineering, Sichuan University, Chengdu, 610065, China
- Mingxin Li
  - College of Biomedical Engineering, Sichuan University, Chengdu, 610065, China
- Liangzhen Jiang
  - College of Food and Biological Engineering, Chengdu University, Chengdu, 610106, China; Country Key Laboratory of Coarse Cereal Processing, Ministry of Agriculture and Rural Affairs, Chengdu, 610106, China
3
Kumar N, Choudhury S, Bajiya N, Patiyal S, Raghava GPS. Prediction of Anti-Freezing Proteins From Their Evolutionary Profile. Proteomics 2025; 25:e202400157. [PMID: 39305039 DOI: 10.1002/pmic.202400157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 08/29/2024] [Accepted: 08/29/2024] [Indexed: 02/06/2025]
Abstract
Prediction of antifreeze proteins (AFPs) holds significant importance due to their diverse applications in healthcare. An inherent limitation of current AFP prediction methods is their reliance on unreviewed proteins for evaluation. This study evaluates proposed and existing methods on an independent dataset containing 80 AFPs and 73 non-AFPs obtained from UniProt, which have already been reviewed by experts. Initially, we constructed machine learning models for AFP prediction using selected composition-based protein features and achieved a peak AUROC of 0.90 with an MCC of 0.69 on the independent dataset. Subsequently, we observed a notable enhancement in model performance, with the AUROC increasing from 0.90 to 0.93 upon incorporating evolutionary information instead of relying solely on the primary sequence of proteins. Furthermore, we explored hybrid models integrating our machine learning approaches with BLAST-based similarity and motif-based methods. However, the performance of these hybrid models either matched or was inferior to that of our best machine-learning model. Our best model based on evolutionary information outperforms all existing methods on the independent/validation dataset. To facilitate users, a user-friendly web server with a standalone package named "AFPropred" was developed (https://webs.iiitd.edu.in/raghava/afpropred).
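For readers unfamiliar with the MCC figure quoted in the abstract, the following minimal sketch shows how the Matthews correlation coefficient is computed from confusion-matrix counts. The counts in the example are hypothetical values chosen only to match the 80 AFP / 73 non-AFP test-set size; they are not results from the paper.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: return 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion matrix: 80 positives (70 found), 73 negatives (60 found).
example_mcc = mcc(tp=70, tn=60, fp=13, fn=10)
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside AUROC for AFP predictors.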
Affiliation(s)
- Nishant Kumar
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Shubham Choudhury
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Nisha Bajiya
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Sumeet Patiyal
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
  - Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
- Gajendra P S Raghava
  - Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
4
Song N, Jiang X, Gu J, Zhang B, Zhao H. Plant-based oat peptides as cryoprotectants mitigate freezing damage to Lactobacillus bulgaricus CICC 22163. Food Res Int 2025; 203:115855. [PMID: 40022378 DOI: 10.1016/j.foodres.2025.115855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2024] [Revised: 01/18/2025] [Accepted: 01/23/2025] [Indexed: 03/03/2025]
Abstract
The freezing process leads to the death of lactic acid bacteria (LAB), making cryoprotection a significant research focus. In this study, plant-derived oat peptides demonstrated effective cryoprotective effects against Lactobacillus bulgaricus. First, the oat peptide extraction process was optimized with cell viability as the indicator: the yield was found to be 59.51 %. After freezing, it was identified that a 40 mg/mL oat peptide solution provided the best protective effect. The oat peptides enhanced fermentation vigor and reduced cell membrane damage. The mechanisms of action were explored. The oat peptides preserved the intact morphology of cells and significantly improved the activity of lactic dehydrogenase and β-galactosidase. Additionally, the peptides lowered the freezing point to -2.1 °C, which mitigated ice crystal edge formation and reduced both ice crystal diameter and area. The oat peptides physically adsorbed onto the surface of the bacteria, exerting an antifreeze effect. Finally, based on amino acid evaluation, three peptide fragments (LSCDKYCFME, FDGCFMEN, and QHCWLGGK) were synthesized, and these synthesized peptides effectively increased the survival rate of L. bulgaricus, with QHCWLGGK also exhibiting protective effects. Therefore, plant-based oat peptides can be utilized as cryoprotectants for freezing LAB.
Affiliation(s)
- Nannan Song
  - College of Biological Science & Biotechnology, Beijing Key Laboratory of Forest Food Processing and Safety, Beijing Forestry University, Beijing 100083 China
- Xiaoying Jiang
  - China National Research Institute of Food and Fermentation Industries Corporation Limited, Beijing 100015 China
- Jiabao Gu
  - College of Biological Science & Biotechnology, Beijing Key Laboratory of Forest Food Processing and Safety, Beijing Forestry University, Beijing 100083 China
- Bolin Zhang
  - College of Biological Science & Biotechnology, Beijing Key Laboratory of Forest Food Processing and Safety, Beijing Forestry University, Beijing 100083 China
- Hongfei Zhao
  - College of Biological Science & Biotechnology, Beijing Key Laboratory of Forest Food Processing and Safety, Beijing Forestry University, Beijing 100083 China.
5
Jiang W, Yang F, Cai D, Du J, Wu M, Cai X, Chen X, Wang S. Peptidomics & Molecular Simulation-Based Specific Screening of Antifreeze Peptides from Evynnis japonica Scale and the Action Mechanism. J Agric Food Chem 2025; 73:2634-2644. [PMID: 39804014 DOI: 10.1021/acs.jafc.4c09419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2025]
Abstract
This study aims to explore the cryoprotective mechanisms of food-derived hydrolyzed peptides and develop novel cryoprotectants to enhance the quality of frozen foods. Evynnis japonica scale antifreeze peptides (Ej-AFP) were prepared using enzymatic hydrolysis and showed a 4-fold increase in protection efficiency for surimi compared to traditional cryoprotectants. Furthermore, Ej-AFP was able to control 63.60% of the ice crystals to sizes below 600 μm². Three antifreeze peptide sequences were purified by using ice-affinity techniques and peptidomics. These sequences demonstrated a 21.75% enhancement in antifreeze activity and an increase of 1 °C in thermal hysteresis activity compared to Ej-AFP. Molecular simulation showed that the ice-binding surface interacts with ice crystals through hydrogen bonds, while the non-ice-binding surface disrupts the orderly arrangement of water molecules. This results in a tightly structured hydration layer around the peptide, increasing the curvature of the ice crystal surface and thereby demonstrating significant antifreeze activity in controlling ice crystal growth.
Affiliation(s)
- Wenting Jiang
  - College of Chemical Engineering, College of Biological Science and Engineering, Fuzhou University, Fuzhou 350108, P.R.China
- Fujia Yang
  - College of Chemical Engineering, College of Biological Science and Engineering, Fuzhou University, Fuzhou 350108, P.R.China
  - Marine and Agricultural Biotechnology Laboratory, Fuzhou Institute of Oceanography, College of Geography and Oceanography, Minjiang University, Fuzhou 350108, P.R.China
- Dongna Cai
  - College of Chemical Engineering, College of Biological Science and Engineering, Fuzhou University, Fuzhou 350108, P.R.China
- Jia Du
  - College of Chemical Engineering, College of Biological Science and Engineering, Fuzhou University, Fuzhou 350108, P.R.China
- Manman Wu
  - College of Chemical Engineering, College of Biological Science and Engineering, Fuzhou University, Fuzhou 350108, P.R.China
- Xixi Cai
  - College of Chemical Engineering, College of Biological Science and Engineering, Fuzhou University, Fuzhou 350108, P.R.China
- Xu Chen
  - College of Chemical Engineering, College of Biological Science and Engineering, Fuzhou University, Fuzhou 350108, P.R.China
- Shaoyun Wang
  - College of Chemical Engineering, College of Biological Science and Engineering, Fuzhou University, Fuzhou 350108, P.R.China
6
Wu J, Liu Y, Zhu Y, Yu DJ. Improving Antifreeze Proteins Prediction With Protein Language Models and Hybrid Feature Extraction Networks. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:2349-2358. [PMID: 39316498 DOI: 10.1109/tcbb.2024.3467261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2024]
Abstract
Accurate identification of antifreeze proteins (AFPs) is crucial in developing biomimetic synthetic anti-icing materials and low-temperature organ preservation materials. Although numerous machine learning-based methods have been proposed for AFPs prediction, the complex and diverse nature of AFPs limits the prediction performance of existing methods. In this study, we propose AFP-Deep, a new deep learning method to predict antifreeze proteins by integrating embedding from protein sequences with pre-trained protein language models and evolutionary contexts with hybrid feature extraction networks. The experimental results demonstrated that the main advantage of AFP-Deep is its utilization of pre-trained protein language models, which can extract discriminative global contextual features from protein sequences. Additionally, the hybrid deep neural networks designed for protein language models and evolutionary context feature extraction enhance the correlation between embeddings and antifreeze pattern. The performance evaluation results show that AFP-Deep achieves superior performance compared to state-of-the-art models on benchmark datasets, achieving an AUPRC of 0.724 and 0.924, respectively.
7
Majura JJ, Chen X, Chen Z, Tan M, Zhu G, Gao J, Lin H, Cao W. The cryoprotective effect of Litopenaeus vannamei head-derived peptides and its ice-binding mechanism. Curr Res Food Sci 2024; 9:100886. [PMID: 39469721 PMCID: PMC11513795 DOI: 10.1016/j.crfs.2024.100886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2024] [Revised: 10/08/2024] [Accepted: 10/13/2024] [Indexed: 10/30/2024] Open
Abstract
Although discarded as waste, shrimp heads are a potential source of antifreeze peptides, which can be used as cryoprotectants in the food industry. Their utilization in frozen foods can help mitigate the negative effects caused by the freezing technique. Litopenaeus vannamei shrimp heads were autolyzed, and the shrimp head autolysate (SHA) was separated via ultra-filtration and ion exchange chromatography. The antifreeze effect of SHA on the biochemical properties of myofibrillar proteins of peeled shrimps during five freeze-thaw cycles was evaluated. Peptide screening was done using the LC-MS/MS technique. A molecular docking (MD) study of the interaction between ice and shrimp head-derived antifreeze peptides was done. Results showed that SHA has a maximum thermal hysteresis value of 1.84 °C. During the freeze-thaw cycles, SHA exhibited an antifreeze effect on frozen peeled shrimps. The 1.0% and 3.0% SHA groups showed significantly lower freeze denaturation than the negative control group. The muscle tissues of SHA-treated groups were not as severely damaged as those of the negative control group. The molecular docking study revealed that the shrimp head-derived AFPs bound to ice via hydrogen bonding, and both hydrophilic and hydrophobic amino acid residues were involved in the ice-binding interactions. Six ice-binding sites were involved in the peptide-ice interaction. Our findings suggest that shrimp head-derived AFPs can be developed into functional additives in frozen foods and add more insights into the existing literature on antifreeze peptides.
Affiliation(s)
- Julieth Joram Majura
  - College of Food Science and Technology, Guangdong Ocean University, Zhanjiang 524088, China
- Xiujuan Chen
  - College of Food Science and Technology, Guangdong Ocean University, Zhanjiang 524088, China
- Zhongqin Chen
  - College of Food Science and Technology, Guangdong Ocean University, Zhanjiang 524088, China
  - Guangdong Provincial Key Laboratory of Aquatic Products Processing and Safety, Guangdong Provincial Engineering Technology Research Center of Seafood, Zhanjiang 524088, China
  - Guangdong Province Engineering Laboratory for Marine Biological Products, Zhanjiang 524088, China
- Mingtang Tan
  - College of Food Science and Technology, Guangdong Ocean University, Zhanjiang 524088, China
  - Guangdong Provincial Key Laboratory of Aquatic Products Processing and Safety, Guangdong Provincial Engineering Technology Research Center of Seafood, Zhanjiang 524088, China
  - Guangdong Province Engineering Laboratory for Marine Biological Products, Zhanjiang 524088, China
- Guoping Zhu
  - College of Food Science and Technology, Guangdong Ocean University, Zhanjiang 524088, China
  - Guangdong Provincial Key Laboratory of Aquatic Products Processing and Safety, Guangdong Provincial Engineering Technology Research Center of Seafood, Zhanjiang 524088, China
  - Guangdong Province Engineering Laboratory for Marine Biological Products, Zhanjiang 524088, China
- Jialong Gao
  - College of Food Science and Technology, Guangdong Ocean University, Zhanjiang 524088, China
  - Guangdong Provincial Key Laboratory of Aquatic Products Processing and Safety, Guangdong Provincial Engineering Technology Research Center of Seafood, Zhanjiang 524088, China
  - Guangdong Province Engineering Laboratory for Marine Biological Products, Zhanjiang 524088, China
- Haisheng Lin
  - College of Food Science and Technology, Guangdong Ocean University, Zhanjiang 524088, China
  - Guangdong Provincial Key Laboratory of Aquatic Products Processing and Safety, Guangdong Provincial Engineering Technology Research Center of Seafood, Zhanjiang 524088, China
  - Guangdong Province Engineering Laboratory for Marine Biological Products, Zhanjiang 524088, China
- Wenhong Cao
  - College of Food Science and Technology, Guangdong Ocean University, Zhanjiang 524088, China
  - Guangdong Provincial Key Laboratory of Aquatic Products Processing and Safety, Guangdong Provincial Engineering Technology Research Center of Seafood, Zhanjiang 524088, China
  - Guangdong Province Engineering Laboratory for Marine Biological Products, Zhanjiang 524088, China
8
Feng C, Wei H, Li X, Feng B, Xu C, Zhu X, Liu R. A stacking-based algorithm for antifreeze protein identification using combined physicochemical, pseudo amino acid composition, and reduction property features. Comput Biol Med 2024; 176:108534. [PMID: 38754217 DOI: 10.1016/j.compbiomed.2024.108534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Revised: 04/03/2024] [Accepted: 04/28/2024] [Indexed: 05/18/2024]
Abstract
Antifreeze proteins have wide applications in the medical and food industries. In this study, we propose a stacking-based classifier that can effectively identify antifreeze proteins. Initially, feature extraction was performed in three aspects: reduction properties, scalable pseudo amino acid composition, and physicochemical properties. A hybrid feature set comprised of the combined information from these three categories was obtained. Subsequently, we trained the training set based on LightGBM, XGBoost, and RandomForest algorithms, and the training outcomes were passed to the Logistic algorithm for matching, thereby establishing a stacking algorithm. The proposed algorithm was tested on the test set and an independent validation set. Experimental data indicates that the algorithm achieved a recognition accuracy of 98.3 %, and an accuracy of 98.5 % on the validation set. Lastly, we analyzed the reasons why numerical features achieved high recognition capabilities from multiple aspects. Data dimensionality reduction and the analysis from two-dimensional and three-dimensional views revealed separability between positive and negative samples, and the protein three-dimensional structure further demonstrated significant differences in related features between the two samples. Analysis of the classifier revealed that Hr*Hr, HrHr, and Sc-PseAAC_1, 188D(152,116,57,183) were among the seven most important numerical features affecting algorithm recognition. For Hr*Hr and HrHr, supportive sequence level evidence for the reduction dictionary was found in terms of conservation area analysis, multiple sequence alignment, and amino acid conservative substitution. Moreover, the importance of the reduction dictionary was recognized through a comparative analysis of importance before and after the reduction, realizing the effectiveness of the dictionary in improving feature importance. 
A decision tree model has been utilized to discern the distinctions between dipeptides associated with the physical and chemical properties of His(H), Iso(I), Leu(L), and Lys(K) and other dipeptides. We finally analyzed the other seven features of importance, and data analysis confirmed that hydrophobicity, secondary structure, charge properties, van der Waals forces, and solvent accessibility are also factors affecting the antifreeze capability of proteins.
Affiliation(s)
- Changli Feng
  - Department of Information Science and Technology, Taishan University, Taian, 271000, China.
- Haiyan Wei
  - Department of Information Science and Technology, Taishan University, Taian, 271000, China.
- Xin Li
  - Department of Information Science and Technology, Taishan University, Taian, 271000, China.
- Bin Feng
  - Department of Information Science and Technology, Taishan University, Taian, 271000, China.
- Chugui Xu
  - Department of Information Science and Technology, Taishan University, Taian, 271000, China.
- Xiaorong Zhu
  - Department of Information Science and Technology, Taishan University, Taian, 271000, China.
- Ruijun Liu
  - School of Software, Beihang University, Beijing, 100191, China.
9
Box ICH, van der Burg KRL, Marshall KE. Analysis of Ice-Binding Protein Evolution. Methods Mol Biol 2024; 2730:219-229. [PMID: 37943462 DOI: 10.1007/978-1-0716-3503-2_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2023]
Abstract
Discovering novel ice-binding proteins (IBPs) is important for understanding the evolution of IBPs but it is difficult to determine where resources should be directed in the search for novel IBPs. For this reason, we developed a simple bioinformatic approach for aiding in the determination of where to direct efforts in the search for IBPs. First, BLAST is used to obtain a candidate list of putative IBPs. Next, phylogenetic trees are constructed to map the candidate list of putative IBPs to determine if any patterns are forming. These candidate putative IBPs and their patterns are then assessed through the production of ancestral sequences and reverse BLAST searches, in addition to the use of IBP calculators, to determine which sequences should be cut to produce the final putative IBP list. Finally, we explain an avenue to investigate these putative IBPs further for the development of hypotheses on their evolution.
Affiliation(s)
- Isaiah C H Box
  - Department of Zoology, University of British Columbia, Vancouver, BC, Canada
- Katie E Marshall
  - Department of Zoology, University of British Columbia, Vancouver, BC, Canada.
10
Dhibar S, Jana B. Accurate Prediction of Antifreeze Protein from Sequences through Natural Language Text Processing and Interpretable Machine Learning Approaches. J Phys Chem Lett 2023; 14:10727-10735. [PMID: 38009833 DOI: 10.1021/acs.jpclett.3c02817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Antifreeze proteins (AFPs) bind to growing ice planes owing to their structural complementarity, thereby inhibiting ice-crystal growth by thermal hysteresis. Classification of AFPs from sequence is a difficult task due to their low sequence similarity, and therefore, the usual sequence similarity algorithms, like BLAST and PSI-BLAST, are not efficient. Here, a method combining n-gram feature vectors and machine learning models to accelerate the identification of potential AFPs from sequences is proposed. All these n-gram features are extracted using the k-mer counting method. The comparative analysis reveals that, among different machine learning models, XGBoost outperforms others in predicting AFPs from sequence when penta-mers are used as a feature vector. When tested on an independent dataset, our method performed better compared to other existing ones with sensitivity of 97.50%, recall of 98.30%, and f1 score of 99.10%. Further, we used the SHAP method, which provides important insight into the functional activity of AFPs.
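The k-mer counting step described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function names, the toy sequence, and the use of k = 2 in the example are assumptions (the abstract reports penta-mers, for which the full vocabulary has 20^5 entries and a sparse representation would be more practical).

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer (n-gram) in a protein sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def build_vocabulary(k: int) -> list:
    """All possible k-mers over the standard alphabet, in a fixed order."""
    return ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]


def kmer_vector(sequence: str, k: int, vocabulary: list) -> list:
    """Fixed-length count vector suitable as input to a classifier."""
    counts = kmer_counts(sequence, k)
    return [counts.get(kmer, 0) for kmer in vocabulary]


# Toy example with dimers; the sequence is made up for illustration.
vocab = build_vocabulary(2)
features = kmer_vector("MKTAYIAKQR", 2, vocab)
```

Each sequence becomes a vector of the same length regardless of protein size, which is what lets a tree-based model such as XGBoost consume it directly.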
Affiliation(s)
- Saikat Dhibar
  - School of Chemical Sciences, Indian Association for the Cultivation of Science, Jadavpur, Kolkata 700032, India
- Biman Jana
  - School of Chemical Sciences, Indian Association for the Cultivation of Science, Jadavpur, Kolkata 700032, India
11
Vasconcelos Rissi D, Ijaz M, Baschien C. Comparative genome analysis of the freshwater fungus Filosporella fistucella indicates potential for plant-litter degradation at cold temperatures. G3 (Bethesda) 2023; 13:jkad190. [PMID: 37619983 PMCID: PMC10627260 DOI: 10.1093/g3journal/jkad190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Revised: 08/03/2023] [Accepted: 08/09/2023] [Indexed: 08/26/2023]
Abstract
Freshwater fungi play an important role in the decomposition of organic matter of leaf litter in rivers and streams. They also possess the necessary mechanisms to endure lower temperatures caused by habitat and weather variations. This includes the production of cold-active enzymes and antifreeze proteins. To better understand the physiological activities of freshwater fungi in their natural environment, different methods are being applied, and genome sequencing is one in the spotlight. In our study, we sequenced the first genome of the freshwater fungus Filosporella fistucella (45.7 Mb) and compared it with the genome of the evolutionarily closely related species Tricladium varicosporioides (48.2 Mb). The genomes were annotated using the carbohydrate-active enzyme database, and we then filtered for leaf-litter degradation-related enzymes (cellulase, hemicellulase, laccase, pectinase, cutinase, amylase, xylanase, and xyloglucanase). Those enzymes were analyzed for antifreeze properties using a machine-learning approach. We discovered that F. fistucella has more enzymes participating in the breakdown of sugar, leaf, and wood than T. varicosporioides (855 and 719, respectively). Filosporella fistucella shows a larger set of enzymes capable of resisting cold temperatures than T. varicosporioides (75 and 66, respectively). Our findings indicate that in comparison with T. varicosporioides, F. fistucella has a greater capacity for aquatic growth, adaptability to freshwater environments, and resistance to low temperatures.
Affiliation(s)
- Daniel Vasconcelos Rissi
  - Leibniz - Institute DSMZ, German Collection of Microorganisms and Cell Cultures, 38124 Braunschweig, Germany
- Maham Ijaz
  - Leibniz - Institute DSMZ, German Collection of Microorganisms and Cell Cultures, 38124 Braunschweig, Germany
- Christiane Baschien
  - Leibniz - Institute DSMZ, German Collection of Microorganisms and Cell Cultures, 38124 Braunschweig, Germany
12
Khan A, Uddin J, Ali F, Kumar H, Alghamdi W, Ahmad A. AFP-SPTS: An Accurate Prediction of Antifreeze Proteins Using Sequential and Pseudo-Tri-Slicing Evolutionary Features with an Extremely Randomized Tree. J Chem Inf Model 2023; 63:826-834. [PMID: 36649569 DOI: 10.1021/acs.jcim.2c01417] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Indexed: 01/19/2023]
Abstract
The development of intracellular ice in the bodies of cold-blooded living organisms may cause them to die. These species yield antifreeze proteins (AFPs) to live in subzero temperature environments. Additionally, AFPs are implemented in biotechnological, industrial, agricultural, and medical fields. Machine learning-based predictors were presented for AFP identification. However, more accurate predictors are still highly desirable for boosting the AFP prediction. This work presents a novel approach, named AFP-SPTS, for the correct prediction of AFPs. We explored the discriminative features with four schemes, namely, dipeptide deviation from the expected mean (DDE), reduced amino acid alphabet (RAAA), grouped dipeptide composition (GDPC), and a novel representative method, called pseudo-position-specific scoring matrix tri-slicing (PseTS-PSSM). Considering the advantages of ensemble learning strategy, we fused each feature vector into different combinations and trained the models with five machine learning algorithms, i.e., multilayer perceptron (MLP), extremely randomized tree (ERT), decision tree (DT), random forest (RF), and AdaBoost. Among all models, PseTS-PSSM + RAAA with an extremely randomized tree attained the best outcomes. The proposed predictor (AFP-SPTS) boosted the accuracies of AFPs in the literature by 1.82 and 4.1%.
Affiliation(s)
- Adnan Khan
- Qurtuba University of Science and Information Technology, Peshawar 5000, Khyber Pakhtunkhwa, Pakistan
| | - Jamal Uddin
- Qurtuba University of Science and Information Technology, Peshawar 5000, Khyber Pakhtunkhwa, Pakistan
| | - Farman Ali
- Sarhad University of Science and Information Technology, Mardan Campus, Peshawar 23200, Pakistan; Department of Elementary and Secondary Education, Government of Khyber Pakhtunkhwa, Peshawar 5000, Khyber Pakhtunkhwa, Pakistan
| | - Harish Kumar
- Department of Computer Science, College of Computer Science, King Khalid University, Abha 61421, Saudi Arabia
| | - Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King AbdulAziz University, Jeddah 21589, Saudi Arabia
| | - Aftab Ahmad
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan
| |
|
13
|
Khan A, Uddin J, Ali F, Ahmad A, Alghushairy O, Banjar A, Daud A. Prediction of antifreeze proteins using machine learning. Sci Rep 2022; 12:20672. [PMID: 36450775 PMCID: PMC9712683 DOI: 10.1038/s41598-022-24501-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 05/16/2022] [Accepted: 11/16/2022] [Indexed: 12/03/2022] Open
Abstract
Living organisms including fishes, microbes, and animals can live in extremely cold weather. To stay alive in cold environments, these species generate antifreeze proteins (AFPs), also referred to as ice-binding proteins. Moreover, AFPs are extensively utilized in many important fields, including medicine, agriculture, industry, and biotechnology. Several predictors have been constructed to identify AFPs. However, due to the sequence and structural heterogeneity of AFPs, correct identification is still a challenging task, and a more reliable predictor is highly desirable. In this research, a novel computational method named AFP-LXGB is proposed to predict AFPs more precisely. Sequence information is extracted with Dipeptide Composition (DPC), Grouped Amino Acid Composition (GAAC), Position Specific Scoring Matrix-Segmentation-Autocorrelation Transformation (Sg-PSSM-ACT), and Pseudo Position Specific Scoring Matrix Tri-Slicing (PseTS-PSSM). To retain the benefits of ensemble learning, these feature sets are concatenated into different combinations. The best feature set is selected by Extremely Randomized Tree-Recursive Feature Elimination (ERT-RFE). The models are trained with Light eXtreme Gradient Boosting (LXGB), Random Forest (RF), and Extremely Randomized Tree (ERT). Among the classifiers, LXGB obtained the best prediction results. The novel method (AFP-LXGB) improved accuracy by 3.70% and 4.09% over the best existing methods. These results indicate that AFP-LXGB can predict AFPs more accurately and can play a significant role in medical, agricultural, industrial, and biotechnological fields.
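As an illustration of the simplest descriptor listed in this abstract, the sketch below computes a grouped amino acid composition (GAAC) vector in Python. The five-class grouping (aliphatic, aromatic, positive, negative, uncharged) is a widely used convention and is an assumption here; the abstract does not state which grouping AFP-LXGB uses, and the function name `gaac` is hypothetical.

```python
# Assumed five-class physicochemical grouping of the 20 standard residues.
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}
STANDARD_RESIDUES = set("".join(GROUPS.values()))


def gaac(sequence):
    """Return the fraction of residues that fall into each group."""
    # Keep only the 20 standard residues so the fractions sum to 1.
    residues = [r for r in sequence.upper() if r in STANDARD_RESIDUES]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return {
        name: sum(r in members for r in residues) / len(residues)
        for name, members in GROUPS.items()
    }
```

Because the output has only five dimensions, a descriptor like this is usually concatenated with richer features such as dipeptide composition or PSSM-derived vectors before training.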
Affiliation(s)
- Adnan Khan
- Qurtuba University of Science and Technology, Peshawar, Khyber Pakhtunkhwa, Pakistan
| | - Jamal Uddin
- Qurtuba University of Science and Technology, Peshawar, Khyber Pakhtunkhwa, Pakistan
| | - Farman Ali
- Department of Elementary and Secondary Education, Peshawar, Khyber Pakhtunkhwa, Pakistan; Sarhad University of Science and Information Technology, Mardan, Pakistan
| | - Ashfaq Ahmad
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
| | - Omar Alghushairy
- Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
| | - Ameen Banjar
- Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
| | - Ali Daud
- Abu Dhabi School of Management, Abu Dhabi, United Arab Emirates; Department of Computer Science and Artificial Intelligence, University of Jeddah, Jeddah, Saudi Arabia
| |
|
14
|
Satyakam, Zinta G, Singh RK, Kumar R. Cold adaptation strategies in plants—An emerging role of epigenetics and antifreeze proteins to engineer cold resilient plants. Front Genet 2022; 13:909007. [PMID: 36092945 PMCID: PMC9459425 DOI: 10.3389/fgene.2022.909007] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/31/2022] [Accepted: 07/21/2022] [Indexed: 11/13/2022] Open
Abstract
Cold stress adversely affects plant growth, development, and yield. Also, the spatial and geographical distribution of plant species is influenced by low temperatures. Cold stress includes chilling and/or freezing temperatures, which trigger entirely different plant responses. Freezing tolerance is acquired via the cold acclimation process, which involves prior exposure to non-lethal low temperatures followed by profound alterations in cell membrane rigidity, transcriptome, compatible solutes, pigments and cold-responsive proteins such as antifreeze proteins. Moreover, epigenetic mechanisms such as DNA methylation, histone modifications, chromatin dynamics and small non-coding RNAs play a crucial role in cold stress adaptation. Here, we provide a recent update on cold-induced signaling and regulatory mechanisms. Emphasis is given to the role of epigenetic mechanisms and antifreeze proteins in imparting cold stress tolerance in plants. Lastly, we discuss genetic manipulation strategies to improve cold tolerance and develop cold-resistant plants.
|
15
|
Zhu K, Zheng Z, Dai Z. Identification of antifreeze peptides in shrimp byproducts autolysate using peptidomics and bioinformatics. Food Chem 2022; 383:132568. [PMID: 35255363 DOI: 10.1016/j.foodchem.2022.132568] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/08/2021] [Revised: 02/22/2022] [Accepted: 02/24/2022] [Indexed: 11/04/2022]
Abstract
In the present study, a novel method based on peptidomics and bioinformatics was applied to the identification and characterization of antifreeze peptides (AFPs) from shrimp byproducts autolysate (SBPA). According to the results of in silico prediction and high peptide structural inflexibility, DEYEESGPGIVH and EQICINFCNEK were picked as potential AFP-1 and AFP-2, respectively. DSC determination indicated that the TH values of synthesized AFP-1 and AFP-2 (10 mg/mL) were 1.37 °C and 1.57 °C, respectively. In addition, 0.1%-3% AFPs showed significant cryoprotection in shrimp muscle after 3 and 6 freeze-thaw cycles, evidenced by higher SSP content, Ca2+-ATPase activity, and sulfhydryl content and lower surface hydrophobicity than the control, with higher concentrations giving better protection against freeze-induced denaturation. Both AFP-1 and AFP-2 showed favorable hydrogen bonding affinity, which facilitated ice binding and inhibition of ice crystal growth. This work could provide new ideas for the identification and characterization of AFPs.
Affiliation(s)
- Kai Zhu
- The Joint Key Laboratory of Aquatic Products Processing of Zhejiang Province, 310012 Hangzhou, China; Institute of Seafood, Zhejiang Gongshang University, 310012 Hangzhou, China
| | - Zhenxiao Zheng
- The Joint Key Laboratory of Aquatic Products Processing of Zhejiang Province, 310012 Hangzhou, China; Institute of Seafood, Zhejiang Gongshang University, 310012 Hangzhou, China
| | - Zhiyuan Dai
- The Joint Key Laboratory of Aquatic Products Processing of Zhejiang Province, 310012 Hangzhou, China; Institute of Seafood, Zhejiang Gongshang University, 310012 Hangzhou, China.
| |
|
16
|
Box ICH, Matthews BJ, Marshall KE. Molecular evidence of intertidal habitats selecting for repeated ice-binding protein evolution in invertebrates. J Exp Biol 2022; 225:274373. [PMID: 35258616 DOI: 10.1242/jeb.243409] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/30/2021] [Accepted: 12/20/2021] [Indexed: 12/21/2022]
Abstract
Ice-binding proteins (IBPs) have evolved independently in multiple taxonomic groups to improve their survival at sub-zero temperatures. Intertidal invertebrates in temperate and polar regions frequently encounter sub-zero temperatures, yet there is little information on IBPs in these organisms. We hypothesized that there are far more IBPs than are currently known and that the occurrence of freezing in the intertidal zone selects for these proteins. We compiled a list of genome-sequenced invertebrates across multiple habitats and a list of known IBP sequences and used BLAST to identify a wide array of putative IBPs in those invertebrates. We found that the probability of an invertebrate species having an IBP was significantly greater in intertidal species than in those primarily found in open ocean or freshwater habitats. These intertidal IBPs had high sequence similarity to fish and tick antifreeze glycoproteins and fish type II antifreeze proteins. Previously established classifiers based on machine learning techniques further predicted ice-binding activity in the majority of our newly identified putative IBPs. We investigated the potential evolutionary origin of one putative IBP from the hard-shelled mussel Mytilus coruscus and suggest that it arose through gene duplication and neofunctionalization. We show that IBPs likely readily evolve in response to freezing risk and that there is an array of uncharacterized IBPs, and highlight the need for broader laboratory-based surveys of the diversity of ice-binding activity across diverse taxonomic and ecological groups.
Affiliation(s)
- Isaiah C H Box
- Department of Zoology, University of British Columbia, 6270 University Blvd, Vancouver, BC, Canada V6T 1Z4
| | - Benjamin J Matthews
- Department of Zoology, University of British Columbia, 6270 University Blvd, Vancouver, BC, Canada V6T 1Z4
| | - Katie E Marshall
- Department of Zoology, University of British Columbia, 6270 University Blvd, Vancouver, BC, Canada V6T 1Z4
| |
|
17
|
Ali F, Akbar S, Ghulam A, Maher ZA, Unar A, Talpur DB. AFP-CMBPred: Computational identification of antifreeze proteins by extending consensus sequences into multi-blocks evolutionary information. Comput Biol Med 2021; 139:105006. [PMID: 34749096 DOI: 10.1016/j.compbiomed.2021.105006] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 09/13/2021] [Revised: 10/29/2021] [Accepted: 10/29/2021] [Indexed: 11/30/2022]
Abstract
In extremely cold environments, living organisms like plants, animals, fishes, and microbes can die due to intracellular ice formation in their bodies. To sustain life in such cold environments, some cold-blooded species produce antifreeze proteins (AFPs), also called ice-binding proteins. AFPs are not limited to the medical field; they are also significant in biotechnology, agriculture, and the food industry. Different AFPs exhibit high heterogeneity in their structures and sequences. Given the significance of AFPs, several machine-learning-based models have been developed for their prediction. However, due to the complex and diverse nature of AFPs, the prediction performance of the existing methods is limited. Therefore, a reliable computational model that can accurately predict AFPs is needed. To this end, this study presents a novel predictor for AFPs, named AFP-CMBPred. The sequences of AFPs are formulated via four different feature representation methods, namely Amphiphilic pseudo amino acid composition (Amp-PseAAC), Dipeptide Deviation from Expected Mean (DDE), Multi-Blocks Position Specific Scoring Matrix (MB-PSSM), and Consensus Sequence-based on Multi-Blocks Position Specific Scoring Matrix (CS-MB-PSSM), to collect local and global descriptors. In the next step, the extracted feature vectors are evaluated via Support Vector Machine (SVM) and Random Forest (RF) based classification learners. The prediction performance of both classifiers is further assessed using three validation methods, i.e., the jackknife test, the 10-fold cross-validation test, and the independent test. After examining the prediction rates of all validation tests, it was found that our proposed model achieved prediction accuracies higher by ∼2.65%, ∼2.84%, and ∼3.37% using the jackknife, K-fold, and independent tests, respectively. The experimental outcomes validate that our proposed "AFP-CMBPred" predictor achieved better prediction results than the existing models for the identification of AFPs. It is further anticipated that our proposed AFP-CMBPred model will be considered a valuable tool in academic research and drug development.
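The validation schemes named in this abstract can be sketched in a few lines of Python. The example below generates train/test index splits for K-fold cross-validation and treats the jackknife test as the leave-one-out special case in which the number of folds equals the number of samples. The function names are hypothetical, and the sketch assumes the samples have already been shuffled; it performs no stratification.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def jackknife_indices(n_samples):
    """Leave-one-out splits: each sample is the test set exactly once."""
    return k_fold_indices(n_samples, n_samples)
```

An independent test differs from both schemes in that the held-out sequences are never used during training or model selection, so it needs a separate dataset and not a different way of splitting the same one.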
Affiliation(s)
- Farman Ali
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China.
| | - Shahid Akbar
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
| | - Ali Ghulam
- Computerization and Network Section, Sindh Agriculture University, Tandojam, Pakistan
| | | | - Ahsanullah Unar
- School of Life Science, University of Science and Technology, China
| | - Dhani Bux Talpur
- School of Information and Communication Engineering, Guilin University of Electronic Technology, Guilin, China
| |
|
18
|
Al-Saggaf UM, Usman M, Naseem I, Moinuddin M, Jiman AA, Alsaggaf MU, Alshoubaki HK, Khan S. ECM-LSE: Prediction of Extracellular Matrix Proteins Using Deep Latent Space Encoding of k-Spaced Amino Acid Pairs. Front Bioeng Biotechnol 2021; 9:752658. [PMID: 34722479 PMCID: PMC8552119 DOI: 10.3389/fbioe.2021.752658] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/03/2021] [Accepted: 09/13/2021] [Indexed: 12/26/2022] Open
Abstract
Extracellular matrix (ECM) proteins create complex networks of macromolecules which fill in the extracellular spaces of living tissues. They provide structural support and play an important role in maintaining cellular functions. Identification of ECM proteins can play a vital role in studying various types of diseases. Conventional wet lab-based methods are reliable; however, they are expensive and time consuming and are, therefore, not scalable. In this research, we propose a novel sequence-based machine learning approach for the prediction of ECM proteins. In the proposed method, composition of k-spaced amino acid pair (CKSAAP) features are encoded into a classifiable latent space (LS) with the help of deep latent space encoding (LSE). A comprehensive ablation analysis is conducted for performance evaluation of the proposed method. Results are compared with other state-of-the-art methods on the benchmark dataset, and the proposed ECM-LSE approach has been shown to comprehensively outperform the contemporary methods.
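The CKSAAP features that feed the latent space encoder can be illustrated with a short Python sketch. It counts residue pairs separated by 0 to `max_gap` intervening positions and normalizes each count by the number of pairs available at that gap. The function name `cksaap`, the default `max_gap=2`, and the feature naming scheme are assumptions for illustration; the abstract does not state the gap range used by ECM-LSE, and the deep encoding step is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(sequence, max_gap=2):
    """Return normalized k-spaced pair frequencies for gaps 0..max_gap.

    The result has 400 * (max_gap + 1) entries keyed as '<pair>.gap<k>'.
    """
    seq = sequence.upper()
    features = {}
    for gap in range(max_gap + 1):
        step = gap + 1                  # offset between the two paired residues
        n_pairs = len(seq) - step       # number of pairs available at this gap
        counts = dict.fromkeys(PAIRS, 0)
        for i in range(max(n_pairs, 0)):
            pair = seq[i] + seq[i + step]
            if pair in counts:          # skip pairs with non-standard residues
                counts[pair] += 1
        for pair in PAIRS:
            value = counts[pair] / n_pairs if n_pairs > 0 else 0.0
            features[f"{pair}.gap{gap}"] = value
    return features
```

With `max_gap=2` the vector has 1200 dimensions, most of them zero for short sequences, which is one reason a compact learned encoding of these features can be attractive.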
Affiliation(s)
- Ubaid M. Al-Saggaf
- Center of Excellence in Intelligent Engineering Systems, King Abdulaziz University, Jeddah, Saudi Arabia
- Electrical and Computer Engineering Department, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Muhammad Usman
- Department of Computer Engineering, Chosun University, Gwangju, South Korea
| | - Imran Naseem
- Research and Development, Love For Data, Karachi, Pakistan
- School of Electrical, Electronic and Computer Engineering, The University of Western Australia, Perth, WA, Australia
- College of Engineering, Karachi Institute of Economics and Technology, Korangi Creek, Karachi, Pakistan
| | - Muhammad Moinuddin
- Center of Excellence in Intelligent Engineering Systems, King Abdulaziz University, Jeddah, Saudi Arabia
- Electrical and Computer Engineering Department, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Ahmad A. Jiman
- Electrical and Computer Engineering Department, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Mohammed U. Alsaggaf
- Center of Excellence in Intelligent Engineering Systems, King Abdulaziz University, Jeddah, Saudi Arabia
- Department of Radiology, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Hitham K. Alshoubaki
- Center of Excellence in Intelligent Engineering Systems, King Abdulaziz University, Jeddah, Saudi Arabia
- Electrical and Computer Engineering Department, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Shujaat Khan
- Department of Bio and Brain Engineering, Daejeon, South Korea
| |
|
19
|
Malik AA, Chotpatiwetchkul W, Phanus-Umporn C, Nantasenamat C, Charoenkwan P, Shoombuatong W. StackHCV: a web-based integrative machine-learning framework for large-scale identification of hepatitis C virus NS5B inhibitors. J Comput Aided Mol Des 2021; 35:1037-1053. [PMID: 34622387 DOI: 10.1007/s10822-021-00418-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/07/2021] [Accepted: 09/17/2021] [Indexed: 01/07/2023]
Abstract
Fast and accurate identification of inhibitors with potency against HCV NS5B polymerase is currently a challenging task. Although conventional experimental methods are the gold standard for the design and development of new HCV inhibitors, they often require a costly investment of time and resources. In this study, we develop a novel machine learning-based meta-predictor (termed StackHCV) for accurate and large-scale identification of HCV inhibitors. Unlike the existing method, which is based on a single-feature approach, we first constructed a pool of various baseline models by employing a wide range of heterogeneous molecular fingerprints with five popular machine learning algorithms (k-nearest neighbor, multi-layer perceptron, partial least squares, random forest and support vector machine). Secondly, we integrated these baseline models in order to develop the final meta-based model by means of the stacking strategy. Extensive benchmarking experiments showed that StackHCV achieved a more accurate and stable performance as compared to its constituent baseline models on the training dataset and also outperformed the existing predictor on the independent test dataset. To facilitate the high-throughput identification of HCV inhibitors, we built a web server that can be freely accessed at http://camt.pythonanywhere.com/StackHCV. It is expected that StackHCV could be a useful tool for fast and precise identification of potential drugs against HCV NS5B, particularly for liver cancer therapy and other clinical applications.
Affiliation(s)
- Aijaz Ahmad Malik
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Warot Chotpatiwetchkul
- Applied Computational Chemistry Research Unit, Department of Chemistry, School of Science, King Mongkut's Institute of Technology Ladkrabang, Bangkok, 10520, Thailand
| | - Chuleeporn Phanus-Umporn
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand.
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
| |
|
20
|
iBitter-Fuse: A Novel Sequence-Based Bitter Peptide Predictor by Fusing Multi-View Features. Int J Mol Sci 2021; 22:ijms22168958. [PMID: 34445663 PMCID: PMC8396555 DOI: 10.3390/ijms22168958] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 07/08/2021] [Revised: 08/08/2021] [Accepted: 08/17/2021] [Indexed: 12/19/2022] Open
Abstract
Accurate identification of bitter peptides is of great importance for better understanding their biochemical and biophysical properties. To date, machine learning-based methods have become effective approaches for identifying potential bitter peptides from large-scale protein datasets. Although a few machine learning-based predictors have been developed for identifying the bitterness of peptides, their prediction performance could be improved. In this study, we developed a new predictor (named iBitter-Fuse) for achieving more accurate identification of bitter peptides. In the proposed iBitter-Fuse, we have integrated a variety of feature encoding schemes to provide sufficient information from different aspects, namely compositional information and physicochemical properties. To enhance the predictive performance, the customized genetic algorithm utilizing self-assessment-report (GA-SAR) was employed to identify informative features, and the optimal ones were then input into a support vector machine (SVM)-based classifier to develop the final model (iBitter-Fuse). Benchmarking experiments based on both 10-fold cross-validation and independent tests indicated that iBitter-Fuse was able to achieve more accurate performance as compared to state-of-the-art methods. To facilitate the high-throughput identification of bitter peptides, the iBitter-Fuse web server was established and made freely available online. It is anticipated that iBitter-Fuse will be a useful tool for aiding the discovery and de novo design of bitter peptides.
|
21
|
Charoenkwan P, Shoombuatong W, Nantasupha C, Muangmool T, Suprasert P, Charoenkwan K. iPMI: Machine Learning-Aided Identification of Parametrial Invasion in Women with Early-Stage Cervical Cancer. Diagnostics (Basel) 2021; 11:diagnostics11081454. [PMID: 34441388 PMCID: PMC8391438 DOI: 10.3390/diagnostics11081454] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/10/2021] [Revised: 08/09/2021] [Accepted: 08/10/2021] [Indexed: 01/18/2023] Open
Abstract
Radical hysterectomy is a recommended treatment for early-stage cervical cancer. However, the procedure is associated with significant morbidities resulting from the removal of the parametrium. Parametrial cancer invasion (PMI) is found in a minority of patients, but an efficient system for predicting it is lacking. In this study, we develop a novel machine learning (ML)-based predictive model based on a random forest model (called iPMI) for the practical identification of PMI in women. Data of 1112 stage IA-IIA cervical cancer patients who underwent primary surgery were collected and considered as the training dataset, while data from an independent cohort of 116 consecutive patients were used as the independent test dataset. Based on these datasets, iPMI-Econ was then developed by using basic clinicopathological data available prior to surgery, while iPMI-Power was also introduced by adding pelvic node metastasis and uterine corpus invasion to the iPMI-Econ. Both 10-fold cross-validation and independent test results showed that iPMI-Power outperformed other well-known ML classifiers (e.g., logistic regression, decision tree, k-nearest neighbor, multi-layer perceptron, naive Bayes, support vector machine, and extreme gradient boosting). Upon comparison, it was found that iPMI-Power was effective and had a superior performance to other well-known ML classifiers in predicting PMI. It is anticipated that the proposed iPMI may serve as a cost-effective and rapid approach to guide important clinical decision-making.
Affiliation(s)
- Phasit Charoenkwan
- College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 73170, Thailand
| | - Chalaithorn Nantasupha
- Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand
| | - Tanarat Muangmool
- Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand
| | - Prapaporn Suprasert
- Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand
| | - Kittipat Charoenkwan
- Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand
| |
|
22
|
Charoenkwan P, Anuwongcharoen N, Nantasenamat C, Hasan MM, Shoombuatong W. In Silico Approaches for the Prediction and Analysis of Antiviral Peptides: A Review. Curr Pharm Des 2021; 27:2180-2188. [PMID: 33138759 DOI: 10.2174/1381612826666201102105827] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 05/10/2020] [Accepted: 08/20/2020] [Indexed: 11/22/2022]
Abstract
In light of the growing resistance toward current antiviral drugs, the discovery of novel and effective antiviral therapeutic agents remains a pressing scientific challenge. Antiviral peptides (AVPs) represent promising therapeutic agents due to their extraordinary advantages in terms of potency, efficacy and pharmacokinetic properties. The growing volume of newly discovered peptide sequences in the post-genomic era requires computational approaches for timely and accurate identification of AVPs. Machine learning (ML) methods such as random forest and support vector machine represent robust learning algorithms that are instrumental in successful peptide-based drug discovery. Therefore, this review summarizes the current state-of-the-art application of ML methods for identifying AVPs directly from the sequence information. We compare the efficiency of these methods in terms of the underlying characteristics of the dataset used along with feature encoding methods, ML algorithms, cross-validation methods and prediction performance. Finally, guidelines for the development of robust AVP models are also discussed. It is anticipated that this review will serve as a useful guide for the design and development of robust AVP and related therapeutic peptide predictors in the future.
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
| | - Nuttapat Anuwongcharoen
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| |
|
23
|
Wang S, Deng L, Xia X, Cao Z, Fei Y. Predicting antifreeze proteins with weighted generalized dipeptide composition and multi-regression feature selection ensemble. BMC Bioinformatics 2021; 22:340. [PMID: 34162327 PMCID: PMC8220696 DOI: 10.1186/s12859-021-04251-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/04/2021] [Accepted: 06/09/2021] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Antifreeze proteins (AFPs) are a group of proteins that prevent body fluids from forming ice crystals and thus improve biological antifreeze ability. They are vital to the survival of living organisms in extremely cold environments. However, little research has been performed on sequence feature extraction and selection for antifreeze protein classification in structure and function prediction, which is of great significance. RESULTS In this paper, to predict antifreeze proteins, a feature representation of weighted generalized dipeptide composition (W-GDipC) and an ensemble feature selection based on a two-stage, multi-regression method (LRMR-Ri) are proposed. Specifically, four feature selection algorithms, namely Lasso regression, Ridge regression, Maximal information coefficient and Relief, are each used to select a feature set, which is the first stage of the LRMR-Ri method. If there exists a common feature subset among the above four sets, it is the optimal subset; otherwise we use Ridge regression to select the optimal subset from the public set pooled from the four sets, which is the second stage of LRMR-Ri. The LRMR-Ri method combined with W-GDipC was applied both to the antifreeze protein dataset (binary classification) and to the membrane protein dataset (multiple classification). Experimental results show that this method performs well with support vector machine (SVM), decision tree (DT) and stochastic gradient descent (SGD) classifiers. The values of ACC, RE and MCC of LRMR-Ri and W-GDipC with the antifreeze protein dataset and the SVM classifier reached 95.56%, 97.06% and 0.9105, respectively, much higher than those of each single method (Lasso, Ridge, Mic and Relief), and nearly 13% higher than single Lasso for ACC. CONCLUSION The experimental results show that the proposed LRMR-Ri and W-GDipC method can significantly improve the accuracy of antifreeze protein prediction compared with other similar single feature methods. In addition, our method has also achieved good results in the classification and prediction of membrane proteins, which supports its broad reliability to a certain extent.
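The three scores quoted in this abstract can be computed from a binary confusion matrix. The Python sketch below is a minimal illustration, assuming that ACC denotes accuracy, RE denotes recall (sensitivity), and MCC denotes the Matthews correlation coefficient; the abstract does not expand these abbreviations, and the function name and the example counts in the usage note are hypothetical.

```python
from math import sqrt


def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, recall and MCC from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Recall is undefined without positive samples; report 0.0 in that case.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"ACC": accuracy, "RE": recall, "MCC": mcc}
```

For example, `classification_metrics(50, 40, 5, 5)` gives an accuracy of 0.9 and a recall of 50/55. MCC is informative on antifreeze protein benchmarks because negatives typically far outnumber positives, which can make accuracy alone look better than the classifier really is.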
Affiliation(s)
- Shunfang Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, China.
| | - Lin Deng
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, China
| | - Xinnan Xia
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, China.
| | - Zicheng Cao
- School of Public Health (Shenzhen), Sun Yat-Sen University, Guangzhou, 510006, China
| | - Yu Fei
- School of Statistics and Mathematics, Yunnan University of Finance and Economics, Kunming, 650221, China.
| |
|
24
|
Alim A, Rafay A, Naseem I. PoGB-pred: Prediction of Antifreeze Proteins Sequences Using Amino Acid Composition with Feature Selection Followed by a Sequential-based Ensemble Approach. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200707141926] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/22/2022]
Abstract
Background:
Proteins contribute significantly to every task of cellular life. Their functions encompass the building and repairing of tissues in human bodies and other organisms; hence, they are the building blocks of bones, muscles, cartilage, skin, and blood. Similarly, antifreeze proteins (AFPs) are of prime significance for organisms that live in very cold areas. With the help of these proteins, cold-water organisms can survive at sub-zero temperatures and resist the water crystallization process, which may rupture internal cells and tissues. AFPs have also attracted attention and interest in the food industry and in cryopreservation.
Objective:
With the growing availability of genomic sequence data for proteins, an automated and sophisticated tool for AFP recognition and identification is badly needed. The sequences and structures of AFPs are highly distinct; therefore, most of the proposed methods fail to show promising results on different structures. A consolidated method is proposed to deliver competitive performance on highly distinct AFP structures.
Methods:
In this study, machine learning algorithms, namely Principal Component Analysis (PCA) followed by Gradient Boosting (GB), are proposed for antifreeze protein identification. To evaluate and validate the proposed model, various combinations of the amino acid and dipeptide compositions of two sequence segments are used. PCA is used for dimension reduction while retaining high variance in the data, and is followed by the ensemble method gradient boosting for modeling and classification.
Results:
The proposed method obtained superior performance on the PDB, Pfam and UniProt datasets compared with the RAFP-Pred method. In experiment-3, using only 150 PCA components, a high accuracy of 89.63 was achieved, which is superior to the 87.41 reported for the RAFP-Pred method using 300 significant features. Experiment-2 was conducted using two different datasets: non-AFPs from the PISCES server and AFPs from the Protein Data Bank. In experiment-2, our proposed method attained a high sensitivity of 79.16, which is 12.50 better than the state-of-the-art RAFP-Pred method.
Conclusion:
AFPs share a common function but have distinct structures. Therefore, a single model developed for different sequences often fails on AFPs. Our proposed model showed robust results across diverse training and testing datasets, and it outperformed the previous AFP prediction method RAFP-Pred. Our model consists of PCA for dimension reduction followed by gradient boosting for classification. Owing to its simplicity, scalability and high performance, our model can be easily extended to analyze proteomic and genomic datasets.
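The two-segment amino acid composition mentioned in the Methods can be sketched as below. This is an illustration only: splitting the sequence into two equal halves is an assumption (the abstract does not say how segments are defined), and the function names and example sequence are hypothetical. The PCA and gradient boosting steps would consume these vectors and are not shown.

```python
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment: str) -> List[float]:
    """Fraction of each of the 20 standard amino acids in a segment."""
    counts = Counter(segment)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def two_segment_aac(sequence: str) -> List[float]:
    """Concatenate the compositions of the two halves (40 values).

    Assumption: the two segments are the first and second half of the sequence.
    """
    mid = len(sequence) // 2
    return composition(sequence[:mid]) + composition(sequence[mid:])


features = two_segment_aac("MKTAAYVAAC")  # hypothetical 10-residue sequence
print(len(features))  # 40
```

Each half contributes 20 fractions that sum to 1, so the vector keeps some positional information that a whole-sequence composition would lose.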
Affiliation(s)
- Affan Alim
- College of Computing and Information Sciences, Karachi Institute of Economics and Technology (KIET), Karachi 75190, Pakistan
| | - Abdul Rafay
- College of Computing and Information Sciences, Karachi Institute of Economics and Technology (KIET), Karachi 75190, Pakistan
| | - Imran Naseem
- School of Electrical, Electronic and Computer Engineering, the University of Western Australia, 35 Stirling Highway, Crawley, Western Australia 6009, Australia
|
25
|
Kozuch DJ, Stillinger FH, Debenedetti PG. Genetic Algorithm Approach for the Optimization of Protein Antifreeze Activity Using Molecular Simulations. J Chem Theory Comput 2020; 16:7866-7873. [PMID: 33201707 DOI: 10.1021/acs.jctc.0c00773] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/23/2022]
Abstract
Antifreeze proteins (AFPs) are of much interest for their ability to inhibit ice growth at low concentrations. In this work, we present a genetic algorithm for the in silico design of AFP mutants with improved antifreeze activity, measured as the predicted thermal hysteresis at a fixed concentration, ΔTC. Central to the algorithm is our recently developed neural network method for predicting ΔTC from molecular simulations [Kozuch et al., PNAS, 115, 13252 (2018)]. Applying the algorithm to three structurally diverse AFPs, wfAFP, rQAE, and RiAFP, we find that significantly improved mutants are discovered for rQAE and RiAFP. Testing of the optimized mutants shows an increase in ΔTC of 0.572 ± 0.11 K (262 ± 50.6%) and 1.33 ± 0.14 K (39.9 ± 4.19%) over the native structures for rQAE and RiAFP, respectively. Structural analysis of the optimized mutants reveals that the algorithm is able to exploit two pathways for enhancing the predicted antifreeze activity of the mutants: (1) increasing the local order of surface waters by encouraging the formation of internal water channels in the protein and (2) increasing the total ice-binding area by improving the planar structure of the ice-binding surface. Additionally, analysis of all mutants explored by the algorithm reveals that a subset of residues, mainly nonpolar, are particularly helpful in improving antifreeze activity at the ice-binding surface.
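The general shape of a genetic algorithm over protein sequences can be sketched as below. This is a generic illustration under stated assumptions, not the authors' algorithm: the operators, population settings and example sequence are hypothetical, and `toy_fitness` is a placeholder that merely counts nonpolar residues, whereas the paper scores mutants with a neural network prediction of ΔTC from molecular simulations.

```python
import random
from typing import Callable, List

RESIDUES = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq: str, rng: random.Random) -> str:
    """Replace one randomly chosen position with a random residue."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(RESIDUES) + seq[i + 1:]


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover between two equal-length sequences."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(native: str, fitness: Callable[[str], float],
           generations: int = 30, pop_size: int = 20, seed: int = 0) -> str:
    """Return the fittest sequence found; the top half survives each round."""
    rng = random.Random(seed)
    population: List[str] = [native] + [mutate(native, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(seq: str) -> float:
    """Placeholder score: fraction of nonpolar residues (NOT a ΔTC predictor)."""
    return sum(r in "AVLIMFW" for r in seq) / len(seq)


native = "TSKDTQNRGE"  # hypothetical starting sequence
best = evolve(native, toy_fitness)
print(best, toy_fitness(best))
```

Because the best half of each generation is carried over unchanged, the best score never decreases from one generation to the next, which makes the sketch easy to reason about.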
Affiliation(s)
- Daniel J Kozuch
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
| | - Frank H Stillinger
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
| | - Pablo G Debenedetti
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
|
26
|
Charoenkwan P, Yana J, Nantasenamat C, Hasan MM, Shoombuatong W. iUmami-SCM: A Novel Sequence-Based Predictor for Prediction and Analysis of Umami Peptides Using a Scoring Card Method with Propensity Scores of Dipeptides. J Chem Inf Model 2020; 60:6666-6678. [DOI: 10.1021/acs.jcim.0c00707] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Indexed: 01/27/2023]
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
| | - Janchai Yana
- Department of Chemistry, Faculty of Science and Technology, Chiang Mai Rajabhat University, Chiang Mai 50300, Thailand
| | - Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Md. Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
|
27
|
Charoenkwan P, Kanthawong S, Nantasenamat C, Hasan MM, Shoombuatong W. iAMY-SCM: Improved prediction and analysis of amyloid proteins using a scoring card method with propensity scores of dipeptides. Genomics 2020; 113:689-698. [PMID: 33017626 DOI: 10.1016/j.ygeno.2020.09.065] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 07/23/2020] [Revised: 09/21/2020] [Accepted: 09/30/2020] [Indexed: 01/09/2023]
Abstract
Fast, accurate identification and characterization of amyloid proteins at a large scale is essential for understanding their role in therapeutic intervention strategies. To date, there exists only one in silico model for amyloid protein identification, RFAmy, which uses the random forest (RF) model in conjunction with various feature types. However, it suffers from low interpretability for biologists. Thus, it is highly desirable to develop a simple and easily interpretable prediction method with robust accuracy as compared to the existing complicated model. In this study, we propose iAMY-SCM, the first scoring card method-based predictor for predicting and analyzing amyloid proteins. Herein, iAMY-SCM made use of a simple weighted-sum function in conjunction with the propensity scores of dipeptides for amyloid protein identification. Cross-validation results indicated that iAMY-SCM provided an accuracy of 0.895, which corresponded to 10-22% higher performance than that of widely used machine learning models. Furthermore, iAMY-SCM achieved an accuracy of 0.827 as evaluated by an independent test, which was found to be comparable to that of RFAmy and approximately 9-13% higher than widely used machine learning models. In addition, the estimated propensity scores of amino acids and dipeptides were analyzed to provide insights into the biophysical and biochemical properties of amyloid proteins. As such, this demonstrates that the proposed iAMY-SCM is efficient and reliable in terms of simplicity, interpretability and implementation. To facilitate ease of use of the proposed iAMY-SCM, a user-friendly and publicly accessible web server at http://camt.pythonanywhere.com/iAMY-SCM has been established. We anticipate that iAMY-SCM will be an important tool for facilitating the large-scale prediction and characterization of amyloid proteins.
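The weighted-sum scoring that the abstract describes can be sketched as follows. This is a minimal sketch under assumptions: the propensity values, the threshold and the example sequence are invented for illustration, and the function names are hypothetical rather than taken from iAMY-SCM.

```python
from collections import Counter
from typing import Dict


def dipeptide_composition(seq: str) -> Dict[str, float]:
    """Relative frequency of each overlapping dipeptide in the sequence."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    return {dp: n / len(pairs) for dp, n in counts.items()} if pairs else {}


def scm_score(seq: str, propensity: Dict[str, float], default: float = 0.0) -> float:
    """Weighted sum: each dipeptide's frequency times its propensity score."""
    comp = dipeptide_composition(seq)
    return sum(w * propensity.get(dp, default) for dp, w in comp.items())


# Hypothetical propensity scores and threshold, for illustration only.
propensity = {"VV": 900.0, "VI": 800.0, "KE": 100.0}
threshold = 400.0

score = scm_score("VVIV", propensity)
print(score, score > threshold)
```

The interpretability claimed for this family of methods comes from the fact that the score is a plain sum, so each dipeptide's contribution can be read off directly.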
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
| | - Sakawrat Kanthawong
- Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen 40002, Thailand
| | - Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
|
28
|
iBitter-SCM: Identification and characterization of bitter peptides using a scoring card method with propensity scores of dipeptides. Genomics 2020; 112:2813-2822. [DOI: 10.1016/j.ygeno.2020.03.019] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 02/21/2020] [Revised: 03/19/2020] [Accepted: 03/22/2020] [Indexed: 12/21/2022]
|
29
|
Meta-iPVP: a sequence-based meta-predictor for improving the prediction of phage virion proteins using effective feature representation. J Comput Aided Mol Des 2020; 34:1105-1116. [DOI: 10.1007/s10822-020-00323-z] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 05/04/2020] [Accepted: 06/10/2020] [Indexed: 12/11/2022]
|
30
|
iTTCA-Hybrid: Improved and robust identification of tumor T cell antigens by utilizing hybrid feature representation. Anal Biochem 2020; 599:113747. [DOI: 10.1016/j.ab.2020.113747] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 03/16/2020] [Revised: 04/13/2020] [Accepted: 04/16/2020] [Indexed: 02/07/2023]
|
31
|
Usman M, Khan S, Lee JA. AFP-LSE: Antifreeze Proteins Prediction Using Latent Space Encoding of Composition of k-Spaced Amino Acid Pairs. Sci Rep 2020; 10:7197. [PMID: 32345989 PMCID: PMC7188683 DOI: 10.1038/s41598-020-63259-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/08/2020] [Accepted: 03/26/2020] [Indexed: 02/06/2023] Open
Abstract
Species living in extremely cold environments resist freezing conditions through antifreeze proteins (AFPs). Apart from being essential proteins for various organisms living in sub-zero temperatures, AFPs have numerous applications in different industries. They bear very little resemblance to each other and cannot be easily identified using simple search algorithms such as BLAST and PSI-BLAST. The diverse AFP sub-types found in fishes (Type I, II, III, IV and antifreeze glycoproteins (AFGPs)) show low sequence and structural similarity, making their accurate prediction challenging. Although several machine-learning methods have been proposed for the classification of AFPs, prediction methods that have greater reliability are required. In this paper, we propose a novel machine-learning-based approach for the prediction of AFP sequences using latent space learning through a deep auto-encoder method. For latent space pruning, we use the output of the auto-encoder with a deep neural network classifier to learn the non-linear mapping of the protein sequence descriptor and class label. The proposed method outperformed the existing methods, yielding excellent results in comparison. A comprehensive ablation study is performed, and the proposed method is evaluated in terms of widely used performance measures. In particular, the proposed method demonstrated a high Matthews correlation coefficient of 0.52, F-score of 0.49, and Youden’s index of 0.81 on an independent test dataset, thereby outperforming the existing methods for AFP prediction.
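For readers unfamiliar with the three measures quoted above, the sketch below computes them from a binary confusion matrix using their standard definitions. The counts are hypothetical and are not taken from the paper; the function name is an assumption.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Matthews correlation coefficient, F-score and Youden's index."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    pr_sum = precision + sensitivity
    return {
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "f_score": 2 * precision * sensitivity / pr_sum if pr_sum else 0.0,
        "youden_j": sensitivity + specificity - 1.0,
    }


# Hypothetical confusion-matrix counts, for illustration only.
m = binary_metrics(tp=40, fp=10, tn=90, fn=10)
print(m)
```

Youden's index depends only on sensitivity and specificity, while the F-score and MCC also reflect precision, so on a test set with few positives the three values can differ considerably.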
Affiliation(s)
- Muhammad Usman
- Department of Computer Engineering, Chosun University, Gwangju, 61452, Republic of Korea
| | - Shujaat Khan
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
| | - Jeong-A Lee
- Department of Computer Engineering, Chosun University, Gwangju, 61452, Republic of Korea.
|
32
|
Sun S, Ding H, Wang D, Han S. Identifying Antifreeze Proteins Based on Key Evolutionary Information. Front Bioeng Biotechnol 2020; 8:244. [PMID: 32274383 PMCID: PMC7113384 DOI: 10.3389/fbioe.2020.00244] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/14/2020] [Accepted: 03/09/2020] [Indexed: 01/08/2023] Open
Abstract
Antifreeze proteins are important antifreeze materials that have been widely used in industry, including in cryopreservation, de-icing, and food storage applications. However, the quantity of some commercially produced antifreeze proteins is insufficient for large-scale industrial applications. Further, many antifreeze proteins have properties such as cytotoxicity, severely hindering their applications. Understanding the mechanisms underlying the protein-ice interactions and identifying novel antifreeze proteins are, therefore, urgently needed. In this study, to uncover the mechanisms underlying protein-ice interactions and provide an efficient and accurate tool for identifying antifreeze proteins, we assessed various evolutionary features based on position-specific scoring matrices (PSSMs) and evaluated their importance for discriminating between antifreeze and non-antifreeze proteins. We then parsimoniously selected seven key features with the highest importance. We found that the selected features showed opposite tendencies (regarding the conservation of certain amino acids) between antifreeze and non-antifreeze proteins. Five out of the seven features had relatively high contributions to the discrimination between antifreeze and non-antifreeze proteins, as revealed by a principal component analysis, i.e., the conservation of the replacement of Cys, Trp, and Gly in antifreeze proteins by Ala, Met, and Ala, respectively, in the related proteins, and the conservation of the replacement of Arg in non-antifreeze proteins by Ser and Arg in the related proteins. Based on the seven parsimoniously selected key features, we established a classifier using a support vector machine, which outperformed the state-of-the-art tools. These results suggest that understanding evolutionary information is crucial to designing accurate automated methods for discriminating between antifreeze and non-antifreeze proteins.
Our classifier, therefore, is an efficient tool for annotating new proteins with antifreeze functions based on sequence information and can facilitate their application in industry.
Affiliation(s)
- Shanwen Sun
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Hui Ding
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Donghua Wang
- Department of General Surgery, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Shuguang Han
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
33
|
Charoenkwan P, Kanthawong S, Schaduangrat N, Yana J, Shoombuatong W. PVPred-SCM: Improved Prediction and Analysis of Phage Virion Proteins Using a Scoring Card Method. Cells 2020; 9:E353. [PMID: 32028709 PMCID: PMC7072630 DOI: 10.3390/cells9020353] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 12/26/2019] [Revised: 01/20/2020] [Accepted: 01/27/2020] [Indexed: 12/16/2022] Open
Abstract
Although existing methods have been successful in predicting phage (or bacteriophage) virion proteins (PVPs) using various types of protein features and complex classifiers, such as support vector machine and naïve Bayes, these two classifiers do not allow interpretability. However, the characterization and analysis of PVPs might be of great significance to understanding the molecular mechanisms of bacteriophage genetics and the development of antibacterial drugs. Hence, we herein proposed a novel method (PVPred-SCM) based on the scoring card method (SCM) in conjunction with dipeptide composition to identify and characterize PVPs. In PVPred-SCM, the propensity scores of 400 dipeptides were calculated using the statistical discrimination approach. A rigorous independent validation test showed that PVPred-SCM utilizing only dipeptide composition yielded an accuracy of 77.56%, indicating that PVPred-SCM performed well relative to the state-of-the-art method utilizing a number of protein features. Furthermore, the propensity scores of dipeptides were used to provide insights into the biochemical and biophysical properties of PVPs. Upon comparison, it was found that PVPred-SCM was superior to the existing methods considering its simplicity, interpretability, and implementation. Finally, in an effort to facilitate high-throughput prediction of PVPs, we provided a user-friendly web server for identifying the likelihood of whether or not these sequences are PVPs. It is anticipated that PVPred-SCM will become a useful tool or at least a complement to existing methods for predicting and analyzing PVPs.
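One way to derive propensity scores for the 400 dipeptides from labeled sequences is sketched below. It is an assumption about what the "statistical discrimination approach" involves, not a reproduction of PVPred-SCM: scores are taken as the difference between pooled dipeptide frequencies in the positive and negative classes, rescaled to the range 0 to 1000. The toy sequences and function names are hypothetical, and any later refinement of the scores is out of scope here.

```python
from collections import Counter
from itertools import product
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 entries


def pooled_composition(seqs: List[str]) -> Dict[str, float]:
    """Dipeptide frequencies pooled over all sequences in one class."""
    counts = Counter(s[i:i + 2] for s in seqs for i in range(len(s) - 1))
    total = sum(counts[dp] for dp in DIPEPTIDES)
    return {dp: counts[dp] / total if total else 0.0 for dp in DIPEPTIDES}


def initial_propensity(pos: List[str], neg: List[str]) -> Dict[str, float]:
    """Positive-minus-negative frequency, min-max scaled to 0..1000."""
    p, n = pooled_composition(pos), pooled_composition(neg)
    diff = {dp: p[dp] - n[dp] for dp in DIPEPTIDES}
    lo, hi = min(diff.values()), max(diff.values())
    span = hi - lo
    return {dp: 1000.0 * (d - lo) / span if span else 500.0 for dp, d in diff.items()}


# Toy sequences, for illustration only.
scores = initial_propensity(pos=["GAGA", "AGAG"], neg=["KKKK", "KEKE"])
print(scores["GA"], scores["KK"], scores["AA"])
```

Dipeptides enriched in the positive class land near 1000, those enriched in the negative class near 0, and uninformative ones near the middle of the scale.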
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand;
| | - Sakawrat Kanthawong
- Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen 40002, Thailand;
| | - Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand;
| | - Janchai Yana
- Department of Chemistry, Faculty of Science and Technology, Chiang Mai Rajabhat University, Chiang Mai 50300, Thailand;
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand;
|
34
|
Basith S, Manavalan B, Hwan Shin T, Lee G. Machine intelligence in peptide therapeutics: A next‐generation tool for rapid disease screening. Med Res Rev 2020; 40:1276-1314. [DOI: 10.1002/med.21658] [Citation(s) in RCA: 139] [Impact Index Per Article: 27.8] [Received: 10/18/2019] [Revised: 11/26/2019] [Accepted: 12/16/2019] [Indexed: 12/12/2022]
Affiliation(s)
- Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
| | | | - Tae Hwan Shin
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
| | - Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
|
35
|
iQSP: A Sequence-Based Tool for the Prediction and Analysis of Quorum Sensing Peptides via Chou's 5-Steps Rule and Informative Physicochemical Properties. Int J Mol Sci 2019; 21:ijms21010075. [PMID: 31861928 PMCID: PMC6981611 DOI: 10.3390/ijms21010075] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 11/18/2019] [Revised: 12/13/2019] [Accepted: 12/18/2019] [Indexed: 01/18/2023] Open
Abstract
Understanding the functional mechanism of quorum-sensing peptides (QSPs) plays an essential role in finding new opportunities to combat bacterial infections by designing drugs. With the avalanche of newly available peptide sequences in the post-genomic age, it is highly desirable to develop a computational model for efficient, rapid and high-throughput QSP identification based purely on peptide sequence information. Although a few methods have been developed for predicting QSPs, their prediction accuracy and interpretability still require further improvement. Thus, in this work, we proposed an accurate sequence-based predictor (called iQSP) and a set of interpretable rules (called IR-QSP) for predicting and analyzing QSPs. In iQSP, we utilized a powerful support vector machine (SVM) together with 18 informative features from physicochemical properties (PCPs). A rigorous independent validation test showed that iQSP achieved maximum accuracy and MCC of 93.00% and 0.86, respectively. Furthermore, a set of interpretable rules, IR-QSP, was extracted by using a random forest model and the 18 informative PCPs. Finally, for the convenience of experimental scientists, the iQSP web server was established and made freely available online. It is anticipated that iQSP will become a useful tool or at least a complement to existing methods for predicting and analyzing QSPs.
|
36
|
Schaduangrat N, Nantasenamat C, Prachayasittikul V, Shoombuatong W. Meta-iAVP: A Sequence-Based Meta-Predictor for Improving the Prediction of Antiviral Peptides Using Effective Feature Representation. Int J Mol Sci 2019; 20:ijms20225743. [PMID: 31731751 PMCID: PMC6888698 DOI: 10.3390/ijms20225743] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Received: 10/24/2019] [Revised: 11/07/2019] [Accepted: 11/13/2019] [Indexed: 12/31/2022] Open
Abstract
In spite of the large-scale production and widespread distribution of vaccines and antiviral drugs, viruses remain a prominent cause of human disease. Recently, antiviral peptides (AVPs) have emerged as influential antiviral agents due to their extraordinary advantages. With the avalanche of newly found peptide sequences in the post-genomic era, there is a great demand to develop a sequence-based predictor for timely identification of AVPs, as this information is very useful for both basic research and drug development. In this study, we propose a novel sequence-based meta-predictor with an effective feature representation, called Meta-iAVP, for the accurate prediction of AVPs from given peptide sequences. Herein, the effective feature representation was extracted from a set of prediction scores derived from various machine learning algorithms and types of features. To the best of our knowledge, the model proposed herein represents the first meta-based approach for the prediction of AVPs. An overall accuracy and Matthews correlation coefficient of 95.20% and 0.90, respectively, were achieved on the independent test set of an objective benchmark dataset. Comparative analysis suggested that Meta-iAVP was superior to existing methods and therefore represents a useful tool for AVP prediction. Finally, in an effort to facilitate high-throughput prediction of AVPs, the model was deployed as the Meta-iAVP web server and is made freely available online at http://codes.bio/meta-iavp/ where users can submit query peptide sequences for determining the likelihood of whether or not these peptides are AVPs.
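The idea of building a feature representation from the prediction scores of several base models can be sketched as below. This is a schematic illustration only: the two base scorers are hypothetical hand-written functions standing in for trained models, and the weights of the logistic meta-model are invented rather than learned.

```python
import math
from typing import Callable, List, Sequence

Scorer = Callable[[str], float]


def meta_features(peptide: str, base_scorers: Sequence[Scorer]) -> List[float]:
    """One prediction score per base model becomes one meta-feature."""
    return [scorer(peptide) for scorer in base_scorers]


def meta_predict(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic meta-model over the base scores (weights are hypothetical)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical base scorers standing in for trained classifiers.
def frac_cationic(p: str) -> float:
    return sum(r in "KRH" for r in p) / len(p)


def frac_hydrophobic(p: str) -> float:
    return sum(r in "AVLIMFW" for r in p) / len(p)


feats = meta_features("KRAV", [frac_cationic, frac_hydrophobic])
prob = meta_predict(feats, weights=[2.0, 1.0], bias=-1.0)
print(feats, prob)
```

In a real meta-predictor the base scores would come from cross-validated models so that the meta-model is not trained on scores the base models produced for their own training data.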
Affiliation(s)
- Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Virapong Prachayasittikul
- Department of Clinical Microbiology and Applied Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Correspondence: Tel.: +66-2441-4371 (ext. 2715)
|
37
|
Surís-Valls R, Voets IK. Peptidic Antifreeze Materials: Prospects and Challenges. Int J Mol Sci 2019; 20:E5149. [PMID: 31627404 PMCID: PMC6834126 DOI: 10.3390/ijms20205149] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 09/15/2019] [Revised: 10/05/2019] [Accepted: 10/10/2019] [Indexed: 12/28/2022] Open
Abstract
Necessitated by the subzero temperatures and seasonal exposure to ice, various organisms have developed remarkably effective means to survive the harsh climate of their natural habitats. Their ice-binding (glyco)proteins keep the nucleation and growth of ice crystals in check by recognizing and binding to specific ice crystal faces, which arrests further ice growth and inhibits ice recrystallization (IRI). Inspired by the success of this adaptive strategy, various approaches have been proposed over the past decades to engineer materials that harness these cryoprotective features. In this review we discuss the prospects and challenges associated with these advances, focusing in particular on peptidic antifreeze materials both identical and akin to natural ice-binding proteins (IBPs). We address the latest advances in their design, synthesis, characterization and application in the preservation of biologics and foods. Particular attention is devoted to insights into structure-activity relations culminating in the synthesis of de novo peptide analogues. These are sequences that resemble but are not identical to naturally occurring IBPs. We also draw attention to impactful developments in solid-phase peptide synthesis and 'greener' synthesis routes, which may help overcome one of the major bottlenecks in the translation of this technology: the unavailability of large quantities of low-cost antifreeze materials with excellent IRI activity at (sub)micromolar concentrations.
Affiliation(s)
- Romà Surís-Valls
- Laboratory of Self-Organizing Soft Matter, Laboratory of Macro-Organic Chemistry, Department of Chemical Engineering and Chemistry & Institute for Complex Molecular Systems, Eindhoven University of Technology, Post Office Box 513, 5600 MD Eindhoven, The Netherlands.
| | - Ilja K Voets
- Laboratory of Self-Organizing Soft Matter, Laboratory of Macro-Organic Chemistry, Department of Chemical Engineering and Chemistry & Institute for Complex Molecular Systems, Eindhoven University of Technology, Post Office Box 513, 5600 MD Eindhoven, The Netherlands.
|
38
|
Laengsri V, Nantasenamat C, Schaduangrat N, Nuchnoi P, Prachayasittikul V, Shoombuatong W. TargetAntiAngio: A Sequence-Based Tool for the Prediction and Analysis of Anti-Angiogenic Peptides. Int J Mol Sci 2019; 20:E2950. [PMID: 31212918 PMCID: PMC6628072 DOI: 10.3390/ijms20122950] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 05/24/2019] [Revised: 06/13/2019] [Accepted: 06/14/2019] [Indexed: 11/21/2022] Open
Abstract
Cancer remains one of the major causes of death worldwide. Angiogenesis is crucial for the pathogenesis of various human diseases, especially solid tumors. The discovery of anti-angiogenic peptides is a promising therapeutic route for cancer treatment. Thus, reliably identifying anti-angiogenic peptides is extremely important for understanding their biophysical and biochemical properties that serve as the basis for the discovery of new anti-cancer drugs. This study aims to develop an efficient and interpretable computational model called TargetAntiAngio for predicting and characterizing anti-angiogenic peptides. TargetAntiAngio was developed using the random forest classifier in conjunction with various classes of peptide features. It was observed via an independent validation test that TargetAntiAngio can identify anti-angiogenic peptides with an average accuracy of 77.50% on an objective benchmark dataset. Comparisons demonstrated that TargetAntiAngio is superior to other existing methods. In addition, results revealed the following important characteristics of anti-angiogenic peptides: (i) disulfide bond forming Cys residues play an important role for inhibiting blood vessel proliferation; (ii) Cys located at the C-terminal domain can decrease endothelial formatting activity and suppress tumor growth; and (iii) Cyclic disulfide-rich peptides contribute to the inhibition of angiogenesis and cell migration, selectivity and stability. Finally, for the convenience of experimental scientists, the TargetAntiAngio web server was established and made freely available online.
Affiliation(s)
- Vishuda Laengsri
- Department of Clinical Microscopy, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
- Center for Research and Innovation, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
| | - Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
| | - Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
| | - Pornlada Nuchnoi
- Department of Clinical Microscopy, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
- Center for Research and Innovation, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
| | - Virapong Prachayasittikul
- Department of Clinical Microbiology and Applied Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
|
39
|
THPep: A machine learning-based approach for predicting tumor homing peptides. Comput Biol Chem 2019; 80:441-451. [PMID: 31151025 DOI: 10.1016/j.compbiolchem.2019.05.008] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Received: 12/06/2018] [Revised: 04/18/2019] [Accepted: 05/17/2019] [Indexed: 01/24/2023]
Abstract
A major drawback of current anti-cancer drugs is their lack of satisfactory specificity towards tumor cells. Although several cancer therapies exist, tumor homing peptides are gaining importance as therapeutic agents. The large number of therapeutic peptides generated in recent years creates a need for an effective and interpretable computational model that can rapidly and automatically predict tumor homing peptides. Therefore, a sequence-based approach, referred to herein as THPep, has been developed to predict and analyze tumor homing peptides by using an interpretable random forest classifier in conjunction with amino acid composition, dipeptide composition and pseudo amino acid composition. An overall accuracy of 90.13% and a Matthews correlation coefficient of 0.76 were achieved on the independent test set of an objective benchmark dataset. Upon comparison, THPep was found to be superior to the existing method and holds high potential as a useful tool for predicting tumor homing peptides. For the convenience of experimental scientists, a web server for the proposed method is publicly available at http://codes.bio/thpep/.
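The two simplest descriptors named in this abstract, amino acid composition and dipeptide composition, can be illustrated with a minimal sketch. This is not the THPep implementation: the function names are hypothetical, only the 20 standard residues are assumed, and pseudo amino acid composition is left out because it needs physicochemical property tables that the abstract does not specify.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in the peptide."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Return the fraction of each of the 400 ordered residue pairs."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        raise ValueError("sequence needs at least two residues")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for pair in pairs:
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return {dp: c / len(pairs) for dp, c in counts.items()}


def featurize(seq):
    """Concatenate both descriptors into one 420-dimensional vector."""
    aac = amino_acid_composition(seq)
    dpc = dipeptide_composition(seq)
    return [aac[a] for a in AMINO_ACIDS] + [dpc[d] for d in sorted(dpc)]
```

A vector such as `featurize("ACCA")` (an arbitrary example sequence) could then be passed to any classifier that accepts fixed-length numeric input, for example a random forest.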
|
40
|
Schaduangrat N, Nantasenamat C, Prachayasittikul V, Shoombuatong W. ACPred: A Computational Tool for the Prediction and Analysis of Anticancer Peptides. Molecules 2019; 24:E1973. [PMID: 31121946 PMCID: PMC6571645 DOI: 10.3390/molecules24101973] [Citation(s) in RCA: 149] [Impact Index Per Article: 24.8] [Received: 04/01/2019] [Revised: 05/07/2019] [Accepted: 05/17/2019] [Indexed: 01/01/2023] Open
Abstract
Anticancer peptides (ACPs) have emerged as a new class of therapeutic agent for cancer treatment due to their lower toxicity as well as greater efficacy, selectivity and specificity when compared to conventional small molecule drugs. However, the experimental identification of ACPs remains a time-consuming and expensive endeavor. Therefore, it is desirable to develop and improve upon existing computational models for predicting and characterizing ACPs. In this study, we present a bioinformatics tool called ACPred, an interpretable tool for the prediction and characterization of the anticancer activities of peptides. ACPred was developed by utilizing powerful machine learning models (support vector machine and random forest) and various classes of peptide features. A jackknife cross-validation test showed that ACPred can achieve an overall accuracy of 95.61% in identifying ACPs. In addition, analysis revealed the following distinguishing characteristics of ACPs: (i) hydrophobic residues enhance the cationic properties of α-helical ACPs, resulting in better cell penetration; (ii) the amphipathic nature of the α-helical structure plays a crucial role in its mechanism of cytotoxicity; and (iii) the formation of disulfide bridges on β-sheets is vital for structural maintenance, which correlates with the ability to kill cancer cells. Finally, for the convenience of experimental scientists, the ACPred web server was established and made freely available online.
Affiliation(s)
- Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
| | - Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
| | - Virapong Prachayasittikul
- Department of Clinical Microbiology and Applied Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
|
41
|
Abstract
AIM: Hypertension is associated with the development of cardiovascular disease and has become a significant health problem worldwide. Naturally derived antihypertensive peptides have emerged as promising alternatives to synthetic drugs. MATERIALS & METHODS: This study introduces a predictor of the antihypertensive activity of peptides, constructed using a random forest classifier as a function of various combinations of amino acid, dipeptide and pseudo amino acid composition descriptors. RESULTS: Classification models were assessed on an independent test set, where they demonstrated an accuracy of 84.73%. Feature importance analysis revealed a preference for proline and hydrophobic amino acids at the C-terminus, as well as a preference for short peptides for robust activity. CONCLUSION: The model presented herein serves as a useful tool for predicting and analyzing the antihypertensive activity of peptides.
|
42
|
Angamuthu K, Piramanayagam S. Evaluation of in silico protein secondary structure prediction methods by employing statistical techniques. Biomedical and Biotechnology Research Journal (BBRJ) 2017. [DOI: 10.4103/bbrj.bbrj_28_17] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/04/2022]
|