1
Chen Q, Zhang Y, Gao J, Zhang J. CPPCGM: A Highly Efficient Sequence-Based Tool for Simultaneously Identifying and Generating Cell-Penetrating Peptides. J Chem Inf Model 2025; 65:3357-3369. [PMID: 40105337 DOI: 10.1021/acs.jcim.5c00199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2025]
Abstract
Cell-penetrating peptides (CPPs) are usually short oligopeptides with 5-30 amino acid residues. CPPs have been proven as important drug delivery vehicles into cells through different mechanisms, demonstrating their potential as therapeutic candidates. However, experimental screening and synthesis of CPPs could be time-consuming and expensive. Recently, numerous attempts have been made to develop computational methods as a cost-effective way for screening a number of potential CPP candidates. Despite significant advancements, current methods exhibit limited feature representation capabilities, thereby constraining the potential for further performance enhancements. In this study, we developed a deep learning framework called CPPCGM, which uses protein language models (PLMs) to identify and generate novel CPPs. There are two separate blocks in this framework: CPPClassifier and CPPGenerator. The former utilizes three pretrained models for simple voting, thereby accurately categorizing CPPs and non-CPPs. The latter, similar to a generative adversarial network, including a discriminator and a generator, generates peptides that are not present in the training data set. Our proposed CPPCGM has achieved remarkably high Matthews correlation coefficient scores of 0.876, 0.923, and 0.664 on three data sets based on the classification results. Compared with the state-of-the-art methods, the performance of our method is significantly improved. The results also demonstrated the generating potential of CPPCGM through qualitative and quantitative evaluation of the generated samples. Significantly, using PLM-based methods can optimize peptides for biochemical functions, benefiting drug delivery and biomedical applications. Materials related are publicly available at https://github.com/QiufenChen/CPPCGM.
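The abstract above mentions simple voting over three pretrained models and reports Matthews correlation coefficient (MCC) scores. As a minimal sketch of those two ideas only (not the authors' implementation), the following Python code combines three sets of binary predictions by majority vote and scores the result with MCC. The label and prediction lists are hypothetical toy data, and the function names are assumptions made for illustration.

```python
from math import sqrt


def majority_vote(votes):
    """Return 1 if more than half of the binary votes are 1, else 0."""
    return 1 if sum(votes) * 2 > len(votes) else 0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is taken as 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data: true labels and three models' predictions.
y_true = [1, 1, 0, 0, 1, 0]
model_a = [1, 1, 0, 0, 0, 0]
model_b = [1, 0, 0, 1, 1, 0]
model_c = [1, 1, 1, 0, 1, 0]

# One majority vote per peptide across the three models.
ensemble = [majority_vote(v) for v in zip(model_a, model_b, model_c)]

print("ensemble predictions:", ensemble)
print("MCC of model_a alone:", round(mcc(y_true, model_a), 3))
print("MCC of the ensemble:", round(mcc(y_true, ensemble), 3))
```

In this toy case each model makes one or two different mistakes, so the vote corrects all of them. Real ensembles only benefit to the extent that the individual models' errors are uncorrelated.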
Affiliation(s)
- Qiufen Chen
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
- Yuewei Zhang
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
- Jiali Gao
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
  - School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen 518055, China
  - Department of Chemistry and Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Jun Zhang
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
2
Imre A, Balogh B, Mándity I. GraphCPP: The new state-of-the-art method for cell-penetrating peptide prediction via graph neural networks. Br J Pharmacol 2025; 182:495-509. [PMID: 39568115 DOI: 10.1111/bph.17388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 08/07/2024] [Accepted: 10/07/2024] [Indexed: 11/22/2024] Open
Abstract
BACKGROUND AND PURPOSE: Cell-penetrating peptides (CPPs) are short amino acid sequences that can penetrate cell membranes and deliver molecules into cells. Several models have been developed for their discovery, yet these models often face challenges in accurately predicting membrane penetration due to the complex nature of peptide-cell interactions. Hence, there is a need for innovative approaches that can enhance predictive performance. EXPERIMENTAL APPROACH: In this study, we present GraphCPP, a novel graph neural network (GNN) for predicting the membrane penetration capability of peptides. KEY RESULTS: A new comprehensive dataset, dubbed CPP1708, was constructed, resulting in the largest reliable database of CPPs to date. Comparative analyses with previous methods, such as MLCPP2, C2Pred, CellPPD and CellPPD-Mod, demonstrated the superior predictive performance of our model. When tested against other published methods, GraphCPP achieved a Matthews correlation coefficient of 0.5787 and an area under the curve (AUC) of 0.8459 on one dataset. This represents a 92.8% and 23.3% improvement in Matthews correlation coefficient and AUC, respectively, compared with the next best model. The capability of the model to effectively learn peptide representations was demonstrated through t-distributed stochastic neighbour embedding plots. Additionally, the uncertainty analysis revealed that GraphCPP maintains high confidence in predictions for peptides shorter than 40 amino acids. The source code is available at https://github.com/attilaimre99/GraphCPP. CONCLUSION AND IMPLICATIONS: These findings indicate the potential of GNN-based models to improve CPP penetration prediction, which may contribute towards the development of more efficient drug delivery systems.
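The improvement figures quoted above (92.8% and 23.3%) can be related to the reported scores with a short calculation. The sketch below assumes the percentages are relative improvements, that is (new - old) / old, which the abstract does not state explicitly. The baseline values it prints are back-calculated under that assumption and are not reported in the abstract.

```python
def relative_improvement(new, old):
    """Percentage gain of `new` over `old`, relative to `old`."""
    return (new - old) / old * 100.0


def implied_baseline(new, improvement_pct):
    """Baseline score implied by `new` and a relative improvement in percent."""
    return new / (1.0 + improvement_pct / 100.0)


# Figures quoted in the abstract.
mcc_new, mcc_gain = 0.5787, 92.8
auc_new, auc_gain = 0.8459, 23.3

# Back-calculated values (an assumption-based estimate, NOT reported data).
mcc_baseline = implied_baseline(mcc_new, mcc_gain)
auc_baseline = implied_baseline(auc_new, auc_gain)

print(f"implied next-best MCC: {mcc_baseline:.3f}")
print(f"implied next-best AUC: {auc_baseline:.3f}")
```

The MCC gain looks much larger than the AUC gain partly because the implied MCC baseline is small. Relative improvements are sensitive to the size of the denominator.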
Affiliation(s)
- Attila Imre
  - Department of Organic Chemistry, Faculty of Pharmacy, Semmelweis University, Budapest, Hungary
  - Center for Health Technology Assessment, Semmelweis University, Budapest, Hungary
  - Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
- Balázs Balogh
  - Department of Organic Chemistry, Faculty of Pharmacy, Semmelweis University, Budapest, Hungary
  - Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
- István Mándity
  - Department of Organic Chemistry, Faculty of Pharmacy, Semmelweis University, Budapest, Hungary
  - Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
  - Artificial Transporters Research Group, Research Centre for Natural Sciences, Budapest, Hungary
3
Ramasundaram M, Sohn H, Madhavan T. A bird's-eye view of the biological mechanism and machine learning prediction approaches for cell-penetrating peptides. Front Artif Intell 2025; 7:1497307. [PMID: 39839972 PMCID: PMC11747587 DOI: 10.3389/frai.2024.1497307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2024] [Accepted: 12/13/2024] [Indexed: 01/23/2025] Open
Abstract
Cell-penetrating peptides (CPPs) are highly effective at passing through eukaryotic membranes with various cargo molecules, like drugs, proteins, nucleic acids, and nanoparticles, without causing significant harm. Creating drug delivery systems with CPP is associated with cancer, genetic disorders, and diabetes due to their unique chemical properties. Wet lab experiments in drug discovery methodologies are time-consuming and expensive. Machine learning (ML) techniques can enhance and accelerate the drug discovery process with accurate and intricate data quality. ML classifiers, such as support vector machine (SVM), random forest (RF), gradient-boosted decision trees (GBDT), and different types of artificial neural networks (ANN), are commonly used for CPP prediction with cross-validation performance evaluation. Functional CPP prediction is improved by using these ML strategies by using CPP datasets produced by high-throughput sequencing and computational methods. This review focuses on several ML-based CPP prediction tools. We discussed the CPP mechanism to understand the basic functioning of CPPs through cells. A comparative analysis of diverse CPP prediction methods was conducted based on their algorithms, dataset size, feature encoding, software utilities, assessment metrics, and prediction scores. The performance of the CPP prediction was evaluated based on accuracy, sensitivity, specificity, and Matthews correlation coefficient (MCC) on independent datasets. In conclusion, this review will encourage the use of ML algorithms for finding effective CPPs, which will have a positive impact on future research on drug delivery and therapeutics.
Affiliation(s)
- Maduravani Ramasundaram
  - Department of Genetic Engineering, Computational Biology Lab, School of Bioengineering, SRM Institute of Science and Technology, SRM Nagar, Chennai, India
- Honglae Sohn
  - Department of Chemistry and Department of Carbon Materials, Chosun University, Gwangju, Republic of Korea
- Thirumurthy Madhavan
  - Department of Genetic Engineering, Computational Biology Lab, School of Bioengineering, SRM Institute of Science and Technology, SRM Nagar, Chennai, India
4
Zhu L, Chen Z, Yang S. EnDM-CPP: A Multi-view Explainable Framework Based on Deep Learning and Machine Learning for Identifying Cell-Penetrating Peptides with Transformers and Analyzing Sequence Information. Interdiscip Sci 2024:10.1007/s12539-024-00673-4. [PMID: 39714579 DOI: 10.1007/s12539-024-00673-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 10/28/2024] [Accepted: 11/01/2024] [Indexed: 12/24/2024]
Abstract
Cell-Penetrating Peptides (CPPs) are a crucial carrier for drug delivery. Since the process of synthesizing new CPPs in the laboratory is both time- and resource-consuming, computational methods to predict potential CPPs can be used to find CPPs to enhance the development of CPPs in therapy. In this study, EnDM-CPP is proposed, which combines machine learning algorithms (SVM and CatBoost) with convolutional neural networks (CNN and TextCNN). For dataset construction, three previous CPP benchmark datasets, including CPPsite 2.0, MLCPP 2.0, and CPP924, are merged to improve the diversity and reduce homology. For feature generation, two language model-based features obtained from the Transformer architecture, including ProtT5 and ESM-2, are employed in CNN and TextCNN. Additionally, sequence features, such as CPRS, Hybrid PseAAC, KSC, etc., are input to SVM and CatBoost. Based on the result of each predictor, Logistic Regression (LR) is built to predict the final decision. The experiment results indicate that ProtT5 and ESM-2 fusion features significantly contribute to predicting CPP and that combining employed features and models demonstrates better association. On an independent test dataset comparison, EnDM-CPP achieved an accuracy of 0.9495 and a Matthews correlation coefficient of 0.9008 with an improvement of 2.23%-9.48% and 4.32%-19.02%, respectively, compared with other state-of-the-art methods. Code and data are available at https://github.com/tudou1231/EnDM-CPP.git .
Affiliation(s)
- Lun Zhu
  - School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China
- Zehua Chen
  - School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China
- Sen Yang
  - School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China
  - The Affiliated Changzhou No. 2 People's Hospital of Nanjing Medical University, Changzhou, 213164, China
5
Shukla R, Singh TR. AlzGenPred - CatBoost-based gene classifier for predicting Alzheimer's disease using high-throughput sequencing data. Sci Rep 2024; 14:30294. [PMID: 39639110 PMCID: PMC11621786 DOI: 10.1038/s41598-024-82208-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Accepted: 12/03/2024] [Indexed: 12/07/2024] Open
Abstract
AD is a progressive neurodegenerative disorder characterized by memory loss. Due to the advancement in next-generation sequencing, an enormous amount of AD-associated genomics data is available. However, the information about the involvement of these genes in AD association is still a research topic. Therefore, AlzGenPred is developed to identify the AD-associated genes using machine-learning. A total of 13,504 features derived from eight sequence-encoding schemes were generated and evaluated using 16 machine learning algorithms. Network-based features significantly outperformed sequence-based features, effectively distinguishing AD-associated genes. In contrast, sequence-based features failed to classify accurately. To improve performance, we generated 24 fused features (6020 D) from sequence-based encodings, increasing accuracy by 5-7% using a two-step lightGBM-based recursive feature selection method. However, accuracy remained below 70% even after hyperparameter tuning. Therefore, network-based features were used to generate the CatBoost-based ML method AlzGenPred with 96.55% accuracy and 98.99% AUROC. The developed method is tested on the AlzGene dataset where it showed 96.43% accuracy. Then the model was validated using the transcriptomics dataset. AlzGenPred provides a reliable and user-friendly tool for identifying potential AD biomarkers, accelerating biomarker discovery, and advancing our understanding of AD. It is available at https://www.bioinfoindia.org/alzgenpred/ and https://github.com/shuklarohit815/AlzGenPred .
Affiliation(s)
- Rohit Shukla
  - Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology (JUIT), Waknaghat, Solan, 173234, H.P., India
  - Center of Excellence for Aging and Brain Repair, Morsani College of Medicine, University of South Florida, Tampa, 33613, FL, USA
- Tiratha Raj Singh
  - Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology (JUIT), Waknaghat, Solan, 173234, H.P., India
  - Centre of Healthcare Technologies and Informatics (CEHTI), Jaypee University of Information Technology (JUIT), Waknaghat, Solan, 173234, H.P., India
6
Ma H, Zhou X, Zhang Z, Weng Z, Li G, Zhou Y, Yao Y. AI-Driven Design of Cell-Penetrating Peptides for Therapeutic Biotechnology. Int J Pept Res Ther 2024; 30:69. [DOI: 10.1007/s10989-024-10654-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/22/2024] [Indexed: 01/05/2025]
7
Zhang W, Ding Y, Wei L, Guo X, Ni F. Therapeutic peptides identification via kernel risk sensitive loss-based k-nearest neighbor model and multi-Laplacian regularization. Brief Bioinform 2024; 25:bbae534. [PMID: 39438076 PMCID: PMC11495874 DOI: 10.1093/bib/bbae534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2024] [Revised: 08/30/2024] [Accepted: 10/08/2024] [Indexed: 10/25/2024] Open
Abstract
Therapeutic peptides are therapeutic agents synthesized from natural amino acids, which can be used as carriers for precisely transporting drugs and can activate the immune system for preventing and treating various diseases. However, screening therapeutic peptides using biochemical assays is expensive, time-consuming, and limited by experimental conditions and biological samples, and there may be ethical considerations in the clinical stage. In contrast, screening therapeutic peptides using machine learning and computational methods is efficient, automated, and can accurately predict potential therapeutic peptides. In this study, a k-nearest neighbor model based on multi-Laplacian and kernel risk sensitive loss was proposed, which introduces a kernel risk loss function derived from the K-local hyperplane distance nearest neighbor model as well as combining the Laplacian regularization method to predict therapeutic peptides. The findings indicated that the suggested approach achieved satisfactory results and could effectively predict therapeutic peptide sequences.
Affiliation(s)
- Wenyu Zhang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No. 2006 Xiyuan Avenue, High tech Zone, Chengdu 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No.1 Chengdian Road, Kecheng District, Quzhou 324000, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No.1 Chengdian Road, Kecheng District, Quzhou 324000, China
- Leyi Wei
  - Macao Polytechnic University, Gomes Street, Macau Peninsula, Macau 999078, China
- Xiaoyi Guo
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No.1 Chengdian Road, Kecheng District, Quzhou 324000, China
- Fengming Ni
  - Department of Gastroenterology, The First Hospital of Jilin University, No. 71 Xinmin Street, Chaoyang District, Changchun 130021, China
8
Zou X, Ren L, Cai P, Zhang Y, Ding H, Deng K, Yu X, Lin H, Huang C. Accurately identifying hemagglutinin using sequence information and machine learning methods. Front Med (Lausanne) 2023; 10:1281880. [PMID: 38020152 PMCID: PMC10644030 DOI: 10.3389/fmed.2023.1281880] [Citation(s) in RCA: 58] [Impact Index Per Article: 29.0] [Received: 08/23/2023] [Accepted: 10/16/2023] [Indexed: 12/01/2023] Open
Abstract
Introduction: Hemagglutinin (HA) is responsible for facilitating viral entry and infection by promoting the fusion between the host membrane and the virus. Given its significance in the process of influenza virus infection, HA has garnered attention as a target for influenza drug and vaccine development. Thus, accurately identifying HA is crucial for the development of targeted vaccines and drugs. However, in-silico methods for identifying HA are still lacking. This study aims to design a computational model to identify HA. Methods: In this study, a benchmark dataset comprising 106 HA and 106 non-HA sequences was obtained from UniProt. Various sequence-based features were used to formulate samples. By performing feature optimization and inputting the optimized features into four kinds of machine learning methods, we constructed an integrated classifier model using the stacking algorithm. Results and discussion: The model achieved an accuracy of 95.85% with an area under the receiver operating characteristic (ROC) curve of 0.9863 in 5-fold cross-validation. In the independent test, the model exhibited an accuracy of 93.18% with an area under the ROC curve of 0.9793. The code is available at https://github.com/Zouxidan/HA_predict.git. The proposed model has excellent prediction performance and will be a convenient tool for biochemical researchers studying HA.
Affiliation(s)
- Xidan Zou
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Liping Ren
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
- Peiling Cai
  - School of Basic Medical Sciences, Chengdu University, Chengdu, China
- Yang Zhang
  - Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Hui Ding
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Kejun Deng
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Xiaolong Yu
  - School of Materials Science and Engineering, Hainan University, Haikou, China
- Hao Lin
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Chengbing Huang
  - School of Computer Science and Technology, Aba Teachers University, Aba, China
9
Chen S, Liao Y, Zhao J, Bin Y, Zheng C. PACVP: Prediction of Anti-Coronavirus Peptides Using a Stacking Learning Strategy With Effective Feature Representation. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:3106-3116. [PMID: 37022025 DOI: 10.1109/tcbb.2023.3238370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Due to the global outbreak of COVID-19 and its variants, antiviral peptides with anti-coronavirus activity (ACVPs) represent promising new drug candidates for the treatment of coronavirus infection. At present, several computational tools have been developed to identify ACVPs, but the overall prediction performance is still not sufficient for practical therapeutic application. In this study, we constructed an efficient and reliable prediction model, PACVP (Prediction of Anti-CoronaVirus Peptides), for identifying ACVPs based on effective feature representation and a two-layer stacking learning framework. In the first layer, we use nine feature encoding methods with different feature representation angles to characterize the rich sequence information and fuse them into a feature matrix. Secondly, data normalization and unbalanced data processing are carried out. Next, 12 baseline models are constructed by combining three feature selection methods and four machine learning classification algorithms. In the second layer, we input the optimal probability features into the logistic regression algorithm (LR) to train the final model PACVP. The experiments show that PACVP achieves favorable prediction performance on the independent test dataset, with an ACC of 0.9208 and an AUC of 0.9465. We hope that PACVP will become a useful method for identifying, annotating and characterizing novel ACVPs.
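The second layer described above feeds baseline-model probabilities into logistic regression. The following is a minimal pure-Python sketch of that stacking idea only, not the PACVP implementation. The probability table `meta_X`, the labels `meta_y`, the three-model setup, and the learning rate and epoch count are all hypothetical values chosen for illustration.

```python
from math import exp


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + exp(-z))


def predict_proba(w, x):
    """Probability of the positive class; w[0] is the bias term."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Average the gradient over the samples and take one step.
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Hypothetical out-of-fold probabilities from three baseline models
# (one row per peptide, one column per model) and the true labels.
meta_X = [
    [0.9, 0.8, 0.7],
    [0.8, 0.6, 0.9],
    [0.7, 0.9, 0.6],
    [0.2, 0.3, 0.4],
    [0.1, 0.4, 0.2],
    [0.3, 0.2, 0.1],
]
meta_y = [1, 1, 1, 0, 0, 0]

# Second layer: the meta-learner is trained on first-layer probabilities.
weights = fit_logistic(meta_X, meta_y)
preds = [1 if predict_proba(weights, x) >= 0.5 else 0 for x in meta_X]
print("meta-learner predictions:", preds)
```

In practice the first-layer probabilities should come from cross-validated (out-of-fold) predictions, so that the meta-learner is not trained on outputs the baseline models produced for their own training samples.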
10
Asrorov AM, Wang H, Zhang M, Wang Y, He Y, Sharipov M, Yili A, Huang Y. Cell penetrating peptides: Highlighting points in cancer therapy. Drug Dev Res 2023; 84:1037-1071. [PMID: 37195405 DOI: 10.1002/ddr.22076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Accepted: 04/29/2023] [Indexed: 05/18/2023]
Abstract
Cell-penetrating peptides (CPPs), first identified in HIV a few decades ago, have received great attention in the last two decades, especially for supporting the penetration of anticancer drugs. In the drug delivery discipline, they have been involved in various approaches, from mixing with hydrophobic drugs to the use of genetically conjugated proteins. The early classification into cationic and amphipathic CPPs has since been extended to a few more classes, such as hydrophobic and cyclic CPPs. The development of potential sequences has utilized almost all methods of modern science: choosing high-efficiency peptides from natural protein sequences, sequence-based comparison, amino acid substitution, obtaining chemical and/or genetic conjugations, in silico approaches, in vitro analysis, animal experiments, etc. The bottleneck effect in this discipline reveals the complications that modern science faces in drug delivery research. Most CPP-based drug delivery systems (DDSs) efficiently inhibited tumor volume and weight in mice, but only in rare cases reduced their levels and continued further processes. The integration of chemical synthesis into the development of CPPs made a significant contribution and even reached the clinical stage as a diagnostic tool. However, constrained efforts still face serious problems in overcoming biobarriers to reach further achievements. In this work, we reviewed the roles of CPPs in anticancer drug delivery, focusing on their amino acid composition and sequences. As the most suitable point, we relied on significant changes in tumor volume in mice resulting from CPPs. We provide a review of individual CPPs and/or their derivatives in a separate subsection.
Affiliation(s)
- Akmal M Asrorov
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - Institute of Bioorganic Chemistry, AS of Uzbekistan, Tashkent, Uzbekistan
  - Department of Natural Substances Chemistry, National University of Uzbekistan, Tashkent, Uzbekistan
- Huiyuan Wang
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Meng Zhang
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Yonghui Wang
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Yang He
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Mirkomil Sharipov
  - Institute of Bioorganic Chemistry, AS of Uzbekistan, Tashkent, Uzbekistan
- Abulimiti Yili
  - The Key Laboratory of Plant Resources and Chemistry of Arid Zone, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, Xinjiang, China
- Yongzhuo Huang
  - State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - Zhongshan Institute for Drug Discovery, Institutes of Drug Discovery and Development, Chinese Academy of Sciences, Zhongshan, China
  - NMPA Key Laboratory for Quality Research and Evaluation of Pharmaceutical Excipients, Shanghai, China
11
Wang Y, Xie Y, Luo Y, Jia P, Wei J, Zhang J, Yan W, Huang J. iASMP: An interpretable in-silico predictive tool focusing on species-specific antimicrobial peptides. J Pept Sci 2023; 29:e3490. [PMID: 36994602 DOI: 10.1002/psc.3490] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/03/2022] [Revised: 03/02/2023] [Accepted: 03/25/2023] [Indexed: 03/31/2023]
Abstract
Antimicrobial peptides (AMPs), a crucial part of the innate immune system, have been exploited as promising candidates for antibacterial agents. Many researchers have devoted their efforts to developing novel AMPs in recent decades. In this context, many computational approaches have been developed to identify potential AMPs accurately. However, finding peptides specific to a particular bacterial species is challenging. Streptococcus mutans is a pathogen with an apparent cariogenic effect, and it is of great significance to study AMPs that inhibit S. mutans for the prevention and treatment of caries. In this study, we proposed a sequence-based machine learning model, namely iASMP, to accurately identify potential anti-S. mutans peptides (ASMPs). After collecting ASMPs, the performances of models were compared by utilizing multiple feature descriptors and different classification algorithms. Among the baseline predictors, the model integrating the extra trees (ET) algorithm and the hybrid features exhibited optimal results. A feature selection method was utilized to remove redundant feature information and further improve the model performance. Finally, the proposed model achieved a maximum accuracy (ACC) of 0.962 on the training dataset and an ACC of 0.750 on the testing dataset. The results demonstrated that iASMP had an excellent predictive performance and was suitable for identifying potential ASMPs. Furthermore, we also visualized the selected features and rationally explained the impact of individual features on the model output.
Affiliation(s)
- Yuqiang Wang
  - Key Laboratory of Dental Maxillofacial Reconstruction and Biological Intelligence Manufacturing of Gansu Province, School of Stomatology, Lanzhou University, Lanzhou, Gansu, China
- Yihao Xie
  - The Institute of Pharmacology, Key Laboratory of Preclinical Study for New Drugs of Gansu Province, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu, China
- Yang Luo
  - Key Laboratory of Dental Maxillofacial Reconstruction and Biological Intelligence Manufacturing of Gansu Province, School of Stomatology, Lanzhou University, Lanzhou, Gansu, China
- Pengfei Jia
  - The Institute of Pharmacology, Key Laboratory of Preclinical Study for New Drugs of Gansu Province, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu, China
- Jiaqi Wei
  - Key Laboratory of Dental Maxillofacial Reconstruction and Biological Intelligence Manufacturing of Gansu Province, School of Stomatology, Lanzhou University, Lanzhou, Gansu, China
- Jie Zhang
  - Key Laboratory of Dental Maxillofacial Reconstruction and Biological Intelligence Manufacturing of Gansu Province, School of Stomatology, Lanzhou University, Lanzhou, Gansu, China
- Wenjin Yan
  - The Institute of Pharmacology, Key Laboratory of Preclinical Study for New Drugs of Gansu Province, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu, China
- Jinqi Huang
  - The Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, China
12
Rodrigues CHM, Garg A, Keizer D, Pires DEV, Ascher DB. CSM-peptides: A computational approach to rapid identification of therapeutic peptides. Protein Sci 2022; 31:e4442. [PMID: 36173168 PMCID: PMC9518225 DOI: 10.1002/pro.4442] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/17/2022] [Revised: 08/29/2022] [Accepted: 08/30/2022] [Indexed: 11/25/2022]
Abstract
Peptides are attractive alternatives for the development of new therapeutic strategies due to their versatility and low complexity of synthesis. Increasing interest in these molecules has led to the creation of large collections of experimentally characterized therapeutic peptides, which greatly contributes to the development of data-driven computational approaches. Here we propose CSM-peptides, a novel machine learning method for rapid identification of eight different types of therapeutic peptides: anti-angiogenic, anti-bacterial, anti-cancer, anti-inflammatory, anti-viral, cell-penetrating, quorum sensing, and surface binding. Our method has been shown to outperform existing approaches, achieving an AUC of up to 0.92 on independent blind tests, and consistent performance on cross-validation. We anticipate CSM-peptides to be of great value in helping to screen large libraries to identify novel peptides with therapeutic potential and have made it freely available as a user-friendly web server and Application Programming Interface at https://biosig.lab.uq.edu.au/csm_peptides.
Affiliation(s)
- Carlos H. M. Rodrigues
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia
| | - Anjali Garg
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
| | - David Keizer
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
| | - Douglas E. V. Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
| | - David B. Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia
| |
|
13
|
Schaduangrat N, Anuwongcharoen N, Moni MA, Lio' P, Charoenkwan P, Shoombuatong W. StackPR is a new computational approach for large-scale identification of progesterone receptor antagonists using the stacking strategy. Sci Rep 2022; 12:16435. [PMID: 36180453 PMCID: PMC9525257 DOI: 10.1038/s41598-022-20143-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/18/2022] [Accepted: 09/09/2022] [Indexed: 11/24/2022] Open
Abstract
Progesterone receptors (PRs) are implicated in various cancers since their presence/absence can determine clinical outcomes. The overstimulation of progesterone can facilitate oncogenesis and thus, its modulation through PR inhibition is urgently needed. To address this issue, a novel stacked ensemble learning approach (termed StackPR) is presented for fast, accurate, and large-scale identification of PR antagonists using only SMILES notation without the need for 3D structural information. We employed six popular machine learning (ML) algorithms (i.e., logistic regression, partial least squares, k-nearest neighbor, support vector machine, extremely randomized trees, and random forest) coupled with twelve conventional molecular descriptors to create 72 baseline models. Then, a genetic algorithm in conjunction with the self-assessment-report approach was utilized to determine m out of the 72 baseline models as a means of developing the final meta-predictor using the stacking strategy and tenfold cross-validation test. Experimental results on the independent test dataset show that StackPR achieved impressive predictive performance with an accuracy of 0.966 and a Matthews correlation coefficient of 0.925. In addition, analysis based on the SHapley Additive exPlanation algorithm and molecular docking indicates that aliphatic hydrocarbons and nitrogen-containing substructures were the most important features for having PR antagonist activity. Finally, we implemented an online webserver using StackPR, which is freely accessible at http://pmlabstack.pythonanywhere.com/StackPR. StackPR is anticipated to be a powerful computational tool for the large-scale identification of unknown PR antagonist candidates for follow-up experimental validation.
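The way six algorithms and twelve descriptors yield 72 baseline models, and how a chosen subset of them feeds a meta-predictor, can be illustrated with a minimal sketch. It is not the StackPR implementation: the abstract does not name the twelve descriptors, so the `descriptor_NN` labels are placeholders, `meta_features` is a hypothetical helper, and the genetic-algorithm selection step is replaced here by a hand-picked list.

```python
from itertools import product

# The six algorithms named in the abstract (abbreviations are ours).
ALGORITHMS = ["LR", "PLS", "KNN", "SVM", "ET", "RF"]
# Placeholder labels: the abstract does not list the twelve descriptors.
DESCRIPTORS = [f"descriptor_{i:02d}" for i in range(1, 13)]


def baseline_grid(algorithms, descriptors):
    """Return every (algorithm, descriptor) pair, one per baseline model."""
    return list(product(algorithms, descriptors))


def meta_features(baseline_probs, selected):
    """Build one meta-feature row per sample from the selected baselines.

    baseline_probs maps (algorithm, descriptor) -> list of predicted
    probabilities, one per sample. In stacking, these rows become the
    input of the meta-predictor.
    """
    n_samples = len(baseline_probs[selected[0]])
    return [[baseline_probs[key][i] for key in selected] for i in range(n_samples)]


grid = baseline_grid(ALGORITHMS, DESCRIPTORS)
print(len(grid))  # 72 baseline models

# Hypothetical out-of-fold probabilities for two samples from two baselines.
probs = {
    ("LR", "descriptor_01"): [0.9, 0.2],
    ("RF", "descriptor_02"): [0.8, 0.1],
}
rows = meta_features(probs, [("LR", "descriptor_01"), ("RF", "descriptor_02")])
print(rows)  # [[0.9, 0.8], [0.2, 0.1]]
```

In a real pipeline the probabilities would come from cross-validated predictions so that the meta-predictor is not trained on outputs the baselines produced for their own training samples.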
Affiliation(s)
- Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Nuttapat Anuwongcharoen
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Mohammad Ali Moni
- Artificial Intelligence & Digital Health Data Science, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD, 4072, Australia
| | - Pietro Lio'
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
| | - Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand.
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
| |
|
14
|
Antimicrobial peptides with cell-penetrating activity as prophylactic and treatment drugs. Biosci Rep 2022; 42:231731. [PMID: 36052730 PMCID: PMC9508529 DOI: 10.1042/bsr20221789] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/14/2022] [Revised: 08/31/2022] [Accepted: 09/01/2022] [Indexed: 01/18/2023] Open
Abstract
Health is fundamental for the development of individuals and the evolution of species. In that sense, it is relevant for human societies to understand how the human body has developed molecular strategies to maintain health. In the present review, we summarize diverse evidence that supports the role of peptides in this endeavor. Of particular interest to the present review are antimicrobial peptides (AMP) and cell-penetrating peptides (CPP). Different experimental evidence indicates that AMP/CPP are able to regulate autophagy, which in turn regulates the immune system response. AMP also assist in the establishment of the microbiota, which in turn is critical for different behavioral and health aspects of humans. Thus, AMP and CPP are multifunctional peptides that regulate two aspects of our bodies that are fundamental to our health: autophagy and microbiota. While the multifunctional nature of these peptides is now clear, we are still in the early stages of the development of computational strategies aimed at assisting experimentalists in identifying selective multifunctional AMP/CPP to control nonhealthy conditions. For instance, both AMP and CPP are computationally characterized as amphipathic and cationic, yet neither of these features is relevant to differentiate these peptides from non-AMP or non-CPP. The present review aims to highlight current knowledge that may facilitate the development of AMP design tools for preventing or treating illness.
|
15
|
Hasan MM, Tsukiyama S, Cho JY, Kurata H, Alam MA, Liu X, Manavalan B, Deng HW. Deepm5C: A deep-learning-based hybrid framework for identifying human RNA N5-methylcytosine sites using a stacking strategy. Mol Ther 2022; 30:2856-2867. [PMID: 35526094 PMCID: PMC9372321 DOI: 10.1016/j.ymthe.2022.05.001] [Citation(s) in RCA: 54] [Impact Index Per Article: 18.0] [Received: 10/09/2021] [Revised: 04/25/2022] [Accepted: 05/03/2022] [Indexed: 11/30/2022] Open
Abstract
As one of the most prevalent post-transcriptional epigenetic modifications, N5-methylcytosine (m5C) plays an essential role in various cellular processes and disease pathogenesis. Therefore, it is important to accurately identify m5C modifications in order to gain a deeper understanding of cellular processes and other possible functional mechanisms. Although a few computational methods have been proposed, their respective models have been developed using small training datasets. Hence, their practical application is quite limited in genome-wide detection. To overcome the existing limitations, we propose Deepm5C, a bioinformatics method for identifying RNA m5C sites throughout the human genome. To develop Deepm5C, we constructed a novel benchmarking dataset and investigated a mixture of three conventional feature-encoding algorithms and a feature derived from word-embedding approaches. Afterward, four variants of deep-learning classifiers and four commonly used conventional classifiers were employed and trained with the four encodings, ultimately obtaining 32 baseline models. A stacking strategy is effectively utilized by integrating the predicted output of the optimal baseline models and trained with a one-dimensional (1D) convolutional neural network. As a result, the Deepm5C predictor achieved excellent performance during cross-validation with a Matthews correlation coefficient and an accuracy of 0.697 and 0.855, respectively. The corresponding metrics during the independent test were 0.691 and 0.852, respectively. Overall, Deepm5C achieved a more accurate and stable performance than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, Deepm5C is expected to assist community-wide efforts in identifying putative m5Cs and to formulate novel testable biological hypotheses.
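Several entries in this list report a Matthews correlation coefficient (MCC) alongside accuracy. As a reminder of how the two relate to the confusion matrix, here is a small self-contained sketch using the standard definitions; the labels and predictions are made-up toy values, not data from any of the cited studies.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy example: six samples, two of them misclassified.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)             # (2, 2, 1, 1)
print(accuracy(*counts))  # 4 of 6 correct
print(mcc(*counts))       # (2*2 - 1*1) / sqrt(3*3*3*3) = 1/3
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported for imbalanced peptide and site-prediction datasets.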
Affiliation(s)
- Md Mehedi Hasan
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA.
| | - Sho Tsukiyama
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Jae Youl Cho
- Molecular Immunology Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Korea
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Md Ashad Alam
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
| | - Xiaowen Liu
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
| | - Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Korea.
| | - Hong-Wen Deng
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA.
| |
|
16
|
Chen Q, Yang C, Xie Y, Wang Y, Li X, Wang K, Huang J, Yan W. GM-Pep: A High Efficiency Strategy to De Novo Design Functional Peptide Sequences. J Chem Inf Model 2022; 62:2617-2629. [PMID: 35533298 DOI: 10.1021/acs.jcim.2c00089] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
Abstract
Although peptides are regarded as ideal therapeutic agents, only a small proportion of the marketed drugs are peptides. In the past decade, pharmacists have paid great attention to the development of peptide therapeutics. Except for a few approved chemically/rationally designed peptides, most attempts failed due to unsatisfactory efficacy or safety. Luckily, computational methods, such as artificial intelligence, have been utilized to accelerate the discovery of therapeutic peptides by predicting the activity, toxicity, and absorption, distribution, metabolism, and excretion of polypeptides. Usually, a specific biological activity of a peptide could be accurately determined by an interest-oriented binary classification constructed of a positive set and another un-experimentally validated negative set regardless of other characteristics, which suggests that it could be challenging to realize the comprehensive evaluation of the research object in the early stage of drug research and development. Herein, we proposed an integrated method (GM-Pep) that contained a conditional variational autoencoder model (CVAE) and a positive sample training multiclassifier (Deep-Multiclassifier) to effectively generate a single bioactive peptide sequence without toxicity and referential side effects. The results showed that our Deep-Multiclassifier model gave a sequence accuracy of up to 96.41% [toxicity (94.48%), antifungal (96.58%), antihypertensive (97.18%), and antibacterial (96.91%), respectively]. The properties of Deep-Multiclassifier and CVAE were validated through 12 first synthesized antibacterial peptides or compared to random peptides. The source code and data sets are available at https://github.com/TimothyChen225/GM-Pep.
Affiliation(s)
- Qushuo Chen
- The Institute of Pharmacology, Key Laboratory of Preclinical Study for New Drugs of Gansu Province, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, China
| | - Changyan Yang
- The Institute of Pharmacology, Key Laboratory of Preclinical Study for New Drugs of Gansu Province, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, China
| | - Yihao Xie
- The Institute of Pharmacology, Key Laboratory of Preclinical Study for New Drugs of Gansu Province, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, China
| | - Yuqiang Wang
- School of Stomatology, Lanzhou University,Lanzhou, Gansu 730000, China
| | - Xiaoxu Li
- School of Computer and Communication, Lanzhou University of Technology, Lanzhou, Gansu 730050, China
| | - Kairong Wang
- The Institute of Pharmacology, Key Laboratory of Preclinical Study for New Drugs of Gansu Province, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, China
| | - Jinqi Huang
- Department of Hematology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong 524000, China
| | - Wenjin Yan
- The Institute of Pharmacology, Key Laboratory of Preclinical Study for New Drugs of Gansu Province, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, China
| |
|
17
|
de Oliveira ECL, da Costa KS, Taube PS, Lima AH, Junior CDSDS. Biological Membrane-Penetrating Peptides: Computational Prediction and Applications. Front Cell Infect Microbiol 2022; 12:838259. [PMID: 35402305 PMCID: PMC8992797 DOI: 10.3389/fcimb.2022.838259] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 12/17/2021] [Accepted: 02/21/2022] [Indexed: 12/14/2022] Open
Abstract
Peptides comprise a versatile class of biomolecules that present a unique chemical space with diverse physicochemical and structural properties. Some classes of peptides are able to naturally cross biological membranes, such as the cell membrane and the blood-brain barrier (BBB). Cell-penetrating peptides (CPPs) and blood-brain barrier-penetrating peptides (B3PPs) have been explored by the biotechnological and pharmaceutical industries to develop new therapeutic molecules and carrier systems. The computational prediction of peptides' penetration into biological membranes has emerged as an interesting strategy because it enables high-throughput and low-cost screening of large chemical libraries. Structure- and sequence-based information of peptides, as well as atomistic biophysical models, have been explored in computer-assisted discovery strategies to classify and identify new structures with pharmacokinetic properties related to the translocation through biomembranes. Computational strategies to predict the permeability into biomembranes include cheminformatic filters, molecular dynamics simulations, artificial intelligence algorithms, and statistical models, and the choice of the most adequate method depends on the purposes of the computational investigation. Here, we present and discuss some principles and applications of these computational methods widely used to predict the permeability of peptides into biomembranes, highlighting some of their pharmaceutical and biotechnological applications.
Affiliation(s)
- Ewerton Cristhian Lima de Oliveira
- Institute of Technology, Federal University of Pará, Belém, Brazil
- *Correspondence: Kauê Santana da Costa; Ewerton Cristhian Lima de Oliveira
| | - Kauê Santana da Costa
- Laboratory of Computational Simulation, Institute of Biodiversity, Federal University of Western Pará, Santarém, Brazil
- *Correspondence: Kauê Santana da Costa; Ewerton Cristhian Lima de Oliveira
| | - Paulo Sérgio Taube
- Laboratory of Computational Simulation, Institute of Biodiversity, Federal University of Western Pará, Santarém, Brazil
| | - Anderson H. Lima
- Laboratório de Planejamento e Desenvolvimento de Fármacos, Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, Brazil
| |
|
18
|
Abstract
Background:
Therapeutic peptide prediction is critical for drug development and therapy. Researchers have been studying this essential task, developing several computational methods to identify different therapeutic peptide types.
Objective:
Most predictors are the specific methods for certain peptides. Currently, developing methods to predict the presence of multiple peptides remains a challenging problem. Moreover, it is still challenging to combine different features to make the therapeutic prediction.
Method:
In this paper, we proposed a new ensemble method TP-MV for general therapeutic peptide recognition. TP-MV is developed using the stacking framework in conjunction with the KNN, SVM, ET, RF, and XGB. Then TP-MV constructs a multi-view learning model as meta-classifiers to extract the discriminative feature for different peptides.
Results:
In the experiment, the proposed method outperforms the other existing methods on the benchmark datasets, indicating that the proposed method has the ability to predict multiple therapeutic peptides simultaneously.
Conclusion:
The TP-MV is a useful tool for predicting therapeutic peptides.
Affiliation(s)
- Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Hongwu Lv
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Yichen Guo
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Jie Wen
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| |
|
19
|
Charoenkwan P, Nantasenamat C, Hasan MM, Moni MA, Manavalan B, Shoombuatong W. UMPred-FRL: A New Approach for Accurate Prediction of Umami Peptides Using Feature Representation Learning. Int J Mol Sci 2021; 22:ijms222313124. [PMID: 34884927 PMCID: PMC8658322 DOI: 10.3390/ijms222313124] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 10/28/2021] [Revised: 12/01/2021] [Accepted: 12/02/2021] [Indexed: 11/16/2022] Open
Abstract
Umami ingredients have been identified as important factors in food seasoning and production. Traditional experimental methods for characterizing peptides exhibiting umami sensory properties (umami peptides) are time-consuming, laborious, and costly. As a result, it is preferable to develop computational tools for the large-scale identification of available sequences in order to identify novel peptides with umami sensory properties. Although a computational tool has been developed for this purpose, its predictive performance is still insufficient. In this study, we use a feature representation learning approach to create a novel machine-learning meta-predictor called UMPred-FRL for improved umami peptide identification. We combined six well-known machine learning algorithms (extremely randomized trees, k-nearest neighbor, logistic regression, partial least squares, random forest, and support vector machine) with seven different feature encodings (amino acid composition, amphiphilic pseudo-amino acid composition, dipeptide composition, composition-transition-distribution, and pseudo-amino acid composition) to develop the final meta-predictor. Extensive experimental results demonstrated that UMPred-FRL was effective and achieved more accurate performance on the benchmark dataset compared to its baseline models, and consistently outperformed the existing method on the independent test dataset. Finally, to aid in the high-throughput identification of umami peptides, the UMPred-FRL web server was established and made freely available online. It is expected that UMPred-FRL will be a powerful tool for the cost-effective large-scale screening of candidate peptides with potential umami sensory properties.
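Two of the encodings named above, amino acid composition and dipeptide composition, are simple enough to sketch directly from their usual definitions: the fraction of each of the 20 standard residues, and the fraction of each of the 400 ordered residue pairs among adjacent positions. The code below is an illustration only, not the UMPred-FRL implementation; the function names are ours and the example peptide is arbitrary.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical
DIPEPTIDE_INDEX = {a + b: k for k, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))}


def amino_acid_composition(seq):
    """Return a 20-dimensional vector of residue fractions."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-dimensional vector of adjacent-pair fractions."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        raise ValueError("sequence must contain at least two residues")
    vec = [0.0] * len(DIPEPTIDE_INDEX)
    for pair in pairs:
        # Pairs containing non-standard residues are skipped.
        if pair in DIPEPTIDE_INDEX:
            vec[DIPEPTIDE_INDEX[pair]] += 1 / len(pairs)
    return vec


aac = amino_acid_composition("AAK")
dpc = dipeptide_composition("AAK")
print(len(aac), len(dpc))  # 20 400
```

Fixed-length vectors like these are what allow peptides of different lengths to be passed to conventional classifiers such as random forests or support vector machines.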
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand;
| | - Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand;
| | - Md Mehedi Hasan
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA;
| | - Mohammad Ali Moni
- Artificial Intelligence & Digital Health Data Science, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD 4072, Australia;
| | - Balachandran Manavalan
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Korea
- Correspondence: (B.M.); (W.S.)
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand;
- Correspondence: (B.M.); (W.S.)
| |
|
20
|
Chen L, Guo X, Wang L, Geng J, Wu J, Hu B, Wang T, Li J, Liu C, Wang H. In silico identification and experimental validation of cellular uptake by a new cell penetrating peptide P1 derived from MARCKS. Drug Deliv 2021; 28:1637-1648. [PMID: 34338123 PMCID: PMC8330795 DOI: 10.1080/10717544.2021.1960922] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/09/2021] [Revised: 07/13/2021] [Accepted: 07/19/2021] [Indexed: 12/28/2022] Open
Abstract
Viral vectors for vaccine delivery are challenged by recently reported safety issues like immunogenicity and risk for cancer development, and thus there is a growing need for the development of non-viral vectors. Cell penetrating peptides (CPPs) are non-viral vectors that can enter plasma membranes efficiently and deliver a broad range of cargoes. Our bioinformatic prediction and wet-lab validation data suggested that peptide P1 derived from MARCKS protein phosphorylation site domain is a new potential CPP candidate. We found that peptide P1 can efficiently internalize into various cell lines in a concentration-dependent manner. Receptor-mediated endocytosis pathway is the major mechanism of P1 penetration, although P1 also directly penetrates the plasma membrane. We also found that peptide P1 has low cytotoxicity in cultured cell lines as well as mouse red blood cells. Furthermore, peptide P1 not only can enter into cultured cells itself, but it also can interact with plasmid DNA and mediate the functional delivery of plasmid DNA into cultured cells, even in hard-to-transfect cells. Combined, these findings indicate that P1 may be a promising vector for efficient intracellular delivery of bioactive cargos.
Affiliation(s)
- Linlin Chen
- Department of Pathology and Immunology, Medical School, China Three Gorges University, Yichang, China
- Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang, China
- Affiliated Ren He Hospital of China Three Gorges University, Yichang, China
| | - Xiangli Guo
- Department of Pathology and Immunology, Medical School, China Three Gorges University, Yichang, China
- Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang, China
| | - Lidan Wang
- Department of Pathology and Immunology, Medical School, China Three Gorges University, Yichang, China
- Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang, China
| | - Jingping Geng
- Department of Pathology and Immunology, Medical School, China Three Gorges University, Yichang, China
- Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang, China
| | - Jiao Wu
- Affiliated Ren He Hospital of China Three Gorges University, Yichang, China
| | - Bin Hu
- Affiliated Ren He Hospital of China Three Gorges University, Yichang, China
| | - Tao Wang
- The First Clinical Medical College of China Three Gorges University, Yichang, China
| | - Jason Li
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Changbai Liu
- Department of Pathology and Immunology, Medical School, China Three Gorges University, Yichang, China
- Hubei Key Laboratory of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang, China
| | - Hu Wang
- Department of Pathology and Immunology, Medical School, China Three Gorges University, Yichang, China
| |
|
21
|
Guo X, Chen L, Wang L, Geng J, Wang T, Hu J, Li J, Liu C, Wang H. In silico identification and experimental validation of cellular uptake and intracellular labeling by a new cell penetrating peptide derived from CDN1. Drug Deliv 2021; 28:1722-1736. [PMID: 34463179 PMCID: PMC8409945 DOI: 10.1080/10717544.2021.1963352] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/24/2021] [Revised: 07/25/2021] [Accepted: 07/26/2021] [Indexed: 12/18/2022] Open
Abstract
Bioactive therapeutic molecules are generally impermeable to the cell membrane, hindering their utility and efficacy. A group of peptides called cell-penetrating peptides (CPPs) were found to have the capability of transporting different types of cargo molecules across the cell membrane. Here, we identified a short peptide named P2, which has a higher proportion of basic residues than the CDN1 (cyclin-dependent kinase inhibitor 1) protein it is derived from, and we used bioinformatic analysis and experimental validation to confirm the penetration property of peptide P2. We found that peptide P2 can efficiently enter different cell lines in a concentration-dependent manner. The endocytosis pathway, especially receptor-related endocytosis, may be involved in the process of P2 penetration. Our data also showed that peptide P2 is safe in cultured cell lines and red blood cells. Lastly, peptide P2 can efficiently deliver self-labeling protein HaloTag into cells for imaging. Our study illustrates that peptide P2 is a promising imaging agent delivery vehicle for future applications.
Affiliation(s)
- Xiangli Guo
- Department of Pathology and Immunology, Medical School, China Three Gorges University, Yichang, China
- Hubei Key Lab of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang, China
| | - Linlin Chen
- Hubei Key Lab of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang, China
- Affiliated Ren He Hospital of China Three Gorges University, Yichang, China
| | - Lidan Wang
- Department of Pathology and Immunology, Medical School, China Three Gorges University, Yichang, China
- Hubei Key Lab of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang, China
| | - Jingping Geng
- Department of Pathology and Immunology, Medical School, China Three Gorges University, Yichang, China
- Hubei Key Lab of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang, China
| | - Tao Wang
- The First Clinical Medical College of China Three Gorges University, Yichang, China
| | - Jixiong Hu
- College of Life Science, Yangtze University, Jingzhou, China
| | - Jason Li
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Changbai Liu
- Hubei Key Lab of Tumor Microenvironment and Immunotherapy, China Three Gorges University, Yichang, China
| | - Hu Wang
- Department of Pathology and Immunology, Medical School, China Three Gorges University, Yichang, China
- Lead Contact
| |
|
22
|
Guo Y, Yan K, Lv H, Liu B. PreTP-EL: prediction of therapeutic peptides based on ensemble learning. Brief Bioinform 2021; 22:6359002. [PMID: 34459488 DOI: 10.1093/bib/bbab358] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 06/15/2021] [Revised: 07/27/2021] [Accepted: 08/11/2021] [Indexed: 01/02/2023] Open
Abstract
Therapeutic peptides are important for understanding the correlation between peptides and their therapeutic diagnostic potential. The therapeutic peptides can be further divided into different types based on therapeutic function sharing different characteristics. Although some computational approaches have been proposed to predict different types of therapeutic peptides, they failed to accurately predict all types of therapeutic peptides. In this study, a predictor called PreTP-EL has been proposed via employing the ensemble learning approach to fuse the different features and machine learning techniques in order to capture the different characteristics of various therapeutic peptides. Experimental results showed that PreTP-EL outperformed other competing methods. Availability and implementation: A user-friendly web-server of PreTP-EL predictor is available at http://bliulab.net/PreTP-EL.
Affiliation(s)
- Yichen Guo
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Hongwu Lv
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| |
|
23
|
Predicting Cross-Species Infection of Swine Influenza Virus with Representation Learning of Amino Acid Features. Computational and Mathematical Methods in Medicine 2021; 2021:6985008. [PMID: 34671417 PMCID: PMC8523279 DOI: 10.1155/2021/6985008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/24/2021] [Revised: 09/27/2021] [Accepted: 09/28/2021] [Indexed: 11/17/2022]
Abstract
Swine influenza viruses (SIVs) can unforeseeably cross the species barriers and directly infect humans, which pose huge challenges for public health and trigger pandemic risk at irregular intervals. Computational tools are needed to predict infection phenotype and early pandemic risk of SIVs. For this purpose, we propose a feature representation algorithm to predict cross-species infection of SIVs. We built a high-quality dataset of 1902 viruses. A feature representation learning scheme was applied to learn feature representations from 64 well-trained random forest models with multiple feature descriptors of mutant amino acid in the viral proteins, including compositional information, position-specific information, and physicochemical properties. Class and probabilistic information were integrated into the feature representations, and redundant features were removed by feature space optimization. High performance was achieved using 20 informative features and 22 probabilistic information. The proposed method will facilitate SIV characterization of transmission phenotype.
24
Xue Y, Ye X, Wei L, Zhang X, Sakurai T, Wei L. Better Performance with Transformer: CPPFormer in precise prediction of cell-Penetrating Peptides. Curr Med Chem 2021; 29:881-893. [PMID: 34544332 DOI: 10.2174/0929867328666210920103140] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/11/2021] [Revised: 07/28/2021] [Accepted: 08/07/2021] [Indexed: 11/22/2022]
Abstract
With its superior performance, the Transformer model, which is based on the 'Encoder-Decoder' paradigm, has become the mainstream in natural language processing. On the other hand, bioinformatics has embraced machine learning and made great progress in drug design and protein property prediction. Cell-penetrating peptides (CPPs) are one kind of permeable protein that is convenient as a kind of 'postman' in drug penetration tasks. However, a small number of CPPs have been discovered by research, let alone practical applications in drug permeability. Therefore, correctly identifying the CPPs has opened up a new way to take macromolecules into cells without other potentially harmful materials in the drug. Most of the previous work only uses trivial machine learning techniques and hand-crafted features to construct a simple classifier. In CPPFormer, we learn from the idea of implementing the attention structure of Transformer, rebuilding the network based on the characteristics of CPPs according to its short length, and using an automatic feature extractor with a few manual engineered features to co-direct the predicted results. Compared to all previous methods and other classic text classification models, the empirical result has shown that our proposed deep model-based method has achieved the best performance of 92.16% accuracy in the CPP924 dataset and has passed various index tests.
Affiliation(s)
- Yuyang Xue: Department of Computer Science, University of Tsukuba, Tsukuba, Japan
- Xiucai Ye: Department of Computer Science, University of Tsukuba, Tsukuba, Japan
- Lesong Wei: Department of Computer Science, University of Tsukuba, Tsukuba, Japan
- Xin Zhang: School of Software, Shandong University, Jinan, China
- Tetsuya Sakurai: Department of Computer Science, University of Tsukuba, Tsukuba, Japan
- Leyi Wei: School of Software, Shandong University, Jinan, China
25
Zhao YW, Zhang S, Ding H. Recent development of machine learning methods in sumoylation sites prediction. Curr Med Chem 2021; 29:894-907. [PMID: 34525906 DOI: 10.2174/0929867328666210915112030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/10/2021] [Revised: 07/24/2021] [Accepted: 08/07/2021] [Indexed: 11/22/2022]
Abstract
Sumoylation of proteins is an important reversible post-translational modification of proteins and mediates a variety of cellular processes. Sumo-modified proteins can change their subcellular localization, activity and stability. In addition, it also plays an important role in various cellular processes such as transcriptional regulation and signal transduction. The abnormal sumoylation is involved in many diseases, including neurodegeneration and immune-related diseases, as well as the development of cancer. Therefore, identification of the sumoylation site (SUMO site) is fundamental to understanding their molecular mechanisms and regulatory roles. In contrast to labor-intensive and costly experimental approaches, computational prediction of sumoylation sites in silico also attracted much attention for its accuracy, convenience and speed. At present, many computational prediction models have been used to identify SUMO sites, but these contents have not been comprehensively summarized and reviewed. Therefore, the research progress of relevant models is summarized and discussed in this paper. We will briefly summarize the development of bioinformatics methods on sumoylation site prediction. We will mainly focus on the benchmark dataset construction, feature extraction, machine learning method, published results and online tools. We hope the review will provide more help for wet-experimental scholars.
Affiliation(s)
- Yi-Wei Zhao: School of Medicine, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shihua Zhang: College of Life Science and Health, Wuhan University of Science and Technology, Wuhan 430065, China
- Hui Ding: School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
26
Su R, Hu J, Zou Q, Manavalan B, Wei L. Empirical comparison and analysis of web-based cell-penetrating peptide prediction tools. Brief Bioinform 2021; 21:408-420. [PMID: 30649170 DOI: 10.1093/bib/bby124] [Citation(s) in RCA: 122] [Impact Index Per Article: 30.5] [Received: 10/17/2018] [Revised: 11/30/2018] [Accepted: 11/30/2018] [Indexed: 12/16/2022] Open
Abstract
Cell-penetrating peptides (CPPs) facilitate the delivery of therapeutically relevant molecules, including DNA, proteins and oligonucleotides, into cells both in vitro and in vivo. This unique ability explores the possibility of CPPs as therapeutic delivery and its potential applications in clinical therapy. Over the last few decades, a number of machine learning (ML)-based prediction tools have been developed, and some of them are freely available as web portals. However, the predictions produced by various tools are difficult to quantify and compare. In particular, there is no systematic comparison of the web-based prediction tools in performance, especially in practical applications. In this work, we provide a comprehensive review on the biological importance of CPPs, CPP database and existing ML-based methods for CPP prediction. To evaluate current prediction tools, we conducted a comparative study and analyzed a total of 12 models from 6 publicly available CPP prediction tools on 2 benchmark validation sets of CPPs and non-CPPs. Our benchmarking results demonstrated that a model from the KELM-CPPpred, namely KELM-hybrid-AAC, showed a significant improvement in overall performance, when compared to the other 11 prediction models. Moreover, through a length-dependency analysis, we find that existing prediction tools tend to more accurately predict CPPs and non-CPPs with the length of 20-25 residues long than peptides in other length ranges.
Affiliation(s)
- Ran Su: College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jie Hu: College of Intelligence and Computing, Tianjin University, Tianjin, China
- Quan Zou: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Leyi Wei: College of Intelligence and Computing, Tianjin University, Tianjin, China
27
B3Pred: A Random-Forest-Based Method for Predicting and Designing Blood-Brain Barrier Penetrating Peptides. Pharmaceutics 2021; 13:1237. [PMID: 34452198 PMCID: PMC8399279 DOI: 10.3390/pharmaceutics13081237] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 06/07/2021] [Revised: 07/07/2021] [Accepted: 07/14/2021] [Indexed: 12/14/2022] Open
Abstract
The blood–brain barrier is a major obstacle in treating brain-related disorders, as it does not allow the delivery of drugs into the brain. We developed a method for predicting blood–brain barrier penetrating peptides to facilitate drug delivery into the brain. These blood–brain barrier penetrating peptides (B3PPs) can act as therapeutics, as well as drug delivery agents. We trained, tested, and evaluated our models on blood–brain barrier peptides obtained from the B3Pdb database. First, we computed a wide range of peptide features. Then, we selected relevant peptide features. Finally, we developed numerous machine-learning-based models for predicting blood–brain barrier peptides using the selected features. The random-forest-based model performed the best with respect to the top 80 selected features and achieved a maximal 85.08% accuracy with an AUROC of 0.93. We also developed a webserver, B3pred, that implements our best models. It has three major modules that allow users to predict/design B3PPs and scan B3PPs in a protein sequence.
28
Song B, Li Z, Lin X, Wang J, Wang T, Fu X. Pretraining model for biological sequence data. Brief Funct Genomics 2021; 20:181-195. [PMID: 34050350 PMCID: PMC8194843 DOI: 10.1093/bfgp/elab025] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 03/23/2021] [Revised: 04/13/2021] [Accepted: 04/21/2021] [Indexed: 12/26/2022] Open
Abstract
With the development of high-throughput sequencing technology, biological sequence data reflecting life information becomes increasingly accessible. Particularly on the background of the COVID-19 pandemic, biological sequence data play an important role in detecting diseases, analyzing the mechanism and discovering specific drugs. In recent years, pretraining models that have emerged in natural language processing have attracted widespread attention in many research fields not only to decrease training cost but also to improve performance on downstream tasks. Pretraining models are used for embedding biological sequence and extracting feature from large biological sequence corpus to comprehensively understand the biological sequence data. In this survey, we provide a broad review on pretraining models for biological sequence data. Moreover, we first introduce biological sequences and corresponding datasets, including brief description and accessible link. Subsequently, we systematically summarize popular pretraining models for biological sequences based on four categories: CNN, word2vec, LSTM and Transformer. Then, we present some applications with proposed pretraining models on downstream tasks to explain the role of pretraining models. Next, we provide a novel pretraining scheme for protein sequences and a multitask benchmark for protein pretraining models. Finally, we discuss the challenges and future directions in pretraining models for biological sequences.
Affiliation(s)
- Xiangzheng Fu (corresponding author): College of Information Science and Engineering, Hunan University, Changsha, Hunan, China. Tel: 86-0731-88821907; E-mail:
29
Mu Z, Yu T, Liu X, Zheng H, Wei L, Liu J. FEGS: a novel feature extraction model for protein sequences and its applications. BMC Bioinformatics 2021; 22:297. [PMID: 34078264 PMCID: PMC8172329 DOI: 10.1186/s12859-021-04223-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/25/2021] [Accepted: 05/28/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Feature extraction of protein sequences is widely used in various research areas related to protein analysis, such as protein similarity analysis and prediction of protein functions or interactions. RESULTS In this study, we introduce FEGS (Feature Extraction based on Graphical and Statistical features), a novel feature extraction model of protein sequences, by developing a new technique for graphical representation of protein sequences based on the physicochemical properties of amino acids and effectively employing the statistical features of protein sequences. By fusing the graphical and statistical features, FEGS transforms a protein sequence into a 578-dimensional numerical vector. When FEGS is applied to phylogenetic analysis on five protein sequence data sets, its performance is notably better than all of the other compared methods. CONCLUSION The FEGS method is carefully designed, which is practically powerful for extracting features of protein sequences. The current version of FEGS is developed to be user-friendly and is expected to play a crucial role in the related studies of protein sequence analyses.
Affiliation(s)
- Zengchao Mu: School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
- Ting Yu: Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China
- Xiaoping Liu: Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Beijing, China
- Hongyu Zheng: Department of Radiation Oncology, Qilu Hospital, Cheeloo College of Medicine, Shandong University, Jinan, 250012, China
- Leyi Wei: School of Software, Shandong University, Jinan, China
- Juntao Liu: School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
30
Hasan MM, Alam MA, Shoombuatong W, Deng HW, Manavalan B, Kurata H. NeuroPred-FRL: an interpretable prediction model for identifying neuropeptide using feature representation learning. Brief Bioinform 2021; 22:6272801. [PMID: 33975333 DOI: 10.1093/bib/bbab167] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 02/07/2021] [Revised: 03/23/2021] [Accepted: 04/09/2021] [Indexed: 12/13/2022] Open
Abstract
Neuropeptides (NPs) are the most versatile neurotransmitters in the immune systems that regulate various central anxious hormones. An efficient and effective bioinformatics tool for rapid and accurate large-scale identification of NPs is critical in immunoinformatics, which is indispensable for basic research and drug development. Although a few NP prediction tools have been developed, it is mandatory to improve their NPs' prediction performances. In this study, we have developed a machine learning-based meta-predictor called NeuroPred-FRL by employing the feature representation learning approach. First, we generated 66 optimal baseline models by employing 11 different encodings, six different classifiers and a two-step feature selection approach. The predicted probability scores of NPs based on the 66 baseline models were combined to be deemed as the input feature vector. Second, in order to enhance the feature representation ability, we applied the two-step feature selection approach to optimize the 66-D probability feature vector and then inputted the optimal one into a random forest classifier for the final meta-model (NeuroPred-FRL) construction. Benchmarking experiments based on both cross-validation and independent tests indicate that the NeuroPred-FRL achieves a superior prediction performance of NPs compared with the other state-of-the-art predictors. We believe that the proposed NeuroPred-FRL can serve as a powerful tool for large-scale identification of NPs, facilitating the characterization of their functional mechanisms and expediting their applications in clinical therapy. Moreover, we interpreted some model mechanisms of NeuroPred-FRL by leveraging the robust SHapley Additive exPlanation algorithm.
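The core idea in this abstract, in which the predicted probabilities of several baseline models are combined into a feature vector for a meta-model, can be sketched in a few lines. This is only an illustration under stated assumptions: the three toy scorers, the function names, and the averaging meta-step are hypothetical stand-ins, not the 66 baseline models, the two-step feature selection, or the random forest described for NeuroPred-FRL.

```python
from typing import Callable, List, Sequence

# Hypothetical baseline "models": each maps a peptide to a score in [0, 1].
# Real baselines would be trained classifiers returning class probabilities.
def frac_basic(seq: str) -> float:
    """Fraction of basic residues (K, R, H)."""
    return sum(seq.count(aa) for aa in "KRH") / len(seq)

def frac_hydrophobic(seq: str) -> float:
    """Fraction of hydrophobic residues (A, I, L, M, F, V, W)."""
    return sum(seq.count(aa) for aa in "AILMFVW") / len(seq)

def length_score(seq: str) -> float:
    """Length scaled to [0, 1], saturating at 30 residues."""
    return min(len(seq), 30) / 30

BASELINES: Sequence[Callable[[str], float]] = (
    frac_basic, frac_hydrophobic, length_score,
)

def probability_features(seq: str,
                         baselines: Sequence[Callable[[str], float]] = BASELINES
                         ) -> List[float]:
    """Stack the baseline outputs into one feature vector (one entry per model)."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    return [model(seq) for model in baselines]

def meta_predict(seq: str, threshold: float = 0.5) -> bool:
    """Stand-in meta-model: average the stacked scores and apply a threshold.
    (An assumption for brevity; the abstract describes a random forest here.)"""
    feats = probability_features(seq)
    return sum(feats) / len(feats) >= threshold
```

With 66 baselines instead of three, `probability_features` would return the 66-dimensional probability vector the abstract mentions, and a trained classifier would replace the averaging step.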
Affiliation(s)
- Md Mehedi Hasan: Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
- Md Ashad Alam: Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Watshara Shoombuatong: Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Hong-Wen Deng: Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Hiroyuki Kurata: Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
31
Zeng R, Cheng S, Liao M. 4mCPred-MTL: Accurate Identification of DNA 4mC Sites in Multiple Species Using Multi-Task Deep Learning Based on Multi-Head Attention Mechanism. Front Cell Dev Biol 2021; 9:664669. [PMID: 34041243 PMCID: PMC8141656 DOI: 10.3389/fcell.2021.664669] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/05/2021] [Accepted: 03/17/2021] [Indexed: 01/10/2023] Open
Abstract
DNA methylation is one of the most extensive epigenetic modifications. DNA 4mC modification plays a key role in regulating chromatin structure and gene expression. In this study, we proposed a generic 4mC computational predictor, namely, 4mCPred-MTL using multi-task learning coupled with Transformer to predict 4mC sites in multiple species. In this predictor, we utilize a multi-task learning framework, in which each task is to train species-specific data based on Transformer. Extensive experimental results show that our multi-task predictive model can significantly improve the performance of the model based on single task and outperform existing methods on benchmarking comparison. Moreover, we found that our model can sufficiently capture better characteristics of 4mC sites as compared to existing commonly used feature descriptors, demonstrating the strong feature learning ability of our model. Therefore, based on the above results, it can be expected that our 4mCPred-MTL can be a useful tool for research communities of interest.
Affiliation(s)
- Rao Zeng: Department of Software Engineering, School of Informatics, Xiamen University, Xiamen, China
- Song Cheng: Department of Thoracic Surgery, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Minghong Liao: Department of Software Engineering, School of Informatics, Xiamen University, Xiamen, China
32
Santana K, do Nascimento LD, Lima e Lima A, Damasceno V, Nahum C, Braga RC, Lameira J. Applications of Virtual Screening in Bioprospecting: Facts, Shifts, and Perspectives to Explore the Chemo-Structural Diversity of Natural Products. Front Chem 2021; 9:662688. [PMID: 33996755 PMCID: PMC8117418 DOI: 10.3389/fchem.2021.662688] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 02/01/2021] [Accepted: 02/25/2021] [Indexed: 12/22/2022] Open
Abstract
Natural products are continually explored in the development of new bioactive compounds with industrial applications, attracting the attention of scientific research efforts due to their pharmacophore-like structures, pharmacokinetic properties, and unique chemical space. The systematic search for natural sources to obtain valuable molecules to develop products with commercial value and industrial purposes remains the most challenging task in bioprospecting. Virtual screening strategies have innovated the discovery of novel bioactive molecules assessing in silico large compound libraries, favoring the analysis of their chemical space, pharmacodynamics, and their pharmacokinetic properties, thus leading to the reduction of financial efforts, infrastructure, and time involved in the process of discovering new chemical entities. Herein, we discuss the computational approaches and methods developed to explore the chemo-structural diversity of natural products, focusing on the main paradigms involved in the discovery and screening of bioactive compounds from natural sources, placing particular emphasis on artificial intelligence, cheminformatics methods, and big data analyses.
Affiliation(s)
- Kauê Santana: Instituto de Biodiversidade, Universidade Federal do Oeste do Pará, Santarém, Brazil
- Anderson Lima e Lima: Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, Brazil
- Vinícius Damasceno: Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, Brazil
- Claudio Nahum: Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará, Belém, Brazil
- Jerônimo Lameira: Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém, Brazil
33
Holl NJ, Lee HJ, Huang YW. Evolutionary Timeline of Genetic Delivery and Gene Therapy. Curr Gene Ther 2021; 21:89-111. [PMID: 33292120 DOI: 10.2174/1566523220666201208092517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2020] [Revised: 11/17/2020] [Accepted: 11/22/2020] [Indexed: 11/22/2022]
Abstract
There are more than 3,500 genes that are being linked to hereditary diseases or correlated with an elevated risk of certain illnesses. As an alternative to conventional treatments with small molecule drugs, gene therapy has arisen as an effective treatment with the potential to not just alleviate disease conditions but also cure them completely. In order for these treatment regimens to work, genes or editing tools intended to correct diseased genetic material must be efficiently delivered to target sites. There have been many techniques developed to achieve such a goal. In this article, we systematically review a variety of gene delivery and therapy methods that include physical methods, chemical and biochemical methods, viral methods, and genome editing. We discuss their historical discovery, mechanisms, advantages, limitations, safety, and perspectives.
Affiliation(s)
- Natalie J Holl: Department of Biological Sciences, College of Arts, Sciences, and Business, Missouri University of Science and Technology, Rolla, MO 65409, United States
- Han-Jung Lee: Department of Natural Resources and Environmental Studies, College of Environmental Studies, National Dong Hwa University, Hualien 974301, Taiwan
- Yue-Wern Huang: Department of Biological Sciences, College of Arts, Sciences, and Business, Missouri University of Science and Technology, Rolla, MO 65409, United States
34
Predicting cell-penetrating peptides using machine learning algorithms and navigating in their chemical space. Sci Rep 2021; 11:7628. [PMID: 33828175 PMCID: PMC8027643 DOI: 10.1038/s41598-021-87134-w] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 09/28/2020] [Accepted: 03/24/2021] [Indexed: 02/01/2023] Open
Abstract
Cell-penetrating peptides (CPPs) are naturally able to cross the lipid bilayer membrane that protects cells. These peptides share common structural and physicochemical properties and show different pharmaceutical applications, among which drug delivery is the most important. Due to their ability to cross the membranes by pulling high-molecular-weight polar molecules, they are termed Trojan horses. In this study, we proposed a machine learning (ML)-based framework named BChemRF-CPPred (beyond chemical rules-based framework for CPP prediction) that uses an artificial neural network, a support vector machine, and a Gaussian process classifier to differentiate CPPs from non-CPPs, using structure- and sequence-based descriptors extracted from PDB and FASTA formats. The performance of our algorithm was evaluated by tenfold cross-validation and compared with those of previously reported prediction tools using an independent dataset. The BChemRF-CPPred satisfactorily identified CPP-like structures using natural and synthetic modified peptide libraries and also obtained better performance than those of previously reported ML-based algorithms, reaching the independent test accuracy of 90.66% (AUC = 0.9365) for PDB, and an accuracy of 86.5% (AUC = 0.9216) for FASTA input. Moreover, our analyses of the CPP chemical space demonstrated that these peptides break some molecular rules related to the prediction of permeability of therapeutic molecules in cell membranes. This is the first comprehensive analysis to predict synthetic and natural CPP structures and to evaluate their chemical space using an ML-based framework. Our algorithm is freely available for academic use at http://comptools.linc.ufpa.br/BChemRF-CPPred .
35
Yang X, Ye X, Li X, Wei L. iDNA-MT: Identification DNA Modification Sites in Multiple Species by Using Multi-Task Learning Based a Neural Network Tool. Front Genet 2021; 12:663572. [PMID: 33868390 PMCID: PMC8044371 DOI: 10.3389/fgene.2021.663572] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/03/2021] [Accepted: 03/02/2021] [Indexed: 02/04/2023] Open
Abstract
Motivation: DNA N4-methylcytosine (4mC) and N6-methyladenine (6mA) are two important DNA modifications and play crucial roles in a variety of biological processes. Accurate identification of the modifications is essential to better understand their biological functions and mechanisms. However, existing methods to identify 4mC or 6mA sites are all single-task methods, meaning that they can identify only a certain modification in one species. Therefore, it is desirable to develop a novel computational method to identify the modification sites in multiple species simultaneously. Results: In this study, we proposed a computational method, called iDNA-MT, to identify 4mC sites and 6mA sites in multiple species, respectively. The proposed iDNA-MT mainly employed multi-task learning coupled with the bidirectional gated recurrent units (BGRU) to capture the shared information among different species directly from DNA primary sequences. Experimental comparative results on two benchmark datasets, each containing different species, show that for identifying either 4mC or 6mA sites in multiple species, the proposed iDNA-MT outperforms other state-of-the-art single-task methods. The promising results have demonstrated that iDNA-MT has great potential to be a powerful and practically useful tool to accurately identify DNA modifications.
Affiliation(s)
- Xiao Yang: School of Software, Shandong University, Jinan, China
- Xiucai Ye: Department of Computer Science, University of Tsukuba, Tsukuba, Japan
- Xuehong Li: Department of Rehabilitation, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Lesong Wei: Department of Computer Science, University of Tsukuba, Tsukuba, Japan
36
Chen Y, Fu X, Li Z, Peng L, Zhuo L. Prediction of lncRNA-Protein Interactions via the Multiple Information Integration. Front Bioeng Biotechnol 2021; 9:647113. [PMID: 33718346 PMCID: PMC7947871 DOI: 10.3389/fbioe.2021.647113] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/29/2020] [Accepted: 01/19/2021] [Indexed: 01/09/2023] Open
Abstract
The long non-coding RNA (lncRNA)-protein interaction plays an important role in the post-transcriptional gene regulation, such as RNA splicing, translation, signaling, and the development of complex diseases. The related research on the prediction of lncRNA-protein interaction relationship is beneficial in the excavation and the discovery of the mechanism of lncRNA function and action occurrence, which are important. Traditional experimental methods for detecting lncRNA-protein interactions are expensive and time-consuming. Therefore, computational methods provide many effective strategies to deal with this problem. In recent years, most computational methods only use the information of the lncRNA-lncRNA or the protein-protein similarity and cannot fully capture all features to identify their interactions. In this paper, we propose a novel computational model for the lncRNA-protein prediction on the basis of machine learning methods. First, a feature method is proposed for representing the information of the network topological properties of lncRNA and protein interactions. The basic composition feature information and evolutionary information based on protein, the lncRNA sequence feature information, and the lncRNA expression profile information are extracted. Finally, the above feature information is fused, and the optimized feature vector is used with the recursive feature elimination algorithm. The optimized feature vectors are input to the support vector machine (SVM) model. Experimental results show that the proposed method has good effectiveness and accuracy in the lncRNA-protein interaction prediction.
Affiliation(s)
- Yifan Chen: College of Information Science and Engineering, Hunan University, Changsha, China; School of Computer and Information Science, Hunan Institute of Technology, Hengyang, China
- Xiangzheng Fu: College of Information Science and Engineering, Hunan University, Changsha, China
- Zejun Li: School of Computer and Information Science, Hunan Institute of Technology, Hengyang, China
- Li Peng: College of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China
- Linlin Zhuo: Department of Mathematics and Information Engineering, Wenzhou University Oujiang College, Wenzhou, China
37
Charoenkwan P, Chiangjong W, Lee VS, Nantasenamat C, Hasan MM, Shoombuatong W. Improved prediction and characterization of anticancer activities of peptides using a novel flexible scoring card method. Sci Rep 2021; 11:3017. [PMID: 33542286 PMCID: PMC7862624 DOI: 10.1038/s41598-021-82513-9] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 10/12/2020] [Accepted: 01/18/2021] [Indexed: 01/30/2023] Open
Abstract
As anticancer peptides (ACPs) have attracted great interest for cancer treatment, several machine learning approaches have been proposed for ACP identification. Although existing methods achieve high prediction accuracy, they rely on a large number of descriptors combined with complex ensemble approaches, which leads to low interpretability and thus poses a challenge for biologists and biochemists. Therefore, it is desirable to develop a simple, interpretable and efficient predictor for accurate ACP identification that also provides the means for the rational design of new anticancer peptides with promising potential for clinical application. Herein, we propose a novel flexible scoring card method (FSCM) that makes use of propensity scores of local and global sequential information to develop a sequence-based ACP predictor (named iACP-FSCM) with improved prediction accuracy and model interpretability. To the best of our knowledge, iACP-FSCM is the first sequence-based ACP predictor to provide an in-depth understanding of the molecular basis of the anticancer activities of peptides through FSCM-derived propensity scores. Independent testing showed that iACP-FSCM achieved accuracies of 0.825 and 0.910 on the main and alternative datasets, respectively. Comparative benchmarking demonstrated that iACP-FSCM outperformed seven other existing ACP predictors, with marked improvements of 7% and 17% in accuracy and MCC, respectively, on the main dataset. Furthermore, iACP-FSCM (0.910) achieved results very comparable to those of the state-of-the-art ensemble model AntiCP2.0 (0.920) on the alternative dataset. Comparative results demonstrated that iACP-FSCM was the most suitable choice for ACP identification and characterization considering its simplicity, interpretability and generalizability. It is anticipated that iACP-FSCM may be a robust tool for the rapid screening and identification of promising ACPs for clinical use.
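The scoring-card idea in this abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each amino acid carries a single propensity score and that a peptide is scored by the mean over its residues. The propensity values, the default score, the threshold and the function names are all hypothetical and are not taken from iACP-FSCM.

```python
# Hypothetical propensity scores (illustrative values, not from the paper).
PROPENSITY = {"K": 0.80, "L": 0.65, "F": 0.60, "A": 0.50, "G": 0.45, "D": 0.20, "E": 0.15}
DEFAULT_SCORE = 0.40  # assumed score for residues missing from the table


def peptide_score(sequence: str) -> float:
    """Mean propensity score over all residues of the peptide."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    total = sum(PROPENSITY.get(residue, DEFAULT_SCORE) for residue in sequence.upper())
    return total / len(sequence)


def predict_acp(sequence: str, threshold: float = 0.5) -> bool:
    """Label a peptide as a candidate ACP when its score exceeds the threshold."""
    return peptide_score(sequence) > threshold
```

Because the score is a plain average of per-residue values, the contribution of every residue can be read off directly, which is the kind of interpretability a scoring-card approach aims for.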
Affiliation(s)
- Phasit Charoenkwan: Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Wararat Chiangjong: Pediatric Translational Research Unit, Department of Pediatrics, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, 10400, Thailand
- Vannajan Sanghiran Lee: Department of Chemistry, Centre of Theoretical and Computational Physics, Faculty of Science, University of Malaya, 50603, Kuala Lumpur, Malaysia
- Chanin Nantasenamat: Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Md Mehedi Hasan: Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
- Watshara Shoombuatong: Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
|
38
|
Bai Z, Chen M, Lin Q, Ye Y, Fan H, Wen K, Zeng J, Huang D, Mo W, Lei Y, Liao Z. Identification of Methicillin-Resistant Staphylococcus Aureus From Methicillin-Sensitive Staphylococcus Aureus and Molecular Characterization in Quanzhou, China. Front Cell Dev Biol 2021; 9:629681. [PMID: 33553185 PMCID: PMC7858276 DOI: 10.3389/fcell.2021.629681] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/15/2020] [Accepted: 01/04/2021] [Indexed: 12/17/2022]
Abstract
The aims of this study were to distinguish methicillin-resistant Staphylococcus aureus (MRSA) from methicillin-sensitive Staphylococcus aureus (MSSA) at the protein sequence level, to test the antibiotic susceptibility of all Staphylococcus aureus isolates from Quanzhou hospitals, and to define the virulence factors and molecular characteristics of the MRSA isolates. MRSA and MSSA Pfam protein sequences were used to extract 188D, n-gram and 400D feature vectors. Weka software was applied to classify the two types of Staphylococcus aureus, and classification performance was evaluated. Antibiotic susceptibility testing of the 81 Staphylococcus aureus isolates was performed with the Mérieux Microbial Analysis Instrument. The 65 MRSA isolates were characterized by Panton-Valentine leukocidin (PVL) detection, typing of the X polymorphic region of protein A (spa), multilocus sequence typing (MLST), and staphylococcal chromosomal cassette mec (SCCmec) typing. Across the six Weka classifiers compared, the highest correct classification rates were 91.94, 70.16, and 62.90% for 188D, n-gram and 400D, respectively. In the antimicrobial susceptibility test of the 81 Staphylococcus aureus isolates, the penicillin resistance rate was 100%, and no resistance to teicoplanin, linezolid, or vancomycin was observed. The resistance rates of the MRSA isolates to clindamycin, erythromycin and tetracycline were higher than those of the MSSA isolates. Among the 65 MRSA isolates, the positive rate of the PVL gene was 47.7% (31/65). Seventeen sequence types (STs) were identified among the 65 isolates, and ST59 was the most prevalent. SCCmec types III and IV were observed at 24.6 and 72.3%, respectively; two isolates could not be typed. Twenty-one spa types were identified, of which spa t437 (34/65, 52.3%) was the most predominant. The major MRSA clone type by molecular typing was CC59-ST59-spa t437-IV (28/65, 43.1%). Overall, 188D feature vectors can be applied to successfully distinguish MRSA from MSSA. In Quanzhou, the detection rate of the PVL virulence factor was high, suggesting a high pathogenic risk of MRSA infection. Cross-infection between CA-MRSA and HA-MRSA was present, and their molecular characteristics were increasingly blurred; HA-MRSA with typical CA-MRSA molecular characteristics has become an important cause of healthcare-related infections. CC59-ST59-spa t437-IV was the main clone type in Quanzhou, which was rare in other parts of mainland China.
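The abstract does not define its 400D feature vector. A common 400-dimensional protein descriptor is dipeptide composition (20 x 20 ordered amino acid pairs), and the Python sketch below assumes that reading purely for illustration. The function name and the handling of non-standard residues are hypothetical and may differ from the feature extraction actually used in the study.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All ordered pairs of the 20 standard amino acids: 20 * 20 = 400 entries.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence: str) -> List[float]:
    """Return the 400 dipeptide frequencies of a protein sequence."""
    sequence = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

A fixed-length vector of this kind is what allows sequences of different lengths to be passed to standard classifiers such as those bundled with Weka.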
Affiliation(s)
- Zhimin Bai: Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China; Department of Clinical Laboratory, Jinjiang Municipal Hospital, Jinjiang, China
- Min Chen: Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China; Microbiological Laboratory Sanming Center for Disease Control and Prevention, Sanming, China
- Qiaofa Lin: Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
- Ying Ye: Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
- Hongmei Fan: Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
- Kaizhen Wen: Department of Clinical Laboratory, Jinjiang Municipal Hospital, Jinjiang, China
- Jianxing Zeng: Department of Clinical Laboratory, Jinjiang Municipal Hospital, Jinjiang, China
- Donghong Huang: Department of Clinical Laboratory, The Second Affiliated Hospital of Fujian Medical University, Quanzhou, China
- Wenfei Mo: Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
- Ying Lei: Department of Clinical Laboratory, Quanzhou Women's and Children's Hospital, Quanzhou, China
- Zhijun Liao: Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, China
|
39
|
Hasan MM, Schaduangrat N, Basith S, Lee G, Shoombuatong W, Manavalan B. HLPpred-Fuse: improved and robust prediction of hemolytic peptide and its activity by fusing multiple feature representation. Bioinformatics 2020; 36:3350-3356. [PMID: 32145017 DOI: 10.1093/bioinformatics/btaa160] [Citation(s) in RCA: 148] [Impact Index Per Article: 29.6] [Received: 01/13/2020] [Revised: 02/19/2020] [Accepted: 03/03/2020] [Indexed: 12/13/2022]
Abstract
MOTIVATION The failure of therapeutic peptides in clinical trials can be attributed to their toxicity profiles, such as hemolytic activity, which hamper further progress of peptides as drug candidates. The accurate prediction of hemolytic peptides (HLPs) and their activity from given peptides is a challenging task in immunoinformatics and is essential for drug development and basic research. Although a few computational methods have been proposed for this purpose, none of them can identify HLPs and their activities simultaneously. RESULTS In this study, we proposed a two-layer prediction framework, called HLPpred-Fuse, that can accurately and automatically predict both hemolytic peptides (HLPs or non-HLPs) and HLP activity (high or low). More specifically, a feature representation learning scheme was used to generate 54 probabilistic features by integrating six different machine learning classifiers and nine different sequence-based encodings. The 54 probabilistic features were then fused to provide sufficiently converged sequence information, which was used as input to an extremely randomized tree to develop two final prediction models that independently identify HLPs and their activity. Performance comparisons over empirical cross-validation analysis, an independent test and a case study against state-of-the-art methods demonstrate that HLPpred-Fuse consistently outperformed these methods in the identification of hemolytic activity. AVAILABILITY AND IMPLEMENTATION For the convenience of experimental scientists, a web-based tool has been established at http://thegleelab.org/HLPpred-Fuse. CONTACT glee@ajou.ac.kr or watshara.sho@mahidol.ac.th or bala@ajou.ac.kr. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
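The first layer described above (six classifiers times nine encodings giving 54 probabilistic features) can be sketched in Python as a data-flow example. The encoders and base models below are stand-in lambdas so that the snippet runs; they are hypothetical placeholders rather than the encodings or classifiers of HLPpred-Fuse, and the second-layer extremely randomized tree is not implemented here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Encoder = Callable[[str], List[float]]
# A fitted base model maps an encoded peptide to a probability in [0, 1].
BaseModel = Callable[[Sequence[float]], float]


def probabilistic_features(
    peptide: str,
    encoders: Dict[str, Encoder],
    base_models: Dict[Tuple[str, str], BaseModel],
) -> List[float]:
    """Concatenate one predicted probability per (encoding, classifier) pair."""
    features = []
    for (encoding_name, _classifier_name), model in sorted(base_models.items()):
        encoded = encoders[encoding_name](peptide)
        features.append(model(encoded))
    return features


# Stand-in components (hypothetical): 9 encodings x 6 classifiers = 54 models.
encoders = {f"enc{i}": (lambda seq, i=i: [float(len(seq) + i)]) for i in range(9)}
base_models = {
    (f"enc{i}", f"clf{j}"): (lambda x, j=j: min(1.0, (x[0] + j) / 100.0))
    for i in range(9)
    for j in range(6)
}

# This 54-value vector is what a second-layer model would receive as input.
second_layer_input = probabilistic_features("GLFDIVKKVV", encoders, base_models)
```

In a real pipeline each stand-in would be a trained model, and the probabilities fed to the second layer would typically come from cross-validated predictions to limit leakage; that detail is an assumption, not something stated in the abstract.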
Affiliation(s)
- Md Mehedi Hasan: Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan; Japan Society for the Promotion of Science, Chiyoda-ku, Tokyo 102-0083, Japan
- Nalini Schaduangrat: Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Shaherin Basith: Department of Physiology, Ajou University School of Medicine, Suwon 16499, Republic of Korea
- Gwang Lee: Department of Physiology, Ajou University School of Medicine, Suwon 16499, Republic of Korea; Department of Molecular Science and Technology, Ajou University, Suwon 16499, Republic of Korea
- Watshara Shoombuatong: Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Balachandran Manavalan: Department of Physiology, Ajou University School of Medicine, Suwon 16499, Republic of Korea
|
40
|
Fu H, Cao Z, Li M, Wang S. ACEP: improving antimicrobial peptides recognition through automatic feature fusion and amino acid embedding. BMC Genomics 2020; 21:597. [PMID: 32859150 PMCID: PMC7455913 DOI: 10.1186/s12864-020-06978-0] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 01/31/2020] [Accepted: 08/11/2020] [Indexed: 12/14/2022]
Abstract
BACKGROUND Antimicrobial resistance is one of our most serious health threats. Antimicrobial peptides (AMPs), effector molecules of the innate immune system, can defend host organisms against microbes, and most show a lower likelihood of inducing bacterial resistance than many conventional drugs. Thus, AMPs are gaining popularity as a better substitute for antibiotics. To aid researchers in the discovery of novel AMPs, we design computational approaches to screen promising candidates. RESULTS In this work, we design a deep learning model that can learn amino acid embedding patterns, automatically extract sequence features, and fuse heterogeneous information. Results show that the proposed model outperforms state-of-the-art methods on the recognition of AMPs. By visualizing data in some layers of the model, we overcome the black-box nature of deep learning, explain the working mechanism of the model, and find some important motifs in sequences. CONCLUSIONS The ACEP model can capture similarity between amino acids, calculate attention scores for different parts of a peptide sequence in order to spot the parts that contribute significantly to final predictions, and automatically fuse a variety of heterogeneous information or features. For high-throughput AMP recognition, open source software and datasets are made freely available at https://github.com/Fuhaoyi/ACEP.
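How attention scores can highlight the parts of a peptide that matter most is sketched below in Python. The abstract does not give ACEP's attention formulation, so the snippet assumes a plain softmax over raw per-residue scores; the scores, the function names and the top-k selection are hypothetical illustrations rather than the ACEP implementation.

```python
import math
from typing import List, Tuple


def attention_weights(scores: List[float]) -> List[float]:
    """Softmax-normalise raw per-residue scores into attention weights."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_positions(sequence: str, scores: List[float], k: int = 3) -> List[Tuple[int, str, float]]:
    """Return the k residues with the largest weight as (index, residue, weight)."""
    if len(sequence) != len(scores):
        raise ValueError("one score per residue is required")
    weights = attention_weights(scores)
    ranked = sorted(range(len(sequence)), key=lambda i: weights[i], reverse=True)
    return [(i, sequence[i], weights[i]) for i in ranked[:k]]
```

Since the weights sum to one, they can be read as the share of attention each position receives, which is what makes visualising them a convenient way to look for candidate motifs.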
Affiliation(s)
- Haoyi Fu: School of Information Science and Engineering, Yunnan University, Kunming, 650500, China
- Zicheng Cao: School of Public Health (Shenzhen), Sun Yat-sen University, Guangzhou, 510006, China
- Mingyuan Li: School of Information Science and Engineering, Yunnan University, Kunming, 650500, China
- Shunfang Wang: School of Information Science and Engineering, Yunnan University, Kunming, 650500, China
|
41
|
Bin Y, Zhang W, Tang W, Dai R, Li M, Zhu Q, Xia J. Prediction of Neuropeptides from Sequence Information Using Ensemble Classifier and Hybrid Features. J Proteome Res 2020; 19:3732-3740. [DOI: 10.1021/acs.jproteome.0c00276] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/11/2022]
Affiliation(s)
- Yannan Bin: Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China; School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
- Wei Zhang: Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
- Wending Tang: Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
- Ruyu Dai: Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
- Menglu Li: School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
- Qizhi Zhu: Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China
- Junfeng Xia: Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui 230601, China; School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
|
42
|
The Spectrum of Design Solutions for Improving the Activity-Selectivity Product of Peptide Antibiotics against Multidrug-Resistant Bacteria and Prostate Cancer PC-3 Cells. Molecules 2020; 25:3526. [PMID: 32752241 PMCID: PMC7436000 DOI: 10.3390/molecules25153526] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/08/2020] [Revised: 07/28/2020] [Accepted: 07/30/2020] [Indexed: 12/24/2022]
Abstract
The link between the antimicrobial and anticancer activity of peptides has long been studied, and the number of peptides identified with both activities has recently increased considerably. In this work, we hypothesized that designed peptides with a wide spectrum of selective antimicrobial activity will also have anticancer activity, and tested this hypothesis with newly designed peptides. The spectrum of peptides, used as partial or full design templates, ranged from cell-penetrating peptides and putative bacteriocin to those from the simplest animals (placozoans) and the Chordata phylum (anurans). We applied custom computational tools to predict amino acid substitutions, conferring the increased product of bacteriostatic activity and selectivity. Experiments confirmed that better overall performance was achieved with respect to that of initial templates. Nine of our synthesized helical peptides had excellent bactericidal activity against both standard and multidrug-resistant bacteria. These peptides were then compared to a known anticancer peptide polybia-MP1, for their ability to kill prostate cancer cells and dermal primary fibroblasts. The therapeutic index was higher for seven of our peptides, and anticancer activity stronger for all of them. In conclusion, the peptides that we designed for selective antimicrobial activity also have promising potential for anticancer applications.
|
43
|
Manavalan B, Basith S, Shin TH, Wei L, Lee G. mAHTPred: a sequence-based meta-predictor for improving the prediction of anti-hypertensive peptides using effective feature representation. Bioinformatics 2020; 35:2757-2765. [PMID: 30590410 DOI: 10.1093/bioinformatics/bty1047] [Citation(s) in RCA: 190] [Impact Index Per Article: 38.0] [Received: 11/02/2018] [Revised: 12/05/2018] [Accepted: 12/20/2018] [Indexed: 11/13/2022]
Abstract
MOTIVATION Cardiovascular disease is the primary cause of death globally, accounting for approximately 17.7 million deaths per year. One of the risk factors linked with cardiovascular diseases and other complications is hypertension. Naturally derived bioactive peptides with antihypertensive activities serve as promising alternatives to pharmaceutical drugs. So far, there has been no comprehensive analysis and assessment of diverse features, nor an implementation of various machine-learning (ML) algorithms, for antihypertensive peptide (AHTP) model construction. RESULTS In this study, we utilized six different ML algorithms, namely, Adaboost, extremely randomized tree (ERT), gradient boosting (GB), k-nearest neighbor, random forest (RF) and support vector machine (SVM), using 51 feature descriptors derived from eight different feature encodings for the prediction of AHTPs. Because ERT-based models performed consistently better than the other algorithms regardless of the feature descriptor, we treated them as baseline predictors, whose predicted probabilities of AHTPs were then used as input features separately for four different ML algorithms (ERT, GB, RF and SVM) to develop the corresponding meta-predictors using a two-step feature selection protocol. Subsequently, the integration of the four meta-predictors through an ensemble learning approach improved the balanced prediction performance and model robustness on the independent dataset. Upon comparison with existing methods, mAHTPred showed superior performance, with an overall improvement of approximately 6-7% on both benchmarking and independent datasets. AVAILABILITY AND IMPLEMENTATION The user-friendly online prediction tool, mAHTPred, is freely accessible at http://thegleelab.org/mAHTPred. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
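The final integration step, in which four meta-predictors are combined, can be illustrated with a short Python sketch. The abstract only says that an ensemble learning approach was used, so the unweighted averaging of probabilities, the 0.5 threshold and the function names below are assumptions made for illustration, not the documented mAHTPred procedure.

```python
from statistics import mean
from typing import Dict


def ensemble_probability(meta_probs: Dict[str, float]) -> float:
    """Average the AHTP probabilities returned by the individual meta-predictors."""
    if not meta_probs:
        raise ValueError("at least one meta-predictor probability is required")
    return mean(list(meta_probs.values()))


def predict_ahtp(meta_probs: Dict[str, float], threshold: float = 0.5) -> bool:
    """Call a peptide antihypertensive when the averaged probability reaches the threshold."""
    return ensemble_probability(meta_probs) >= threshold
```

For example, hypothetical outputs of 0.9, 0.7, 0.6 and 0.4 from the ERT, GB, RF and SVM meta-predictors average to 0.65, so the peptide would be called positive even though one model disagrees.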
Affiliation(s)
- Shaherin Basith: Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Tae Hwan Shin: Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea
- Leyi Wei: School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Gwang Lee: Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea
|
44
|
Li Q, Dong B, Wang D, Wang S. Identification of Secreted Proteins From Malaria Protozoa With Few Features. IEEE Access 2020; 8:89793-89801. [DOI: 10.1109/access.2020.2994206] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/02/2025]
|
45
|
Rao B, Zhou C, Zhang G, Su R, Wei L. ACPred-Fuse: fusing multi-view information improves the prediction of anticancer peptides. Brief Bioinform 2019; 21:1846-1855. [DOI: 10.1093/bib/bbz088] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 04/24/2019] [Revised: 06/06/2019] [Accepted: 06/22/2019] [Indexed: 02/04/2023]
Abstract
Fast and accurate identification of the peptides with anticancer activity potential from large-scale proteins is currently a challenging task. In this study, we propose a new machine learning predictor, namely, ACPred-Fuse, that can automatically and accurately predict protein sequences with or without anticancer activity in peptide form. Specifically, we establish a feature representation learning model that can explore class and probabilistic information embedded in anticancer peptides (ACPs) by integrating a total of 29 different sequence-based feature descriptors. In order to make full use of various multiview information, we further fused the class and probabilistic features with handcrafted sequential features and then optimized the representation ability of the multiview features, which are ultimately used as input for training our prediction model. By comparing the multiview features and existing feature descriptors, we demonstrate that the fused multiview features have more discriminative ability to capture the characteristics of ACPs. In addition, the information from different views is complementary for the performance improvement. Finally, our benchmarking comparison results showed that the proposed ACPred-Fuse is more precise and promising in the identification of ACPs than existing predictors. To facilitate the use of the proposed predictor, we built a web server, which is now freely available via http://server.malab.cn/ACPred-Fuse.
Affiliation(s)
- Bing Rao: School of Mechanical Electronic & Information Engineering, China University of Mining & Technology, Beijing, China
- Chen Zhou: School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Guoying Zhang: School of Mechanical Electronic & Information Engineering, China University of Mining & Technology, Beijing, China
- Ran Su: School of Software, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Leyi Wei: School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
|
46
|
4mCpred-EL: An Ensemble Learning Framework for Identification of DNA N4-methylcytosine Sites in the Mouse Genome. Cells 2019; 8:1332. [PMID: 31661923 PMCID: PMC6912380 DOI: 10.3390/cells8111332] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Received: 08/21/2019] [Revised: 10/21/2019] [Accepted: 10/24/2019] [Indexed: 12/24/2022]
Abstract
DNA N4-methylcytosine (4mC) is one of the key epigenetic alterations, playing essential roles in DNA replication, differentiation, cell cycle, and gene expression. To better understand the biological functions of 4mC, it is crucial to gain knowledge of its genomic distribution. Recently, a few computational methods, in particular machine learning (ML) approaches, have been applied to 4mC site prediction. Although ML-based methods are promising for 4mC identification in other species, none are available for detecting 4mCs in the mouse genome. Our novel computational approach, called 4mCpred-EL, is the first method for identifying 4mC sites in the mouse genome; it uses four different ML algorithms with seven feature encodings. The probabilistic values predicted from those feature encodings are then used as a feature vector and fed again into ML algorithms, whose corresponding models are integrated through ensemble learning. Our benchmarking results demonstrated that 4mCpred-EL achieved accuracy and MCC values of 0.795 and 0.591, which outperformed seven other classifiers by 1.5–5.9% and 3.2–11.7%, respectively. Additionally, 4mCpred-EL attained an overall accuracy of 79.80% in the independent evaluation, which is 1.8–5.1% higher than that of seven other classifiers. We provide a user-friendly web server, namely 4mCpred-EL, which can be used as a pre-screening tool for the identification of potential 4mC sites in the mouse genome.
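Accuracy and the Matthews correlation coefficient (MCC), the two metrics reported above, are both computed from the confusion matrix of a binary classifier. The Python sketch below uses the standard definitions; the function names and the choice to return 0.0 when the MCC denominator vanishes are conventions assumed here, and the counts in the usage note are made up rather than taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator
```

With hypothetical counts of 40 true positives, 40 true negatives, 10 false positives and 10 false negatives, accuracy is 0.8 while MCC is 0.6, which shows why the two numbers reported for a predictor usually differ.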
|
47
|
AtbPpred: A Robust Sequence-Based Prediction of Anti-Tubercular Peptides Using Extremely Randomized Trees. Comput Struct Biotechnol J 2019; 17:972-981. [PMID: 31372196 PMCID: PMC6658830 DOI: 10.1016/j.csbj.2019.06.024] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Received: 02/21/2019] [Revised: 06/27/2019] [Accepted: 06/28/2019] [Indexed: 01/01/2023]
Abstract
Mycobacterium tuberculosis is one of the most dangerous pathogens in humans. It acts as an etiological agent of tuberculosis (TB), infecting almost one-third of the world's population. Owing to the high incidence of multidrug-resistant TB and extensively drug-resistant TB, there is an urgent need for novel and effective alternative therapies. Peptide-based therapy has several advantages, such as diverse mechanisms of action, low immunogenicity, and selective affinity to bacterial cell envelopes. However, the identification of anti-tubercular peptides (AtbPs) via experimentation is laborious and expensive; hence, the development of an efficient computational method is necessary for the prediction of AtbPs prior to both in vitro and in vivo experiments. To this end, we developed a two-layer machine learning (ML)-based predictor called AtbPpred for the identification of AtbPs. In the first layer, we applied a two-step feature selection procedure and identified the optimal feature set individually for nine different feature encodings, whose corresponding models were developed using extremely randomized tree (ERT). In the second layer, the predicted probabilities of AtbPs from the above nine models were used as input features to ERT to develop the final predictor. AtbPpred achieved average accuracies of 88.3% and 87.3% during cross-validation and independent evaluation, respectively, which were ~8.7% and 10.0% higher than those of the state-of-the-art method. Furthermore, we established a user-friendly webserver, which is currently available at http://thegleelab.org/AtbPpred. We anticipate that this predictor could be useful in the high-throughput prediction of AtbPs and could also provide mechanistic insights into their functions. We developed a novel computational framework for the identification of anti-tubercular peptides using extremely randomized tree. AtbPpred displayed superior performance compared to the existing method on both benchmark and independent datasets. We constructed a user-friendly web server that implements the proposed AtbPpred method.
|
48
|
Manavalan B, Basith S, Shin TH, Wei L, Lee G. Meta-4mCpred: A Sequence-Based Meta-Predictor for Accurate DNA 4mC Site Prediction Using Effective Feature Representation. Mol Ther Nucleic Acids 2019; 16:733-744. [PMID: 31146255 PMCID: PMC6540332 DOI: 10.1016/j.omtn.2019.04.019] [Citation(s) in RCA: 169] [Impact Index Per Article: 28.2] [Received: 12/10/2018] [Revised: 04/16/2019] [Accepted: 04/22/2019] [Indexed: 11/19/2022]
Abstract
DNA N4-methylcytosine (4mC) is an important genetic modification and plays crucial roles in differentiation between self and non-self DNA and in controlling DNA replication, cell cycle, and gene-expression levels. Accurate 4mC site identification is fundamental to improving the understanding of 4mC biological functions and mechanisms. Hence, it is necessary to develop in silico approaches for efficient and high-throughput 4mC site identification. Although some bioinformatic tools have been developed in this regard, their prediction accuracy and generalizability require improvement to optimize their usability in practical applications. For this purpose, we here proposed Meta-4mCpred, a meta-predictor for 4mC site prediction. In Meta-4mCpred, we employed a feature representation learning scheme and generated 56 probabilistic features based on four different machine-learning algorithms and seven feature encodings covering diverse sequence information, including compositional, physicochemical, and position-specific information. Subsequently, the probabilistic features were used as input to a support vector machine to develop the final meta-predictor. To the best of our knowledge, this is the first meta-predictor for 4mC site prediction. Cross-validation results show that Meta-4mCpred achieved an overall average accuracy of 84.2% across six different species, which is ∼2%–4% higher than that attainable using the state-of-the-art predictors. Furthermore, Meta-4mCpred achieved an overall average accuracy of 86% on independent dataset evaluation, which is over 4% higher than that yielded by the state-of-the-art predictors. The user-friendly webserver implementing the proposed Meta-4mCpred is freely accessible at http://thegleelab.org/Meta-4mCpred.
Affiliation(s)
- Shaherin Basith: Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
- Tae Hwan Shin: Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea
- Leyi Wei: School of Computer Science and Technology, Tianjin University, China
- Gwang Lee: Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea
|
49
|
mACPpred: A Support Vector Machine-Based Meta-Predictor for Identification of Anticancer Peptides. Int J Mol Sci 2019; 20:1964. [PMID: 31013619 PMCID: PMC6514805 DOI: 10.3390/ijms20081964] [Citation(s) in RCA: 142] [Impact Index Per Article: 23.7] [Received: 03/15/2019] [Revised: 04/08/2019] [Accepted: 04/18/2019] [Indexed: 12/24/2022]
Abstract
Anticancer peptides (ACPs) are promising therapeutic agents for targeting and killing cancer cells. The accurate prediction of ACPs from given peptide sequences remains an open problem in the field of immunoinformatics. Recently, machine learning algorithms have emerged as a promising tool for helping experimental scientists predict ACPs. However, the performance of existing methods still needs to be improved. In this study, we present a novel approach for the accurate prediction of ACPs, which involves the following two steps: (i) We applied a two-step feature selection protocol to seven feature encodings that cover various aspects of sequence information (composition-based, physicochemical properties and profiles) and obtained their corresponding optimal feature-based models. The resulting predicted probabilities of ACPs were further used as feature vectors. (ii) The predicted probability feature vectors were in turn used as input to a support vector machine to develop the final prediction model, called mACPpred. Cross-validation analysis showed that the proposed predictor performs significantly better than individual feature encodings. Furthermore, mACPpred significantly outperformed the existing methods compared in this study when objectively evaluated on an independent dataset.
|
50
|
Wei L, Zhou C, Su R, Zou Q. PEPred-Suite: improved and robust prediction of therapeutic peptides using adaptive feature representation learning. Bioinformatics 2019; 35:4272-4280. [DOI: 10.1093/bioinformatics/btz246] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Received: 08/12/2018] [Revised: 01/28/2019] [Accepted: 04/11/2019] [Indexed: 11/13/2022]
Abstract
Motivation
Prediction of therapeutic peptides is critical for the discovery of novel and efficient peptide-based therapeutics. Computational methods, especially machine learning-based methods, have been developed to address this need. However, most existing methods are peptide-specific; currently, there is no generic predictor for multiple peptide types. Moreover, it is still challenging to extract informative feature representations from primary sequences.
Results
In this study, we have developed PEPred-Suite, a bioinformatics tool for the generic prediction of therapeutic peptides. In PEPred-Suite, we introduce an adaptive feature representation strategy that can learn the most representative features for different peptide types. Specifically, we train diverse sequence-based feature descriptors, integrate the learnt class information into our features, and use a two-step feature optimization strategy based on the area under the receiver operating characteristic curve to extract the most discriminative features. Using the learnt representative features, we trained eight random forest models, one for each of eight different types of functional peptides. Benchmarking results showed that, compared with existing predictors, PEPred-Suite achieves better and more robust performance for different peptides. As far as we know, PEPred-Suite is currently the first tool capable of predicting so many peptide types simultaneously. In addition, our work demonstrates that the learnt features can reliably predict different peptides.
Availability and implementation
The user-friendly webserver implementing the proposed PEPred-Suite is freely accessible at http://server.malab.cn/PEPred-Suite.
Supplementary information
Supplementary data are available at Bioinformatics online.
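The Results section mentions feature optimization based on the area under the receiver operating characteristic curve (AUC). The Python sketch below shows one plausible first step: scoring each feature by its single-feature AUC and ranking the features accordingly. The abstract does not spell out the two-step strategy, so this ranking, the function names and the toy data used to exercise it are assumptions for illustration rather than the PEPred-Suite procedure.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def rank_features_by_auc(matrix: Sequence[Sequence[float]], labels: Sequence[int]) -> List[int]:
    """Return feature (column) indices sorted from most to least discriminative."""
    n_features = len(matrix[0])
    aucs = [auc([row[j] for row in matrix], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: aucs[j], reverse=True)
```

A second step could then add features in this order and keep the subset that maximises the cross-validated AUC of the downstream model; that continuation is likewise an assumption about how such a strategy is commonly completed.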
Affiliation(s)
- Leyi Wei: School of Computer Science and Technology, Tianjin University, Tianjin, China
- Chen Zhou: School of Computer Science and Technology, Tianjin University, Tianjin, China
- Ran Su: School of Computer Software, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Quan Zou: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
|