1
|
Ahmed S, Schaduangrat N, Chumnanpuen P, Shoombuatong W. GRU4ACE: Enhancing ACE inhibitory peptide prediction by integrating gated recurrent unit with multi-source feature embeddings. Protein Sci 2025; 34:e70026. [PMID: 40371738 PMCID: PMC12079467 DOI: 10.1002/pro.70026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2024] [Revised: 12/12/2024] [Accepted: 12/19/2024] [Indexed: 05/16/2025]
Abstract
Accurate identification of angiotensin-I-converting enzyme (ACE) inhibitory peptides is essential for understanding the primary factor regulating the renin-angiotensin system and guiding the development of new drug candidates. Given the inherent challenges in experimental processes, computational methods for in silico peptide identification can be invaluable for enabling high-throughput characterization of ACE inhibitory peptides. This study introduces GRU4ACE, an innovative deep learning framework based on multi-view information for identifying ACE inhibitory peptides. First, GRU4ACE utilizes multi-source feature encoding methods to capture the information embedded in ACE inhibitory peptides, including sequential information, graphical information, semantic information, and contextual information. Specifically, the feature representations used herein are derived from conventional feature descriptors, natural language processing (NLP)-based embeddings, and pre-trained protein language model (PLM)-based embeddings. Next, multiple feature embeddings were fused, and the elastic net was employed for feature optimization. Finally, the optimal feature subset with strong feature representation was input into a gated recurrent unit (GRU). The proposed GRU4ACE approach demonstrated superior performance over existing methods in terms of the independent test. To be specific, the balanced accuracy, sensitivity, and MCC scores of GRU4ACE reached 0.948, 0.934, and 0.895, which were 6.46%, 8.92%, and 12.51% higher than those of the compared methods, respectively. In addition, when comparing well-regarded feature descriptors, we found that the proposed multi-view features effectively captured crucial information, leading to improved ACE inhibitory peptide prediction performance. These comprehensive results highlight that GRU4ACE enhances prediction accuracy and significantly narrows down the search for new potential antihypertensive drugs.
Collapse
Affiliation(s)
- Saeed Ahmed
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical TechnologyMahidol UniversityBangkokThailand
- Department of Computer ScienceUniversity of SwabiSwabisPakistan
| | - Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical TechnologyMahidol UniversityBangkokThailand
| | - Pramote Chumnanpuen
- Department of Zoology, Faculty of ScienceKasetsart UniversityBangkokThailand
- Kasetsart University International College (KUIC)Kasetsart UniversityBangkokThailand
| | - Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical TechnologyMahidol UniversityBangkokThailand
| |
Collapse
|
2
|
Zhu L, Fang Y, Liu S, Shen HB, De Neve W, Pan X. ToxDL 2.0: Protein toxicity prediction using a pretrained language model and graph neural networks. Comput Struct Biotechnol J 2025; 27:1538-1549. [PMID: 40276117 PMCID: PMC12018212 DOI: 10.1016/j.csbj.2025.04.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2024] [Revised: 03/31/2025] [Accepted: 04/01/2025] [Indexed: 04/26/2025] Open
Abstract
Motivation Assessing the potential toxicity of proteins is crucial for both therapeutic and agricultural applications. Traditional experimental methods for protein toxicity evaluation are time-consuming, expensive, and labor-intensive, highlighting the requirement for efficient computational approaches. Recent advancements in language models and deep learning have significantly improved protein toxicity prediction, yet current models often lack the ability to integrate evolutionary and structural information, which is crucial for accurate toxicity assessment of proteins. Results In this study, we present ToxDL 2.0, a novel multimodal deep learning model for protein toxicity prediction that integrates both evolutionary and structural information derived from a pretrained language model and AlphaFold2. ToxDL 2.0 consists of three key modules: (1) a Graph Convolutional Network (GCN) module for generating protein graph embeddings based on AlphaFold2-predicted structures, (2) a domain embedding module for capturing protein domain representations, and (3) a dense module that combines these embeddings to predict the toxicity. After constructing a comprehensive toxicity benchmark dataset, we obtained experimental results on both an original non-redundant test set (comprising pre-2022 protein sequences) and an independent non-redundant test set (a holdout set of post-2022 protein sequences), demonstrating that ToxDL 2.0 outperforms existing state-of-the-art methods. Additionally, we utilized Integrated Gradients to discover known toxic motifs associated with protein toxicity. A web server for ToxDL 2.0 is publicly available at www.csbio.sjtu.edu.cn/bioinf/ToxDL2/.
Collapse
Affiliation(s)
- Lin Zhu
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
| | - Yi Fang
- Department of Automation, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Shuting Liu
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
| | - Hong-Bin Shen
- Department of Automation, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Wesley De Neve
- Department for Electronics and Information Systems, IDLab, Ghent University, Ghent 9000, Belgium
- Department of Environmental Technology, Food Technology and Molecular Biotechnology, Center for Biotech Data Science, Ghent University Global Campus, Songdo, Incheon 305-701, South Korea
| | - Xiaoyong Pan
- Department of Automation, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| |
Collapse
|
3
|
Özçelik R, van Weesep L, de Ruiter S, Grisoni F. peptidy: a light-weight Python library for peptide representation in machine learning. BIOINFORMATICS ADVANCES 2025; 5:vbaf058. [PMID: 40170887 PMCID: PMC11961219 DOI: 10.1093/bioadv/vbaf058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/08/2025] [Revised: 02/28/2025] [Accepted: 03/19/2025] [Indexed: 04/03/2025]
Abstract
Motivation Peptides are widely used in applications ranging from drug discovery to food technologies. Machine learning has become increasingly prominent in accelerating the search for new peptides, and user-friendly computational tools can further enhance these efforts. Results In this work, we introduce peptidy-a lightweight Python library that facilitates converting peptides (expressed as amino acid sequences) to numerical representations suited to machine learning. peptidy is free from external dependencies, integrates seamlessly into modern Python environments, and supports a range of encoding strategies suitable for both predictive and generative machine learning approaches. Additionally, peptidy supports peptides with post-translational modifications, such as phosphorylation, acetylation, and methylation, thereby extending the functionality of existing Python packages for peptides and proteins. Availability and implementation peptidy is freely available with a permissive license on GitHub at the following URL: https://github.com/molML/peptidy.
Collapse
Affiliation(s)
- Rıza Özçelik
- Department of Biomedical Engineering, Institute for Complex Molecular Systems, Eindhoven University of Technology, Eindhoven 5612AZ, Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrech, Utrecht 3584CB, Netherlands
| | - Laura van Weesep
- Department of Biomedical Engineering, Institute for Complex Molecular Systems, Eindhoven University of Technology, Eindhoven 5612AZ, Netherlands
| | - Sarah de Ruiter
- Department of Biomedical Engineering, Institute for Complex Molecular Systems, Eindhoven University of Technology, Eindhoven 5612AZ, Netherlands
| | - Francesca Grisoni
- Department of Biomedical Engineering, Institute for Complex Molecular Systems, Eindhoven University of Technology, Eindhoven 5612AZ, Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrech, Utrecht 3584CB, Netherlands
| |
Collapse
|
4
|
Zhenghui L, Wenxing H, Yan W, Jihong Z, Xiaojun X, Lixin G, Mengshan L. Ensemble learning based on bi-directional gated recurrent unit and convolutional neural network with word embedding module for bioactive peptide prediction. Food Chem 2025; 468:142464. [PMID: 39675273 DOI: 10.1016/j.foodchem.2024.142464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2024] [Revised: 11/12/2024] [Accepted: 12/11/2024] [Indexed: 12/17/2024]
Abstract
Bioactive peptides, as small protein fragments, are essential mediators of diverse physiological activities, such as antimicrobial, anti-inflammatory, anticancer, antioxidant, and immunomodulatory functions. Despite their substantial potential in pharmaceuticals and the food industry, conventional methods for peptide classification and activity prediction are limited by high costs, time-intensive procedures, and extensive data processing requirements. Here, we present BioPepPred-DLEmb, a novel computational model integrating Convolutional Neural Networks (CNNs) and Bidirectional Gated Recurrent Units (BiGRUs), augmented with natural language processing to encode amino acids into information-dense vectors. Evaluated across nine bioactive peptide datasets, BioPepPred-DLEmb demonstrates superior predictive accuracy (0.909) and sensitivity (0.911) compared to traditional methods. Through UMAP visualization and Kplogo analysis, the model effectively differentiates peptide activity states and identifies key biomarkers. The predicted antimicrobial peptides (Pred-AMPs) exhibit potent efficacy in vitro, achieving low micromolar inhibitory concentrations (2-16 μmol/L) against pathogens such as Escherichia coli and Acinetobacter baumannii. These findings establish a robust foundation for bioactive peptide development, with implications for advancements in precision medicine, personalized therapies, and functional food innovations.
Collapse
Affiliation(s)
- Lai Zhenghui
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
| | - Hu Wenxing
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
| | - Wu Yan
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
| | - Zhu Jihong
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
| | - Xie Xiaojun
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
| | - Guan Lixin
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
| | - Li Mengshan
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China.
| |
Collapse
|
5
|
Lin Y, Yang X, Zhang M, Cheng J, Lin H, Zhao Q. CLSSATP: Contrastive learning and self-supervised learning model for aquatic toxicity prediction. AQUATIC TOXICOLOGY (AMSTERDAM, NETHERLANDS) 2025; 279:107244. [PMID: 39805255 DOI: 10.1016/j.aquatox.2025.107244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/30/2024] [Revised: 12/17/2024] [Accepted: 01/07/2025] [Indexed: 01/16/2025]
Abstract
As compound concentrations in aquatic environments increase, the habitat degradation of aquatic organisms underscores the growing importance of studying the impact of chemicals on diverse aquatic populations. Understanding the potential impacts of different chemical substances on different species is a necessary requirement for protecting the environment and ensuring sustainable human development. In this regard, deep learning methods offer significant advantages over traditional experimental approaches in terms of cost, accuracy, and generalization ability. This research introduces CLSSATP, an efficient contrastive self-supervised learning deep neural network prediction model for organic toxicity. The model integrates two modules, a self-supervised learning module using molecular fingerprints for representation, and a contrastive learning module utilizing molecular graphs. Through dual-perspective learning, the model gains clear insights into the structural and property relationships of molecules. The experiment results indicate that our model outperforms comparative methods, demonstrating the effectiveness of our proposed architecture. Moreover, ablation experiments show that the self-supervised module and contrastive learning module respectively provide average performance improvements of 9.43 % and 10.98 % to CLSSATP. Furthermore, by visualizing the representations of our model, we observe that it correctly identifies the substructures that determine the molecular properties, granting itself with interpretability. In conclusion, CLSSATP offers a novel and effective perspective for future research in aquatic toxicity assessment. All of codes and datasets are freely available online at https://github.com/zhaoqi106/CLSSATP.
Collapse
Affiliation(s)
- Ye Lin
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
| | - Xin Yang
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
| | - Mingxuan Zhang
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China; School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
| | - Jinyan Cheng
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, 325001, China
| | - Hai Lin
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, 325001, China.
| | - Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China; Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, 325001, China.
| |
Collapse
|
6
|
Yang X, Sun J, Jin B, Lu Y, Cheng J, Jiang J, Zhao Q, Shuai J. Multi-task aquatic toxicity prediction model based on multi-level features fusion. J Adv Res 2025; 68:477-489. [PMID: 38844122 PMCID: PMC11785906 DOI: 10.1016/j.jare.2024.06.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2024] [Revised: 05/21/2024] [Accepted: 06/02/2024] [Indexed: 06/09/2024] Open
Abstract
INTRODUCTION With the escalating menace of organic compounds in environmental pollution imperiling the survival of aquatic organisms, the investigation of organic compound toxicity across diverse aquatic species assumes paramount significance for environmental protection. Understanding how different species respond to these compounds helps assess the potential ecological impact of pollution on aquatic ecosystems as a whole. Compared with traditional experimental methods, deep learning methods have higher accuracy in predicting aquatic toxicity, faster data processing speed and better generalization ability. OBJECTIVES This article presents ATFPGT-multi, an advanced multi-task deep neural network prediction model for organic toxicity. METHODS The model integrates molecular fingerprints and molecule graphs to characterize molecules, enabling the simultaneous prediction of acute toxicity for the same organic compound across four distinct fish species. Furthermore, to validate the advantages of multi-task learning, we independently construct prediction models, named ATFPGT-single, for each fish species. We employ cross-validation in our experiments to assess the performance and generalization ability of ATFPGT-multi. RESULTS The experimental results indicate, first, that ATFPGT-multi outperforms ATFPGT-single on four fish datasets with AUC improvements of 9.8%, 4%, 4.8%, and 8.2%, respectively, demonstrating the superiority of multi-task learning over single-task learning. Furthermore, in comparison with previous algorithms, ATFPGT-multi outperforms comparative methods, emphasizing that our approach exhibits higher accuracy and reliability in predicting aquatic toxicity. Moreover, ATFPGT-multi utilizes attention scores to identify molecular fragments associated with fish toxicity in organic molecules, as demonstrated by two organic molecule examples in the main text, demonstrating the interpretability of ATFPGT-multi. CONCLUSION In summary, ATFPGT-multi provides important support and reference for the further development of aquatic toxicity assessment. All of codes and datasets are freely available online at https://github.com/zhaoqi106/ATFPGT-multi.
Collapse
Affiliation(s)
- Xin Yang
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China; Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325001, China
| | - Jianqiang Sun
- School of Information Science and Engineering, Linyi University, Linyi 276000, China
| | - Bingyu Jin
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
| | - Yuer Lu
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325001, China
| | - Jinyan Cheng
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325001, China
| | - Jiaju Jiang
- College of Life Sciences, Sichuan University, Chengdu 610064, China
| | - Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China.
| | - Jianwei Shuai
- Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou 325001, China; Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou 325001, China.
| |
Collapse
|
7
|
Yue J, Li T, Xu J, Chen Z, Li Y, Liang S, Liu Z, Wang Y. Discovery of anticancer peptides from natural and generated sequences using deep learning. Int J Biol Macromol 2025; 290:138880. [PMID: 39706427 DOI: 10.1016/j.ijbiomac.2024.138880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2024] [Revised: 12/10/2024] [Accepted: 12/16/2024] [Indexed: 12/23/2024]
Abstract
Anticancer peptides (ACPs) demonstrate significant potential in clinical cancer treatment due to their ability to selectively target and kill cancer cells. In recent years, numerous artificial intelligence (AI) algorithms have been developed. However, many predictive methods lack sufficient wet lab validation, thereby constraining the progress of models and impeding the discovery of novel ACPs. This study proposes a comprehensive research strategy by introducing CNBT-ACPred, an ACP prediction model based on a three-channel deep learning architecture, supported by extensive in vitro and in vivo experiments. CNBT-ACPred achieved an accuracy of 0.9554 and a Matthews Correlation Coefficient (MCC) of 0.8602. Compared to existing excellent models, CNBT-ACPred increased accuracy by at least 5 % and improved MCC by 15 %. Predictions were conducted on over 3.8 million sequences from Uniprot, along with 100,000 sequences generated by a deep generative model, ultimately identifying 37 out of 41 candidate peptides from >30 species that exhibited effective in vitro tumor inhibitory activity. Among these, tPep14 demonstrated significant anticancer effects in two mouse xenograft models without detectable toxicity. Finally, the study revealed correlations between the amino acid composition, structure, and function of the identified ACP candidates.
Collapse
Affiliation(s)
- Jianda Yue
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
| | - Tingting Li
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
| | - Jiawei Xu
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
| | - Zihui Chen
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China
| | - Yaqi Li
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
| | - Songping Liang
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
| | - Zhonghua Liu
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
| | - Ying Wang
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
| |
Collapse
|
8
|
Chen Z, Ji C, Xu W, Gao J, Huang J, Xu H, Qian G, Huang J. UniAMP: enhancing AMP prediction using deep neural networks with inferred information of peptides. BMC Bioinformatics 2025; 26:10. [PMID: 39799358 PMCID: PMC11725221 DOI: 10.1186/s12859-025-06033-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2024] [Accepted: 01/02/2025] [Indexed: 01/15/2025] Open
Abstract
Antimicrobial peptides (AMPs) have been widely recognized as a promising solution to combat antimicrobial resistance of microorganisms due to the increasing abuse of antibiotics in medicine and agriculture around the globe. In this study, we propose UniAMP, a systematic prediction framework for discovering AMPs. We observe that feature vectors used in various existing studies constructed from peptide information, such as sequence, composition, and structure, can be augmented and even replaced by information inferred by deep learning models. Specifically, we use a feature vector with 2924 values inferred by two deep learning models, UniRep and ProtT5, to demonstrate that such inferred information of peptides suffice for the task, with the help of our proposed deep neural network model composed of fully connected layers and transformer encoders for predicting the antibacterial activity of peptides. Evaluation results demonstrate superior performance of our proposed model on both balanced benchmark datasets and imbalanced test datasets compared with existing studies. Subsequently, we analyze the relations among peptide sequences, manually extracted features, and automatically inferred information by deep learning models, leading to observations that the inferred information is more comprehensive and non-redundant for the task of predicting AMPs. Moreover, this approach alleviates the impact of the scarcity of positive data and demonstrates great potential in future research and applications.
Collapse
Affiliation(s)
- Zixin Chen
- College of Artificial Intelligence, Nanjing Agricultural University, Weigang No.1, Nanjing, 210095, Jiangsu, China
| | - Chengming Ji
- College of Artificial Intelligence, Nanjing Agricultural University, Weigang No.1, Nanjing, 210095, Jiangsu, China
| | - Wenwen Xu
- College of Artificial Intelligence, Nanjing Agricultural University, Weigang No.1, Nanjing, 210095, Jiangsu, China
| | - Jianfeng Gao
- StarHelix Inc, Jiangmiao Road, Nanjing, 210000, Jiangsu, China
| | - Ji Huang
- College of Agriculture, Nanjing Agricultural University, Weigang No.1, Nanjing, 210095, Jiangsu, China
| | - Huanliang Xu
- College of Artificial Intelligence, Nanjing Agricultural University, Weigang No.1, Nanjing, 210095, Jiangsu, China
| | - Guoliang Qian
- College of Plant Protection, Nanjing Agricultural University, Weigang No.1, Nanjing, 210095, Jiangsu, China.
| | - Junxian Huang
- College of Artificial Intelligence, Nanjing Agricultural University, Weigang No.1, Nanjing, 210095, Jiangsu, China.
| |
Collapse
|
9
|
Wang R, Liang X, Zhao Y, Xue W, Liang G. UniBioPAN: A Novel Universal Classification Architecture for Bioactive Peptides Inspired by Video Action Recognition. J Chem Inf Model 2024; 64:9276-9285. [PMID: 39571078 DOI: 10.1021/acs.jcim.4c01599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2024]
Abstract
The classification of bioactive peptides is of great importance in protein biology, but there is still a lack of a universal and effective classifier. Inspired by video action recognition, we developed the UniBioPAN architecture to create a universal peptide classifier to solve this problem. The architecture treats the peptide sequence as a video sequence and the molecular image of each amino acid in the peptide sequence as a video frame, enabling feature extraction and classification using convolutional neural networks, bidirectional long short-term memory networks, and fully connected networks. As a novel peptide classification architecture, UniBioPAN significantly outperforms other universal architecture in ACC, AUC and MCC across 11 data sets, and F1 score in 9 data sets. UniBioPAN is available in three ways: python script, jupyter notebook script and web server (https://gzliang.cqu.edu.cn/software/UniBioPAN.html). In summary, UniBioPAN is a universal, convenient, and high-performance peptide classification architecture. UniBioPAN holds significant importance in the discovery of bioactive peptides and the advancement of peptide classifiers. All the codes and data sets are publicly available at https://github.com/sanwrh/UniBioPAN.
Collapse
Affiliation(s)
- Ruihong Wang
- Key Laboratory of Biorheological Science and Technology (Chongqing University), Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400044, China
| | - Xiao Liang
- Key Laboratory of Biorheological Science and Technology (Chongqing University), Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400044, China
| | - Yi Zhao
- Key Laboratory of Biorheological Science and Technology (Chongqing University), Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400044, China
| | - Wenjun Xue
- Key Laboratory of Biorheological Science and Technology (Chongqing University), Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400044, China
| | - Guizhao Liang
- Key Laboratory of Biorheological Science and Technology (Chongqing University), Ministry of Education, Bioengineering College, Chongqing University, Chongqing 400044, China
- Bioengineering College of Chongqing University, No. 174, Shazheng Street, Shapingba District, Chongqing 400030, China
| |
Collapse
|
10
|
Meng C, Hou Y, Zou Q, Shi L, Su X, Ju Y. Rore: robust and efficient antioxidant protein classification via a novel dimensionality reduction strategy based on learning of fewer features. Genomics Inform 2024; 22:29. [PMID: 39633440 PMCID: PMC11616364 DOI: 10.1186/s44342-024-00026-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2024] [Accepted: 10/03/2024] [Indexed: 12/07/2024] Open
Abstract
In protein identification, researchers increasingly aim to achieve efficient classification using fewer features. While many feature selection methods effectively reduce the number of model features, they often cause information loss caused by merely selecting or discarding features, which limits classifier performance. To address this issue, we present Rore, an algorithm based on a feature-dimensionality reduction strategy. By mapping the original features to a latent space, Rore retains all relevant feature information while using fewer representations of the latent features. This approach significantly preserves the original information and overcomes the information loss problem associated with previous feature selection. Through extensive experimental validation and analysis, Rore demonstrated excellent performance on an antioxidant protein dataset, achieving an accuracy of 95.88% and MCC of 91.78%, using vectors including only 15 features. The Rore algorithm is available online at http://112.124.26.17:8021/Rore .
Collapse
Affiliation(s)
- Chaolu Meng
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and Animal Husbandry, Hohhot, China
| | - Yongqi Hou
- School of Computer Science, Inner Mongolia University, Hohhot, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Huangpu District, No. 415, Fengyang Road, Shanghai, China
| | - Xi Su
- Foshan Women and Children Hospital, Foshan, China
| | - Ying Ju
- School of Informatics, Xiamen University, Xiamen, China.
| |
Collapse
|
11
|
Su RL, Cao XW, Zhao J, Wang FJ. A high hydrophobic moment arginine-rich peptide screened by a machine learning algorithm enhanced ADC antitumor activity. J Pept Sci 2024; 30:e3628. [PMID: 38950972 DOI: 10.1002/psc.3628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2024] [Revised: 04/15/2024] [Accepted: 06/05/2024] [Indexed: 07/03/2024]
Abstract
Cell-penetrating peptides (CPPs) with better biomolecule delivery properties will expand their clinical applications. Using the MLCPP2.0 machine algorithm, we screened multiple candidate sequences with potential cellular uptake ability from the nuclear localization signal/nuclear export signal database and verified them through cell-penetrating fluorescent tracing experiments. A peptide (NCR) derived from the Rev protein of the caprine arthritis-encephalitis virus exhibited efficient cell-penetrating activity, delivering over four times more EGFP than the classical CPP TAT, allowing it to accumulate in lysosomes. Structural and property analysis revealed that a high hydrophobic moment and an appropriate hydrophobic region contribute to the high delivery activity of NCR. Trastuzumab emtansine (T-DM1), a HER2-targeted antibody-drug conjugate, could improve its anti-tumor activity by enhancing targeted delivery efficiency and increasing lysosomal drug delivery. This study designed a new NCR vector to non-covalently bind T-DM1 by fusing domain Z, which can specifically bind to the Fc region of immunoglobulin G and effectively deliver T-DM1 to lysosomes. MTT results showed that the domain Z-NCR vector significantly enhanced the cytotoxicity of T-DM1 against HER2-positive tumor cells while maintaining drug specificity. Our results make a useful attempt to explore the potential application of CPP as a lysosome-targeted delivery tool.
Collapse
Affiliation(s)
- Ruo-Long Su
- Department of Applied Biology, East China University of Science and Technology, Shanghai, People's Republic of China
| | - Xue-Wei Cao
- Department of Applied Biology, East China University of Science and Technology, Shanghai, People's Republic of China
- ECUST-FONOW Joint Research Center for Innovative Medicines, East China University of Science and Technology, Shanghai, People's Republic of China
| | - Jian Zhao
- Department of Applied Biology, East China University of Science and Technology, Shanghai, People's Republic of China
- ECUST-FONOW Joint Research Center for Innovative Medicines, East China University of Science and Technology, Shanghai, People's Republic of China
| | - Fu-Jun Wang
- ECUST-FONOW Joint Research Center for Innovative Medicines, East China University of Science and Technology, Shanghai, People's Republic of China
- New Drug R&D Center, Zhejiang Fonow Medicine Co., Ltd., Zhejiang, People's Republic of China
- Institute of Chinese Materia Medica, Shanghai University of Traditional Chinese Medicine, Shanghai, People's Republic of China
| |
Collapse
|
12
|
Wei Y, Qiu T, Ai Y, Zhang Y, Xie J, Zhang D, Luo X, Sun X, Wang X, Qiu J. Advances of computational methods enhance the development of multi-epitope vaccines. Brief Bioinform 2024; 26:bbaf055. [PMID: 39951549 PMCID: PMC11827616 DOI: 10.1093/bib/bbaf055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2024] [Revised: 11/28/2024] [Accepted: 01/27/2025] [Indexed: 02/16/2025] Open
Abstract
Vaccine development is one of the most promising fields, and multi-epitope vaccine, which does not need laborious culture processes, is an attractive alternative to classical vaccines with the advantage of safety, and efficiency. The rapid development of algorithms and the accumulation of immune data have facilitated the advancement of computer-aided vaccine design. Here we systemically reviewed the in silico data and algorithms resource, for different steps of computational vaccine design, including immunogen selection, epitope prediction, vaccine construction, optimization, and evaluation. The performance of different available tools on epitope prediction and immunogenicity evaluation was tested and compared on benchmark datasets. Finally, we discuss the future research direction for the construction of a multiepitope vaccine.
Collapse
Affiliation(s)
- Yiwen Wei
- School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
| | - Tianyi Qiu
- Institute of Clinical Science, Zhongshan Hospital; Intelligent Medicine Institute; Shanghai Institute of Infectious Disease and Biosecurity, Shanghai Medical College, Fudan University, No. 180, Fenglin Road, Xuhui Destrict, Shanghai 200032, China
| | - Yisi Ai
- School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
| | - Yuxi Zhang
- School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
| | - Junting Xie
- School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
| | - Dong Zhang
- School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
| | - Xiaochuan Luo
- School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
| | - Xiulan Sun
- State Key Laboratory of Food Science and Technology, School of Food Science and Technology, National Engineering Research Center for Functional Foods, Synergetic Innovation Center of Food Safety and Nutrition, Jiangnan University, Lihu Avenue 1800, Wuxi, Jiangsu 214122, China
| | - Xin Wang
- School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
- Shanghai Collaborative Innovation Center of Energy Therapy for Tumors, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
| | - Jingxuan Qiu
- School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
- Shanghai Collaborative Innovation Center of Energy Therapy for Tumors, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
| |
Collapse
|
13
|
Li S, Peng L, Chen L, Que L, Kang W, Hu X, Ma J, Di Z, Liu Y. Discovery of Highly Bioactive Peptides through Hierarchical Structural Information and Molecular Dynamics Simulations. J Chem Inf Model 2024; 64:8164-8175. [PMID: 39466714 DOI: 10.1021/acs.jcim.4c01006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/30/2024]
Abstract
Peptide drugs play an essential role in modern therapeutics, but the computational design of these molecules is hindered by several challenges. Traditional methods like molecular docking and molecular dynamics (MD) simulation, as well as recent deep learning approaches, often face limitations related to computational resource demands, complex binding affinity assessments, extensive data requirements, and poor model interpretability. Here, we introduce PepHiRe, an innovative methodology that utilizes the hierarchical structural information in peptide sequences and employs a novel strategy called Ladderpath, rooted in algorithmic information theory, to rapidly generate and enhance the efficiency and clarity of novel peptide design. We applied PepHiRe to develop BH3-like peptide inhibitors targeting myeloid cell leukemia-1, a protein associated with various cancers. By analyzing just eight known bioactive BH3 peptide sequences, PepHiRe effectively derived a hierarchy of subsequences used to create new BH3-like peptides. These peptides underwent screening through MD simulations, leading to the selection of five candidates for synthesis and subsequent in vitro testing. Experimental results demonstrated that these five peptides possess high inhibitory activity, with IC50 values ranging from 28.13 ± 7.93 to 167.42 ± 22.15 nM. Our study explores a white-box model driven technique and a structured screening pipeline for identifying and generating novel peptides with potential bioactivity.
Collapse
Affiliation(s)
- Shu Li
- Centre of Artificial Intelligence Driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University, Macao SAR 999078, China
| | - Lu Peng
- Department of Systems Science, Faculty of Arts and Sciences, Beijing Normal University, Zhuhai 519087, China
- International Academic Center of Complex Systems, Beijing Normal University, Zhuhai 519087, China
- School of Systems Science, Beijing Normal University, Beijing 100875, China
| | - Liuqing Chen
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Linjie Que
- Department of Systems Science, Faculty of Arts and Sciences, Beijing Normal University, Zhuhai 519087, China
- International Academic Center of Complex Systems, Beijing Normal University, Zhuhai 519087, China
| | - Wenqingqing Kang
- Centre of Artificial Intelligence Driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University, Macao SAR 999078, China
| | - Xiaojun Hu
- Department of Systems Science, Faculty of Arts and Sciences, Beijing Normal University, Zhuhai 519087, China
- International Academic Center of Complex Systems, Beijing Normal University, Zhuhai 519087, China
- School of Systems Science, Beijing Normal University, Beijing 100875, China
| | - Jun Ma
- Department of Systems Science, Faculty of Arts and Sciences, Beijing Normal University, Zhuhai 519087, China
- International Academic Center of Complex Systems, Beijing Normal University, Zhuhai 519087, China
- School of Systems Science, Beijing Normal University, Beijing 100875, China
| | - Zengru Di
- Department of Systems Science, Faculty of Arts and Sciences, Beijing Normal University, Zhuhai 519087, China
- International Academic Center of Complex Systems, Beijing Normal University, Zhuhai 519087, China
| | - Yu Liu
- Department of Systems Science, Faculty of Arts and Sciences, Beijing Normal University, Zhuhai 519087, China
- International Academic Center of Complex Systems, Beijing Normal University, Zhuhai 519087, China
| |
Collapse
|
14
|
Wang G, Feng H, Du M, Feng Y, Cao C. Multimodal Representation Learning via Graph Isomorphism Network for Toxicity Multitask Learning. J Chem Inf Model 2024; 64:8322-8338. [PMID: 39432821 DOI: 10.1021/acs.jcim.4c01061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2024]
Abstract
Toxicity is paramount for comprehending compound properties, particularly in the early stages of drug design. Due to the diversity and complexity of toxic effects, it became a challenge to compute compound toxicity tasks. To address this issue, we propose a multimodal representation learning model, termed multimodal graph isomorphism network (MMGIN), to address this challenge for compound toxicity multitask learning. Based on fingerprints and molecular graphs of compounds, our MMGIN model incorporates a multimodal representation learning model to acquire a comprehensive compound representation. This model adopts a two-channel structure to independently learn fingerprint representation and molecular graph representation. Subsequently, two feedforward neural networks utilize the learned multimodal compound representation to perform multitask learning, encompassing compound toxicity classification and multiple compound category classification simultaneously. To test the effectiveness of our model, we constructed a novel data set, termed the compound toxicity multitask learning (CTMTL) data set, derived from the TOXRIC data set. We compare our MMGIN model with other representative machine learning and deep learning models on the CTMTL and Tox21 data sets. The experimental results demonstrate significant advancements achieved by our MMGIN model. Furthermore, the ablation study underscores the effectiveness of the introduced fingerprints, molecular graphs, the multimodal representation learning model, and the multitask learning model, showcasing the model's superior predictive capability and robustness.
Collapse
Affiliation(s)
- Guishen Wang
- School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun, 130012 Jilin, China
| | - Hui Feng
- School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun, 130012 Jilin, China
| | - Mengyan Du
- School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun, 130012 Jilin, China
| | - Yuncong Feng
- School of Computer Science and Engineering, Changchun University of Technology, North Yuanda Street No. 3000, Changchun, 130012 Jilin, China
| | - Chen Cao
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Longmian Avenue No. 101, Nanjing, 211166 Jiangsu, China
| |
Collapse
|
15
|
Fu X, Duan H, Zang X, Liu C, Li X, Zhang Q, Zhang Z, Zou Q, Cui F. Hyb_SEnc: An Antituberculosis Peptide Predictor Based on a Hybrid Feature Vector and Stacked Ensemble Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:1897-1910. [PMID: 39083393 DOI: 10.1109/tcbb.2024.3425644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/02/2024]
Abstract
Tuberculosis has plagued mankind since ancient times, and the struggle between humans and tuberculosis continues. Mycobacterium tuberculosis is the leading cause of tuberculosis, infecting nearly one-third of the world's population. The rise of peptide drugs has created a new direction in the treatment of tuberculosis. Therefore, for the treatment of tuberculosis, the prediction of anti-tuberculosis peptides is crucial. This paper proposes an anti-tuberculosis peptide prediction method based on hybrid features and stacked ensemble learning. First, a random forest (RF) and extremely randomized tree (ERT) are selected as first-level learning of stacked ensembles. Then, the five best-performing feature encoding methods are selected to obtain the hybrid feature vector, and then the decision tree and recursive feature elimination (DT-RFE) are used to refine the hybrid feature vector. After selection, the optimal feature subset is used as the input of the stacked ensemble model. At the same time, logistic regression (LR) is used as a stacked ensemble secondary learner to build the final stacked ensemble model Hyb_SEnc. The prediction accuracy of Hyb_SEnc achieved 94.68% and 95.74% on the independent test sets of AntiTb_MD and AntiTb_RD, respectively.
Collapse
|
16
|
Han J, Kong T, Liu J. PepNet: an interpretable neural network for anti-inflammatory and antimicrobial peptides prediction using a pre-trained protein language model. Commun Biol 2024; 7:1198. [PMID: 39341947 PMCID: PMC11438969 DOI: 10.1038/s42003-024-06911-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2024] [Accepted: 09/17/2024] [Indexed: 10/01/2024] Open
Abstract
Identifying anti-inflammatory peptides (AIPs) and antimicrobial peptides (AMPs) is crucial for the discovery of innovative and effective peptide-based therapies targeting inflammation and microbial infections. However, accurate identification of AIPs and AMPs remains a computational challenge mainly due to limited utilization of peptide sequence information. Here, we propose PepNet, an interpretable neural network for predicting both AIPs and AMPs by applying a pre-trained protein language model to fully utilize the peptide sequence information. It first captures the information of residue arrangements and physicochemical properties using a residual dilated convolution block, and then seizes the function-related diverse information by introducing a residual Transformer block to characterize the residue representations generated by a pre-trained protein language model. After training and testing, PepNet demonstrates great superiority over other leading AIP and AMP predictors and shows strong interpretability of its learned peptide representations. A user-friendly web server for PepNet is freely available at http://liulab.top/PepNet/server .
Collapse
Affiliation(s)
- Jiyun Han
- School of Mathematics and Statistics, Shandong University, 264209, Weihai, China
| | - Tongxin Kong
- School of Mathematics and Statistics, Shandong University, 264209, Weihai, China
| | - Juntao Liu
- School of Mathematics and Statistics, Shandong University, 264209, Weihai, China.
| |
Collapse
|
17
|
Yu Q, Zhang Z, Liu G, Li W, Tang Y. ToxGIN: an In silico prediction model for peptide toxicity via graph isomorphism networks integrating peptide sequence and structure information. Brief Bioinform 2024; 25:bbae583. [PMID: 39530430 PMCID: PMC11555482 DOI: 10.1093/bib/bbae583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2024] [Revised: 10/22/2024] [Accepted: 10/29/2024] [Indexed: 11/16/2024] Open
Abstract
Peptide drugs have demonstrated enormous potential in treating a variety of diseases, yet toxicity prediction remains a significant challenge in drug development. Existing models for prediction of peptide toxicity largely rely on sequence information and often neglect the three-dimensional (3D) structures of peptides. This study introduced a novel model for short peptide toxicity prediction, named ToxGIN. The model utilizes Graph Isomorphism Network (GIN), integrating the underlying amino acid sequence composition and the 3D structures of peptides. ToxGIN comprises three primary modules: (i) Sequence processing module, converting peptide 3D structures and sequences into information of nodes and edges; (ii) Feature extraction module, utilizing GIN to learn discriminative features from nodes and edges; (iii) Classification module, employing a fully connected classifier for toxicity prediction. ToxGIN performed well on the independent test set with F1 score = 0.83, AUROC = 0.91, and Matthews correlation coefficient = 0.68, better than existing models for prediction of peptide toxicity. These results validated the effectiveness of integrating 3D structural information with sequence data using GIN for peptide toxicity prediction. The proposed ToxGIN and data can be freely accessible at https://github.com/cihebiyql/ToxGIN.
Collapse
Affiliation(s)
- Qiule Yu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Zhixing Zhang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| |
Collapse
|
18
|
Fernández-Díaz R, Cossio-Pérez R, Agoni C, Lam HT, Lopez V, Shields DC. AutoPeptideML: a study on how to build more trustworthy peptide bioactivity predictors. BIOINFORMATICS (OXFORD, ENGLAND) 2024; 40:btae555. [PMID: 39292535 PMCID: PMC11438549 DOI: 10.1093/bioinformatics/btae555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/24/2024] [Revised: 08/08/2024] [Accepted: 09/17/2024] [Indexed: 09/20/2024]
Abstract
MOTIVATION Automated machine learning (AutoML) solutions can bridge the gap between new computational advances and their real-world applications by enabling experimental scientists to build their own custom models. We examine different steps in the development life-cycle of peptide bioactivity binary predictors and identify key steps where automation cannot only result in a more accessible method, but also more robust and interpretable evaluation leading to more trustworthy models. RESULTS We present a new automated method for drawing negative peptides that achieves better balance between specificity and generalization than current alternatives. We study the effect of homology-based partitioning for generating the training and testing data subsets and demonstrate that model performance is overestimated when no such homology correction is used, which indicates that prior studies may have overestimated their performance when applied to new peptide sequences. We also conduct a systematic analysis of different protein language models as peptide representation methods and find that they can serve as better descriptors than a naive alternative, but that there is no significant difference across models with different sizes or algorithms. Finally, we demonstrate that an ensemble of optimized traditional machine learning algorithms can compete with more complex neural network models, while being more computationally efficient. We integrate these findings into AutoPeptideML, an easy-to-use AutoML tool to allow researchers without a computational background to build new predictive models for peptide bioactivity in a matter of minutes. AVAILABILITY AND IMPLEMENTATION Source code, documentation, and data are available at https://github.com/IBM/AutoPeptideML and a dedicated web-server at http://peptide.ucd.ie/AutoPeptideML. A static version of the software to ensure the reproduction of the results is available at https://zenodo.org/records/13363975.
Collapse
Affiliation(s)
- Raúl Fernández-Díaz
- IBM Research, Dublin, Dublin D15 HN66, Ireland
- School of Medicine, University College Dublin, Dublin D04 C1P1, Ireland
- Conway Institute of Biomolecular and Biomedical Science, University College Dublin, Dublin D04 C1P, Ireland
- The SFI Centre for Research Training in Genomics Data Science, Ireland
| | - Rodrigo Cossio-Pérez
- School of Medicine, University College Dublin, Dublin D04 C1P1, Ireland
- Conway Institute of Biomolecular and Biomedical Science, University College Dublin, Dublin D04 C1P, Ireland
- Department of Science and Technology, National University of Quilmes, Bernal B1876, Provincia de Buenos Aires, Argentina
| | - Clement Agoni
- School of Medicine, University College Dublin, Dublin D04 C1P1, Ireland
- Conway Institute of Biomolecular and Biomedical Science, University College Dublin, Dublin D04 C1P, Ireland
- Discipline of Pharmaceutical Sciences, School of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
| | | | | | - Denis C Shields
- School of Medicine, University College Dublin, Dublin D04 C1P1, Ireland
- Conway Institute of Biomolecular and Biomedical Science, University College Dublin, Dublin D04 C1P, Ireland
| |
Collapse
|
19
|
Rathore AS, Choudhury S, Arora A, Tijare P, Raghava GPS. ToxinPred 3.0: An improved method for predicting the toxicity of peptides. Comput Biol Med 2024; 179:108926. [PMID: 39038391 DOI: 10.1016/j.compbiomed.2024.108926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Revised: 05/17/2024] [Accepted: 07/17/2024] [Indexed: 07/24/2024]
Abstract
Toxicity emerges as a prominent challenge in the design of therapeutic peptides, causing the failure of numerous peptides during clinical trials. In 2013, our group developed ToxinPred, a computational method that has been extensively adopted by the scientific community for predicting peptide toxicity. In this paper, we propose a refined variant of ToxinPred that showcases improved reliability and accuracy in predicting peptide toxicity. Initially, we utilized a similarity/alignment-based approach employing BLAST to predict toxic peptides, which yielded satisfactory accuracy; however, the method suffered from inadequate coverage. Subsequently, we employed a motif-based approach using MERCI software to uncover specific patterns or motifs that are exclusively observed in toxic peptides. The search for these motifs in peptides allowed us to predict toxic peptides with a high level of specificity with poor sensitivity. To overcome the coverage limitations, we developed alignment-free methods using machine/deep learning techniques to balance sensitivity and specificity of prediction. Deep learning model (ANN - LSTM with fixed sequence length) developed using one-hot encoding achieved a maximum AUROC of 0.93 with MCC of 0.71 on an independent dataset. Machine learning model (extra tree) developed using compositional features of peptides achieved a maximum AUROC of 0.95 with MCC of 0.78. We also developed large language models and achieved maximum AUC of 0.93 using ESM2-t33. Finally, we developed hybrid or ensemble methods combining two or more methods to enhance performance. Our specific hybrid method, which combines a motif-based approach with a machine learning-based model, achieved a maximum AUROC of 0.98 with MCC 0.81 on an independent dataset. In this study, all models were trained and tested on 80 % of data using five-fold cross-validation and evaluated on the remaining 20 % of data called independent dataset. The evaluation of all methods on an independent dataset revealed that the method proposed in this study exhibited better performance than existing methods. To cater to the needs of the scientific community, we have developed a standalone software, pip package and web-based server ToxinPred3 (https://github.com/raghavagps/toxinpred3 and https://webs.iiitd.edu.in/raghava/toxinpred3/).
Collapse
Affiliation(s)
- Anand Singh Rathore
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
| | - Shubham Choudhury
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
| | - Akanksha Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
| | - Purva Tijare
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
| | - Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
| |
Collapse
|
20
|
Wang C, Yuan C, Wang Y, Shi Y, Zhang T, Patti GJ. Predicting Collision Cross-Section Values for Small Molecules through Chemical Class-Based Multimodal Graph Attention Network. J Chem Inf Model 2024; 64:6305-6315. [PMID: 38959055 DOI: 10.1021/acs.jcim.3c01934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/04/2024]
Abstract
Libraries of collision cross-section (CCS) values have the potential to facilitate compound identification in metabolomics. Although computational methods provide an opportunity to increase library size rapidly, accurate prediction of CCS values remains challenging due to the structural diversity of small molecules. Here, we developed a machine learning (ML) model that integrates graph attention networks and multimodal molecular representations to predict CCS values on the basis of chemical class. Our approach, referred to as MGAT-CCS, had superior performance in comparison to other ML models in CCS prediction. MGAT-CCS achieved a median relative error of 0.47%/1.14% (positive/negative mode) and 1.40%/1.63% (positive/negative mode) for lipids and metabolites, respectively. When MGAT-CCS was applied to real-world metabolomics data, it reduced the number of false metabolite candidates by roughly 25% across multiple sample types ranging from plasma and urine to cells. To facilitate its application, we developed a user-friendly stand-alone web server for MGAT-CCS that is freely available at https://mgat-ccs-web.onrender.com. This work represents a step forward in predicting CCS values and can potentially facilitate the identification of small molecules when using ion mobility spectrometry coupled with mass spectrometry.
Collapse
Affiliation(s)
- Cheng Wang
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
- National Institute of Health Data Science of China, Shandong University, Jinan 250000, China
- Department of Chemistry, Washington University in St. Louis, St. Louis, Missouri 63130 United States
| | - Chuang Yuan
- School of Life Sciences, and Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing 100871, China
- Department of Biochemistry and Biophysics, School of Basic Medical Sciences, Peking University, Beijing 100191, China
| | - Yahui Wang
- Department of Chemistry, Washington University in St. Louis, St. Louis, Missouri 63130 United States
- Department of Medicine, Washington University in St. Louis, St. Louis, Missouri 63130, United States
| | - Yuying Shi
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
- National Institute of Health Data Science of China, Shandong University, Jinan 250000, China
| | - Tao Zhang
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, China
- National Institute of Health Data Science of China, Shandong University, Jinan 250000, China
| | - Gary J Patti
- Department of Chemistry, Washington University in St. Louis, St. Louis, Missouri 63130 United States
- Department of Medicine, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Center for Metabolomics and Isotope Tracing, Washington University in St. Louis, St. Louis, Missouri 63130, United States
| |
Collapse
|
21
|
Yao L, Xie P, Guan J, Chung CR, Zhang W, Deng J, Huang Y, Chiang YC, Lee TY. ACP-CapsPred: an explainable computational framework for identification and functional prediction of anticancer peptides based on capsule network. Brief Bioinform 2024; 25:bbae460. [PMID: 39293807 PMCID: PMC11410379 DOI: 10.1093/bib/bbae460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2024] [Revised: 07/22/2024] [Accepted: 09/09/2024] [Indexed: 09/20/2024] Open
Abstract
Cancer is a severe illness that significantly threatens human life and health. Anticancer peptides (ACPs) represent a promising therapeutic strategy for combating cancer. In silico methods enable rapid and accurate identification of ACPs without extensive human and material resources. This study proposes a two-stage computational framework called ACP-CapsPred, which can accurately identify ACPs and characterize their functional activities across different cancer types. ACP-CapsPred integrates a protein language model with evolutionary information and physicochemical properties of peptides, constructing a comprehensive profile of peptides. ACP-CapsPred employs a next-generation neural network, specifically capsule networks, to construct predictive models. Experimental results demonstrate that ACP-CapsPred exhibits satisfactory predictive capabilities in both stages, reaching state-of-the-art performance. In the first stage, ACP-CapsPred achieves accuracies of 80.25% and 95.71%, as well as F1-scores of 79.86% and 95.90%, on benchmark datasets Set 1 and Set 2, respectively. In the second stage, tasked with characterizing the functional activities of ACPs across five selected cancer types, ACP-CapsPred attains an average accuracy of 90.75% and an F1-score of 91.38%. Furthermore, ACP-CapsPred demonstrates excellent interpretability, revealing regions and residues associated with anticancer activity. Consequently, ACP-CapsPred presents a promising solution to expedite the development of ACPs and offers a novel perspective for other biological sequence analyses.
Collapse
Affiliation(s)
- Lantian Yao
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, 2001 Longxiang Road, Shenzhen 518172, China
- School of Science and Engineering, The Chinese University of Hong Kong, 2001 Longxiang Road, Shenzhen 518172, China
| | - Peilin Xie
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, 2001 Longxiang Road, Shenzhen 518172, China
- School of Science and Engineering, The Chinese University of Hong Kong, 2001 Longxiang Road, Shenzhen 518172, China
| | - Jiahui Guan
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, 2001 Longxiang Road, Shenzhen 518172, China
- School of Medicine, The Chinese University of Hong Kong, 2001 Longxiang Road, Shenzhen 518172, China
| | - Chia-Ru Chung
- Department of Computer Science and Information Engineering, National Central University, 300 Zhongda Road, Taoyuan 320317, Taiwan
| | - Wenyang Zhang
- School of Medicine, The Chinese University of Hong Kong, 2001 Longxiang Road, Shenzhen 518172, China
| | - Junyang Deng
- School of Medicine, The Chinese University of Hong Kong, 2001 Longxiang Road, Shenzhen 518172, China
| | - Yixian Huang
- School of Medicine, The Chinese University of Hong Kong, 2001 Longxiang Road, Shenzhen 518172, China
| | - Ying-Chih Chiang
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, 2001 Longxiang Road, Shenzhen 518172, China
- School of Medicine, The Chinese University of Hong Kong, 2001 Longxiang Road, Shenzhen 518172, China
| | - Tzong-Yi Lee
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, 1001 Daxue Road, Hsinchu 300093, Taiwan
- Center for Intelligent Drug Systems and Smart Bio-devices (IDS2B), National Yang Ming Chiao Tung University, 1001 Daxue Road, Hsinchu 300093, Taiwan
| |
Collapse
|
22
|
Wang JH, Sung TY. ToxTeller: Predicting Peptide Toxicity Using Four Different Machine Learning Approaches. ACS OMEGA 2024; 9:32116-32123. [PMID: 39072096 PMCID: PMC11270677 DOI: 10.1021/acsomega.4c04246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/03/2024] [Revised: 06/20/2024] [Accepted: 06/25/2024] [Indexed: 07/30/2024]
Abstract
Examining the toxicity of peptides is essential for therapeutic peptide-based drug design. Machine learning approaches are frequently used to develop highly accurate predictors for peptide toxicity prediction. In this paper, we present ToxTeller, which provides four predictors using logistic regression, support vector machines, random forests, and XGBoost, respectively. For prediction model development, we construct a data set of toxic and nontoxic peptides from SwissProt and ConoServer databases with existence evidence levels checked. We also fully utilize the protein annotation in SwissProt to collect more toxic peptides than using keyword search alone. From this data set, we construct an independent test data set that shares at most 40% sequence similarity within itself and with the training data set. From a quite comprehensive list of 28 feature combinations, we conduct 10-fold cross-validation on the training data set to determine the optimized feature combination for model development. ToxTeller's performance is evaluated and compared with existing predictors on the independent test data set. Since toxic peptides must be avoided for drug design, we analyze strategies for reducing false-negative predictions of toxic peptides and suggest selecting models by top sensitivity instead of the widely used Matthews correlation coefficient, and also suggest using a meta-predictor approach with multiple predictors.
Collapse
Affiliation(s)
- Jen-Hung Wang
- Institute of Information
Science, Academia Sinica, Taipei 11529, Taiwan
| | - Ting-Yi Sung
- Institute of Information
Science, Academia Sinica, Taipei 11529, Taiwan
| |
Collapse
|
23
|
Ebrahimikondori H, Sutherland D, Yanai A, Richter A, Salehi A, Li C, Coombe L, Kotkoff M, Warren RL, Birol I. Structure-aware deep learning model for peptide toxicity prediction. Protein Sci 2024; 33:e5076. [PMID: 39196703 PMCID: PMC11193153 DOI: 10.1002/pro.5076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2024] [Revised: 04/26/2024] [Accepted: 05/28/2024] [Indexed: 08/30/2024]
Abstract
Antimicrobial resistance is a critical public health concern, necessitating the exploration of alternative treatments. While antimicrobial peptides (AMPs) show promise, assessing their toxicity using traditional wet lab methods is both time-consuming and costly. We introduce tAMPer, a novel multi-modal deep learning model designed to predict peptide toxicity by integrating the underlying amino acid sequence composition and the three-dimensional structure of peptides. tAMPer adopts a graph-based representation for peptides, encoding ColabFold-predicted structures, where nodes represent amino acids and edges represent spatial interactions. Structural features are extracted using graph neural networks, and recurrent neural networks capture sequential dependencies. tAMPer's performance was assessed on a publicly available protein toxicity benchmark and an AMP hemolysis data we generated. On the latter, tAMPer achieves an F1-score of 68.7%, outperforming the second-best method by 23.4%. On the protein benchmark, tAMPer exhibited an improvement of over 3.0% in the F1-score compared to current state-of-the-art methods. We anticipate tAMPer to accelerate AMP discovery and development by reducing the reliance on laborious toxicity screening experiments.
Collapse
Affiliation(s)
- Hossein Ebrahimikondori
- Canada's Michael Smith Genome Sciences CentreBC Cancer AgencyVancouverBritish ColumbiaCanada
- Bioinformatics Graduate ProgramUniversity of British ColumbiaVancouverBritish ColumbiaCanada
| | - Darcy Sutherland
- Canada's Michael Smith Genome Sciences CentreBC Cancer AgencyVancouverBritish ColumbiaCanada
- Public Health LaboratoryBritish Columbia Centre for Disease ControlVancouverBritish ColumbiaCanada
- Department of Pathology and Laboratory MedicineUniversity of British ColumbiaVancouverBritish ColumbiaCanada
| | - Anat Yanai
- Canada's Michael Smith Genome Sciences CentreBC Cancer AgencyVancouverBritish ColumbiaCanada
- Public Health LaboratoryBritish Columbia Centre for Disease ControlVancouverBritish ColumbiaCanada
| | - Amelia Richter
- Canada's Michael Smith Genome Sciences CentreBC Cancer AgencyVancouverBritish ColumbiaCanada
- Public Health LaboratoryBritish Columbia Centre for Disease ControlVancouverBritish ColumbiaCanada
| | - Ali Salehi
- Canada's Michael Smith Genome Sciences CentreBC Cancer AgencyVancouverBritish ColumbiaCanada
- Public Health LaboratoryBritish Columbia Centre for Disease ControlVancouverBritish ColumbiaCanada
| | - Chenkai Li
- Canada's Michael Smith Genome Sciences CentreBC Cancer AgencyVancouverBritish ColumbiaCanada
- Bioinformatics Graduate ProgramUniversity of British ColumbiaVancouverBritish ColumbiaCanada
| | - Lauren Coombe
- Canada's Michael Smith Genome Sciences CentreBC Cancer AgencyVancouverBritish ColumbiaCanada
| | - Monica Kotkoff
- Canada's Michael Smith Genome Sciences CentreBC Cancer AgencyVancouverBritish ColumbiaCanada
| | - René L. Warren
- Canada's Michael Smith Genome Sciences CentreBC Cancer AgencyVancouverBritish ColumbiaCanada
| | - Inanc Birol
- Canada's Michael Smith Genome Sciences CentreBC Cancer AgencyVancouverBritish ColumbiaCanada
- Public Health LaboratoryBritish Columbia Centre for Disease ControlVancouverBritish ColumbiaCanada
- Department of Pathology and Laboratory MedicineUniversity of British ColumbiaVancouverBritish ColumbiaCanada
- Department of Medical GeneticsUniversity of British ColumbiaVancouverBritish ColumbiaCanada
| |
Collapse
|
24
|
Chen Z, Wang R, Guo J, Wang X. The role and future prospects of artificial intelligence algorithms in peptide drug development. Biomed Pharmacother 2024; 175:116709. [PMID: 38713945 DOI: 10.1016/j.biopha.2024.116709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2024] [Revised: 05/01/2024] [Accepted: 05/02/2024] [Indexed: 05/09/2024] Open
Abstract
Peptide medications have been more well-known in recent years due to their many benefits, including low side effects, high biological activity, specificity, effectiveness, and so on. Over 100 peptide medications have been introduced to the market to treat a variety of illnesses. Most of these peptide medications are developed on the basis of endogenous peptides or natural peptides, which frequently required expensive, time-consuming, and extensive tests to confirm. As artificial intelligence advances quickly, it is now possible to build machine learning or deep learning models that screen a large number of candidate sequences for therapeutic peptides. Therapeutic peptides, such as those with antibacterial or anticancer properties, have been developed by the application of artificial intelligence algorithms.The process of finding and developing peptide drugs is outlined in this review, along with a few related cases that were helped by AI and conventional methods. These resources will open up new avenues for peptide drug development and discovery, helping to meet the pressing needs of clinical patients for disease treatment. Although peptide drugs are a new class of biopharmaceuticals that distinguish them from chemical and small molecule drugs, their clinical purpose and value cannot be ignored. However, the traditional peptide drug research and development has a long development cycle and high investment, and the creation of peptide medications will be substantially hastened by the AI-assisted (AI+) mode, offering a new boost for combating diseases.
Collapse
Affiliation(s)
- Zhiheng Chen
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100083, China.
| | - Ruoxi Wang
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100083, China.
| | - Junqi Guo
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100083, China.
| | - Xiaogang Wang
- Guangdong Provincial Key Laboratory of Bone and Joint Degenerative Diseases, The Third Affiliated Hospital of Southern Medical University, Guangzhou, Guangdong 510630, China.
| |
Collapse
|
25
|
Mall R, Singh A, Patel CN, Guirimand G, Castiglione F. VISH-Pred: an ensemble of fine-tuned ESM models for protein toxicity prediction. Brief Bioinform 2024; 25:bbae270. [PMID: 38842509 PMCID: PMC11154842 DOI: 10.1093/bib/bbae270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2024] [Revised: 05/06/2024] [Accepted: 05/23/2024] [Indexed: 06/07/2024] Open
Abstract
Peptide- and protein-based therapeutics are becoming a promising treatment regimen for myriad diseases. Toxicity of proteins is the primary hurdle for protein-based therapies. Thus, there is an urgent need for accurate in silico methods for determining toxic proteins to filter the pool of potential candidates. At the same time, it is imperative to precisely identify non-toxic proteins to expand the possibilities for protein-based biologics. To address this challenge, we proposed an ensemble framework, called VISH-Pred, comprising models built by fine-tuning ESM2 transformer models on a large, experimentally validated, curated dataset of protein and peptide toxicities. The primary steps in the VISH-Pred framework are to efficiently estimate protein toxicities taking just the protein sequence as input, employing an under sampling technique to handle the humongous class-imbalance in the data and learning representations from fine-tuned ESM2 protein language models which are then fed to machine learning techniques such as Lightgbm and XGBoost. The VISH-Pred framework is able to correctly identify both peptides/proteins with potential toxicity and non-toxic proteins, achieving a Matthews correlation coefficient of 0.737, 0.716 and 0.322 and F1-score of 0.759, 0.696 and 0.713 on three non-redundant blind tests, respectively, outperforming other methods by over $10\%$ on these quality metrics. Moreover, VISH-Pred achieved the best accuracy and area under receiver operating curve scores on these independent test sets, highlighting the robustness and generalization capability of the framework. By making VISH-Pred available as an easy-to-use web server, we expect it to serve as a valuable asset for future endeavors aimed at discerning the toxicity of peptides and enabling efficient protein-based therapeutics.
Collapse
Affiliation(s)
- Raghvendra Mall
- Biotechnology Research Center, Technology Innovation Institute, P.O. Box 9639, Abu Dhabi, United Arab Emirates
| | - Ankita Singh
- Biotechnology Research Center, Technology Innovation Institute, P.O. Box 9639, Abu Dhabi, United Arab Emirates
| | - Chirag N Patel
- Biotechnology Research Center, Technology Innovation Institute, P.O. Box 9639, Abu Dhabi, United Arab Emirates
| | - Gregory Guirimand
- Biotechnology Research Center, Technology Innovation Institute, P.O. Box 9639, Abu Dhabi, United Arab Emirates
- Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe, 657-8501, Japan
| | - Filippo Castiglione
- Biotechnology Research Center, Technology Innovation Institute, P.O. Box 9639, Abu Dhabi, United Arab Emirates
- Institute for Applied Computing, National Research Council of Italy, Via dei Taurini, 19, 00185, Rome, Italy
| |
Collapse
|
26
|
Tan X, Liu Q, Fang Y, Yang S, Chen F, Wang J, Ouyang D, Dong J, Zeng W. Introducing enzymatic cleavage features and transfer learning realizes accurate peptide half-life prediction across species and organs. Brief Bioinform 2024; 25:bbae350. [PMID: 39038937 PMCID: PMC11262833 DOI: 10.1093/bib/bbae350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2024] [Revised: 06/05/2024] [Accepted: 07/05/2024] [Indexed: 07/24/2024] Open
Abstract
Peptide drugs are becoming star drug agents with high efficiency and selectivity which open up new therapeutic avenues for various diseases. However, the sensitivity to hydrolase and the relatively short half-life have severely hindered their development. In this study, a new generation artificial intelligence-based system for accurate prediction of peptide half-life was proposed, which realized the half-life prediction of both natural and modified peptides and successfully bridged the evaluation possibility between two important species (human, mouse) and two organs (blood, intestine). To achieve this, enzymatic cleavage descriptors were integrated with traditional peptide descriptors to construct a better representation. Then, robust models with accurate performance were established by comparing traditional machine learning and transfer learning, systematically. Results indicated that enzymatic cleavage features could certainly enhance model performance. The deep learning model integrating transfer learning significantly improved predictive accuracy, achieving remarkable R2 values: 0.84 for natural peptides and 0.90 for modified peptides in human blood, 0.984 for natural peptides and 0.93 for modified peptides in mouse blood, and 0.94 for modified peptides in mouse intestine on the test set, respectively. These models not only successfully composed the above-mentioned system but also improved by approximately 15% in terms of correlation compared to related works. This study is expected to provide powerful solutions for peptide half-life evaluation and boost peptide drug development.
Collapse
Affiliation(s)
- Xiaorong Tan
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha 410083, P.R. China
| | - Qianhui Liu
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha 410083, P.R. China
| | - Yanpeng Fang
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha 410083, P.R. China
| | - Sen Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha 410083, P.R. China
| | - Fei Chen
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha 410083, P.R. China
| | - Jianmin Wang
- The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, 214, Veritas A Hall, Yonsei Univeristy, 85 Songdogwahak-ro, Incheon 21983, Republic of Korea
| | - Defang Ouyang
- Institute of Chinese Medical Sciences (ICMS), State Key Laboratory of Quality Research in Chinese Medicine, University of Macau, Avenida da Universidade, Taipa, Macau 999078, China
| | - Jie Dong
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha 410083, P.R. China
| | - Wenbin Zeng
- Xiangya School of Pharmaceutical Sciences, Central South University, No. 172 Tongzipo Road, Yuelu District, Changsha 410083, P.R. China
| |
Collapse
|
27
|
Jiao S, Ye X, Sakurai T, Zou Q, Liu R. Integrated convolution and self-attention for improving peptide toxicity prediction. Bioinformatics 2024; 40:btae297. [PMID: 38696758 PMCID: PMC11654579 DOI: 10.1093/bioinformatics/btae297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2024] [Revised: 04/02/2024] [Accepted: 04/30/2024] [Indexed: 05/04/2024] Open
Abstract
MOTIVATION Peptides are promising agents for the treatment of a variety of diseases due to their specificity and efficacy. However, the development of peptide-based drugs is often hindered by the potential toxicity of peptides, which poses a significant barrier to their clinical application. Traditional experimental methods for evaluating peptide toxicity are time-consuming and costly, making the development process inefficient. Therefore, there is an urgent need for computational tools specifically designed to predict peptide toxicity accurately and rapidly, facilitating the identification of safe peptide candidates for drug development. RESULTS We provide here a novel computational approach, CAPTP, which leverages the power of convolutional and self-attention to enhance the prediction of peptide toxicity from amino acid sequences. CAPTP demonstrates outstanding performance, achieving a Matthews correlation coefficient of approximately 0.82 in both cross-validation settings and on independent test datasets. This performance surpasses that of existing state-of-the-art peptide toxicity predictors. Importantly, CAPTP maintains its robustness and generalizability even when dealing with data imbalances. Further analysis by CAPTP reveals that certain sequential patterns, particularly in the head and central regions of peptides, are crucial in determining their toxicity. This insight can significantly inform and guide the design of safer peptide drugs. AVAILABILITY AND IMPLEMENTATION The source code for CAPTP is freely available at https://github.com/jiaoshihu/CAPTP.
Collapse
Affiliation(s)
- Shihu Jiao
- Department of Computer Science, University of Tsukuba,
Tsukuba 3058577, Japan
| | - Xiucai Ye
- Department of Computer Science, University of Tsukuba,
Tsukuba 3058577, Japan
| | - Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba,
Tsukuba 3058577, Japan
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic
Science and Technology of China, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science
and Technology of China, Quzhou 324000, China
| | - Ruijun Liu
- School of Software, Beihang University, Beijing 100191,
China
| |
Collapse
|
28
|
Wan F, Wong F, Collins JJ, de la Fuente-Nunez C. Machine learning for antimicrobial peptide identification and design. NATURE REVIEWS BIOENGINEERING 2024; 2:392-407. [PMID: 39850516 PMCID: PMC11756916 DOI: 10.1038/s44222-024-00152-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/25/2025]
Abstract
Artificial intelligence (AI) and machine learning (ML) models are being deployed in many domains of society and have recently reached the field of drug discovery. Given the increasing prevalence of antimicrobial resistance, as well as the challenges intrinsic to antibiotic development, there is an urgent need to accelerate the design of new antimicrobial therapies. Antimicrobial peptides (AMPs) are therapeutic agents for treating bacterial infections, but their translation into the clinic has been slow owing to toxicity, poor stability, limited cellular penetration and high cost, among other issues. Recent advances in AI and ML have led to breakthroughs in our abilities to predict biomolecular properties and structures and to generate new molecules. The ML-based modelling of peptides may overcome some of the disadvantages associated with traditional drug discovery and aid the rapid development and translation of AMPs. Here, we provide an introduction to this emerging field and survey ML approaches that can be used to address issues currently hindering AMP development. We also outline important limitations that can be addressed for the broader adoption of AMPs in clinical practice, as well as new opportunities in data-driven peptide design.
Collapse
Affiliation(s)
- Fangping Wan
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- These authors contributed equally: Fangping Wan, Felix Wong
| | - Felix Wong
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- These authors contributed equally: Fangping Wan, Felix Wong
| | - James J. Collins
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA
- These authors jointly supervised this work: James J. Collins, Cesar de la Fuente-Nunez
| | - Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- These authors jointly supervised this work: James J. Collins, Cesar de la Fuente-Nunez
| |
Collapse
|
29
|
Beltrán JF, Herrera-Belén L, Parraguez-Contreras F, Farías JG, Machuca-Sepúlveda J, Short S. MultiToxPred 1.0: a novel comprehensive tool for predicting 27 classes of protein toxins using an ensemble machine learning approach. BMC Bioinformatics 2024; 25:148. [PMID: 38609877 PMCID: PMC11010298 DOI: 10.1186/s12859-024-05748-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2023] [Accepted: 03/14/2024] [Indexed: 04/14/2024] Open
Abstract
Protein toxins are defense mechanisms and adaptations found in various organisms and microorganisms, and their use in scientific research as therapeutic candidates is gaining relevance due to their effectiveness and specificity against cellular targets. However, discovering these toxins is time-consuming and expensive. In silico tools, particularly those based on machine learning and deep learning, have emerged as valuable resources to address this challenge. Existing tools primarily focus on binary classification, determining whether a protein is a toxin or not, and occasionally identifying specific types of toxins. For the first time, we propose a novel approach capable of classifying protein toxins into 27 distinct categories based on their mode of action within cells. To accomplish this, we assessed multiple machine learning techniques and found that an ensemble model incorporating the Light Gradient Boosting Machine and Quadratic Discriminant Analysis algorithms exhibited the best performance. During the tenfold cross-validation on the training dataset, our model exhibited notable metrics: 0.840 accuracy, 0.827 F1 score, 0.836 precision, 0.840 sensitivity, and 0.989 AUC. In the testing stage, using an independent dataset, the model achieved 0.846 accuracy, 0.838 F1 score, 0.847 precision, 0.849 sensitivity, and 0.991 AUC. These results present a powerful next-generation tool called MultiToxPred 1.0, accessible through a web application. We believe that MultiToxPred 1.0 has the potential to become an indispensable resource for researchers, facilitating the efficient identification of protein toxins. By leveraging this tool, scientists can accelerate their search for these toxins and advance their understanding of their therapeutic potential.
Collapse
Affiliation(s)
- Jorge F Beltrán
- Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar, 01145, Temuco, Chile.
| | - Lisandra Herrera-Belén
- Departamento de Ciencias Básicas, Facultad de Ciencias, Universidad Santo Tomas, Temuco, Chile
| | - Fernanda Parraguez-Contreras
- Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar, 01145, Temuco, Chile
| | - Jorge G Farías
- Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar, 01145, Temuco, Chile
| | - Jorge Machuca-Sepúlveda
- Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar, 01145, Temuco, Chile
| | - Stefania Short
- Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar, 01145, Temuco, Chile
| |
Collapse
|
30
|
Zhang X, Xiang H, Yang X, Dong J, Fu X, Zeng X, Chen H, Li K. Dual-View Learning Based on Images and Sequences for Molecular Property Prediction. IEEE J Biomed Health Inform 2024; 28:1564-1574. [PMID: 38153823 DOI: 10.1109/jbhi.2023.3347794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2023]
Abstract
The prediction of molecular properties remains a challenging task in the field of drug design and development. Recently, there has been a growing interest in the analysis of biological images. Molecular images, as a novel representation, have proven to be competitive, yet they lack explicit information and detailed semantic richness. Conversely, semantic information in SMILES sequences is explicit but lacks spatial structural details. Therefore, in this study, we focus on and explore the relationship between these two types of representations, proposing a novel multimodal architecture named ISMol. ISMol relies on a cross-attention mechanism to extract information representations of molecules from both images and SMILES strings, thereby predicting molecular properties. Evaluation results on 14 small molecule ADMET datasets indicate that ISMol outperforms machine learning (ML) and deep learning (DL) models based on single-modal representations. In addition, we analyze our method through a large number of experiments to test the superiority, interpretability and generalizability of the method. In summary, ISMol offers a powerful deep learning toolbox for drug discovery in a variety of molecular properties.
Collapse
|
31
|
Ji S, An F, Zhang T, Lou M, Guo J, Liu K, Zhu Y, Wu J, Wu R. Antimicrobial peptides: An alternative to traditional antibiotics. Eur J Med Chem 2024; 265:116072. [PMID: 38147812 DOI: 10.1016/j.ejmech.2023.116072] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2023] [Revised: 12/04/2023] [Accepted: 12/17/2023] [Indexed: 12/28/2023]
Abstract
As antibiotic-resistant bacteria and genes continue to emerge, the identification of effective alternatives to traditional antibiotics has become a pressing issue. Antimicrobial peptides are favored for their safety, low residue, and low resistance properties, and their unique antimicrobial mechanisms show significant potential in combating antibiotic resistance. However, the high production cost and weak activity of antimicrobial peptides limit their application. Moreover, traditional laboratory methods for identifying and designing new antimicrobial peptides are time-consuming and labor-intensive, hindering their development. Currently, novel technologies, such as artificial intelligence (AI) are being employed to develop and design new antimicrobial peptide resources, offering new opportunities for the advancement of antimicrobial peptides. This article summarizes the basic characteristics and antimicrobial mechanisms of antimicrobial peptides, as well as their advantages and limitations, and explores the application of AI in antimicrobial peptides prediction amd design. This highlights the crucial role of AI in enhancing the efficiency of antimicrobial peptide research and provides a reference for antimicrobial drug development.
Collapse
Affiliation(s)
- Shuaiqi Ji
- College of Food Science, Shenyang Agricultural University, Shenyang, 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang, 110866, PR China
| | - Feiyu An
- College of Food Science, Shenyang Agricultural University, Shenyang, 110866, PR China; Liaoning Engineering Research Center of Food Fermentation Technology, Shenyang, 110866, PR China
| | - Taowei Zhang
- College of Food Science, Shenyang Agricultural University, Shenyang, 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang, 110866, PR China
| | - Mengxue Lou
- College of Food Science, Shenyang Agricultural University, Shenyang, 110866, PR China; Liaoning Engineering Research Center of Food Fermentation Technology, Shenyang, 110866, PR China
| | - Jiawei Guo
- College of Food Science, Shenyang Agricultural University, Shenyang, 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang, 110866, PR China
| | - Kexin Liu
- College of Food Science, Shenyang Agricultural University, Shenyang, 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang, 110866, PR China
| | - Yi Zhu
- College of Food Science, Shenyang Agricultural University, Shenyang, 110866, PR China; Liaoning Engineering Research Center of Food Fermentation Technology, Shenyang, 110866, PR China
| | - Junrui Wu
- College of Food Science, Shenyang Agricultural University, Shenyang, 110866, PR China; Liaoning Engineering Research Center of Food Fermentation Technology, Shenyang, 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang, 110866, PR China.
| | - Rina Wu
- College of Food Science, Shenyang Agricultural University, Shenyang, 110866, PR China; Liaoning Engineering Research Center of Food Fermentation Technology, Shenyang, 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang, 110866, PR China.
| |
Collapse
|
32
|
Wang D, Jin J, Li Z, Wang Y, Fan M, Liang S, Su R, Wei L. StructuralDPPIV: a novel deep learning model based on atom structure for predicting dipeptidyl peptidase-IV inhibitory peptides. Bioinformatics 2024; 40:btae057. [PMID: 38305458 PMCID: PMC10904144 DOI: 10.1093/bioinformatics/btae057] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2023] [Revised: 12/07/2023] [Accepted: 01/30/2024] [Indexed: 02/03/2024] Open
Abstract
MOTIVATION Diabetes is a chronic metabolic disorder that has been a major cause of blindness, kidney failure, heart attacks, stroke, and lower limb amputation across the world. To alleviate the impact of diabetes, researchers have developed the next generation of anti-diabetic drugs, known as dipeptidyl peptidase IV inhibitory peptides (DPP-IV-IPs). However, the discovery of these promising drugs has been restricted due to the lack of effective peptide-mining tools. RESULTS Here, we presented StructuralDPPIV, a deep learning model designed for DPP-IV-IP identification, which takes advantage of both molecular graph features in amino acid and sequence information. Experimental results on the independent test dataset and two wet experiment datasets show that our model outperforms the other state-of-art methods. Moreover, to better study what StructuralDPPIV learns, we used CAM technology and perturbation experiment to analyze our model, which yielded interpretable insights into the reasoning behind prediction results. AVAILABILITY AND IMPLEMENTATION The project code is available at https://github.com/WeiLab-BioChem/Structural-DPP-IV.
Collapse
Affiliation(s)
- Ding Wang
- School of Software, Shandong University, Jinan 250101, China
| | - Junru Jin
- School of Software, Shandong University, Jinan 250101, China
| | - Zhongshen Li
- School of Software, Shandong University, Jinan 250101, China
| | - Yu Wang
- School of Software, Shandong University, Jinan 250101, China
| | - Mushuang Fan
- School of Software, Shandong University, Jinan 250101, China
| | - Sirui Liang
- School of Software, Shandong University, Jinan 250101, China
| | - Ran Su
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
| | - Leyi Wei
- Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
| |
Collapse
|
33
|
Song Z, Chen J, Cheng J, Chen G, Qi Z. Computer-Aided Molecular Design of Ionic Liquids as Advanced Process Media: A Review from Fundamentals to Applications. Chem Rev 2024; 124:248-317. [PMID: 38108629 DOI: 10.1021/acs.chemrev.3c00223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2023]
Abstract
The unique physicochemical properties, flexible structural tunability, and giant chemical space of ionic liquids (ILs) provide them a great opportunity to match different target properties to work as advanced process media. The crux of the matter is how to efficiently and reliably tailor suitable ILs toward a specific application. In this regard, the computer-aided molecular design (CAMD) approach has been widely adapted to cover this family of high-profile chemicals, that is, to perform computer-aided IL design (CAILD). This review discusses the past developments that have contributed to the state-of-the-art of CAILD and provides a perspective about how future works could pursue the acceleration of the practical application of ILs. In a broad context of CAILD, key aspects related to the forward structure-property modeling and reverse molecular design of ILs are overviewed. For the former forward task, diverse IL molecular representations, modeling algorithms, as well as representative models on physical properties, thermodynamic properties, among others of ILs are introduced. For the latter reverse task, representative works formulating different molecular design scenarios are summarized. Beyond the substantial progress made, some future perspectives to move CAILD a step forward are finally provided.
Collapse
Affiliation(s)
- Zhen Song
- State Key laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Jiahui Chen
- State Key laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Jie Cheng
- State Key laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Guzhong Chen
- State Key laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| | - Zhiwen Qi
- State Key laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
| |
Collapse
|
34
|
Yu H, Wang R, Qiao J, Wei L. Multi-CGAN: Deep Generative Model-Based Multiproperty Antimicrobial Peptide Design. J Chem Inf Model 2024; 64:316-326. [PMID: 38135439 DOI: 10.1021/acs.jcim.3c01881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2023]
Abstract
Antimicrobial peptides are peptides that are effective against bacteria and viruses, and the discovery of new antimicrobial peptides is of great importance to human life and health. Although the design of antimicrobial peptides using machine learning methods has achieved good results in recent years, it remains a challenge to learn and design novel antimicrobial peptides with multiple properties of interest from peptide data with certain property labels. To this end, we propose Multi-CGAN, a deep generative model-based architecture that can learn from single-attribute peptide data and generate antimicrobial peptide sequences with multiple attributes that we need, which may have a potentially wide range of uses in drug discovery. In particular, we verified that our Multi-CGAN generated peptides with the desired properties have good performance in terms of generation rate. Moreover, a comprehensive statistical analysis demonstrated that our generated peptides are diverse and have a low probability of being homologous to the training data. Interestingly, we found that the performance of many popular deep learning methods on the antimicrobial peptide prediction task can be improved by using Multi-CGAN to expand the data on the training set of the original task, indicating the high quality of our generated peptides and the robust ability of our method. In addition, we also investigated whether it is possible to directionally generate peptide sequences with specified properties by controlling the input noise sampling for our model.
Collapse
Affiliation(s)
- Haoqing Yu
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
| | - Ruheng Wang
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
| | - Jianbo Qiao
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
| | - Leyi Wei
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
| |
Collapse
|
35
|
Meng C, Yuan Y, Zhao H, Pei Y, Li Z. IIFS: An improved incremental feature selection method for protein sequence processing. Comput Biol Med 2023; 167:107654. [PMID: 37944304 DOI: 10.1016/j.compbiomed.2023.107654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2023] [Revised: 10/09/2023] [Accepted: 10/31/2023] [Indexed: 11/12/2023]
Abstract
MOTIVATION Discrete features can be obtained from protein sequences using a feature extraction method. These features are the basis of downstream processing of protein data, but it is necessary to screen and select some important features from them as they generally have data redundancy. RESULT Here, we report IIFS, an improved incremental feature selection method that exploits a new subset search strategy to find the optimal feature set. IIFS combines nonadjacent sorting features to prevent the drawbacks of data explosion and excessive reliance on feature sorting results. The comparative experimental results on 27 feature sorting data show that IIFS can find more accurate and important features compared to existing methods.The IIFS approach also handles data redundancy more efficiently and finds more representative and discriminatory features while ensuring minimal feature dimensionality and good evaluation metrics. Moreover, we wrap this method and deploy it on a web server for access at http://112.124.26.17:8005/.
Collapse
Affiliation(s)
- Chaolu Meng
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China; Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and Animal Husbandry, China
| | - Ye Yuan
- Beidahuang Industry Group General Hospital, Harbin, 150001, China
| | - Haiyan Zhao
- College of Integration of Traditional Chinese and Western Medicine to Southwest Medical University, Luzhou, Sichuan, 646000, China
| | - Yue Pei
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100190, China
| | - Zhi Li
- Department of Spleen and Stomach Diseases, The Affiliated Traditional Chinese Medicine Hospital of Southwest Medical University, Luzhou, Sichuan, 646000, China.
| |
Collapse
|
36
|
Ding Y, Zhou H, Zou Q, Yuan L. Identification of drug-side effect association via correntropy-loss based matrix factorization with neural tangent kernel. Methods 2023; 219:73-81. [PMID: 37783242 DOI: 10.1016/j.ymeth.2023.09.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2023] [Revised: 09/18/2023] [Accepted: 09/20/2023] [Indexed: 10/04/2023] Open
Abstract
Adverse drug reactions include side effects, allergic reactions, and secondary infections. Severe adverse reactions can cause cancer, deformity, or mutation. The monitoring of drug side effects is an important support for post marketing safety supervision of drugs, and an important basis for revising drug instructions. Its purpose is to timely detect and control drug safety risks. Traditional methods are time-consuming. To accelerate the discovery of side effects, we propose a machine learning based method, called correntropy-loss based matrix factorization with neural tangent kernel (CLMF-NTK), to solve the prediction of drug side effects. Our method and other computational methods are tested on three benchmark datasets, and the results show that our method achieves the best predictive performance.
Collapse
Affiliation(s)
- Yijie Ding
- Key Laboratory of Computational Science and Application of Hainan Province, Hainan Normal University, Haikou 571158, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China; School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
| | - Hongmei Zhou
- Beidahuang Industry Group General Hospital, Harbin 150001, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China.
| | - Lei Yuan
- Department of Hepatobiliary Surgery, Quzhou People's Hospital, 100# Minjiang Main Road, Quzhou 324000, China.
| |
Collapse
|
37
|
Charoenkwan P, Kongsompong S, Schaduangrat N, Chumnanpuen P, Shoombuatong W. TIPred: a novel stacked ensemble approach for the accelerated discovery of tyrosinase inhibitory peptides. BMC Bioinformatics 2023; 24:356. [PMID: 37735626 PMCID: PMC10512532 DOI: 10.1186/s12859-023-05463-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2023] [Accepted: 09/01/2023] [Indexed: 09/23/2023] Open
Abstract
BACKGROUND Tyrosinase is an enzyme involved in melanin production in the skin. Several hyperpigmentation disorders involve the overproduction of melanin and instability of tyrosinase activity resulting in darker, discolored patches on the skin. Therefore, discovering tyrosinase inhibitory peptides (TIPs) is of great significance for basic research and clinical treatments. However, the identification of TIPs using experimental methods is generally cost-ineffective and time-consuming. RESULTS Herein, a stacked ensemble learning approach, called TIPred, is proposed for the accurate and quick identification of TIPs by using sequence information. TIPred explored a comprehensive set of various baseline models derived from well-known machine learning (ML) algorithms and heterogeneous feature encoding schemes from multiple perspectives, such as chemical structure properties, physicochemical properties, and composition information. Subsequently, 130 baseline models were trained and optimized to create new probabilistic features. Finally, the feature selection approach was utilized to determine the optimal feature vector for developing TIPred. Both tenfold cross-validation and independent test methods were employed to assess the predictive capability of TIPred by using the stacking strategy. Experimental results showed that TIPred significantly outperformed the state-of-the-art method in terms of the independent test, with an accuracy of 0.923, MCC of 0.757 and an AUC of 0.977. CONCLUSIONS The proposed TIPred approach could be a valuable tool for rapidly discovering novel TIPs and effectively identifying potential TIP candidates for follow-up experimental validation. Moreover, an online webserver of TIPred is publicly available at http://pmlabstack.pythonanywhere.com/TIPred .
Collapse
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
| | - Sasikarn Kongsompong
- Interdisciplinary Graduate Program in Bioscience, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand
| | - Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Pramote Chumnanpuen
- Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand.
- Omics Center for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok, 10900, Thailand.
| | - Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
| |
Collapse
|
38
|
Momanyi BM, Zulfiqar H, Grace-Mercure BK, Ahmed Z, Ding H, Gao H, Liu F. CFNCM: Collaborative filtering neighborhood-based model for predicting miRNA-disease associations. Comput Biol Med 2023; 163:107165. [PMID: 37315383 DOI: 10.1016/j.compbiomed.2023.107165] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2023] [Revised: 05/31/2023] [Accepted: 06/08/2023] [Indexed: 06/16/2023]
Abstract
MicroRNAs have a significant role in the emergence of various human disorders. Consequently, it is essential to understand the existing interactions between miRNAs and diseases, as this will help scientists better study and comprehend the diseases' biological mechanisms. Findings can be employed as biomarkers or drug targets to advance the detection, diagnosis, and treatment of complex human disorders by foretelling possible disease-related miRNAs. This study proposed a computational model for predicting potential miRNA-disease associations called the Collaborative Filtering Neighborhood-based Classification Model (CFNCM), in light of the shortcomings of conventional and biological experiments, which are expensive and time-consuming. The model generated integrated miRNA and disease similarity matrices using the validated associations and miRNA and disease similarity information and used them as the input features for CFNCM. To produce class labels, we first determined the association scores for brand-new pairs using user-based collaborative filtering. With zero as the threshold, the associations with scores >0 were labelled 1, indicating a potential positive association, otherwise, it is marked as 0. Then, we developed classification models using various machine-learning algorithms. By comparison, we discovered that the support vector machine (SVM) produced the best AUC of 0.96 with 10-fold cross-validation through the GridSearchCV technique for identifying optimal parameter values. In addition, the models were evaluated and verified by analyzing the top 50 breast and lung neoplasms-related miRNAs, of which 46 and 47 associations were verified in two authoritative databases, dbDEMC and miR2Disease.
Collapse
Affiliation(s)
- Biffon Manyura Momanyi
- School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hasan Zulfiqar
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang, 313001, China
| | - Bakanina Kissanga Grace-Mercure
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
| | - Zahoor Ahmed
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang, 313001, China
| | - Hui Ding
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China.
| | - Hui Gao
- School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.
| | - Fen Liu
- Department of Radiation Oncology, Peking University Cancer Hospital (Inner Mongolia Campus), Affiliated Cancer Hospital of Inner Mongolia Medical University, Inner Mongolia Cancer Hospital, Hohhot, China.
| |
Collapse
|
39
|
Teng S, Yin C, Wang Y, Chen X, Yan Z, Cui L, Wei L. MolFPG: Multi-level fingerprint-based Graph Transformer for accurate and robust drug toxicity prediction. Comput Biol Med 2023; 164:106904. [PMID: 37453376 DOI: 10.1016/j.compbiomed.2023.106904] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2023] [Revised: 03/20/2023] [Accepted: 04/10/2023] [Indexed: 07/18/2023]
Abstract
Drug toxicity prediction is essential to drug development, which can help screen compounds with potential toxicity and reduce the cost and risk of animal experiments and clinical trials. However, traditional handcrafted feature-based and molecular-graph-based approaches are insufficient for molecular representation learning. To address the problem, we developed an innovative molecular fingerprint Graph Transformer framework (MolFPG) with a global-aware module for interpretable toxicity prediction. Our approach encodes compounds using multiple molecular fingerprinting techniques and integrates Graph Transformer-based molecular representation for feature learning and toxic prediction. Experimental results show that our proposed approach has high accuracy and reliability in predicting drug toxicity. In addition, we explored the relationship between drug features and toxicity through an interpretive analysis approach, which improved the interpretability of the approach. Our results highlight the potential of Graph Transformers and multi-level fingerprints for accelerating the drug discovery process by reliably, effectively alarming drug safety. We believe that our study will provide vital support and reference for further development in the field of drug development and toxicity assessment.
Collapse
Affiliation(s)
- Saisai Teng
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | - Chenglin Yin
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | - Yu Wang
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | | | - Zhongmin Yan
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China.
| | - Lizhen Cui
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China.
| | - Leyi Wei
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China.
| |
Collapse
|
40
|
Meng C, Pei Y, Zou Q, Yuan L. DP-AOP: A novel SVM-based antioxidant proteins identifier. Int J Biol Macromol 2023; 247:125499. [PMID: 37414318 DOI: 10.1016/j.ijbiomac.2023.125499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2023] [Revised: 06/01/2023] [Accepted: 06/19/2023] [Indexed: 07/08/2023]
Abstract
The identification of antioxidant proteins is a challenging yet meaningful task, as they can protect against the damage caused by some free radicals. In addition to time-consuming, laborious, and expensive experimental identification methods, efficient identification of antioxidant proteins through machine learning algorithms has become increasingly common. In recent years, researchers have proposed models for identifying antioxidant proteins; unfortunately, although the accuracy of models is already high, their sensitivity is too low, indicating the possibility of overfitting in the model. Therefore, we developed a new model called DP-AOP for the recognition of antioxidant proteins. We used the SMOTE algorithm to balance the dataset, selected Wei's proposed feature extraction algorithm to obtain 473 dimensional feature vectors, and based on the sorting function in MRMD, scored and ranked each feature to obtain a feature set with contribution values ranging from high to low. To effectively reduce the feature dimension, we combined the dynamic programming idea to make the local eight features the optimal subset. After obtaining the 36 dimensional feature vectors, we finally selected 17 features through experimental analysis. The SVM classification algorithm was used to implement the model through the libsvm tool. The model achieved satisfactory performance, with an accuracy rate of 91.076 %, SN of 96.4 %, SP of 85.8 %, MCC of 82.6 %, and F1 core of 91.5 %. Furthermore, we built a free web server to facilitate researchers' subsequent unfolding studies of antioxidant protein recognition. The website is http://112.124.26.17:8003/#/.
Collapse
Affiliation(s)
- Chaolu Meng
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China; Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and Animal Husbandry, China.
| | - Yue Pei
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, China.
| | - Lei Yuan
- Department of Hepatobiliary Surgery, Quzhou People's Hospital, China.
| |
Collapse
|
41
|
Phan LT, Oh C, He T, Manavalan B. A comprehensive revisit of the machine-learning tools developed for the identification of enhancers in the human genome. Proteomics 2023; 23:e2200409. [PMID: 37021401 DOI: 10.1002/pmic.202200409] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2023] [Revised: 03/18/2023] [Accepted: 03/27/2023] [Indexed: 04/07/2023]
Abstract
Enhancers are non-coding DNA elements that play a crucial role in enhancing the transcription rate of a specific gene in the genome. Experiments for identifying enhancers can be restricted by their conditions and involve complicated, time-consuming, laborious, and costly steps. To overcome these challenges, computational platforms have been developed to complement experimental methods that enable high-throughput identification of enhancers. Over the last few years, the development of various enhancer computational tools has resulted in significant progress in predicting putative enhancers. Thus, researchers are now able to use a variety of strategies to enhance and advance enhancer study. In this review, an overview of machine learning (ML)-based prediction methods for enhancer identification and related databases has been provided. The existing enhancer-prediction methods have also been reviewed regarding their algorithms, feature selection processes, validation techniques, and software utility. In addition, the advantages and drawbacks of these ML approaches and guidelines for developing bioinformatic tools have been highlighted for a more efficient enhancer prediction. This review will serve as a useful resource for experimentalists in selecting the appropriate ML tool for their study, and for bioinformaticians in developing more accurate and advanced ML-based predictors.
Collapse
Affiliation(s)
- Le Thi Phan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
| | - Changmin Oh
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
| | - Tao He
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
| |
Collapse
|
42
|
Zulfiqar H, Ahmed Z, Kissanga Grace-Mercure B, Hassan F, Zhang ZY, Liu F. Computational prediction of promotors in Agrobacterium tumefaciens strain C58 by using the machine learning technique. Front Microbiol 2023; 14:1170785. [PMID: 37125199 PMCID: PMC10133480 DOI: 10.3389/fmicb.2023.1170785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2023] [Accepted: 03/17/2023] [Indexed: 05/02/2023] Open
Abstract
Promotors are those genomic regions on the upstream of genes, which are bound by RNA polymerase for starting gene transcription. Because it is the most critical element of gene expression, the recognition of promoters is crucial to understand the regulation of gene expression. This study aimed to develop a machine learning-based model to predict promotors in Agrobacterium tumefaciens (A. tumefaciens) strain C58. In the model, promotor sequences were encoded by three different kinds of feature descriptors, namely, accumulated nucleotide frequency, k-mer nucleotide composition, and binary encodings. The obtained features were optimized by using correlation and the mRMR-based algorithm. These optimized features were inputted into a random forest (RF) classifier to discriminate promotor sequences from non-promotor sequences in A. tumefaciens strain C58. The examination of 10-fold cross-validation showed that the proposed model could yield an overall accuracy of 0.837. This model will provide help for the study of promoters in A. tumefaciens C58 strain.
Collapse
Affiliation(s)
- Hasan Zulfiqar
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Zahoor Ahmed
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China
| | - Bakanina Kissanga Grace-Mercure
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Farwa Hassan
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Zhao-Yue Zhang
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Fen Liu
- Department of Radiation Oncology, Peking University Cancer Hospital (Inner Mongolia Campus), Affiliated Cancer Hospital of Inner Mongolia Medical University, Inner Mongolia Cancer Hospital, Hohhot, China
| |
Collapse
|
43
|
Jiang Y, Wang R, Feng J, Jin J, Liang S, Li Z, Yu Y, Ma A, Su R, Zou Q, Ma Q, Wei L. Explainable Deep Hypergraph Learning Modeling the Peptide Secondary Structure Prediction. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2023; 10:e2206151. [PMID: 36794291 PMCID: PMC10104664 DOI: 10.1002/advs.202206151] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/24/2022] [Revised: 01/20/2023] [Indexed: 06/18/2023]
Abstract
Accurately predicting peptide secondary structures remains a challenging task due to the lack of discriminative information in short peptides. In this study, PHAT is proposed, a deep hypergraph learning framework for the prediction of peptide secondary structures and the exploration of downstream tasks. The framework includes a novel interpretable deep hypergraph multi-head attention network that uses residue-based reasoning for structure prediction. The algorithm can incorporate sequential semantic information from large-scale biological corpus and structural semantic information from multi-scale structural segmentation, leading to better accuracy and interpretability even with extremely short peptides. The interpretable models are able to highlight the reasoning of structural feature representations and the classification of secondary substructures. The importance of secondary structures in peptide tertiary structure reconstruction and downstream functional analysis is further demonstrated, highlighting the versatility of our models. To facilitate the use of the model, an online server is established which is accessible via http://inner.wei-group.net/PHAT/. The work is expected to assist in the design of functional peptides and contribute to the advancement of structural biology research.
Collapse
Affiliation(s)
- Yi Jiang
- School of SoftwareShandong UniversityJinanShandong250101China
- Joint SDU‐NTU Centre for Artificial Intelligence Research (C‐FAIR)Shandong UniversityJinanShandong250101China
| | - Ruheng Wang
- School of SoftwareShandong UniversityJinanShandong250101China
- Joint SDU‐NTU Centre for Artificial Intelligence Research (C‐FAIR)Shandong UniversityJinanShandong250101China
| | - Jiuxin Feng
- School of SoftwareShandong UniversityJinanShandong250101China
- Joint SDU‐NTU Centre for Artificial Intelligence Research (C‐FAIR)Shandong UniversityJinanShandong250101China
| | - Junru Jin
- School of SoftwareShandong UniversityJinanShandong250101China
- Joint SDU‐NTU Centre for Artificial Intelligence Research (C‐FAIR)Shandong UniversityJinanShandong250101China
| | - Sirui Liang
- School of SoftwareShandong UniversityJinanShandong250101China
- Joint SDU‐NTU Centre for Artificial Intelligence Research (C‐FAIR)Shandong UniversityJinanShandong250101China
| | - Zhongshen Li
- School of SoftwareShandong UniversityJinanShandong250101China
- Joint SDU‐NTU Centre for Artificial Intelligence Research (C‐FAIR)Shandong UniversityJinanShandong250101China
| | - Yingying Yu
- School of SoftwareShandong UniversityJinanShandong250101China
- Joint SDU‐NTU Centre for Artificial Intelligence Research (C‐FAIR)Shandong UniversityJinanShandong250101China
| | - Anjun Ma
- Department of Biomedical InformaticsCollege of MedicineThe Ohio State UniversityColumbusOH43210USA
| | - Ran Su
- College of Intelligence and ComputingTianjin UniversityTianjin300350China
| | - Quan Zou
- Institute of Fundamental and Frontier SciencesUniversity of Electronic Science and Technology of ChinaChengduSichuan610054China
| | - Qin Ma
- Department of Biomedical InformaticsCollege of MedicineThe Ohio State UniversityColumbusOH43210USA
| | - Leyi Wei
- School of SoftwareShandong UniversityJinanShandong250101China
- Joint SDU‐NTU Centre for Artificial Intelligence Research (C‐FAIR)Shandong UniversityJinanShandong250101China
| |
Collapse
|
44
|
Yao L, Li W, Zhang Y, Deng J, Pang Y, Huang Y, Chung CR, Yu J, Chiang YC, Lee TY. Accelerating the Discovery of Anticancer Peptides through Deep Forest Architecture with Deep Graphical Representation. Int J Mol Sci 2023; 24:ijms24054328. [PMID: 36901759 PMCID: PMC10001941 DOI: 10.3390/ijms24054328] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2023] [Revised: 02/02/2023] [Accepted: 02/07/2023] [Indexed: 02/24/2023] Open
Abstract
Cancer is one of the leading diseases threatening human life and health worldwide. Peptide-based therapies have attracted much attention in recent years. Therefore, the precise prediction of anticancer peptides (ACPs) is crucial for discovering and designing novel cancer treatments. In this study, we proposed a novel machine learning framework (GRDF) that incorporates deep graphical representation and deep forest architecture for identifying ACPs. Specifically, GRDF extracts graphical features based on the physicochemical properties of peptides and integrates their evolutionary information along with binary profiles for constructing models. Moreover, we employ the deep forest algorithm, which adopts a layer-by-layer cascade architecture similar to deep neural networks, enabling excellent performance on small datasets but without complicated tuning of hyperparameters. The experiment shows GRDF exhibits state-of-the-art performance on two elaborate datasets (Set 1 and Set 2), achieving 77.12% accuracy and 77.54% F1-score on Set 1, as well as 94.10% accuracy and 94.15% F1-score on Set 2, exceeding existing ACP prediction methods. Our models exhibit greater robustness than the baseline algorithms commonly used for other sequence analysis tasks. In addition, GRDF is well-interpretable, enabling researchers to better understand the features of peptide sequences. The promising results demonstrate that GRDF is remarkably effective in identifying ACPs. Therefore, the framework presented in this study could assist researchers in facilitating the discovery of anticancer peptides and contribute to developing novel cancer treatments.
Collapse
Affiliation(s)
- Lantian Yao
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
| | - Wenshuo Li
- School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
| | - Yuntian Zhang
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
| | - Junyang Deng
- School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
| | - Yuxuan Pang
- School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
| | - Yixian Huang
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
| | - Chia-Ru Chung
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
| | - Jinhan Yu
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
| | - Ying-Chih Chiang
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Correspondence: (Y.-C.C.); (T.-Y.L.)
| | - Tzong-Yi Lee
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Correspondence: (Y.-C.C.); (T.-Y.L.)
| |
Collapse
|
45
|
CSM-Toxin: A Web-Server for Predicting Protein Toxicity. Pharmaceutics 2023; 15:pharmaceutics15020431. [PMID: 36839752 PMCID: PMC9966851 DOI: 10.3390/pharmaceutics15020431] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2022] [Revised: 01/17/2023] [Accepted: 01/18/2023] [Indexed: 01/31/2023] Open
Abstract
Biologics are one of the most rapidly expanding classes of therapeutics, but can be associated with a range of toxic properties. In small-molecule drug development, early identification of potential toxicity led to a significant reduction in clinical trial failures, however we currently lack robust qualitative rules or predictive tools for peptide- and protein-based biologics. To address this, we have manually curated the largest set of high-quality experimental data on peptide and protein toxicities, and developed CSM-Toxin, a novel in-silico protein toxicity classifier, which relies solely on the protein primary sequence. Our approach encodes the protein sequence information using a deep learning natural languages model to understand "biological" language, where residues are treated as words and protein sequences as sentences. The CSM-Toxin was able to accurately identify peptides and proteins with potential toxicity, achieving an MCC of up to 0.66 across both cross-validation and multiple non-redundant blind tests, outperforming other methods and highlighting the robust and generalisable performance of our model. We strongly believe the CSM-Toxin will serve as a valuable platform to minimise potential toxicity in the biologic development pipeline. Our method is freely available as an easy-to-use webserver.
Collapse
|
46
|
Yan K, Lv H, Guo Y, Peng W, Liu B. sAMPpred-GAT: prediction of antimicrobial peptide by graph attention network and predicted peptide structure. Bioinformatics 2023; 39:btac715. [PMID: 36342186 PMCID: PMC9805557 DOI: 10.1093/bioinformatics/btac715] [Citation(s) in RCA: 87] [Impact Index Per Article: 43.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2022] [Revised: 10/24/2022] [Accepted: 11/04/2022] [Indexed: 11/09/2022] Open
Abstract
MOTIVATION Antimicrobial peptides (AMPs) are essential components of therapeutic peptides for innate immunity. Researchers have developed several computational methods to predict the potential AMPs from many candidate peptides. With the development of artificial intelligent techniques, the protein structures can be accurately predicted, which are useful for protein sequence and function analysis. Unfortunately, the predicted peptide structure information has not been applied to the field of AMP prediction so as to improve the predictive performance. RESULTS In this study, we proposed a computational predictor called sAMPpred-GAT for AMP identification. To the best of our knowledge, sAMPpred-GAT is the first approach based on the predicted peptide structures for AMP prediction. The sAMPpred-GAT predictor constructs the graphs based on the predicted peptide structures, sequence information and evolutionary information. The Graph Attention Network (GAT) is then performed on the graphs to learn the discriminative features. Finally, the full connection networks are utilized as the output module to predict whether the peptides are AMP or not. Experimental results show that sAMPpred-GAT outperforms the other state-of-the-art methods in terms of AUC, and achieves better or highly comparable performance in terms of the other metrics on the eight independent test datasets, demonstrating that the predicted peptide structure information is important for AMP prediction. AVAILABILITY AND IMPLEMENTATION A user-friendly webserver of sAMPpred-GAT can be accessed at http://bliulab.net/sAMPpred-GAT and the source code is available at https://github.com/HongWuL/sAMPpred-GAT/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Hongwu Lv
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Yichen Guo
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Wei Peng
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
| |
Collapse
|
47
|
Perveen G, Alturise F, Alkhalifah T, Daanial Khan Y. Hemolytic-Pred: A machine learning-based predictor for hemolytic proteins using position and composition-based features. Digit Health 2023; 9:20552076231180739. [PMID: 37434723 PMCID: PMC10331097 DOI: 10.1177/20552076231180739] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2022] [Accepted: 05/22/2023] [Indexed: 07/13/2023] Open
Abstract
Objective The objective of this study is to propose a novel in-silico method called Hemolytic-Pred for identifying hemolytic proteins based on their sequences, using statistical moment-based features, along with position-relative and frequency-relative information. Methods Primary sequences were transformed into feature vectors using statistical and position-relative moment-based features. Varying machine learning algorithms were employed for classification. Computational models were rigorously evaluated using four different validation. The Hemolytic-Pred webserver is available for further analysis at http://ec2-54-160-229-10.compute-1.amazonaws.com/. Results XGBoost outperformed the other six classifiers with an accuracy value of 0.99, 0.98, 0.97, and 0.98 for self-consistency test, 10-fold cross-validation, Jackknife test, and independent set test, respectively. The proposed method with the XGBoost classifier is a workable and robust solution for predicting hemolytic proteins efficiently and accurately. Conclusions The proposed method of Hemolytic-Pred with XGBoost classifier is a reliable tool for the timely identification of hemolytic cells and diagnosis of various related severe disorders. The application of Hemolytic-Pred can yield profound benefits in the medical field.
Collapse
Affiliation(s)
- Gulnaz Perveen
- Department of Computer Science, School
of Systems and Technology, University of Management and Technology, Lahore, Punjab,
Pakistan
| | - Fahad Alturise
- Department of Computer, College of
Science and Arts in Ar Rass Qassim University, Buraidah, Qassim, Saudi Arabia
| | - Tamim Alkhalifah
- Department of Computer, College of
Science and Arts in Ar Rass Qassim University, Buraidah, Qassim, Saudi Arabia
| | - Yaser Daanial Khan
- Department of Computer Science, School
of Systems and Technology, University of Management and Technology, Lahore, Punjab,
Pakistan
| |
Collapse
|
48
|
Shi H, Li Y, Chen Y, Qin Y, Tang Y, Zhou X, Zhang Y, Wu Y. ToxMVA: An end-to-end multi-view deep autoencoder method for protein toxicity prediction. Comput Biol Med 2022; 151:106322. [PMID: 36435057 DOI: 10.1016/j.compbiomed.2022.106322] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2022] [Revised: 11/03/2022] [Accepted: 11/14/2022] [Indexed: 11/18/2022]
Abstract
Effectively predicting protein toxicity plays an essential step in the early stage of protein-based drug discovery, which is of great help to speed up novel drug screening and reduce costs. Recently, several relevant datasets have been designed, and then machine learning-based methods have been proposed to predict the toxicity of the protein and have shown satisfactory performance. However, previous studies generally directly concatenate different protein features, which may introduce irrelevant information and decrease model performance. In this study, we present a novel end-to-end deep learning-based method called ToxMVA, to predict protein toxicity. To be specific, we first build comprehensive feature profiles of proteins based on primary sequences, including sequential, physicochemical, and contextual semantic information. Next, an autoencoder network is introduced to integrate the multi-view information for obtaining a more concise and accurate feature representation. Extensive experimental results on three datasets demonstrate that ToxMVA has superior performance for protein toxicity prediction and shows better robustness among three different datasets.
Collapse
Affiliation(s)
- Hua Shi
- School of Opto-electronic and Communication Engineering, Xiamen University of Technology, Xiamen, 361024, Fujian, China
| | - Yan Li
- School of Opto-electronic and Communication Engineering, Xiamen University of Technology, Xiamen, 361024, Fujian, China
| | - Yi Chen
- School of Opto-electronic and Communication Engineering, Xiamen University of Technology, Xiamen, 361024, Fujian, China
| | - Yuming Qin
- Anesthesiology Department, The Affiliated Hospital of Southwest Medical University, Luzhou, 646000, Sichuan, China
| | - Yifan Tang
- Anesthesiology Department, The Affiliated Hospital of Southwest Medical University, Luzhou, 646000, Sichuan, China
| | - Xun Zhou
- Beidahuang Industry Group General Hospital, Harbin, China.
| | - Ying Zhang
- Anesthesiology Department, The Affiliated Hospital of Southwest Medical University, Luzhou, 646000, Sichuan, China.
| | - Yun Wu
- College of Computer and Information Engineering, Xiamen University of Technology, Xiamen, 361024, Fujian, China.
| |
Collapse
|
49
|
Zhao Z, Gui J, Yao A, Le NQK, Chua MCH. Improved Prediction Model of Protein and Peptide Toxicity by Integrating Channel Attention into a Convolutional Neural Network and Gated Recurrent Units. ACS OMEGA 2022; 7:40569-40577. [PMID: 36385847 PMCID: PMC9647964 DOI: 10.1021/acsomega.2c05881] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/11/2022] [Accepted: 10/19/2022] [Indexed: 06/16/2023]
Abstract
In recent times, the importance of peptides in the biomedical domain has received increasing concern in terms of their effect on multiple disease treatments. However, before successful large-scale implementation in the industry, accurate identification of peptide toxicity is a vital prerequisite. The existing computational methods have reached good results from toxicity prediction, and we present an improved model based on different deep learning architectures. The modification mainly focuses on two aspects: sequence encoding and variational information bottlenecks. Consequently, one of our modified plans shows an obvious increase in sensitivity, while the rest show good performance meanwhile adding novelty in the peptide toxicity prediction domain. In detail, our best model could achieve an accuracy of 97.38 and 95.03% in protein and peptide toxicity predictions, respectively. The performance was superior to previous predictors on the same datasets.
Collapse
Affiliation(s)
- Zhengyun Zhao
- Institute of Systems
Science, National University of Singapore, 25 Heng Mui Keng Terrace, Singapore 119615, Singapore
| | - Jingyu Gui
- Institute of Systems
Science, National University of Singapore, 25 Heng Mui Keng Terrace, Singapore 119615, Singapore
| | - Anqi Yao
- Institute of Systems
Science, National University of Singapore, 25 Heng Mui Keng Terrace, Singapore 119615, Singapore
| | - Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence
in Medicine, College of Medicine, Taipei
Medical University, Taipei 106, Taiwan
- Research
Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan
- Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan
| | - Matthew Chin Heng Chua
- Institute of Systems
Science, National University of Singapore, 25 Heng Mui Keng Terrace, Singapore 119615, Singapore
| |
Collapse
|
50
|
Chen S, Li Q, Zhao J, Bin Y, Zheng C. NeuroPred-CLQ: incorporating deep temporal convolutional networks and multi-head attention mechanism to predict neuropeptides. Brief Bioinform 2022; 23:6672901. [DOI: 10.1093/bib/bbac319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2022] [Revised: 06/27/2022] [Accepted: 07/14/2022] [Indexed: 11/13/2022] Open
Abstract
Abstract
Neuropeptides (NPs) are a particular class of informative substances in the immune system and physiological regulation. They play a crucial role in regulating physiological functions in various biological growth and developmental stages. In addition, NPs are crucial for developing new drugs for the treatment of neurological diseases. With the development of molecular biology techniques, some data-driven tools have emerged to predict NPs. However, it is necessary to improve the predictive performance of these tools for NPs. In this study, we developed a deep learning model (NeuroPred-CLQ) based on the temporal convolutional network (TCN) and multi-head attention mechanism to identify NPs effectively and translate the internal relationships of peptide sequences into numerical features by the Word2vec algorithm. The experimental results show that NeuroPred-CLQ learns data information effectively, achieving 93.6% accuracy and 98.8% AUC on the independent test set. The model has better performance in identifying NPs than the state-of-the-art predictors. Visualization of features using t-distribution random neighbor embedding shows that the NeuroPred-CLQ can clearly distinguish the positive NPs from the negative ones. We believe the NeuroPred-CLQ can facilitate drug development and clinical trial studies to treat neurological disorders.
Collapse
Affiliation(s)
- Shouzhi Chen
- School of Mathematics and System Science, Xinjiang University , Urumqi, China
| | - Qing Li
- School of Mathematics and System Science, Xinjiang University , Urumqi, China
| | - Jianping Zhao
- School of Mathematics and System Science, Xinjiang University , Urumqi, China
| | - Yannan Bin
- School of Computer Science and Technology, Anhui University , Hefei, China
| | - Chunhou Zheng
- School of Mathematics and System Science, Xinjiang University , Urumqi, China
- School of Computer Science and Technology, Anhui University , Hefei, China
| |
Collapse
|