1
Cui Y, Cui Y, Ding Y, Nakai K, Wei L, Le Y, Ye X, Sakurai T. OmniClust: A versatile clustering toolkit for single-cell and spatial transcriptomics data. Methods 2025; 238:84-94. [PMID: 40057293 DOI: 10.1016/j.ymeth.2025.03.007] [Received: 01/31/2025] [Revised: 02/24/2025] [Accepted: 03/05/2025] [Indexed: 03/22/2025] Open
Abstract
In recent years, RNA transcriptome sequencing technology has been continuously evolving, ranging from single-cell transcriptomics to spatial transcriptomics. Although these technologies are all based on RNA sequencing, each sequencing technology has its own unique characteristics, and there is an urgent need to develop an algorithmic toolkit that integrates both sequencing techniques. To address this, we have developed OmniClust, a toolkit based on single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics data. OmniClust employs deep learning algorithms for feature learning and clustering of spatial transcriptomics data, while utilizing machine learning algorithms for clustering scRNA-seq data. OmniClust was tested on 12 spatial transcriptomics benchmark datasets, demonstrating high clustering accuracy across multiple clustering evaluation metrics. It was also evaluated on four scRNA-seq benchmark datasets, achieving high clustering accuracy based on various clustering evaluation metrics. Furthermore, we applied OmniClust to downstream analyses of spatial transcriptomics and single-cell RNA breast cancer data, showcasing its potential to uncover and interpret the biological significance of cancer transcriptome data. In summary, OmniClust is a clustering tool designed for both single-cell transcriptomics and spatial transcriptomics data, demonstrating outstanding performance.
Affiliation(s)
- Yaxuan Cui
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Yang Cui
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Yi Ding
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Kenta Nakai
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan; Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
- Leyi Wei
  - Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University, Macao SAR, China
- Yuyin Le
  - Department of Radiation Oncology, Fuzhou Pulmonary Hospital of Fujian Province, Teaching Hospital of Fujian Medical University, China
- Xiucai Ye
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Tetsuya Sakurai
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
2
Chuang CC, Liu YC, Jhang WE, Wei SS, Ou YY. RAG_MCNNIL6: A Retrieval-Augmented Multi-Window Convolutional Network for Accurate Prediction of IL-6 Inducing Epitopes. J Chem Inf Model 2025; 65:2685-2694. [PMID: 39967508 PMCID: PMC11898070 DOI: 10.1021/acs.jcim.4c02144] [Received: 11/21/2024] [Revised: 01/20/2025] [Accepted: 02/11/2025] [Indexed: 02/20/2025]
Abstract
Interleukin-6 (IL-6) is a critical cytokine involved in immune regulation, inflammation, and the pathogenesis of various diseases, including autoimmune disorders, cancer, and the cytokine storm associated with severe COVID-19. Identifying IL-6 inducing epitopes, the short peptide fragments that trigger IL-6 production, is crucial for developing epitope-based vaccines and immunotherapies. However, traditional methods for epitope prediction often lack accuracy and efficiency. This study presents RAG_MCNNIL6, a novel deep learning framework that integrates Retrieval-augmented generation (RAG) with multiwindow convolutional neural networks (MCNNs) for accurate and rapid prediction of IL-6 inducing epitopes. RAG_MCNNIL6 leverages ProtTrans, a state-of-the-art pretrained protein language model, to generate rich embedding representations of peptide sequences. By incorporating a RAG-based similarity retrieval and embedding augmentation strategy, RAG_MCNNIL6 effectively captures both local and global sequence patterns relevant for IL-6 induction, significantly improving prediction performance compared to existing methods. We demonstrate the superior performance of RAG_MCNNIL6 on benchmark data sets, highlighting its potential for advancing research and therapeutic development for IL-6-mediated diseases.
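The similarity-retrieval and embedding-augmentation step described in this abstract can be illustrated with a minimal sketch. It assumes cosine similarity over fixed-length embeddings and augments a query by concatenating it with the mean of its top-k neighbours. The toy vectors, the function names, and the mean-pooling choice are illustrative assumptions, not details taken from the paper, and the ProtTrans embeddings and multi-window convolutions are not modelled here.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve_top_k(query, database, k=2):
    # Indices of the k database embeddings most similar to the query.
    ranked = sorted(range(len(database)),
                    key=lambda i: cosine(query, database[i]),
                    reverse=True)
    return ranked[:k]

def augment(query, database, k=2):
    # Hypothetical augmentation: append the mean of the retrieved
    # neighbours to the query embedding (an assumption, see lead-in).
    idx = retrieve_top_k(query, database, k)
    mean = [sum(database[i][d] for i in idx) / len(idx)
            for d in range(len(query))]
    return query + mean

# Toy 3-dimensional "embeddings" standing in for peptide representations.
database = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [1.0, 0.05, 0.0]
neighbours = retrieve_top_k(query, database, k=2)
augmented = augment(query, database, k=2)
```

In this sketch the augmented vector is twice the length of the original embedding, so a downstream classifier would need to accept the doubled input width.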
Affiliation(s)
- Cheng-Che Chuang
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yu-Chen Liu
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Wei-En Jhang
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Sin-Siang Wei
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yu-Yen Ou
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
  - Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li 32003, Taiwan
3
Cao R, Li Q, Wei P, Ding Y, Bin Y, Zheng C. IL-6-Inducing Peptide Prediction Based on 3D Structure and Graph Neural Network. Biomolecules 2025; 15:99. [PMID: 39858493 PMCID: PMC11764147 DOI: 10.3390/biom15010099] [Received: 11/20/2024] [Revised: 12/27/2024] [Accepted: 01/07/2025] [Indexed: 01/27/2025] Open
Abstract
Interleukin-6 (IL-6) is a potent glycoprotein that plays a crucial role in regulating innate and adaptive immunity, as well as metabolism. The expression and release of IL-6 are closely correlated with the severity of various diseases. IL-6-inducing peptides are critical for the development of immunotherapy and diagnostic biomarkers for some diseases. Most existing methods for predicting IL-6-induced peptides use traditional machine learning methods, whose feature selection is based on prior knowledge. In addition, none of these methods take into account the three-dimensional (3D) structure of peptides, which is essential for their functional properties. In this study, we propose a novel IL-6-inducing peptide prediction method called DGIL-6, which integrates 3D structural information with graph neural networks. DGIL-6 represents a peptide sequence as a graph, where each amino acid is treated as a node, and the adjacency matrix, representing the relationships between nodes, is derived from the predicted residue contact graph of the peptide sequence. In addition to commonly used amino acid representations, such as one-hot encoding and position encoding, the pre-trained model ESM-1b is employed to extract amino acid features as node features. In order to simultaneously consider node weights and information updates, a dual-channel method combining Graph Attention Network (GAT) and Graph Convolutional Network (GCN) is adopted. Finally, the extracted features from both channels are merged for the classification of IL-6-inducing peptides. A series of experiments including cross-validation, independent testing, ablation studies, and visualizations demonstrate the effectiveness of the DGIL-6 method.
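To make the graph representation concrete, here is a minimal sketch of one graph-convolution step over a residue contact graph, using the standard symmetric-normalised propagation rule ReLU(D^-1/2 (A + I) D^-1/2 X W). The three-residue contact map, the one-hot node features, and the weight values are toy assumptions; the sketch does not reproduce DGIL-6's actual layers, its ESM-1b features, or its attention channel.

```python
import math

def normalized_adjacency(adj):
    # Add self-loops, then apply symmetric degree normalisation.
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    # Plain dense matrix product for small toy matrices.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_layer(adj, features, weights):
    # One propagation step followed by a ReLU non-linearity.
    h = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, x) for x in row] for row in h]

# Toy peptide of three residues with contacts 0-1 and 1-2 (assumed).
contacts = [[0.0, 1.0, 0.0],
            [1.0, 0.0, 1.0],
            [0.0, 1.0, 0.0]]
node_features = [[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]]  # one-hot stand-ins for residue features
weights = [[1.0, -1.0],
           [0.5, 0.5],
           [-1.0, 1.0]]            # arbitrary illustrative weights
norm = normalized_adjacency(contacts)
hidden = gcn_layer(contacts, node_features, weights)
```

Each row of `hidden` is the updated representation of one residue, mixing its own features with those of residues it contacts.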
Affiliation(s)
- Ruifen Cao
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Computer Science and Technology, Anhui University, Hefei 230601, China
- Qiangsheng Li
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Computer Science and Technology, Anhui University, Hefei 230601, China
- Pijing Wei
  - Institutes of Physical Science and Information Technology, Anhui University, Hefei 230601, China
- Yun Ding
  - School of Artificial Intelligence, Anhui University, Hefei 230601, China
- Yannan Bin
  - Institutes of Physical Science and Information Technology, Anhui University, Hefei 230601, China
- Chunhou Zheng
  - School of Artificial Intelligence, Anhui University, Hefei 230601, China
4
Asediya VS, Anjaria PA, Mathakiya RA, Koringa PG, Nayak JB, Bisht D, Fulmali D, Patel VA, Desai DN. Vaccine development using artificial intelligence and machine learning: A review. Int J Biol Macromol 2024; 282:136643. [PMID: 39426778 DOI: 10.1016/j.ijbiomac.2024.136643] [Received: 06/01/2024] [Revised: 09/30/2024] [Accepted: 10/15/2024] [Indexed: 10/21/2024]
Abstract
The COVID-19 pandemic has underscored the critical importance of effective vaccines, yet their development is a challenging and demanding process. It requires identifying antigens that elicit protective immunity, selecting adjuvants that enhance immunogenicity, and designing delivery systems that ensure optimal efficacy. Artificial intelligence (AI) can facilitate this process by using machine learning methods to analyze large and diverse datasets, suggest novel vaccine candidates, and refine their design and predict their performance. This review explores how AI can be applied to various aspects of vaccine development, such as predicting immune response from protein sequences, discovering adjuvants, optimizing vaccine doses, modeling vaccine supply chains, and predicting protein structures. We also address the challenges and ethical issues that emerge from the use of AI in vaccine development, such as data privacy, algorithmic bias, and health data sensitivity. We contend that AI has immense potential to accelerate vaccine development and respond to future pandemics, but it also requires careful attention to the quality and validity of the data and methods used.
Affiliation(s)
- Deepanker Bisht
  - Indian Veterinary Research Institute, Izatnagar, U.P., India
5
Huang J, Wang X, Xia R, Yang D, Liu J, Lv Q, Yu X, Meng J, Chen K, Song B, Wang Y. Domain-knowledge enabled ensemble learning of 5-formylcytosine (f5C) modification sites. Comput Struct Biotechnol J 2024; 23:3175-3185. [PMID: 39253057 PMCID: PMC11381828 DOI: 10.1016/j.csbj.2024.08.004] [Received: 05/11/2024] [Revised: 08/07/2024] [Accepted: 08/07/2024] [Indexed: 09/11/2024] Open
Abstract
5-formylcytidine (f5C) is a unique post-transcriptional RNA modification found in mRNA and tRNA at the wobble site, playing a crucial role in mitochondrial protein synthesis and potentially contributing to the regulation of translation. Recent studies have unveiled that the f5C modifications may drive mitochondrial mRNA translation to power cancer metastasis. Accurate identification of f5C sites is essential for further unraveling their molecular functions and regulatory mechanisms, but there are currently no computational methods available for predicting their locations. In this study, we introduce an innovative ensemble approach, successfully enabling the computational recognition of Saccharomyces cerevisiae f5C. We conducted a comprehensive model selection process that involved multiple basic machine learning and deep learning algorithms such as recurrent neural networks, convolutional neural networks and Transformer-based models. Initially trained only on sequence information, these individual models achieved an AUROC ranging from 0.7104 to 0.7492. Through the integration of 32 novel domain-derived genomic features, the performance of individual models has significantly improved to an AUROC between 0.7309 and 0.8076. To further enhance accuracy and robustness, we then constructed the ensembles of these individual models with different combinations. The best performance attained by our ensemble models reached an AUROC of 0.8391. Shapley additive explanations were conducted to explain the significant contributions of genomic features, providing insights into the putative distribution of f5C across various topological regions and potentially paving the way for revealing their functional relevance within distinct genomic contexts. A freely accessible web server that allows real-time analysis of user-uploaded sites can be accessed at: www.rnamd.org/Resf5C-Pred.
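The AUROC figures and the ensemble step in this abstract can be illustrated with a small sketch. It computes AUROC as the fraction of positive/negative pairs ranked correctly (ties count half) and combines two models by averaging their scores. The averaging rule, the labels, and the scores are illustrative assumptions; the abstract does not say how the authors combined their models.

```python
def auroc(labels, scores):
    # Probability that a random positive outscores a random negative.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_ensemble(score_lists):
    # Simple unweighted mean of per-site scores across models (assumed rule).
    return [sum(col) / len(col) for col in zip(*score_lists)]

# Toy data: four candidate sites, two hypothetical base models.
labels = [1, 1, 0, 0]
model_a = [0.9, 0.4, 0.5, 0.1]
model_b = [0.6, 0.8, 0.3, 0.7]
ensemble = average_ensemble([model_a, model_b])

auc_a = auroc(labels, model_a)
auc_b = auroc(labels, model_b)
auc_ensemble = auroc(labels, ensemble)
```

In this toy case each base model misranks a different pair, so averaging corrects both errors; real ensembles gain less when the base models make correlated mistakes.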
Affiliation(s)
- Jiaming Huang
  - Jiangsu Key Laboratory for Functional Substance of Chinese Medicine, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing 210023, China
  - Department of Biological Sciences, School of Science, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Xuan Wang
  - Department of Biological Sciences, School of Science, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Rong Xia
  - Department of Biological Sciences, School of Science, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - School of AI and Advanced Computing, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Dongqing Yang
  - Department of Public Health, School of Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Jian Liu
  - Jiangsu Key Laboratory for Functional Substance of Chinese Medicine, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Qi Lv
  - Jiangsu Key Laboratory for Functional Substance of Chinese Medicine, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Xiaoxuan Yu
  - Department of Pharmacology, School of Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Jia Meng
  - Department of Biological Sciences, School of Science, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - AI University Research Centre, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L7 8TX, United Kingdom
- Kunqi Chen
  - Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China
- Bowen Song
  - Department of Public Health, School of Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Yue Wang
  - Jiangsu Key Laboratory for Functional Substance of Chinese Medicine, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing 210023, China
6
Dhall A, Patiyal S, Raghava GPS. A hybrid method for discovering interferon-gamma inducing peptides in human and mouse. Sci Rep 2024; 14:26859. [PMID: 39501025 PMCID: PMC11538504 DOI: 10.1038/s41598-024-77957-8] [Received: 07/26/2024] [Accepted: 10/28/2024] [Indexed: 11/08/2024] Open
Abstract
Interferon-gamma (IFN-γ) is a versatile pleiotropic cytokine essential for both innate and adaptive immune responses. It exhibits both pro-inflammatory and anti-inflammatory properties, making it a promising therapeutic candidate for treating various infectious diseases and cancers. We present IFNepitope2, a host-specific technique for annotating IFN-γ inducing peptides; it is an updated version of IFNepitope, introduced by Dhanda et al. In this study, the dataset used for developing the prediction method contains 25,492 and 7,983 experimentally validated IFN-γ inducing peptides in human and mouse hosts, respectively. In the initial phase, machine learning techniques were used to develop classification models using a wide range of peptide features. Further, to improve on the machine learning based (alignment-free) models, we explored the potential of the similarity-based technique BLAST. Finally, a hybrid model was developed that combines the best machine learning based model with BLAST. In most cases, models based on extra trees perform better than other machine learning techniques. Among peptide features, compositional features, particularly dipeptide composition, perform better than one-hot encoding or binary profiles. Our best machine learning based models achieved AUROCs of 0.89 and 0.83 for human and mouse hosts, respectively. The hybrid model achieved AUROCs of 0.90 and 0.85 for human and mouse hosts, respectively. All models were evaluated on an independent validation dataset not used for training or testing these models. The newly developed method performs better than the existing method on the independent dataset. The major objective of this study is to predict, design and scan IFN-γ inducing peptides; to that end, a web server and software have been developed (https://webs.iiitd.edu.in/raghava/ifnepitope2/). The method is also available as a standalone package at https://github.com/raghavagps/ifnepitope2 and on the Python Package Index at https://pypi.org/project/ifnepitope2/.
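Dipeptide composition, the feature this abstract singles out, is simple to compute: it is the normalised frequency of each of the 400 ordered amino-acid pairs in a peptide. The sketch below shows that calculation, followed by a hypothetical way of combining a machine learning score with a BLAST hit. The `hybrid_score` function, its `bonus` value, and the example peptide are assumptions made for illustration; they are not taken from the IFNepitope2 code or paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

def dipeptide_composition(peptide):
    # Fraction of each ordered residue pair among all adjacent pairs.
    peptide = peptide.upper()
    total = len(peptide) - 1
    if total <= 0:
        return [0.0] * len(DIPEPTIDES)
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = peptide[i:i + 2]
        if pair in counts:  # skip pairs with non-standard residues
            counts[pair] += 1
    return [counts[d] / total for d in DIPEPTIDES]

def hybrid_score(ml_score, blast_hit, bonus=0.5):
    # Hypothetical combination rule: shift the ML score up or down when
    # BLAST finds a similar known inducer or non-inducer.
    if blast_hit == "positive":
        return ml_score + bonus
    if blast_hit == "negative":
        return ml_score - bonus
    return ml_score

features = dipeptide_composition("ACAC")  # adjacent pairs: AC, CA, AC
score = hybrid_score(0.4, "positive")
```

The resulting 400-dimensional vector has the same length for every peptide, which is what lets tree-based classifiers work on sequences of different lengths.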
Affiliation(s)
- Anjali Dhall
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, (Near Govind Puri Metro Station), New Delhi, 110020, India
- Sumeet Patiyal
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, (Near Govind Puri Metro Station), New Delhi, 110020, India
- Gajendra P S Raghava
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, (Near Govind Puri Metro Station), New Delhi, 110020, India
7
Zhao Y, Zhang S, Liang Y. HemoFuse: multi-feature fusion based on multi-head cross-attention for identification of hemolytic peptides. Sci Rep 2024; 14:22518. [PMID: 39342017 PMCID: PMC11438874 DOI: 10.1038/s41598-024-74326-3] [Received: 07/03/2024] [Accepted: 09/25/2024] [Indexed: 10/01/2024] Open
Abstract
Hemolytic peptides are therapeutic peptides that damage red blood cells. However, therapeutic peptides used in medical treatment must exhibit low toxicity to red blood cells to achieve the desired therapeutic effect. Therefore, accurate prediction of the hemolytic activity of therapeutic peptides is essential for the development of peptide therapies. In this study, a multi-feature cross-fusion model, HemoFuse, for hemolytic peptide identification is proposed. The feature vectors of peptide sequences are transformed by word embedding technique and four hand-crafted feature extraction methods. We apply multi-head cross-attention mechanism to hemolytic peptide identification for the first time. It captures the interaction between word embedding features and hand-crafted features by calculating the attention of all positions in them, so that multiple features can be deeply fused. Moreover, we visualize the features obtained by this module to enhance its interpretability. On the comprehensive integrated dataset, HemoFuse achieves ideal results, with ACC, SP, SN, MCC, F1, AUC, and AP of 0.7575, 0.8814, 0.5793, 0.4909, 0.6620, 0.8387, and 0.7118, respectively. Compared with HemoDL proposed by Yang et al., it is 3.32%, 3.89%, 5.93%, 10.6%, 8.17%, 5.88%, and 2.72% higher. Other ablation experiments also prove that our model is reasonable and efficient. The codes and datasets are accessible at https://github.com/z11code/Hemo .
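The cross-attention idea in this abstract, where one feature set attends over all positions of another, can be sketched with plain scaled dot-product attention. The version below is a simplification under stated assumptions: a single head, no learned query/key/value projections, and toy feature values. It is not HemoFuse's actual module.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    # Each query position takes a weighted average of the value vectors,
    # weighted by its scaled dot-product similarity to the keys.
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        fused.append([sum(w * v[j] for w, v in zip(weights, values))
                      for j in range(len(values[0]))])
    return fused

# Toy inputs: word-embedding features act as queries, and hand-crafted
# features act as keys and values (all numbers are illustrative).
embedding_features = [[1.0, 0.0], [0.0, 1.0]]
handcrafted_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(embedding_features,
                        handcrafted_features,
                        handcrafted_features)
```

The output has one row per query position, so the fused representation keeps the length of the embedding sequence while drawing its content from the hand-crafted features.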
Affiliation(s)
- Ya Zhao
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, P. R. China
- Shengli Zhang
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, P. R. China
- Yunyun Liang
  - School of Science, Xi'an Polytechnic University, Xi'an, 710048, P. R. China
8
Liu Z, Liu F, Wang C, Li H, Xu Y, Sun S. Ratiometric Electrochemical Detection of Interleukin-6 Using Electropolymerized Methylene Blue and a Multi-Walled Carbon-Nanotube-Modified Screen-Printed Carbon Electrode. Biosensors 2024; 14:457. [PMID: 39451670 PMCID: PMC11506342 DOI: 10.3390/bios14100457] [Received: 09/01/2024] [Revised: 09/19/2024] [Accepted: 09/24/2024] [Indexed: 10/26/2024]
Abstract
Herein, we report a ratio-based electrochemical biosensor for the detection of interleukin-6 (IL-6). We electropolymerized methylene blue (MB) on the surface of screen-printed carbon electrodes; introduced an internal reference signal probe; modified the carboxylate multi-walled carbon nanotubes on the electrode surface to increase the electrochemically active area; and finally linked the amino-modified IL-6 aptamer to the electrode surface through the Schiff base reaction, with bovine serum albumin (BSA) added to mask non-specific adsorption. After adding IL-6 to the samples, the signal $I_{\mathrm{MB}}$ remained almost unchanged, while the signal $I_{[\mathrm{Fe(CN)}_6]^{3-/4-}}$ decreased with increasing IL-6 concentration. Thus, a novel ratiometric electrochemical sensor with a linear range of 0.001 to 1000.0 ng/mL and a low detection limit of 0.54 pg/mL was successfully developed. The sensor had high repeatability, stability, sensitivity, and practicability. It provides a new method for constructing proportional electrochemical sensors and detecting IL-6.
Affiliation(s)
- Zhuo Liu
  - College of Chemistry & Pharmacy, Northwest A&F University, Xianyang 712100, China
- Fengyu Liu
  - School of Chemistry, Dalian University of Technology, No. 2 Linggong Road, Ganjingzi District, Dalian 116023, China
- Chaofan Wang
  - College of Chemistry & Pharmacy, Northwest A&F University, Xianyang 712100, China
- Hongjuan Li
  - College of Chemistry & Pharmacy, Northwest A&F University, Xianyang 712100, China
- Yongqian Xu
  - College of Chemistry & Pharmacy, Northwest A&F University, Xianyang 712100, China
- Shiguo Sun
  - College of Chemistry & Pharmacy, Northwest A&F University, Xianyang 712100, China
  - Shenzhen Research Institute, Northwest A&F University, Shenzhen 518000, China
9
Zhang J, Wang R, Wei L. MucLiPred: Multi-Level Contrastive Learning for Predicting Nucleic Acid Binding Residues of Proteins. J Chem Inf Model 2024; 64:1050-1065. [PMID: 38301174 DOI: 10.1021/acs.jcim.3c01471] [Indexed: 02/03/2024]
Abstract
Protein-molecule interactions play a crucial role in various biological functions, with their accurate prediction being pivotal for drug discovery and design processes. Traditional methods for predicting protein-molecule interactions are limited. Some can only predict interactions with a specific molecule, restricting their applicability, while others target multiple molecule types but fail to efficiently process diverse interaction information, leading to complexity and inefficiency. This study presents a novel deep learning model, MucLiPred, equipped with a dual contrastive learning mechanism aimed at improving the prediction of multiple molecule-protein interactions and the identification of potential molecule-binding residues. The residue-level paradigm focuses on differentiating binding from non-binding residues, illuminating detailed local interactions. The type-level paradigm, meanwhile, analyzes overarching contexts of molecule types, like DNA or RNA, ensuring that representations of identical molecule types gravitate closer in the representational space, bolstering the model's proficiency in discerning interaction motifs. This dual approach enables comprehensive multi-molecule predictions, elucidating the relationships among different molecule types and strengthening precise protein-molecule interaction predictions. Empirical evidence demonstrates MucLiPred's superiority over existing models in robustness and prediction accuracy. The integration of dual contrastive learning techniques amplifies its capability to detect potential molecule-binding residues with precision. Further optimization, separating representational and classification tasks, has markedly improved its performance. MucLiPred thus represents a significant advancement in protein-molecule interaction prediction, setting a new precedent for future research in this field.
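The residue-level contrastive idea in this abstract, pulling binding residues together in representation space and pushing them away from non-binding ones, can be sketched with the classic pairwise margin-based contrastive loss. This is a generic formulation chosen for illustration. The abstract does not specify MucLiPred's loss function, and the embeddings, labels, and margin below are toy assumptions.

```python
import math

def euclidean(u, v):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pairwise_contrastive_loss(embeddings, labels, margin=1.0):
    # Same-label pairs are penalised by squared distance; different-label
    # pairs are penalised only when closer than the margin.
    total, pairs = 0.0, 0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            dist = euclidean(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                total += dist ** 2
            else:
                total += max(0.0, margin - dist) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0

# Toy residue embeddings: label 1 = binding, label 0 = non-binding.
well_separated = pairwise_contrastive_loss(
    [[0.0, 0.0], [0.0, 0.0], [2.0, 0.0]], [1, 1, 0])
too_close = pairwise_contrastive_loss(
    [[0.0, 0.0], [0.5, 0.0]], [1, 0])
```

The loss is zero once binding residues coincide and non-binding residues sit beyond the margin, and it grows as a non-binding residue drifts toward a binding one.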
Affiliation(s)
- Jiashuo Zhang
  - School of Software, Shandong University, Jinan 250101, China
- Ruheng Wang
  - School of Software, Shandong University, Jinan 250101, China
- Leyi Wei
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
10
Yu H, Wang R, Qiao J, Wei L. Multi-CGAN: Deep Generative Model-Based Multiproperty Antimicrobial Peptide Design. J Chem Inf Model 2024; 64:316-326. [PMID: 38135439 DOI: 10.1021/acs.jcim.3c01881] [Indexed: 12/24/2023]
Abstract
Antimicrobial peptides are peptides that are effective against bacteria and viruses, and the discovery of new antimicrobial peptides is of great importance to human life and health. Although the design of antimicrobial peptides using machine learning methods has achieved good results in recent years, it remains a challenge to learn and design novel antimicrobial peptides with multiple properties of interest from peptide data with certain property labels. To this end, we propose Multi-CGAN, a deep generative model-based architecture that can learn from single-attribute peptide data and generate antimicrobial peptide sequences with multiple attributes that we need, which may have a potentially wide range of uses in drug discovery. In particular, we verified that our Multi-CGAN generated peptides with the desired properties have good performance in terms of generation rate. Moreover, a comprehensive statistical analysis demonstrated that our generated peptides are diverse and have a low probability of being homologous to the training data. Interestingly, we found that the performance of many popular deep learning methods on the antimicrobial peptide prediction task can be improved by using Multi-CGAN to expand the data on the training set of the original task, indicating the high quality of our generated peptides and the robust ability of our method. In addition, we also investigated whether it is possible to directionally generate peptide sequences with specified properties by controlling the input noise sampling for our model.
Affiliation(s)
- Haoqing Yu
  - School of Software, Shandong University, Jinan 250101, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
- Ruheng Wang
  - School of Software, Shandong University, Jinan 250101, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
- Jianbo Qiao
  - School of Software, Shandong University, Jinan 250101, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
- Leyi Wei
  - School of Software, Shandong University, Jinan 250101, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China