1
|
Guan J, Yao L, Xie P, Chung CR, Huang Y, Chiang YC, Lee TY. A two-stage computational framework for identifying antiviral peptides and their functional types based on contrastive learning and multi-feature fusion strategy. Brief Bioinform 2024; 25:bbae208. [PMID: 38706321 PMCID: PMC11070730 DOI: 10.1093/bib/bbae208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2024] [Revised: 03/14/2024] [Accepted: 04/17/2024] [Indexed: 05/07/2024] Open
Abstract
Antiviral peptides (AVPs) have shown potential in inhibiting viral attachment, preventing viral fusion with host cells and disrupting viral replication due to their unique action mechanisms. They have now become a broad-spectrum, promising antiviral therapy. However, identifying effective AVPs is traditionally slow and costly. This study proposed a new two-stage computational framework for AVP identification. The first stage identifies AVPs from a wide range of peptides, and the second stage recognizes AVPs targeting specific families or viruses. This method integrates contrastive learning and multi-feature fusion strategy, focusing on sequence information and peptide characteristics, significantly enhancing predictive ability and interpretability. The evaluation results of the model show excellent performance, with accuracy of 0.9240 and Matthews correlation coefficient (MCC) score of 0.8482 on the non-AVP independent dataset, and accuracy of 0.9934 and MCC score of 0.9869 on the non-AMP independent dataset. Furthermore, our model can predict antiviral activities of AVPs against six key viral families (Coronaviridae, Retroviridae, Herpesviridae, Paramyxoviridae, Orthomyxoviridae, Flaviviridae) and eight viruses (FIV, HCV, HIV, HPIV3, HSV1, INFVA, RSV, SARS-CoV). Finally, to facilitate user accessibility, we built a user-friendly web interface deployed at https://awi.cuhk.edu.cn/∼dbAMP/AVP/.
Collapse
Affiliation(s)
- Jiahui Guan
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, 2001 Longxiang Road, 518172 Shenzhen, China
| | - Lantian Yao
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- School of Science and Engineering, The Chinese University of Hong Kong, 2001 Longxiang Road, 518172 Shenzhen, China
| | - Peilin Xie
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, 2001 Longxiang Road, 518172 Shenzhen, China
| | - Chia-Ru Chung
- Department of Computer Science and Information Engineering, National Central University, 320317 Taoyuan, Taiwan
| | - Yixian Huang
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
| | - Ying-Chih Chiang
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, 2001 Longxiang Road, 518172 Shenzhen, China
| | - Tzong-Yi Lee
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, 300093 Hsinchu, Taiwan
- Center for Intelligent Drug Systems and Smart Bio-devices (IDS2B), National Yang Ming Chiao Tung University, 300093 Hsinchu, Taiwan
| |
Collapse
|
2
|
Jiang J, Pei H, Li J, Li M, Zou Q, Lv Z. FEOpti-ACVP: identification of novel anti-coronavirus peptide sequences based on feature engineering and optimization. Brief Bioinform 2024; 25:bbae037. [PMID: 38366802 PMCID: PMC10939380 DOI: 10.1093/bib/bbae037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2023] [Revised: 12/27/2023] [Accepted: 01/17/2024] [Indexed: 02/18/2024] Open
Abstract
Anti-coronavirus peptides (ACVPs) represent a relatively novel approach of inhibiting the adsorption and fusion of the virus with human cells. Several peptide-based inhibitors showed promise as potential therapeutic drug candidates. However, identifying such peptides in laboratory experiments is both costly and time consuming. Therefore, there is growing interest in using computational methods to predict ACVPs. Here, we describe a model for the prediction of ACVPs that is based on the combination of feature engineering (FE) optimization and deep representation learning. FEOpti-ACVP was pre-trained using two feature extraction frameworks. At the next step, several machine learning approaches were tested in to construct the final algorithm. The final version of FEOpti-ACVP outperformed existing methods used for ACVPs prediction and it has the potential to become a valuable tool in ACVP drug design. A user-friendly webserver of FEOpti-ACVP can be accessed at http://servers.aibiochem.net/soft/FEOpti-ACVP/.
Collapse
Affiliation(s)
- Jici Jiang
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| | - Hongdi Pei
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| | - Jiayu Li
- College of Life Science, Sichuan University, Chengdu 610065, China
| | - Mingxin Li
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
| | - Zhibin Lv
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| |
Collapse
|
3
|
Xing W, Zhang J, Li C, Huo Y, Dong G. iAMP-Attenpred: a novel antimicrobial peptide predictor based on BERT feature extraction method and CNN-BiLSTM-Attention combination model. Brief Bioinform 2023; 25:bbad443. [PMID: 38055840 PMCID: PMC10699745 DOI: 10.1093/bib/bbad443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2023] [Revised: 10/31/2023] [Accepted: 11/11/2023] [Indexed: 12/08/2023] Open
Abstract
As a kind of small molecule protein that can fight against various microorganisms in nature, antimicrobial peptides (AMPs) play an indispensable role in maintaining the health of organisms and fortifying defenses against diseases. Nevertheless, experimental approaches for AMP identification still demand substantial allocation of human resources and material inputs. Alternatively, computing approaches can assist researchers effectively and promptly predict AMPs. In this study, we present a novel AMP predictor called iAMP-Attenpred. As far as we know, this is the first work that not only employs the popular BERT model in the field of natural language processing (NLP) for AMPs feature encoding, but also utilizes the idea of combining multiple models to discover AMPs. Firstly, we treat each amino acid from preprocessed AMPs and non-AMP sequences as a word, and then input it into BERT pre-training model for feature extraction. Moreover, the features obtained from BERT method are fed to a composite model composed of one-dimensional CNN, BiLSTM and attention mechanism for better discriminating features. Finally, a flatten layer and various fully connected layers are utilized for the final classification of AMPs. Experimental results reveal that, compared with the existing predictors, our iAMP-Attenpred predictor achieves better performance indicators, such as accuracy, precision and so on. This further demonstrates that using the BERT approach to capture effective feature information of peptide sequences and combining multiple deep learning models are effective and meaningful for predicting AMPs.
Collapse
Affiliation(s)
- Wenxuan Xing
- School of Computer Science and Engineering, Northeastern University, No.195 Chuangxin Road, Hunnan District, Shenyang 110170, China
| | - Jie Zhang
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, No.29 Erdos East Street, Saihan District, Hohhot 010011, China
| | - Chen Li
- School of Computer Science and Engineering, Northeastern University, No.195 Chuangxin Road, Hunnan District, Shenyang 110170, China
| | - Yujia Huo
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, No.29 Erdos East Street, Saihan District, Hohhot 010011, China
| | - Gaifang Dong
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, No.29 Erdos East Street, Saihan District, Hohhot 010011, China
| |
Collapse
|
4
|
Wu J, Qing H, Ouyang J, Zhou J, Gao Z, Mason CE, Liu Z, Shi T. HiFun: homology independent protein function prediction by a novel protein-language self-attention model. Brief Bioinform 2023; 24:bbad311. [PMID: 37649370 DOI: 10.1093/bib/bbad311] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2023] [Revised: 07/31/2023] [Accepted: 08/08/2023] [Indexed: 09/01/2023] Open
Abstract
Protein function prediction based on amino acid sequence alone is an extremely challenging but important task, especially in metagenomics/metatranscriptomics field, in which novel proteins have been uncovered exponentially from new microorganisms. Many of them are extremely low homology to known proteins and cannot be annotated with homology-based or information integrative methods. To overcome this problem, we proposed a Homology Independent protein Function annotation method (HiFun) based on a unified deep-learning model by reassembling the sequence as protein language. The robustness of HiFun was evaluated using the benchmark datasets and metrics in the CAFA3 challenge. To navigate the utility of HiFun, we annotated 2 212 663 unknown proteins and discovered novel motifs in the UHGP-50 catalog. We proved that HiFun can extract latent function related structure features which empowers it ability to achieve function annotation for non-homology proteins. HiFun can substantially improve newly proteins annotation and expand our understanding of microorganisms' adaptation in various ecological niches. Moreover, we provided a free and accessible webservice at http://www.unimd.org/HiFun, requiring only protein sequences as input, offering researchers an efficient and practical platform for predicting protein functions.
Collapse
Affiliation(s)
- Jun Wu
- Center for Bioinformatics and Computational Biology, the Institute of Biomedical Sciences and The School of Life Sciences, East China Normal University, Shanghai , 200241, China
| | - Haipeng Qing
- Center for Bioinformatics and Computational Biology, the Institute of Biomedical Sciences and The School of Life Sciences, East China Normal University, Shanghai , 200241, China
| | - Jian Ouyang
- Center for Bioinformatics and Computational Biology, the Institute of Biomedical Sciences and The School of Life Sciences, East China Normal University, Shanghai , 200241, China
| | - Jiajia Zhou
- Center for Bioinformatics and Computational Biology, the Institute of Biomedical Sciences and The School of Life Sciences, East China Normal University, Shanghai , 200241, China
| | - Zihao Gao
- Center for Bioinformatics and Computational Biology, the Institute of Biomedical Sciences and The School of Life Sciences, East China Normal University, Shanghai , 200241, China
| | | | - Zhichao Liu
- Nonclinical Drug Safety, Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, Connecticut 06877, United States
| | - Tieliu Shi
- Center for Bioinformatics and Computational Biology, the Institute of Biomedical Sciences and The School of Life Sciences, East China Normal University, Shanghai , 200241, China
- School of Statistics, Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, East China Normal University, Shanghai 200062, China
- Beijing Advanced Innovation Center, for Big Data-Based Precision Medicine, Beihang University & Capital Medical University, Beijing 100083, China
| |
Collapse
|