1. Tadros DM, Racle J, Gfeller D. Predicting MHC-I ligands across alleles and species: how far can we go? Genome Med 2025; 17:25. [PMID: 40114147 PMCID: PMC11927126 DOI: 10.1186/s13073-025-01450-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Accepted: 03/10/2025] [Indexed: 03/22/2025] Open
Abstract
BACKGROUND: CD8+ T-cell activation is initiated by the recognition of epitopes presented on class I major histocompatibility complex (MHC-I) molecules. Identifying such epitopes is useful for molecular understanding of cellular immune responses and can guide the development of personalized vaccines for various diseases including cancer. For a few hundred common human and mouse MHC-I alleles, large datasets of ligands are available and machine learning MHC-I ligand predictors trained on such data reach high prediction accuracy. However, for the vast majority of other MHC-I alleles, no ligand is known. METHODS: We capitalize on an expanded architecture of our MHC-I ligand predictor (MixMHCpred3.0) to systematically assess the extent to which predictions of MHC-I ligands can be applied to MHC-I alleles that currently lack known ligand data. RESULTS: Our results reveal high prediction accuracy for most MHC-I alleles in human and in laboratory mouse strains, but significantly lower accuracy in other species. Our work further outlines some of the molecular determinants of MHC-I ligand prediction accuracy across alleles and species. Robust benchmarking on external data shows that our MHC-I ligand predictor demonstrates competitive performance relative to other state-of-the-art MHC-I ligand predictors and can be used for CD8+ T-cell epitope predictions. CONCLUSIONS: Our work provides a valuable tool for predicting antigen presentation across all human and mouse MHC-I alleles. The MixMHCpred3.0 tool is available at https://github.com/GfellerLab/MixMHCpred.
Affiliation(s)
- Daniel M Tadros
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Agora Cancer Research Centre, Lausanne, 1011, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Julien Racle
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Agora Cancer Research Centre, Lausanne, 1011, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- David Gfeller
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Agora Cancer Research Centre, Lausanne, 1011, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
2. Xu L, Yang Q, Dong W, Li X, Wang K, Dong S, Zhang X, Yang T, Luo G, Liao X, Gao X, Wang G. Meta learning for mutant HLA class I epitope immunogenicity prediction to accelerate cancer clinical immunotherapy. Brief Bioinform 2024; 26:bbae625. [PMID: 39656887 DOI: 10.1093/bib/bbae625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 09/18/2024] [Accepted: 11/14/2024] [Indexed: 12/17/2024] Open
Abstract
Accurate prediction of binding between human leukocyte antigen (HLA) class I molecules and antigenic peptide segments is a challenging task and a key bottleneck in personalized immunotherapy for cancer. Although existing prediction tools have demonstrated significant results using established datasets, most can only predict the binding affinity of antigenic peptides to HLA and do not enable the immunogenic interpretation of new antigenic epitopes. This limitation results from the training data for the computational models relying heavily on a large amount of peptide-HLA (pHLA) eluting ligand data, in which most of the candidate epitopes lack immunogenicity. Here, we propose an adaptive immunogenicity prediction model, named MHLAPre, which is trained on the large-scale MS-derived HLA I eluted ligandome (mostly presented by epitopes) that are immunogenic. Allele-specific and pan-allelic prediction models are also provided for endogenous peptide presentation. Using a meta-learning strategy, MHLAPre rapidly assessed HLA class I peptide affinities across the whole pHLA pairs and accurately identified tumor-associated endogenous antigens. During the process of adaptive immune response of T-cells, pHLA-specific binding in the antigen presentation is only a pre-task for CD8+ T-cell recognition. The key factor in activating the immune response is the interaction between pHLA complexes and T-cell receptors (TCRs). Therefore, we performed transfer learning on the pHLA model using the pHLA-TCR dataset. In pHLA binding task, MHLAPre demonstrated significant improvement in identifying neoepitope immunogenicity compared with five state-of-the-art models, proving its effectiveness and robustness. After transfer learning of the pHLA-TCR data, MHLAPre also exhibited relatively superior performance in revealing the mechanism of immunotherapy. MHLAPre is a powerful tool to identify neoepitopes that can interact with TCR and induce immune responses. 
We believe that the proposed method will greatly contribute to clinical immunotherapy, such as anti-tumor immunity, tumor-specific T-cell engineering, and personalized tumor vaccine.
Affiliation(s)
- Long Xu
- School of Computer Science and Technology, Harbin Institute of Technology, West DaZhi Street, 150001 Harbin, China
- Qiang Yang
- School of Computer Science and Technology, Harbin Institute of Technology, West DaZhi Street, 150001 Harbin, China
- School of Medicine and Health, Harbin Institute of Technology, Yikuang Street, 150000 Harbin, China
- Weihe Dong
- College of Computer and Control Engineering, Northeast Forestry University, Hexing Road, 150040 Harbin, China
- Xiaokun Li
- School of Computer Science and Technology, Harbin Institute of Technology, West DaZhi Street, 150001 Harbin, China
- School of Computer Science and Technology, Heilongjiang University, Xuefu Road, 150080 Harbin, China
- Postdoctoral Program of Heilongjiang Hengxun Technology Co., Ltd., Xuefu Road, 150090 Harbin, China
- Shandong Hengxun Technology Co., Ltd., Miaoling Road, 266100 Qingdao, China
- Kuanquan Wang
- School of Computer Science and Technology, Harbin Institute of Technology, West DaZhi Street, 150001 Harbin, China
- Suyu Dong
- College of Computer and Control Engineering, Northeast Forestry University, Hexing Road, 150040 Harbin, China
- Xianyu Zhang
- Department of Breast Surgery, Harbin Medical University Cancer Hospital, Haping Road, 150081 Harbin, China
- Tiansong Yang
- Department of Rehabilitation, The First Affiliated Hospital of Heilongjiang University of Traditional Chinese Medicine, Xuefu Road, 150040 Harbin, China
- Gongning Luo
- School of Computer Science and Technology, Harbin Institute of Technology, West DaZhi Street, 150001 Harbin, China
- Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, 4700 KAUST, Saudi Arabia
- Xingyu Liao
- School of Computer Science, Northwestern Polytechnical University, 710072 Xian, China
- Xin Gao
- Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, 4700 KAUST, Saudi Arabia
- Guohua Wang
- School of Computer Science and Technology, Harbin Institute of Technology, West DaZhi Street, 150001 Harbin, China
- College of Computer and Control Engineering, Northeast Forestry University, Hexing Road, 150040 Harbin, China
3. Niu R, Wang J, Li Y, Zhou J, Guo Y, Shang X. Attention-aware differential learning for predicting peptide-MHC class I binding and T cell receptor recognition. Brief Bioinform 2024; 26:bbaf038. [PMID: 39883517 PMCID: PMC11781218 DOI: 10.1093/bib/bbaf038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Revised: 01/05/2025] [Accepted: 01/15/2025] [Indexed: 01/31/2025] Open
Abstract
The identification of neoantigens is crucial for advancing vaccines, diagnostics, and immunotherapies. Despite this importance, a fundamental question remains: how to model the presentation of neoantigens by major histocompatibility complex class I molecules and the recognition of the peptide-MHC-I (pMHC-I) complex by T cell receptors (TCRs). Accurate prediction of pMHC-I binding and TCR recognition remains a significant computational challenge in immunology due to intricate binding motifs and the long-tail distribution of known binding pairs in public databases. Here, we propose an attention-aware framework comprising TranspMHC for pMHC-I binding prediction and TransTCR for TCR-pMHC-I recognition prediction. Leveraging the attention mechanism, TranspMHC surpasses existing algorithms on independent datasets at both pan-specific and allele-specific levels. For TCR-pMHC-I recognition, TransTCR incorporates transfer learning and a differential learning strategy, demonstrating superior performance and enhanced generalization on independent datasets compared to existing methods. Furthermore, we identify key amino acids associated with binding motifs of peptides and TCRs that facilitate pMHC-I and TCR-pMHC-I binding, indicating the potential interpretability of our proposed framework.
Affiliation(s)
- Rui Niu
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 Shaanxi, China
- Jingwei Wang
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 Shaanxi, China
- Yanli Li
- John Curtin School of Medical Research, The Australian National University, Canberra, ACT 2600, Australia
- Jiren Zhou
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 Shaanxi, China
- John Curtin School of Medical Research, The Australian National University, Canberra, ACT 2600, Australia
- Yang Guo
- School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000 Gansu, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 Shaanxi, China
4. Kushwaha A, Duroux P, Giudicelli V, Todorov K, Kossida S. IMGT/RobustpMHC: robust training for class-I MHC peptide binding prediction. Brief Bioinform 2024; 25:bbae552. [PMID: 39504482 PMCID: PMC11540059 DOI: 10.1093/bib/bbae552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 08/24/2024] [Accepted: 10/15/2024] [Indexed: 11/08/2024] Open
Abstract
The accurate prediction of peptide-major histocompatibility complex (MHC) class I binding probabilities is a critical endeavor in immunoinformatics, with broad implications for vaccine development and immunotherapies. While recent deep neural network based approaches have showcased promise in peptide-MHC (pMHC) prediction, they have two shortcomings: (i) they rely on hand-crafted pseudo-sequence extraction, (ii) they do not generalize well to different datasets, which limits the practicality of these approaches. While existing methods rely on a 34 amino acid pseudo-sequence, our findings uncover the involvement of 147 positions in direct interactions between MHC and peptide. We further show that neural architectures can learn the intricacies of pMHC binding using even full sequences. To this end, we present PerceiverpMHC that is able to learn accurate representations on full-sequences by leveraging efficient transformer based architectures. Additionally, we propose IMGT/RobustpMHC that harnesses the potential of unlabeled data in improving the robustness of pMHC binding predictions through a self-supervised learning strategy. We extensively evaluate RobustpMHC on eight different datasets and showcase an overall improvement of over 6% in binding prediction accuracy compared to state-of-the-art approaches. We compile CrystalIMGT, a crystallography-verified dataset presenting a challenge to existing approaches due to significantly different pMHC distributions. Finally, to mitigate this distribution gap, we further develop a transfer learning pipeline.
Affiliation(s)
- Anjana Kushwaha
- IMGT®, The International ImMunoGeneTics Information System®, Montpellier, France
- Institute of Human Genetics (IGH), Montpellier, France
- University of Montpellier (UM), Montpellier, France
- National Center for Scientific Research (CNRS), France
- Patrice Duroux
- IMGT®, The International ImMunoGeneTics Information System®, Montpellier, France
- Institute of Human Genetics (IGH), Montpellier, France
- National Center for Scientific Research (CNRS), France
- Véronique Giudicelli
- IMGT®, The International ImMunoGeneTics Information System®, Montpellier, France
- Institute of Human Genetics (IGH), Montpellier, France
- University of Montpellier (UM), Montpellier, France
- National Center for Scientific Research (CNRS), France
- Konstantin Todorov
- LIRMM, Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier, Montpellier, France
- Sofia Kossida
- IMGT®, The International ImMunoGeneTics Information System®, Montpellier, France
- Institute of Human Genetics (IGH), Montpellier, France
- University of Montpellier (UM), Montpellier, France
- National Center for Scientific Research (CNRS), France
- Institut Universitaire de France (IUF), Paris, France
5. Su Z, Wu Y, Cao K, Du J, Cao L, Wu Z, Wu X, Wang X, Song Y, Wang X, Duan H. APEX-pHLA: A novel method for accurate prediction of the binding between exogenous short peptides and HLA class I molecules. Methods 2024; 228:38-47. [PMID: 38772499 DOI: 10.1016/j.ymeth.2024.05.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Revised: 04/28/2024] [Accepted: 05/18/2024] [Indexed: 05/23/2024] Open
Abstract
Human leukocyte antigen (HLA) molecules play a critical role in immunotherapy because they recognize and bind exogenous antigens such as peptides and subsequently deliver them to immune cells. Predicting the binding between peptides and HLA molecules (pHLA) can expedite the screening of immunogenic peptides and facilitate vaccine design. However, traditional experimental methods are time-consuming and inefficient. In this study, an efficient deep learning method was developed for predicting peptide-HLA binding, treating peptide sequences as linguistic entities. It combines the textCNN and BiLSTM architectures in a deep neural network model called APEX-pHLA. The model operates without restrictions on HLA class I allele variants or peptide segment lengths, enabling efficient encoding of sequence features for both HLA and peptide segments. On the independent test set, the model achieved Accuracy, ROC_AUC, F1, and MCC values of 0.9449, 0.9850, 0.9453, and 0.8899, respectively. On an external test set, the corresponding values were 0.9803, 0.9574, 0.8835, and 0.7863. These results outperformed fifteen methods previously reported in the literature. The accurate prediction capability of the APEX-pHLA model in peptide-HLA binding might provide valuable insights for future HLA vaccine design.
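For readers comparing the four scores quoted in this abstract, the following is a minimal Python sketch of how Accuracy, F1, MCC, and ROC AUC are conventionally computed for a binary binder/non-binder classifier. It assumes the standard textbook definitions; the abstract does not say how the authors computed them. The function names `binary_metrics` and `roc_auc` and the toy labels in the usage note are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return (accuracy, F1, MCC) for 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return accuracy, f1, mcc


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a
    random negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With toy labels `[1, 1, 1, 0, 0, 0]` and thresholded predictions `[1, 1, 0, 1, 0, 0]`, `binary_metrics` gives an accuracy and F1 of 2/3 and an MCC of 1/3. MCC is the most conservative of the three because it uses all four cells of the confusion matrix, which is one reason it is often reported alongside accuracy.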
Affiliation(s)
- Zhihao Su
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Yejian Wu
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Kaiqiang Cao
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Jie Du
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Lujing Cao
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Zhipeng Wu
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Xinyi Wu
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Xinqiao Wang
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Ying Song
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Xudong Wang
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China
- Hongliang Duan
- Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
6. Zhang M, Cheng Q, Wei Z, Xu J, Wu S, Xu N, Zhao C, Yu L, Feng W. BertTCR: a Bert-based deep learning framework for predicting cancer-related immune status based on T cell receptor repertoire. Brief Bioinform 2024; 25:bbae420. [PMID: 39177262 PMCID: PMC11342255 DOI: 10.1093/bib/bbae420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 07/24/2024] [Accepted: 08/08/2024] [Indexed: 08/24/2024] Open
Abstract
The T cell receptor (TCR) repertoire is pivotal to the human immune system, and understanding its nuances can significantly enhance our ability to forecast cancer-related immune responses. However, existing methods often overlook the intra- and inter-sequence interactions of TCRs, limiting the development of sequence-based cancer-related immune status predictions. To address this challenge, we propose BertTCR, an innovative deep learning framework designed to predict cancer-related immune status using TCRs. BertTCR combines a pre-trained protein large language model with deep learning architectures, enabling it to extract deeper contextual information from TCRs. Compared to three state-of-the-art sequence-based methods, BertTCR improves the AUC on an external validation set for thyroid cancer detection by 21 percentage points. Additionally, this model was trained on over 2000 publicly available TCR libraries covering 17 types of cancer and healthy samples, and it has been validated on multiple public external datasets for its ability to distinguish cancer patients from healthy individuals. Furthermore, BertTCR can accurately classify various cancer types and healthy individuals. Overall, BertTCR is an advanced method for cancer-related immune status forecasting based on TCRs, offering promising potential for a wide range of immune status prediction tasks.
Affiliation(s)
- Min Zhang
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, No. 145 Nantong Street, Nangang District, Harbin, 150001, China
- Qi Cheng
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, No. 145 Nantong Street, Nangang District, Harbin, 150001, China
- Zhenyu Wei
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, No. 145 Nantong Street, Nangang District, Harbin, 150001, China
- Jiayu Xu
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, No. 145 Nantong Street, Nangang District, Harbin, 150001, China
- Shiwei Wu
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, No. 145 Nantong Street, Nangang District, Harbin, 150001, China
- Nan Xu
- Institute of Biomedical Engineering and Technology, Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, No. 500 Dongchuan Road, Shanghai, 200241, China
- Shanghai Unicar-Therapy Bio-medicine Technology Co., Ltd, No. 1525 Minqiang Road, Shanghai, 201612, China
- Chengkui Zhao
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, No. 145 Nantong Street, Nangang District, Harbin, 150001, China
- Shanghai Unicar-Therapy Bio-medicine Technology Co., Ltd, No. 1525 Minqiang Road, Shanghai, 201612, China
- Lei Yu
- Institute of Biomedical Engineering and Technology, Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, No. 500 Dongchuan Road, Shanghai, 200241, China
- Shanghai Unicar-Therapy Bio-medicine Technology Co., Ltd, No. 1525 Minqiang Road, Shanghai, 201612, China
- Weixing Feng
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, No. 145 Nantong Street, Nangang District, Harbin, 150001, China
7. Rocha LGDN, Guimarães PAS, Carvalho MGR, Ruiz JC. Tumor Neoepitope-Based Vaccines: A Scoping Review on Current Predictive Computational Strategies. Vaccines (Basel) 2024; 12:836. [PMID: 39203962 PMCID: PMC11360805 DOI: 10.3390/vaccines12080836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2024] [Revised: 07/09/2024] [Accepted: 07/11/2024] [Indexed: 09/03/2024] Open
Abstract
Therapeutic cancer vaccines have been considered in recent decades as important immunotherapeutic strategies capable of leading to tumor regression. In the development of these vaccines, the identification of neoepitopes plays a critical role, and different computational methods have been proposed and employed to direct and accelerate this process. In this context, this review identified and systematically analyzed the most recent studies published in the literature on the computational prediction of epitopes for the development of therapeutic vaccines, outlining critical steps, along with the associated program's strengths and limitations. A scoping review was conducted following the PRISMA extension (PRISMA-ScR). Searches were performed in databases (Scopus, PubMed, Web of Science, Science Direct) using the keywords: neoepitope, epitope, vaccine, prediction, algorithm, cancer, and tumor. Forty-nine articles published from 2012 to 2024 were synthesized and analyzed. Most of the identified studies focus on the prediction of epitopes with an affinity for MHC I molecules in solid tumors, such as lung carcinoma. Predicting epitopes with class II MHC affinity has been relatively underexplored. Besides neoepitope prediction from high-throughput sequencing data, additional steps were identified, such as the prioritization of neoepitopes and validation. Mutect2 is the most used tool for variant calling, while NetMHCpan is favored for neoepitope prediction. Artificial/convolutional neural networks are the preferred methods for neoepitope prediction. For prioritizing immunogenic epitopes, the random forest algorithm is the most used for classification. The performance values related to the computational models for the prediction and prioritization of neoepitopes are high; however, a large part of the studies still use microbiome databases for training. The in vitro/in vivo validations of the predicted neoepitopes were verified in 55% of the analyzed studies. 
Clinical trials that led to successful tumor remission were identified, highlighting that this immunotherapeutic approach can benefit these patients. Integrating high-throughput sequencing, sophisticated bioinformatics tools, and rigorous validation methods through in vitro/in vivo assays as well as clinical trials, the tumor neoepitope-based vaccine approach holds promise for developing personalized therapeutic vaccines that target specific tumor cancers.
Affiliation(s)
- Luiz Gustavo do Nascimento Rocha
- Biologia Computacional e Sistemas (BCS), Instituto Oswaldo Cruz (IOC), Fundação Oswaldo Cruz, Rio de Janeiro 21040-900, Brazil
- Grupo Informática de Biossistemas e Genômica, Instituto René Rachou, Fundação Oswaldo Cruz, Belo Horizonte 30190-002, Brazil
- Paul Anderson Souza Guimarães
- Biologia Computacional e Sistemas (BCS), Instituto Oswaldo Cruz (IOC), Fundação Oswaldo Cruz, Rio de Janeiro 21040-900, Brazil
- Grupo Informática de Biossistemas e Genômica, Instituto René Rachou, Fundação Oswaldo Cruz, Belo Horizonte 30190-002, Brazil
- Maria Gabriela Reis Carvalho
- Biologia Computacional e Sistemas (BCS), Instituto Oswaldo Cruz (IOC), Fundação Oswaldo Cruz, Rio de Janeiro 21040-900, Brazil
- Grupo Informática de Biossistemas e Genômica, Instituto René Rachou, Fundação Oswaldo Cruz, Belo Horizonte 30190-002, Brazil
- Jeronimo Conceição Ruiz
- Biologia Computacional e Sistemas (BCS), Instituto Oswaldo Cruz (IOC), Fundação Oswaldo Cruz, Rio de Janeiro 21040-900, Brazil
- Grupo Informática de Biossistemas e Genômica, Instituto René Rachou, Fundação Oswaldo Cruz, Belo Horizonte 30190-002, Brazil
8. Hong N, Jiang D, Wang Z, Sun H, Luo H, Bao L, Song M, Kang Y, Hou T. TransfIGN: A Structure-Based Deep Learning Method for Modeling the Interaction between HLA-A*02:01 and Antigen Peptides. J Chem Inf Model 2024; 64:5016-5027. [PMID: 38920330 DOI: 10.1021/acs.jcim.4c00678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/27/2024]
Abstract
The intricate interaction between major histocompatibility complexes (MHCs) and antigen peptides with diverse amino acid sequences plays a pivotal role in immune responses and T cell activity. In recent years, deep learning (DL)-based models have emerged as promising tools for accelerating antigen peptide screening. However, most of these models solely rely on one-dimensional amino acid sequences, overlooking crucial information required for the three-dimensional (3-D) space binding process. In this study, we propose TransfIGN, a structure-based DL model that is inspired by our previously developed framework, Interaction Graph Network (IGN), and incorporates sequence information from transformers to predict the interactions between HLA-A*02:01 and antigen peptides. Our model, trained on a comprehensive data set containing 61,816 sequences with 9051 binding affinity labels and 56,848 eluted ligand labels, achieves an area under the curve (AUC) of 0.893 on the binary data set, better than state-of-the-art sequence-based models trained on larger data sets such as NetMHCpan4.1, ANN, and TransPHLA. Furthermore, when evaluated on the IEDB weekly benchmark data sets, our predictions (AUC = 0.816) are better than those of the recommended methods like the IEDB consensus (AUC = 0.795). Notably, the interaction weight matrices generated by our method highlight the strong interactions at specific positions within peptides, emphasizing the model's ability to provide physical interpretability. This capability to unveil binding mechanisms through intricate structural features holds promise for new immunotherapeutic avenues.
Affiliation(s)
- Nanqi Hong
- College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang 310027, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Zhe Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Huiyong Sun
- Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing, Jiangsu 210009, China
- Hao Luo
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Lingjie Bao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Mingli Song
- College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang 310027, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Zhejiang University, Hangzhou, Zhejiang 310058, China
9. Machaca V, Goyzueta V, Cruz MG, Sejje E, Pilco LM, López J, Túpac Y. Transformers meets neoantigen detection: a systematic literature review. J Integr Bioinform 2024; 21:jib-2023-0043. [PMID: 38960869 PMCID: PMC11377031 DOI: 10.1515/jib-2023-0043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Accepted: 03/20/2024] [Indexed: 07/05/2024] Open
Abstract
Cancer immunology offers a new alternative to traditional cancer treatments, such as radiotherapy and chemotherapy. One notable alternative is the development of personalized vaccines based on cancer neoantigens. Moreover, Transformers are considered a revolutionary development in artificial intelligence with a significant impact on natural language processing (NLP) tasks and have been utilized in proteomics studies in recent years. In this context, we conducted a systematic literature review to investigate how Transformers are applied in each stage of the neoantigen detection process. Additionally, we mapped current pipelines and examined the results of clinical trials involving cancer vaccines.
Affiliation(s)
- Erika Sejje
- Universidad Nacional de San Agustín, Arequipa, Perú
- Yván Túpac
- Universidad Católica San Pablo, Arequipa, Perú
10. Bulashevska A, Nacsa Z, Lang F, Braun M, Machyna M, Diken M, Childs L, König R. Artificial intelligence and neoantigens: paving the path for precision cancer immunotherapy. Front Immunol 2024; 15:1394003. [PMID: 38868767 PMCID: PMC11167095 DOI: 10.3389/fimmu.2024.1394003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 05/13/2024] [Indexed: 06/14/2024] Open
Abstract
Cancer immunotherapy has witnessed rapid advancement in recent years, with a particular focus on neoantigens as promising targets for personalized treatments. The convergence of immunogenomics, bioinformatics, and artificial intelligence (AI) has propelled the development of innovative neoantigen discovery tools and pipelines. These tools have revolutionized our ability to identify tumor-specific antigens, providing the foundation for precision cancer immunotherapy. AI-driven algorithms can process extensive amounts of data, identify patterns, and make predictions that were once challenging to achieve. However, the integration of AI comes with its own set of challenges, leaving space for further research. With particular focus on computational approaches, in this article we explore the current landscape of neoantigen prediction, the fundamental concepts behind it, and the challenges and their potential solutions, providing a comprehensive overview of this rapidly evolving field.
Affiliation(s)
- Alla Bulashevska
  - Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Zsófia Nacsa
  - Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Franziska Lang
  - TRON - Translational Oncology at the University Medical Center of the Johannes Gutenberg University gGmbH, Mainz, Germany
- Markus Braun
  - Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Martin Machyna
  - Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Mustafa Diken
  - TRON - Translational Oncology at the University Medical Center of the Johannes Gutenberg University gGmbH, Mainz, Germany
- Liam Childs
  - Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Renate König
  - Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
11
Zhuang J, Huang X, Liu S, Gao W, Su R, Feng K. MulTFBS: A Spatial-Temporal Network with Multichannels for Predicting Transcription Factor Binding Sites. J Chem Inf Model 2024; 64:4322-4333. [PMID: 38733561 DOI: 10.1021/acs.jcim.3c02088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2024]
Abstract
Revealing the mechanisms that influence transcription factor binding specificity is the key to understanding gene regulation. In previous studies, DNA double helix structure and one-hot embedding have been used successfully to design computational methods for predicting transcription factor binding sites (TFBSs). However, although a DNA sequence can be viewed as a kind of biological language, word embedding representations from natural language processing have not been properly considered in TFBS prediction models. In our work, we integrate different types of features of DNA sequence to design a multichanneled deep learning framework, namely MulTFBS, in which three kinds of features are trained in separate channels: independent one-hot encoding; word embedding encoding, which can incorporate contextual information and extract the global features of the sequences; and double helix three-dimensional structural features. To extract high-level sequence information effectively, our deep learning framework uses a spatial-temporal network that combines convolutional neural networks and bidirectional long short-term memory networks with an attention mechanism. Compared with six state-of-the-art methods on 66 universal protein-binding microarray data sets of different transcription factors, MulTFBS performs best on all data sets in the regression tasks, with an average R² of 0.698 and an average PCC of 0.833, which are 5.4% and 3.2% higher, respectively, than the suboptimal method CRPTS. In addition, we evaluate the classification performance of MulTFBS for distinguishing bound or unbound regions on TF ChIP-seq data. The results show that our framework also performs well in the TFBS classification tasks.
Affiliation(s)
- Jujuan Zhuang
  - The School of Science, Dalian Maritime University, Dalian 116026, China
- Xinru Huang
  - The School of Science, Dalian Maritime University, Dalian 116026, China
- Shuhan Liu
  - The School of Science, Dalian Maritime University, Dalian 116026, China
- Wanquan Gao
  - The School of Science, Dalian Maritime University, Dalian 116026, China
- Rui Su
  - The School of Science, Dalian Maritime University, Dalian 116026, China
- Kexin Feng
  - The School of Science, Dalian Maritime University, Dalian 116026, China
12
Jiang M, Yu Z, Lan X. VitTCR: A deep learning method for peptide recognition prediction. iScience 2024; 27:109770. [PMID: 38711451 PMCID: PMC11070698 DOI: 10.1016/j.isci.2024.109770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Revised: 01/21/2024] [Accepted: 04/15/2024] [Indexed: 05/08/2024] Open
Abstract
This study introduces VitTCR, a predictive model based on the vision transformer (ViT) architecture, aimed at identifying interactions between T cell receptors (TCRs) and peptides, crucial for developing cancer immunotherapies and vaccines. VitTCR converts TCR-peptide interactions into numerical AtchleyMaps using Atchley factors for prediction, achieving AUROC (0.6485) and AUPR (0.6295) values. Benchmark analysis indicates VitTCR's performance is comparable to other models, with further comparative studies suggested to understand its effectiveness in varied contexts. Additionally, integrating a positional bias weight matrix (PBWM), derived from amino acid contact probabilities in structurally resolved pMHC-TCR complexes, slightly improves VitTCR's accuracy. The model's predictions show weak yet statistically significant correlations with immunological factors like T cell clonal expansion and activation percentages, underscoring the biological relevance of VitTCR's predictive capabilities. VitTCR emerges as a valuable computational tool for predicting TCR-peptide interactions, offering insights for immunotherapy and vaccine development.
Affiliation(s)
- Mengnan Jiang
  - School of Medicine, Tsinghua University, Beijing 100084, China
- Zilan Yu
  - School of Medicine, Tsinghua University, Beijing 100084, China
  - Centre for Life Sciences, Tsinghua University, Beijing 100084, China
- Xun Lan
  - School of Medicine, Tsinghua University, Beijing 100084, China
  - Centre for Life Sciences, Tsinghua University, Beijing 100084, China
  - Tsinghua-Peking Center for Life Sciences, MOE Key Laboratory of Tsinghua University, Beijing, China
  - MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
13
Giziński S, Preibisch G, Kucharski P, Tyrolski M, Rembalski M, Grzegorczyk P, Gambin A. Enhancing antigenic peptide discovery: Improved MHC-I binding prediction and methodology. Methods 2024; 224:1-9. [PMID: 38295891 DOI: 10.1016/j.ymeth.2024.01.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2023] [Revised: 12/30/2023] [Accepted: 01/16/2024] [Indexed: 02/05/2024] Open
Abstract
The Major Histocompatibility Complex (MHC) is a critical element of the vertebrate cellular immune system, responsible for presenting peptides derived from intracellular proteins. MHC-I presentation is pivotal in the immune response and holds considerable potential in the realms of vaccine development and cancer immunotherapy. This study delves into the limitations of current methods and benchmarks for MHC-I presentation. We introduce a novel benchmark designed to assess generalization properties and the reliability of models on unseen MHC molecules and peptides, with a focus on the Human Leukocyte Antigen (HLA), a specific subset of MHC genes present in humans. Finally, we introduce HLABERT, a pretrained language model that significantly outperforms previous methods on our benchmark and establishes a new state of the art on existing benchmarks.
Affiliation(s)
- Grzegorz Preibisch
  - Deepflare, Warsaw, Poland; University of Warsaw, Department of Mathematics Informatics and Mechanics, Warsaw, Poland.
- Anna Gambin
  - University of Warsaw, Department of Mathematics Informatics and Mechanics, Warsaw, Poland.
14
Zhang L, Song W, Zhu T, Liu Y, Chen W, Cao Y. ConvNeXt-MHC: improving MHC-peptide affinity prediction by structure-derived degenerate coding and the ConvNeXt model. Brief Bioinform 2024; 25:bbae133. [PMID: 38561979 PMCID: PMC10985285 DOI: 10.1093/bib/bbae133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 02/11/2024] [Accepted: 03/02/2024] [Indexed: 04/04/2024] Open
Abstract
Peptide binding to major histocompatibility complex (MHC) proteins plays a critical role in T-cell recognition and the specificity of the immune response. Experimental validation of such peptides is extremely resource-intensive. As a result, accurate computational prediction of binding peptides is highly important, particularly in the context of cancer immunotherapy applications, such as the identification of neoantigens. In recent years, there has been a significant need to continually improve the existing prediction methods to meet the demands of this field. We developed ConvNeXt-MHC, a method for predicting MHC-I-peptide binding affinity. It introduces a degenerate encoding approach to enhance well-established panspecific methods and integrates transfer learning and semi-supervised learning methods into the cutting-edge deep learning framework ConvNeXt. Comprehensive benchmark results demonstrate that ConvNeXt-MHC outperforms state-of-the-art methods in terms of accuracy. We expect that ConvNeXt-MHC will help us foster new discoveries in the field of immunoinformatics in the distant future. We constructed a user-friendly website at http://www.combio-lezhang.online/predict/, where users can access our data and application.
Affiliation(s)
- Le Zhang
  - College of Computer Science, Sichuan University, Chengdu 610065, China
- Wenkai Song
  - College of Computer Science, Sichuan University, Chengdu 610065, China
- Tinghao Zhu
  - College of Computer Science, Sichuan University, Chengdu 610065, China
  - Nuclear Power Institute of China, Chengdu 610213, China
- Yang Liu
  - Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, No. 29 Wangjiang Road, Chengdu 610065, China
- Wei Chen
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Yang Cao
  - Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, No. 29 Wangjiang Road, Chengdu 610065, China
15
Wang M, Lei C, Wang J, Li Y, Li M. TripHLApan: predicting HLA molecules binding peptides based on triple coding matrix and transfer learning. Brief Bioinform 2024; 25:bbae154. [PMID: 38600667 PMCID: PMC11006794 DOI: 10.1093/bib/bbae154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 02/16/2024] [Accepted: 03/13/2024] [Indexed: 04/12/2024] Open
Abstract
Human leukocyte antigen (HLA) recognizes foreign threats and triggers immune responses by presenting peptides to T cells. Computationally modeling the binding patterns between peptide and HLA is very important for the development of tumor vaccines. However, it is still a big challenge to accurately predict HLA molecules binding peptides. In this paper, we develop a new model, TripHLApan, for predicting HLA molecules binding peptides by integrating a triple coding matrix, BiGRU + Attention models, and a transfer learning strategy. We have found the main interaction site regions between HLA molecules and peptides, as well as the correlation between HLA encoding and binding motifs. Based on this discovery, we make the preprocessing and coding closer to the natural biological process. Besides, because the input is based on multiple types of features and the attention module is focused on the BiGRU hidden layer, TripHLApan learns more sequence-level binding information. The application of transfer learning strategies ensures the accuracy of prediction results for special lengths (peptides of length 8) and model scalability as data volumes grow. Compared with the current optimal models, TripHLApan exhibits strong predictive performance in various prediction environments with different positive and negative sample ratios. In addition, we validate the superiority and scalability of TripHLApan's predictive performance using additional recent data sets, ablation experiments and binding reconstitution ability in the samples of a melanoma patient. The results show that TripHLApan is a powerful tool for predicting the binding of HLA-I and HLA-II molecular peptides for the synthesis of tumor vaccines. TripHLApan is publicly available at https://github.com/CSUBioGroup/TripHLApan.git.
Affiliation(s)
- Meng Wang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Chuqi Lei
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yaohang Li
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
16
Borole P, Rajan A. Building trust in deep learning-based immune response predictors with interpretable explanations. Commun Biol 2024; 7:279. [PMID: 38448546 PMCID: PMC10917751 DOI: 10.1038/s42003-024-05968-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 02/23/2024] [Indexed: 03/08/2024] Open
Abstract
The ability to predict whether a peptide will be presented on Major Histocompatibility Complex (MHC) class I molecules has profound implications for designing vaccines. Numerous deep learning-based predictors for peptide presentation on MHC class I molecules exist with high levels of accuracy. However, these MHC class I predictors are treated as black-box functions, providing little insight into their decision making. To build trust in these predictors, it is crucial to understand the rationale behind their decisions with human-interpretable explanations. We present MHCXAI, eXplainable AI (XAI) techniques to help interpret the outputs from MHC class I predictors in terms of input peptide features. In our experiments, we explain the outputs of four state-of-the-art MHC class I predictors over a large dataset of peptides and MHC alleles. Additionally, we evaluate the reliability of the explanations by comparing against ground truth and checking their robustness. MHCXAI seeks to increase understanding of deep learning-based predictors in the immune response domain and build trust with validated explanations.
Affiliation(s)
- Piyush Borole
  - School of Informatics, University of Edinburgh, Informatics Forum, 10 Crichton St, Newington, Edinburgh, EH8 9AB, Scotland, UK.
- Ajitha Rajan
  - School of Informatics, University of Edinburgh, Informatics Forum, 10 Crichton St, Newington, Edinburgh, EH8 9AB, Scotland, UK.
17
Luo Y, Chen Y, Xie H, Zhu W, Zhang G. Interpretable CRISPR/Cas9 off-target activities with mismatches and indels prediction using BERT. Comput Biol Med 2024; 169:107932. [PMID: 38199209 DOI: 10.1016/j.compbiomed.2024.107932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 12/25/2023] [Accepted: 01/01/2024] [Indexed: 01/12/2024]
Abstract
Off-target effects of CRISPR/Cas9 can lead to suboptimal genome editing outcomes. Numerous deep learning-based approaches have achieved excellent performance for off-target prediction; however, few can predict off-target activities with both mismatches and indels between a single guide RNA (sgRNA) and target DNA sequence pair. In addition, data imbalance is a common pitfall for off-target prediction. Moreover, due to the complexity of genomic contexts, generating an interpretable model also remains challenging. To address these issues, we first developed a BERT-based model called CRISPR-BERT for enhancing the prediction of off-target activities with both mismatches and indels. Second, we proposed an adaptive batch-wise class balancing strategy to combat the noise that exists in imbalanced off-target data. Finally, we applied a visualization approach for investigating the generalizable nucleotide position-dependent patterns of sgRNA-DNA pairs for off-target activity. In our comprehensive comparison to existing methods on five mismatches-only datasets and two mismatches-and-indels datasets, CRISPR-BERT achieved the best performance in terms of AUROC and PRAUC. In addition, the visualization analysis demonstrated how implicit knowledge learned by CRISPR-BERT facilitates off-target prediction, which shows potential for model interpretability. Collectively, CRISPR-BERT provides an accurate and interpretable framework for off-target prediction and further contributes to sgRNA optimization in practical use for improved target specificity in CRISPR/Cas9 genome editing. The source code is available at https://github.com/BrokenStringx/CRISPR-BERT.
Affiliation(s)
- Ye Luo
  - College of Engineering, Shantou University, Shantou, 515063, China
- Yaowen Chen
  - College of Engineering, Shantou University, Shantou, 515063, China
- HuanZeng Xie
  - College of Engineering, Shantou University, Shantou, 515063, China
- Wentao Zhu
  - College of Engineering, Shantou University, Shantou, 515063, China
- Guishan Zhang
  - College of Engineering, Shantou University, Shantou, 515063, China.
18
Conev A, Fasoulis R, Hall-Swan S, Ferreira R, Kavraki LE. HLAEquity: Examining biases in pan-allele peptide-HLA binding predictors. iScience 2024; 27:108613. [PMID: 38188519 PMCID: PMC10770483 DOI: 10.1016/j.isci.2023.108613] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 11/13/2023] [Accepted: 11/29/2023] [Indexed: 01/09/2024] Open
Abstract
Peptide-HLA (pHLA) binding prediction is essential in screening peptide candidates for personalized peptide vaccines. Machine learning (ML) pHLA binding prediction tools are trained on vast amounts of data and are effective in screening peptide candidates. Most ML models report the ability to generalize to HLA alleles unseen during training ("pan-allele" models). However, the use of datasets with imbalanced allele content raises concerns about biased model performance. First, we examine the data bias of two ML-based pan-allele pHLA binding predictors. We find that the pHLA datasets overrepresent alleles from geographic populations of high-income countries. Second, we show that the identified data bias is perpetuated within ML models, leading to algorithmic bias and subpar performance for alleles expressed in low-income geographic populations. We draw attention to the potential therapeutic consequences of this bias, and we challenge the use of the term "pan-allele" to describe models trained with currently available public datasets.
Affiliation(s)
- Anja Conev
  - Department of Computer Science, Rice University, Houston, TX, USA
- Romanos Fasoulis
  - Department of Computer Science, Rice University, Houston, TX, USA
- Sarah Hall-Swan
  - Department of Computer Science, Rice University, Houston, TX, USA
- Rodrigo Ferreira
  - Department of Computer Science, Rice University, Houston, TX, USA
- Lydia E. Kavraki
  - Department of Computer Science, Rice University, Houston, TX, USA
19
Akbari Rokn Abadi S, Tabatabaei S, Koohi S. KDeep: a new memory-efficient data extraction method for accurately predicting DNA/RNA transcription factor binding sites. J Transl Med 2023; 21:727. [PMID: 37845681 PMCID: PMC10580661 DOI: 10.1186/s12967-023-04593-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Accepted: 10/04/2023] [Indexed: 10/18/2023] Open
Abstract
This paper addresses the crucial task of identifying DNA/RNA binding sites, which has implications in drug/vaccine design, protein engineering, and cancer research. Existing methods utilize complex neural network structures, diverse input types, and machine learning techniques for feature extraction. However, the growing volume of sequences poses processing challenges. This study introduces KDeep, employing a CNN-LSTM architecture with a novel encoding method called 2Lk. 2Lk enhances prediction accuracy, reduces memory consumption by up to 84%, reduces trainable parameters, and improves interpretability by approximately 79% compared to state-of-the-art approaches. KDeep offers a promising solution for accurate and efficient binding site prediction.
Affiliation(s)
- Somayyeh Koohi
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.
20
Li F, Wang C, Guo X, Akutsu T, Webb GI, Coin LJM, Kurgan L, Song J. ProsperousPlus: a one-stop and comprehensive platform for accurate protease-specific substrate cleavage prediction and machine-learning model construction. Brief Bioinform 2023; 24:bbad372. [PMID: 37874948 DOI: 10.1093/bib/bbad372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 08/30/2023] [Accepted: 09/29/2023] [Indexed: 10/26/2023] Open
Abstract
Proteases contribute to a broad spectrum of cellular functions. Given a relatively limited amount of experimental data, developing accurate sequence-based predictors of substrate cleavage sites facilitates a better understanding of protease functions and substrate specificity. While many protease-specific predictors of substrate cleavage sites were developed, these efforts are outpaced by the growth of the protease substrate cleavage data. In particular, since data for 100+ protease types are available and this number continues to grow, it becomes impractical to publish predictors for new protease types, and instead it might be better to provide a computational platform that helps users to quickly and efficiently build predictors that address their specific needs. To this end, we conceptualized, developed, tested and released a versatile bioinformatics platform, ProsperousPlus, that empowers users, even those with no programming or little bioinformatics background, to build fast and accurate predictors of substrate cleavage sites. ProsperousPlus facilitates the use of the rapidly accumulating substrate cleavage data to train, empirically assess and deploy predictive models for user-selected substrate types. Benchmarking tests on test datasets show that our platform produces predictors that on average exceed the predictive performance of current state-of-the-art approaches. ProsperousPlus is available as a webserver and a stand-alone software package at http://prosperousplus.unimelb-biotools.cloud.edu.au/.
Affiliation(s)
- Fuyi Li
  - College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
  - South Australian immunoGENomics Cancer Institute (SAiGENCI), Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5005, Australia
  - The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, VIC 3000, Australia
- Cong Wang
  - College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- Xudong Guo
  - College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Geoffrey I Webb
  - Monash Data Futures Institute, Monash University, VIC 3800, Australia
- Lachlan J M Coin
  - The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, VIC 3000, Australia
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Jiangning Song
  - Monash Data Futures Institute, Monash University, VIC 3800, Australia
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
21
Zhang G, Luo Y, Dai X, Dai Z. Benchmarking deep learning methods for predicting CRISPR/Cas9 sgRNA on- and off-target activities. Brief Bioinform 2023; 24:bbad333. [PMID: 37775147 DOI: 10.1093/bib/bbad333] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/27/2023] [Revised: 08/31/2023] [Accepted: 09/04/2023] [Indexed: 10/01/2023] Open
Abstract
In silico design of single guide RNA (sgRNA) plays a critical role in the clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9) system. Continuous efforts are aimed at improving sgRNA design with efficient on-target activity and reduced off-target mutations. In the last 5 years, an increasing number of deep learning-based methods have achieved breakthrough performance in predicting sgRNA on- and off-target activities. Nevertheless, it is worthwhile to systematically evaluate these methods for their predictive abilities. In this review, we conducted a systematic survey on the progress in prediction of on- and off-target editing. We investigated the performances of 10 mainstream deep learning-based on-target predictors using nine public datasets with different sample sizes. We found that in most scenarios, these methods showed greater predictive power on large- and medium-scale datasets than on small-scale datasets. In addition, we performed unbiased experiments to provide an in-depth comparison of eight representative approaches for off-target prediction on 12 publicly available datasets with various imbalanced ratios of positive/negative samples. Most methods showed excellent performance on balanced datasets but have much room for improvement on moderately and severely imbalanced datasets. This study provides comprehensive perspectives on CRISPR/Cas9 sgRNA on- and off-target activity prediction and improvement for method development.
Affiliation(s)
- Guishan Zhang
  - College of Engineering, Shantou University, Shantou 515063, China
- Ye Luo
  - College of Engineering, Shantou University, Shantou 515063, China
- Xianhua Dai
  - School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen 518107, China
  - Southern Marine Science and Engineering Guangdong Laboratory, Zhuhai 519000, China
- Zhiming Dai
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
  - Guangdong Province Key Laboratory of Big Data Analysis and Processing, Sun Yat-sen University, Guangzhou 510006, China
22
Pu T, Peddle A, Zhu J, Tejpar S, Verbandt S. Neoantigen identification: Technological advances and challenges. Methods Cell Biol 2023; 183:265-302. [PMID: 38548414 DOI: 10.1016/bs.mcb.2023.06.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2024]
Abstract
Neoantigens have emerged as promising targets for cutting-edge immunotherapies, such as cancer vaccines and adoptive cell therapy. These neoantigens are unique to tumors and arise exclusively from somatic mutations or non-genomic aberrations in tumor proteins. They encompass a wide range of alterations, including genomic mutations, post-transcriptomic variants, and viral oncoproteins. With the advancements in technology, the identification of immunogenic neoantigens has seen rapid progress, raising new opportunities for enhancing their clinical significance. Prediction of neoantigens necessitates the acquisition of high-quality samples and sequencing data, followed by mutation calling. Subsequently, the pipeline involves integrating various tools that can predict the expression, processing, binding, and recognition potential of neoantigens. However, the continuous improvement of computational tools is constrained by the availability of datasets which contain validated immunogenic neoantigens. This review article aims to provide a comprehensive summary of the current knowledge as well as limitations in neoantigen prediction and validation. Additionally, it delves into the origin and biological role of neoantigens, offering a deeper understanding of their significance in the field of cancer immunotherapy. This article thus seeks to contribute to the ongoing efforts to harness neoantigens as powerful weapons in the fight against cancer.
Affiliation(s)
- Ting Pu
  - Digestive Oncology Unit, KULeuven, Leuven, Belgium
- Jingjing Zhu
  - de Duve Institute, Université catholique de Louvain, Brussels, Belgium
23
Qu W, You R, Mamitsuka H, Zhu S. DeepMHCI: an anchor position-aware deep interaction model for accurate MHC-I peptide binding affinity prediction. Bioinformatics 2023; 39:btad551. [PMID: 37669154 PMCID: PMC10516514 DOI: 10.1093/bioinformatics/btad551] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/10/2023] [Revised: 08/06/2023] [Accepted: 09/04/2023] [Indexed: 09/07/2023] Open
Abstract
MOTIVATION Computationally predicting major histocompatibility complex class I (MHC-I) peptide binding affinity is an important problem in immunological bioinformatics, which is also crucial for the identification of neoantigens for personalized therapeutic cancer vaccines. Recent cutting-edge deep learning-based methods for this problem cannot achieve satisfactory performance, especially for non-9-mer peptides. This is because such methods generate the input by simply concatenating the two given sequences: a peptide and (the pseudo sequence of) an MHC class I molecule, which cannot precisely capture the anchor positions of the MHC binding motif for the peptides with variable lengths. We thus developed an anchor position-aware and high-performance deep model, DeepMHCI, with a position-wise gated layer and a residual binding interaction convolution layer. This allows the model to control the information flow in peptides to be aware of anchor positions and model the interactions between peptides and the MHC pseudo (binding) sequence directly with multiple convolutional kernels. RESULTS The performance of DeepMHCI has been thoroughly validated by extensive experiments on four benchmark datasets under various settings, such as 5-fold cross-validation, validation with the independent testing set, external HPV vaccine identification, and external CD8+ epitope identification. Experimental results with visualization of binding motifs demonstrate that DeepMHCI outperformed all competing methods, especially on non-9-mer peptides binding prediction. AVAILABILITY AND IMPLEMENTATION DeepMHCI is publicly available at https://github.com/ZhuLab-Fudan/DeepMHCI.
Affiliation(s)
- Wei Qu
  - Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
- Ronghui You
  - Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
- Hiroshi Mamitsuka
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto Prefecture 611-0011, Japan
  - Department of Computer Science, Aalto University, 00076 Espoo, Finland
- Shanfeng Zhu
  - Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
  - Shanghai Qi Zhi Institute, Shanghai 200030, China
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Fudan University, Ministry of Education, Shanghai 200433, China
  - Shanghai Key Lab of Intelligent Information Processing and Shanghai Institute of Artificial Intelligence Algorithm, Fudan University, Shanghai 200433, China
  - Zhangjiang Fudan International Innovation Center, Shanghai 200433, China
24
Kalemati M, Darvishi S, Koohi S. CapsNet-MHC predicts peptide-MHC class I binding based on capsule neural networks. Commun Biol 2023; 6:492. [PMID: 37147498 PMCID: PMC10162658 DOI: 10.1038/s42003-023-04867-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 11/02/2022] [Accepted: 04/24/2023] [Indexed: 05/07/2023] Open
Abstract
The Major Histocompatibility Complex (MHC) binds peptides derived from pathogens and presents them to killer T cells on the cell surface. Developing computational methods for accurate, fast, and explainable peptide-MHC binding prediction can facilitate immunotherapies and vaccine development. Various deep learning-based methods rely on separate feature extraction from the peptide and MHC sequences and ignore their pairwise binding information. This paper develops a capsule neural network-based method that efficiently captures peptide-MHC complex features to predict peptide-MHC class I binding. Various evaluations confirmed that our method outperforms the alternative methods and can provide accurate predictions when less data is available. Moreover, to provide precise insights into the results, we explored the essential features that contributed to the prediction. Since the simulation results were consistent with experimental studies, we concluded that our method can be used for accurate, rapid, and interpretable peptide-MHC binding prediction to assist biological therapies.
Affiliation(s)
- Mahmood Kalemati
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
- Saeid Darvishi
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
- Somayyeh Koohi
  - Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
|
25
|
Wan Y, Jiang Z. TransCrispr: Transformer Based Hybrid Model for Predicting CRISPR/Cas9 Single Guide RNA Cleavage Efficiency. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:1518-1528. [PMID: 36006888 DOI: 10.1109/tcbb.2022.3201631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
CRISPR/Cas9 is a widely used genome editing tool for site-directed modification of deoxyribonucleic acid (DNA) nucleotide sequences. However, accurately predicting and evaluating the on- and off-target effects of single guide RNA (sgRNA) is one of the key problems for the CRISPR/Cas9 system. Using computational methods to obtain high cell-specific sensitivity and specificity is a prerequisite for the optimal design of sgRNAs. Inspired by previous work, we found that sgRNA on-target knockout efficacy was not only related to the original sequence but also affected by important biological features. Hence, we introduce a novel approach called TransCrispr, which integrates Transformer and convolutional neural network (CNN) architectures to predict sgRNA knockout efficacy. First, we encode the sequence data and send the transformed sgRNA sequence, positional information, and biological features into the network as input. Then, the convolutional neural network automatically learns an appropriate feature representation for the sgRNA sequence and combines it with the positional information for self-attention learning in the Transformer. Finally, a regression score is generated by predicting biological features. Experiments on seven public datasets illustrate that TransCrispr outperforms state-of-the-art methods in terms of prediction accuracy and generalization ability.
|
26
|
Contemplating immunopeptidomes to better predict them. Semin Immunol 2023; 66:101708. [PMID: 36621290 DOI: 10.1016/j.smim.2022.101708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Revised: 12/16/2022] [Accepted: 12/20/2022] [Indexed: 01/09/2023]
Abstract
The identification of T-cell epitopes is key for a complete molecular understanding of immune recognition mechanisms in infectious diseases, autoimmunity and cancer. T-cell epitopes further provide targets for personalized vaccines and T-cell therapy, with several therapeutic applications in cancer immunotherapy and elsewhere. T-cell epitopes consist of short peptides displayed on Major Histocompatibility Complex (MHC) molecules. The recent advances in mass spectrometry (MS) based technologies to profile the ensemble of peptides displayed on MHC molecules - the so-called immunopeptidome - had a major impact on our understanding of antigen presentation and MHC ligands. On the one hand, these techniques enabled researchers to directly identify hundreds of thousands of peptides presented on MHC molecules, including some that elicited T-cell recognition. On the other hand, the data collected in these experiments revealed fundamental properties of antigen presentation pathways and significantly improved our ability to predict naturally presented MHC ligands and T-cell epitopes across the wide spectrum of MHC alleles found in human and other organisms. Here we review recent computational developments to analyze experimentally determined immunopeptidomes and harness these data to improve our understanding of antigen presentation and MHC binding specificities, as well as our ability to predict MHC ligands. We further discuss the strengths and limitations of the latest approaches to move beyond predictions of antigen presentation and tackle the challenges of predicting TCR recognition and immunogenicity.
|
27
|
Cai Y, Chen R, Gao S, Li W, Liu Y, Su G, Song M, Jiang M, Jiang C, Zhang X. Artificial intelligence applied in neoantigen identification facilitates personalized cancer immunotherapy. Front Oncol 2023; 12:1054231. [PMID: 36698417 PMCID: PMC9868469 DOI: 10.3389/fonc.2022.1054231] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 09/26/2022] [Accepted: 12/16/2022] [Indexed: 01/10/2023] Open
Abstract
The field of cancer neoantigen investigation has developed swiftly in the past decade. Predicting novel and true neoantigens from large multi-omics data has become a difficult but critical challenge. The rise of Artificial Intelligence (AI) and Machine Learning (ML) in biomedical applications has helped strengthen the current computational pipeline for neoantigen prediction. ML algorithms offer powerful tools to recognize the multidimensional nature of omics data and thereby extract the key neoantigen features that enable the successful discovery of new neoantigens. The present review outlines the significant technological progress of machine learning approaches, especially the new deep learning tools and pipelines, that were recently applied in neoantigen prediction. In this review article, we summarize the current state-of-the-art tools developed to predict neoantigens. The standard workflow includes calling genetic variants in paired tumor and blood samples, and rating the binding affinity between mutated peptide, MHC (I and II) and T cell receptor (TCR), followed by characterizing the immunogenicity of tumor epitopes. More specifically, we highlight the outstanding feature extraction tools and multi-layer neural network architectures in typical ML models. We note that more integrated neoantigen-predicting pipelines are now constructed with hybrid or combined ML algorithms instead of conventional machine learning models. In addition, the trends and challenges in further optimizing and integrating the existing pipelines are discussed.
Affiliation(s)
- Yu Cai
  - School of Medicine, Northwest University, Xi’an, Shaanxi, China
- Rui Chen
  - School of Medicine, Northwest University, Xi’an, Shaanxi, China
- Shenghan Gao
  - School of Medicine, Northwest University, Xi’an, Shaanxi, China
- Wenqing Li
  - School of Medicine, Northwest University, Xi’an, Shaanxi, China
- Yuru Liu
  - School of Medicine, Northwest University, Xi’an, Shaanxi, China
- Guodong Su
  - School of Medicine, Northwest University, Xi’an, Shaanxi, China
- Mingming Song
  - School of Medicine, Northwest University, Xi’an, Shaanxi, China
- Mengju Jiang
  - School of Medicine, Northwest University, Xi’an, Shaanxi, China
- Chao Jiang
  - Department of Neurology, The Second Affiliated Hospital of Xi’an Medical University, Xi’an, Shaanxi, China
  - Corresponding author
- Xi Zhang
  - School of Medicine, Northwest University, Xi’an, Shaanxi, China
  - Corresponding author
|
28
|
Luo Z, Lou L, Qiu W, Xu Z, Xiao X. Predicting N6-Methyladenosine Sites in Multiple Tissues of Mammals through Ensemble Deep Learning. Int J Mol Sci 2022; 23:15490. [PMID: 36555143 PMCID: PMC9778682 DOI: 10.3390/ijms232415490] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/03/2022] [Revised: 12/03/2022] [Accepted: 12/05/2022] [Indexed: 12/13/2022] Open
Abstract
N6-methyladenosine (m6A) is the most abundant modification within eukaryotic messenger RNA and plays an essential regulatory role in the control of cellular functions and gene expression. However, it remains an outstanding challenge to detect mRNA m6A transcriptome-wide at base resolution via experimental approaches, which are generally time-consuming and expensive. Developing computational methods is a good strategy for accurate in silico detection of m6A modification sites from the large amount of RNA sequence data. Unfortunately, existing computational models usually predict m6A sites in a single species only, without considering tissue-level differences within a species, and most of them are built on low-confidence data generated by m6A antibody immunoprecipitation (IP)-based sequencing, which restricts the reliability and generalizability of the proposed models. Here, we review recent advances in computational prediction of m6A sites and construct a new computational approach named im6APred that uses ensemble deep learning to accurately identify m6A sites based on high-confidence data in multiple tissues of mammals. Our model im6APred builds upon a comprehensive evaluation of multiple classification methods, including four traditional classification algorithms and three deep learning methods and their ensembles. The optimal base-classifier combinations are then chosen by five-fold cross-validation to achieve an effective stacked model. Our model im6APred achieves an area under the receiver operating characteristic curve (AUROC) in the range of 0.82-0.91 on independent tests, indicating that it can learn general methylation rules on RNA bases and generalize to transcriptome-wide m6A identification. Moreover, AUROCs in the range of 0.77-0.96 were achieved using cross-species/tissues validation on the benchmark dataset, demonstrating differences in predictive performance at the tissue level and the need for constructing tissue-specific models for m6A site prediction.
Affiliation(s)
- Zhaochun Xu
  - Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Xuan Xiao
  - Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
|
29
|
Yan J, Cai J, Zhang B, Wang Y, Wong DF, Siu SWI. Recent Progress in the Discovery and Design of Antimicrobial Peptides Using Traditional Machine Learning and Deep Learning. Antibiotics (Basel) 2022; 11:1451. [PMID: 36290108 PMCID: PMC9598685 DOI: 10.3390/antibiotics11101451] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 09/20/2022] [Revised: 10/11/2022] [Accepted: 10/13/2022] [Indexed: 11/16/2022] Open
Abstract
Antimicrobial resistance has become a critical global health problem due to the abuse of conventional antibiotics and the rise of multi-drug-resistant microbes. Antimicrobial peptides (AMPs) are a group of natural peptides that show promise as next-generation antibiotics due to their low toxicity to the host, their broad spectrum of biological activity (including antibacterial, antifungal, antiviral, and anti-parasitic activities), and their great therapeutic potential, such as anticancer and anti-inflammatory effects. Most importantly, AMPs kill bacteria by damaging cell membranes using multiple mechanisms of action rather than targeting a single molecule or pathway, making it difficult for bacterial drug resistance to develop. However, experimental approaches used to discover and design new AMPs are very expensive and time-consuming. In recent years, there has been considerable interest in applying in silico methods, including traditional machine learning (ML) and deep learning (DL) approaches, to drug discovery. While there are a few papers summarizing computational AMP prediction methods, none of them focused on DL methods. In this review, we aim to survey the latest AMP prediction methods achieved by DL approaches. First, the biological background of AMPs is introduced, then various feature encoding methods used to represent the features of peptide sequences are presented. We explain the most popular DL techniques and highlight the recent works based on them to classify AMPs and design novel peptide sequences. Finally, we discuss the limitations and challenges of AMP prediction.
Affiliation(s)
- Jielu Yan
  - PAMI Research Group, Department of Computer and Information Science, University of Macau, Taipa, Macau, China
- Jianxiu Cai
  - Faculty of Applied Sciences, Macao Polytechnic University, Macau, China
  - Institute of Science and Environment, University of Saint Joseph, Estr. Marginal da Ilha Verde, Macau, China
- Bob Zhang
  - PAMI Research Group, Department of Computer and Information Science, University of Macau, Taipa, Macau, China
- Yapeng Wang
  - Faculty of Applied Sciences, Macao Polytechnic University, Macau, China
- Derek F. Wong
  - NLP2CT Lab, Department of Computer and Information Science, University of Macau, Taipa, Macau, China
- Shirley W. I. Siu
  - Institute of Science and Environment, University of Saint Joseph, Estr. Marginal da Ilha Verde, Macau, China
  - School of Pharmaceutical Sciences, Universiti Sains Malaysia, Pulau Pinang 11800, Malaysia
|
30
|
An attention-based hybrid deep neural networks for accurate identification of transcription factor binding sites. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07502-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
31
|
Jiang L, Tang J, Guo F, Guo Y. Prediction of Major Histocompatibility Complex Binding with Bilateral and Variable Long Short Term Memory Networks. Biology 2022; 11:848. [PMID: 35741369 PMCID: PMC9220200 DOI: 10.3390/biology11060848] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2022] [Revised: 05/25/2022] [Accepted: 05/27/2022] [Indexed: 11/18/2022]
Abstract
Simple Summary: Major histocompatibility complex molecules are of significant biological and clinical importance due to their utility in immunotherapy. The prediction of potential MHC binding peptides can estimate a T-cell immune response. The variable length of existing MHC binding peptides creates difficulty for MHC binding prediction algorithms. Thus, we utilized a bilateral and variable long-short term memory neural network to address this specific problem and developed a novel MHC binding prediction tool.
Abstract: As an important part of immune surveillance, major histocompatibility complex (MHC) is a set of proteins that recognize foreign molecules. Computational prediction methods for MHC binding peptides have been developed. However, existing methods share the limitation of fixed peptide sequence length, which necessitates the training of models by peptide length or prediction with a length reduction technique. Using a bidirectional long short-term memory neural network, we constructed BVMHC, an MHC class I and II binding prediction tool that is independent of peptide length. The performance of BVMHC was compared to seven MHC class I prediction tools and three MHC class II prediction tools using eight performance criteria independently. BVMHC attained the best performance in three of the eight criteria for MHC class I, and the best performance in four of the eight criteria for MHC class II, including accuracy and AUC. Furthermore, models for non-human species were also trained using the same strategy and made available for applications in mice, chimpanzees, macaques, and rats. BVMHC is composed of a series of peptide length independent MHC class I and II binding predictors. Models from this study have been implemented in an online web portal for easy access and use.
Affiliation(s)
- Limin Jiang
  - Comprehensive Cancer Center, Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87131, USA
- Jijun Tang
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Fei Guo
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
  - Corresponding author
- Yan Guo
  - Comprehensive Cancer Center, Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87131, USA
  - Corresponding author
|
32
|
Wang F, Wang H, Wang L, Lu H, Qiu S, Zang T, Zhang X, Hu Y. MHCRoBERTa: pan-specific peptide-MHC class I binding prediction through transfer learning with label-agnostic protein sequences. Brief Bioinform 2022; 23:6571528. [PMID: 35443027 DOI: 10.1093/bib/bbab595] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/02/2021] [Revised: 12/14/2021] [Accepted: 12/23/2021] [Indexed: 11/14/2022] Open
Abstract
Predicting the binding of peptides to the major histocompatibility complex (MHC) plays a vital role in cancer immunotherapy. The success of AlphaFold in applying natural language processing (NLP) algorithms to protein secondary structure prediction has inspired us to explore the possibility of NLP methods in predicting peptide-MHC class I binding. Based on the above motivations, we propose MHCRoBERTa, a method based on the RoBERTa pre-training approach, for predicting the binding affinity between class I MHC and peptides. Analysis of the results on a benchmark dataset demonstrates that MHCRoBERTa can outperform other state-of-the-art prediction methods with an increase in the Spearman rank correlation coefficient (SRCC) value. Notably, our model gave a significant improvement on IC50 value. Our method achieved an SRCC value of 0.785 and an AUC value of 0.817. Our SRCC value is 14.3% higher than NetMHCpan3.0 (the second highest SRCC value among pan-specific methods) and is 3% higher than MHCflurry (the second highest SRCC value among all methods). The AUC value is also better than that of any other pan-specific method. Moreover, we visualize the multi-head self-attention for the token representation across the layers and heads. Through the analysis of the representation of each layer and head, we can show whether the model has learned the syntax and semantics necessary to perform the prediction task well. All these results demonstrate that our model can accurately predict peptide-MHC class I binding affinity and that MHCRoBERTa is a powerful tool for screening potential neoantigens for cancer immunotherapy. MHCRoBERTa is available as open source software on GitHub (https://github.com/FuxuWang/MHCRoBERTa).
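The two metrics quoted in this abstract, the Spearman rank correlation coefficient and the AUC, can be sketched in plain Python. This is a generic illustration with toy numbers; the function names and data are assumptions and this is not the evaluation code of MHCRoBERTa.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data: predicted scores, measured affinities, and binder labels.
predicted = [0.9, 0.8, 0.4, 0.3, 0.1]
measured = [0.85, 0.9, 0.5, 0.2, 0.3]
labels = [1, 0, 1, 0, 0]
print(spearman(predicted, measured), auc(predicted, labels))
```

SRCC measures how well the predicted ordering matches the measured affinities, while AUC only asks whether binders are scored above non-binders, which is why abstracts such as this one report both.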
Affiliation(s)
- Fuxu Wang
  - Center for Bioinformatics, Faculty of computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Haoyan Wang
  - Center for Bioinformatics, Faculty of computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Lizhuang Wang
  - General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Haoyu Lu
  - Center for Bioinformatics, school of life science and technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Shizheng Qiu
  - Center for Bioinformatics, school of life science and technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Tianyi Zang
  - Cisco Research, NLP team, California, United States
- Xinjun Zhang
  - Center for Bioinformatics, Faculty of computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
- Yang Hu
  - Center for Bioinformatics, Faculty of computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
|
33
|
A transformer-based model to predict peptide–HLA class I binding and optimize mutated peptides for vaccine design. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00459-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/25/2022]
|
34
|
Borden ES, Buetow KH, Wilson MA, Hastings KT. Cancer Neoantigens: Challenges and Future Directions for Prediction, Prioritization, and Validation. Front Oncol 2022; 12:836821. [PMID: 35311072 PMCID: PMC8929516 DOI: 10.3389/fonc.2022.836821] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 12/15/2021] [Accepted: 02/07/2022] [Indexed: 12/16/2022] Open
Abstract
Prioritization of immunogenic neoantigens is key to enhancing cancer immunotherapy through the development of personalized vaccines, adoptive T cell therapy, and the prediction of response to immune checkpoint inhibition. Neoantigens are tumor-specific proteins that allow the immune system to recognize and destroy a tumor. Cancer immunotherapies, such as personalized cancer vaccines, adoptive T cell therapy, and immune checkpoint inhibition, rely on an understanding of the patient-specific neoantigen profile in order to guide personalized therapeutic strategies. Genomic approaches to predicting and prioritizing immunogenic neoantigens are rapidly expanding, raising new opportunities to advance these tools and enhance their clinical relevance. Predicting neoantigens requires acquisition of high-quality samples and sequencing data, followed by variant calling and variant annotation. Subsequently, prioritizing which of these neoantigens may elicit a tumor-specific immune response requires application and integration of tools to predict the expression, processing, binding, and recognition potentials of the neoantigen. Finally, improvement of the computational tools is held in constant tension with the availability of datasets with validated immunogenic neoantigens. The goal of this review article is to summarize the current knowledge and limitations in neoantigen prediction, prioritization, and validation and propose future directions that will improve personalized cancer treatment.
Affiliation(s)
- Elizabeth S Borden
  - Department of Basic Medical Sciences, College of Medicine-Phoenix, University of Arizona, Phoenix, AZ, United States
  - Department of Research and Internal Medicine (Dermatology), Phoenix Veterans Affairs Health Care System, Phoenix, AZ, United States
- Kenneth H Buetow
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
  - Center for Evolution and Medicine, Arizona State University, Tempe, AZ, United States
- Melissa A Wilson
  - School of Life Sciences, Arizona State University, Tempe, AZ, United States
  - Center for Evolution and Medicine, Arizona State University, Tempe, AZ, United States
- Karen Taraszka Hastings
  - Department of Basic Medical Sciences, College of Medicine-Phoenix, University of Arizona, Phoenix, AZ, United States
  - Department of Research and Internal Medicine (Dermatology), Phoenix Veterans Affairs Health Care System, Phoenix, AZ, United States
|
35
|
Dickinson Q, Meyer JG. Positional SHAP (PoSHAP) for Interpretation of machine learning models trained from biological sequences. PLoS Comput Biol 2022; 18:e1009736. [PMID: 35089914 PMCID: PMC8797255 DOI: 10.1371/journal.pcbi.1009736] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/16/2021] [Accepted: 12/09/2021] [Indexed: 11/29/2022] Open
Abstract
Machine learning with multi-layered artificial neural networks, also known as "deep learning," is effective for making biological predictions. However, model interpretation is challenging, especially for sequential input data used with recurrent neural network architectures. Here, we introduce a framework called "Positional SHAP" (PoSHAP) to interpret models trained from biological sequences by utilizing SHapley Additive exPlanations (SHAP) to generate positional model interpretations. We demonstrate this using three long short-term memory (LSTM) regression models that predict peptide properties, including binding affinity to major histocompatibility complexes (MHC), and collisional cross section (CCS) measured by ion mobility spectrometry. Interpretation of these models with PoSHAP reproduced MHC class I (rhesus macaque Mamu-A1*001 and human A*11:01) peptide binding motifs, reflected known properties of peptide CCS, and provided new insights into interpositional dependencies of amino acid interactions. PoSHAP should have widespread utility for interpreting a variety of models trained from biological sequences.
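The idea of a positional interpretation can be illustrated with a minimal sketch. It assumes that per-residue attribution values (for example SHAP values computed elsewhere) are already available for each peptide, and it averages them per (position, residue) pair to obtain a motif-like table. The function name, the toy sequences, and the attribution numbers are hypothetical; this is not the PoSHAP implementation.

```python
from collections import defaultdict


def positional_means(sequences, attributions):
    """Average attribution values for every (0-based position, residue) pair."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, attr in zip(sequences, attributions):
        if len(seq) != len(attr):
            raise ValueError("each sequence needs one attribution per residue")
        for pos, (residue, value) in enumerate(zip(seq, attr)):
            sums[(pos, residue)] += value
            counts[(pos, residue)] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Toy inputs: three 3-mers with made-up attribution values.
seqs = ["AKL", "AKV", "GKL"]
attrs = [[0.1, 0.5, 0.2], [0.3, 0.7, -0.1], [-0.2, 0.6, 0.4]]
table = positional_means(seqs, attrs)
for key in sorted(table):
    print(key, round(table[key], 3))
```

A large positive mean at one position for one residue would suggest a preferred residue there, which is how a binding motif can be read from such a table.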
Affiliation(s)
- Quinn Dickinson
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, Wisconsin
- Jesse G. Meyer
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, Wisconsin
|
36
|
Li G, Iyer B, Prasath VBS, Ni Y, Salomonis N. DeepImmuno: deep learning-empowered prediction and generation of immunogenic peptides for T-cell immunity. Brief Bioinform 2021; 22:bbab160. [PMID: 34009266 PMCID: PMC8135853 DOI: 10.1093/bib/bbab160] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 12/24/2020] [Revised: 03/26/2021] [Accepted: 04/05/2021] [Indexed: 02/07/2023] Open
Abstract
Cytolytic T-cells play an essential role in the adaptive immune system by seeking out, binding and killing cells that present foreign antigens on their surface. An improved understanding of T-cell immunity will greatly aid in the development of new cancer immunotherapies and vaccines for life-threatening pathogens. Central to the design of such targeted therapies are computational methods to predict non-native peptides that elicit a T-cell response; however, we currently lack accurate immunogenicity inference methods. Another challenge is the ability to accurately simulate immunogenic peptides for specific human leukocyte antigen alleles, both for synthetic biological applications and to augment real training datasets. Here, we propose a beta-binomial distribution approach to derive peptide immunogenic potential from sequence alone. We conducted systematic benchmarking of five traditional machine learning models (ElasticNet, K-nearest neighbors, support vector machine, Random Forest and AdaBoost) and three deep learning models (convolutional neural network (CNN), Residual Net and graph neural network) using three independent, previously validated immunogenic peptide collections (dengue virus, cancer neoantigen and SARS-CoV-2). We chose the CNN as the best prediction model, based on its adaptivity for small and large datasets and performance relative to existing methods. In addition to outperforming two highly used immunogenicity prediction algorithms, DeepImmuno-CNN correctly predicts which residues are most important for T-cell antigen recognition and predicts novel impacts of SARS-CoV-2 variants. Our independent generative adversarial network (GAN) approach, DeepImmuno-GAN, was further able to accurately simulate immunogenic peptides with physicochemical properties and immunogenicity predictions similar to those of real antigens. We provide DeepImmuno-CNN as source code and an easy-to-use web interface.
Affiliation(s)
- Guangyuan Li
  - University of Cincinnati, 3333 Burnet Ave, MLC7024, Cincinnati, OH 45267, USA
- V B Surya Prasath
  - Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, USA
- Yizhao Ni
  - Cincinnati Children’s Hospital Medical Center, USA
|
37
|
Xu Y, Su GH, Ma D, Xiao Y, Shao ZM, Jiang YZ. Technological advances in cancer immunity: from immunogenomics to single-cell analysis and artificial intelligence. Signal Transduct Target Ther 2021; 6:312. [PMID: 34417437 PMCID: PMC8377461 DOI: 10.1038/s41392-021-00729-7] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 02/25/2021] [Revised: 07/06/2021] [Accepted: 07/18/2021] [Indexed: 02/07/2023] Open
Abstract
Immunotherapies play critical roles in cancer treatment. However, given that only a few patients respond to immune checkpoint blockades and other immunotherapeutic strategies, more novel technologies are needed to decipher the complicated interplay between tumor cells and the components of the tumor immune microenvironment (TIME). Tumor immunomics refers to the integrated study of the TIME using immunogenomics, immunoproteomics, immune-bioinformatics, and other multi-omics data reflecting the immune states of tumors, which has relied on the rapid development of next-generation sequencing. High-throughput genomic and transcriptomic data may be utilized for calculating the abundance of immune cells and predicting tumor antigens, referring to immunogenomics. However, as bulk sequencing represents the average characteristics of a heterogeneous cell population, it fails to distinguish distinct cell subtypes. Single-cell-based technologies enable better dissection of the TIME through precise immune cell subpopulation and spatial architecture investigations. In addition, radiomics and digital pathology-based deep learning models largely contribute to research on cancer immunity. These artificial intelligence technologies have performed well in predicting response to immunotherapy, with profound significance in cancer therapy. In this review, we briefly summarize conventional and state-of-the-art technologies in the field of immunogenomics, single-cell and artificial intelligence, and present prospects for future research.
Affiliation(s)
- Ying Xu
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Guan-Hua Su
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Ding Ma
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Yi Xiao
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China.
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China.
- Zhi-Ming Shao
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China.
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China.
- Institutes of Biomedical Sciences, Fudan University, Shanghai, China.
- Yi-Zhou Jiang
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China.
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China.
|
38
|
Moris P, De Pauw J, Postovskaya A, Gielis S, De Neuter N, Bittremieux W, Ogunjimi B, Laukens K, Meysman P. Current challenges for unseen-epitope TCR interaction prediction and a new perspective derived from image classification. Brief Bioinform 2021; 22:bbaa318. [PMID: 33346826 PMCID: PMC8294552 DOI: 10.1093/bib/bbaa318] [Citation(s) in RCA: 69] [Impact Index Per Article: 17.3] [Indexed: 01/09/2023] Open
Abstract
The prediction of epitope recognition by T-cell receptors (TCRs) has seen many advancements in recent years, with several methods now available that can predict recognition for a specific set of epitopes. However, the generic case of evaluating all possible TCR-epitope pairs remains challenging, mainly due to the high diversity of the interacting sequences and the limited amount of currently available training data. In this work, we provide an overview of the current state of this unsolved problem. First, we examine appropriate validation strategies to accurately assess the generalization performance of generic TCR-epitope recognition models when applied to both seen and unseen epitopes. In addition, we present a novel feature representation approach, which we call ImRex (interaction map recognition). This approach is based on the pairwise combination of physicochemical properties of the individual amino acids in the CDR3 and epitope sequences, which provides a convolutional neural network with the combined representation of both sequences. Lastly, we highlight various challenges that are specific to TCR-epitope data and that can adversely affect model performance. These include the issue of selecting negative data, the imbalanced epitope distribution of curated TCR-epitope datasets and the potential exchangeability of TCR alpha and beta chains. Our results indicate that while extrapolation to unseen epitopes remains a difficult challenge, ImRex makes this feasible for a subset of epitopes that are not too dissimilar from the training data. We show that appropriate feature engineering methods and rigorous benchmark standards are required to create and validate TCR-epitope predictive models.
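The pairwise feature representation described in this abstract can be illustrated with a minimal sketch. It is not the ImRex implementation: it assumes a single physicochemical property (the Kyte-Doolittle hydropathy scale) and assumes the absolute difference as the pairwise operator; the function name `interaction_map` and the example sequences are illustrative only.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def interaction_map(cdr3, epitope, scale=KD_HYDROPATHY):
    """Return a len(cdr3) x len(epitope) matrix (list of rows).

    Cell [i][j] holds the absolute difference between the property value of
    CDR3 residue i and epitope residue j. The choice of operator is an
    assumption made for this sketch. Non-standard residues raise KeyError.
    """
    return [[abs(scale[a] - scale[b]) for b in epitope] for a in cdr3]


if __name__ == "__main__":
    # Illustrative sequences only.
    for row in interaction_map("CASSL", "GILGF"):
        print(" ".join(f"{value:4.1f}" for value in row))
```

Stacking one such matrix per property would give a multi-channel, image-like input of the kind a convolutional network consumes.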
MESH Headings
- Animals
- Complementarity Determining Regions/genetics
- Complementarity Determining Regions/immunology
- Epitopes, T-Lymphocyte/genetics
- Epitopes, T-Lymphocyte/immunology
- Humans
- Macaca mulatta
- Mice
- Models, Genetic
- Models, Immunological
- Receptors, Antigen, T-Cell, alpha-beta/genetics
- Receptors, Antigen, T-Cell, alpha-beta/immunology
Affiliation(s)
- Pieter Meysman
- Corresponding author: Pieter Meysman, Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, 2020, Belgium. E-mail:
|
39
|
Feng P, Zeng J, Ma J. Predicting MHC-peptide binding affinity by differential boundary tree. Bioinformatics 2021; 37:i254-i261. [PMID: 34252932 PMCID: PMC8275335 DOI: 10.1093/bioinformatics/btab312] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/24/2021] [Accepted: 04/27/2021] [Indexed: 11/24/2022] Open
Abstract
Motivation: The prediction of the binding between peptides and major histocompatibility complex (MHC) molecules plays an important role in neoantigen identification. Although many computational methods have been developed to address this problem, they produce high false-positive rates in practical applications because, in most cases, a single-residue mutation can substantially alter the binding affinity of a peptide to MHC, and conventional deep learning methods cannot identify such changes. Results: We developed a differential boundary tree-based model, named DBTpred, to address this problem. We demonstrated that DBTpred can accurately predict MHC class I binding affinity compared to state-of-the-art deep learning methods. We also presented a parallel training algorithm to accelerate training and inference, which enables DBTpred to be applied to large datasets. By investigating the statistical properties of differential boundary trees and the prediction paths to test samples, we revealed that DBTpred can provide an intuitive interpretation and possible hints for detecting important residue mutations that can largely influence binding affinity. Availability and implementation: The DBTpred package is implemented in Python and freely available at https://github.com/fpy94/DBT. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Peiyuan Feng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing, China
- Jianzhu Ma
- Institute for Artificial Intelligence, Peking University, China
|
40
|
Jiang L, Yu H, Li J, Tang J, Guo Y, Guo F. Predicting MHC class I binder: existing approaches and a novel recurrent neural network solution. Brief Bioinform 2021; 22:6299205. [PMID: 34131696 DOI: 10.1093/bib/bbab216] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/05/2021] [Revised: 05/14/2021] [Accepted: 05/17/2021] [Indexed: 01/04/2023] Open
Abstract
Major histocompatibility complex (MHC) possesses important research value in the treatment of complex human diseases. A plethora of computational tools has been developed to predict MHC class I binders. Here, we comprehensively reviewed 27 up-to-date MHC I binding prediction tools developed over the last decade, thoroughly evaluating feature representation methods, prediction algorithms and model training strategies on a benchmark dataset from Immune Epitope Database. A common limitation was identified during the review that all existing tools can only handle a fixed peptide sequence length. To overcome this limitation, we developed a bilateral and variable long short-term memory (BVLSTM)-based approach, named BVLSTM-MHC. It is the first variable-length MHC class I binding predictor. In comparison to the 10 mainstream prediction tools on an independent validation dataset, BVLSTM-MHC achieved the best performance in six out of eight evaluated metrics. A web server based on the BVLSTM-MHC model was developed to enable accurate and efficient MHC class I binder prediction in human, mouse, macaque and chimpanzee.
Affiliation(s)
- Limin Jiang
- Comprehensive cancer center, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Hui Yu
- Comprehensive cancer center, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Jiawei Li
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jijun Tang
- Department of Computer Science, University of South Carolina, SC, USA; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yan Guo
- Comprehensive cancer center, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA
- Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha, China
|
41
|
Cheng J, Bendjama K, Rittner K, Malone B. BERTMHC: Improved MHC-peptide class II interaction prediction with transformer and multiple instance learning. Bioinformatics 2021; 37:4172-4179. [PMID: 34096999 PMCID: PMC9502151 DOI: 10.1093/bioinformatics/btab422] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 11/24/2020] [Revised: 05/17/2021] [Accepted: 06/04/2021] [Indexed: 11/12/2022] Open
Abstract
Motivation: Increasingly comprehensive characterization of cancer-associated genetic alterations has paved the way for the development of highly specific therapeutic vaccines. Precisely predicting the binding and presentation of peptides to major histocompatibility complex (MHC) alleles is an important step toward such therapies. Recent data suggest that presentation of both class I and class II epitopes is critical for the induction of a sustained, effective immune response. However, the prediction performance for MHC class II has been limited compared to class I. Results: We present a transformer neural network model which leverages self-supervised pretraining from a large corpus of protein sequences. We also propose a multiple instance learning (MIL) framework to deconvolve mass spectrometry data where multiple potential MHC alleles may have presented each peptide. We show that pretraining boosted the performance for these tasks. Combining pretraining and the novel MIL approach, our model outperforms state-of-the-art models based on peptide and MHC sequence only for both binding and cell surface presentation predictions. Availability and implementation: Our source code is available at https://github.com/s6juncheng/BERTMHC under a noncommercial license. A webserver is available at https://bertmhc.privacy.nlehd.de/. Supplementary information: Supplementary data are available at Bioinformatics online.
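The multiple instance learning idea mentioned in this abstract can be sketched as follows. This is a generic illustration, not the BERTMHC code: it assumes that each peptide forms a "bag" of (peptide, allele) instances, that a hypothetical upstream model has already scored each instance, and that the bag is aggregated by max pooling. The aggregation actually used in the paper may differ.

```python
def bag_score(instance_scores):
    """Aggregate per-allele scores for one peptide by max pooling.

    instance_scores maps an allele name to a predicted presentation
    probability for that (peptide, allele) pair. The scores are assumed
    to come from some upstream model (hypothetical here).
    Returns (most_likely_allele, bag_level_score).
    """
    if not instance_scores:
        raise ValueError("a bag must contain at least one allele")
    best_allele = max(instance_scores, key=instance_scores.get)
    return best_allele, instance_scores[best_allele]


if __name__ == "__main__":
    # Illustrative numbers only; a sample typically expresses several alleles.
    scores = {"DRB1*01:01": 0.12, "DRB1*15:01": 0.81, "DRB1*04:01": 0.33}
    print(bag_score(scores))
```

Under max pooling, a peptide counts as presented if at least one allele in the sample is predicted to present it, which also yields an allele assignment as a by-product.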
Affiliation(s)
- Jun Cheng
- NEC Laboratories Europe GmbH Kurfuersten-Anlage 36, 69115 Heidelberg, Germany
- Kaïdre Bendjama
- Transgene, Boulevard Gonthier d'Andernach, 67400 Illkirch-Graffenstaden, France
- Karola Rittner
- Transgene, Boulevard Gonthier d'Andernach, 67400 Illkirch-Graffenstaden, France
- Brandon Malone
- NEC Laboratories Europe GmbH Kurfuersten-Anlage 36, 69115 Heidelberg, Germany
|
42
|
Chen Z, Min MR, Ning X. Ranking-Based Convolutional Neural Network Models for Peptide-MHC Class I Binding Prediction. Front Mol Biosci 2021; 8:634836. [PMID: 34079815 PMCID: PMC8165219 DOI: 10.3389/fmolb.2021.634836] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/29/2020] [Accepted: 02/16/2021] [Indexed: 01/01/2023] Open
Abstract
T-cell receptors can recognize foreign peptides bound to major histocompatibility complex (MHC) class-I proteins, and thus trigger the adaptive immune response. Therefore, identifying peptides that can bind to MHC class-I molecules plays a vital role in the design of peptide vaccines. Many computational methods, for example the state-of-the-art allele-specific method MHCflurry, have been developed to predict the binding affinities between peptides and MHC molecules. In this manuscript, we develop two allele-specific Convolutional Neural Network-based methods named ConvM and SpConvM to tackle the binding prediction problem. Specifically, we formulate the problem as optimizing the rankings of peptide-MHC bindings via ranking-based learning objectives. Such optimization is more robust and tolerant to the measurement inaccuracy of binding affinities, and therefore enables more accurate prioritization of binding peptides. In addition, we develop a new position encoding method in ConvM and SpConvM to better identify the most important amino acids for the binding events. We conduct a comprehensive set of experiments using the latest Immune Epitope Database (IEDB) datasets. Our experimental results demonstrate that our models significantly outperform the state-of-the-art methods including MHCflurry, with an average percentage improvement of 6.70% on AUC and 17.10% on ROC5 across 128 alleles.
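A ranking-based learning objective of the general kind this abstract refers to can be sketched with a pairwise hinge loss. This is a generic formulation chosen for illustration, not necessarily the objective used in ConvM or SpConvM; the function name, the `margin` parameter and the convention that a larger affinity label means a stronger binder are all assumptions.

```python
def pairwise_hinge_loss(scores, affinities, margin=1.0):
    """Mean hinge loss over all ordered pairs (i, j) with affinities[i] > affinities[j].

    scores     : model outputs, one per peptide (higher = predicted stronger binder)
    affinities : measured binding labels, assumed here to be scaled so that
                 higher = stronger binder
    A pair contributes zero loss once the stronger binder is scored at least
    `margin` above the weaker one, so only the ordering matters, not the
    exact affinity values.
    """
    if len(scores) != len(affinities):
        raise ValueError("scores and affinities must have the same length")
    total, n_pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if affinities[i] > affinities[j]:
                total += max(0.0, margin - (scores[i] - scores[j]))
                n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


if __name__ == "__main__":
    # Illustrative values: the second call has the ranking inverted.
    print(pairwise_hinge_loss([2.0, 0.0], [0.9, 0.1]))
    print(pairwise_hinge_loss([0.0, 2.0], [0.9, 0.1]))
```

Because the loss depends only on score differences within pairs, moderate noise in the measured affinities changes it only when that noise flips the order of a pair.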
Affiliation(s)
- Ziqi Chen
- Computer Science and Engineering Department, The Ohio State University, Columbus, OH, United States
- Xia Ning
- Computer Science and Engineering Department, The Ohio State University, Columbus, OH, United States
- Biomedical Informatics Department, The Ohio State University, Columbus, OH, United States
- Translational Data Analytics Institute, The Ohio State University, Columbus, OH, United States
|
43
|
Yang X, Zhao L, Wei F, Li J. DeepNetBim: deep learning model for predicting HLA-epitope interactions based on network analysis by harnessing binding and immunogenicity information. BMC Bioinformatics 2021; 22:231. [PMID: 33952199 PMCID: PMC8097772 DOI: 10.1186/s12859-021-04155-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 12/11/2020] [Accepted: 04/27/2021] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Epitope prediction is a useful approach in cancer immunology and immunotherapy. Many computational methods, including machine learning and network analysis, have been developed quickly for such purposes. However, regarding clinical applications, the existing tools are insufficient because few of the predicted binding molecules are immunogenic. Hence, to develop more potent and effective vaccines, it is important to understand binding and immunogenic potential. Here, we observed that the interactive association constituted by human leukocyte antigen (HLA)-peptide pairs can be regarded as a network in which each HLA and peptide is taken as a node. We asked whether this network could detect the essential interactive propensities embedded in HLA-peptide pairs. Thus, we developed a network-based deep learning method called DeepNetBim by harnessing binding and immunogenic information to predict HLA-peptide interactions. RESULTS Quantitative class I HLA-peptide binding data and qualitative immunogenic data (including data generated from T cell activation assays, major histocompatibility complex (MHC) binding assays and MHC ligand elution assays) were retrieved from the Immune Epitope Database. The weighted HLA-peptide binding network and immunogenic network were integrated into a network-based deep learning algorithm constituted by a convolutional neural network and an attention mechanism. The results showed that the integration of network centrality metrics increased the power of both binding and immunogenicity predictions, while the new model significantly outperformed those that did not include network features and those with shuffled networks. Applied to benchmark and independent datasets, DeepNetBim achieved an AUC score of 93.74% in HLA-peptide binding prediction, outperforming 11 state-of-the-art relevant models. Furthermore, the performance enhancement of the combined model, which filtered out negative immunogenic predictions, was confirmed on neoantigen identification by an increase in both positive predictive value (PPV) and the proportion of neoantigen recognition. CONCLUSIONS We developed a network-based deep learning method called DeepNetBim as a pan-specific epitope prediction tool. It extracted the attributes of the network as new features from HLA-peptide binding and immunogenic models. We observed that not only did the DeepNetBim binding model outperform other updated methods, but the combination of our two models also showed better performance. This indicates further applications in clinical practice.
Affiliation(s)
- Xiaoyun Yang
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Liyuan Zhao
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Fang Wei
- Sheng Yushou Center of Cell Biology and Immunology, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Jing Li
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China.
|
44
|
Venkatesh G, Grover A, Srinivasaraghavan G, Rao S. MHCAttnNet: predicting MHC-peptide bindings for MHC alleles classes I and II using an attention-based deep neural model. Bioinformatics 2021; 36:i399-i406. [PMID: 32657386 PMCID: PMC7355292 DOI: 10.1093/bioinformatics/btaa479] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/20/2022] Open
Abstract
Motivation: Accurate prediction of binding between a major histocompatibility complex (MHC) allele and a peptide plays a major role in the synthesis of personalized cancer vaccines. The immune system struggles to distinguish between a cancerous and a healthy cell. In a cancer patient who has a particular MHC allele, only those peptides that bind to the MHC allele with high affinity help the immune system recognize the cancerous cells. Results: MHCAttnNet is a deep neural model that uses an attention mechanism to capture the relevant subsequences of the amino acid sequences of peptides and MHC alleles. It then uses these to accurately predict MHC-peptide binding. MHCAttnNet achieves an AUC-PRC score of 94.18% with 161 class I MHC alleles, which outperforms the state-of-the-art models for this task. MHCAttnNet also achieves a better F1-score in comparison to the state-of-the-art models while covering a larger number of class II MHC alleles. The attention mechanism used by MHCAttnNet provides a heatmap over the amino acids, thus indicating the important subsequences present in the amino acid sequence. This approach also allows us to focus on a much smaller number of relevant trigrams corresponding to the amino acid sequence of an MHC allele, from 9251 possible trigrams to about 258. This significantly reduces the number of amino acid subsequences that need to be clinically tested. Availability and implementation: The data and source code are available at https://github.com/gopuvenkat/MHCAttnNet.
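The trigram view of a sequence and the attention heatmap described in this abstract can be illustrated with a short sketch. It is not the MHCAttnNet code: the raw scores below stand in for values a trained attention layer would produce, and the helper names `trigrams`, `softmax` and `top_trigrams` are hypothetical.

```python
import math


def trigrams(seq):
    """Return all overlapping length-3 subsequences of seq, in order."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


def softmax(values):
    """Numerically stable softmax; the result sums to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def top_trigrams(seq, raw_scores, k=3):
    """Rank trigrams by attention weight and keep the k highest.

    raw_scores holds one unnormalised score per trigram; in a real model
    these would come from a trained attention layer (assumed here).
    """
    grams = trigrams(seq)
    if len(grams) != len(raw_scores):
        raise ValueError("need exactly one score per trigram")
    if not grams:
        return []
    weights = softmax(raw_scores)
    ranked = sorted(zip(grams, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Illustrative sequence and scores only.
    for gram, weight in top_trigrams("GSHSMRY", [0.1, 2.0, 0.3, 1.5, 0.2], k=2):
        print(gram, round(weight, 3))
```

Keeping only the highest-weighted trigrams is one simple way to obtain a shortlist of subsequences for follow-up, in the spirit of the reduction the abstract reports.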
Affiliation(s)
- Aayush Grover
- International Institute of Information Technology Bangalore, Bangalore 560100, India
- G Srinivasaraghavan
- International Institute of Information Technology Bangalore, Bangalore 560100, India
- Shrisha Rao
- International Institute of Information Technology Bangalore, Bangalore 560100, India
|
45
|
Zhang G, Zeng T, Dai Z, Dai X. Prediction of CRISPR/Cas9 single guide RNA cleavage efficiency and specificity by attention-based convolutional neural networks. Comput Struct Biotechnol J 2021; 19:1445-1457. [PMID: 33841753 PMCID: PMC8010402 DOI: 10.1016/j.csbj.2021.03.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/09/2020] [Revised: 02/26/2021] [Accepted: 03/01/2021] [Indexed: 12/26/2022] Open
Abstract
CRISPR/Cas9 is a preferred genome editing tool and has been widely adopted across a range of disciplines, from molecular biology to gene therapy. A key prerequisite for the success of CRISPR/Cas9 is its capacity to distinguish between single guide RNA (sgRNA) on-target sites and homologous off-target sites. Thus, optimizing sgRNA design by maximizing on-target activity and minimizing potential off-target mutations is a crucial concern for this system. Several deep learning models have been developed for comprehensive understanding of sgRNA cleavage efficacy and specificity. Although the proposed methods achieve their performance by automatically learning a suitable representation from the input data, there is still room for improvement in accuracy and interpretability. Here, we propose novel interpretable attention-based convolutional neural networks, namely CRISPR-ONT and CRISPR-OFFT, for the prediction of CRISPR/Cas9 sgRNA on- and off-target activities, respectively. Experimental tests on public datasets demonstrate that our models yield satisfactory results in terms of accuracy and interpretability. Our findings contribute to the understanding of how RNA-guided Cas9 nucleases scan the mammalian genome. Data and source codes are available at https://github.com/Peppags/CRISPRont-CRISPRofft.
Affiliation(s)
- Guishan Zhang
- Key Laboratory of Digital Signal and Image Processing of Guangdong Provincial, College of Engineering, Shantou University, Shantou 515063, China; School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou 510006, China
- Tian Zeng
- School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou 510006, China
- Zhiming Dai
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China; Guangdong Province Key Laboratory of Big Data Analysis and Processing, Sun Yat-sen University, Guangzhou 510006, China
- Xianhua Dai
- School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou 510006, China; Southern Marine Science and Engineering Guangdong Laboratory, Zhuhai 519000, China
|
46
|
Jin J, Liu Z, Nasiri A, Cui Y, Louis SY, Zhang A, Zhao Y, Hu J. Deep learning pan-specific model for interpretable MHC-I peptide binding prediction with improved attention mechanism. Proteins 2021; 89:866-883. [PMID: 33594723 DOI: 10.1002/prot.26065] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/22/2020] [Revised: 01/28/2021] [Accepted: 02/08/2021] [Indexed: 11/06/2022]
Abstract
Accurate prediction of peptide binding affinity to the major histocompatibility complex (MHC) proteins has the potential to improve the design of therapeutic vaccines. Previous work has shown that pan-specific prediction algorithms can achieve better prediction performance than other approaches. However, most of the top algorithms are neural network-based black-box models. Here, we propose DeepAttentionPan, an improved pan-specific model based on convolutional neural networks and attention mechanisms for more flexible, stable and interpretable MHC-I binding prediction. With the attention mechanism, our ensemble model consisting of 20 trained networks achieves high and more stable prediction performance. Extensive tests on IEDB's weekly benchmark dataset show that our method achieves state-of-the-art prediction performance on 21 test allele datasets. Analysis of the peptide positional attention weights learned by our model demonstrates its capability to capture critical binding positions of the peptides, which leads to a mechanistic understanding of MHC-peptide binding that aligns closely with experimentally verified results. Furthermore, we show that with transfer learning, our pan model can be fine-tuned for alleles with few samples to achieve additional performance improvement. DeepAttentionPan is freely available as open-source software at https://github.com/jjin49/DeepAttentionPan.
Affiliation(s)
- Jing Jin
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, USA
- Zhonghao Liu
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, USA
- Alireza Nasiri
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, USA
- Yuxin Cui
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, USA
- Stephen-Yves Louis
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, USA
- Ansi Zhang
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, USA
- Yong Zhao
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, USA
- Jianjun Hu
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina, USA
|
47
|
Chen C, Wu T, Guo Z, Cheng J. Combination of deep neural network with attention mechanism enhances the explainability of protein contact prediction. Proteins 2021; 89:697-707. [PMID: 33538038 PMCID: PMC8089057 DOI: 10.1002/prot.26052] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 09/09/2020] [Revised: 12/22/2020] [Accepted: 01/31/2021] [Indexed: 12/17/2022]
Abstract
Deep learning has emerged as a revolutionary technology for protein residue‐residue contact prediction since the 2012 CASP10 competition. Considerable advancements in the predictive power of the deep learning‐based contact predictions have been achieved since then. However, little effort has been put into interpreting the black‐box deep learning methods. Algorithms that can interpret the relationship between predicted contact maps and the internal mechanism of the deep learning architectures are needed to explore the essential components of contact inference and improve their explainability. In this study, we present an attention‐based convolutional neural network for protein contact prediction, which consists of two attention mechanism‐based modules: sequence attention and regional attention. Our benchmark results on the CASP13 free‐modeling targets demonstrate that the two attention modules added on top of existing typical deep learning models exhibit a complementary effect that contributes to prediction improvements. More importantly, the inclusion of the attention mechanism provides interpretable patterns that contain useful insights into the key fold‐determining residues in proteins. We expect the attention‐based model can provide a reliable and practically interpretable technique that helps break the current bottlenecks in explaining deep neural networks for contact prediction. The source code of our method is available at https://github.com/jianlin-cheng/InterpretContactMap.
Affiliation(s)
- Chen Chen
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Tianqi Wu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Zhiye Guo
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
|
48
|
A machine learning-based framework for modeling transcription elongation. Proc Natl Acad Sci U S A 2021; 118:2007450118. [PMID: 33526657 DOI: 10.1073/pnas.2007450118] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/15/2022] Open
Abstract
RNA polymerase II (Pol II) generally pauses at certain positions along gene bodies, thereby interrupting the transcription elongation process, which is often coupled with various important biological functions, such as precursor mRNA splicing and gene expression regulation. Characterizing the transcriptional elongation dynamics can thus help us understand many essential biological processes in eukaryotic cells. However, experimentally measuring Pol II elongation rates is generally time and resource consuming. We developed PEPMAN (polymerase II elongation pausing modeling through attention-based deep neural network), a deep learning-based model that accurately predicts Pol II pausing sites based on the native elongating transcript sequencing (NET-seq) data. Through fully taking advantage of the attention mechanism, PEPMAN is able to decipher important sequence features underlying Pol II pausing. More importantly, we demonstrated that the analyses of the PEPMAN-predicted results around various types of alternative splicing sites can provide useful clues into understanding the cotranscriptional splicing events. In addition, associating the PEPMAN prediction results with different epigenetic features can help reveal important factors related to the transcription elongation process. All these results demonstrated that PEPMAN can provide a useful and effective tool for modeling transcription elongation and understanding the related biological factors from available high-throughput sequencing data.
|
49
|
Mei S, Li F, Xiang D, Ayala R, Faridi P, Webb GI, Illing PT, Rossjohn J, Akutsu T, Croft NP, Purcell AW, Song J. Anthem: a user customised tool for fast and accurate prediction of binding between peptides and HLA class I molecules. Brief Bioinform 2021; 22:6102669. [PMID: 33454737 DOI: 10.1093/bib/bbaa415] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 09/30/2020] [Revised: 11/29/2020] [Accepted: 12/16/2020] [Indexed: 12/17/2022] Open
Abstract
Neopeptide-based immunotherapy has been recognised as a promising approach for the treatment of cancers. For neopeptides to be recognised by CD8+ T cells and induce an immune response, their binding to human leukocyte antigen class I (HLA-I) molecules is a necessary first step. Most epitope prediction tools thus rely on the prediction of such binding. With the use of mass spectrometry, the scale of naturally presented HLA ligands that could be used to develop such predictors has been expanded. However, few efforts have focused on integrating these experimental data with computational algorithms to efficiently develop up-to-date predictors. Here, we present Anthem for accurate HLA-I binding prediction. In particular, we have developed a user-friendly framework to support the development of customisable HLA-I binding prediction models to meet challenges associated with the rapidly increasing availability of large amounts of immunopeptidomic data. Our extensive evaluation, using both independent and experimental datasets, shows that Anthem achieves an overall similar or higher area under the curve value compared with other contemporary tools. It is anticipated that Anthem will provide a unique opportunity for the non-expert user to analyse and interpret their own in-house or publicly deposited datasets.
Affiliation(s)
- Shutao Mei
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Fuyi Li
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Australia
- Dongxu Xiang
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Rochelle Ayala
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Pouya Faridi
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Patricia T Illing
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Jamie Rossjohn
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Japan
- Nathan P Croft
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Anthony W Purcell
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Australia
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Biochemistry and Molecular Biology, Monash University, Australia
|
50
|
Ye Y, Wang J, Xu Y, Wang Y, Pan Y, Song Q, Liu X, Wan J. MATHLA: a robust framework for HLA-peptide binding prediction integrating bidirectional LSTM and multiple head attention mechanism. BMC Bioinformatics 2021; 22:7. [PMID: 33407098 PMCID: PMC7787246 DOI: 10.1186/s12859-020-03946-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/30/2020] [Accepted: 12/21/2020] [Indexed: 12/30/2022]
Abstract
BACKGROUND Accurate prediction of binding between class I human leukocyte antigen (HLA) and neoepitopes is critical for target identification in personalized T-cell based immunotherapy. Many recent prediction tools built on deep learning algorithms and mass spectrometry data have indeed shown improvement in average predictive power for class I HLA-peptide interaction. However, their prediction performance varies greatly across individual HLA alleles and peptides of different lengths, which is particularly the case for HLA-C alleles due to the limited amount of experimental data. To meet the increasing demand for the most accurate HLA-peptide binding prediction for individual patients in real-world clinical studies, a more advanced deep learning framework with higher prediction accuracy for HLA-C alleles and longer peptides is highly desirable. RESULTS We present a pan-allele HLA-peptide binding prediction framework, MATHLA, which integrates a bi-directional long short-term memory network and a multiple head attention mechanism. This model achieves better prediction accuracy in both fivefold cross-validation and an independent test dataset. In addition, this model is superior to existing tools in prediction accuracy for longer ligands ranging from 11 to 15 amino acids. Moreover, our model also shows a significant improvement for HLA-C-peptide binding prediction. By investigating multiple-head attention weight scores, we depicted possible interaction patterns between three HLA I supergroups and their cognate peptides. CONCLUSION Our method demonstrates the necessity of further developing deep learning algorithms to improve and interpret HLA-peptide binding prediction, in parallel with increasing the amount of high-quality HLA ligandome data.
Affiliation(s)
- Yilin Ye
  - Shenzhen Neocura Biotechnology Co. Ltd., Shenzhen, 518055, China; School of Computer Science and Technology, Heilongjiang University, Harbin, 150080, China
- Jian Wang
  - Shenzhen Neocura Biotechnology Co. Ltd., Shenzhen, 518055, China
- Yunwan Xu
  - Shenzhen Neocura Biotechnology Co. Ltd., Shenzhen, 518055, China
- Yi Wang
  - Shenzhen Neocura Biotechnology Co. Ltd., Shenzhen, 518055, China
- Youdong Pan
  - Shenzhen Neocura Biotechnology Co. Ltd., Shenzhen, 518055, China
- Qi Song
  - Shenzhen Neocura Biotechnology Co. Ltd., Shenzhen, 518055, China
- Xing Liu
  - The Center for Microbes, Development and Health, Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Chinese Academy of Sciences, Shanghai, 200031, China
- Ji Wan
  - Shenzhen Neocura Biotechnology Co. Ltd., Shenzhen, 518055, China
|