1. Jiang L, Baker SF, Ceccarelli M, Guo Y. Comprehensive Profiling of Computational Techniques for Sequencing-Based HLA Immune Signatures Extraction. HLA 2025; 105:e70049. [PMID: 39957309 PMCID: PMC11839181 DOI: 10.1111/tan.70049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2024] [Revised: 01/22/2025] [Accepted: 01/26/2025] [Indexed: 02/18/2025]
Abstract
HLA typing is crucial for clinical and research applications, including transplantation, disease association studies and personalised medicine. This study provides an in-depth analysis of five key challenges in computational HLA typing methods: varying lengths, high polymorphism, complex phylogenetic structures, high sequence similarity and the requirement for frequent updates. This study evaluates 27 computation-based HLA typing tools developed over the past 12 years, using a novel sequencing dataset of A549 cell lines with matched short-read RNA-Seq and long-read Iso-Seq data. We comprehensively investigated these 27 tools in terms of accessibility, capability, reliability, reproducibility, scalability and performance. In addition, we discuss the advantages and disadvantages of current tools and identify critical areas for future research and development to advance HLA typing technologies.
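Tool evaluations like this one usually score each caller's output against a known genotype at a fixed nomenclature resolution. The Python sketch below shows one way to do that. It is not the authors' evaluation code. It assumes allele names follow the WHO pattern `GENE*F1:F2[:F3[:F4]][suffix]`, compares calls at two-field resolution by default, and scores a diploid genotype by multiset overlap. The helper names `truncate_allele` and `concordance` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed allele syntax: optional "HLA-" prefix, gene, "*", 1-4 colon-separated
# numeric fields, then an optional one-letter expression suffix (e.g. N, L, S).
ALLELE_RE = re.compile(r"^(?:HLA-)?([A-Z0-9]+)\*(\d+(?::\d+)*)([A-Z]?)$")


def truncate_allele(allele: str, fields: int = 2) -> str:
    """Reduce an allele name to `fields` fields; the expression suffix is dropped."""
    match = ALLELE_RE.match(allele.strip())
    if match is None:
        raise ValueError(f"unrecognised allele name: {allele!r}")
    gene, digits, _suffix = match.groups()
    return f"{gene}*{':'.join(digits.split(':')[:fields])}"


def concordance(truth: Iterable[str], called: Iterable[str], fields: int = 2) -> float:
    """Fraction of true alleles matched by the calls at the given resolution.

    Both inputs list every allele of the genotype (two per locus for a diploid
    sample), so a homozygous true genotype must be matched twice.
    """
    true_counts = Counter(truncate_allele(a, fields) for a in truth)
    call_counts = Counter(truncate_allele(a, fields) for a in called)
    matched = sum((true_counts & call_counts).values())  # multiset intersection
    return matched / sum(true_counts.values())


if __name__ == "__main__":
    truth = ["A*02:01:01:01", "A*24:02:01", "B*07:02:01", "B*44:02:01:02S"]
    called = ["HLA-A*02:01", "A*24:03", "B*07:02", "B*44:02"]
    print(concordance(truth, called))  # three of four alleles agree at two fields
```

Truncation discards expression suffixes such as `N` (null). A stricter evaluation might therefore check those separately.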
Affiliation(s)
- Limin Jiang
- Department of Public Health and Sciences, University of Miami, Miami, Florida, USA
- Steven F Baker
- Infectious Disease Program, Lovelace Biomedical Research Institute, Albuquerque, New Mexico, USA
- Department of Molecular Genetics & Microbiology, University of New Mexico, Albuquerque, New Mexico, USA
- Michele Ceccarelli
- Department of Public Health and Sciences, University of Miami, Miami, Florida, USA
- Yan Guo
- Department of Public Health and Sciences, University of Miami, Miami, Florida, USA
2. Jiang S, Su Z, Bloodworth N, Liu Y, Martina C, Harrison DG, Meiler J. Machine learning application to predict binding affinity between peptide containing non-canonical amino acids and HLA0201. bioRxiv 2024:2024.11.19.624425. [PMID: 39605664 PMCID: PMC11601666 DOI: 10.1101/2024.11.19.624425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
Class I major histocompatibility complexes (MHC-I), encoded by the highly polymorphic HLA-A, HLA-B, and HLA-C genes in humans, are expressed on all nucleated cells. Both self and foreign proteins are processed into peptides of 8 to 10 amino acids, loaded onto MHC-I within the endoplasmic reticulum and then presented on the cell surface. Foreign peptides presented in this fashion activate CD8+ T cells, and their immunogenicity correlates with their affinity for the MHC-I binding groove. Thus, predicting antigen binding affinity for MHC-I is a valuable tool for identifying potentially immunogenic antigens. Although many predictors of MHC-I binding exist, no currently available tool can predict antigen/MHC-I binding affinity for antigens with explicitly labeled post-translational modifications or unusual/non-canonical amino acids (NCAAs). Yet such modifications are increasingly recognized as critical mediators of peptide immunogenicity. In this work, we propose a machine learning application that quantifies the binding affinity of NCAA-containing epitopes to MHC-I, and we compare its performance with that of other commonly used regressors. Our model demonstrates robust performance, with 5-fold cross-validation yielding an R² of 0.477 and a root-mean-square error (RMSE) of 0.735, indicating strong predictive capability for peptides with NCAAs. This work provides a valuable tool for the computational design and optimization of peptides incorporating NCAAs, potentially accelerating the development of novel peptide-based therapeutics with enhanced properties and efficacy.
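The abstract reports R² and RMSE from 5-fold cross-validation. The sketch below illustrates how such figures are usually obtained; it is not the authors' pipeline. It pools out-of-fold predictions across five folds and scores them once. Averaging per-fold scores is an equally common alternative, and the source does not say which approach was used. The model is a placeholder: `fit_line`, a one-variable least-squares fit, stands in for whatever regressor is under evaluation. All function names are hypothetical.

```python
import math
import random
from typing import Callable, Iterator, List, Sequence, Tuple

Model = Callable[[float], float]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every index lands in exactly one test fold."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Placeholder regressor: ordinary least squares on a single feature."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)


def cross_validate(xs: Sequence[float], ys: Sequence[float],
                   fit: Callable[[Sequence[float], Sequence[float]], Model] = fit_line,
                   k: int = 5, seed: int = 0) -> Tuple[float, float]:
    """Pool out-of-fold predictions, then report (R², RMSE) over all samples."""
    preds = [0.0] * len(ys)
    for train, test in kfold_indices(len(ys), k, seed):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = model(xs[i])
    return r2_score(ys, preds), rmse(ys, preds)
```

An R² of 0.477 means the model explains a little under half of the variance in measured affinity on held-out peptides. RMSE is expressed in the units of the affinity target, which the abstract does not specify.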
Affiliation(s)
- Shan Jiang
- Department of Chemistry and Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
- Zhaoqian Su
- Department of Chemistry and Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
- Nathaniel Bloodworth
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, United States
- Yunchao Liu
- Department of Chemistry and Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
- Cristina Martina
- Department of Chemistry and Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
- David G. Harrison
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, United States
- Jens Meiler
- Department of Chemistry and Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
- Institute for Drug Discovery, Institute for Computer Science, Wilhelm Ostwald Institute for Physical and Theoretical Chemistry, University Leipzig, Leipzig, Germany
- Center for Scalable Data Analytics and Artificial Intelligence ScaDS.AI and School of Embedded Composite Artificial Intelligence SECAI, Dresden/Leipzig, Germany
- Department of Pharmacology, Institute of Chemical Biology, Center for Applied Artificial Intelligence in Protein Dynamics, Vanderbilt University, Nashville, Tennessee, United States of America
3. Tuhin IA, Mia MR, Islam MM, Mahmud I, Gongora HF, Rios CU, Ashraf I, Samad MA. StackIL10: A stacking ensemble model for the improved prediction of IL-10 inducing peptides. PLoS One 2024; 19:e0313835. [PMID: 39541341 PMCID: PMC11563426 DOI: 10.1371/journal.pone.0313835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Accepted: 10/28/2024] [Indexed: 11/16/2024]
Abstract
Interleukin-10 (IL-10), a potent cytokine recognized for its anti-inflammatory properties, plays a critical role in the immune system. In addition to its well-documented capacity to mitigate inflammation, IL-10 can unexpectedly exhibit pro-inflammatory characteristics under specific circumstances. This dual nature underscores the need to identify IL-10-inducing peptides. Manual identification has drawbacks, including its high cost. To address them, this study introduces StackIL10, a stacking-based ensemble learning model that identifies IL-10-inducing peptides precisely and efficiently. Ten amino-acid-composition-based feature extraction approaches are considered. StackIL10 uses five optimized machine learning algorithms (LGBM, RF, SVM, decision tree and KNN) as base learners and logistic regression as the meta-learner. It reached an identification rate of 91.7%, an MCC of 0.833 and a specificity of 0.9078. Experiments were conducted to examine the impact of various enhancement techniques on the accuracy of IL-10 prediction, including comparisons between single models and various combinations of stacking-based ensemble models. The proposed model was more effective than the single models and produced satisfactory results, thereby improving the identification of peptides that induce IL-10.
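The metrics quoted for StackIL10 (identification rate, MCC and specificity) all derive from the binary confusion matrix. The sketch below spells out the standard formulas. We assume that "identification rate" corresponds to accuracy. The counts in the example are invented for illustration and are not StackIL10's results.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard confusion-matrix metrics for a binary classifier."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall on IL-10-inducing peptides
    specificity = tn / (tn + fp)   # recall on non-inducing peptides
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; conventionally 0 when a marginal is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical counts for 200 peptides (100 positive, 100 negative).
    print(binary_metrics(tp=90, fp=20, tn=80, fn=10))
```

Unlike accuracy, MCC stays at zero for a classifier that predicts only one class. For this reason it is often reported alongside accuracy on imbalanced peptide datasets.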
Affiliation(s)
- Izaz Ahmmed Tuhin
- Department of Software Engineering, Daffodil International University, Daffodil Smart City (DSC), Savar, Dhaka, Bangladesh
- Md. Rajib Mia
- Department of Software Engineering, Daffodil International University, Daffodil Smart City (DSC), Savar, Dhaka, Bangladesh
- Md. Monirul Islam
- Department of Software Engineering, Daffodil International University, Daffodil Smart City (DSC), Savar, Dhaka, Bangladesh
- Imran Mahmud
- Department of Software Engineering, Daffodil International University, Daffodil Smart City (DSC), Savar, Dhaka, Bangladesh
- Henry Fabian Gongora
- Universidad Europea del Atlántico, Santander, Spain
- Universidad Internacional Iberoamericana Campeche, Campeche, México
- Universidad de La Romana, La Romana, República Dominicana
- Carlos Uc Rios
- Universidad Europea del Atlántico, Santander, Spain
- Universidad Internacional Iberoamericana Campeche, Campeche, México
- Universidad Internacional Iberoamericana Arecibo, Arecibo, Puerto Rico, United States of America
- Imran Ashraf
- Department of Information and Communication Engineering, Yeungnam University, Gyeongsangbuk-do, Gyeongsan-si, South Korea
- Md. Abdus Samad
- Department of Information and Communication Engineering, Yeungnam University, Gyeongsangbuk-do, Gyeongsan-si, South Korea
4. Giziński S, Preibisch G, Kucharski P, Tyrolski M, Rembalski M, Grzegorczyk P, Gambin A. Enhancing antigenic peptide discovery: Improved MHC-I binding prediction and methodology. Methods 2024; 224:1-9. [PMID: 38295891 DOI: 10.1016/j.ymeth.2024.01.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2023] [Revised: 12/30/2023] [Accepted: 01/16/2024] [Indexed: 02/05/2024]
Abstract
The Major Histocompatibility Complex (MHC) is a critical element of the vertebrate cellular immune system, responsible for presenting peptides derived from intracellular proteins. MHC-I presentation is pivotal in the immune response and holds considerable potential for vaccine development and cancer immunotherapy. This study examines the limitations of current methods and benchmarks for MHC-I presentation. We introduce a novel benchmark designed to assess the generalization and reliability of models on unseen MHC molecules and peptides. It focuses on the Human Leukocyte Antigen (HLA), a specific subset of MHC genes present in humans. Finally, we introduce HLABERT, a pretrained language model that significantly outperforms previous methods on our benchmark and establishes a new state of the art on existing benchmarks.
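A benchmark aimed at unseen MHC molecules and peptides needs a split in which no test allele occurs in training, and ideally no test peptide does either. The sketch below shows one generic way to build such a split. It is an assumption for illustration, not the benchmark construction used for HLABERT. Records are hypothetical `(allele, peptide, label)` tuples, and `allele_holdout_split` is a made-up helper name.

```python
import random
from typing import List, Tuple

Record = Tuple[str, str, float]  # hypothetical layout: (allele, peptide, label)


def allele_holdout_split(records: List[Record], test_fraction: float = 0.2,
                         seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Hold out whole alleles, then drop test peptides that also occur in training."""
    alleles = sorted({allele for allele, _, _ in records})
    random.Random(seed).shuffle(alleles)
    n_test = max(1, round(len(alleles) * test_fraction))
    test_alleles = set(alleles[:n_test])

    train = [r for r in records if r[0] not in test_alleles]
    seen_peptides = {peptide for _, peptide, _ in train}
    test = [r for r in records if r[0] in test_alleles and r[1] not in seen_peptides]
    return train, test


if __name__ == "__main__":
    alleles = ["A*01:01", "A*02:01", "B*07:02", "B*08:01", "C*07:01"]
    # Placeholder peptide IDs: three unique per allele plus one shared peptide.
    data = [(a, f"{a}-pep{i}", 1.0) for a in alleles for i in range(3)]
    data += [(a, "SHAREDPEP", 0.0) for a in ["A*01:01", "B*07:02", "C*07:01"]]
    train, test = allele_holdout_split(data)
    print(len(train), len(test))
```

Holding out alleles first and then filtering peptides removes both kinds of leakage. The cost is that the test set shrinks when peptides are shared across alleles.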
Affiliation(s)
- Grzegorz Preibisch
- Deepflare, Warsaw, Poland
- University of Warsaw, Department of Mathematics, Informatics and Mechanics, Warsaw, Poland
- Anna Gambin
- University of Warsaw, Department of Mathematics, Informatics and Mechanics, Warsaw, Poland
5. Zhang L, Song W, Zhu T, Liu Y, Chen W, Cao Y. ConvNeXt-MHC: improving MHC-peptide affinity prediction by structure-derived degenerate coding and the ConvNeXt model. Brief Bioinform 2024; 25:bbae133. [PMID: 38561979 PMCID: PMC10985285 DOI: 10.1093/bib/bbae133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 02/11/2024] [Accepted: 03/02/2024] [Indexed: 04/04/2024]
Abstract
Peptide binding to major histocompatibility complex (MHC) proteins plays a critical role in T-cell recognition and the specificity of the immune response. Experimental validation of such peptides is extremely resource-intensive. Accurate computational prediction of binding peptides is therefore highly important, particularly in cancer immunotherapy applications such as neoantigen identification. In recent years, there has been a significant need to continually improve existing prediction methods to meet the demands of this field. We developed ConvNeXt-MHC, a method for predicting MHC-I-peptide binding affinity. It introduces a degenerate encoding approach to enhance well-established pan-specific methods and integrates transfer learning and semi-supervised learning into the cutting-edge deep learning framework ConvNeXt. Comprehensive benchmark results demonstrate that ConvNeXt-MHC outperforms state-of-the-art methods in terms of accuracy. We expect that ConvNeXt-MHC will help foster new discoveries in immunoinformatics. We constructed a user-friendly website at http://www.combio-lezhang.online/predict/, where users can access our data and application.
Affiliation(s)
- Le Zhang
- College of Computer Science, Sichuan University, Chengdu 610065, China
- Wenkai Song
- College of Computer Science, Sichuan University, Chengdu 610065, China
- Tinghao Zhu
- College of Computer Science, Sichuan University, Chengdu 610065, China
- Nuclear Power Institute of China, Chengdu 610213, China
- Yang Liu
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, No. 29 Wangjiang Road, Chengdu 610065, China
- Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Yang Cao
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, No. 29 Wangjiang Road, Chengdu 610065, China
6. Trevizani R, Yan Z, Greenbaum JA, Sette A, Nielsen M, Peters B. A comprehensive analysis of the IEDB MHC class-I automated benchmark. Brief Bioinform 2022; 23:bbac259. [PMID: 35794711 DOI: 10.1093/bib/bbac259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Revised: 05/27/2022] [Accepted: 06/05/2022] [Indexed: 11/12/2022]
Abstract
In 2014, the Immune Epitope Database (IEDB) automated benchmark was created to compare the performance of MHC class I binding predictors. Such comparison is not straightforward, however, because the methods produce different, non-standardized outputs. Additionally, some methods are more restrictive regarding the HLA alleles and epitope lengths for which they predict binding affinities, while others are more comprehensive. To address how these problems affected the ranking of the predictors, we developed an approach to assess the reliability of different metrics. We found that using percentile-ranked results improved the stability of the ranks and allowed the predictors to be reliably ranked despite not being evaluated on the same data. We also found that, given the rate at which new data are incorporated into the benchmark, a new method must wait at least 4 years to be ranked against the pre-existing methods. The best-performing tools, with statistically indistinguishable scores in this benchmark, were NetMHCcons, NetMHCpan4.0, ANN3.4, NetMHCpan3.0 and NetMHCpan2.8. The results of this study will be used to improve the evaluation and display of benchmark performance. We strongly encourage anyone working on MHC binding predictions to participate in this benchmark to obtain an unbiased evaluation of their predictors.
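Percentile ranking puts predictors with incompatible raw outputs on a common scale. An example is IC50 values in nM, where lower is better, versus 0-1 scores, where higher is better. A common convention, used for example by NetMHCpan's %Rank, compares a peptide's score with the scores of random natural peptides predicted for the same allele. The sketch below implements that idea under simple assumptions. The caller supplies the background scores (synthetic values here), and `percentile_rank` is a hypothetical helper, not the IEDB benchmark's code.

```python
import bisect
from typing import Sequence


def percentile_rank(score: float, background: Sequence[float],
                    higher_is_better: bool = True) -> float:
    """Percentage of background scores at least as strong as `score`.

    Smaller values mean stronger predicted binding (e.g. 0.5 = top 0.5%).
    """
    ranked = sorted(background)
    n = len(ranked)
    if higher_is_better:
        at_least_as_good = n - bisect.bisect_left(ranked, score)   # background >= score
    else:
        at_least_as_good = bisect.bisect_right(ranked, score)      # background <= score
    return 100.0 * at_least_as_good / n


if __name__ == "__main__":
    # Synthetic backgrounds for two predictors with different output conventions.
    score_background = [i / 1000 for i in range(1000)]          # higher is better
    ic50_background = [float(10 * i) for i in range(1, 1001)]   # nM, lower is better
    print(percentile_rank(0.995, score_background))
    print(percentile_rank(50.0, ic50_background, higher_is_better=False))
    # both print 0.5: each peptide sits in the top 0.5% of its background
```

In practice the background would be sorted once per allele and reused, rather than re-sorted on every call.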
Affiliation(s)
- Raphael Trevizani
- Division of Vaccine Discovery, La Jolla Institute for Immunology, La Jolla, California 92037, USA
- Fiocruz Ceará, Fundação Oswaldo Cruz, Rua São José s/n, Precabura, Eusébio/CE, Brazil
- Zhen Yan
- Bioinformatics Core, La Jolla Institute for Immunology, La Jolla, California 92037, USA
- Jason A Greenbaum
- Bioinformatics Core, La Jolla Institute for Immunology, La Jolla, California 92037, USA
- Alessandro Sette
- Division of Vaccine Discovery, La Jolla Institute for Immunology, La Jolla, California 92037, USA
- Department of Medicine, University of California San Diego, La Jolla, California 92093, USA
- Morten Nielsen
- Department of Health Technology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, B1650 Buenos Aires, Argentina
- Bjoern Peters
- Division of Vaccine Discovery, La Jolla Institute for Immunology, La Jolla, California 92037, USA
- Department of Medicine, University of California San Diego, La Jolla, California 92093, USA
7. Jiang L, Tang J, Guo F, Guo Y. Prediction of Major Histocompatibility Complex Binding with Bilateral and Variable Long Short Term Memory Networks. Biology (Basel) 2022; 11:biology11060848. [PMID: 35741369 PMCID: PMC9220200 DOI: 10.3390/biology11060848] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2022] [Revised: 05/25/2022] [Accepted: 05/27/2022] [Indexed: 11/18/2022]
Abstract
Simple Summary: Major histocompatibility complex (MHC) molecules are of significant biological and clinical importance because of their utility in immunotherapy. Predicting potential MHC binding peptides can help estimate a T-cell immune response. The variable length of MHC binding peptides creates difficulty for MHC binding prediction algorithms. We therefore used a bilateral and variable long short-term memory neural network to address this specific problem and developed a novel MHC binding prediction tool.
Abstract: As an important part of immune surveillance, the major histocompatibility complex (MHC) is a set of proteins that recognize foreign molecules. Computational prediction methods for MHC binding peptides have been developed. However, existing methods share the limitation of a fixed peptide sequence length. This limitation forces either training a separate model for each peptide length or predicting with a length reduction technique. Using a bidirectional long short-term memory neural network, we constructed BVMHC, an MHC class I and II binding prediction tool that is independent of peptide length. BVMHC was compared with seven MHC class I prediction tools and three MHC class II prediction tools, using eight performance criteria evaluated independently. BVMHC attained the best performance in three of the eight criteria for MHC class I and in four of the eight criteria for MHC class II, including accuracy and AUC. Furthermore, models for non-human species were trained using the same strategy and made available for applications in mice, chimpanzees, macaques, and rats. BVMHC comprises a series of peptide-length-independent MHC class I and II binding predictors. Models from this study have been implemented in an online web portal for easy access and use.
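The key property claimed for BVMHC is that a bidirectional LSTM turns a peptide of any length into a fixed-size representation. As a result, neither per-length models nor length reduction are needed. The dependency-free sketch below demonstrates only that property. It runs an LSTM cell over one-hot encoded residues forwards and backwards, then concatenates the two final hidden states. The weights are random and untrained, there is no prediction head, and the class and function names are hypothetical. This is not BVMHC's architecture or code; a practical model would use a framework such as PyTorch with learned weights.

```python
import math
import random
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Vector = List[float]


def one_hot(residue: str) -> Vector:
    return [1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Single LSTM cell with input (i), forget (f), candidate (g) and output (o) gates."""

    def __init__(self, n_in: int, n_hidden: int, rng: random.Random):
        self.n_hidden = n_hidden
        width = n_in + n_hidden + 1  # weights act on [x; h; 1], the 1 supplying the bias
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                        for _ in range(4 * n_hidden)]

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h + [1.0]
        a = [sum(w * v for w, v in zip(row, z)) for row in self.weights]
        n = self.n_hidden
        i = [sigmoid(v) for v in a[0:n]]
        f = [sigmoid(v) for v in a[n:2 * n]]
        g = [math.tanh(v) for v in a[2 * n:3 * n]]
        o = [sigmoid(v) for v in a[3 * n:4 * n]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def final_hidden(cell: LSTMCell, inputs: List[Vector]) -> Vector:
    h = [0.0] * cell.n_hidden
    c = [0.0] * cell.n_hidden
    for x in inputs:
        h, c = cell.step(x, h, c)
    return h


def bilstm_encode(peptide: str, forward: LSTMCell, backward: LSTMCell) -> Vector:
    """Concatenate the final hidden states of a forward pass and a reversed pass."""
    inputs = [one_hot(residue) for residue in peptide]
    return final_hidden(forward, inputs) + final_hidden(backward, inputs[::-1])


if __name__ == "__main__":
    rng = random.Random(0)
    fwd, bwd = LSTMCell(20, 8, rng), LSTMCell(20, 8, rng)
    for pep in ["SIINFEKL", "GILGFVFTL", "PKYVKQNTLKLAT"]:  # 8-, 9- and 13-mers
        print(pep, len(bilstm_encode(pep, fwd, bwd)))  # always 16 = 2 * hidden size
```

Because the output size depends only on the hidden size, class I ligands (typically 8-11 residues) and longer class II ligands can share one downstream classifier.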
Affiliation(s)
- Limin Jiang
- Comprehensive Cancer Center, Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87131, USA
- Jijun Tang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Fei Guo
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Corresponding author
- Yan Guo
- Comprehensive Cancer Center, Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87131, USA
- Corresponding author
8. Chen D, Li Y. PredMHC: An Effective Predictor of Major Histocompatibility Complex Using Mixed Features. Front Genet 2022; 13:875112. [PMID: 35547252 PMCID: PMC9081368 DOI: 10.3389/fgene.2022.875112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2022] [Accepted: 03/07/2022] [Indexed: 12/03/2022]
Abstract
The major histocompatibility complex (MHC) is a large locus on vertebrate DNA that contains a tightly linked set of polymorphic genes encoding cell surface proteins essential for the adaptive immune system. Because these proteins play an important role in adaptive immunity, accurately identifying MHC proteins is necessary to understand that role. In this study, we establish an effective predictor, PredMHC, to identify MHC proteins from protein sequences. First, PredMHC encodes a protein sequence with mixed features, including 188D, APAAC, KSCTriad, CKSAAGP, and PAAC. Second, three classifiers (SGD, SMO, and random forest) are trained on these mixed features. Finally, the prediction is obtained by voting among the three classifiers. In 10-fold cross-validation on the training dataset, PredMHC achieved 91.69% accuracy. Comparisons with other features, classifiers, and existing methods showed the effectiveness of PredMHC in predicting MHC proteins.
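Two steps in the PredMHC pipeline are easy to make concrete. The first is one of the mixed features, CKSAAGP (composition of k-spaced amino acid group pairs). The second is the final vote among three classifiers. The sketch below makes three assumptions. It uses the five-group residue scheme found in common feature toolkits such as iFeature (aliphatic, aromatic, positively charged, negatively charged, uncharged). It normalizes each gap's pair counts by the number of pairs at that gap. It uses trivial stand-in functions in place of the trained SGD, SMO and random forest models. The function names and the grouping are assumptions, not taken from the PredMHC paper.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Assumed five-group scheme covering the 20 standard amino acids.
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}
GROUP_NAMES = list(GROUPS)


def cksaagp(sequence: str, max_gap: int = 2) -> List[float]:
    """25 group-pair frequencies for each gap k = 0..max_gap (residues i and i+k+1)."""
    features: List[float] = []
    for k in range(max_gap + 1):
        counts: Counter = Counter()
        total = 0
        for i in range(len(sequence) - k - 1):
            a, b = sequence[i], sequence[i + k + 1]
            if a in GROUP_OF and b in GROUP_OF:  # skip non-standard residues
                counts[(GROUP_OF[a], GROUP_OF[b])] += 1
                total += 1
        features.extend(counts[(g1, g2)] / total if total else 0.0
                        for g1 in GROUP_NAMES for g2 in GROUP_NAMES)
    return features


Classifier = Callable[[Sequence[float]], int]


def majority_vote(classifiers: Sequence[Classifier], features: Sequence[float]) -> int:
    """Hard voting: return the label predicted by most classifiers."""
    votes = Counter(clf(features) for clf in classifiers)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    feats = cksaagp("MAVMAPRTLLLLLSGALALTQTWAG")  # arbitrary example sequence
    # Hypothetical stand-ins for the three trained models (1 = MHC, 0 = non-MHC).
    stand_ins = [lambda f: int(f[0] > 0.2), lambda f: 1, lambda f: 0]
    print(len(feats), majority_vote(stand_ins, feats))
```

With three binary voters, hard voting can never tie, which is one practical reason to pair an odd number of classifiers.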
Affiliation(s)
- Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
- Yanjuan Li
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China