1. Carbone A, Decelle A, Rosset L, Seoane B. Fast and Functional Structured Data Generators Rooted in Out-of-Equilibrium Physics. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025; 47:1309-1316. [PMID: 39527442 DOI: 10.1109/tpami.2024.3495999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2024]
Abstract
In this study, we address the challenge of using energy-based models to produce high-quality, label-specific data in complex structured datasets, such as population genetics, RNA, or protein sequence data. Traditional training methods encounter difficulties due to inefficient Markov chain Monte Carlo mixing, which affects the diversity of synthetic data and increases generation times. To address these issues, we use a novel training algorithm that exploits non-equilibrium effects. This approach, applied to the Restricted Boltzmann Machine, improves the model's ability to correctly classify samples and generate high-quality synthetic data in only a few sampling steps. The effectiveness of this method is demonstrated by its successful application to five different types of data: handwritten digits, mutations of human genomes classified by continental origin, functionally characterized sequences of an enzyme protein family, homologous RNA sequences from specific taxonomies, and real classical piano pieces classified by their composer.
2. Ruiz Ortega M, Pogorelyy MV, Minervina AA, Thomas PG, Mora T, Walczak AM. Learning predictive signatures of HLA type from T-cell repertoires. PLoS Comput Biol 2025; 21:e1012724. [PMID: 39761303 PMCID: PMC11737854 DOI: 10.1371/journal.pcbi.1012724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Revised: 01/16/2025] [Accepted: 12/16/2024] [Indexed: 01/15/2025] Open
Abstract
T cells recognize a wide range of pathogens using surface receptors that interact directly with peptides presented on major histocompatibility complexes (MHC) encoded by the HLA loci in humans. Understanding the association between T cell receptors (TCR) and HLA alleles is an important step towards predicting TCR-antigen specificity from sequences. Here we analyze the TCR alpha and beta repertoires of large cohorts of HLA-typed donors to systematically infer such associations by looking for overrepresentation of TCRs in individuals with a common allele. TCRs associated with a specific HLA allele exhibit sequence similarities that suggest prior antigen exposure. Immune repertoire sequencing has produced large numbers of datasets; however, the HLA type of the corresponding donors is rarely available. Using our TCR-HLA associations, we trained a computational model to predict the HLA type of individuals from their TCR repertoire alone. We propose an iterative procedure to refine this model using data from large cohorts of untyped individuals, recursively typing them with the model itself. The resulting model shows good predictive performance, even for relatively rare HLA alleles.
Affiliation(s)
- María Ruiz Ortega
  - Laboratoire de physique de l’École Normale Supérieure, CNRS, PSL Université, Sorbonne Université, and Université Paris-Cité, Paris, France
- Mikhail V. Pogorelyy
  - Department of Host-Microbe Interactions, St. Jude Children’s Research Hospital, Memphis, Tennessee, United States of America
- Anastasia A. Minervina
  - Department of Host-Microbe Interactions, St. Jude Children’s Research Hospital, Memphis, Tennessee, United States of America
- Paul G. Thomas
  - Department of Host-Microbe Interactions, St. Jude Children’s Research Hospital, Memphis, Tennessee, United States of America
- Thierry Mora
  - Laboratoire de physique de l’École Normale Supérieure, CNRS, PSL Université, Sorbonne Université, and Université Paris-Cité, Paris, France
- Aleksandra M. Walczak
  - Laboratoire de physique de l’École Normale Supérieure, CNRS, PSL Université, Sorbonne Université, and Université Paris-Cité, Paris, France
3. Wei Y, Qiu T, Ai Y, Zhang Y, Xie J, Zhang D, Luo X, Sun X, Wang X, Qiu J. Advances of computational methods enhance the development of multi-epitope vaccines. Brief Bioinform 2025; 26:bbaf055. [PMID: 39951549 PMCID: PMC11827616 DOI: 10.1093/bib/bbaf055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2024] [Revised: 11/28/2024] [Accepted: 01/27/2025] [Indexed: 02/16/2025] Open
Abstract
Vaccine development is one of the most promising fields, and the multi-epitope vaccine, which does not need laborious culture processes, is an attractive alternative to classical vaccines, with the advantages of safety and efficiency. The rapid development of algorithms and the accumulation of immune data have facilitated the advancement of computer-aided vaccine design. Here we systematically review the in silico data and algorithm resources for the different steps of computational vaccine design, including immunogen selection, epitope prediction, vaccine construction, optimization, and evaluation. The performance of the available tools for epitope prediction and immunogenicity evaluation was tested and compared on benchmark datasets. Finally, we discuss future research directions for the construction of multi-epitope vaccines.
Affiliation(s)
- Yiwen Wei
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
- Tianyi Qiu
  - Institute of Clinical Science, Zhongshan Hospital; Intelligent Medicine Institute; Shanghai Institute of Infectious Disease and Biosecurity, Shanghai Medical College, Fudan University, No. 180, Fenglin Road, Xuhui District, Shanghai 200032, China
- Yisi Ai
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
- Yuxi Zhang
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
- Junting Xie
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
- Dong Zhang
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
- Xiaochuan Luo
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
- Xiulan Sun
  - State Key Laboratory of Food Science and Technology, School of Food Science and Technology, National Engineering Research Center for Functional Foods, Synergetic Innovation Center of Food Safety and Nutrition, Jiangnan University, Lihu Avenue 1800, Wuxi, Jiangsu 214122, China
- Xin Wang
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
  - Shanghai Collaborative Innovation Center of Energy Therapy for Tumors, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
- Jingxuan Qiu
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
  - Shanghai Collaborative Innovation Center of Energy Therapy for Tumors, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China
4. Deng Q, Wang Z, Xiang S, Wang Q, Liu Y, Hou T, Sun H. RLpMIEC: High-Affinity Peptide Generation Targeting Major Histocompatibility Complex-I Guided and Interpreted by Interaction Spectrum-Navigated Reinforcement Learning. J Chem Inf Model 2024; 64:6432-6449. [PMID: 39118363 DOI: 10.1021/acs.jcim.4c01153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/10/2024]
Abstract
The major histocompatibility complex (MHC) plays a vital role in presenting epitopes (short peptides from pathogenic proteins) to T-cell receptors (TCRs) to trigger subsequent immune responses. Vaccine design targeting MHC generally aims to find epitopes with a high binding affinity for MHC presentation. Nevertheless, finding novel epitopes usually requires high-throughput screening of bulk peptide databases, which is time-consuming, labor-intensive, and expensive. The past several years have witnessed the great success of artificial intelligence (AI) in various fields, such as natural language processing (NLP, e.g., GPT-4) and protein structure prediction and engineering (e.g., AlphaFold2). We therefore propose a deep reinforcement-learning (RL)-based generative algorithm, RLpMIEC, to quantitatively design peptides targeting MHC-I systems. Specifically, RLpMIEC combines the energetic spectrum (namely, the molecular interaction energy component, MIEC) based on the peptide-MHC interaction with sequence information to generate peptides with strong binding affinity and precise MIEC spectra, accelerating the discovery of candidate peptide vaccines. RLpMIEC performs well in all the generative capability evaluations and can generate peptides with strong binding affinities, precise MIECs, and high interpretability, demonstrating its potential to accelerate peptide-based vaccine development.
Affiliation(s)
- Qirui Deng
  - Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing 210009 Jiangsu, P. R. China
- Zhe Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058 Zhejiang, P. R. China
  - School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, P. R. China
- Sutong Xiang
  - Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing 210009 Jiangsu, P. R. China
- Qinghua Wang
  - Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing 210009 Jiangsu, P. R. China
- Yifei Liu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058 Zhejiang, P. R. China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058 Zhejiang, P. R. China
- Huiyong Sun
  - Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing 210009 Jiangsu, P. R. China
5. Martin J, Lequerica Mateos M, Onuchic JN, Coluzza I, Morcos F. Machine learning in biological physics: From biomolecular prediction to design. Proc Natl Acad Sci U S A 2024; 121:e2311807121. [PMID: 38913893 PMCID: PMC11228481 DOI: 10.1073/pnas.2311807121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024] Open
Abstract
Machine learning has been proposed as an alternative to theoretical modeling when dealing with complex problems in biological physics. However, in this perspective, we argue that a more successful approach is a proper combination of these two methodologies. We discuss how ideas coming from the physical modeling of neuronal processing led to early formulations of computational neural networks, e.g., Hopfield networks. We then show how modern learning approaches like Potts models, Boltzmann machines, and the transformer architecture are related to each other, specifically through a shared energy representation. We summarize recent efforts to establish these connections and provide examples of how each of these formulations integrating physical modeling and machine learning has been successful in tackling recent problems in biomolecular structure, dynamics, function, evolution, and design. Instances include protein structure prediction; improvement in the computational complexity and accuracy of molecular dynamics simulations; better inference of the effects of mutations in proteins, leading to improved evolutionary modeling; and, finally, how machine learning is revolutionizing protein engineering and design. Going beyond naturally existing protein sequences, a connection to protein design is discussed where synthetic sequences are able to fold to naturally occurring motifs driven by a model rooted in physical principles. We show that this model is "learnable" and propose its future use in the generation of unique sequences that can fold into a target structure.
Affiliation(s)
- Jonathan Martin
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Marcos Lequerica Mateos
  - BCMaterials, Basque Center for Materials, Applications and Nanostructures, Universidad del País Vasco/Euskal Herriko Unibertsitatea Science Park, Leioa 48940, Spain
- José N. Onuchic
  - Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
  - Department of Physics and Astronomy, Rice University, Houston, TX 77005
  - Department of Chemistry, Rice University, Houston, TX 77005
  - Department of BioSciences, Rice University, Houston, TX 77005
- Ivan Coluzza
  - BCMaterials, Basque Center for Materials, Applications and Nanostructures, Universidad del País Vasco/Euskal Herriko Unibertsitatea Science Park, Leioa 48940, Spain
  - Basque Foundation for Science, Ikerbasque, Bilbao 48940, Spain
- Faruck Morcos
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
  - Department of Bioengineering, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080
6. Zhang L, Song W, Zhu T, Liu Y, Chen W, Cao Y. ConvNeXt-MHC: improving MHC-peptide affinity prediction by structure-derived degenerate coding and the ConvNeXt model. Brief Bioinform 2024; 25:bbae133. [PMID: 38561979 PMCID: PMC10985285 DOI: 10.1093/bib/bbae133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 02/11/2024] [Accepted: 03/02/2024] [Indexed: 04/04/2024] Open
Abstract
Peptide binding to major histocompatibility complex (MHC) proteins plays a critical role in T-cell recognition and the specificity of the immune response. Experimental validation of such peptides is extremely resource-intensive. As a result, accurate computational prediction of binding peptides is highly important, particularly in the context of cancer immunotherapy applications, such as the identification of neoantigens. In recent years, there has been a significant need to continually improve the existing prediction methods to meet the demands of this field. We developed ConvNeXt-MHC, a method for predicting MHC-I-peptide binding affinity. It introduces a degenerate encoding approach to enhance well-established panspecific methods and integrates transfer learning and semi-supervised learning methods into the cutting-edge deep learning framework ConvNeXt. Comprehensive benchmark results demonstrate that ConvNeXt-MHC outperforms state-of-the-art methods in terms of accuracy. We expect that ConvNeXt-MHC will help us foster new discoveries in the field of immunoinformatics in the distant future. We constructed a user-friendly website at http://www.combio-lezhang.online/predict/, where users can access our data and application.
Affiliation(s)
- Le Zhang
  - College of Computer Science, Sichuan University, Chengdu 610065, China
- Wenkai Song
  - College of Computer Science, Sichuan University, Chengdu 610065, China
- Tinghao Zhu
  - College of Computer Science, Sichuan University, Chengdu 610065, China
  - Nuclear Power Institute of China, Chengdu 610213, China
- Yang Liu
  - Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, No. 29 Wangjiang Road, Chengdu 610065, China
- Wei Chen
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
- Yang Cao
  - Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, No. 29 Wangjiang Road, Chengdu 610065, China
7. Wang M, Lei C, Wang J, Li Y, Li M. TripHLApan: predicting HLA molecules binding peptides based on triple coding matrix and transfer learning. Brief Bioinform 2024; 25:bbae154. [PMID: 38600667 PMCID: PMC11006794 DOI: 10.1093/bib/bbae154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 02/16/2024] [Accepted: 03/13/2024] [Indexed: 04/12/2024] Open
Abstract
Human leukocyte antigen (HLA) recognizes foreign threats and triggers immune responses by presenting peptides to T cells. Computationally modeling the binding patterns between peptides and HLA is very important for the development of tumor vaccines. However, accurately predicting which peptides bind HLA molecules remains a major challenge. In this paper, we develop a new model, TripHLApan, for predicting peptides that bind HLA molecules by integrating a triple coding matrix, BiGRU + Attention models, and a transfer learning strategy. We identified the main interaction site regions between HLA molecules and peptides, as well as the correlation between HLA encoding and binding motifs. Based on these findings, we bring the preprocessing and coding closer to the natural biological process. In addition, because the input is based on multiple types of features and the attention module focuses on the BiGRU hidden layer, TripHLApan learns more sequence-level binding information. The application of transfer learning strategies ensures the accuracy of prediction results for special lengths (peptides of length 8) and model scalability as data volumes grow. Compared with the current best-performing models, TripHLApan exhibits strong predictive performance in various prediction environments with different positive and negative sample ratios. We further validate the superiority and scalability of TripHLApan's predictive performance using additional recent data sets, ablation experiments, and binding reconstitution ability in the samples of a melanoma patient. The results show that TripHLApan is a powerful tool for predicting the binding of peptides to HLA-I and HLA-II molecules for the synthesis of tumor vaccines. TripHLApan is publicly available at https://github.com/CSUBioGroup/TripHLApan.git.
Affiliation(s)
- Meng Wang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Chuqi Lei
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yaohang Li
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
8. Yu Y, Zu L, Jiang J, Wu Y, Wang Y, Xu M, Liu Q. Structure-aware deep model for MHC-II peptide binding affinity prediction. BMC Genomics 2024; 25:127. [PMID: 38291350 PMCID: PMC10826266 DOI: 10.1186/s12864-023-09900-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 12/12/2023] [Indexed: 02/01/2024] Open
Abstract
The prediction of major histocompatibility complex (MHC)-peptide binding affinity is an important branch of immune bioinformatics, especially helpful in accelerating the design of disease vaccines and immunotherapy. Although deep learning-based solutions have yielded promising results on MHC-II molecules in recent years, these methods ignore the structural knowledge of each peptide when employing deep neural network models. Each peptide sequence has its specific combination order, so it is worth adding the structural information of the peptide sequence to deep model training. In this work, we use positional encoding to represent the structural information of peptide sequences and combine the positional encoding with existing models using different strategies. Experiments on three datasets show that introducing positional encoding information can further improve the performance of the existing model. The idea of introducing positional encoding to this field can serve as a useful reference for optimizing deep network structures in the future.
Affiliation(s)
- Ying Yu
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Lipeng Zu
  - Department of Computer Science, Florida State University, Tallahassee, 32306, USA
- Jiaye Jiang
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Yafang Wu
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Yinglin Wang
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Midie Xu
  - Department of Pathology, Fudan University, Shanghai Cancer Center, Shanghai, 200032, China
  - Department of Medical Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, China
  - Institute of Pathology, Fudan University, Shanghai, 200032, China
- Qing Liu
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
9. Ortega MR, Pogorelyy MV, Minervina AA, Thomas PG, Walczak AM, Mora T. Learning predictive signatures of HLA type from T-cell repertoires. bioRxiv 2024:2024.01.25.577228. [PMID: 38352609 PMCID: PMC10862754 DOI: 10.1101/2024.01.25.577228] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 01/15/2025]
Abstract
T cells recognize a wide range of pathogens using surface receptors that interact directly with peptides presented on major histocompatibility complexes (MHC) encoded by the HLA loci in humans. Understanding the association between T cell receptors (TCR) and HLA alleles is an important step towards predicting TCR-antigen specificity from sequences. Here we analyze the TCR alpha and beta repertoires of large cohorts of HLA-typed donors to systematically infer such associations by looking for overrepresentation of TCRs in individuals with a common allele. TCRs associated with a specific HLA allele exhibit sequence similarities that suggest prior antigen exposure. Immune repertoire sequencing has produced large numbers of datasets; however, the HLA type of the corresponding donors is rarely available. Using our TCR-HLA associations, we trained a computational model to predict the HLA type of individuals from their TCR repertoire alone. We propose an iterative procedure to refine this model using data from large cohorts of untyped individuals, recursively typing them with the model itself. The resulting model shows good predictive performance, even for relatively rare HLA alleles.
10. Bravi B. Development and use of machine learning algorithms in vaccine target selection. NPJ Vaccines 2024; 9:15. [PMID: 38242890 PMCID: PMC10798987 DOI: 10.1038/s41541-023-00795-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 12/07/2023] [Indexed: 01/21/2024] Open
Abstract
Computer-aided discovery of vaccine targets has become a cornerstone of rational vaccine design. In this article, I discuss how Machine Learning (ML) can inform and guide key computational steps in rational vaccine design concerned with the identification of B and T cell epitopes and correlates of protection. I provide examples of ML models, as well as types of data and predictions for which they are built. I argue that interpretable ML has the potential to improve the identification of immunogens also as a tool for scientific discovery, by helping elucidate the molecular processes underlying vaccine-induced immune responses. I outline the limitations and challenges in terms of data availability and method development that need to be addressed to bridge the gap between advances in ML predictions and their translational application to vaccine design.
Affiliation(s)
- Barbara Bravi
  - Department of Mathematics, Imperial College London, London, SW7 2AZ, UK
11. Malbranke C, Rostain W, Depardieu F, Cocco S, Monasson R, Bikard D. Computational design of novel Cas9 PAM-interacting domains using evolution-based modelling and structural quality assessment. PLoS Comput Biol 2023; 19:e1011621. [PMID: 37976326 PMCID: PMC10729993 DOI: 10.1371/journal.pcbi.1011621] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/25/2023] [Revised: 12/19/2023] [Accepted: 10/19/2023] [Indexed: 11/19/2023] Open
Abstract
We present here an approach to protein design that combines (i) scarce functional information such as experimental data, (ii) evolutionary information learned from natural sequence variants, and (iii) physics-grounded modeling. Using a Restricted Boltzmann Machine (RBM), we learn a sequence model of a protein family. We use semi-supervision to leverage available functional information during the RBM training. We then propose a strategy to explore the protein representation space that can be informed by external models such as an empirical force-field method (FoldX). Our approach is applied to a domain of the Cas9 protein responsible for recognition of a short DNA motif. We experimentally assess the functionality of 71 variants generated to explore a range of RBM and FoldX energies. Sequences with as many as 50 differences (20% of the protein domain) from the wild type retained functionality. Overall, 21/71 sequences designed with our method were functional. Interestingly, 6/71 sequences showed improved activity in comparison with the original wild-type protein sequence. These results demonstrate the interest in further exploring the synergies between machine learning of protein sequence representations and physics-grounded modeling strategies informed by structural information.
Affiliation(s)
- Cyril Malbranke
  - Laboratory of Physics of the Ecole Normale Superieure, PSL Research, CNRS UMR 8023, Sorbonne Université, Paris, France
  - Institut Pasteur, Université Paris Cité, CNRS UMR 6047, Synthetic Biology, Paris, France
- William Rostain
  - Institut Pasteur, Université Paris Cité, CNRS UMR 6047, Synthetic Biology, Paris, France
- Florence Depardieu
  - Institut Pasteur, Université Paris Cité, CNRS UMR 6047, Synthetic Biology, Paris, France
- Simona Cocco
  - Laboratory of Physics of the Ecole Normale Superieure, PSL Research, CNRS UMR 8023, Sorbonne Université, Paris, France
- Rémi Monasson
  - Laboratory of Physics of the Ecole Normale Superieure, PSL Research, CNRS UMR 8023, Sorbonne Université, Paris, France
- David Bikard
  - Institut Pasteur, Université Paris Cité, CNRS UMR 6047, Synthetic Biology, Paris, France
12. Bravi B, Di Gioacchino A, Fernandez-de-Cossio-Diaz J, Walczak AM, Mora T, Cocco S, Monasson R. A transfer-learning approach to predict antigen immunogenicity and T-cell receptor specificity. eLife 2023; 12:e85126. [PMID: 37681658 PMCID: PMC10522340 DOI: 10.7554/elife.85126] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/23/2022] [Accepted: 09/07/2023] [Indexed: 09/09/2023] Open
Abstract
Antigen immunogenicity and the specificity of binding of T-cell receptors to antigens are key properties underlying effective immune responses. Here we propose diffRBM, an approach based on transfer learning and Restricted Boltzmann Machines, to build sequence-based predictive models of these properties. DiffRBM is designed to learn the distinctive patterns in amino-acid composition that, on the one hand, underlie the antigen's probability of triggering a response, and on the other hand the T-cell receptor's ability to bind to a given antigen. We show that the patterns learnt by diffRBM allow us to predict putative contact sites of the antigen-receptor complex. We also discriminate immunogenic and non-immunogenic antigens, antigen-specific and generic receptors, reaching performances that compare favorably to existing sequence-based predictors of antigen immunogenicity and T-cell receptor specificity.
Affiliation(s)
- Barbara Bravi
  - Department of Mathematics, Imperial College London, London, United Kingdom
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris-Cité, Paris, France
- Andrea Di Gioacchino
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris-Cité, Paris, France
- Jorge Fernandez-de-Cossio-Diaz
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris-Cité, Paris, France
- Aleksandra M Walczak
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris-Cité, Paris, France
- Thierry Mora
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris-Cité, Paris, France
- Simona Cocco
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris-Cité, Paris, France
- Rémi Monasson
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris-Cité, Paris, France
13. Tian J, Ma J. The Value of Microbes in Cancer Neoantigen Immunotherapy. Pharmaceutics 2023; 15:2138. [PMID: 37631352 PMCID: PMC10459105 DOI: 10.3390/pharmaceutics15082138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 08/06/2023] [Accepted: 08/11/2023] [Indexed: 08/27/2023] Open
Abstract
Tumor neoantigens are widely used in cancer immunotherapy, and a growing body of research suggests that microbes play an important role in these neoantigen-based immunotherapeutic processes. The human body and its surrounding environment are filled with a large number of microbes that are in long-term interaction with the organism. The microbiota can modulate our immune system, help activate neoantigen-reactive T cells, and play a great role in the process of targeting tumor neoantigens for therapy. Recent studies have revealed the interconnection between microbes and neoantigens, which can cross-react with each other through molecular mimicry, providing theoretical guidance for more relevant studies. The current applications of microbes in immunotherapy against tumor neoantigens are mainly focused on cancer vaccine development and immunotherapy with immune checkpoint inhibitors. This article summarizes the related fields and suggests the importance of microbes in immunotherapy against neoantigens.
Affiliation(s)
- Junrui Tian
  - NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha 410013, China
  - Cancer Research Institute and School of Basic Medical Science, Central South University, Changsha 410078, China
  - Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Changsha 410078, China
- Jian Ma
  - NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Cancer Metabolism, Hunan Cancer Hospital and the Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha 410013, China
  - Cancer Research Institute and School of Basic Medical Science, Central South University, Changsha 410078, China
  - Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Changsha 410078, China
|
14
|
Contemplating immunopeptidomes to better predict them. Semin Immunol 2023; 66:101708. [PMID: 36621290 DOI: 10.1016/j.smim.2022.101708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Revised: 12/16/2022] [Accepted: 12/20/2022] [Indexed: 01/09/2023]
Abstract
The identification of T-cell epitopes is key for a complete molecular understanding of immune recognition mechanisms in infectious diseases, autoimmunity and cancer. T-cell epitopes further provide targets for personalized vaccines and T-cell therapy, with several therapeutic applications in cancer immunotherapy and elsewhere. T-cell epitopes consist of short peptides displayed on Major Histocompatibility Complex (MHC) molecules. The recent advances in mass spectrometry (MS) based technologies to profile the ensemble of peptides displayed on MHC molecules - the so-called immunopeptidome - had a major impact on our understanding of antigen presentation and MHC ligands. On the one hand, these techniques enabled researchers to directly identify hundreds of thousands of peptides presented on MHC molecules, including some that elicited T-cell recognition. On the other hand, the data collected in these experiments revealed fundamental properties of antigen presentation pathways and significantly improved our ability to predict naturally presented MHC ligands and T-cell epitopes across the wide spectrum of MHC alleles found in human and other organisms. Here we review recent computational developments to analyze experimentally determined immunopeptidomes and harness these data to improve our understanding of antigen presentation and MHC binding specificities, as well as our ability to predict MHC ligands. We further discuss the strengths and limitations of the latest approaches to move beyond predictions of antigen presentation and tackle the challenges of predicting TCR recognition and immunogenicity.
|
15
|
Gfeller D, Schmidt J, Croce G, Guillaume P, Bobisse S, Genolet R, Queiroz L, Cesbron J, Racle J, Harari A. Improved predictions of antigen presentation and TCR recognition with MixMHCpred2.2 and PRIME2.0 reveal potent SARS-CoV-2 CD8+ T-cell epitopes. Cell Syst 2023; 14:72-83.e5. [PMID: 36603583 PMCID: PMC9811684 DOI: 10.1016/j.cels.2022.12.002] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 06/23/2022] [Revised: 10/12/2022] [Accepted: 12/08/2022] [Indexed: 01/06/2023]
Abstract
The recognition of pathogen or cancer-specific epitopes by CD8+ T cells is crucial for the clearance of infections and the response to cancer immunotherapy. This process requires epitopes to be presented on class I human leukocyte antigen (HLA-I) molecules and recognized by the T-cell receptor (TCR). Machine learning models capturing these two aspects of immune recognition are key to improve epitope predictions. Here, we assembled a high-quality dataset of naturally presented HLA-I ligands and experimentally verified neo-epitopes. We then integrated these data in a refined computational framework to predict antigen presentation (MixMHCpred2.2) and TCR recognition (PRIME2.0). The depth of our training data and the algorithmic developments resulted in improved predictions of HLA-I ligands and neo-epitopes. Prospectively applying our tools to SARS-CoV-2 proteins revealed several epitopes. TCR sequencing identified a monoclonal response in effector/memory CD8+ T cells against one of these epitopes and cross-reactivity with the homologous peptides from other coronaviruses.
Affiliation(s)
- David Gfeller
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, Lausanne, Switzerland
  - Agora Cancer Research Centre, 1011 Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
  - Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
  - Corresponding author
- Julien Schmidt
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University Hospital of Lausanne, Lausanne, Switzerland
  - Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Giancarlo Croce
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, Lausanne, Switzerland
  - Agora Cancer Research Centre, 1011 Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
  - Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Philippe Guillaume
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University Hospital of Lausanne, Lausanne, Switzerland
  - Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Sara Bobisse
  - Agora Cancer Research Centre, 1011 Lausanne, Switzerland
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University Hospital of Lausanne, Lausanne, Switzerland
  - Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Raphael Genolet
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University Hospital of Lausanne, Lausanne, Switzerland
  - Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Lise Queiroz
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University Hospital of Lausanne, Lausanne, Switzerland
  - Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Julien Cesbron
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University Hospital of Lausanne, Lausanne, Switzerland
  - Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Julien Racle
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, Lausanne, Switzerland
  - Agora Cancer Research Centre, 1011 Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
  - Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Alexandre Harari
  - Agora Cancer Research Centre, 1011 Lausanne, Switzerland
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University Hospital of Lausanne, Lausanne, Switzerland
  - Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
|
16
|
van der Plas TL, Tubiana J, Le Goc G, Migault G, Kunst M, Baier H, Bormuth V, Englitz B, Debrégeas G. Neural assemblies uncovered by generative modeling explain whole-brain activity statistics and reflect structural connectivity. eLife 2023; 12:83139. [PMID: 36648065 PMCID: PMC9940913 DOI: 10.7554/elife.83139] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/21/2022] [Accepted: 01/15/2023] [Indexed: 01/18/2023] Open
Abstract
Patterns of endogenous activity in the brain reflect a stochastic exploration of the neuronal state space that is constrained by the underlying assembly organization of neurons. Yet, it remains to be shown that this interplay between neurons and their assembly dynamics indeed suffices to generate whole-brain data statistics. Here, we recorded the activity from ∼40,000 neurons simultaneously in zebrafish larvae, and show that a data-driven generative model of neuron-assembly interactions can accurately reproduce the mean activity and pairwise correlation statistics of their spontaneous activity. This model, the compositional Restricted Boltzmann Machine (cRBM), unveils ∼200 neural assemblies, which compose neurophysiological circuits and whose various combinations form successive brain states. We then performed in silico perturbation experiments to determine the interregional functional connectivity, which is conserved across individual animals and correlates well with structural connectivity. Our results showcase how cRBMs can capture the coarse-grained organization of the zebrafish brain. Notably, this generative model can readily be deployed to parse neural data obtained by other large-scale recording techniques.
Affiliation(s)
- Thijs L van der Plas
  - Computational Neuroscience Lab, Department of Neurophysiology, Donders Center for Neuroscience, Radboud University, Nijmegen, Netherlands
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine (IBPS), Laboratoire Jean Perrin (LJP), Paris, France
  - Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Jérôme Tubiana
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Guillaume Le Goc
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine (IBPS), Laboratoire Jean Perrin (LJP), Paris, France
- Geoffrey Migault
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine (IBPS), Laboratoire Jean Perrin (LJP), Paris, France
- Michael Kunst
  - Department Genes – Circuits – Behavior, Max Planck Institute for Biological Intelligence, Martinsried, Germany
  - Allen Institute for Brain Science, Seattle, United States
- Herwig Baier
  - Department Genes – Circuits – Behavior, Max Planck Institute for Biological Intelligence, Martinsried, Germany
- Volker Bormuth
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine (IBPS), Laboratoire Jean Perrin (LJP), Paris, France
- Bernhard Englitz
  - Computational Neuroscience Lab, Department of Neurophysiology, Donders Center for Neuroscience, Radboud University, Nijmegen, Netherlands
- Georges Debrégeas
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine (IBPS), Laboratoire Jean Perrin (LJP), Paris, France
|
17
|
Tadros DM, Eggenschwiler S, Racle J, Gfeller D. The MHC Motif Atlas: a database of MHC binding specificities and ligands. Nucleic Acids Res 2023; 51:D428-D437. [PMID: 36318236 PMCID: PMC9825574 DOI: 10.1093/nar/gkac965] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 08/15/2022] [Revised: 10/07/2022] [Accepted: 10/14/2022] [Indexed: 01/07/2023] Open
Abstract
The highly polymorphic Major Histocompatibility Complex (MHC) genes are responsible for the binding and cell surface presentation of pathogen or cancer specific T-cell epitopes. This process is fundamental for eliciting T-cell recognition of infected or malignant cells. Epitopes displayed on MHC molecules further provide therapeutic targets for personalized cancer vaccines or adoptive T-cell therapy. To help visualizing, analyzing and comparing the different binding specificities of MHC molecules, we developed the MHC Motif Atlas (http://mhcmotifatlas.org/). This database contains information about thousands of class I and class II MHC molecules, including binding motifs, peptide length distributions, motifs of phosphorylated ligands, multiple specificities or links to X-ray crystallography structures. The database further enables users to download curated datasets of MHC ligands. By combining intuitive visualization of the main binding properties of MHC molecules together with access to more than a million ligands, the MHC Motif Atlas provides a central resource to analyze and interpret the binding specificities of MHC molecules.
Affiliation(s)
- Daniel M Tadros
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Simon Eggenschwiler
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Julien Racle
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- David Gfeller
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
|
18
|
Di Gioacchino A, Procyk J, Molari M, Schreck JS, Zhou Y, Liu Y, Monasson R, Cocco S, Šulc P. Generative and interpretable machine learning for aptamer design and analysis of in vitro sequence selection. PLoS Comput Biol 2022; 18:e1010561. [PMID: 36174101 PMCID: PMC9553063 DOI: 10.1371/journal.pcbi.1010561] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/23/2022] [Revised: 10/11/2022] [Accepted: 09/12/2022] [Indexed: 12/03/2022] Open
Abstract
Selection protocols such as SELEX, where molecules are selected over multiple rounds for their ability to bind to a target of interest, are popular methods for obtaining binders for diagnostic and therapeutic purposes. We show that Restricted Boltzmann Machines (RBMs), an unsupervised two-layer neural network architecture, can successfully be trained on sequence ensembles from single rounds of SELEX experiments for thrombin aptamers. RBMs assign scores to sequences that can be directly related to their fitnesses estimated through experimental enrichment ratios. Hence, RBMs trained from sequence data at a given round can be used to predict the effects of selection at later rounds. Moreover, the parameters of the trained RBMs are interpretable and identify functional features contributing most to sequence fitness. To exploit the generative capabilities of RBMs, we introduce two different training protocols: one taking into account sequence counts, capable of identifying the few best binders, and another based on unique sequences only, generating more diverse binders. We then use RBM models to generate novel aptamers with putative disruptive mutations or good binding properties, and validate the generated sequences with gel shift assay experiments. Finally, we compare the RBM’s performance with different supervised learning approaches that include random forests and several deep neural network architectures.
We show that two-layer neural networks, Restricted Boltzmann Machines (RBM), can be successfully trained on sequence ensemble datasets from selection-amplification experiments. We train the RBM using datasets from aptamer selection experiments on thrombin protein, and show that the model can successfully generalize to the test set to predict binders and non-binders. The log-likelihood assigned to a sequence by the RBM is correlated with the sequence fitness as quantified by the amplification between different rounds of selection. We further show that the model is interpretable and that, by inspecting the weights of the model, we can identify structural motifs that are characteristic of the good binders. We explore the usage of the RBMs to identify which of the possible protein exosites the aptamers bind to. We show that the RBM can also be used for unsupervised clustering. Finally, we use RBMs to generate novel aptamers, and we experimentally verify predicted binding and non-binding sequences generated from the RBM.
Affiliation(s)
- Andrea Di Gioacchino
  - Laboratoire de Physique de l’Ecole Normale Supérieure, PSL & CNRS UMR8063, Sorbonne Université, Université de Paris, Paris, France
- Jonah Procyk
  - School of Molecular Sciences and Center for Molecular Design and Biomimetics, The Biodesign Institute, Arizona State University, Tempe, Arizona, United States of America
- Marco Molari
  - Laboratoire de Physique de l’Ecole Normale Supérieure, PSL & CNRS UMR8063, Sorbonne Université, Université de Paris, Paris, France
  - Biozentrum, University of Basel, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Basel, Switzerland
- John S. Schreck
  - National Center for Atmospheric Research, Computational and Information Systems Laboratory, Boulder, Colorado, United States of America
- Yu Zhou
  - School of Molecular Sciences and Center for Molecular Design and Biomimetics, The Biodesign Institute, Arizona State University, Tempe, Arizona, United States of America
- Yan Liu
  - School of Molecular Sciences and Center for Molecular Design and Biomimetics, The Biodesign Institute, Arizona State University, Tempe, Arizona, United States of America
- Rémi Monasson
  - Laboratoire de Physique de l’Ecole Normale Supérieure, PSL & CNRS UMR8063, Sorbonne Université, Université de Paris, Paris, France
  - Corresponding author
- Simona Cocco
  - Laboratoire de Physique de l’Ecole Normale Supérieure, PSL & CNRS UMR8063, Sorbonne Université, Université de Paris, Paris, France
  - Corresponding author
- Petr Šulc
  - School of Molecular Sciences and Center for Molecular Design and Biomimetics, The Biodesign Institute, Arizona State University, Tempe, Arizona, United States of America
  - Corresponding author
|
19
|
Illing PT, Ramarathinam SH, Purcell AW. New insights and approaches for analyses of immunopeptidomes. Curr Opin Immunol 2022; 77:102216. [PMID: 35716458 DOI: 10.1016/j.coi.2022.102216] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/11/2022] [Accepted: 05/10/2022] [Indexed: 11/03/2022]
Abstract
Human leucocyte antigen (HLA) molecules play a key role in health and disease by presenting antigen to T-lymphocytes for immunosurveillance. Immunopeptidomics involves the study of the collection of peptides presented within the antigen-binding groove of HLA molecules. Identifying their nature and diversity is crucial to understanding immunosurveillance especially during infection or for the recognition and potential eradication of tumours. This review discusses recent advances in the isolation, identification, and quantitation of these peptide antigens. New informatics approaches and databases have shed light on the extent of peptide antigens derived from unconventional sources including peptides derived from transcripts associated with frame shifts, long noncoding RNA, incorrectly annotated untranslated regions, post-translational modifications, and proteasomal splicing. Several challenges remain in successful analysis of immunopeptides, yet recent developments point to unexplored biology waiting to be unravelled.
Affiliation(s)
- Patricia T Illing
  - Department of Biochemistry and Molecular Biology and Infection and Immunity Program, Biomedicine Discovery Institute, Monash University, Melbourne, Victoria, Australia
- Sri H Ramarathinam
  - Department of Biochemistry and Molecular Biology and Infection and Immunity Program, Biomedicine Discovery Institute, Monash University, Melbourne, Victoria, Australia
- Anthony W Purcell
  - Department of Biochemistry and Molecular Biology and Infection and Immunity Program, Biomedicine Discovery Institute, Monash University, Melbourne, Victoria, Australia
|
20
|
Ochoa R, Lunardelli VAS, Rosa DS, Laio A, Cossio P. Multiple-Allele MHC Class II Epitope Engineering by a Molecular Dynamics-Based Evolution Protocol. Front Immunol 2022; 13:862851. [PMID: 35572587 PMCID: PMC9094701 DOI: 10.3389/fimmu.2022.862851] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/26/2022] [Accepted: 03/28/2022] [Indexed: 11/13/2022] Open
Abstract
Epitopes that bind simultaneously to all human alleles of Major Histocompatibility Complex class II (MHC II) are considered one of the key factors for the development of improved vaccines and cancer immunotherapies. To engineer MHC II multiple-allele binders, we developed a protocol called PanMHC-PARCE, based on the unsupervised optimization of the epitope sequence by single-point mutations, parallel explicit-solvent molecular dynamics simulations and scoring of the MHC II-epitope complexes. The key idea is accepting mutations that not only improve the affinity but also reduce the affinity gap between the alleles. We applied this methodology to enhance a Plasmodium vivax epitope for multiple-allele binding. In vitro rate-binding assays showed that four engineered peptides were able to bind with improved affinity toward multiple human MHC II alleles. Moreover, we demonstrated that mice immunized with the peptides exhibited interferon-gamma cellular immune response. Overall, the method enables the engineering of peptides with improved binding properties that can be used for the generation of new immunotherapies.
Affiliation(s)
- Rodrigo Ochoa
  - Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia UdeA, Medellin, Colombia
- Daniela Santoro Rosa
  - Department of Microbiology, Immunology and Parasitology, Federal University of Sao Paulo, Sao Paulo, Brazil
  - Institute for Investigation in Immunology (iii), Instituto Nacional de Ciência e Tecnologia (INCT), Sao Paulo, Brazil
- Alessandro Laio
  - Physics Area, International School for Advanced Studies (SISSA), Trieste, Italy
  - Condensed Matter and Statistical Physics Section, International Centre for Theoretical Physics (ICTP), Trieste, Italy
- Pilar Cossio
  - Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia UdeA, Medellin, Colombia
  - Department of Theoretical Biophysics, Max Planck Institute of Biophysics, Frankfurt am Main, Germany
  - Center for Computational Mathematics, Flatiron Institute, New York, NY, United States
  - Center for Computational Biology, Flatiron Institute, New York, NY, United States
|
21
|
Malbranke C, Bikard D, Cocco S, Monasson R. Improving sequence-based modeling of protein families using secondary-structure quality assessment. Bioinformatics 2021; 37:4083-4090. [PMID: 34117879 PMCID: PMC9502231 DOI: 10.1093/bioinformatics/btab442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2021] [Revised: 06/03/2021] [Accepted: 06/16/2021] [Indexed: 12/03/2022] Open
Abstract
MOTIVATION: Modeling of protein family sequence distribution from homologous sequence data recently received considerable attention, in particular for structure and function predictions, as well as for protein design. In particular, direct coupling analysis, a method to infer effective pairwise interactions between residues, was shown to capture important structural constraints and to successfully generate functional protein sequences. Building on this and other graphical models, we introduce a new framework to assess the quality of the secondary structures of the generated sequences with respect to reference structures for the family.
RESULTS: We introduce two scoring functions characterizing the likeliness of the secondary structure of a protein sequence to match a reference structure, called Dot Product and Pattern Matching. We test these scores on published experimental protein mutagenesis and design dataset, and show improvement in the detection of nonfunctional sequences. We also show that use of these scores help rejecting nonfunctional sequences generated by graphical models (Restricted Boltzmann Machines) learned from homologous sequence alignments.
AVAILABILITY AND IMPLEMENTATION: Data and code available at https://github.com/CyrilMa/ssqa.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Cyril Malbranke
  - Laboratory of Physics of the Ecole Normale Superieure, PSL Research, CNRS UMR 8023, Sorbonne Université, Université de Paris, Paris, France
  - Synthetic Biology, Microbiology Department, Institut Pasteur, Paris, France
- David Bikard
  - Synthetic Biology, Microbiology Department, Institut Pasteur, Paris, France
- Simona Cocco
  - Laboratory of Physics of the Ecole Normale Superieure, PSL Research, CNRS UMR 8023, Sorbonne Université, Université de Paris, Paris, France
- Rémi Monasson
  - Laboratory of Physics of the Ecole Normale Superieure, PSL Research, CNRS UMR 8023, Sorbonne Université, Université de Paris, Paris, France
|
22
|
Bravi B, Balachandran VP, Greenbaum BD, Walczak AM, Mora T, Monasson R, Cocco S. Probing T-cell response by sequence-based probabilistic modeling. PLoS Comput Biol 2021; 17:e1009297. [PMID: 34473697 PMCID: PMC8476001 DOI: 10.1371/journal.pcbi.1009297] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/28/2020] [Revised: 09/27/2021] [Accepted: 07/22/2021] [Indexed: 11/26/2022] Open
Abstract
With the increasing ability to use high-throughput next-generation sequencing to quantify the diversity of the human T cell receptor (TCR) repertoire, the ability to use TCR sequences to infer antigen-specificity could greatly aid potential diagnostics and therapeutics. Here, we use a machine-learning approach known as Restricted Boltzmann Machine to develop a sequence-based inference approach to identify antigen-specific TCRs. Our approach combines probabilistic models of TCR sequences with clone abundance information to extract TCR sequence motifs central to an antigen-specific response. We use this model to identify patient personalized TCR motifs that respond to individual tumor and infectious disease antigens, and to accurately discriminate specific from non-specific responses. Furthermore, the hidden structure of the model results in an interpretable representation space where TCRs responding to the same antigen cluster, correctly discriminating the response of TCR to different viral epitopes. The model can be used to identify condition specific responding TCRs. We focus on the examples of TCRs reactive to candidate neoantigens and selected epitopes in experiments of stimulated TCR clone expansion.
Affiliation(s)
- Barbara Bravi
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, France
- Vinod P. Balachandran
  - Immuno-Oncology Service, Human Oncology and Pathogenesis Program, Hepatopancreatobiliary Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, New York State, United States of America
- Benjamin D. Greenbaum
  - Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York State, United States of America
- Aleksandra M. Walczak
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, France
- Thierry Mora
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, France
- Rémi Monasson
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, France
- Simona Cocco
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, France
|