1
Zhou Z, Chen J, Lin S, Hong L, Wei DQ, Xiong Y. GRATCR: Epitope-Specific T Cell Receptor Sequence Generation With Data-Efficient Pre-Trained Models. IEEE J Biomed Health Inform 2025; 29:2271-2283. [PMID: 40031605 DOI: 10.1109/jbhi.2024.3514089] [Indexed: 03/05/2025]
Abstract
T cell receptors (TCRs) play a crucial role in numerous immunotherapies targeting tumor cells. However, their acquisition and optimization present significant challenges, requiring laborious and time-consuming wet-lab experiments. Deep generative models have demonstrated remarkable capabilities in functional protein sequence generation, offering a promising way to accelerate the acquisition of specific TCR sequences. Here, we propose GRATCR, a framework that incorporates two pre-trained modules through a novel "grafting" strategy to generate TCR sequences targeting specific epitopes de novo. Experimental results demonstrate that TCRs generated by GRATCR exhibit higher specificity toward the desired epitopes and are more biologically functional than those produced by the state-of-the-art model, while using significantly less training data. Additionally, the generated sequences are novel compared with natural sequences, and an interpretability evaluation further confirms that the model captures important binding patterns.
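The abstract reports that generated sequences are novel relative to natural ones but does not state how novelty is measured. A minimal sketch, assuming novelty is scored as the edit (Levenshtein) distance from a generated CDR3 to its closest natural sequence; the `novelty` helper and this definition are hypothetical, not GRATCR's published metric:

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance where each substitution, insertion or deletion costs 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if residues match)
            ))
        prev = curr
    return prev[-1]


def novelty(generated: str, natural: Iterable[str]) -> int:
    """Hypothetical novelty score: distance to the closest natural sequence."""
    return min(levenshtein(generated, ref) for ref in natural)


print(novelty("CASSPGQETQYF", ["CASSLGQETQYF", "CASSPGTEAFF"]))  # 1
```

A score of 0 means the generated sequence already occurs in the reference set. Against a large natural repertoire this brute-force scan becomes the bottleneck, so real evaluations would likely index the reference set first.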
2
Banerjee A, Pattinson DJ, Wincek CL, Bunk P, Axhemi A, Chapin SR, Navlakha S, Meyer HV. Comprehensive epitope mutational scan database enables accurate T cell receptor cross-reactivity prediction. bioRxiv 2025:2024.01.22.576714. [PMID: 38370810 PMCID: PMC10871174 DOI: 10.1101/2024.01.22.576714] [Indexed: 02/20/2024]
Abstract
Predicting T cell receptor (TCR) activation is challenging due to the lack of both unbiased benchmarking datasets and computational methods that are sensitive to small mutations to a peptide. To address these challenges, we curated a comprehensive database, called BATCAVE, encompassing complete single amino acid mutational assays of more than 22,000 TCR-peptide pairs, centered around 25 immunogenic human and mouse epitopes, across both major histocompatibility complex classes, against 151 TCRs. We then present an interpretable Bayesian model, called BATMAN, that can predict the set of peptides that activates a TCR. We also developed an active learning version of BATMAN, which can efficiently learn the binding profile of a novel TCR by selecting an informative yet small number of peptides to assay. When validated on our database, BATMAN outperforms existing methods and reveals important biochemical predictors of TCR-peptide interactions. Finally, we demonstrate the broad applicability of BATMAN, including for predicting off-target effects for TCR-based therapies and polyclonal T cell responses.
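A "complete single amino acid mutational assay" substitutes every position of a peptide with each of the other 19 standard residues, so a 9-mer yields 9 x 19 = 171 variants. The sketch below enumerates such a scan; the 20-letter alphabet and the `N1A`-style variant labels are assumptions for illustration, not BATCAVE's schema:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet: the 20 standard residues


def single_mutants(peptide: str) -> list:
    """Return (label, sequence) for every single-residue substitution of `peptide`.

    Labels follow a hypothetical wild-type/position/mutant format, e.g. "N1A".
    """
    variants = []
    for pos, wild_type in enumerate(peptide):
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue  # skip the unmutated residue
            mutant = peptide[:pos] + aa + peptide[pos + 1:]
            variants.append((f"{wild_type}{pos + 1}{aa}", mutant))
    return variants


scan = single_mutants("NLVPMVATV")
print(len(scan))  # 171 = 9 positions x 19 substitutions
print(scan[0])    # ('N1A', 'ALVPMVATV')
```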
Affiliation(s)
- Amitava Banerjee
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- David J Pattinson
  - Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, WI 53711, USA
- Cornelia L. Wincek
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
  - Medical Research Center and Clinic for Medical Oncology and Hematology, Cantonal Hospital St. Gallen, 9007 St. Gallen, Switzerland
- Paul Bunk
  - School of Biological Sciences, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Armend Axhemi
  - W.M. Keck Structural Biology Laboratory, Howard Hughes Medical Institute, New York, NY, USA
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Sarah R. Chapin
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Saket Navlakha
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Hannah V. Meyer
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
3
Landry B, Zhang J. Masked language modeling pretraining dynamics for downstream peptide: T-cell receptor binding prediction. Bioinform Adv 2025; 5:vbaf028. [PMID: 40092527 PMCID: PMC11908642 DOI: 10.1093/bioadv/vbaf028] [Received: 08/07/2024] [Revised: 12/31/2024] [Accepted: 02/12/2025] [Indexed: 03/19/2025]
Abstract
Motivation: Predicting antigen peptide and T-cell receptor (TCR) binding is difficult due to the combinatorial nature of peptides and the scarcity of labeled peptide-binding pairs. Masked language modeling (MLM) pretraining is reliably used to improve the downstream performance of peptide:TCR binding prediction models by leveraging unlabeled data. In the literature, this pretraining is commonly run until the validation loss converges. To evaluate this practice, published transformer architectures pretrained with MLM are investigated to assess the benefit of reaching lower loss during pretraining. Downstream performance metrics for these models are recorded after each successive interval of MLM pretraining. Results: The downstream performance benefit achieved from MLM peaks substantially before the pretraining loss converges. The pretraining loss is largely ineffective for precisely identifying the best-performing pretrained model checkpoints (saved states). However, in these scenarios the pretraining loss can be used to mark a threshold at which the downstream benefits of pretraining have fully diminished. Further pretraining beyond this threshold does not degrade downstream performance but produces unpredictable deviations in either direction from the post-threshold average benefit. Availability and implementation: The datasets used in this article for model training are publicly available from each original model's authors at https://github.com/SFGLab/bertrand, https://github.com/wukevin/tcr-bert, https://github.com/NKI-AI/STAPLER, and https://github.com/barthelemymp/TULIP-TCR.
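Masked language modeling hides a random subset of residues and trains the model to recover them from context. A minimal sketch of BERT-style masking on an amino-acid sequence, assuming BERT's defaults (15% of positions selected; of those, 80% replaced with a mask token, 10% with a random residue, 10% left unchanged) and a hypothetical `<mask>` token; the models cited above may use different rates and tokenizers:

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"  # hypothetical mask token


def mask_tokens(seq, mask_prob=0.15, rng=None):
    """BERT-style masking of a residue sequence.

    Returns (inputs, labels); labels[i] holds the original residue at positions
    the loss should score and None elsewhere.
    """
    rng = rng or random.Random()
    inputs = list(seq)
    labels = [None] * len(inputs)
    for i, residue in enumerate(seq):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = residue
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK  # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.choice(AMINO_ACIDS)  # 10%: random residue
        # remaining 10%: keep the original residue
    return inputs, labels


inputs, labels = mask_tokens("CASSLGQETQYF", rng=random.Random(0))
```

The pretraining loss the study tracks is the cross-entropy over the positions whose label is not None.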
Affiliation(s)
- Brock Landry
  - Division of Computer Science & Engineering, Louisiana State University, Baton Rouge, LA 70803, United States
- Jian Zhang
  - Division of Computer Science & Engineering, Louisiana State University, Baton Rouge, LA 70803, United States
4
Nagano Y, Pyo AGT, Milighetti M, Henderson J, Shawe-Taylor J, Chain B, Tiffeau-Mayer A. Contrastive learning of T cell receptor representations. Cell Syst 2025; 16:101165. [PMID: 39778580 DOI: 10.1016/j.cels.2024.12.006] [Received: 06/20/2024] [Revised: 10/09/2024] [Accepted: 12/06/2024] [Indexed: 01/11/2025]
Abstract
Computational prediction of the interaction of T cell receptors (TCRs) and their ligands is a grand challenge in immunology. Despite advances in high-throughput assays, specificity-labeled TCR data remain sparse. In other domains, the pre-training of language models on unlabeled data has been successfully used to address data bottlenecks. However, it is unclear how to best pre-train protein language models for TCR specificity prediction. Here, we introduce a TCR language model called SCEPTR (simple contrastive embedding of the primary sequence of T cell receptors), which is capable of data-efficient transfer learning. Through our model, we introduce a pre-training strategy combining autocontrastive learning and masked-language modeling, which enables SCEPTR to achieve its state-of-the-art performance. In contrast, existing protein language models and a variant of SCEPTR pre-trained without autocontrastive learning are outperformed by sequence alignment-based methods. We anticipate that contrastive learning will be a useful paradigm to decode the rules of TCR specificity. A record of this paper's transparent peer review process is included in the supplemental information.
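Contrastive pre-training pulls together two embeddings of the same TCR and pushes apart embeddings of different TCRs in the batch. The sketch below computes a one-directional InfoNCE loss with cosine similarity and a temperature. It assumes the two "views" are embeddings of the same TCRs under different random perturbations, keeps only cross-view negatives for brevity, and is a generic illustration rather than SCEPTR's implementation (which, per the abstract, also combines this objective with masked-language modeling):

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss where view_a[i] and view_b[i] embed the same TCR.

    Each anchor view_a[i] must pick out its positive view_b[i] from all of view_b;
    the other rows act as in-batch negatives.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / n


a = [[1.0, 0.0], [0.0, 1.0]]
print(round(info_nce(a, a), 4))                         # aligned views: close to 0
print(round(info_nce(a, [[0.0, 1.0], [1.0, 0.0]]), 4))  # mismatched pairs: about 10
```

With orthogonal toy embeddings, swapping the pairs makes each positive the least similar candidate, so the loss rises to roughly 1/temperature.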
Affiliation(s)
- Yuta Nagano
  - Division of Infection and Immunity, University College London, London WC1E 6BT, UK
  - Division of Medicine, University College London, London WC1E 6BT, UK
- Andrew G T Pyo
  - Center for the Physics of Biological Function, Princeton University, Princeton, NJ 08544, USA
- Martina Milighetti
  - Division of Infection and Immunity, University College London, London WC1E 6BT, UK
  - Cancer Institute, University College London, London WC1E 6DD, UK
- James Henderson
  - Division of Infection and Immunity, University College London, London WC1E 6BT, UK
  - Institute for the Physics of Living Systems, University College London, London WC1E 6BT, UK
- John Shawe-Taylor
  - Department of Computer Science, University College London, London WC1E 6BT, UK
- Benny Chain
  - Division of Infection and Immunity, University College London, London WC1E 6BT, UK
  - Department of Computer Science, University College London, London WC1E 6BT, UK
- Andreas Tiffeau-Mayer
  - Division of Infection and Immunity, University College London, London WC1E 6BT, UK
  - Institute for the Physics of Living Systems, University College London, London WC1E 6BT, UK
5
Rollins ZA, Curtis MB, George SC, Faller R. A Computational Strategy for the Rapid Identification and Ranking of Patient-Specific T Cell Receptors Bound to Neoantigens. Macromol Rapid Commun 2024; 45:e2400225. [PMID: 38839076 PMCID: PMC11661661 DOI: 10.1002/marc.202400225] [Received: 04/11/2024] [Revised: 06/02/2024] [Indexed: 06/07/2024]
Abstract
T cell receptor (TCR) recognition of a peptide-major histocompatibility complex (pMHC) is crucial for the adaptive immune response. Identifying therapeutically relevant TCR-pMHC pairs is a bottleneck in the implementation of TCR-based immunotherapies. Computationally designing TCRs to target a specific pMHC requires automated integration of next-generation sequencing, protein-protein structure prediction, molecular dynamics, and TCR ranking. Here, a pipeline to evaluate patient-specific, sequence-based TCRs against a target pMHC is presented. Using the three most frequently expressed TCRs from 16 colorectal cancer patients, the structures of the TCRs bound to the target CEA peptide-MHC are predicted using Modeller and ColabFold. TCR-pMHC structures are compared using automated equilibration and subsequent analysis. ColabFold-generated configurations require ≈2.5× less equilibration time than Modeller-generated configurations. The structural differences between Modeller and ColabFold configurations are reflected in the root mean square deviation (≈0.20 nm) between clusters of equilibrated configurations, and these differences affect the number of hydrogen bonds and Lennard-Jones contacts between the TCR and pMHC. TCR ranking criteria that may prioritize TCRs for in vitro immunogenicity evaluation are identified, and the ranking is validated by comparison with state-of-the-art machine learning-based methods trained to predict the probability of TCR-pMHC binding.
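The ≈0.20 nm figure is a root mean square deviation, RMSD = sqrt((1/N) * sum_i |r_i - r'_i|^2) over N corresponding atoms. A minimal sketch, assuming the two structures are already superimposed and their atoms paired in order (real workflows first perform a least-squares fit, for example with the Kabsch algorithm); how the authors chose representative structures for each cluster is not specified in the abstract:

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        squared += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared / len(coords_a))


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
b = [(x + 0.2, y, z) for x, y, z in a]  # every atom shifted 0.2 nm along x
print(round(rmsd(a, b), 6))  # 0.2
```

With coordinates in nanometres, as in the abstract, a uniform 0.2 nm shift of every atom gives an RMSD of exactly that shift.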
Affiliation(s)
- Zachary A. Rollins
  - Department of Chemical Engineering, University of California, Davis, 1 Shields Ave, Bainer Hall, Davis, CA 95616, USA
- Matthew B. Curtis
  - Department of Biomedical Engineering, University of California, Davis, 451 E. Health Sciences Dr., GBSF 2303, Davis, CA 95616, USA
- Steven C. George
  - Department of Biomedical Engineering, University of California, Davis, 451 E. Health Sciences Dr., GBSF 2303, Davis, CA 95616, USA
- Roland Faller
  - Department of Chemical Engineering, University of California, Davis, 1 Shields Ave, Bainer Hall, Davis, CA 95616, USA
  - Department of Chemical Engineering, Texas Tech University, Lubbock, TX 79409, USA
6
Postovskaya A, Vercauteren K, Meysman P, Laukens K. tcrBLOSUM: an amino acid substitution matrix for sensitive alignment of distant epitope-specific TCRs. Brief Bioinform 2024; 26:bbae602. [PMID: 39576224 PMCID: PMC11583439 DOI: 10.1093/bib/bbae602] [Received: 05/22/2024] [Revised: 10/07/2024] [Accepted: 11/05/2024] [Indexed: 11/24/2024]
Abstract
Deciphering the specificity of T-cell receptor (TCR) repertoires is crucial for monitoring adaptive immune responses and developing targeted immunotherapies and vaccines. To elucidate the specificity of previously unseen TCRs, many methods employ the BLOSUM62 matrix to find TCRs with similar amino acid (AA) sequences. However, while BLOSUM62 reflects the AA substitutions within conserved regions of proteins with similar functions, the remarkable diversity of TCRs means that both TCRs with similar and dissimilar sequences can bind the same epitope. Therefore, reliance on BLOSUM62 may bias detection towards epitope-specific TCRs with similar biochemical properties, overlooking those with more diverse AA compositions. In this study, we introduce tcrBLOSUMa and tcrBLOSUMb, specialized AA substitution matrices for CDR3 alpha and CDR3 beta TCR chains, respectively. The matrices reflect AA frequencies and variations occurring within TCRs that bind the same epitope, revealing that both CDR3 alpha and CDR3 beta display tolerance to a wide range of AA substitutions and differ noticeably from the standard BLOSUM62. By accurately aligning distant TCRs employing tcrBLOSUMb, we were able to improve clustering performance and capture a large number of epitope-specific TCRs with diverse AA compositions and physicochemical profiles overlooked by BLOSUM62. Utilizing both the general BLOSUM62 and specialized tcrBLOSUM matrices in existing computational tools will broaden the range of TCRs that can be associated with their cognate epitopes, thereby enhancing TCR repertoire analysis.
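Substitution matrices such as BLOSUM62 give each residue pair a log-odds score comparing how often the pair is observed aligned with how often it would co-occur by chance: s(a, b) = round(2 * log2(q_ab / e_ab)), where q_ab is the observed pair frequency and e_ab = p_a * p_b when a = b, or 2 * p_a * p_b otherwise. The sketch below applies this classic BLOSUM recipe to gap-free pairs of equal-length CDR3s assumed to bind the same epitope. tcrBLOSUM's actual pair selection, clustering and weighting may differ, and pairs never observed are simply left unscored here:

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs):
    """BLOSUM-style half-bit log-odds scores from gap-free, equal-length sequence pairs."""
    pair_counts = Counter()
    for s1, s2 in aligned_pairs:
        if len(s1) != len(s2):
            raise ValueError("this sketch only handles gap-free, equal-length pairs")
        for a, b in zip(s1, s2):
            pair_counts[tuple(sorted((a, b)))] += 1  # unordered residue pair
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}  # observed pair frequencies

    # Background frequency of each residue: q_aa plus half of every q_ab with b != a.
    p = Counter()
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2

    scores = {}
    for (a, b), f in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(f / expected))  # half-bit units
    return scores


scores = log_odds_matrix([("CASSLG", "CASSPG"), ("CASRLG", "CASSLG")])
```

Positive scores mark substitutions that occur among same-epitope TCRs more often than chance predicts, which is the property that lets such a matrix align TCRs that BLOSUM62 would treat as distant.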
MESH Headings
- Receptors, Antigen, T-Cell/immunology
- Receptors, Antigen, T-Cell/genetics
- Receptors, Antigen, T-Cell/chemistry
- Amino Acid Substitution
- Humans
- Amino Acid Sequence
- Epitopes, T-Lymphocyte/immunology
- Epitopes, T-Lymphocyte/chemistry
- Sequence Alignment
- Complementarity Determining Regions/genetics
- Complementarity Determining Regions/immunology
- Complementarity Determining Regions/chemistry
- Computational Biology/methods
- Epitopes/immunology
- Epitopes/chemistry
- Algorithms
- Receptors, Antigen, T-Cell, alpha-beta/genetics
- Receptors, Antigen, T-Cell, alpha-beta/immunology
- Receptors, Antigen, T-Cell, alpha-beta/chemistry
Affiliation(s)
- Anna Postovskaya
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
  - Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS), University of Antwerp, Antwerp, Belgium
  - Clinical Virology Unit, Department of Clinical Sciences, Institute of Tropical Medicine, Antwerp, Belgium
- Koen Vercauteren
  - Clinical Virology Unit, Department of Clinical Sciences, Institute of Tropical Medicine, Antwerp, Belgium
- Pieter Meysman
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
  - Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS), University of Antwerp, Antwerp, Belgium
- Kris Laukens
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
  - Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS), University of Antwerp, Antwerp, Belgium
  - Biomedical Informatics Research Network Antwerp (BIOMINA), University of Antwerp, Antwerp, Belgium
7
Velez-Arce A, Li MM, Gao W, Lin X, Huang K, Fu T, Pentelute BL, Kellis M, Zitnik M. Signals in the Cells: Multimodal and Contextualized Machine Learning Foundations for Therapeutics. bioRxiv 2024:2024.06.12.598655. [PMID: 38948789 PMCID: PMC11212894 DOI: 10.1101/2024.06.12.598655] [Indexed: 07/02/2024]
Abstract
Drug discovery AI datasets and benchmarks have not traditionally included single-cell analysis biomarkers. While benchmarking efforts in single-cell analysis have recently released collections of single-cell tasks, they have yet to comprehensively release datasets, models, and benchmarks that integrate a broad range of therapeutic discovery tasks with cell-type-specific biomarkers. Therapeutics Data Commons (TDC-2) presents datasets, tools, models, and benchmarks integrating cell-type-specific contextual features with ML tasks across therapeutics. We present four tasks for contextual learning at single-cell resolution: drug-target nomination, genetic perturbation response prediction, chemical perturbation response prediction, and protein-peptide interaction prediction. We introduce datasets, models, and benchmarks for these four tasks. Finally, we detail the advancements and challenges in machine learning and biology that drove the implementation of TDC-2 and how they are reflected in its architecture, datasets and benchmarks, and foundation model tooling.
Affiliation(s)
- Michelle M. Li
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115
- Wenhao Gao
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- Xiang Lin
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115
- Kexin Huang
  - Department of Computer Science, Stanford School of Engineering, Stanford, CA 94305
- Tianfan Fu
  - Department of Computational Science, Rensselaer Polytechnic Institute, Troy, NY 12180
- Bradley L. Pentelute
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139
- Manolis Kellis
  - Broad Institute of MIT and Harvard, Computer Science and Artificial Intelligence Laboratory, MIT, Electrical Engineering and Computer Science Department, Massachusetts Institute of Technology, Cambridge, MA 02139
- Marinka Zitnik
  - Broad Institute of MIT and Harvard, Harvard Data Science Initiative, Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02215
8
Chi WY, Hu Y, Huang HC, Kuo HH, Lin SH, Kuo CTJ, Tao J, Fan D, Huang YM, Wu AA, Hung CF, Wu TC. Molecular targets and strategies in the development of nucleic acid cancer vaccines: from shared to personalized antigens. J Biomed Sci 2024; 31:94. [PMID: 39379923 PMCID: PMC11463125 DOI: 10.1186/s12929-024-01082-x] [Received: 07/19/2024] [Accepted: 09/01/2024] [Indexed: 10/10/2024]
Abstract
Recent breakthroughs in cancer immunotherapies have emphasized the importance of harnessing the immune system for treating cancer. Vaccines, which have traditionally been used to promote protective immunity against pathogens, are now being explored as a method to target cancer neoantigens. Over the past few years, extensive preclinical research and more than a hundred clinical trials have been dedicated to investigating various approaches to neoantigen discovery and vaccine formulation, encouraging the development of personalized medicine. Nucleic acids (DNA and mRNA) have become a particularly promising platform for the development of these cancer immunotherapies. This shift towards nucleic acid-based personalized vaccines has been facilitated by advances in molecular techniques for identifying neoantigens, antigen prediction methodologies, and the development of new vaccine platforms. Generating these personalized vaccines involves a comprehensive pipeline that includes sequencing of patient tumor samples, data analysis for antigen prediction, and tailored vaccine manufacturing. In this review, we discuss the various shared and personalized antigens used for cancer vaccine development and introduce strategies for identifying neoantigens through the characterization of gene mutation, transcription, translation, and post-translational modifications associated with oncogenesis. In addition, we focus on the most up-to-date nucleic acid vaccine platforms, discuss the limitations of cancer vaccines and potential solutions, and raise key clinical and technical considerations in vaccine development.
Affiliation(s)
- Wei-Yu Chi
  - Physiology, Biophysics and Systems Biology Graduate Program, Weill Cornell Medicine, New York, NY, USA
- Yingying Hu
  - Tri-Institutional PhD Program in Chemical Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Hsin-Che Huang
  - Tri-Institutional PhD Program in Chemical Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Hui-Hsuan Kuo
  - Pharmacology PhD Program, Weill Cornell Medicine, New York, NY, USA
- Shu-Hong Lin
  - Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
  - The University of Texas Graduate School of Biomedical Sciences at Houston and MD Anderson Cancer Center, Houston, TX, USA
- Chun-Tien Jimmy Kuo
  - Division of Pharmaceutics and Pharmacology, College of Pharmacy, The Ohio State University, Columbus, OH, USA
- Julia Tao
  - Department of Pathology, Johns Hopkins School of Medicine, 1550 Orleans St, CRB II Room 309, Baltimore, MD, 21287, USA
- Darrell Fan
  - Department of Pathology, Johns Hopkins School of Medicine, 1550 Orleans St, CRB II Room 309, Baltimore, MD, 21287, USA
- Yi-Min Huang
  - Department of Pathology, Johns Hopkins School of Medicine, 1550 Orleans St, CRB II Room 309, Baltimore, MD, 21287, USA
- Annie A Wu
  - Department of Pathology, Johns Hopkins School of Medicine, 1550 Orleans St, CRB II Room 309, Baltimore, MD, 21287, USA
- Chien-Fu Hung
  - Department of Pathology, Johns Hopkins School of Medicine, 1550 Orleans St, CRB II Room 309, Baltimore, MD, 21287, USA
  - Department of Oncology, Johns Hopkins School of Medicine, Baltimore, MD, USA
  - Department of Obstetrics and Gynecology, Johns Hopkins School of Medicine, Baltimore, MD, USA
- T-C Wu
  - Department of Pathology, Johns Hopkins School of Medicine, 1550 Orleans St, CRB II Room 309, Baltimore, MD, 21287, USA
  - Department of Oncology, Johns Hopkins School of Medicine, Baltimore, MD, USA
  - Department of Obstetrics and Gynecology, Johns Hopkins School of Medicine, Baltimore, MD, USA
  - Department of Molecular Microbiology and Immunology, Bloomberg School of Public Health, Johns Hopkins School of Medicine, Baltimore, MD, USA
9
Ribeiro-Filho HV, Jara GE, Guerra JVS, Cheung M, Felbinger NR, Pereira JGC, Pierce BG, Lopes-de-Oliveira PS. Exploring the potential of structure-based deep learning approaches for T cell receptor design. PLoS Comput Biol 2024; 20:e1012489. [PMID: 39348412 PMCID: PMC11466415 DOI: 10.1371/journal.pcbi.1012489] [Received: 05/14/2024] [Revised: 10/10/2024] [Accepted: 09/14/2024] [Indexed: 10/02/2024]
Abstract
Deep learning methods, trained on the increasing set of available protein 3D structures and sequences, have substantially impacted the protein modeling and design field. These advancements have facilitated the creation of novel proteins, or the optimization of existing ones designed for specific functions, such as binding a target protein. Despite the demonstrated potential of such approaches in designing general protein binders, their application in designing immunotherapeutics remains relatively underexplored. A relevant application is the design of T cell receptors (TCRs). Given the crucial role of T cells in mediating immune responses, redirecting these cells to tumor or infected target cells through the engineering of TCRs has shown promising results in treating diseases, especially cancer. However, the computational design of TCR interactions presents challenges for current physics-based methods, particularly due to the unique natural characteristics of these interfaces, such as low affinity and cross-reactivity. For this reason, in this study, we explored the potential of two structure-based deep learning protein design methods, ProteinMPNN and ESM-IF1, in designing fixed-backbone TCRs for binding target antigenic peptides presented by the MHC through different design scenarios. To evaluate TCR designs, we employed a comprehensive set of sequence- and structure-based metrics, highlighting the benefits of these methods in comparison to classical physics-based design methods and identifying deficiencies for improvement.
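Fixed-backbone design methods such as ProteinMPNN and ESM-IF1 are commonly evaluated by native sequence recovery: the fraction of scored positions whose designed residue matches the native one. A minimal sketch, assuming designed and native sequences are pre-aligned on the same backbone; the optional `positions` argument (for scoring only the designed CDR positions) is a hypothetical convenience, and this is not necessarily one of the authors' exact metrics:

```python
from typing import Iterable, Optional


def sequence_recovery(native: str, designed: str,
                      positions: Optional[Iterable[int]] = None) -> float:
    """Fraction of scored positions where the designed residue equals the native one.

    `positions` (0-based indices, e.g. designed CDR residues) is a hypothetical
    option; by default every position is scored.
    """
    if len(native) != len(designed):
        raise ValueError("fixed-backbone designs must keep the native length")
    idx = list(range(len(native))) if positions is None else list(positions)
    if not idx:
        raise ValueError("no positions to score")
    matches = sum(native[i] == designed[i] for i in idx)
    return matches / len(idx)


print(round(sequence_recovery("CASSLGQETQYF", "CASSPGQETQYF"), 3))  # 0.917
```

Recovery alone rewards reproducing the native TCR, so it is usually read alongside structure-based metrics rather than as a measure of improved binding.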
Affiliation(s)
- Helder V. Ribeiro-Filho
  - Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, São Paulo, Brazil
- Gabriel E. Jara
  - Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, São Paulo, Brazil
- João V. S. Guerra
  - Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, São Paulo, Brazil
  - Graduate Program in Pharmaceutical Sciences, Faculty of Pharmaceutical Sciences, University of Campinas, Campinas, São Paulo, Brazil
- Melyssa Cheung
  - Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland, United States of America
  - Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland, United States of America
- Nathaniel R. Felbinger
  - Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland, United States of America
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, United States of America
- José G. C. Pereira
  - Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, São Paulo, Brazil
- Brian G. Pierce
  - Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland, United States of America
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, United States of America
- Paulo S. Lopes-de-Oliveira
  - Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, São Paulo, Brazil
  - Graduate Program in Pharmaceutical Sciences, Faculty of Pharmaceutical Sciences, University of Campinas, Campinas, São Paulo, Brazil
10
Sharma G, Round J, Teng F, Ali Z, May C, Yung E, Holt RA. A synthetic cytotoxic T cell platform for rapidly prototyping TCR function. NPJ Precis Oncol 2024; 8:182. [PMID: 39160299 PMCID: PMC11333705 DOI: 10.1038/s41698-024-00669-9] [Received: 01/02/2024] [Accepted: 07/30/2024] [Indexed: 08/21/2024]
Abstract
Current tools for functionally profiling T cell receptors with respect to cytotoxic potency and cross-reactivity are hampered by difficulties in establishing model systems to test these proteins in the contexts of different HLA alleles and against broad arrays of potential antigens. We have implemented a granzyme-activatable sensor of T cell cytotoxicity in a universal prototyping platform which enables facile recombinant expression of any combination of TCR-, peptide-, and class I MHC-coding sequences and direct assessment of resultant responses. This system consists of an engineered cell platform based on the immortalized natural killer cell line, YT-Indy, and the MHC-null antigen-presenting cell line, K562. These cells were engineered to furnish the YT-Indy/K562 pair with appropriate protein domains required for recombinant TCR expression and function in a non-T cell chassis, integrate a fluorescence-based target-centric early detection reporter of cytotoxic function, and deploy a set of protective genetic interventions designed to preserve antigen-presenting cells for subsequent capture and downstream characterization. Our data show successful reconstitution of the surface TCR complex in the YT-Indy cell line at biologically relevant levels. We also demonstrate successful induction and highly sensitive detection of antigen-specific response in multiple distinct model TCRs. Additionally, we monitored destruction of targets in co-culture and found that our survival-optimized system allowed for complete preservation after 24 h exposure to cytotoxic effectors. With this bioplatform, we anticipate investigators will be empowered to rapidly express and characterize T cell receptor responses, generate knowledge regarding the patterns of T cell receptor recognition, and optimize therapeutic T cell receptors.
Affiliation(s)
- Govinda Sharma
  - Michael Smith Genome Sciences Centre, British Columbia Cancer Research Institute, Vancouver, BC, Canada
- James Round
  - Michael Smith Genome Sciences Centre, British Columbia Cancer Research Institute, Vancouver, BC, Canada
- Fei Teng
  - Michael Smith Genome Sciences Centre, British Columbia Cancer Research Institute, Vancouver, BC, Canada
- Zahra Ali
  - Michael Smith Genome Sciences Centre, British Columbia Cancer Research Institute, Vancouver, BC, Canada
- Chris May
  - Michael Smith Genome Sciences Centre, British Columbia Cancer Research Institute, Vancouver, BC, Canada
- Eric Yung
  - Michael Smith Genome Sciences Centre, British Columbia Cancer Research Institute, Vancouver, BC, Canada
- Robert A Holt
  - Michael Smith Genome Sciences Centre, British Columbia Cancer Research Institute, Vancouver, BC, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
11
T. RR, Demerdash ONA, Smith JC. TCR-H: explainable machine learning prediction of T-cell receptor epitope binding on unseen datasets. Front Immunol 2024; 15:1426173. [PMID: 39221256 PMCID: PMC11361934 DOI: 10.3389/fimmu.2024.1426173] [Received: 04/30/2024] [Accepted: 07/29/2024] [Indexed: 09/04/2024]
Abstract
Artificial-intelligence and machine-learning (AI/ML) approaches to predicting T-cell receptor (TCR)-epitope specificity achieve high performance metrics on test datasets that include sequences also present in the training set, but fail to generalize to test sets of epitopes and TCRs that are absent from the training set, i.e., 'unseen' during training of the ML model. We present TCR-H, a supervised Support Vector Machine (SVM) classifier using physicochemical features, trained on the largest dataset available to date and using only experimentally validated non-binders as negative data points. TCR-H achieves an area under the receiver operating characteristic curve (ROC AUC) of 0.87 for epitope 'hard splitting' (i.e., test sets in which all epitopes are unseen during ML training), 0.92 for TCR hard splitting, and 0.89 for 'strict splitting', in which neither the epitopes nor the TCRs in the test set appear in the training data. Furthermore, we employ the SHAP (Shapley additive explanations) explainable AI (XAI) method for post hoc interrogation of the models trained with different hard splits, shedding light on the key physicochemical features driving model predictions. TCR-H thus represents a significant step towards general applicability and explainability of epitope:TCR specificity prediction.
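The hard and strict splits described in this abstract can be built by grouping records before splitting. A minimal sketch, assuming each record is a dict with hypothetical `tcr` and `epitope` keys and that held-out groups are drawn at random; TCR-H's actual selection procedure may differ:

```python
import random


def hard_split(records, key, test_fraction=0.2, seed=0):
    """Split records so that no value of `key` appears in both train and test."""
    groups = sorted({r[key] for r in records})
    random.Random(seed).shuffle(groups)
    n_test = max(1, round(test_fraction * len(groups)))
    test_groups = set(groups[:n_test])
    train = [r for r in records if r[key] not in test_groups]
    test = [r for r in records if r[key] in test_groups]
    return train, test


def strict_split(records, test_fraction=0.2, seed=0):
    """Hold out unseen epitopes, then drop training rows whose TCR occurs in test."""
    train, test = hard_split(records, "epitope", test_fraction, seed)
    test_tcrs = {r["tcr"] for r in test}
    train = [r for r in train if r["tcr"] not in test_tcrs]
    return train, test
```

`hard_split(records, "epitope")` gives the epitope hard split and `hard_split(records, "tcr")` the TCR hard split. The strict split discards training rows that share a TCR with the test set, so it shrinks the training data.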
Affiliation(s)
- Rajitha Rajeshwar T.
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN, United States
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Omar N. A. Demerdash
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Jeremy C. Smith
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN, United States
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
12
Kim HY, Kim S, Park WY, Kim D. TSpred: a robust prediction framework for TCR-epitope interactions using paired chain TCR sequence data. Bioinformatics 2024; 40:btae472. [PMID: 39052940 PMCID: PMC11297499 DOI: 10.1093/bioinformatics/btae472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 06/11/2024] [Accepted: 07/25/2024] [Indexed: 07/27/2024]
Abstract
MOTIVATION: Prediction of T-cell receptor (TCR)-epitope interactions is important for many applications in biomedical research, such as cancer immunotherapy and vaccine design. The prediction of TCR-epitope interactions remains challenging, especially for novel epitopes, due to the scarcity of available data. RESULTS: We propose TSpred, a new deep learning approach for the pan-specific prediction of TCR binding specificity based on paired chain TCR data. We develop a robust model that generalizes well to unseen epitopes by combining the predictive power of CNN and the attention mechanism. In particular, we design a reciprocal attention mechanism which focuses on extracting the patterns underlying TCR-epitope interactions. Upon a comprehensive evaluation of our model, we find that TSpred achieves state-of-the-art performance in both seen and unseen epitope specificity prediction tasks. Also, compared to other predictors, TSpred is more robust to bias related to peptide imbalance in the dataset. In addition, the reciprocal attention component of our model allows for model interpretability by capturing structurally important binding regions. Results indicate that TSpred is a robust and reliable method for the task of TCR-epitope binding prediction. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/ha01994/TSpred.
Affiliation(s)
- Ha Young Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, South Korea
- Woong-Yang Park
- GENINUS Inc., Seoul 05836, South Korea
- Samsung Genome Institute, Samsung Medical Center, Seoul 06351, South Korea
- Department of Molecular Cell Biology, Sungkyunkwan University School of Medicine, Suwon 16419, South Korea
- Dongsup Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, South Korea
13
Wossnig L, Furtmann N, Buchanan A, Kumar S, Greiff V. Best practices for machine learning in antibody discovery and development. Drug Discov Today 2024; 29:104025. [PMID: 38762089 DOI: 10.1016/j.drudis.2024.104025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 04/25/2024] [Accepted: 05/13/2024] [Indexed: 05/20/2024]
Abstract
In the past 40 years, therapeutic antibody discovery and development have advanced considerably, with machine learning (ML) offering a promising way to speed up the process by reducing costs and the number of experiments required. Recent progress in ML-guided antibody design and development (D&D) has been hindered by the diversity of data sets and evaluation methods, which makes it difficult to conduct comparisons and assess utility. Establishing standards and guidelines will be crucial for the wider adoption of ML and the advancement of the field. This perspective critically reviews current practices, highlights common pitfalls and proposes method development and evaluation guidelines for various ML-based techniques in therapeutic antibody D&D. Addressing challenges across the ML process, best practices are recommended for each stage to enhance reproducibility and progress.
Affiliation(s)
- Leonard Wossnig
- LabGenius Ltd, The Biscuit Factory, 100 Drummond Road, London SE16 4DG, UK; Department of Computer Science, University College London, 66-72 Gower St, London WC1E 6EA, UK.
- Norbert Furtmann
- R&D Large Molecules Research Platform, Sanofi Deutschland GmbH, Industriepark Höchst, Frankfurt Am Main, Germany
- Andrew Buchanan
- Biologics Engineering, R&D, AstraZeneca, Cambridge CB2 0AA, UK
- Sandeep Kumar
- Computational Protein Design and Modeling Group, Computational Science, Moderna Therapeutics, 200 Technology Square, Cambridge, MA 02139, USA
- Victor Greiff
- Department of Immunology and Oslo University Hospital, University of Oslo, Oslo, Norway
14
Bulashevska A, Nacsa Z, Lang F, Braun M, Machyna M, Diken M, Childs L, König R. Artificial intelligence and neoantigens: paving the path for precision cancer immunotherapy. Front Immunol 2024; 15:1394003. [PMID: 38868767 PMCID: PMC11167095 DOI: 10.3389/fimmu.2024.1394003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 05/13/2024] [Indexed: 06/14/2024]
Abstract
Cancer immunotherapy has witnessed rapid advancement in recent years, with a particular focus on neoantigens as promising targets for personalized treatments. The convergence of immunogenomics, bioinformatics, and artificial intelligence (AI) has propelled the development of innovative neoantigen discovery tools and pipelines. These tools have revolutionized our ability to identify tumor-specific antigens, providing the foundation for precision cancer immunotherapy. AI-driven algorithms can process extensive amounts of data, identify patterns, and make predictions that were once challenging to achieve. However, the integration of AI comes with its own set of challenges, leaving space for further research. With particular focus on computational approaches, in this article we explore the current landscape of neoantigen prediction, the fundamental concepts behind it, and the challenges and their potential solutions, providing a comprehensive overview of this rapidly evolving field.
Affiliation(s)
- Alla Bulashevska
- Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Zsófia Nacsa
- Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Franziska Lang
- TRON - Translational Oncology at the University Medical Center of the Johannes Gutenberg University gGmbH, Mainz, Germany
- Markus Braun
- Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Martin Machyna
- Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Mustafa Diken
- TRON - Translational Oncology at the University Medical Center of the Johannes Gutenberg University gGmbH, Mainz, Germany
- Liam Childs
- Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
- Renate König
- Host-Pathogen-Interactions, Paul-Ehrlich-Institut, Langen, Germany
15
Mösch A, Grazioli F, Machart P, Malone B. NeoAgDT: optimization of personal neoantigen vaccine composition by digital twin simulation of a cancer cell population. Bioinformatics 2024; 40:btae205. [PMID: 38614133 PMCID: PMC11076149 DOI: 10.1093/bioinformatics/btae205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Revised: 03/28/2024] [Accepted: 04/11/2024] [Indexed: 04/15/2024]
Abstract
MOTIVATION: Neoantigen vaccines make use of tumor-specific mutations to enable the patient's immune system to recognize and eliminate cancer. Selecting vaccine elements, however, is a complex task which needs to take into account not only the underlying antigen presentation pathway but also tumor heterogeneity. RESULTS: Here, we present NeoAgDT, a two-step approach consisting of: (i) simulating individual cancer cells to create a digital twin of the patient's tumor cell population and (ii) optimizing the vaccine composition by integer linear programming based on this digital twin. NeoAgDT shows improved selection of experimentally validated neoantigens over ranking-based approaches in a study of seven patients. AVAILABILITY AND IMPLEMENTATION: The NeoAgDT code is published on GitHub: https://github.com/nec-research/neoagdt.
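NeoAgDT solves the composition step with integer linear programming. To show the kind of objective such a step optimizes over a simulated cell population, the sketch below swaps in plain exhaustive search over small candidate sets. The objective (count simulated cells presenting at least one selected neoantigen), the function name `choose_vaccine`, and the toy data are assumptions for illustration, not NeoAgDT's actual formulation.

```python
from itertools import combinations
from typing import List, Set, Tuple


def choose_vaccine(cells: List[Set[str]], candidates: List[str], k: int) -> Tuple[Tuple[str, ...], int]:
    """Exhaustively pick k candidate neoantigens maximizing the number of simulated
    cells that present at least one selected neoantigen (hypothetical objective).

    cells: one set per simulated cancer cell, holding the neoantigens it presents.
    Returns (best selection, number of covered cells); ties keep the first
    selection in sorted order.
    """
    best: Tuple[str, ...] = ()
    best_cov = -1
    for combo in combinations(sorted(candidates), k):
        chosen = set(combo)
        covered = sum(1 for presented in cells if presented & chosen)
        if covered > best_cov:
            best, best_cov = combo, covered
    return best, best_cov
```

With five toy cells `[{"A", "B"}, {"B"}, {"C"}, {"C", "D"}, {"D"}]` and `k=2`, the search returns `("B", "C")` covering 4 cells. The number of subsets grows combinatorially with the candidate count, which is why a formulation handed to an ILP solver is the practical choice at realistic scale.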
Affiliation(s)
- Anja Mösch
- Biomedical AI Group, NEC Laboratories Europe GmbH, Heidelberg 69115, Germany
- Filippo Grazioli
- Biomedical AI Group, NEC Laboratories Europe GmbH, Heidelberg 69115, Germany
- Pierre Machart
- Biomedical AI Group, NEC Laboratories Europe GmbH, Heidelberg 69115, Germany
- Brandon Malone
- Biomedical AI Group, NEC Laboratories Europe GmbH, Heidelberg 69115, Germany
16
McMaster B, Thorpe C, Ogg G, Deane CM, Koohy H. Can AlphaFold's breakthrough in protein structure help decode the fundamental principles of adaptive cellular immunity? Nat Methods 2024; 21:766-776. [PMID: 38654083 DOI: 10.1038/s41592-024-02240-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Accepted: 03/08/2024] [Indexed: 04/25/2024]
Abstract
T cells are essential immune cells responsible for identifying and eliminating pathogens. Through interactions between their T-cell antigen receptors (TCRs) and antigens presented by major histocompatibility complex molecules (MHCs) or MHC-like molecules, T cells discriminate foreign and self peptides. Determining the fundamental principles that govern these interactions has important implications in numerous medical contexts. However, reconstructing a map between T cells and their antagonist antigens remains an open challenge for the field of immunology, and success of in silico reconstructions of this relationship has remained incremental. In this Perspective, we discuss the role that new state-of-the-art deep-learning models for predicting protein structure may play in resolving some of the unanswered questions the field faces linking TCR and peptide-MHC properties to T-cell specificity. We provide a comprehensive overview of structural databases and the evolution of predictive models, and highlight the breakthrough AlphaFold provided the field.
Affiliation(s)
- Benjamin McMaster
- MRC Translational Immune Discovery Unit, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, UK
- Department of Statistics, University of Oxford, Oxford, UK
- Christopher Thorpe
- Open Targets, Wellcome Genome Campus, Hinxton, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Graham Ogg
- MRC Translational Immune Discovery Unit, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, UK
- Chinese Academy of Medical Sciences Oxford Institute, University of Oxford, Oxford, UK
- Hashem Koohy
- MRC Translational Immune Discovery Unit, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford, Oxford, UK.
- Alan Turing Fellow in Health and Medicine, University of Oxford, Oxford, UK.
17
Ribeiro-Filho HV, Jara GE, Guerra JVS, Cheung M, Felbinger NR, Pereira JGC, Pierce BG, Lopes-de-Oliveira PS. Exploring the Potential of Structure-Based Deep Learning Approaches for T cell Receptor Design. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.19.590222. [PMID: 38712216 PMCID: PMC11071404 DOI: 10.1101/2024.04.19.590222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Deep learning methods, trained on the increasing set of available protein 3D structures and sequences, have substantially impacted the protein modeling and design field. These advancements have facilitated the creation of novel proteins, or the optimization of existing ones designed for specific functions, such as binding a target protein. Despite the demonstrated potential of such approaches in designing general protein binders, their application in designing immunotherapeutics remains relatively unexplored. A relevant application is the design of T cell receptors (TCRs). Given the crucial role of T cells in mediating immune responses, redirecting these cells to tumor or infected target cells through the engineering of TCRs has shown promising results in treating diseases, especially cancer. However, the computational design of TCR interactions presents challenges for current physics-based methods, particularly due to the unique natural characteristics of these interfaces, such as low affinity and cross-reactivity. For this reason, in this study, we explored the potential of two structure-based deep learning protein design methods, ProteinMPNN and ESM-IF, in designing fixed-backbone TCRs for binding target antigenic peptides presented by the MHC through different design scenarios. To evaluate TCR designs, we employed a comprehensive set of sequence- and structure-based metrics, highlighting the benefits of these methods in comparison to classical physics-based design methods and identifying deficiencies for improvement.
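Native sequence recovery is one widely used sequence-based metric for fixed-backbone design: the fraction of designed positions that reproduce the native residue. Whether it is among the exact metrics used in this study is an assumption here, and the function name `sequence_recovery` and the example CDR3 strings are hypothetical.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native residue.

    Fixed-backbone design keeps the chain length, so both sequences must be
    the same length (an assumption enforced here rather than handled).
    """
    if len(native) != len(designed):
        raise ValueError("fixed-backbone designs must match the native length")
    if not native:
        raise ValueError("empty sequence")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)
```

For example, a designed loop `CASSPGQGYEQYF` against a native `CASSLGQAYEQYF` differs at 2 of 13 positions, giving a recovery of 11/13.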
Affiliation(s)
- Helder V. Ribeiro-Filho
- Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas 13083-100, Brazil
- Gabriel E. Jara
- Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas 13083-100, Brazil
- João V. S. Guerra
- Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas 13083-100, Brazil
- Graduate Program in Pharmaceutical Sciences, Faculty of Pharmaceutical Sciences, University of Campinas, Campinas, São Paulo, 13083-871, Brazil
- Melyssa Cheung
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland 20850, USA
- Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland 20742, USA
- Nathaniel R. Felbinger
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland 20850, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland 20742, USA
- José G. C. Pereira
- Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas 13083-100, Brazil
- Brian G. Pierce
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland 20850, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland 20742, USA
- Paulo S. Lopes-de-Oliveira
- Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas 13083-100, Brazil
- Graduate Program in Pharmaceutical Sciences, Faculty of Pharmaceutical Sciences, University of Campinas, Campinas, São Paulo, 13083-871, Brazil
18
Croce G, Bobisse S, Moreno DL, Schmidt J, Guillame P, Harari A, Gfeller D. Deep learning predictions of TCR-epitope interactions reveal epitope-specific chains in dual alpha T cells. Nat Commun 2024; 15:3211. [PMID: 38615042 PMCID: PMC11016097 DOI: 10.1038/s41467-024-47461-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 04/03/2024] [Indexed: 04/15/2024]
Abstract
T cells have the ability to eliminate infected and cancer cells and play an essential role in cancer immunotherapy. T cell activation is elicited by the binding of the T cell receptor (TCR) to epitopes displayed on MHC molecules, and the TCR specificity is determined by the sequence of its α and β chains. Here, we collect and curate a dataset of 17,715 αβTCRs interacting with dozens of class I and class II epitopes. We use this curated data to develop MixTCRpred, an epitope-specific TCR-epitope interaction predictor. MixTCRpred accurately predicts TCRs recognizing several viral and cancer epitopes. MixTCRpred further provides a useful quality control tool for multiplexed single-cell TCR sequencing assays of epitope-specific T cells and pinpoints a substantial fraction of putative contaminants in public databases. Analysis of epitope-specific dual α T cells demonstrates that MixTCRpred can identify α chains mediating epitope recognition. Applying MixTCRpred to TCR repertoires from COVID-19 patients reveals enrichment of clonotypes predicted to bind an immunodominant SARS-CoV-2 epitope. Overall, MixTCRpred provides a robust tool to predict TCRs interacting with specific epitopes and interpret TCR-sequencing data from both bulk and epitope-specific T cells.
Affiliation(s)
- Giancarlo Croce
- Department of Oncology UNIL CHUV, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Agora Cancer Research Centre, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Sara Bobisse
- Agora Cancer Research Centre, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Department of Oncology UNIL CHUV, Ludwig Institute for Cancer Research, University Hospital of Lausanne, Lausanne, Switzerland
- Dana Léa Moreno
- Department of Oncology UNIL CHUV, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Agora Cancer Research Centre, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Julien Schmidt
- Department of Oncology UNIL CHUV, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Department of Oncology UNIL CHUV, Ludwig Institute for Cancer Research, University Hospital of Lausanne, Lausanne, Switzerland
- Philippe Guillame
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Department of Oncology UNIL CHUV, Ludwig Institute for Cancer Research, University Hospital of Lausanne, Lausanne, Switzerland
- Alexandre Harari
- Department of Oncology UNIL CHUV, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Agora Cancer Research Centre, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Department of Oncology UNIL CHUV, Ludwig Institute for Cancer Research, University Hospital of Lausanne, Lausanne, Switzerland
- David Gfeller
- Department of Oncology UNIL CHUV, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland.
- Agora Cancer Research Centre, Lausanne, Switzerland.
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland.
19
Ji H, Wang XX, Zhang Q, Zhang C, Zhang HM. Predicting TCR sequences for unseen antigen epitopes using structural and sequence features. Brief Bioinform 2024; 25:bbae210. [PMID: 38711371 PMCID: PMC11074592 DOI: 10.1093/bib/bbae210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 04/04/2024] [Accepted: 04/22/2024] [Indexed: 05/08/2024]
Abstract
T-cell receptor (TCR) recognition of antigens is fundamental to the adaptive immune response. With the expansion of experimental techniques, a substantial database of matched TCR-antigen pairs has emerged, presenting opportunities for computational prediction models. However, accurately forecasting the binding affinities of unseen antigen-TCR pairs remains a major challenge. Here, we present convolutional-self-attention TCR (CATCR), a novel framework tailored to enhance the prediction of epitope and TCR interactions. Our approach uses convolutional neural networks to extract peptide features from residue contact matrices generated by OpenFold, and a transformer to encode segment-based coded sequences. We introduce CATCR-D, a discriminator that assesses binding by analyzing the structural and sequence features of epitopes and CDR3-β regions. Additionally, the framework comprises CATCR-G, a generative module for CDR3-β sequences, which applies the pretrained encoder to deduce epitope characteristics and a transformer decoder to predict matching CDR3-β sequences. CATCR-D achieved an AUROC of 0.89 on previously unseen epitope-TCR pairs and outperformed four benchmark models by a margin of 17.4%. CATCR-G demonstrated high precision, recall and F1 scores, surpassing 95% in bidirectional encoder representations from transformers score assessments. Our results indicate that CATCR is an effective tool for predicting unseen epitope-TCR interactions. Incorporating structural insights significantly enhances our understanding of the general rules governing TCR-epitope recognition. The ability to predict TCRs for novel epitopes using structural and sequence information is promising, and broadening the repository of experimental TCR-epitope data could further improve the precision of epitope-TCR binding predictions.
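A residue contact matrix of the kind fed to CATCR's convolutional layers can be sketched from predicted coordinates as below. This is a generic construction, not CATCR's code: using one representative atom per residue (e.g., Cα) and an 8 Å cutoff are common conventions assumed here, and `contact_matrix` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

# One representative 3D coordinate (in Å) per residue, e.g. the C-alpha atom (assumption).
Coord = Tuple[float, float, float]


def contact_matrix(a: Sequence[Coord], b: Sequence[Coord], cutoff: float = 8.0) -> List[List[int]]:
    """Binary contact matrix: entry [i][j] is 1 when residue i of chain a and
    residue j of chain b lie within `cutoff` Å of each other, else 0."""
    return [[1 if math.dist(p, q) <= cutoff else 0 for q in b] for p in a]
```

Passing the epitope coordinates as `a` and the CDR3β coordinates as `b` yields an epitope-by-CDR3β map; passing the same chain twice yields an intra-chain map. A learned model may instead consume raw distances, so the binary threshold is only one design option.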
MESH Headings
- Receptors, Antigen, T-Cell/chemistry
- Receptors, Antigen, T-Cell/immunology
- Receptors, Antigen, T-Cell/metabolism
- Receptors, Antigen, T-Cell/genetics
- Humans
- Epitopes/chemistry
- Epitopes/immunology
- Computational Biology/methods
- Neural Networks, Computer
- Epitopes, T-Lymphocyte/immunology
- Epitopes, T-Lymphocyte/chemistry
- Antigens/chemistry
- Antigens/immunology
- Amino Acid Sequence
Affiliation(s)
- Hongchen Ji
- Department of Oncology of Xijing Hospital, Air Force Medical University, Xi’an, Shaanxi, China
- Xiang-Xu Wang
- Department of Oncology of Xijing Hospital, Air Force Medical University, Xi’an, Shaanxi, China
- Qiong Zhang
- Department of Oncology of Xijing Hospital, Air Force Medical University, Xi’an, Shaanxi, China
- Chengkai Zhang
- Department of Oncology of Xijing Hospital, Air Force Medical University, Xi’an, Shaanxi, China
- Hong-Mei Zhang
- Department of Oncology of Xijing Hospital, Air Force Medical University, Xi’an, Shaanxi, China
20
Jensen MF, Nielsen M. Enhancing TCR specificity predictions by combined pan- and peptide-specific training, loss-scaling, and sequence similarity integration. eLife 2024; 12:RP93934. [PMID: 38437160 PMCID: PMC10942633 DOI: 10.7554/elife.93934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2024]
Abstract
Predicting the interaction between Major Histocompatibility Complex (MHC) class I-presented peptides and T-cell receptors (TCR) holds significant implications for vaccine development, cancer treatment, and autoimmune disease therapies. However, limited paired-chain TCR data, skewed towards well-studied epitopes, hampers the development of pan-specific machine-learning (ML) models. Leveraging a larger peptide-TCR dataset, we explore various alterations to the ML architectures and training strategies to address data imbalance. This leads to an overall improved performance, particularly for peptides with scant TCR data. However, challenges persist for unseen peptides, especially those distant from training examples. We demonstrate that such ML models can be used to detect potential outliers, whose removal from training improves performance. Integrating pan-specific and peptide-specific models with similarity-based predictions further improves the overall performance, especially when a low false positive rate is desirable. In the context of the IMMREP22 benchmark, this modeling framework attained state-of-the-art performance. Moreover, combining these strategies results in acceptable predictive accuracy for peptides characterized with as few as 15 positive TCRs. This observation holds great promise for rapidly expanding the peptide coverage of current models for predicting TCR specificity. The NetTCR 2.2 model incorporating these advances is available on GitHub (https://github.com/mnielLab/NetTCR-2.2) and as a web server at https://services.healthtech.dtu.dk/services/NetTCR-2.2/.
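The idea of combining model outputs with a similarity-based prediction can be sketched as a nearest-neighbour score blended into a weighted average. This is not NetTCR 2.2's method: the stand-in similarity measure (`difflib.SequenceMatcher.ratio`, in place of a substitution-matrix kernel), the equal default weights, and the names `similarity_score` and `combined_score` are all assumptions for illustration.

```python
from difflib import SequenceMatcher
from typing import Iterable, Tuple


def similarity_score(query_cdr3: str, known_binders: Iterable[str]) -> float:
    """Highest similarity (0-1) between the query CDR3 and any known binder of the
    peptide. SequenceMatcher.ratio() is a stand-in for a proper TCR kernel."""
    return max((SequenceMatcher(None, query_cdr3, ref).ratio() for ref in known_binders),
               default=0.0)  # no known binders -> no similarity evidence


def combined_score(pan: float, peptide_specific: float, similarity: float,
                   weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted average of a pan-specific score, a peptide-specific score, and a
    similarity score, all assumed to lie in [0, 1]. The weights are placeholders."""
    w_pan, w_pep, w_sim = weights
    total = w_pan + w_pep + w_sim
    return (w_pan * pan + w_pep * peptide_specific + w_sim * similarity) / total
```

A weighted average keeps the combined score on the same [0, 1] scale as its inputs, so a single decision threshold (for example one tuned for a low false positive rate) can be applied afterwards.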
Affiliation(s)
- Mathias Fynbo Jensen
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Lyngby, Denmark
- Morten Nielsen
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Lyngby, Denmark
21
Ektefaie Y, Shen A, Bykova D, Marin M, Zitnik M, Farhat M. Evaluating generalizability of artificial intelligence models for molecular datasets. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.25.581982. [PMID: 38464295 PMCID: PMC10925170 DOI: 10.1101/2024.02.25.581982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Deep learning has made rapid advances in modeling molecular sequencing data. Despite achieving high performance on benchmarks, it remains unclear to what extent deep learning models learn general principles and generalize to previously unseen sequences. Benchmarks traditionally interrogate model generalizability by generating metadata based (MB) or sequence-similarity based (SB) train and test splits of input data before assessing model performance. Here, we show that this approach mischaracterizes model generalizability by failing to consider the full spectrum of cross-split overlap, i.e., similarity between train and test splits. We introduce Spectra, a spectral framework for comprehensive model evaluation. For a given model and input data, Spectra plots model performance as a function of decreasing cross-split overlap and reports the area under this curve as a measure of generalizability. We apply Spectra to 18 sequencing datasets with associated phenotypes ranging from antibiotic resistance in tuberculosis to protein-ligand binding to evaluate the generalizability of 19 state-of-the-art deep learning models, including large language models, graph neural networks, diffusion models, and convolutional neural networks. We show that SB and MB splits provide an incomplete assessment of model generalizability. With Spectra, we find as cross-split overlap decreases, deep learning models consistently exhibit a reduction in performance in a task- and model-dependent manner. Although no model consistently achieved the highest performance across all tasks, we show that deep learning models can generalize to previously unseen sequences on specific tasks. Spectra paves the way toward a better understanding of how foundation models generalize in biology.
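The summary statistic described above (area under a curve of performance versus decreasing cross-split overlap) can be sketched with the trapezoidal rule. The representation of the curve as (spectral parameter, performance) points, the assumption that a larger parameter means less train/test overlap, the normalization by the x-range, and the name `spectral_auc` are illustrative choices, not Spectra's implementation.

```python
from typing import Sequence, Tuple


def spectral_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under a performance-vs-overlap curve, divided by the x-range.

    points: (spectral_parameter, performance) pairs; a larger spectral parameter is
    assumed to mean less train/test overlap. Dividing by the x-range makes the result
    an average performance, comparable to the performance metric itself.
    """
    pts = sorted(points)
    if len(pts) < 2:
        raise ValueError("need at least two points")
    span = pts[-1][0] - pts[0][0]
    if span <= 0:
        raise ValueError("points must cover a non-zero x-range")
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid between consecutive points
    return area / span
```

For instance, performance falling linearly from 1.0 at full overlap to 0.6 at no overlap, sampled at parameters 0, 0.5 and 1, gives an area of 0.8. A model whose score collapses as overlap shrinks ends up with a noticeably lower area than one that degrades gracefully, even if both look identical on a single random split.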
Affiliation(s)
- Yasha Ektefaie
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Andrew Shen
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Computer Science, Northwestern University, Evanston, IL, USA
- Daria Bykova
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Maximillian Marin
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Data Science Initiative, Cambridge, MA, USA
- Maha Farhat
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Pulmonary and Critical Care, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
22
Chen J, Zhao B, Lin S, Sun H, Mao X, Wang M, Chu Y, Hong L, Wei D, Li M, Xiong Y. TEPCAM: Prediction of T-cell receptor-epitope binding specificity via interpretable deep learning. Protein Sci 2024; 33:e4841. [PMID: 37983648 PMCID: PMC10731497 DOI: 10.1002/pro.4841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 10/11/2023] [Accepted: 11/16/2023] [Indexed: 11/22/2023]
Abstract
Recognition of a specific epitope presented by the major histocompatibility complex by the T-cell receptor (TCR) on the T-cell surface is the key trigger of the immune response. Identifying the binding rules of TCR-epitope pairs is crucial for developing immunotherapies, including neoantigen vaccines and drugs. Accurate prediction of TCR-epitope binding specificity via deep learning remains challenging, especially for test cases that are unseen in the training set. Here, we propose TEPCAM (TCR-EPitope identification based on Cross-Attention and Multi-channel convolution), a deep learning model that incorporates self-attention, cross-attention, and multi-channel convolution to improve generalizability and enhance model interpretability. Experimental results demonstrate that our model outperformed several state-of-the-art models on two challenging tasks, a strictly split dataset and an external dataset. Furthermore, the model can learn interaction patterns between TCRs and epitopes, which we reveal by extracting the interpretable matrix from the cross-attention layer and mapping it onto three-dimensional structures. The source code and data are freely available at https://github.com/Chenjw99/TEPCAM.
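The "interpretable matrix" from a cross-attention layer is, in generic scaled dot-product attention, the row-normalized matrix of query-key scores. The sketch below computes it with TCR positions as queries and epitope positions as keys; it is not TEPCAM's code, and the assumption that queries and keys are already projected embedding vectors (no learned weights, a single head) and the name `cross_attention_weights` are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention_weights(tcr_q: Matrix, epi_k: Matrix) -> Matrix:
    """Single-head scaled dot-product attention weights.

    tcr_q: one (already projected) query vector per TCR residue.
    epi_k: one (already projected) key vector per epitope residue, same dimension.
    Returns a len(tcr_q) x len(epi_k) matrix whose rows sum to 1; entry [i][j] is
    how strongly TCR position i attends to epitope position j.
    """
    d = len(epi_k[0])
    scale = 1.0 / math.sqrt(d)
    return [softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in epi_k])
            for q in tcr_q]
```

Because each row is a probability distribution over epitope positions, the matrix can be read residue by residue and compared with contacts in solved TCR-peptide-MHC structures, which is the kind of mapping the abstract describes.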
Affiliation(s)
- Junwei Chen
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Bowen Zhao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Shenggeng Lin
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Heqi Sun
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Xueying Mao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Meng Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Yanyi Chu
- Department of PathologyStanford University School of MedicineStandfordCaliforniaUSA
| | - Liang Hong
- Institute of Natural Sciences, Shanghai Jiao Tong UniversityShanghaiChina
- Artificial Intelligence Biomedical Center, Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong UniversityShanghaiChina
| | - Dong‐Qing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and BiotechnologyShanghai Jiao Tong UniversityShanghaiChina
| | - Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and EngineeringCentral South UniversityChangshaChina
| | - Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and BiotechnologyShanghai Jiao Tong UniversityShanghaiChina
- Artificial Intelligence Biomedical Center, Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong UniversityShanghaiChina
| |
23
Shcherbinin DS, Karnaukhov VK, Zvyagin IV, Chudakov DM, Shugay M. Large-scale template-based structural modeling of T-cell receptors with known antigen specificity reveals complementarity features. Front Immunol 2023; 14:1224969. [PMID: 37649481 PMCID: PMC10464843 DOI: 10.3389/fimmu.2023.1224969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 07/27/2023] [Indexed: 09/01/2023]
Abstract
Introduction T-cell receptor (TCR) recognition of foreign peptides presented by the major histocompatibility complex (MHC) initiates the adaptive immune response against pathogens. While a large number of TCR sequences specific to different antigenic peptides are known to date, structural data describing the conformation and contacting residues of TCR-peptide-MHC complexes are relatively limited. In the present study, we aim to extend and analyze the set of available structures by performing highly accurate template-based modeling of these complexes using TCR sequences with known specificity. Methods CDR3 sequences were identified using the VDJdb database and clustered based on available spatial structures, the V- and J-genes of the corresponding T-cell receptors, and epitopes. The selected CDR3 loops were modeled by introducing single amino acid substitutions into the template PDB structures step by step, followed by optimization of the TCR-peptide-MHC contacting interface using Rosetta package applications. Statistical analysis and recursive feature elimination were carried out in R on computed energy values and on properties of contacting amino acid residues between CDR3 loops and peptides. Results Using a set of 29 complex templates (including a template with a SARS-CoV-2 antigen) and 732 specificity records, we built a database of 1585 model structures carrying substitutions in either TCRα or TCRβ chains. Some models represent different mutation pathways to the same final structure. This database allowed us to analyze features of amino acid contacts in TCR-peptide interfaces that govern antigen recognition preferences. It also allowed us to interpret these interactions in terms of the physicochemical properties of interacting residues. Conclusion Our results provide a methodology for creating high-quality TCR-peptide-MHC models for antigens of interest that can be used to predict TCR specificity.
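The abstract describes introducing single amino acid substitutions step by step and notes that some models represent different mutation pathways to the same final structure. At the sequence level this can be made concrete.

The sketch below lists the intermediate sequences along one substitution path and enumerates all possible orderings. It assumes equal-length template and target CDR3 sequences, so only substitutions occur and there are no insertions or deletions. The helper names are hypothetical and the example sequences are toy inputs. The structural side (Rosetta modeling and interface optimization after each step) is not represented.

```python
from itertools import permutations


def differing_positions(template, target):
    """Indices where the template and target CDR3 differ (equal lengths assumed)."""
    if len(template) != len(target):
        raise ValueError("substitution-only paths require equal-length sequences")
    return [i for i, (a, b) in enumerate(zip(template, target)) if a != b]


def substitution_path(template, target, order=None):
    """Sequences visited when target residues are introduced one at a time."""
    positions = differing_positions(template, target)
    if order is None:
        order = positions
    elif sorted(order) != positions:
        raise ValueError("order must be a permutation of the differing positions")
    current = list(template)
    path = [template]
    for pos in order:
        current[pos] = target[pos]  # one single-residue substitution per step
        path.append("".join(current))
    return path


def all_paths(template, target):
    """Every ordering of the substitutions; k differing positions give k! paths."""
    for order in permutations(differing_positions(template, target)):
        yield substitution_path(template, target, list(order))


if __name__ == "__main__":
    template, target = "CASSIRSSYEQYF", "CASSVRSAYEQYF"  # toy sequences
    for path in all_paths(template, target):
        print(" -> ".join(path))
```

Because the number of orderings grows factorially with the number of differing positions, exhaustively modeling every pathway quickly becomes impractical. Templates close in sequence to the target keep the paths short.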
Affiliation(s)
- Dmitrii S. Shcherbinin
- Institute of Translational Medicine, Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Pirogov Russian National Research Medical University, Moscow, Russia
- Laboratory of Structural Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia
- Vadim K. Karnaukhov
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia
- Ivan V. Zvyagin
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
- Dmitriy M. Chudakov
- Institute of Translational Medicine, Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Pirogov Russian National Research Medical University, Moscow, Russia
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
- Center of Molecular Medicine, Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia
- Mikhail Shugay
- Institute of Translational Medicine, Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Pirogov Russian National Research Medical University, Moscow, Russia
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia
24
Hudson D, Fernandes RA, Basham M, Ogg G, Koohy H. Can we predict T cell specificity with digital biology and machine learning? Nat Rev Immunol 2023; 23:511-521. [PMID: 36755161 PMCID: PMC9908307 DOI: 10.1038/s41577-023-00835-3] [Citation(s) in RCA: 61] [Impact Index Per Article: 30.5] [Accepted: 12/07/2022] [Indexed: 02/10/2023]
Abstract
Recent advances in machine learning and experimental biology have offered breakthrough solutions to problems such as protein structure prediction that were long thought to be intractable. However, despite the pivotal role of the T cell receptor (TCR) in orchestrating cellular immunity in health and disease, computational reconstruction of a reliable map from a TCR to its cognate antigens remains a holy grail of systems immunology. Current data sets are limited to a negligible fraction of the universe of possible TCR-ligand pairs, and performance of state-of-the-art predictive models wanes when applied beyond these known binders. In this Perspective article, we make the case for renewed and coordinated interdisciplinary effort to tackle the problem of predicting TCR-antigen specificity. We set out the general requirements of predictive models of antigen binding, highlight critical challenges and discuss how recent advances in digital biology such as single-cell technology and machine learning may provide possible solutions. Finally, we describe how predicting TCR specificity might contribute to our understanding of the broader puzzle of antigen immunogenicity.
Affiliation(s)
- Dan Hudson
- MRC Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
- The Rosalind Franklin Institute, Didcot, UK
- Ricardo A Fernandes
- Chinese Academy of Medical Sciences Oxford Institute, University of Oxford, Oxford, UK
- Graham Ogg
- MRC Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
- Chinese Academy of Medical Sciences Oxford Institute, University of Oxford, Oxford, UK
- Hashem Koohy
- MRC Human Immunology Unit, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
- Centre for Computational Biology, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
25
Bujak J, Kłęk S, Balawejder M, Kociniak A, Wilkus K, Szatanek R, Orzeszko Z, Welanyk J, Torbicz G, Jęckowski M, Kucharczyk T, Wohadlo Ł, Borys M, Stadnik H, Wysocki M, Kayser M, Słomka ME, Kosmowska A, Horbacka K, Gach T, Markowska B, Kowalczyk T, Karoń J, Karczewski M, Szura M, Sanecka-Duin A, Blum A. Creating an Innovative Artificial Intelligence-Based Technology (TCRact) for Designing and Optimizing T Cell Receptors for Use in Cancer Immunotherapies: Protocol for an Observational Trial. JMIR Res Protoc 2023; 12:e45872. [PMID: 37440307 PMCID: PMC10375398 DOI: 10.2196/45872] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/08/2023] [Revised: 06/01/2023] [Accepted: 06/02/2023] [Indexed: 07/14/2023]
Abstract
BACKGROUND Cancer continues to be the leading cause of mortality in high-income countries, necessitating the development of more precise and effective treatment modalities. Immunotherapy, specifically adoptive cell transfer of T cell receptor (TCR)-engineered T cells (TCR-T therapy), has shown promise in engaging the immune system for cancer treatment. One of the biggest challenges in the development of TCR-T therapies is the proper prediction of the pairing between TCRs and peptide-human leukocyte antigen (pHLAs). Modern computational immunology, using artificial intelligence (AI)-based platforms, provides the means to optimize the speed and accuracy of TCR screening and discovery. OBJECTIVE This study proposes an observational clinical trial protocol to collect patient samples and generate a database of pHLA:TCR sequences to aid the development of an AI-based platform for efficient selection of specific TCRs. METHODS The multicenter observational study, involving 8 participating hospitals, aims to enroll patients diagnosed with stage II, III, or IV colorectal cancer adenocarcinoma. RESULTS Patient recruitment has recently been completed, with 100 participants enrolled. Primary tumor tissue and peripheral blood samples have been obtained, and peripheral blood mononuclear cells have been isolated and cryopreserved. Nucleic acid extraction (DNA and RNA) has been performed in 86 cases. Additionally, 57 samples underwent whole exome sequencing to determine the presence of somatic mutations and RNA sequencing for gene expression profiling. CONCLUSIONS The results of this study may have a significant impact on the treatment of patients with colorectal cancer. The comprehensive database of pHLA:TCR sequences generated through this observational clinical trial will facilitate the development of the AI-based platform for TCR selection. 
The results obtained thus far demonstrate successful patient recruitment and sample collection, laying the foundation for further analysis and the development of an innovative tool to expedite and enhance TCR selection for precision cancer treatments. TRIAL REGISTRATION ClinicalTrials.gov NCT04994093; https://clinicaltrials.gov/ct2/show/NCT04994093. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) DERR1-10.2196/45872.
Affiliation(s)
- Joanna Bujak
- Ardigen SA, Cracow, Poland
- Department of Physics and Biophysics, Institute of Biology, Warsaw University of Life Sciences, Warszawa, Poland
- Stanisław Kłęk
- Surgical Oncology Clinic, Maria Sklodowska-Curie National Research Institute of Oncology, Cracow, Poland
- Zofia Orzeszko
- Department of General and Oncological Surgery, Brothers Hospitallers Hospital, Cracow, Poland
- Joanna Welanyk
- Surgical Oncology Clinic, Maria Sklodowska-Curie National Research Institute of Oncology, Cracow, Poland
- Grzegorz Torbicz
- Department of General Surgery and Surgical Oncology, Ludwik Rydygier Memorial Hospital, Cracow, Poland
- Mateusz Jęckowski
- Colon Cancer Unit, Department of Oncological Surgery, Voivodeship Multi-Specialist Center for Oncology and Traumatology, Lodz, Poland
- Tomasz Kucharczyk
- Holy Cross Cancer Center Clinic of Clinical Oncology, Cracow, Poland
- Łukasz Wohadlo
- Department of General Surgery, Andrzej Frycz Modrzewski Krakow University, Cracow, Poland
- Maciej Borys
- Department of General Surgery and Surgical Oncology, Ludwik Rydygier Memorial Hospital, Cracow, Poland
- Honorata Stadnik
- Department of General and Transplant Surgery, Poznan University of Medical Sciences, University Hospital, Poznan, Poland
- Michał Wysocki
- Department of General Surgery and Surgical Oncology, Ludwik Rydygier Memorial Hospital, Cracow, Poland
- Magdalena Kayser
- General and Colorectal Surgery Department, J Struś Multispecialist Municipal Hospital, Poznan, Poland
- Marta Ewa Słomka
- Colon Cancer Unit, Department of Oncological Surgery, Voivodeship Multi-Specialist Center for Oncology and Traumatology, Lodz, Poland
- Anna Kosmowska
- General and Colorectal Surgery Department, J Struś Multispecialist Municipal Hospital, Poznan, Poland
- Karolina Horbacka
- General and Colorectal Surgery Department, J Struś Multispecialist Municipal Hospital, Poznan, Poland
- Tomasz Gach
- Surgical Clinic Institute of Physiotherapy, Faculty of Health Sciences, Jagiellonian University Medical College, Cracow, Poland
- Beata Markowska
- Surgical Clinic Institute of Physiotherapy, Faculty of Health Sciences, Jagiellonian University Medical College, Cracow, Poland
- Tomasz Kowalczyk
- Department of General Surgery, Andrzej Frycz Modrzewski Krakow University, Cracow, Poland
- Jacek Karoń
- General and Colorectal Surgery Department, J Struś Multispecialist Municipal Hospital, Poznan, Poland
- Marek Karczewski
- Department of General and Transplant Surgery, Poznan University of Medical Sciences, University Hospital, Poznan, Poland
- Mirosław Szura
- Surgical Clinic Institute of Physiotherapy, Faculty of Health Sciences, Jagiellonian University Medical College, Cracow, Poland
26
Deng L, Ly C, Abdollahi S, Zhao Y, Prinz I, Bonn S. Performance comparison of TCR-pMHC prediction tools reveals a strong data dependency. Front Immunol 2023; 14:1128326. [PMID: 37143667 PMCID: PMC10152969 DOI: 10.3389/fimmu.2023.1128326] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 12/20/2022] [Accepted: 03/24/2023] [Indexed: 05/06/2023]
Abstract
The interaction of T-cell receptors with peptide-major histocompatibility complex molecules (TCR-pMHC) plays a crucial role in adaptive immune responses. Currently there are various models aiming at predicting TCR-pMHC binding, while a standard dataset and procedure to compare the performance of these approaches is still missing. In this work we provide a general method for data collection, preprocessing, splitting and generation of negative examples, as well as comprehensive datasets to compare TCR-pMHC prediction models. We collected, harmonized, and merged all the major publicly available TCR-pMHC binding data and compared the performance of five state-of-the-art deep learning models (TITAN, NetTCR-2.0, ERGO, DLpTCR and ImRex) using this data. Our performance evaluation focuses on two scenarios: 1) different splitting methods for generating training and testing data to assess model generalization and 2) different data versions that vary in size and peptide imbalance to assess model robustness. Our results indicate that the five contemporary models do not generalize to peptides that have not been in the training set. We can also show that model performance is strongly dependent on the data balance and size, which indicates a relatively low model robustness. These results suggest that TCR-pMHC binding prediction remains highly challenging and requires further high quality data and novel algorithmic approaches.
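Deng et al. show that results depend heavily on how data are split and how negatives are generated. The sketch below illustrates two common choices in this field; it is not necessarily the exact procedure used in the paper.
- A peptide-level split, in which no test peptide appears in training. This is the unseen-peptide setting in which the five models did not generalize.
- Negatives created by re-pairing each TCR with a different peptide from the same split, while skipping known binders.

Function names, parameters and the toy data are hypothetical.

```python
import random


def peptide_split(pairs, test_fraction=0.2, seed=0):
    """Split positive (tcr, peptide) pairs so no test peptide occurs in training."""
    peptides = sorted({pep for _, pep in pairs})
    rng = random.Random(seed)
    rng.shuffle(peptides)
    n_test = max(1, int(len(peptides) * test_fraction))
    test_peptides = set(peptides[:n_test])
    train = [pair for pair in pairs if pair[1] not in test_peptides]
    test = [pair for pair in pairs if pair[1] in test_peptides]
    return train, test


def shuffled_negatives(positives, seed=0):
    """Re-pair each TCR with another peptide from the same split, skipping known binders.

    A split that contains a single peptide yields no negatives, so the
    splitting and negative-generation steps have to be designed together.
    """
    rng = random.Random(seed)
    known = set(positives)
    peptides = sorted({pep for _, pep in positives})
    negatives = []
    for tcr, pep in positives:
        candidates = [p for p in peptides if p != pep and (tcr, p) not in known]
        if candidates:
            negatives.append((tcr, rng.choice(candidates)))
    return negatives


if __name__ == "__main__":
    positives = [  # toy, made-up pairs
        ("CASSLGQAYEQYF", "PEP_A"), ("CASSPDRGYTF", "PEP_A"),
        ("CASSIRSSYEQYF", "PEP_B"), ("CASRPGLAGGRPEQYF", "PEP_C"),
        ("CASSQDRGNEQFF", "PEP_D"), ("CSARDRTGNGYTF", "PEP_E"),
    ]
    train, test = peptide_split(positives, test_fraction=0.4)
    print("train peptides:", sorted({p for _, p in train}))
    print("test peptides:", sorted({p for _, p in test}))
    print("train negatives:", shuffled_negatives(train))
```

Negatives are generated within each split rather than before splitting. This keeps test peptides from leaking into training through the negative set.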
Affiliation(s)
- Lihua Deng
- Institute of Systems Immunology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Cedric Ly
- Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Sina Abdollahi
- Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Yu Zhao
- Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Immo Prinz
- Institute of Systems Immunology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Stefan Bonn
- Institute of Medical Systems Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
27
Tippalagama R, Chihab LY, Kearns K, Lewis S, Panda S, Willemsen L, Burel JG, Lindestam Arlehamn CS. Antigen-specificity measurements are the key to understanding T cell responses. Front Immunol 2023; 14:1127470. [PMID: 37122719 PMCID: PMC10140422 DOI: 10.3389/fimmu.2023.1127470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Accepted: 03/30/2023] [Indexed: 05/02/2023]
Abstract
Antigen-specific T cells play a central role in the adaptive immune response and come in a wide range of phenotypes. T cell receptors (TCRs) mediate the antigen-specificities found in T cells. Importantly, high-throughput TCR sequencing provides a fingerprint which allows tracking of specific T cells and their clonal expansion in response to particular antigens. As a result, many studies have leveraged TCR sequencing in an attempt to elucidate the role of antigen-specific T cells in various contexts. Here, we discuss the published approaches to studying antigen-specific T cells and their specific TCR repertoire. Further, we discuss how these methods have been applied to study the TCR repertoire in various diseases in order to characterize the antigen-specific T cells involved in the immune control of disease.
28
Neoantigens: promising targets for cancer therapy. Signal Transduct Target Ther 2023; 8:9. [PMID: 36604431 PMCID: PMC9816309 DOI: 10.1038/s41392-022-01270-x] [Citation(s) in RCA: 360] [Impact Index Per Article: 180.0] [Received: 10/12/2022] [Revised: 11/14/2022] [Accepted: 11/27/2022] [Indexed: 01/07/2023]
Abstract
Recent advances in neoantigen research have accelerated the development and regulatory approval of tumor immunotherapies, including cancer vaccines, adoptive cell therapy and antibody-based therapies, especially for solid tumors. Neoantigens are newly formed antigens generated by tumor cells as a result of various tumor-specific alterations, such as genomic mutation, dysregulated RNA splicing, disordered post-translational modification, and integrated viral open reading frames. Neoantigens are recognized as non-self and trigger an immune response that is not subject to central and peripheral tolerance. The quick identification and prediction of tumor-specific neoantigens have been made possible by the advanced development of next-generation sequencing and bioinformatic technologies. Compared to tumor-associated antigens, the highly immunogenic and tumor-specific neoantigens provide emerging targets for personalized cancer immunotherapies, and serve as prospective predictors for tumor survival prognosis and immune checkpoint blockade responses. The development of cancer therapies will be aided by understanding the mechanism underlying neoantigen-induced anti-tumor immune response and by streamlining the process of neoantigen-based immunotherapies. This review provides an overview on the identification and characterization of neoantigens and outlines the clinical applications of prospective immunotherapeutic strategies based on neoantigens. We also explore their current status, inherent challenges, and clinical translation potential.
29
Grazioli F, Machart P, Mösch A, Li K, Castorina LV, Pfeifer N, Min MR. Attentive Variational Information Bottleneck for TCR-peptide interaction prediction. Bioinformatics 2022; 39:6960920. [PMID: 36571499 PMCID: PMC9825246 DOI: 10.1093/bioinformatics/btac820] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/19/2022] [Revised: 11/18/2022] [Accepted: 12/23/2022] [Indexed: 12/27/2022]
Abstract
MOTIVATION We present a multi-sequence generalization of Variational Information Bottleneck and call the resulting model Attentive Variational Information Bottleneck (AVIB). Our AVIB model leverages multi-head self-attention to implicitly approximate a posterior distribution over latent encodings conditioned on multiple input sequences. We apply AVIB to a fundamental immuno-oncology problem: predicting the interactions between T-cell receptors (TCRs) and peptides. RESULTS Experimental results on various datasets show that AVIB significantly outperforms state-of-the-art methods for TCR-peptide interaction prediction. Additionally, we show that the latent posterior distribution learned by AVIB is particularly effective for the unsupervised detection of out-of-distribution amino acid sequences. AVAILABILITY AND IMPLEMENTATION The code and the data used for this study are publicly available at: https://github.com/nec-research/vibtcr. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
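AVIB generalizes the Variational Information Bottleneck to multiple input sequences. The sketch below covers only the generic, single-posterior VIB training objective that AVIB extends. AVIB's multi-head self-attention approximation of the posterior is not shown.

The code works in three steps:
1. It samples a latent vector with the reparameterization trick.
2. It scores binding with a decoder.
3. It adds a KL penalty toward a standard normal prior, weighted by `beta`.

The encoder outputs `mu` and `log_var`, the logistic decoder and the value of `beta` are hypothetical placeholders.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)) with log_var = log(sigma^2)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def binary_cross_entropy(p, y, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def vib_loss(mu, log_var, decoder, label, beta=1e-3, rng=None):
    """Single-sample VIB objective: task loss on a sampled z plus beta * KL."""
    if rng is None:
        rng = random.Random(0)
    z = reparameterize(mu, log_var, rng)
    return binary_cross_entropy(decoder(z), label) + beta * kl_to_standard_normal(mu, log_var)


def logistic_decoder(z):
    """Hypothetical decoder: sigmoid of the sum of latent coordinates."""
    return 1.0 / (1.0 + math.exp(-sum(z)))


if __name__ == "__main__":
    mu, log_var = [0.5, -0.2, 1.0], [-1.0, -0.5, 0.0]  # pretend encoder output
    print(vib_loss(mu, log_var, logistic_decoder, label=1))
```

The KL term is what forces the latent encoding to discard input detail that does not help predict binding. Larger `beta` trades predictive accuracy for a more compressed representation.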
Affiliation(s)
- Pierre Machart
- Biomedical AI Group, NEC Laboratories Europe, Heidelberg 69115, Germany
- Anja Mösch
- Biomedical AI Group, NEC Laboratories Europe, Heidelberg 69115, Germany
- Kai Li
- Machine Learning Department, NEC Laboratories America, Princeton, NJ 08540, USA
- Nico Pfeifer
- Methods in Medical Informatics, Department of Computer Science, University of Tübingen, Tübingen 72076, Germany