51
Li G, Iyer B, Prasath VBS, Ni Y, Salomonis N. DeepImmuno: Deep learning-empowered prediction and generation of immunogenic peptides for T cell immunity. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2020:2020.12.24.424262. [PMID: 33398286 PMCID: PMC7781330 DOI: 10.1101/2020.12.24.424262] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/25/2022]
Abstract
T-cells play an essential role in the adaptive immune system by seeking out, binding and destroying foreign antigens presented on the cell surface of diseased cells. An improved understanding of T-cell immunity will greatly aid in the development of new cancer immunotherapies and vaccines for life-threatening pathogens. Central to the design of such targeted therapies are computational methods to predict non-native epitopes to elicit a T cell response; however, we currently lack accurate immunogenicity inference methods. Another challenge is the ability to accurately simulate immunogenic peptides for specific human leukocyte antigen (HLA) alleles, for both synthetic biological applications and to augment real training datasets. Here, we proposed a beta-binomial distribution approach to derive epitope immunogenic potential from sequence alone. We conducted systematic benchmarking of five traditional machine learning (ElasticNet, KNN, SVM, Random Forest, AdaBoost) and three deep learning models (CNN, ResNet, GNN) using three independent, previously validated immunogenic peptide collections (dengue virus, cancer neoantigen and SARS-CoV-2). We chose the CNN model as the best prediction model based on its adaptivity for small and large datasets, and performance relative to existing methods. In addition to outperforming two highly used immunogenicity prediction algorithms, DeepHLApan and IEDB, DeepImmuno-CNN further correctly predicts which residues are most important for T cell antigen recognition. Our independent generative adversarial network (GAN) approach, DeepImmuno-GAN, was further able to accurately simulate immunogenic peptides with physicochemical properties and immunogenicity predictions similar to those of real antigens. We provide DeepImmuno-CNN as source code and an easy-to-use web interface. DATA AVAILABILITY DeepImmuno Python3 code is available at https://github.com/frankligy/DeepImmuno. The DeepImmuno web portal is available from https://deepimmuno.herokuapp.com. The data in this article are available in GitHub and supplementary materials.
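The abstract mentions a beta-binomial approach to immunogenic potential but does not spell out the details. As a rough illustration only, the sketch below shows the generic beta-binomial idea: the number of responders among tested subjects is treated as binomial, a Beta prior is placed on the response rate, and the posterior mean is used as a smoothed score. The function name, the uniform Beta(1, 1) prior, the placeholder peptide names, and the counts are all assumptions made for this example; they are not taken from the DeepImmuno paper or code.

```python
def beta_binomial_score(positive, tested, prior_alpha=1.0, prior_beta=1.0):
    """Posterior-mean response rate under a Beta(prior_alpha, prior_beta) prior.

    Hypothetical helper for illustration; the prior values are assumptions.
    """
    if tested < 0 or not 0 <= positive <= tested:
        raise ValueError("need 0 <= positive <= tested")
    # Beta prior + binomial likelihood gives a Beta posterior whose mean is:
    return (positive + prior_alpha) / (tested + prior_alpha + prior_beta)


# Made-up assay counts: (placeholder peptide name, responders, subjects tested)
assays = [("PEPTIDE_A", 1, 1), ("PEPTIDE_B", 8, 10), ("PEPTIDE_C", 0, 20)]

scores = {name: beta_binomial_score(k, n) for name, k, n in assays}

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

With these made-up counts, the peptide tested once (1 of 1) ranks below the one with 8 of 10 responders, even though its raw response rate is higher. This shrinkage toward the prior for sparsely tested peptides is the usual motivation for a beta-binomial treatment.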
Affiliation(s)
- Guangyuan Li
  - Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
  - Department of Biomedical Informatics, College of Medicine, University of Cincinnati, OH, 45267 USA
- Balaji Iyer
  - Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
  - Department of Electrical Engineering and Computer Science, University of Cincinnati, OH 45221 USA
- V. B. Surya Prasath
  - Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
  - Department of Pediatrics, University of Cincinnati School of Medicine, Cincinnati, Ohio, USA
  - Department of Biomedical Informatics, College of Medicine, University of Cincinnati, OH, 45267 USA
  - Department of Electrical Engineering and Computer Science, University of Cincinnati, OH 45221 USA
- Yizhao Ni
  - Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
  - Department of Pediatrics, University of Cincinnati School of Medicine, Cincinnati, Ohio, USA
  - Department of Biomedical Informatics, College of Medicine, University of Cincinnati, OH, 45267 USA
- Nathan Salomonis
  - Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
  - Department of Pediatrics, University of Cincinnati School of Medicine, Cincinnati, Ohio, USA
  - Department of Biomedical Informatics, College of Medicine, University of Cincinnati, OH, 45267 USA
  - Department of Electrical Engineering and Computer Science, University of Cincinnati, OH 45221 USA
52
Wen B, Zeng W, Liao Y, Shi Z, Savage SR, Jiang W, Zhang B. Deep Learning in Proteomics. Proteomics 2020; 20:e1900335. [PMID: 32939979 PMCID: PMC7757195 DOI: 10.1002/pmic.201900335] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 05/27/2020] [Revised: 09/14/2020] [Indexed: 12/17/2022]
Abstract
Proteomics, the study of all the proteins in biological systems, is becoming a data-rich science. Protein sequences and structures are comprehensively catalogued in online databases. With recent advancements in tandem mass spectrometry (MS) technology, protein expression and post-translational modifications (PTMs) can be studied in a variety of biological systems at the global scale. Sophisticated computational algorithms are needed to translate the vast amount of data into novel biological insights. Deep learning automatically extracts data representations at high levels of abstraction from data, and it thrives in data-rich scientific research domains. Here, a comprehensive overview of deep learning applications in proteomics, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex-peptide binding prediction, and protein structure prediction, is provided. Limitations and the future directions of deep learning in proteomics are also discussed. This review will provide readers an overview of deep learning and how it can be used to analyze proteomics data.
Affiliation(s)
- Bo Wen
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wen‐Feng Zeng
  - Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Chinese Academy of Sciences, Institute of Computing Technology, Beijing 100190, China
- Yuxing Liao
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Zhiao Shi
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Sara R. Savage
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wen Jiang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Bing Zhang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
53
Cai J, Xu Y, Zhang W, Ding S, Sun Y, Lyu J, Duan M, Liu S, Huang L, Zhou F. A comprehensive comparison of residue-level methylation levels with the regression-based gene-level methylation estimations by ReGear. Brief Bioinform 2020; 22:5921981. [PMID: 33048108 DOI: 10.1093/bib/bbaa253] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/11/2020] [Revised: 08/10/2020] [Accepted: 09/08/2020] [Indexed: 02/07/2023]
Abstract
MOTIVATION DNA methylation is a biological process that affects gene function without changing the underlying DNA sequence. The DNA methylation machinery usually attaches methyl groups to specific cytosine residues, which modifies the chromatin architecture. Such modifications in the promoter regions will inactivate some tumor-suppressor genes. DNA methylation within the coding region may significantly reduce the transcription elongation efficiency. Gene function may be tuned through the methylation of some cytosines. METHODS This study hypothesizes that the overall methylation level across a gene may have a better association with sample labels, such as diseases, than the methylation of individual cytosines. The gene methylation level is formulated as a regression model using the methylation levels of all the cytosines within the gene. A comprehensive evaluation of various feature selection algorithms and classification algorithms is carried out between the gene-level and residue-level methylation levels. RESULTS A comprehensive evaluation was conducted to compare the gene and cytosine methylation levels for their associations with the sample labels and classification performances. Unsupervised clustering was also improved using the gene methylation levels. Some genes demonstrated statistically significant associations with the class label, even when no residue-level methylation features had statistically significant associations with the class label. In summary, the trained gene methylation levels improved various methylome-based machine learning models. Both methodology development of regression algorithms and experimental validation of the gene-level methylation biomarkers are worthy of further investigation in future studies. The source code, example data files and manual are available at http://www.healthinformaticslab.org/supp/.
54
Fu H, Cao Z, Li M, Wang S. ACEP: improving antimicrobial peptides recognition through automatic feature fusion and amino acid embedding. BMC Genomics 2020; 21:597. [PMID: 32859150 PMCID: PMC7455913 DOI: 10.1186/s12864-020-06978-0] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 01/31/2020] [Accepted: 08/11/2020] [Indexed: 12/14/2022]
Abstract
BACKGROUND Antimicrobial resistance is one of our most serious health threats. Antimicrobial peptides (AMPs), effector molecules of the innate immune system, can defend host organisms against microbes, and most have shown a lowered likelihood for bacteria to form resistance compared to many conventional drugs. Thus, AMPs are gaining popularity as a better substitute for antibiotics. To aid researchers in the discovery of novel AMPs, we design computational approaches to screen promising candidates. RESULTS In this work, we design a deep learning model that can learn amino acid embedding patterns, automatically extract sequence features, and fuse heterogeneous information. Results show that the proposed model outperforms state-of-the-art methods on recognition of AMPs. By visualizing data in some layers of the model, we overcome the black-box nature of deep learning, explain the working mechanism of the model, and find some important motifs in sequences. CONCLUSIONS The ACEP model can capture similarity between amino acids, calculate attention scores for different parts of a peptide sequence in order to spot important parts that significantly contribute to final predictions, and automatically fuse a variety of heterogeneous information or features. For high-throughput AMP recognition, open source software and datasets are made freely available at https://github.com/Fuhaoyi/ACEP.
Affiliation(s)
- Haoyi Fu
  - School of Information Science and Engineering, Yunnan University, Kunming, 650500, China
- Zicheng Cao
  - School of Public Health (Shenzhen), Sun Yat-sen University, Guangzhou, 510006, China
- Mingyuan Li
  - School of Information Science and Engineering, Yunnan University, Kunming, 650500, China
- Shunfang Wang
  - School of Information Science and Engineering, Yunnan University, Kunming, 650500, China
55
Springer I, Besser H, Tickotsky-Moskovitz N, Dvorkin S, Louzoun Y. Prediction of Specific TCR-Peptide Binding From Large Dictionaries of TCR-Peptide Pairs. Front Immunol 2020; 11:1803. [PMID: 32983088 PMCID: PMC7477042 DOI: 10.3389/fimmu.2020.01803] [Citation(s) in RCA: 100] [Impact Index Per Article: 20.0] [Received: 02/12/2020] [Accepted: 07/06/2020] [Indexed: 11/13/2022]
Abstract
Current sequencing methods allow for detailed sampling of T cell receptor (TCR) repertoires. To determine from a repertoire whether its host had been exposed to a target, computational tools that predict TCR-epitope binding are required. Current tools are based on conserved motifs and are applied to peptides with many known binding TCRs. We employ new Natural Language Processing (NLP) based methods to predict whether any TCR and peptide bind. We combined large-scale TCR-peptide dictionaries with deep learning methods to produce ERGO (pEptide tcR matchinG predictiOn), a highly specific and generic TCR-peptide binding predictor. A set of standard tests is defined for the performance of peptide-TCR binding, including the detection of TCRs binding to a given peptide/antigen, choosing among a set of candidate peptides for a given TCR and determining whether any pair of TCR-peptide bind. ERGO reaches similar results to state-of-the-art methods in these tests even when not trained specifically for each test. The software implementation and data sets are available at https://github.com/louzounlab/ERGO. ERGO is also available through a webserver at: http://tcr.cs.biu.ac.il/.
Affiliation(s)
- Ido Springer
  - Department of Mathematics, Bar Ilan University, Ramat Gan, Israel
- Hanan Besser
  - Department of Mathematics, Bar Ilan University, Ramat Gan, Israel
- Shirit Dvorkin
  - Department of Mathematics, Bar Ilan University, Ramat Gan, Israel
- Yoram Louzoun
  - Department of Mathematics, Bar Ilan University, Ramat Gan, Israel
57
Uncovering the Tumor Antigen Landscape: What to Know about the Discovery Process. Cancers (Basel) 2020; 12:cancers12061660. [PMID: 32585818 PMCID: PMC7352969 DOI: 10.3390/cancers12061660] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 04/29/2020] [Revised: 06/11/2020] [Accepted: 06/20/2020] [Indexed: 12/14/2022]
Abstract
According to the latest available data, cancer is the second leading cause of death, highlighting the need for novel cancer therapeutic approaches. In this context, immunotherapy is emerging as a reliable first-line treatment for many cancers, particularly metastatic melanoma. Indeed, cancer immunotherapy has attracted great interest following the recent clinical approval of antibodies targeting immune checkpoint molecules, such as PD-1, PD-L1, and CTLA-4, that release the brakes of the immune system, thus reviving a field otherwise poorly explored. Cancer immunotherapy mainly relies on the generation and stimulation of cytotoxic CD8 T lymphocytes (CTLs) within the tumor microenvironment (TME), priming T cells and establishing efficient and durable anti-tumor immunity. Therefore, there is a clear need to define and identify immunogenic T cell epitopes to use in therapeutic cancer vaccines. Naturally presented antigens in the human leucocyte antigen-1 (HLA-I) complex on the tumor surface are the main protagonists in evoking a specific anti-tumor CD8+ T cell response. However, the methodologies for their identification have been a major bottleneck for their reliable characterization. Consequently, the field of antigen discovery has yet to improve. The current review is intended to define what are today known as tumor antigens, with a main focus on CTL antigenic peptides. We also review the techniques developed and employed to date for antigen discovery, exploring both the direct elution of HLA-I peptides and the in silico prediction of epitopes. Finally, the last part of the review analyses the future challenges and direction of the antigen discovery field.
58
Mösch A, Raffegerst S, Weis M, Schendel DJ, Frishman D. Machine Learning for Cancer Immunotherapies Based on Epitope Recognition by T Cell Receptors. Front Genet 2019; 10:1141. [PMID: 31798635 PMCID: PMC6878726 DOI: 10.3389/fgene.2019.01141] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 07/26/2019] [Accepted: 10/21/2019] [Indexed: 12/30/2022]
Abstract
In recent years, immunotherapies have shown tremendous success as treatments for multiple types of cancer. However, there are still many obstacles to overcome in order to increase response rates and identify effective therapies for every individual patient. Since there are many possibilities to boost a patient's immune response against a tumor and not all can be covered, this review is focused on T cell receptor-mediated therapies. CD8+ T cells can detect and destroy malignant cells by binding to peptides presented on cell surfaces by MHC (major histocompatibility complex) class I molecules. CD4+ T cells can also mediate powerful immune responses but their peptide recognition by MHC class II molecules is more complex, which is why the attention has been focused on CD8+ T cells. Therapies based on the power of T cells can, on the one hand, enhance T cell recognition by introducing TCRs that preferentially direct T cells to tumor sites (so-called TCR-T therapy) or through vaccination to induce T cells in vivo. On the other hand, T cell activity can be improved by immune checkpoint inhibition or other means that help create a microenvironment favorable for cytotoxic T cell activity. The manifold ways in which the immune system and cancer interact with each other not only require the use of large omics datasets from gene, to transcript, to protein, and to peptide but also make the application of machine learning methods inevitable. Currently, discovering and selecting suitable TCRs is a very costly and work-intensive in vitro process. To facilitate this process and to additionally allow for highly personalized therapies that can simultaneously target multiple patient-specific antigens, especially neoepitopes, breakthrough computational methods for predicting antigen presentation and TCR binding are urgently required. In particular, potential cross-reactivity is a major consideration since off-target toxicity can pose a major threat to patient safety. The current speed at which not only datasets grow and are made available to the public, but also at which new machine learning methods evolve, ensures that computational approaches will be able to help solve the problems that immunotherapies are still facing.
Affiliation(s)
- Anja Mösch
  - Department of Bioinformatics, Wissenschaftszentrum Weihenstephan, Technische Universität München, Freising, Germany
  - Medigene Immunotherapies GmbH, a subsidiary of Medigene AG, Planegg, Germany
- Silke Raffegerst
  - Medigene Immunotherapies GmbH, a subsidiary of Medigene AG, Planegg, Germany
- Manon Weis
  - Medigene Immunotherapies GmbH, a subsidiary of Medigene AG, Planegg, Germany
- Dolores J. Schendel
  - Medigene Immunotherapies GmbH, a subsidiary of Medigene AG, Planegg, Germany
- Dmitrij Frishman
  - Department of Bioinformatics, Wissenschaftszentrum Weihenstephan, Technische Universität München, Freising, Germany
59
Wu J, Wang W, Zhang J, Zhou B, Zhao W, Su Z, Gu X, Wu J, Zhou Z, Chen S. DeepHLApan: A Deep Learning Approach for Neoantigen Prediction Considering Both HLA-Peptide Binding and Immunogenicity. Front Immunol 2019; 10:2559. [PMID: 31736974 PMCID: PMC6838785 DOI: 10.3389/fimmu.2019.02559] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Received: 05/31/2019] [Accepted: 10/15/2019] [Indexed: 12/30/2022]
Abstract
Neoantigens play important roles in cancer immunotherapy. Current methods used for neoantigen prediction focus on the binding between human leukocyte antigens (HLAs) and peptides, which is insufficient for high-confidence neoantigen prediction. In this study, we apply deep learning techniques to predict neoantigens considering both the possibility of HLA-peptide binding (binding model) and the potential immunogenicity (immunogenicity model) of the peptide-HLA complex (pHLA). The binding model achieves comparable performance with other well-acknowledged tools on the latest Immune Epitope Database (IEDB) benchmark datasets and an independent mass spectrometry (MS) dataset. The immunogenicity model could significantly improve the prediction precision of neoantigens. Further application of our method to mutations with pre-existing T-cell responses indicates its feasibility in clinical application. DeepHLApan is freely available at https://github.com/jiujiezz/deephlapan and http://biopharm.zju.edu.cn/deephlapan.
Affiliation(s)
- Jingcheng Wu
  - Institute of Drug Metabolism and Pharmaceutical Analysis and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Wenzhe Wang
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Jiucheng Zhang
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Binbin Zhou
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Wenyi Zhao
  - Institute of Drug Metabolism and Pharmaceutical Analysis and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Zhixi Su
  - MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai, China
- Xun Gu
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, United States
- Jian Wu
  - College of Computer Science and Technology, Zhejiang University, Hangzhou, China
- Zhan Zhou
  - Institute of Drug Metabolism and Pharmaceutical Analysis and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Shuqing Chen
  - Institute of Drug Metabolism and Pharmaceutical Analysis and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China