1
Asediya VS, Anjaria PA, Mathakiya RA, Koringa PG, Nayak JB, Bisht D, Fulmali D, Patel VA, Desai DN. Vaccine development using artificial intelligence and machine learning: A review. Int J Biol Macromol 2024; 282:136643. [PMID: 39426778 DOI: 10.1016/j.ijbiomac.2024.136643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2024] [Revised: 09/30/2024] [Accepted: 10/15/2024] [Indexed: 10/21/2024]
Abstract
The COVID-19 pandemic has underscored the critical importance of effective vaccines, yet their development is a challenging and demanding process. It requires identifying antigens that elicit protective immunity, selecting adjuvants that enhance immunogenicity, and designing delivery systems that ensure optimal efficacy. Artificial intelligence (AI) can facilitate this process by using machine learning methods to analyze large and diverse datasets, suggest novel vaccine candidates, and refine their design and predict their performance. This review explores how AI can be applied to various aspects of vaccine development, such as predicting immune response from protein sequences, discovering adjuvants, optimizing vaccine doses, modeling vaccine supply chains, and predicting protein structures. We also address the challenges and ethical issues that emerge from the use of AI in vaccine development, such as data privacy, algorithmic bias, and health data sensitivity. We contend that AI has immense potential to accelerate vaccine development and respond to future pandemics, but it also requires careful attention to the quality and validity of the data and methods used.
Affiliation(s)
- Deepanker Bisht
  - Indian Veterinary Research Institute, Izatnagar, U.P., India

2
Meng F, Zhou N, Hu G, Liu R, Zhang Y, Jing M, Hou Q. A comprehensive overview of recent advances in generative models for antibodies. Comput Struct Biotechnol J 2024; 23:2648-2660. [PMID: 39027650 PMCID: PMC11254834 DOI: 10.1016/j.csbj.2024.06.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Revised: 06/15/2024] [Accepted: 06/18/2024] [Indexed: 07/20/2024] Open
Abstract
Therapeutic antibodies are an important class of biopharmaceuticals. With the rapid development of deep learning methods and the increasing amount of antibody data, antibody generative models have made great progress recently. They aim to solve the antibody space searching problems and are widely incorporated into the antibody development process. Therefore, a comprehensive introduction to the development methods in this field is imperative. Here, we collected 34 representative antibody generative models published recently and all generative models can be divided into three categories: sequence-generating models, structure-generating models, and hybrid models, based on their principles and algorithms. We further studied their performance and contributions to antibody sequence prediction, structure optimization, and affinity enhancement. Our manuscript will provide a comprehensive overview of the status of antibody generative models and also offer guidance for selecting different approaches.
Affiliation(s)
- Fanxu Meng
  - College of Chemical Engineering, Qingdao University of Science and Technology, Qingdao 266042, China
- Na Zhou
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250100, China
  - National Institute of Health Data Science of China, Shandong University, Jinan 250100, China
- Guangchun Hu
  - School of Information Science and Engineering, University of Jinan, Jinan 250022, China
- Ruotong Liu
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250100, China
  - National Institute of Health Data Science of China, Shandong University, Jinan 250100, China
- Yuanyuan Zhang
  - College of Chemical Engineering, Qingdao University of Science and Technology, Qingdao 266042, China
- Ming Jing
  - Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
  - Shandong Provincial Key Laboratory of Computer Networks, Shandong Fundamental Research Center for Computer Science, Jinan 250000, China
- Qingzhen Hou
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250100, China
  - National Institute of Health Data Science of China, Shandong University, Jinan 250100, China

3
Gu M, Yang W, Liu M. Prediction of antibody-antigen interaction based on backbone aware with invariant point attention. BMC Bioinformatics 2024; 25:348. [PMID: 39506679 PMCID: PMC11542381 DOI: 10.1186/s12859-024-05961-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2024] [Accepted: 10/16/2024] [Indexed: 11/08/2024] Open
Abstract
BACKGROUND: Antibodies play a crucial role in disease treatment, leveraging their ability to selectively interact with a specific antigen. However, screening antibody gene sequences for target antigens via biological experiments is extremely time-consuming and labor-intensive. Several computational methods have been developed to predict antibody-antigen interaction, but they do not characterize the underlying structure of the antibody. RESULTS: Benefiting from the recent breakthroughs in deep learning for antibody structure prediction, we propose a novel neural network architecture to predict antibody-antigen interaction. We first introduce AbAgIPA: an antibody structure prediction network to obtain the antibody backbone structure, where the structural features of antibodies and antigens are encoded into representation vectors according to the amino acid physicochemical features and Invariant Point Attention (IPA) computation methods. Finally, the antibody-antigen interaction is predicted by global max pooling, feature concatenation, and a fully connected layer. We evaluated our method on antigen diversity and antigen-specific antibody-antigen interaction datasets. Additionally, our model exhibits a commendable level of interpretability, essential for understanding underlying interaction mechanisms. CONCLUSIONS: Quantitative experimental results demonstrate that the new neural network architecture significantly outperforms the best sequence-based methods as well as the methods based on residue contact maps and graph convolution networks (GCNs). The source code is freely available on GitHub at https://github.com/gmthu66/AbAgIPA .
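The final prediction step named in this abstract (global max pooling, feature concatenation, a fully connected layer) can be illustrated with a minimal sketch. This is not the AbAgIPA code: the function names, the single-layer sigmoid head, and the toy feature values are all assumptions made for illustration, and the structure encoder and IPA layers that would produce the per-residue features are left out.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # one row per residue, one column per feature


def global_max_pool(residue_features: Matrix) -> List[float]:
    """Collapse a (residues x features) matrix to one vector: the max of each column."""
    if not residue_features:
        raise ValueError("need at least one residue")
    return [max(column) for column in zip(*residue_features)]


def predict_interaction(antibody: Matrix, antigen: Matrix,
                        weights: Sequence[float], bias: float) -> float:
    """Pool each chain, concatenate the pooled vectors, then apply one
    linear layer followed by a sigmoid to get a score in (0, 1)."""
    pooled = global_max_pool(antibody) + global_max_pool(antigen)  # concatenation
    if len(weights) != len(pooled):
        raise ValueError("weight vector must match the concatenated feature length")
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy inputs (hypothetical values, not taken from the paper).
antibody = [[0.1, 0.9], [0.4, 0.2], [0.7, 0.3]]  # 3 residues, 2 features each
antigen = [[0.5, 0.5], [0.2, 0.8]]               # 2 residues, 2 features each
score = predict_interaction(antibody, antigen,
                            weights=[1.0, -1.0, 0.5, 0.5], bias=0.0)
```

Max pooling makes the pooled vector independent of chain length, which is why chains with different residue counts can share one fixed-size fully connected layer.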
Affiliation(s)
- Miao Gu
  - Department of Automation, Tsinghua University, Beijing, 100084, China
- Weiyang Yang
  - Department of Automation, Tsinghua University, Beijing, 100084, China
- Min Liu
  - Department of Automation, Tsinghua University, Beijing, 100084, China

4
Li Y, Nan X, Zhang S, Zhou Q, Lu S, Tian Z. PMSFF: Improved Protein Binding Residues Prediction through Multi-Scale Sequence-Based Feature Fusion Strategy. Biomolecules 2024; 14:1220. [PMID: 39456153 PMCID: PMC11506650 DOI: 10.3390/biom14101220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Revised: 09/22/2024] [Accepted: 09/24/2024] [Indexed: 10/28/2024] Open
Abstract
Proteins perform different biological functions by binding various molecules, and this binding is mediated by a few key residues. Accurate prediction of such protein binding residues (PBRs) is crucial for understanding cellular processes and for designing new drugs. Many computational prediction approaches have been proposed to identify PBRs with sequence-based features. However, these approaches face two main challenges: (1) these methods only concatenate residue feature vectors with a simple sliding window strategy, and (2) it is challenging to find a uniform sliding window size suitable for learning embeddings across different types of PBRs. In this study, we propose a novel framework that can be applied to multiple types of PBR prediction tasks through a Multi-scale Sequence-based Feature Fusion (PMSFF) strategy. Firstly, PMSFF employs a pre-trained language model named ProtT5 to encode amino acid residues in protein sequences. Then, it generates multi-scale residue embeddings by applying multi-size windows to capture effective neighboring residues and multi-size kernels to learn information across different scales. Additionally, the proposed model treats protein sequences as sentences, employing a bidirectional GRU to learn global context. We also collect benchmark datasets encompassing various PBR types and evaluate our PMSFF approach on these datasets. Compared with state-of-the-art methods, PMSFF demonstrates superior performance on most PBR prediction tasks.
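The multi-size window idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions: mean pooling stands in for the learned multi-size kernels, the window sizes are hypothetical and assumed odd, and neither the ProtT5 encoder nor the bidirectional GRU is reproduced.

```python
from typing import List, Sequence

Vector = List[float]


def window_mean(embeddings: Sequence[Vector], center: int, size: int) -> Vector:
    """Average the embeddings in a window of `size` residues centred on `center`,
    truncating the window at the sequence ends. `size` is assumed to be odd."""
    half = size // 2
    start = max(0, center - half)
    stop = min(len(embeddings), center + half + 1)
    window = embeddings[start:stop]
    dim = len(embeddings[0])
    return [sum(vec[d] for vec in window) / len(window) for d in range(dim)]


def multi_scale_features(embeddings: Sequence[Vector],
                         sizes: Sequence[int] = (1, 3, 5)) -> List[Vector]:
    """For every residue, concatenate its window summaries at each scale."""
    features = []
    for i in range(len(embeddings)):
        fused: Vector = []
        for size in sizes:
            fused.extend(window_mean(embeddings, i, size))
        features.append(fused)
    return features


# Toy one-dimensional "embeddings" for a four-residue sequence (hypothetical).
toy = [[1.0], [2.0], [3.0], [4.0]]
features = multi_scale_features(toy, sizes=(1, 3))
```

Using several window sizes at once sidesteps the problem the abstract raises, namely that no single window size suits every type of binding residue.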
Affiliation(s)
- Yuguang Li
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
- Xiaofei Nan
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
- Shoutao Zhang
  - School of Life Sciences, Zhengzhou University, Zhengzhou 450001, China
  - Longhu Laboratory of Advanced Immunology, Zhengzhou 450001, China
- Qinglei Zhou
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
- Shuai Lu
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
  - National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou 450001, China
- Zhen Tian
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China

5
Wu J, Liu B, Zhang J, Wang Z, Li J. DL-PPI: a method on prediction of sequenced protein-protein interaction based on deep learning. BMC Bioinformatics 2023; 24:473. [PMID: 38097937 PMCID: PMC10722729 DOI: 10.1186/s12859-023-05594-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/08/2023] [Accepted: 12/01/2023] [Indexed: 12/17/2023] Open
Abstract
PURPOSE: Sequenced Protein-Protein Interaction (PPI) prediction represents a pivotal area of study in biology, playing a crucial role in elucidating the mechanistic underpinnings of diseases and facilitating the design of novel therapeutic interventions. Conventional methods for extracting features through experimental processes have proven to be both costly and exceedingly complex. In light of these challenges, the scientific community has turned to computational approaches, particularly those grounded in deep learning methodologies. Despite the progress achieved by current deep learning technologies, their effectiveness diminishes when applied to larger, unfamiliar datasets. RESULTS: This study introduces a novel deep learning framework, termed DL-PPI, for predicting PPIs based on sequence data. The proposed framework comprises two key components aimed at improving the accuracy of feature extraction from individual protein sequences and capturing relationships between proteins in unfamiliar datasets. 1. Protein Node Feature Extraction Module: To enhance the accuracy of feature extraction from individual protein sequences and facilitate the understanding of relationships between proteins in unknown datasets, we devised a novel protein node feature extraction module utilizing the Inception method. This module efficiently captures relevant patterns and representations within protein sequences, enabling more informative feature extraction. 2. Feature-Relational Reasoning Network (FRN): In the Global Feature Extraction module of our model, we developed a novel FRN that leverages Graph Neural Networks to determine interactions between pairs of input proteins. The FRN effectively captures the underlying relational information between proteins, contributing to improved PPI predictions. The DL-PPI framework demonstrates state-of-the-art performance in the realm of sequence-based PPI prediction.
Affiliation(s)
- Jiahui Wu
  - Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
- Bo Liu
  - School of Mathematical and Computational Sciences, Massey University, Auckland, 0745, New Zealand
- Jidong Zhang
  - Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
- Zhihan Wang
  - Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
- Jianqiang Li
  - Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China

6
Zeng X, Bai G, Sun C, Ma B. Recent Progress in Antibody Epitope Prediction. Antibodies (Basel) 2023; 12:52. [PMID: 37606436 PMCID: PMC10443277 DOI: 10.3390/antib12030052] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/04/2023] [Revised: 07/31/2023] [Accepted: 08/03/2023] [Indexed: 08/23/2023] Open
Abstract
Recent progress in epitope prediction has shown promising results in the development of vaccines and therapeutics against various diseases. However, the overall accuracy and success rate need to be improved greatly to gain practical application significance, especially conformational epitope prediction. In this review, we examined the general features of antibody-antigen recognition, highlighting the conformation selection mechanism in flexible antibody-antigen binding. We recently highlighted the success and warning signs of antibody epitope predictions, including linear and conformation epitope predictions. While deep learning-based models gradually outperform traditional feature-based machine learning, sequence and structure features still provide insight into antibody-antigen recognition problems.
Affiliation(s)
- Xincheng Zeng
  - Engineering Research Center of Cell & Therapeutic Antibody (MOE), School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, China
- Ganggang Bai
  - Engineering Research Center of Cell & Therapeutic Antibody (MOE), School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, China
- Chuance Sun
  - Engineering Research Center of Cell & Therapeutic Antibody (MOE), School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, China
- Buyong Ma
  - Engineering Research Center of Cell & Therapeutic Antibody (MOE), School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, China
  - Shanghai Digiwiser Biological, Inc., Shanghai 200131, China

7
Dopico XC, Mandolesi M, Hedestam GBK. Untangling immunoglobulin genotype-function associations. Immunol Lett 2023:S0165-2478(23)00073-1. [PMID: 37209913 DOI: 10.1016/j.imlet.2023.05.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Revised: 04/19/2023] [Accepted: 05/12/2023] [Indexed: 05/22/2023]
Abstract
Immunoglobulin (IG) genes, encoding B cell receptors (BCRs), are fundamental components of the mammalian immune system, which evolved to recognize the diverse antigenic universe present in nature. To handle these myriad inputs, BCRs are generated through combinatorial recombination of a set of highly polymorphic germline genes, resulting in a vast repertoire of antigen receptors that initiate responses to pathogens and regulate commensals. Following antigen recognition and B cell activation, memory B cells and plasma cells form, allowing for the development of anamnestic antibody (Ab) responses. How inherited variation in IG genes impacts host traits, disease susceptibility, and Ab recall responses is a topic of great interest. Here, we consider approaches to translate emerging knowledge about IG genetic diversity and expressed repertoires to inform our understanding of Ab function in health and disease etiology. As our understanding of IG genetics grows, so will our need for tools to decipher preferences for IG gene or allele usage in different contexts, to better understand antibody responses at the population level.
Affiliation(s)
- Xaquin Castro Dopico
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm 17177, Sweden
- Marco Mandolesi
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm 17177, Sweden

8
Rahmani F, Imani Fooladi AA, Ajoudanifar H, Soleimani NA. In silico and experimental methods for designing a potent anticancer arazyme-herceptin fusion protein in HER2-positive breast cancer. J Mol Model 2023; 29:160. [PMID: 37103612 DOI: 10.1007/s00894-023-05562-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 04/17/2023] [Indexed: 04/28/2023]
Abstract
CONTEXT: Breast cancer is the most prevalent type of malignancy among women worldwide and is associated with serious physical and mental consequences. Current chemotherapies may lack successful outcomes; thus, the development of targeted recombinant immunotoxins is plausible. The predicted B cell and T cell epitopes of arazyme of the fusion protein are able to elicit immune response. The results of codon adaptation tool of herceptin-arazyme have improved from 0.4 to 1. The in silico immune simulation results showed significant response for immune cells. In conclusion, our findings show that the known multi-epitope fusion protein may activate humoral and cellular immune responses and may be a possible candidate for breast cancer treatment. METHODS: In this study, the selected monoclonal antibody constituting herceptin and the bacterial metalloprotease, arazyme, were used with different peptide linkers to design a novel fusion protein and to predict different B cell and T cell epitopes by means of the relevant databases. Modeler 10.1 and I-TASSER online server were used to predict and validate the 3D structure, which was then docked to the HER2 receptor using the HADDOCK2.4 web server. The molecular dynamics (MD) simulations of the arazyme-linker-herceptin-HER2 complex were performed with GROMACS 2019.6 software. The sequence of arazyme-herceptin was optimized for expression in a prokaryotic host using online servers and cloned into the pET-28a plasmid. The recombinant pET-28a was transferred into Escherichia coli BL21(DE3). Expression and binding affinity of arazyme-herceptin and arazyme to human breast cancer cell lines (SK-BR-3/HER2+ and MDA-MB-468/HER2-) were validated by SDS-PAGE and cell-ELISA, respectively.
Affiliation(s)
- Farideh Rahmani
  - Department of Microbiology, Damghan Branch, Islamic Azad University, Damghan, Iran
- Abbas Ali Imani Fooladi
  - Applied Microbiology Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Hatef Ajoudanifar
  - Department of Microbiology, Damghan Branch, Islamic Azad University, Damghan, Iran

9
de Oliveira Matos A, dos Santos Dantas PH, Colmenares MTC, Sartori GR, Silva-Sales M, Da Silva JHM, Neves BJ, Andrade CH, Sales-Campos H. The CDR3 region as the major driver of TREM-1 interaction with its ligands, an in silico characterization. Comput Struct Biotechnol J 2023; 21:2579-2590. [PMID: 37122631 PMCID: PMC10130352 DOI: 10.1016/j.csbj.2023.04.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/19/2022] [Revised: 04/07/2023] [Accepted: 04/12/2023] [Indexed: 05/02/2023] Open
Abstract
The triggering receptor expressed on myeloid cells-1 (TREM-1) is a pattern recognition receptor heavily investigated in infectious and non-infectious diseases. Because of its role in amplifying inflammation, TREM-1 has been explored as a diagnostic/prognostic biomarker. Further, as the receptor has been implicated in the pathophysiology of several diseases, therapies aiming at modulating its activity represent a promising strategy to constrain uncontrolled inflammatory or infectious diseases. Despite this, several aspects concerning its interaction with ligands and activation process, remain unclear. Although many molecules have been suggested as TREM-1 ligands, only five have been confirmed to interact with the receptor: actin, eCIRP, HMGB1, Hsp70 and PGLYRP1. However, the domains involved in the interaction between the receptor and these proteins are not clarified yet. Therefore, here we used in silico approaches to investigate the putative binding domains in the receptor, using hot spots analysis, molecular docking and molecular dynamics simulations between TREM-1 and its five known ligands. Our results indicated the complementarity-determining regions (CDRs) of the receptor as the main mediators of antigen recognition, especially the CDR3 loop. We believe that our study could be used as structural basis for the elucidation of TREM-1's recognition process, and may be useful for prospective in silico and biological investigations exploring the receptor in different contexts.
Affiliation(s)
- Marcelle Silva-Sales
  - Instituto de Patologia Tropical e Saúde Pública, Universidade Federal de Goiás, Goiânia, Brazil
- Bruno Junior Neves
  - LabMol – Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Universidade Federal de Goiás, Goiânia, Brazil
- Carolina Horta Andrade
  - LabMol – Laboratory for Molecular Modeling and Drug Design, Faculty of Pharmacy, Universidade Federal de Goiás, Goiânia, Brazil

10
Abstract
Antibody-mediated neurological diseases constitute an emerging clinical entity that remains to be fully explored. Recent studies identified autoantibodies that directly confer pathogenicity, and it was shown that in these cases immunotherapies can result in profound positive patient responses. These advances highlight the urgent need for improved means to effectively screen patient samples for novel autoantibodies (aAbs) and their subsequent characterization. Here, we discuss challenges and opportunities for peptide microarrays to contribute to the identification, mapping, and characterization of the underlying monospecific disease-defining binding surfaces. We outline control experiments, workflow modifications and bioinformatic filtering methods that enhance the predictive power of array-based studies. Further, we highlight experimental and computer-based display approaches that have the potential to expand the use of synthetic microarrays over the detection of discontinuous epitopes. Knowledge over the autoantibody epitopes in neurological disease will enhance our understanding of the pathological mechanisms and thereby potentially contribute to novel diagnostic approaches or even innovative antigen-specific treatments that avoid the serious adverse effects seen with currently used immunosuppressive therapies.
Affiliation(s)
- Ivan Talucci
  - Rudolf Virchow Center, Center for Integrative and Translational Bioimaging, University of Würzburg, Würzburg, Germany
- Hans Michael Maric
  - Rudolf Virchow Center, Center for Integrative and Translational Bioimaging, University of Würzburg, Würzburg, Germany

11
Hou Q, Waury K, Gogishvili D, Feenstra KA. Ten quick tips for sequence-based prediction of protein properties using machine learning. PLoS Comput Biol 2022; 18:e1010669. [PMID: 36454728 PMCID: PMC9714715 DOI: 10.1371/journal.pcbi.1010669] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/03/2022] Open
Abstract
The ubiquitous availability of genome sequencing data explains the popularity of machine learning-based methods for the prediction of protein properties from their amino acid sequences. Over the years, while revising our own work, reading submitted manuscripts as well as published papers, we have noticed several recurring issues, which make some reported findings hard to understand and replicate. We suspect this may be due to biologists being unfamiliar with machine learning methodology, or conversely, machine learning experts may miss some of the knowledge needed to correctly apply their methods to proteins. Here, we aim to bridge this gap for developers of such methods. The most striking issues are linked to a lack of clarity: how were annotations of interest obtained; which benchmark metrics were used; how are positives and negatives defined. Others relate to a lack of rigor: If you sneak in structural information, your method is not sequence-based; if you compare your own model to "state-of-the-art," take the best methods; if you want to conclude that some method is better than another, obtain a significance estimate to support this claim. These, and other issues, we will cover in detail. These points may have seemed obvious to the authors during writing; however, they are not always clear-cut to the readers. We also expect many of these tips to hold for other machine learning-based applications in biology. Therefore, many computational biologists who develop methods in this particular subject will benefit from a concise overview of what to avoid and what to do instead.
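The tip about obtaining a significance estimate before claiming one method beats another can be illustrated with a paired sign-flip permutation test. This is one common choice and not necessarily the procedure the authors recommend; the function name, the per-protein scores, and the default number of permutations are assumptions made for this sketch.

```python
import random
from typing import Sequence


def paired_permutation_p_value(scores_a: Sequence[float],
                               scores_b: Sequence[float],
                               n_permutations: int = 10000,
                               seed: int = 0) -> float:
    """Two-sided p-value for the mean paired difference between two methods
    scored on the same test items, estimated by randomly flipping the sign
    of each per-item difference."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs)) / len(diffs)
    hits = 0
    for _ in range(n_permutations):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) / len(diffs) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical per-protein scores for two methods on the same ten proteins.
method_a = [0.9] * 10
method_b = [0.8] * 10
p_value = paired_permutation_p_value(method_a, method_b)
```

Pairing the scores per test protein matters: both methods are evaluated on the same items, so the test compares per-item differences and does not treat the two score lists as independent samples.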
Affiliation(s)
- Qingzhen Hou
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Shandong, P. R. China
  - National Institute of Health Data Science of China, Shandong University, Shandong, P. R. China
- Katharina Waury
  - Department of Computer Science, Bioinformatics Group, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Dea Gogishvili
  - Department of Computer Science, Bioinformatics Group, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- K. Anton Feenstra
  - Department of Computer Science, Bioinformatics Group, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands

12
Waury K, Willemse EAJ, Vanmechelen E, Zetterberg H, Teunissen CE, Abeln S. Bioinformatics tools and data resources for assay development of fluid protein biomarkers. Biomark Res 2022; 10:83. [DOI: 10.1186/s40364-022-00425-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2022] [Accepted: 10/25/2022] [Indexed: 11/16/2022] Open
Abstract
Fluid protein biomarkers are important tools in clinical research and health care to support diagnosis and to monitor patients. Especially within the field of dementia, novel biomarkers could address the current challenges of providing an early diagnosis and of selecting trial participants. While the great potential of fluid biomarkers is recognized, their implementation in routine clinical use has been slow. One major obstacle is the often unsuccessful translation of biomarker candidates from explorative high-throughput techniques to sensitive antibody-based immunoassays. In this review, we propose the incorporation of bioinformatics into the workflow of novel immunoassay development to overcome this bottleneck and thus facilitate the development of novel biomarkers towards clinical laboratory practice. Due to the rapid progress within the field of bioinformatics, many freely available and easy-to-use tools and data resources exist which can aid the researcher at various stages. Current prediction methods and databases can support the selection of suitable biomarker candidates, as well as the choice of appropriate commercial affinity reagents. Additionally, we examine methods that can determine or predict the epitope (an antibody's binding region on its antigen) and can help to make an informed choice on the immunogenic peptide used for novel antibody production. Selected use cases for biomarker candidates help illustrate the application and interpretation of the introduced tools.
13
Capel H, Weiler R, Dijkstra M, Vleugels R, Bloem P, Feenstra KA. ProteinGLUE multi-task benchmark suite for self-supervised protein modeling. Sci Rep 2022; 12:16047. [PMID: 36163232 PMCID: PMC9512797 DOI: 10.1038/s41598-022-19608-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/17/2021] [Accepted: 08/31/2022] [Indexed: 11/09/2022] Open
Abstract
Self-supervised language modeling is a rapidly developing approach for the analysis of protein sequence data. However, work in this area is heterogeneous and diverse, making comparison of models and methods difficult. Moreover, models are often evaluated only on one or two downstream tasks, making it unclear whether the models capture generally useful properties. We introduce the ProteinGLUE benchmark for the evaluation of protein representations: a set of seven per-amino-acid tasks for evaluating learned protein representations. We also offer reference code, and we provide two baseline models with hyperparameters specifically trained for these benchmarks. Pre-training was done on two tasks, masked symbol prediction and next sentence prediction. We show that pre-training yields higher performance on a variety of downstream tasks such as secondary structure and protein interaction interface prediction, compared to no pre-training. However, the larger base model does not outperform the smaller medium model. We expect the ProteinGLUE benchmark dataset introduced here, together with the two baseline pre-trained models and their performance evaluations, to be of great value to the field of protein sequence-based property prediction. Availability: code and datasets from https://github.com/ibivu/protein-glue .
Affiliation(s)
- Henriette Capel
  - Informatics Institute, Vrije Universiteit, 1081 HV, Amsterdam, The Netherlands
- Robin Weiler
  - Informatics Institute, Vrije Universiteit, 1081 HV, Amsterdam, The Netherlands
- Maurits Dijkstra
  - Informatics Institute, Vrije Universiteit, 1081 HV, Amsterdam, The Netherlands
- Reinier Vleugels
  - Informatics Institute, Vrije Universiteit, 1081 HV, Amsterdam, The Netherlands
- Peter Bloem
  - Informatics Institute, Vrije Universiteit, 1081 HV, Amsterdam, The Netherlands
- K Anton Feenstra
  - Informatics Institute, Vrije Universiteit, 1081 HV, Amsterdam, The Netherlands

14
Mavrina E, Kimble L, Waury K, Gogishvili D, Gómez de San José N, Das S, Coppens S, Fernandes Gomes B, Mravinacová S, Wojdała AL, Bolsewig K, Bayoumy S, Burtscher F, Mohaupt P, Willemse E, Teunissen C. Multi-Omics Interdisciplinary Research Integration to Accelerate Dementia Biomarker Development (MIRIADE). Front Neurol 2022; 13:890638. [PMID: 35903119 PMCID: PMC9315267 DOI: 10.3389/fneur.2022.890638] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/06/2022] [Accepted: 05/31/2022] [Indexed: 11/13/2022] Open
Abstract
Proteomics studies have shown differential expression of numerous proteins in dementias but have rarely led to novel biomarker tests for clinical use. The Marie Curie MIRIADE project is designed to experimentally evaluate development strategies to accelerate the validation and ultimate implementation of novel biomarkers in clinical practice, using proteomics-based biomarker development for the main dementias as experimental case studies. We address several knowledge gaps that have been identified in the field. First, there is the technology-translation gap between the different technologies used for the discovery (e.g., mass spectrometry) and the large-scale validation (e.g., immunoassays) of biomarkers. In addition, there is a limited understanding of the conformational states of biomarker proteins in different matrices, which affect the selection of reagents for assay development. In this review, we aim to understand the decisions taken in the initial steps of biomarker development, which is done via an interim narrative update of the work of each ESR subproject. The results describe the decision process to shortlist biomarkers from proteomics for the development of immunoassays or mass spectrometry assays for Alzheimer's disease, Lewy body dementia, and frontotemporal dementia. In addition, we explain the approach to prepare the market implementation of novel biomarkers and assays. Moreover, we describe the development of computational protein state and interaction prediction models to support biomarker development, such as the prediction of epitopes. Lastly, we reflect upon activities involved in the biomarker development process to deduce a best-practice roadmap for biomarker development.
Affiliation(s)
- Ekaterina Mavrina
- MIRIADE Consortium: Multiomics Interdisciplinary Research Integration to Address DEmentia Diagnosis, KIN Center for Digital Innovation, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Leighann Kimble
- MIRIADE Consortium: Multiomics Interdisciplinary Research Integration to Address DEmentia Diagnosis, KIN Center for Digital Innovation, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Katharina Waury
- MIRIADE Consortium: Multiomics Interdisciplinary Research Integration to Address DEmentia Diagnosis, Centre for Integrative Bioinformatics VU (IBIVU) – Center for Integrative Bioinformatics, Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Dea Gogishvili
- MIRIADE Consortium: Multiomics Interdisciplinary Research Integration to Address DEmentia Diagnosis, Centre for Integrative Bioinformatics VU (IBIVU) – Center for Integrative Bioinformatics, Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Nerea Gómez de San José
- MIRIADE Consortium: Multiomics Interdisciplinary Research Integration to Address DEmentia Diagnosis, Department of Neurology, University of Ulm, Ulm, Germany
- Shreyasee Das
- MIRIADE Consortium: Multiomics Interdisciplinary Research Integration to Address DEmentia Diagnosis, ADx NeuroSciences, Gent, Belgium
- Salomé Coppens
- MIRIADE Consortium: Multiomics Interdisciplinary Research Integration to Address DEmentia Diagnosis, National Measurement Laboratory at Laboratory of the Government Chemist (LGC), Teddington, United Kingdom
- Bárbara Fernandes Gomes
- MIRIADE Consortium: Multiomics Interdisciplinary Research Integration to Address DEmentia Diagnosis, Department of Psychiatry and Neurochemistry, Institute of Neuroscience and Physiology, the Sahlgrenska Academy at the University of Gothenburg, Mölndal, Sweden
- Sára Mravinacová
- MIRIADE Consortium: Multiomics Interdisciplinary Research Integration to Address DEmentia Diagnosis, Division of Affinity Proteomics, Department of Protein Science, Kungliga Tekniska Högskolan (KTH) Royal Institute of Technology, SciLifeLab, Stockholm, Sweden
- Anna Lidia Wojdała
- MIRIADE Consortium: Multiomics Interdisciplinary Research Integration to Address DEmentia Diagnosis, Laboratory of Clinical Neurochemistry, Department of Medicine and Surgery, University of Perugia, Perugia, Italy
- Katharina Bolsewig
- MIRIADE Consortium: Multiomics Interdisciplinary Research Integration to Address DEmentia Diagnosis, Neurochemistry Laboratory, Department of Clinical Chemistry, Amsterdam Neuroscience, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Sherif Bayoumy
- MIRIADE Consortium: Multiomics Interdisciplinary Research Integration to Address DEmentia Diagnosis, Neurochemistry Laboratory, Department of Clinical Chemistry, Amsterdam Neuroscience, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Felicia Burtscher
- MIRIADE Consortium: Multiomics Interdisciplinary Research Integration to Address DEmentia Diagnosis, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Pablo Mohaupt
- MIRIADE Consortium: Multiomics Interdisciplinary Research Integration to Address DEmentia Diagnosis, Institute for Regenerative Medicine and Biotherapy - Plateforme de Protéomique Clinique (IRMB-PPC), Institute for Neurosciences of Montpellier (INM), Université de Montpellier, Centre Hospitalier Universitaire de Montpellier, Institut National de la Santé et de la Recherche Médicale (INSERM) Centre National de la Recherche Scientifique (CNRS), Montpellier, France
- Eline Willemse
- MIRIADE Consortium: Multiomics Interdisciplinary Research Integration to Address DEmentia Diagnosis, Neurochemistry Laboratory, Department of Clinical Chemistry, Amsterdam Neuroscience, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Charlotte Teunissen
- MIRIADE Consortium: Multiomics Interdisciplinary Research Integration to Address DEmentia Diagnosis, Neurochemistry Laboratory, Department of Clinical Chemistry, Amsterdam Neuroscience, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, Netherlands. *Correspondence: Charlotte Teunissen
15
Multi-task learning to leverage partially annotated data for PPI interface prediction. Sci Rep 2022; 12:10487. [PMID: 35729253 PMCID: PMC9213449 DOI: 10.1038/s41598-022-13951-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Accepted: 05/31/2022] [Indexed: 11/29/2022]
Abstract
Protein-protein interactions (PPI) are crucial for protein functioning; nevertheless, predicting residues in PPI interfaces from the protein sequence remains a challenging problem. In addition, structure-based functional annotations, such as the PPI interface annotations, are scarce: residue-based PPI interface annotations are available for only about one-third of all protein structures. If we want to use a deep learning strategy, we have to overcome the problem of limited data availability. Here we use a multi-task learning strategy that can handle missing data. We start with the multi-task model architecture and adapt it to carefully handle missing data in the cost function. As related learning tasks we include prediction of secondary structure, solvent accessibility, and buried residue. Our results show that the multi-task learning strategy significantly outperforms single-task approaches. Moreover, only the multi-task strategy is able to effectively learn over a dataset extended with structural feature data, without additional PPI annotations. The multi-task setup becomes even more important if the fraction of PPI annotations becomes very small: the multi-task learner trained on only one-eighth of the PPI annotations, with data extension, reaches the same performance as the single-task learner on all PPI annotations. Thus, we show that the multi-task learning strategy can be beneficial for a small training dataset where the protein’s functional properties of interest are only partially annotated.
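One common way to handle missing annotations in a multi-task cost function, as the abstract above describes in general terms, is to skip unannotated residues when computing each task's loss. The sketch below shows that idea under simplifying assumptions: every task is treated as binary (the paper's secondary structure and solvent accessibility tasks would need other loss terms), missing labels are marked with `None`, and the names `masked_bce` and `multitask_loss` are hypothetical. It is not the authors' implementation.

```python
import math


def masked_bce(probs, labels):
    """Sum binary cross-entropy over annotated residues only.

    labels uses None for residues without an annotation.
    Returns (loss_sum, number_of_annotated_residues).
    """
    total, count = 0.0, 0
    for p, y in zip(probs, labels):
        if y is None:
            continue  # missing annotation: contributes nothing to the loss
        p = min(max(p, 1e-7), 1 - 1e-7)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        count += 1
    return total, count


def multitask_loss(predictions, annotations):
    """Add up the mean masked loss of every task that has annotations."""
    loss = 0.0
    for task, probs in predictions.items():
        total, count = masked_bce(probs, annotations[task])
        if count:  # a task with no labels for this protein is skipped
            loss += total / count
    return loss


# A protein with buried-residue labels but no PPI interface labels
# (values are illustrative, not taken from the paper).
predictions = {"ppi": [0.9, 0.2, 0.6], "buried": [0.8, 0.1, 0.5]}
annotations = {"ppi": [None, None, None], "buried": [1, 0, 1]}
loss = multitask_loss(predictions, annotations)
```

In this example the protein still produces a training signal through the `buried` task even though its PPI labels are absent, which is the property that lets a dataset be extended with structural feature data alone.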
16
Stringer B, de Ferrante H, Abeln S, Heringa J, Feenstra KA, Haydarlou R. PIPENN: protein interface prediction from sequence with an ensemble of neural nets. Bioinformatics 2022; 38:2111-2118. [PMID: 35150231 PMCID: PMC9004643 DOI: 10.1093/bioinformatics/btac071] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/03/2021] [Revised: 01/16/2022] [Accepted: 02/04/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION: The interactions between proteins and other molecules are essential to many biological and cellular processes. Experimental identification of interface residues is a time-consuming, costly and challenging task, while protein sequence data are ubiquitous. Consequently, many computational and machine learning approaches have been developed over the years to predict such interface residues from sequence. However, the effectiveness of different Deep Learning (DL) architectures and learning strategies for protein-protein, protein-nucleotide and protein-small molecule interface prediction has not yet been investigated in great detail. Therefore, we here explore the prediction of protein interface residues using six DL architectures and various learning strategies with sequence-derived input features. RESULTS: We constructed a large dataset dubbed BioDL, comprising protein-protein interactions from the PDB, and DNA/RNA and small molecule interactions from the BioLip database. We also constructed six DL architectures, and evaluated them on the BioDL benchmarks. This shows that no single architecture performs best on all instances. An ensemble architecture, which combines all six architectures, does consistently achieve peak prediction accuracy. We confirmed these results on the published benchmark set by Zhang and Kurgan (ZK448), and on our own existing curated homo- and heteromeric protein interaction dataset. Our PIPENN sequence-based ensemble predictor outperforms current state-of-the-art sequence-based protein interface predictors on ZK448 on all interaction types, achieving an AUC-ROC of 0.718 for protein-protein, 0.823 for protein-nucleotide and 0.842 for protein-small molecule. AVAILABILITY AND IMPLEMENTATION: Source code and datasets are available at https://github.com/ibivu/pipenn/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
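To make the two quantitative ideas in the abstract above concrete, the sketch below combines per-residue predictions from several models and scores the result with AUC-ROC. Plain averaging is an assumption used here for illustration; the abstract does not say how PIPENN combines its six architectures, and the function names and toy numbers are hypothetical.

```python
def ensemble_average(model_probs):
    """Average per-residue interface probabilities across models.

    model_probs holds one list of probabilities per model, all of equal length.
    """
    n_models = len(model_probs)
    return [sum(column) / n_models for column in zip(*model_probs)]


def auc_roc(scores, labels):
    """AUC-ROC as the chance that a random positive outscores a random negative.

    Ties count as half a win. labels are 1 (interface) or 0 (non-interface).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Two toy models scoring four residues (illustrative values only).
model_probs = [[0.9, 0.2, 0.4, 0.1], [0.7, 0.4, 0.6, 0.3]]
combined = ensemble_average(model_probs)
score = auc_roc(combined, [1, 0, 1, 0])
```

An AUC-ROC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported 0.718, 0.823 and 0.842.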
Affiliation(s)
- Hans de Ferrante
- Department of Computer Science, IBIVU – Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands
- Sanne Abeln
- Department of Computer Science, IBIVU – Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands
- Jaap Heringa
- Department of Computer Science, IBIVU – Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands
- K Anton Feenstra
- Department of Computer Science, IBIVU – Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands
17
Schneider C, Raybould MIJ, Deane CM. SAbDab in the age of biotherapeutics: updates including SAbDab-nano, the nanobody structure tracker. Nucleic Acids Res 2022; 50:D1368-D1372. [PMID: 34986602 PMCID: PMC8728266 DOI: 10.1093/nar/gkab1050] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Received: 09/15/2021] [Revised: 10/14/2021] [Accepted: 10/22/2021] [Indexed: 11/26/2022]
Abstract
In 2013, we released the Structural Antibody Database (SAbDab), a publicly available repository of experimentally determined antibody structures. In the interim, the rapid increase in the number of antibody structure depositions to the Protein Data Bank, driven primarily by increased interest in antibodies as biotherapeutics, has led us to implement several improvements to the original database infrastructure. These include the development of SAbDab-nano, a sub-database that tracks nanobodies (heavy chain-only antibodies) which have seen a particular growth in attention from both the academic and pharmaceutical research communities over the past few years. Both SAbDab and SAbDab-nano are updated weekly, comprehensively annotated with the latest features described here, and are freely accessible at opig.stats.ox.ac.uk/webapps/newsabdab/.