1
Papadopoulos AM, Axenopoulos A, Iatrou A, Stamatopoulos K, Alvarez F, Daras P. ParaSurf: a surface-based deep learning approach for paratope-antigen interaction prediction. Bioinformatics 2025; 41:btaf062. [PMID: 39921885 PMCID: PMC11855283 DOI: 10.1093/bioinformatics/btaf062] [Received: 11/04/2024] [Revised: 01/14/2025] [Accepted: 02/03/2025] [Indexed: 02/10/2025]
Abstract
MOTIVATION: Identifying antibody binding sites is crucial for developing vaccines and therapeutic antibodies, processes that are time-consuming and costly. Accurate prediction of the paratope's binding site can speed up development by improving our understanding of antibody-antigen interactions. RESULTS: We present ParaSurf, a deep learning model that significantly enhances paratope prediction by incorporating both surface geometric and non-geometric factors. Trained and tested on three prominent antibody-antigen benchmarks, ParaSurf achieves state-of-the-art results across nearly all metrics. Unlike models restricted to the variable region, ParaSurf can accurately predict binding scores across the entire Fab region of the antibody. Additionally, we conducted an extensive analysis using the largest of the three datasets, focusing on three key components: (i) a detailed evaluation of paratope prediction for each complementarity-determining region loop, (ii) the performance of models trained exclusively on the heavy chain, and (iii) the results of training models solely on the light chain without incorporating data from the heavy chain. AVAILABILITY AND IMPLEMENTATION: Source code for ParaSurf, along with the datasets used, preprocessing pipeline, and trained model weights, is freely available at https://github.com/aggelos-michael-papadopoulos/ParaSurf.
Affiliation(s)
- Angelos-Michael Papadopoulos
  - Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki 57001, Greece
  - Universidad Politécnica de Madrid, Madrid 28040, Spain
- Apostolos Axenopoulos
  - Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki 57001, Greece
  - Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
- Anastasia Iatrou
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki 57001, Greece
- Kostas Stamatopoulos
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki 57001, Greece
- Petros Daras
  - Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki 57001, Greece
2
Richardson E, Trevizani R, Greenbaum JA, Carter H, Nielsen M, Peters B. The receiver operating characteristic curve accurately assesses imbalanced datasets. Patterns (New York, N.Y.) 2024; 5:100994. [PMID: 39005487 PMCID: PMC11240176 DOI: 10.1016/j.patter.2024.100994] [Received: 12/01/2023] [Revised: 03/05/2024] [Accepted: 05/03/2024] [Indexed: 07/16/2024]
Abstract
Many problems in biology require looking for a "needle in a haystack," corresponding to a binary classification with a few positives within a much larger set of negatives, a situation referred to as class imbalance. The receiver operating characteristic (ROC) curve and the associated area under the curve (AUC) have been reported as ill-suited to evaluating prediction performance on imbalanced problems where there is more interest in performance on the positive minority class, with the precision-recall (PR) curve reported as preferable. We show via simulation and a real case study that this is a misinterpretation of the difference between the ROC and PR spaces: the ROC curve is robust to class imbalance, while the PR curve is highly sensitive to it. Furthermore, we show that class imbalance cannot be easily disentangled from classifier performance measured via PR-AUC.
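As a rough illustration of the claim in this abstract (not the authors' own simulation), the sketch below scores positives and negatives from two fixed Gaussian distributions and changes only the number of negatives. The function names, the Gaussian score model, and the sample sizes are all assumptions chosen for the example. Because the classifier's score distributions stay the same, ROC-AUC should stay roughly constant, while the PR-based summary (average precision) should drop as the negatives grow.

```python
import random


def roc_auc(scores, labels):
    """ROC-AUC via the rank-sum (Mann-Whitney) identity.

    Equals the probability that a random positive scores above a random
    negative. Ties are not handled specially (scores here are continuous).
    """
    pairs = sorted(zip(scores, labels))
    rank_sum = sum(rank for rank, (_, y) in enumerate(pairs, start=1) if y == 1)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(scores, labels):
    """Mean of the precision values measured at each positive, ranked by score."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits


def simulate(n_pos, n_neg, seed=0):
    """Hypothetical classifier: positives ~ N(1, 1), negatives ~ N(0, 1)."""
    rng = random.Random(seed)
    scores = [rng.gauss(1.0, 1.0) for _ in range(n_pos)]
    scores += [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    labels = [1] * n_pos + [0] * n_neg
    return roc_auc(scores, labels), average_precision(scores, labels)


# Same score distributions, different class ratios (1:1 versus 1:50).
roc_balanced, ap_balanced = simulate(n_pos=1000, n_neg=1000)
roc_imbalanced, ap_imbalanced = simulate(n_pos=1000, n_neg=50000)

print(f"balanced:   ROC-AUC={roc_balanced:.3f}  AP={ap_balanced:.3f}")
print(f"imbalanced: ROC-AUC={roc_imbalanced:.3f}  AP={ap_imbalanced:.3f}")
```

Only the class ratio differs between the two runs, so any gap in average precision reflects the imbalance rather than a change in how well the scores separate the classes.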
Affiliation(s)
- Eve Richardson
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Raphael Trevizani
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
  - Fiocruz Ceará, Fundação Oswaldo Cruz, Rua São José s/n, Precabura, Eusébio/CE, Brazil
- Jason A Greenbaum
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Hannah Carter
  - Department of Medicine, University of California, La Jolla, CA, USA
- Morten Nielsen
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Lyngby, Denmark
- Bjoern Peters
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
3
Lin P, Li H, Huang SY. Deep learning in modeling protein complex structures: From contact prediction to end-to-end approaches. Curr Opin Struct Biol 2024; 85:102789. [PMID: 38402744 DOI: 10.1016/j.sbi.2024.102789] [Received: 09/30/2023] [Revised: 01/16/2024] [Accepted: 02/06/2024] [Indexed: 02/27/2024]
Abstract
Protein-protein interactions play crucial roles in many biological processes. Traditionally, protein complex structures have been built by protein-protein docking. With the rapid development of artificial intelligence and its great success in monomer protein structure prediction, deep learning has been widely applied in the past few years to modeling protein-protein complex structures through inter-protein contact prediction and end-to-end approaches. This article reviews recent advances in deep-learning-based approaches to modeling protein-protein complex structures, as well as their advantages and limitations. Challenges and possible future directions in applying deep learning to the prediction of protein complex structures are also briefly discussed.
Affiliation(s)
- Peicong Lin
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
- Hao Li
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
- Sheng-You Huang
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
4
Abstract
The greatest challenge in drug discovery remains the high rate of attrition across the different phases of the process, which costs the industry billions of dollars every year. While all phases remain crucial to ensure pharmaceutical-level safety, quality, and efficacy of the end product, streamlining these efforts toward compounds with success potential is pivotal for a more efficient and cost-effective process. The use of artificial intelligence (AI) within the pharmaceutical industry aims at just this, and has applications in preclinical screening for biological activity, optimization of pharmacokinetic properties for improved drug formulation, early toxicity prediction to reduce attrition, and pre-emptive screening for genetic changes in the biological target to improve therapeutic longevity. Here, we present a series of in silico tools that address these applications in small molecule development and describe how they can be embedded within the current pharmaceutical development pipeline.
Affiliation(s)
- Adam Serghini
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Stephanie Portelli
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, Australia
- David B Ascher
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
5
Liu T, Gao H, Ren X, Xu G, Liu B, Wu N, Luo H, Wang Y, Tu T, Yao B, Guan F, Teng Y, Huang H, Tian J. Protein-protein interaction and site prediction using transfer learning. Brief Bioinform 2023; 24:bbad376. [PMID: 37870286 DOI: 10.1093/bib/bbad376] [Received: 07/26/2023] [Revised: 09/14/2023] [Accepted: 10/02/2023] [Indexed: 10/24/2023]
Abstract
Advanced language models have enabled us to recognize protein-protein interactions (PPIs) and interaction sites using protein sequences or structures. Here, we trained the MindSpore ProteinBERT (MP-BERT) model, a Bidirectional Encoder Representation from Transformers, using protein pairs as inputs, making it suitable for identifying PPIs and their respective interaction sites. The pretrained model (MP-BERT) was fine-tuned as MPB-PPI (MP-BERT on PPI) and demonstrated its superiority over state-of-the-art models on diverse benchmark datasets for predicting PPIs. Moreover, the model's capability to recognize PPIs was evaluated across multiple organisms. An amalgamated organism model was designed, exhibiting a high level of generalization across the majority of organisms and attaining an accuracy of 92.65%. The model was also customized to predict interaction site propensity by fine-tuning it with PPI site data as MPB-PPISP. Our method facilitates the prediction of both PPIs and their interaction sites, thereby illustrating the potency of transfer learning in dealing with the protein pair task.
Affiliation(s)
- Tuoyu Liu
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Han Gao
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Xiaopu Ren
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Guoshun Xu
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Bo Liu
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Ningfeng Wu
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Huiying Luo
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Yuan Wang
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Tao Tu
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Bin Yao
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Feifei Guan
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Yue Teng
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing 100071, China
- Huoqing Huang
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Jian Tian
  - Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
6
Lee M. Recent Advances in Deep Learning for Protein-Protein Interaction Analysis: A Comprehensive Review. Molecules 2023; 28:5169. [PMID: 37446831 DOI: 10.3390/molecules28135169] [Received: 05/30/2023] [Revised: 06/30/2023] [Accepted: 06/30/2023] [Indexed: 07/15/2023]
Abstract
Deep learning, a potent branch of artificial intelligence, is steadily leaving its transformative imprint across multiple disciplines. Within computational biology, it is expediting progress in the understanding of Protein-Protein Interactions (PPIs), key components governing a wide array of biological functionalities. Hence, an in-depth exploration of PPIs is crucial for decoding the intricate biological system dynamics and unveiling potential avenues for therapeutic interventions. As the deployment of deep learning techniques in PPI analysis proliferates at an accelerated pace, there exists an immediate demand for an exhaustive review that encapsulates and critically assesses these novel developments. Addressing this requirement, this review offers a detailed analysis of the literature from 2021 to 2023, highlighting the cutting-edge deep learning methodologies harnessed for PPI analysis. Thus, this review stands as a crucial reference for researchers in the discipline, presenting an overview of the recent studies in the field. This consolidation helps elucidate the dynamic paradigm of PPI analysis, the evolution of deep learning techniques, and their interdependent dynamics. This scrutiny is expected to serve as a vital aid for researchers, both well-established and newcomers, assisting them in maneuvering the rapidly shifting terrain of deep learning applications in PPI analysis.
Affiliation(s)
- Minhyeok Lee
  - School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea