1
Bou Dagher L, Madern D, Malbos P, Brochier-Armanet C. Faithful Interpretation of Protein Structures through Weighted Persistent Homology Improves Evolutionary Distance Estimation. Mol Biol Evol 2025; 42:msae271. [PMID: 39761698; PMCID: PMC11789942; DOI: 10.1093/molbev/msae271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 12/02/2024] [Accepted: 12/20/2024] [Indexed: 02/05/2025] Open
Abstract
Phylogenetic inference is mainly based on sequence analysis and requires reliable alignments. This can be challenging, especially when sequences are highly divergent. In this context, the use of three-dimensional protein structures is a promising alternative. In a recent study, we introduced an original topological data analysis method based on persistent homology to estimate the evolutionary distances from structures. The method was successfully tested on 518 protein families representing 22,940 predicted structures. However, as anticipated, the reliability of the estimated evolutionary distances was impacted by the quality of the predicted structures and the presence of indels in the proteins. This paper introduces a new topological descriptor, called bio-topological marker (BTM), which provides a more faithful description of the structures, a topological analysis for estimating evolutionary distances from BTMs, and a new weight-filtering method adapted to protein structures. These new developments significantly improve the estimation of evolutionary distances and phylogenies inferred from structures.
Affiliation(s)
- Léa Bou Dagher
  - Université Claude Bernard Lyon 1, LBBE, UMR 5558, CNRS, VAS, Villeurbanne F-69622, France
  - Université Claude Bernard Lyon 1, CNRS, Institut Camille Jordan, UMR5208, Villeurbanne F-69622, France
  - Laboratoire de mathématiques, École Doctorale en Science et Technologie, Université Libanaise, Post Box 5, Hadath, Liban
- Philippe Malbos
  - Université Claude Bernard Lyon 1, CNRS, Institut Camille Jordan, UMR5208, Villeurbanne F-69622, France
- Céline Brochier-Armanet
  - Université Claude Bernard Lyon 1, LBBE, UMR 5558, CNRS, VAS, Villeurbanne F-69622, France
  - Institut Universitaire de France
2
Li Y, Duan Z, Li Z, Xue W. Data and AI-driven synthetic binding protein discovery. Trends Pharmacol Sci 2025; 46:132-144. [PMID: 39755458; DOI: 10.1016/j.tips.2024.12.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2024] [Revised: 12/02/2024] [Accepted: 12/06/2024] [Indexed: 01/06/2025]
Abstract
Synthetic binding proteins (SBPs) are a class of protein binders that are artificially created and do not exist naturally. Their broad applications in tackling challenges of research, diagnostics, and therapeutics have garnered significant interest. Traditional protein engineering is pivotal to the discovery of SBPs. Recently, this discovery has been significantly accelerated by computational approaches, such as molecular modeling and artificial intelligence (AI). Furthermore, while numerous bioinformatics databases offer a wealth of resources that fuel SBP discovery, the full potential of these data has not yet been fully exploited. In this review, we present a comprehensive overview of SBP data ecosystem and methodologies in SBP discovery, highlighting the critical role of high-quality data and AI technologies in accelerating the discovery of innovative SBPs with promising applications in pharmacological sciences.
Affiliation(s)
- Yanlin Li
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Zixin Duan
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Zhenwen Li
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Weiwei Xue
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China; Western (Chongqing) Collaborative Innovation Center for Intelligent Diagnostics and Digital Medicine, Chongqing National Biomedicine Industry Park, Chongqing 401329, China
3
Mi Y, Marcu SB, Tabirca S, Yallapragada VV. PS-GO parametric protein search engine. Comput Struct Biotechnol J 2024; 23:1499-1509. [PMID: 38633387; PMCID: PMC11021831; DOI: 10.1016/j.csbj.2024.04.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 04/01/2024] [Accepted: 04/01/2024] [Indexed: 04/19/2024] Open
Abstract
With the explosive growth of protein-related data, we are confronted with a critical scientific inquiry: How can we effectively retrieve, compare, and profoundly comprehend these protein structures to maximize the utilization of such data resources? PS-GO, a parametric protein search engine, has been specifically designed and developed to maximize the utilization of the rapidly growing volume of protein-related data. This innovative tool addresses the critical need for effective retrieval, comparison, and deep understanding of protein structures. By integrating computational biology, bioinformatics, and data science, PS-GO is capable of managing large-scale data and accurately predicting and comparing protein structures and functions. The engine is built upon the concept of parametric protein design, a computer-aided method that adjusts and optimizes protein structures and sequences to achieve desired biological functions and structural stability. PS-GO utilizes key parameters such as amino acid sequence, side chain angle, and solvent accessibility, which have a significant influence on protein structure and function. Additionally, PS-GO leverages computable parameters, derived computationally, which are crucial for understanding and predicting protein behavior. The development of PS-GO underscores the potential of parametric protein design in a variety of applications, including enhancing enzyme activity, improving antibody affinity, and designing novel functional proteins. This advancement not only provides a robust theoretical foundation for the field of protein engineering and biotechnology but also offers practical guidelines for future progress in this domain.
Affiliation(s)
- Yanlin Mi
  - School of Computer Science and Information Technology, University College Cork, Cork, Ireland
  - SFI Centre for Research Training in Artificial Intelligence, University College Cork, Cork, Ireland
- Stefan-Bogdan Marcu
  - School of Computer Science and Information Technology, University College Cork, Cork, Ireland
- Sabin Tabirca
  - School of Computer Science and Information Technology, University College Cork, Cork, Ireland
  - Faculty of Mathematics and Informatics, Transylvania University of Brasov, Brasov, Romania
- Venkata V.B. Yallapragada
  - Centre for Advanced Photonics and Process Analytics, Munster Technological University, Cork, Ireland
4
Cheng P, Mao C, Tang J, Yang S, Cheng Y, Wang W, Gu Q, Han W, Chen H, Li S, Chen Y, Zhou J, Li W, Pan A, Zhao S, Huang X, Zhu S, Zhang J, Shu W, Wang S. Zero-shot prediction of mutation effects with multimodal deep representation learning guides protein engineering. Cell Res 2024; 34:630-647. [PMID: 38969803; PMCID: PMC11369238; DOI: 10.1038/s41422-024-00989-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Accepted: 06/03/2024] [Indexed: 07/07/2024] Open
Abstract
Mutations in amino acid sequences can provoke changes in protein function. Accurate and unsupervised prediction of mutation effects is critical in biotechnology and biomedicine, but remains a fundamental challenge. To resolve this challenge, here we present Protein Mutational Effect Predictor (ProMEP), a general and multiple sequence alignment-free method that enables zero-shot prediction of mutation effects. A multimodal deep representation learning model embedded in ProMEP was developed to comprehensively learn both sequence and structure contexts from ~160 million proteins. ProMEP achieves state-of-the-art performance in mutational effect prediction and accomplishes a tremendous improvement in speed, enabling efficient and intelligent protein engineering. Specifically, ProMEP accurately forecasts mutational consequences on the gene-editing enzymes TnpB and TadA, and successfully guides the development of high-performance gene-editing tools with their engineered variants. The gene-editing efficiency of a 5-site mutant of TnpB reaches up to 74.04% (vs 24.66% for the wild type); and the base editing tool developed on the basis of a TadA 15-site mutant (in addition to the A106V/D108N double mutation that renders deoxyadenosine deaminase activity to TadA) exhibits an A-to-G conversion frequency of up to 77.27% (vs 69.80% for ABE8e, a previous TadA-based adenine base editor) with significantly reduced bystander and off-target effects compared to ABE8e. ProMEP not only showcases superior performance in predicting mutational effects on proteins but also demonstrates a great capability to guide protein engineering. Therefore, ProMEP enables efficient exploration of the gigantic protein space and facilitates practical design of proteins, thereby advancing studies in biomedicine and synthetic biology.
Affiliation(s)
- Peng Cheng
  - Bioinformatics Center of AMMS, Beijing, China
- Cong Mao
  - State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Jin Tang
  - Zhejiang Lab, Hangzhou, Zhejiang, China
- Sen Yang
  - Bioinformatics Center of AMMS, Beijing, China
- Yu Cheng
  - State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Wuke Wang
  - Zhejiang Lab, Hangzhou, Zhejiang, China
- Qiuxi Gu
  - State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Wei Han
  - Zhejiang Lab, Hangzhou, Zhejiang, China
- Hao Chen
  - State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Sihan Li
  - State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Wuju Li
  - Bioinformatics Center of AMMS, Beijing, China
- Aimin Pan
  - Zhejiang Lab, Hangzhou, Zhejiang, China
- Suwen Zhao
  - iHuman Institute, ShanghaiTech University, Shanghai, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Xingxu Huang
  - Zhejiang Lab, Hangzhou, Zhejiang, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Jun Zhang
  - State Key Laboratory of Reproductive Medicine and Offspring Health, Women's Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
- Wenjie Shu
  - Bioinformatics Center of AMMS, Beijing, China
5
Xia Y, Pan X, Shen HB. Heterogeneous sampled subgraph neural networks with knowledge distillation to enhance double-blind compound-protein interaction prediction. Structure 2024; 32:611-620.e4. [PMID: 38447575; DOI: 10.1016/j.str.2024.02.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 12/18/2023] [Accepted: 02/08/2024] [Indexed: 03/08/2024]
Abstract
Identifying binding compounds against a target protein is crucial for large-scale virtual screening in drug development. Recently, network-based methods have been developed for compound-protein interaction (CPI) prediction. However, they are difficult to apply to unseen (i.e., never-seen-before) proteins and compounds. In this study, we propose SgCPI, which incorporates local known interaction networks to predict CPIs. SgCPI randomly samples the local CPI network of the query compound-protein pair as a subgraph and applies a heterogeneous graph neural network (HGNN) to embed the active/inactive message of the subgraph. For unseen compounds and proteins, SgCPI-KD takes SgCPI as the teacher model and distills its knowledge by estimating the potential neighbors. Experimental results indicate that (1) the sampled subgraphs of the CPI network introduce efficient knowledge for unseen molecular prediction with the HGNNs, and (2) the knowledge distillation strategy benefits double-blind interaction prediction by estimating molecular neighbors and distilling knowledge.
Affiliation(s)
- Ying Xia
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Xiaoyong Pan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
6
Bou Dagher L, Madern D, Malbos P, Brochier-Armanet C. Persistent homology reveals strong phylogenetic signal in 3D protein structures. PNAS Nexus 2024; 3:pgae158. [PMID: 38689707; PMCID: PMC11058471; DOI: 10.1093/pnasnexus/pgae158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 04/01/2024] [Indexed: 05/02/2024]
Abstract
Changes that occur in proteins over time provide a phylogenetic signal that can be used to decipher their evolutionary history and the relationships between organisms. Sequence comparison is the most common way to access this phylogenetic signal, while those based on 3D structure comparisons are still in their infancy. In this study, we propose an effective approach based on Persistent Homology Theory (PH) to extract the phylogenetic information contained in protein structures. PH provides efficient and robust algorithms for extracting and comparing geometric features from noisy datasets at different spatial resolutions. PH has a growing number of applications in the life sciences, including the study of proteins (e.g. classification, folding). However, it has never been used to study the phylogenetic signal they may contain. Here, using 518 protein families, representing 22,940 protein sequences and structures, from 10 major taxonomic groups, we show that distances calculated with PH from protein structures correlate strongly with phylogenetic distances calculated from protein sequences, at both small and large evolutionary scales. We test several methods for calculating PH distances and propose some refinements to improve their relevance for addressing evolutionary questions. This work opens up new perspectives in evolutionary biology by proposing an efficient way to access the phylogenetic signal contained in protein structures, as well as future developments of topological analysis in the life sciences.
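As a rough illustration of the kind of computation persistent homology involves, the sketch below computes only the 0-dimensional persistence of a point cloud: every point starts as its own connected component at scale 0, and a component "dies" at the scale where it merges into another. This is not the authors' pipeline. The function name `h0_persistence`, the choice of plain Euclidean distance, and the convention that an edge enters the filtration at its full length are assumptions made for the example.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return the finite death scales of connected components (H0).

    Hypothetical helper for illustration: points are coordinate tuples,
    and an edge is assumed to enter the filtration at its Euclidean length.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest, one tree per component

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise edges from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge at this scale, so one of them dies here.
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


# Three collinear points standing in for atom coordinates.
print(h0_persistence([(0, 0, 0), (1, 0, 0), (5, 0, 0)]))
```

For n points the function returns n - 1 finite death scales; the remaining component never dies. Higher-dimensional features (loops, cavities) need a full simplicial complex and are usually computed with a dedicated library rather than by hand.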
Affiliation(s)
- Léa Bou Dagher
  - Université Claude Bernard Lyon 1, CNRS, VetAgro Sup, Laboratoire de Biométrie et Biologie Évolutive, UMR5558, F-69622 Villeurbanne, France
  - Université Claude Bernard Lyon 1, CNRS, Institut Camille Jordan, UMR5208, F-69622 Villeurbanne, France
  - Université Libanaise, Laboratoire de Mathématiques, École Doctorale en Science et Technologie, PO BOX 5 Hadath, Liban
- Dominique Madern
  - University Grenoble Alpes, CEA, CNRS, IBS, 38000 Grenoble, France
- Philippe Malbos
  - Université Claude Bernard Lyon 1, CNRS, Institut Camille Jordan, UMR5208, F-69622 Villeurbanne, France
- Céline Brochier-Armanet
  - Université Claude Bernard Lyon 1, CNRS, VetAgro Sup, Laboratoire de Biométrie et Biologie Évolutive, UMR5558, F-69622 Villeurbanne, France
7
Sun S, Gao L. Contrastive pre-training and 3D convolution neural network for RNA and small molecule binding affinity prediction. Bioinformatics 2024; 40:btae155. [PMID: 38507691; PMCID: PMC11007238; DOI: 10.1093/bioinformatics/btae155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 02/23/2024] [Accepted: 03/18/2024] [Indexed: 03/22/2024] Open
Abstract
MOTIVATION: The diverse structures and functions inherent in RNAs present a wealth of potential drug targets. Some small molecules are anticipated to serve as leading compounds, providing guidance for the development of novel RNA-targeted therapeutics. Consequently, the determination of RNA-small molecule binding affinity is a critical undertaking in the landscape of RNA-targeted drug discovery and development. Nevertheless, to date, only one computational method for RNA-small molecule binding affinity prediction has been proposed. The prediction of RNA-small molecule binding affinity remains a significant challenge. The development of a computational model is deemed essential to effectively extract relevant features and predict RNA-small molecule binding affinity accurately.
RESULTS: In this study, we introduced RLaffinity, a novel deep learning model designed for the prediction of RNA-small molecule binding affinity based on 3D structures. RLaffinity integrated information from RNA pockets and small molecules, utilizing a 3D convolutional neural network (3D-CNN) coupled with a contrastive learning-based self-supervised pre-training model. To the best of our knowledge, RLaffinity was the first deep learning based method for the prediction of RNA-small molecule binding affinity. Our experimental results exhibited RLaffinity's superior performance compared to baseline methods, revealed by all metrics. The efficacy of RLaffinity underscores the capability of 3D-CNN to accurately extract both global pocket information and local neighbor nucleotide information within RNAs. Notably, the integration of a self-supervised pre-training model significantly enhanced predictive performance. Ultimately, RLaffinity also proved to be a potential tool for virtual screening of RNA-targeted drugs.
AVAILABILITY AND IMPLEMENTATION: https://github.com/SaisaiSun/RLaffinity.
Affiliation(s)
- Saisai Sun
  - School of Computer Science and Technology, Xidian University, No.266 Xinglong Section of Xi Feng Road, Xi’an, Shaanxi, 710126, China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, No.266 Xinglong Section of Xi Feng Road, Xi’an, Shaanxi, 710126, China
8
Greener JG, Jamali K. Fast protein structure searching using structure graph embeddings. Bioinformatics Advances 2025; 5:vbaf042. [PMID: 40196750; PMCID: PMC11974391; DOI: 10.1093/bioadv/vbaf042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2025] [Revised: 02/11/2025] [Accepted: 03/03/2025] [Indexed: 04/09/2025]
Abstract
Comparing and searching protein structures independent of primary sequence has proved useful for remote homology detection, function annotation, and protein classification. Fast and accurate methods to search with structures will be essential to make use of the vast databases that have recently become available, in the same way that fast protein sequence searching underpins much of bioinformatics. We train a simple graph neural network using supervised contrastive learning to learn a low-dimensional embedding of protein domains. Availability and implementation: The method, called Progres, is available as software at https://github.com/greener-group/progres and as a web server at https://progres.mrc-lmb.cam.ac.uk. It has accuracy comparable to the best current methods and can search the AlphaFold database TED domains in a tenth of a second per query on CPU.
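Once each domain has been reduced to a fixed-length embedding, searching comes down to ranking stored vectors by their similarity to the query vector. The sketch below shows that ranking step only, using cosine similarity over a small in-memory dictionary. It is not the Progres implementation: the function names, the toy two-dimensional vectors, and the use of cosine similarity as the score are assumptions for illustration, and producing the embeddings themselves would require the trained network.

```python
import math


def normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def search(query, database, top_k=3):
    """Rank database entries by cosine similarity to the query embedding.

    Hypothetical helper: `database` maps a domain name to its embedding,
    and all embeddings are assumed non-zero and of equal length.
    """
    q = normalize(query)
    scored = []
    for name, embedding in database.items():
        e = normalize(embedding)
        similarity = sum(a * b for a, b in zip(q, e))
        scored.append((similarity, name))
    scored.sort(reverse=True)  # highest similarity first
    return [(name, similarity) for similarity, name in scored[:top_k]]


# Toy embeddings standing in for learned domain representations.
toy_db = {"domA": [1.0, 0.0], "domB": [0.0, 1.0], "domC": [1.0, 1.0]}
for name, score in search([1.0, 0.1], toy_db, top_k=2):
    print(name, round(score, 3))
```

A linear scan like this costs one dot product per database entry; at database scale the same ranking is typically done with vectorized matrix products or an approximate nearest-neighbor index.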
Affiliation(s)
- Joe G Greener
  - Medical Research Council Laboratory of Molecular Biology, Cambridge, CB2 0QH, United Kingdom
- Kiarash Jamali
  - Medical Research Council Laboratory of Molecular Biology, Cambridge, CB2 0QH, United Kingdom
9
Liu Z, Zhang C, Zhang Q, Zhang Y, Yu DJ. TM-search: An Efficient and Effective Tool for Protein Structure Database Search. J Chem Inf Model 2024; 64:1043-1049. [PMID: 38270339; DOI: 10.1021/acs.jcim.3c01455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]
Abstract
The quickly increasing size of the Protein Data Bank is challenging biologists to develop a more scalable protein structure alignment tool for fast structure database search. Although many protein structure search algorithms and programs have been designed and implemented for this purpose, most require a large amount of computational time. We propose a novel protein structure search approach, TM-search, which is based on the pairwise structure alignment program TM-align and a new iterative clustering algorithm. Benchmark tests demonstrate that TM-search is 27 times faster than a TM-align full database search while still being able to identify ∼90% of all high TM-score hits, which is 2-10 times more than other existing programs such as Foldseek, Dali, and PSI-BLAST.
Affiliation(s)
- Zi Liu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
  - Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw, Ann Arbor, Michigan 48109-2218, United States
- Qidi Zhang
  - Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw, Ann Arbor, Michigan 48109-2218, United States
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
10
Jiang Z, Shen YY, Liu R. Structure-based prediction of nucleic acid binding residues by merging deep learning- and template-based approaches. PLoS Comput Biol 2023; 19:e1011428. [PMID: 37672551; PMCID: PMC10482303; DOI: 10.1371/journal.pcbi.1011428] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/20/2023] [Accepted: 08/11/2023] [Indexed: 09/08/2023] Open
Abstract
Accurate prediction of nucleic acid binding residues is essential for the understanding of transcription and translation processes. Integration of feature- and template-based strategies could improve the prediction of these key residues in proteins. Nevertheless, traditional hybrid algorithms have been surpassed by recently developed deep learning-based methods, and the possibility of integrating deep learning- and template-based approaches to improve performance remains to be explored. To address these issues, we developed a novel structure-based integrative algorithm called NABind that can accurately predict DNA- and RNA-binding residues. A deep learning module was built based on the diversified sequence and structural descriptors and edge aggregated graph attention networks, while a template module was constructed by transforming the alignments between the query and its multiple templates into features for supervised learning. Furthermore, the stacking strategy was adopted to integrate the above two modules for improving prediction performance. Finally, a post-processing module dependent on the random walk algorithm was proposed to further correct the integrative predictions. Extensive evaluations indicated that our approach could not only achieve excellent performance on both native and predicted structures but also outperformed existing hybrid algorithms and recent deep learning methods. The NABind server is available at http://liulab.hzau.edu.cn/NABind/.
Affiliation(s)
- Zheng Jiang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Yue-Yue Shen
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Rong Liu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
11
Xia C, Feng SH, Xia Y, Pan X, Shen HB. Leveraging scaffold information to predict protein-ligand binding affinity with an empirical graph neural network. Brief Bioinform 2023; 24:6982728. [PMID: 36627113; DOI: 10.1093/bib/bbac603] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/05/2022] [Revised: 11/01/2022] [Accepted: 12/08/2022] [Indexed: 01/12/2023] Open
Abstract
Protein-ligand binding affinity prediction is an important task in structural bioinformatics for drug discovery and design. Although various scoring functions (SFs) have been proposed, it remains challenging to accurately evaluate the binding affinity of a protein-ligand complex with the known bound structure because of the potential preference of scoring system. In recent years, deep learning (DL) techniques have been applied to SFs without sophisticated feature engineering. Nevertheless, existing methods cannot model the differential contribution of atoms in various regions of proteins, and the relationship between atom properties and intermolecular distance is also not fully explored. We propose a novel empirical graph neural network for accurate protein-ligand binding affinity prediction (EGNA). Graphs of protein, ligand and their interactions are constructed based on different regions of each bound complex. Proteins and ligands are effectively represented by graph convolutional layers, enabling the EGNA to capture interaction patterns precisely by simulating empirical SFs. The contributions of different factors on binding affinity can thus be transparently investigated. EGNA is compared with the state-of-the-art machine learning-based SFs on two widely used benchmark data sets. The results demonstrate the superiority of EGNA and its good generalization capability.
Affiliation(s)
- Chunqiu Xia
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
- Shi-Hao Feng
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
- Ying Xia
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
- Xiaoyong Pan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China