1
Kawabata T, Kinoshita K. Assessing Structural Classification Using AlphaFold2 Models Through ECOD-Based Comparative Analysis. Proteins 2025. [PMID: 40251890 DOI: 10.1002/prot.26828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2024] [Revised: 03/27/2025] [Accepted: 03/30/2025] [Indexed: 04/21/2025]
Abstract
Identifying homologous proteins is a fundamental task in structural bioinformatics. While AlphaFold2 has revolutionized protein structure prediction, the extent to which structure comparison of its models can reliably detect homologs remains unclear. In this study, we evaluate the feasibility of homology detection using AlphaFold2-predicted structures through structural comparisons. We considered the classification of the ECOD database for experimental structures as the correct standard and obtained their corresponding predicted models from AlphaFoldDB. To ensure blind assessment, we divided the structures into test and training sets according to their release date. Predicted and experimental 3D structures in the test and training sets were compared using 3D structure comparisons (MATRAS, Dali, and Foldseek) and sequence comparisons (BLAST and HHsearch). The results were evaluated based on the homology annotations in the ECOD database. For top-1 accuracy, the performance of structural comparisons was comparable to that of HHsearch. However, when considering metrics that included all structural pairs, including more remote homology, structural comparisons outperformed HHsearch. No significant differences were observed between comparisons of experimental versus experimental, predicted versus experimental, and predicted versus predicted structures with pLDDT (prediction confidence) values greater than 60. We also demonstrate that predicted models of proteins whose experimental structures were determined by NMR had lower pLDDT values, and contained fewer coils than their experimental counterparts. These findings highlight the potential of AlphaFold2 models in structural classification and suggest that 3D structural searches should be conducted not only against the PDB but also against AlphaFoldDB to identify more potential homologs.
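As a rough illustration of the top-1 accuracy mentioned in this abstract, the sketch below scores each query by whether its best-scoring hit shares the query's homology group. It is not the authors' evaluation code: the function name `top1_accuracy`, the input layout, the convention that a higher score means a better hit, and the rule that a query with no hits counts as a miss are all assumptions made for the example.

```python
def top1_accuracy(hits, groups):
    """Fraction of queries whose best-scoring hit shares their homology group.

    hits   : {query_id: [(target_id, score), ...]}  (hypothetical layout)
    groups : {domain_id: homology_group_label}      (e.g. a classification label)
    Assumption: higher score = better hit; a query with no hits is a miss.
    """
    if not hits:
        return 0.0
    correct = 0
    for query, candidates in hits.items():
        if not candidates:
            continue  # no hit at all: counted as incorrect
        best_target, _ = max(candidates, key=lambda pair: pair[1])
        if groups[best_target] == groups[query]:
            correct += 1
    return correct / len(hits)


# Toy data, invented for illustration only.
example_hits = {
    "q1": [("t1", 50.0), ("t2", 80.0)],
    "q2": [("t3", 10.0)],
    "q3": [],
}
example_groups = {"q1": "A", "q2": "B", "q3": "C", "t1": "B", "t2": "A", "t3": "A"}
accuracy = top1_accuracy(example_hits, example_groups)
```

For tools that report E-values, where lower is better, the scores would need to be negated or the `max` replaced by `min` before using a sketch like this.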
Affiliation(s)
- Takeshi Kawabata
  - Graduate School of Information Sciences, Tohoku University, Sendai, Japan
- Kengo Kinoshita
  - Graduate School of Information Sciences, Tohoku University, Sendai, Japan
2
Wang J, Chen J, Hu Y, Song C, Li X, Qian Y, Deng L. DeepMFFGO: A Protein Function Prediction Method for Large-Scale Multifeature Fusion. J Chem Inf Model 2025; 65:3841-3853. [PMID: 40116538 DOI: 10.1021/acs.jcim.5c00062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2025]
Abstract
Protein functional studies are crucial in the fields of drug target discovery and drug design. However, the existing methods have significant bottlenecks in utilizing multisource data fusion and Gene Ontology (GO) hierarchy. To this end, this study innovatively proposes the DeepMFFGO model designed for protein function prediction under large-scale multifeature fusion. A fine-tuning strategy using intermediate-level feature selection is proposed to reduce redundancy in protein sequences and mitigate distortion of the top-level features. A hierarchical progressive fusion structure is designed to explore feature connections, optimize complementarity through dynamic weight allocation, and reduce redundant interference. On the CAFA3 data set, the Fmax values of the DeepMFFGO model on the MF, BP, and CC ontologies reach 0.702, 0.599, and 0.704, respectively, which are improved by 4.2%, 2.4%, and 0.07%, respectively, compared with state-of-the-art multisource methods.
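The Fmax values quoted in this abstract are maximum F-measures over prediction thresholds. The sketch below shows one common protein-centric way to compute such a score, assuming precision is averaged over proteins with at least one prediction at a given threshold and recall over all proteins. It is not the DeepMFFGO evaluation code: the function name `fmax`, the dictionary layout, and the threshold grid are assumptions, and the propagation of terms to their GO ancestors is omitted.

```python
def fmax(predictions, truths, thresholds=None):
    """Protein-centric maximum F-measure over score thresholds.

    predictions : {protein: {term: score in [0, 1]}}  (hypothetical layout)
    truths      : {protein: set of true terms}, each set non-empty
    Assumption: no propagation along the ontology hierarchy is performed.
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions = []
        recall_sum = 0.0
        for protein, true_terms in truths.items():
            scored = predictions.get(protein, {})
            predicted = {term for term, s in scored.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                # Precision is averaged only over proteins with predictions.
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truths)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy data, invented for illustration only.
toy_predictions = {"P1": {"GO:a": 0.9, "GO:b": 0.4}, "P2": {"GO:c": 0.8}}
toy_truths = {"P1": {"GO:a"}, "P2": {"GO:c", "GO:d"}}
toy_fmax = fmax(toy_predictions, toy_truths)
```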
Affiliation(s)
- Jingfu Wang
  - School of Software, Xinjiang University, Urumqi 830091, China
  - Xinjiang Engineering Research Center of Big Data and Intelligent Software, School of Software, Xinjiang University, Urumqi 830091, China
  - Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
- Jiaying Chen
  - School of Software, Xinjiang University, Urumqi 830091, China
  - Xinjiang Engineering Research Center of Big Data and Intelligent Software, School of Software, Xinjiang University, Urumqi 830091, China
  - Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
- Yue Hu
  - School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China
  - Joint International Research Laboratory of Silk Road Multilingual Cognitive Computing, Xinjiang University, Urumqi, Xinjiang 830046, China
- Chaolin Song
  - School of Software, Xinjiang University, Urumqi 830091, China
  - Xinjiang Engineering Research Center of Big Data and Intelligent Software, School of Software, Xinjiang University, Urumqi 830091, China
  - Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
- Xinhui Li
  - School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China
  - Joint International Research Laboratory of Silk Road Multilingual Cognitive Computing, Xinjiang University, Urumqi, Xinjiang 830046, China
- Yurong Qian
  - Xinjiang Engineering Research Center of Big Data and Intelligent Software, School of Software, Xinjiang University, Urumqi 830091, China
  - Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
  - School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China
  - Joint International Research Laboratory of Silk Road Multilingual Cognitive Computing, Xinjiang University, Urumqi, Xinjiang 830046, China
- Lei Deng
  - School of Software, Xinjiang University, Urumqi 830091, China
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
3
Malhotra Y, John J, Yadav D, Sharma D, Vanshika, Rawal K, Mishra V, Chaturvedi N. Advancements in protein structure prediction: A comparative overview of AlphaFold and its derivatives. Comput Biol Med 2025; 188:109842. [PMID: 39970826 DOI: 10.1016/j.compbiomed.2025.109842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2024] [Revised: 02/07/2025] [Accepted: 02/10/2025] [Indexed: 02/21/2025]
Abstract
This review provides a comprehensive analysis of AlphaFold (AF) and its derivatives (AF2 and AF3) in protein structure prediction. These tools have revolutionized structural biology with their highly accurate predictions, driving progress in protein modeling, drug discovery, and the study of protein dynamics. Their exceptional accuracy has redefined our understanding of protein folding and enabled groundbreaking advancements in protein design and disease research; the review also discusses future integration with experimental techniques. In addition, their achievements, features, architectures, important case studies, and noteworthy effects in the field of biology and medicine are evaluated. Although AF2 is a relatively recent innovation, many studies have already highlighted its applications. Moreover, the limitations of AF2 that led to the introduction of AF3 are also reported; AF3 is a great improvement, as it provides precise predictions of the structures and interactions of proteins, DNA, RNA, and ligands, thereby aiding understanding at the molecular level. Addressing current challenges and forecasting future developments, this work underscores the lasting significance of AF in reshaping the scientific landscape of protein research.
Affiliation(s)
- Yuktika Malhotra
  - Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Jerry John
  - Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Deepika Yadav
  - Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Deepshikha Sharma
  - Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Vanshika
  - Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Kamal Rawal
  - Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Vaibhav Mishra
  - Amity Institute of Microbial Technology, Amity University, Uttar Pradesh, 201303, India
- Navaneet Chaturvedi
  - Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
4
Bi X, Zhang S, Ma W, Jiang H, Wei Z. HiSIF-DTA: A Hierarchical Semantic Information Fusion Framework for Drug-Target Affinity Prediction. IEEE J Biomed Health Inform 2025; 29:1579-1590. [PMID: 37983161 DOI: 10.1109/jbhi.2023.3334239] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 11/22/2023]
Abstract
Accurately identifying drug-target affinity (DTA) plays a significant role in promoting drug discovery and has attracted increasing attention in recent years. Exploring appropriate protein representation methods and increasing the abundance of protein information is critical in enhancing the accuracy of DTA prediction. Recently, numerous deep learning-based models have been proposed to utilize the sequential or structural features of target proteins. However, these models capture only the low-order semantics that exist in a single protein, while the high-order semantics abundant in biological networks are largely ignored. In this article, we propose HiSIF-DTA, a hierarchical semantic information fusion framework for DTA prediction. In this framework, a hierarchical protein graph is constructed that includes not only contact maps as low-order structural semantics but also protein-protein interaction (PPI) networks as high-order functional semantics. Particularly, two distinct hierarchical fusion strategies (i.e., Top-Down and Bottom-Up) are designed to integrate the different protein semantics, thereby contributing to a richer protein representation. Comprehensive experimental results demonstrate that HiSIF-DTA outperforms current state-of-the-art methods for prediction on the benchmark datasets of the DTA task. Further validation on binary tasks and visualization analysis demonstrates the generalization and interpretation abilities of the proposed method.
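The contact maps this abstract uses as low-order structural semantics can be illustrated with a minimal sketch: a binary matrix marking residue pairs whose representative atoms lie within a distance cutoff. This is not the HiSIF-DTA code; the function name `contact_map`, the use of one 3D coordinate per residue, and the 8 Å cutoff are assumptions chosen for the example.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Binary residue-residue contact matrix.

    coords : list of (x, y, z) tuples, one per residue (hypothetical input,
             e.g. C-alpha positions in angstroms)
    cutoff : distance threshold; 8.0 is an assumed value, not from the paper
    Each residue is marked as in contact with itself (distance 0).
    """
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) <= cutoff else 0 for j in range(n)]
        for i in range(n)
    ]


# Toy coordinates, invented for illustration only.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
```

Read as an adjacency matrix, such a map defines the edges of a residue-level graph; the protein-level PPI network described in the abstract would sit one level above it.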
5
Zhai Z, Xu S, Ma W, Niu N, Qu C, Zong C. LGS-PPIS: A Local-Global Structural Information Aggregation Framework for Predicting Protein-Protein Interaction Sites. Proteins 2025; 93:716-727. [PMID: 39520116 DOI: 10.1002/prot.26763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2024] [Revised: 10/20/2024] [Accepted: 10/22/2024] [Indexed: 11/16/2024]
Abstract
Exploring protein-protein interaction sites (PPIS) is of significance to elucidating the intrinsic mechanisms of diverse biological processes. On this basis, recent studies have applied deep learning-based technologies to overcome the high cost of wet experiments for PPIS determination. However, the existing methods still suffer from two limitations that remain to be solved. Firstly, the process of feature aggregation in most methods only took into account node features, but ignored the complex edge features of the target residue to its neighbor residues, resulting in insufficient local feature extraction. Secondly, such feature aggregation was limited to aggregating spatially adjacent residues, and could not capture the "remote" residues that played a critical role in determining PPIS, which can be summed up as the lack of global feature at the residue level. To break the above limitations, a local-global structural information aggregation framework, LGS-PPIS, was proposed in this study, including two modules of edge-aware graph convolutional network (EA-GCN) and self-attention integrated with initial residual and identity mapping (SA-RIM), which achieved the aggregation of local and global information for PPIS prediction. Evaluation results of LGS-PPIS showed that the proposed method outperformed state-of-the-art deep learning methods on three widely used PPIS prediction benchmarks. Besides, the results of ablation experiments demonstrated that the local features from spatially adjacent residues and global features from "remote" residues separately captured by EA-GCN and SA-RIM could benefit the model performance. Among them, the former was shown to have a more significant role in the PPIS prediction.
Affiliation(s)
- Zhengli Zhai
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
- Shiya Xu
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
- Wenjian Ma
  - College of Computer Science and Technology, Ocean University of China, Qingdao, China
- Niuwangjie Niu
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
- Chunyu Qu
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
- Chao Zong
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
6
Chafer-Dolz B, Cecilia JM, Imbernón B, Núñez-Delicado E, Casaña-Giner V, Cerón-Carrasco JP. Discovery of novel acetylcholinesterase inhibitors through AI-powered structure prediction and high-performance computing-enhanced virtual screening. RSC Adv 2025; 15:4262-4273. [PMID: 39926230 PMCID: PMC11804414 DOI: 10.1039/d4ra07951e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2024] [Accepted: 01/29/2025] [Indexed: 02/11/2025] Open
Abstract
Virtual screening (VS) methodologies have become key in the drug discovery process but are also applicable to other fields including catalysis, material design, and, more recently, insecticide solutions. Indeed, the search for effective pest control agents is a critical industrial objective, driven by the need to meet stringent regulations and address public health concerns. Cockroaches, known vectors of numerous diseases, represent a major challenge due to the toxicity of existing control measures to humans. In this article, we leverage an Artificial Intelligence (AI)-based screening of the Drug Bank (DB) database to identify novel acetylcholinesterase (AChE) inhibitors, a previously uncharacterized target in the American cockroach (Periplaneta americana). Our AI-based VS pipeline starts with the deep-learning-based AlphaFold to predict the previously unknown 3D structure of AChE based on its amino acid sequence. This first step enables the subsequent ligand-receptor VS of potential inhibitors, the development of which is performed using a consensus VS protocol based on two different tools: Glide, an industry-leading solution, and METADOCK 2, a metaheuristic-based tool that takes advantage of GPU acceleration. The proposed VS pipeline is further refined through rescoring to pinpoint the most promising biocide compounds against cockroaches. We show the search space explored by different metaheuristics generated by METADOCK 2 and how this search is more exhaustive, but complementary, than the one offered by Glide. Finally, we applied Molecular Mechanics Generalized Born Surface Area (MMGBSA) to list the most promising compounds to inhibit the AChE enzyme.
Affiliation(s)
- José M Cecilia
  - Universitat Politécnica de Valencia (UPV) Camino de Vera S/N Valencia 46022 Spain
- Baldomero Imbernón
  - Universidad Católica de Murcia (UCAM) Campus de los Jerónimos Murcia 30107 Spain
- José P Cerón-Carrasco
  - Centro Universitario de la Defensa, Academia General del Aire, Universidad Politécnica de Cartagena C/Coronel López Peña s/n 30729, Santiago de la Ribera Murcia Spain
7
Zhai S, Liu T, Lin S, Li D, Liu H, Yao X, Hou T. Artificial intelligence in peptide-based drug design. Drug Discov Today 2025; 30:104300. [PMID: 39842504 DOI: 10.1016/j.drudis.2025.104300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2024] [Revised: 01/14/2025] [Accepted: 01/15/2025] [Indexed: 01/24/2025]
Abstract
Protein-protein interactions (PPIs) are fundamental to a variety of biological processes, but targeting them with small molecules is challenging because of their large and complex interaction interfaces. However, peptides have emerged as highly promising modulators of PPIs, because they can bind to protein surfaces with high affinity and specificity. Nonetheless, computational peptide design remains difficult, hindered by the intrinsic flexibility of peptides and the substantial computational resources required. Recent advances in artificial intelligence (AI) are paving new paths for peptide-based drug design. In this review, we explore the advanced deep generative models for designing target-specific peptide binders, highlight key challenges, and offer insights into the future direction of this rapidly evolving field.
Affiliation(s)
- Silong Zhai
  - Faculty of Applied Science, Macao Polytechnic University, 999078, Macao; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tiantao Liu
  - Faculty of Applied Science, Macao Polytechnic University, 999078, Macao
- Shaolong Lin
  - Faculty of Applied Science, Macao Polytechnic University, 999078, Macao
- Dan Li
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Huanxiang Liu
  - Faculty of Applied Science, Macao Polytechnic University, 999078, Macao
- Xiaojun Yao
  - Faculty of Applied Science, Macao Polytechnic University, 999078, Macao
- Tingjun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
8
Chen JY, Wang JF, Hu Y, Li XH, Qian YR, Song CL. Evaluating the advancements in protein language models for encoding strategies in protein function prediction: a comprehensive review. Front Bioeng Biotechnol 2025; 13:1506508. [PMID: 39906415 PMCID: PMC11790633 DOI: 10.3389/fbioe.2025.1506508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2024] [Accepted: 01/02/2025] [Indexed: 02/06/2025] Open
Abstract
Protein function prediction is crucial in several key areas such as bioinformatics and drug design. With the rapid progress of deep learning technology, applying protein language models has become a research focus. These models utilize the increasing amount of large-scale protein sequence data to deeply mine its intrinsic semantic information, which can effectively improve the accuracy of protein function prediction. This review comprehensively combines the current status of applying the latest protein language models in protein function prediction. It provides an exhaustive performance comparison with traditional prediction methods. Through the in-depth analysis of experimental results, the significant advantages of protein language models in enhancing the accuracy and depth of protein function prediction tasks are fully demonstrated.
Affiliation(s)
- Jia-Ying Chen
  - School of Software, Xinjiang University, Urumqi, China
  - Key Laboratory of Software Engineering, Xinjiang University, Urumqi, China
  - Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi, China
- Jing-Fu Wang
  - School of Software, Xinjiang University, Urumqi, China
  - Key Laboratory of Software Engineering, Xinjiang University, Urumqi, China
  - Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi, China
- Yue Hu
  - School of Software, Xinjiang University, Urumqi, China
  - Key Laboratory of Software Engineering, Xinjiang University, Urumqi, China
  - Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi, China
- Xin-Hui Li
  - School of Software, Xinjiang University, Urumqi, China
  - Key Laboratory of Software Engineering, Xinjiang University, Urumqi, China
  - Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi, China
- Yu-Rong Qian
  - Key Laboratory of Software Engineering, Xinjiang University, Urumqi, China
  - Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi, China
  - School of Computer Science and Technology, Xinjiang University, Urumqi, China
- Chao-Lin Song
  - School of Software, Xinjiang University, Urumqi, China
  - Key Laboratory of Software Engineering, Xinjiang University, Urumqi, China
  - Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi, China
9
Buller R, Damborsky J, Hilvert D, Bornscheuer UT. Structure Prediction and Computational Protein Design for Efficient Biocatalysts and Bioactive Proteins. Angew Chem Int Ed Engl 2025; 64:e202421686. [PMID: 39584560 DOI: 10.1002/anie.202421686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2024] [Revised: 11/22/2024] [Accepted: 11/25/2024] [Indexed: 11/26/2024]
Abstract
The ability to predict and design protein structures has led to numerous applications in medicine, diagnostics and sustainable chemical manufacture. In addition, the wealth of predicted protein structures has advanced our understanding of how life's molecules function and interact. Honouring the work that has fundamentally changed the way scientists research and engineer proteins, the Nobel Prize in Chemistry in 2024 was awarded to David Baker for computational protein design and jointly to Demis Hassabis and John Jumper, who developed AlphaFold for machine-learning-based protein structure prediction. Here, we highlight notable contributions to the development of these computational tools and their importance for the design of functional proteins that are applied in organic synthesis. Notably, both technologies have the potential to impact drug discovery as any therapeutic protein target can now be modelled, allowing the de novo design of peptide binders and the identification of small molecule ligands through in silico docking of large compound libraries. Looking ahead, we highlight future research directions in protein engineering, medicinal chemistry and material design that are enabled by this transformative shift in protein science.
Affiliation(s)
- Rebecca Buller
  - Competence Center for Biocatalysis, Institute of Chemistry and Biotechnology, Zurich University of Applied Sciences, Einsiedlerstrasse 31, 8820, Wädenswil, Switzerland
- Jiri Damborsky
  - Loschmidt Laboratories, Dept. of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00, Brno, Czech Republic
  - International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic
- Donald Hilvert
  - Laboratory of Organic Chemistry, ETH Zürich, 8093, Zürich, Switzerland
- Uwe T Bornscheuer
  - Biotechnology & Enzyme Catalysis, Institute of Biochemistry, University of Greifswald, Felix-Hausdorff-Str. 4, 17489, Greifswald, Germany
10
Genc AG, McGuffin LJ. Beyond AlphaFold2: The Impact of AI for the Further Improvement of Protein Structure Prediction. Methods Mol Biol 2025; 2867:121-139. [PMID: 39576578 DOI: 10.1007/978-1-0716-4196-5_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2024]
Abstract
Protein structure prediction is fundamental to molecular biology and has numerous applications in areas such as drug discovery and protein engineering. Machine learning techniques have greatly advanced protein 3D modeling in recent years, particularly with the development of AlphaFold2 (AF2), which can analyze sequences of amino acids and predict 3D structures with near experimental accuracy. Since the release of AF2, numerous studies have been conducted, either using AF2 directly for large-scale modeling or building upon the software for other use cases. Many reviews have been published discussing the impact of AF2 in the field of protein bioinformatics, particularly in relation to neural networks, which have highlighted what AF2 can and cannot do. It is evident that AF2 and similar approaches are open to further development and several new approaches have emerged, in addition to older refinement approaches, for improving the quality of predictions. Here we provide a brief overview, aimed at the general biologist, of how machine learning techniques have been used for improvement of 3D models of proteins following AF2, and we highlight the impacts of these approaches. In the most recent experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP15), the most successful groups all developed their own tools for protein structure modeling that were based at least in some part on AF2. This improvement involved employing techniques such as generative modeling, changing parameters such as dropout to generate more AF2 structures, and data-driven approaches including using alternative templates and MSAs.
Affiliation(s)
- Liam J McGuffin
  - School of Biological Sciences, University of Reading, Reading, UK
11
Ma W, Bi X, Jiang H, Wei Z, Zhang S. Annotating protein functions via fusing multiple biological modalities. Commun Biol 2024; 7:1705. [PMID: 39730886 DOI: 10.1038/s42003-024-07411-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Accepted: 12/17/2024] [Indexed: 12/29/2024] Open
Abstract
Understanding the function of proteins is of great significance for revealing disease pathogenesis and discovering new targets. Benefiting from the explosive growth of the protein universe, deep learning has been applied to accelerate the protein annotation cycle from different biological modalities. However, most existing deep learning-based methods not only fail to effectively fuse different biological modalities, resulting in low-quality protein representations, but also suffer from convergence to suboptimal solutions caused by sparse label representations. To address these issues, we propose a multiprocedural approach for fusing heterogeneous biological modalities and annotating protein functions, i.e., MIF2GO (Multimodal Information Fusion to infer Gene Ontology terms), which sequentially fuses up to six biological modalities spanning different biological levels in three steps, thus leading to powerful protein representations. Evaluation results on seven benchmark datasets show that the proposed method not only considerably outperforms state-of-the-art methods, but also demonstrates great robustness and generalizability across species. Besides, we also present biological insights into the associations between those modalities and protein functions. This research provides a robust framework for integrating multimodal biological data, offering a scalable solution for protein function annotation, ultimately facilitating advancements in precision medicine and the discovery of novel therapeutic strategies.
Affiliation(s)
- Wenjian Ma
  - College of Computer Science and Technology, Ocean University of China, Qingdao, China
- Xiangpeng Bi
  - College of Computer Science and Technology, Ocean University of China, Qingdao, China
- Huasen Jiang
  - College of Computer Science and Technology, Ocean University of China, Qingdao, China
- Zhiqiang Wei
  - College of Computer Science and Technology, Ocean University of China, Qingdao, China
- Shugang Zhang
  - College of Computer Science and Technology, Ocean University of China, Qingdao, China
12
Vu TTD, Kim J, Jung J. An experimental analysis of graph representation learning for Gene Ontology based protein function prediction. PeerJ 2024; 12:e18509. [PMID: 39553733 PMCID: PMC11569786 DOI: 10.7717/peerj.18509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2024] [Accepted: 10/21/2024] [Indexed: 11/19/2024] Open
Abstract
Understanding protein function is crucial for deciphering biological systems and facilitating various biomedical applications. Computational methods for predicting Gene Ontology functions of proteins emerged in the 2000s to bridge the gap between the number of annotated proteins and the rapidly growing number of newly discovered amino acid sequences. Recently, there has been a surge in studies applying graph representation learning techniques to biological networks to enhance protein function prediction tools. In this review, we provide fundamental concepts in graph embedding algorithms. This study described graph representation learning methods for protein function prediction based on four principal data categories, namely PPI network, protein structure, Gene Ontology graph, and integrated graph. The commonly used approaches for each category were summarized and diagrammed, with the specific results of each method explained in detail. Finally, existing limitations and potential solutions were discussed, and directions for future research within the protein research community were suggested.
Affiliation(s)
- Thi Thuy Duong Vu
  - Faculty of Fundamental Sciences, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Vietnam
- Jeongho Kim
  - Department of Information and Communication Engineering, Myongji University, Yongin, Republic of South Korea
- Jaehee Jung
  - Department of Information and Communication Engineering, Myongji University, Yongin, Republic of South Korea
13
de Brevern AG. Special Issue: "Molecular Dynamics Simulations and Structural Analysis of Protein Domains". Int J Mol Sci 2024; 25:10793. [PMID: 39409122 PMCID: PMC11477144 DOI: 10.3390/ijms251910793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2024] [Revised: 09/30/2024] [Accepted: 10/07/2024] [Indexed: 10/20/2024] Open
Abstract
The 3D protein structure is the basis for all their biological functions [...].
Affiliation(s)
- Alexandre G. de Brevern
- DSIMB Bioinformatics Team, BIGR, INSERM, Université Paris Cité, F-75015 Paris, France; ; Tel.: +33-1-4449-3000
- DSIMB Bioinformatics Team, BIGR, INSERM, Université de la Réunion, F-97715 Saint Denis, France
| |
|
14
|
Yu Z, Yu J, Wang H, Zhang S, Zhao L, Shi S. PhosAF: An integrated deep learning architecture for predicting protein phosphorylation sites with AlphaFold2 predicted structures. Anal Biochem 2024; 690:115510. [PMID: 38513769 DOI: 10.1016/j.ab.2024.115510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2023] [Revised: 03/14/2024] [Accepted: 03/18/2024] [Indexed: 03/23/2024]
Abstract
Phosphorylation is indispensable to understanding biological processes, but experimental methods for identifying phosphorylation sites are tedious and laborious. With the rapid growth of biotechnology, deep learning methods have made significant progress in site prediction tasks. Nevertheless, most existing predictors consider only protein sequence information, which limits the capture of protein spatial information. Building upon the latest advances in protein structure prediction by AlphaFold2, a novel integrated deep learning architecture, PhosAF, is developed to predict phosphorylation sites in human proteins by integrating CMA-Net and MFC-Net, which consider sequence information and structure information predicted by AlphaFold2. Here, the CMA-Net module is composed of multiple convolutional neural network layers, and multi-head attention is appended to capture the local and long-term dependencies of sequence features. Meanwhile, the MFC-Net module, composed of deep neural network layers, is used to capture the complex representations of evolutionary and structure features. The different features are then combined to predict the final phosphorylation sites. In addition, we put forward a new strategy to construct reliable negative samples via protein secondary structures. Experimental results on independent test data and a case study indicate that PhosAF surpasses the current most advanced methods in phosphorylation site prediction.
Affiliation(s)
- Ziyuan Yu
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China.
| | - Jialin Yu
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China.
| | - Hongmei Wang
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China.
| | - Shuai Zhang
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China.
| | - Long Zhao
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China.
| | - Shaoping Shi
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China; Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang, 330031, China.
| |
|
15
|
Ebrahimikondori H, Sutherland D, Yanai A, Richter A, Salehi A, Li C, Coombe L, Kotkoff M, Warren RL, Birol I. Structure-aware deep learning model for peptide toxicity prediction. Protein Sci 2024; 33:e5076. [PMID: 39196703 PMCID: PMC11193153 DOI: 10.1002/pro.5076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 04/26/2024] [Accepted: 05/28/2024] [Indexed: 08/30/2024]
Abstract
Antimicrobial resistance is a critical public health concern, necessitating the exploration of alternative treatments. While antimicrobial peptides (AMPs) show promise, assessing their toxicity using traditional wet lab methods is both time-consuming and costly. We introduce tAMPer, a novel multi-modal deep learning model designed to predict peptide toxicity by integrating the underlying amino acid sequence composition and the three-dimensional structure of peptides. tAMPer adopts a graph-based representation for peptides, encoding ColabFold-predicted structures, where nodes represent amino acids and edges represent spatial interactions. Structural features are extracted using graph neural networks, and recurrent neural networks capture sequential dependencies. tAMPer's performance was assessed on a publicly available protein toxicity benchmark and an AMP hemolysis dataset that we generated. On the latter, tAMPer achieves an F1-score of 68.7%, outperforming the second-best method by 23.4%. On the protein benchmark, tAMPer exhibited an improvement of over 3.0% in the F1-score compared to current state-of-the-art methods. We anticipate that tAMPer will accelerate AMP discovery and development by reducing the reliance on laborious toxicity screening experiments.
Affiliation(s)
- Hossein Ebrahimikondori
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
- Bioinformatics Graduate Program, University of British Columbia, Vancouver, British Columbia, Canada
| | - Darcy Sutherland
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
- Public Health Laboratory, British Columbia Centre for Disease Control, Vancouver, British Columbia, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
| | - Anat Yanai
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
- Public Health Laboratory, British Columbia Centre for Disease Control, Vancouver, British Columbia, Canada
| | - Amelia Richter
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
- Public Health Laboratory, British Columbia Centre for Disease Control, Vancouver, British Columbia, Canada
| | - Ali Salehi
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
- Public Health Laboratory, British Columbia Centre for Disease Control, Vancouver, British Columbia, Canada
| | - Chenkai Li
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
- Bioinformatics Graduate Program, University of British Columbia, Vancouver, British Columbia, Canada
| | - Lauren Coombe
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
| | - Monica Kotkoff
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
| | - René L. Warren
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
| | - Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
- Public Health Laboratory, British Columbia Centre for Disease Control, Vancouver, British Columbia, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
| |
|
16
|
Lyu J, Kapolka N, Gumpper R, Alon A, Wang L, Jain MK, Barros-Álvarez X, Sakamoto K, Kim Y, DiBerto J, Kim K, Glenn IS, Tummino TA, Huang S, Irwin JJ, Tarkhanova OO, Moroz Y, Skiniotis G, Kruse AC, Shoichet BK, Roth BL. AlphaFold2 structures guide prospective ligand discovery. Science 2024; 384:eadn6354. [PMID: 38753765 PMCID: PMC11253030 DOI: 10.1126/science.adn6354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 04/24/2024] [Indexed: 05/18/2024]
Abstract
AlphaFold2 (AF2) models have had wide impact but mixed success in retrospective ligand recognition. We prospectively docked large libraries against unrefined AF2 models of the σ2 and serotonin 2A (5-HT2A) receptors, testing hundreds of new molecules and comparing results with those obtained from docking against the experimental structures. Hit rates were high and similar for the experimental and AF2 structures, as were affinities. Success in docking against the AF2 models was achieved despite differences between orthosteric residue conformations in the AF2 models and the experimental structures. Determination of the cryo-electron microscopy structure for one of the more potent 5-HT2A ligands from the AF2 docking revealed residue accommodations that resembled the AF2 prediction. AF2 models may sample conformations that differ from experimental structures but remain low energy and relevant for ligand discovery, extending the domain of structure-based drug design.
Affiliation(s)
- Jiankun Lyu
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94158, USA
- The Evnin Family Laboratory of Computational Molecular Discovery, The Rockefeller University, New York, NY 10065, USA
| | - Nicholas Kapolka
- Department of Pharmacology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599, USA
| | - Ryan Gumpper
- Department of Pharmacology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599, USA
| | - Assaf Alon
- Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
| | - Liang Wang
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94035, USA
| | - Manish K. Jain
- Department of Pharmacology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599, USA
| | - Ximena Barros-Álvarez
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94035, USA
| | - Kensuke Sakamoto
- Department of Pharmacology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599, USA
- National Institute of Mental Health Psychoactive Drug Screening Program (NIMH PDSP), School of Medicine, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599, USA
| | - Yoojoong Kim
- Department of Pharmacology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599, USA
| | - Jeffrey DiBerto
- Department of Pharmacology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599, USA
| | - Kuglae Kim
- Department of Pharmacology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599, USA
| | - Isabella S. Glenn
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94158, USA
| | - Tia A. Tummino
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94158, USA
| | - Sijie Huang
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94158, USA
| | - John J. Irwin
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94158, USA
| | | | - Yurii Moroz
- Chemspace LLC, Kyiv 02094, Ukraine
- Taras Shevchenko National University of Kyiv, Kyiv 01601, Ukraine
- Enamine Ltd., Kyiv 02094, Ukraine
| | - Georgios Skiniotis
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94035, USA
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94304, USA
| | - Andrew C. Kruse
- Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
| | - Brian K. Shoichet
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94158, USA
| | - Bryan L. Roth
- Department of Pharmacology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599, USA
- National Institute of Mental Health Psychoactive Drug Screening Program (NIMH PDSP), School of Medicine, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599, USA
- Division of Chemical Biology and Medicinal Chemistry, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| |
|
17
|
Cong Y, Endo T. A Quadruple Revolution: Deciphering Biological Complexity with Artificial Intelligence, Multiomics, Precision Medicine, and Planetary Health. OMICS: A Journal of Integrative Biology 2024; 28:257-260. [PMID: 38813661 DOI: 10.1089/omi.2024.0110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2024]
Abstract
A quiet quadruple revolution has been in the making in systems science with convergence of (1) artificial intelligence, machine learning, and other digital technologies; (2) multiomics big data integration; (3) growing interest in the "variability science" of precision/personalized medicine that aims to account for patient-to-patient and between-population differences in disease susceptibilities and responses to health interventions such as drugs, nutrition, vaccines, and radiation; and (4) planetary health scholarship that both scales up and integrates biological, clinical, and ecological contexts of health and disease. Against this overarching background, this article presents and highlights some of the salient challenges and prospects of multiomics research, emphasizing the attendant pivotal role of systems medicine and systems biology. In addition, we emphasize the rapidly growing importance of planetary health research for systems medicine, particularly amid climate emergency, ecological degradation, and loss of planetary biodiversity. Looking ahead, we anticipate that the integration and utilization of multiomics big data and artificial intelligence will drive further progress in systems medicine and systems biology, heralding a promising future for both human and planetary health.
Affiliation(s)
- Yi Cong
- Information Biology Laboratory, Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan
| | - Toshinori Endo
- Information Biology Laboratory, Faculty of Information Science and Technology, Hokkaido University, Sapporo, Japan
| |
|
18
|
Hoffman J, Tan H, Sandoval-Cooper C, de Villiers K, Reed SM. GTExome: Modeling commonly expressed missense mutations in the human genome. PLoS One 2024; 19:e0303604. [PMID: 38814966 PMCID: PMC11139294 DOI: 10.1371/journal.pone.0303604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Accepted: 04/26/2024] [Indexed: 06/01/2024] Open
Abstract
A web application, GTExome, is described that quickly identifies, classifies, and models missense mutations in commonly expressed human proteins. GTExome can be used to categorize genomic mutation data with tissue-specific expression data from the Genotype-Tissue Expression (GTEx) project. Commonly expressed missense mutations in proteins from a wide range of tissue types can be selected and assessed for modeling suitability. Information about the consequences of each mutation is provided to the user, including whether disulfide bonds, hydrogen bonds, or salt bridges are broken; buried prolines are introduced; buried charges are created or lost; charge is swapped; a buried glycine is replaced; or the residue that would be removed is a proline in the cis configuration. Also, if the mutation site is in a binding pocket, the number of pockets and their volumes are reported. The user can assess this information and then select from available experimental or computationally predicted structures of native proteins to create, visualize, and download a model of the mutated protein using Fast and Accurate Side-chain Protein Repacking (FASPR). For AlphaFold-modeled proteins, confidence scores for native proteins are provided. Using this tool, we explored a set of 9,666 common missense mutations from a variety of tissues from GTEx and show that most mutations can be modeled using this tool to facilitate studies of protein-protein and protein-drug interactions. The open-source tool is freely available at https://pharmacogenomics.clas.ucdenver.edu/gtexome/.
Affiliation(s)
- Jill Hoffman
- Computational Bioscience, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
| | - Henry Tan
- Department of Chemistry, University of Colorado Denver, Denver, CO, United States of America
| | - Clara Sandoval-Cooper
- Department of Chemistry, University of Colorado Denver, Denver, CO, United States of America
| | - Kaelyn de Villiers
- Department of Chemistry, University of Colorado Denver, Denver, CO, United States of America
| | - Scott M. Reed
- Department of Chemistry, University of Colorado Denver, Denver, CO, United States of America
| |
|
19
|
Zhang G, Zhang C, Cai M, Luo C, Zhu F, Liang Z. FuncPhos-STR: An integrated deep neural network for functional phosphosite prediction based on AlphaFold protein structure and dynamics. Int J Biol Macromol 2024; 266:131180. [PMID: 38552697 DOI: 10.1016/j.ijbiomac.2024.131180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 03/19/2024] [Accepted: 03/26/2024] [Indexed: 04/01/2024]
Abstract
Phosphorylation modifications play important regulatory roles in most biological processes. However, the functional assignment for the vast majority of the identified phosphosites remains a major challenge. Here, we provide a deep learning framework named FuncPhos-STR as an online resource for functional prediction and structural visualization of human proteome-level phosphosites. Based on our reported FuncPhos-SEQ framework, which was built by integrating phosphosite sequence evolution and protein-protein interaction (PPI) information, FuncPhos-STR was developed by further integrating the structural and dynamics information on AlphaFold protein structures. The characterized structural topology and dynamics features underlying functional phosphosites emphasized their molecular mechanism for regulating protein functions. By integrating structural and dynamics, sequence evolutionary, and PPI network features from different protein dimensions, FuncPhos-STR has an advantage over other reported models, with the best AUC value of 0.855. With FuncPhos-STR, phosphosites inside pocket regions tend to receive higher functional scores, theoretically supporting their potential regulatory mechanism. Overall, FuncPhos-STR would accelerate the functional identification of the large number of unexplored phosphosites and facilitate the elucidation of their allosteric regulation mechanisms. The web server of FuncPhos-STR is freely available at http://funcptm.jysw.suda.edu.cn/str.
Affiliation(s)
- Guangyu Zhang
- School of Computer Science and Technology, Soochow University, Suzhou 215006, China
| | - Cai Zhang
- School of Computer Science and Technology, Soochow University, Suzhou 215006, China
| | - Mingyue Cai
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China
| | - Cheng Luo
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
| | - Fei Zhu
- School of Computer Science and Technology, Soochow University, Suzhou 215006, China.
| | - Zhongjie Liang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou 215123, China; Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Soochow University, Suzhou 215123, China.
| |
|
20
|
Lyu J, Kapolka N, Gumpper R, Alon A, Wang L, Jain MK, Barros-Álvarez X, Sakamoto K, Kim Y, DiBerto J, Kim K, Tummino TA, Huang S, Irwin JJ, Tarkhanova OO, Moroz Y, Skiniotis G, Kruse AC, Shoichet BK, Roth BL. AlphaFold2 structures template ligand discovery. bioRxiv: The Preprint Server for Biology 2024:2023.12.20.572662. [PMID: 38187536 PMCID: PMC10769324 DOI: 10.1101/2023.12.20.572662] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 01/09/2024]
Abstract
AlphaFold2 (AF2) and RosettaFold have greatly expanded the number of structures available for structure-based ligand discovery, even though retrospective studies have cast doubt on their direct usefulness for that goal. Here, we tested unrefined AF2 models prospectively, comparing experimental hit-rates and affinities from large library docking against AF2 models vs the same screens targeting experimental structures of the same receptors. In retrospective docking screens against the σ2 and the 5-HT2A receptors, the AF2 structures struggled to recapitulate ligands that we had previously found docking against the receptors' experimental structures, consistent with published results. Prospective large library docking against the AF2 models, however, yielded similar hit rates for both receptors versus docking against experimentally-derived structures; hundreds of molecules were prioritized and tested against each model and each structure of each receptor. The success of the AF2 models was achieved despite differences in orthosteric pocket residue conformations for both targets versus the experimental structures. Intriguingly, against the 5-HT2A receptor the most potent, subtype-selective agonists were discovered via docking against the AF2 model, not the experimental structure. To understand this from a molecular perspective, a cryoEM structure was determined for one of the more potent and selective ligands to emerge from docking against the AF2 model of the 5-HT2A receptor. Our findings suggest that AF2 models may sample conformations that are relevant for ligand discovery, much extending the domain of applicability of structure-based ligand discovery.
Affiliation(s)
- Jiankun Lyu
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94158, USA
- The Evnin Family Laboratory of Computational Molecular Discovery, The Rockefeller University, New York, NY 10065, USA (present address)
| | - Nicholas Kapolka
- Department of Pharmacology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599-7365, USA
| | - Ryan Gumpper
- Department of Pharmacology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599-7365, USA
| | - Assaf Alon
- Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Pharmacology Department, Yale School of Medicine, New Haven, CT 06510, USA (present address)
| | - Liang Wang
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA, USA
| | - Manish K Jain
- Department of Pharmacology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599-7365, USA
| | - Ximena Barros-Álvarez
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA, USA
| | - Kensuke Sakamoto
- Department of Pharmacology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599-7365, USA
- National Institute of Mental Health Psychoactive Drug Screening Program (NIMH PDSP), School of Medicine, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599-7365, USA
| | - Yoojoong Kim
- Department of Pharmacology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599-7365, USA
| | - Jeffrey DiBerto
- Department of Pharmacology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599-7365, USA
| | - Kuglae Kim
- Department of Pharmacology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599-7365, USA
- Department of Pharmacy, College of Pharmacy, Yonsei University, Incheon 21983, Korea (present address)
| | - Tia A Tummino
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94158, USA
| | - Sijie Huang
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94158, USA
| | - John J Irwin
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94158, USA
| | | | - Yurii Moroz
- Chemspace LLC, Kyiv, 02094, Ukraine
- Taras Shevchenko National University of Kyiv, Kyiv, 01601, Ukraine
- Enamine Ltd., Kyiv, 02094, Ukraine
| | - Georgios Skiniotis
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA, USA
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA, USA
| | - Andrew C Kruse
- Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
| | - Brian K Shoichet
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94158, USA
| | - Bryan L Roth
- Department of Pharmacology, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599-7365, USA
- National Institute of Mental Health Psychoactive Drug Screening Program (NIMH PDSP), School of Medicine, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC 27599-7365, USA
- Division of Chemical Biology and Medicinal Chemistry, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7360, USA
| |
|
21
|
Li X, Qian Y, Hu Y, Chen J, Yue H, Deng L. MSF-PFP: A Novel Multisource Feature Fusion Model for Protein Function Prediction. J Chem Inf Model 2024; 64:1502-1511. [PMID: 38413369 DOI: 10.1021/acs.jcim.3c01794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/29/2024]
Abstract
Protein function prediction is essential for disease treatment and drug development; yet, traditional biological experimental methods are less efficient in annotating protein function, and existing automated methods fail to fully leverage protein multisource data. Here, we present MSF-PFP, a computational framework that fuses multisource data features to predict protein function with high accuracy. Our framework designs specific models for feature extraction based on the characteristics of various data sources, including a global-local-individual strategy for local location features. MSF-PFP then integrates extracted features through a multisource feature fusion model, ultimately categorizing protein functions. Experimental results demonstrate that MSF-PFP outperforms eight state-of-the-art models, achieving FMax scores of 0.542, 0.675, and 0.624 for the biological process (BP), molecular function (MF), and cellular component (CC), respectively. The source code and data set for MSF-PFP are available at https://swanhub.co/TianGua/MSF-PFP, facilitating further exploration and validation of the proposed framework. This study highlights the potential of multisource data fusion in enhancing protein function prediction, contributing to improved disease therapy and medication discovery strategies.
Affiliation(s)
- Xinhui Li
- School of Software, Xinjiang University, Urumqi 830091, China
- Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi 830046, China
- Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
| | - Yurong Qian
- School of Software, Xinjiang University, Urumqi 830091, China
- Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi 830046, China
- Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
| | - Yue Hu
- School of Software, Xinjiang University, Urumqi 830091, China
- Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi 830046, China
- Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
| | - Jiaying Chen
- School of Software, Xinjiang University, Urumqi 830091, China
- Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi 830046, China
- Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
| | - Haitao Yue
- School of Future Technology, Xinjiang University, Urumqi 830017, China
- Laboratory of Synthetic Biology, School of Life Science and Technology, Xinjiang University, Urumqi 830017, China
| | - Lei Deng
- School of Software, Xinjiang University, Urumqi 830091, China
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| |
|
22
|
Fu Y, Gu Z, Luo X, Guo Q, Lai L, Deng M. Learning a generalized graph transformer for protein function prediction in dissimilar sequences. Gigascience 2024; 13:giae093. [PMID: 39657158 PMCID: PMC11734293 DOI: 10.1093/gigascience/giae093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 07/04/2024] [Accepted: 10/25/2024] [Indexed: 12/17/2024] Open
Abstract
BACKGROUND: In the face of a growing disparity between high-throughput sequence data and low-throughput experimental studies, the emerging field of deep learning stands as a promising alternative. Generally, many data-driven approaches are capable of facilitating fast and accurate predictions of protein functions. Nevertheless, the inherent statistical nature of deep learning techniques may limit their generalization capabilities when applied to novel nonhomologous proteins that diverge significantly from existing ones. RESULTS: In this work, we propose a novel, generalized approach named Graph Adversarial Learning with Alignment (GALA) for protein function prediction. GALA integrates a graph transformer architecture with an attention pooling module to extract embeddings from both protein sequences and structures, facilitating unified learning of protein representations. Notably, GALA incorporates a domain discriminator conditioned on both learnable representations and predicted probabilities, which undergoes adversarial learning to ensure representation invariance across diverse environments. To optimize the model with abundant label information, we generate label embeddings in the hidden space, explicitly aligning them with protein representations. Benchmarked on datasets derived from the PDB and Swiss-Prot databases, GALA achieves performance comparable to several state-of-the-art methods. Moreover, GALA demonstrates good biological interpretability by identifying significant functional residues associated with Gene Ontology terms through class activation mapping. CONCLUSIONS: GALA, which leverages adversarial learning and label embedding alignment to acquire domain-invariant protein representations, exhibits outstanding generalizability in function prediction for proteins from previously unseen sequence space. By incorporating the structures predicted by AlphaFold2, GALA demonstrates significant potential for function annotation in newly discovered sequences. A detailed implementation of GALA is available at https://github.com/fuyw-aisw/GALA.
Affiliation(s)
- Yiwei Fu
- School of Mathematical Sciences, Peking University, Beijing 100871, China
| | - Zhonghui Gu
- Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
| | - Xiao Luo
- Department of Computer Science, University of California, Los Angeles, CA 90024, USA
| | - Qirui Guo
- Center for Quantitative Biology, Peking University, Beijing 100871, China
| | - Luhua Lai
- Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
- Center for Quantitative Biology, Peking University, Beijing 100871, China
| | - Minghua Deng
- School of Mathematical Sciences, Peking University, Beijing 100871, China
- Center for Quantitative Biology, Peking University, Beijing 100871, China
- Center for Statistical Science, Peking University, Beijing 100871, China
| |
Collapse
|
23
|
Zhang W, Xu R, Chen J, Xiong H, Wang Y, Pang B, Du G, Kang Z. Advances and challenges in biotechnological production of chondroitin sulfate and its oligosaccharides. Int J Biol Macromol 2023; 253:126551. [PMID: 37659488 DOI: 10.1016/j.ijbiomac.2023.126551] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/18/2023] [Revised: 07/27/2023] [Accepted: 08/12/2023] [Indexed: 09/04/2023]
Abstract
Chondroitin sulfate (CS) is a member of glycosaminoglycans (GAGs) and has critical physiological functions. CS is widely applied in medical and clinical fields. Currently, the supply of CS relies on traditional animal tissue extraction methods. From the perspective of medical applications, the biggest drawback of animal-derived CS is its uncontrollable molecular weight and sulfonated patterns, which are key factors affecting CS activities. The advances of cell-free enzyme catalyzed systems and de novo biosynthesis strategies have paved the way to rationally regulate CS sulfonated pattern and molecular weight. In this review, we first present a general overview of biosynthesized CS and its oligosaccharides. Then, the advances in chondroitin biosynthesis, 3'-phosphoadenosine-5'-phosphosulfate (PAPS) synthesis and regeneration, and CS biosynthesis catalyzed by sulfotransferases are discussed. Moreover, the progress of mining and expression of chondroitin depolymerizing enzymes for preparation of CS oligosaccharides is also summarized. Finally, we analyze and discuss the challenges faced in synthesizing CS and its oligosaccharides using microbial and enzymatic methods. In summary, the biotechnological production of CS and its oligosaccharides is a promising method in addressing the drawbacks associated with animal-derived CS and enabling the production of CS oligosaccharides with defined structures.
Affiliation(s)
- Weijiao Zhang
  - The Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi, China; The Science Center for Future Foods, Jiangnan University, Wuxi 214122, China; The Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi, China
- Ruirui Xu
  - The Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi, China; The Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
- Jiamin Chen
  - The Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi, China; The Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
- Haibo Xiong
  - The Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi, China; The Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
- Yang Wang
  - The Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi, China; The Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
- Bo Pang
  - The Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi, China; The Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
- Guocheng Du
  - The Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi, China; The Science Center for Future Foods, Jiangnan University, Wuxi 214122, China
- Zhen Kang
  - The Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, Jiangnan University, Wuxi, China; The Science Center for Future Foods, Jiangnan University, Wuxi 214122, China; The Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi, China
|
24
|
Nussinov R, Liu Y, Zhang W, Jang H. Cell phenotypes can be predicted from propensities of protein conformations. Curr Opin Struct Biol 2023; 83:102722. [PMID: 37871498 PMCID: PMC10841533 DOI: 10.1016/j.sbi.2023.102722] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/15/2023] [Revised: 09/26/2023] [Accepted: 09/27/2023] [Indexed: 10/25/2023]
Abstract
Proteins exist as dynamic conformational ensembles. Here we suggest that the propensities of the conformations can be predictors of cell function. The conformational states that the molecules preferentially visit can be viewed as phenotypic determinants, and their mutations work by altering the relative propensities, and thus the cell phenotype. Our examples include (i) inactive state variants harboring cancer driver mutations that present active state-like conformational features, as in K-Ras4B^G12V compared to other K-Ras4B^G12X mutations; (ii) mutants of the same protein presenting vastly different phenotypic and clinical profiles: cancer and neurodevelopmental disorders; (iii) alterations in the occupancies of the conformational (sub)states influencing enzyme reactivity. Thus, protein conformational propensities can determine cell fate. They can also suggest the efficiency of allosteric drugs.
Affiliation(s)
- Ruth Nussinov
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA; Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel; Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
- Yonglan Liu
  - Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
- Wengang Zhang
  - Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
- Hyunbum Jang
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA; Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
|
25
|
Hoffman J, Tan H, Sandoval-Cooper C, de Villiers K, Reed SM. GTExome: Modeling commonly expressed missense mutations in the human genome. bioRxiv 2023:2023.11.14.567143. [PMID: 38014287 PMCID: PMC10680684 DOI: 10.1101/2023.11.14.567143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
A web application, GTExome, is described that quickly identifies, classifies, and models missense mutations in commonly expressed human proteins. GTExome can be used to categorize genomic mutation data with tissue specific expression data from the Genotype-Tissue Expression (GTEx) project. Commonly expressed missense mutations in proteins from a wide range of tissue types can be selected and assessed for modeling suitability. Information about the consequences of each mutation is provided to the user including if disulfide bonds, hydrogen bonds, or salt bridges are broken, buried prolines introduced, buried charges are created or lost, charge is swapped, a buried glycine is replaced, or if the residue that would be removed is a proline in the cis configuration. Also, if the mutation site is in a binding pocket the number of pockets and their volumes are reported. The user can assess this information and then select from available experimental or computationally predicted structures of native proteins to create, visualize, and download a model of the mutated protein using Fast and Accurate Side-chain Protein Repacking (FASPR). For AlphaFold modeled proteins, confidence scores for native proteins are provided. Using this tool, we explored a set of 9,666 common missense mutations from a variety of tissues from GTEx and show that most mutations can be modeled using this tool to facilitate studies of protein-protein and protein-drug interactions. The open-source tool is freely available at https://pharmacogenomics.clas.ucdenver.edu/gtexome/.
Affiliation(s)
- Scott M. Reed
  - Department of Chemistry, University of Colorado Denver, 1151 Arapahoe St., Denver, CO 80204, USA
|
26
|
Nunes-Alves A, Merz K. AlphaFold2 in Molecular Discovery. J Chem Inf Model 2023; 63:5947-5949. [PMID: 37807755 DOI: 10.1021/acs.jcim.3c01459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023]
Affiliation(s)
- Ariane Nunes-Alves
  - Institute of Chemistry, Technische Universität Berlin, Berlin 10623, Germany
- Kenneth Merz
  - Department of Chemistry, Michigan State University, East Lansing 48824, Michigan, United States
|
27
|
Zheng R, Huang Z, Deng L. Large-scale predicting protein functions through heterogeneous feature fusion. Brief Bioinform 2023:bbad243. [PMID: 37401369 DOI: 10.1093/bib/bbad243] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/12/2023] [Revised: 05/18/2023] [Accepted: 06/12/2023] [Indexed: 07/05/2023] Open
Abstract
As the volume of protein sequence and structure data grows rapidly, the functions of the overwhelming majority of proteins cannot be experimentally determined. Automated annotation of protein function at a large scale is becoming increasingly important. Existing computational prediction methods are typically based on expanding the relatively small number of experimentally determined functions to large collections of proteins with various clues, including sequence homology, protein-protein interaction, gene co-expression, etc. Although there has been some progress in protein function prediction in recent years, the development of accurate and reliable solutions still has a long way to go. Here we exploit AlphaFold predicted three-dimensional structural information, together with other non-structural clues, to develop a large-scale approach termed PredGO to annotate Gene Ontology (GO) functions for proteins. We use a pre-trained language model, geometric vector perceptrons and attention mechanisms to extract heterogeneous features of proteins and fuse these features for function prediction. The computational results demonstrate that the proposed method outperforms other state-of-the-art approaches for predicting GO functions of proteins in terms of both coverage and accuracy. Coverage improves because the number of structures predicted by AlphaFold has greatly increased and because PredGO can make extensive use of non-structural information for functional prediction. Moreover, we show that over 205 000 (~100%) entries in UniProt for human are annotated by PredGO, over 186 000 (~90%) of which are based on predicted structure. The webserver and database are available at http://predgo.denglab.org/.
Affiliation(s)
- Rongtao Zheng
  - School of Computer Science and Engineering, Central South University, 410000 Changsha, China
- Zhijian Huang
  - School of Computer Science and Engineering, Central South University, 410000 Changsha, China
- Lei Deng
  - School of Computer Science and Engineering, Central South University, 410000 Changsha, China
|
28
|
Boadu F, Cao H, Cheng J. Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function. Bioinformatics 2023; 39:i318-i325. [PMID: 37387145 DOI: 10.1093/bioinformatics/btad208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION Millions of protein sequences have been generated by numerous genome and transcriptome sequencing projects. However, experimentally determining the function of the proteins is still a time-consuming, low-throughput, and expensive process, leading to a large protein sequence-function gap. Therefore, it is important to develop computational methods to accurately predict protein function to fill the gap. Even though many methods have been developed to use protein sequences as input to predict function, far fewer methods leverage protein structures in protein function prediction because there was a lack of accurate protein structures for most proteins until recently. RESULTS We developed TransFun, a method using a transformer-based protein language model and 3D-equivariant graph neural networks to distill information from both protein sequences and structures to predict protein function. It extracts feature embeddings from protein sequences using a pre-trained protein language model (ESM) via transfer learning and combines them with 3D structures of proteins predicted by AlphaFold2 through equivariant graph neural networks. Benchmarked on the CAFA3 test dataset and a new test dataset, TransFun outperforms several state-of-the-art methods, indicating that the language model and 3D-equivariant graph neural networks are effective methods to leverage protein sequences and structures to improve protein function prediction. Combining TransFun predictions and sequence similarity-based predictions can further increase prediction accuracy. AVAILABILITY AND IMPLEMENTATION The source code of TransFun is available at https://github.com/jianlin-cheng/TransFun.
Affiliation(s)
- Frimpong Boadu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
- Hongyuan Cao
  - Department of Statistics, Florida State University, Tallahassee, FL 32306, United States
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
|
29
|
Nussinov R, Zhang M, Liu Y, Jang H. AlphaFold, allosteric, and orthosteric drug discovery: Ways forward. Drug Discov Today 2023; 28:103551. [PMID: 36907321 PMCID: PMC10238671 DOI: 10.1016/j.drudis.2023.103551] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Received: 12/28/2022] [Revised: 02/27/2023] [Accepted: 03/07/2023] [Indexed: 03/13/2023]
Abstract
Drug discovery is arguably a highly challenging and significant interdisciplinary aim. The stunning success of the artificial intelligence-powered AlphaFold, whose latest version is buttressed by an innovative machine-learning approach that integrates physical and biological knowledge about protein structures, raised drug discovery hopes that unsurprisingly, have not come to bear. Even though accurate, the models are rigid, including the drug pockets. AlphaFold's mixed performance poses the question of how its power can be harnessed in drug discovery. Here we discuss possible ways of going forward wielding its strengths, while bearing in mind what AlphaFold can and cannot do. For kinases and receptors, an input enriched in active (ON) state models can better AlphaFold's chance of rational drug design success.
Affiliation(s)
- Ruth Nussinov
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA; Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Mingzhen Zhang
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- Yonglan Liu
  - Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
- Hyunbum Jang
  - Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
|
30
|
Aithani L, Alcaide E, Bartunov S, Cooper CDO, Doré AS, Lane TJ, Maclean F, Rucktooa P, Shaw RA, Skerratt SE. Advancing structural biology through breakthroughs in AI. Curr Opin Struct Biol 2023; 80:102601. [PMID: 37182397 DOI: 10.1016/j.sbi.2023.102601] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 11/24/2022] [Revised: 03/06/2023] [Accepted: 04/03/2023] [Indexed: 05/16/2023]
Abstract
The past century has witnessed an exponential increase in our atomic-level understanding of molecular and cellular mechanisms from a structural perspective, with multiple landmark achievements contributing to the field. This, coupled with recent and continuing breakthroughs in artificial intelligence methods such as AlphaFold2, and enhanced computational power, is enabling our understanding of protein structure and function at unprecedented levels of accuracy and predictivity. Here, we describe some of the major recent advances across these fields, and describe, as these technologies coalesce, the potential to utilise our enhanced knowledge of intricate cellular and molecular systems to discover novel therapeutics to alleviate human suffering.
Affiliation(s)
- Laksh Aithani
  - CHARM Therapeutics Ltd., The Stanley Building, 7 St. Pancras Square, London, N1C 4AG, UK
- Eric Alcaide
  - CHARM Therapeutics Ltd., The Stanley Building, 7 St. Pancras Square, London, N1C 4AG, UK
- Sergey Bartunov
  - CHARM Therapeutics Ltd., The Stanley Building, 7 St. Pancras Square, London, N1C 4AG, UK
- Christopher D O Cooper
  - CHARM Therapeutics Ltd., B900, Babraham Research Campus, Babraham, Cambridge, CB22 3AT, UK
- Andrew S Doré
  - CHARM Therapeutics Ltd., B900, Babraham Research Campus, Babraham, Cambridge, CB22 3AT, UK
- Thomas J Lane
  - CHARM Therapeutics Ltd., B900, Babraham Research Campus, Babraham, Cambridge, CB22 3AT, UK
- Finlay Maclean
  - CHARM Therapeutics Ltd., The Stanley Building, 7 St. Pancras Square, London, N1C 4AG, UK
- Prakash Rucktooa
  - CHARM Therapeutics Ltd., B900, Babraham Research Campus, Babraham, Cambridge, CB22 3AT, UK
- Robert A Shaw
  - CHARM Therapeutics Ltd., The Stanley Building, 7 St. Pancras Square, London, N1C 4AG, UK
- Sarah E Skerratt
  - CHARM Therapeutics Ltd., B900, Babraham Research Campus, Babraham, Cambridge, CB22 3AT, UK
|
31
|
Yang Z, Zeng X, Zhao Y, Chen R. AlphaFold2 and its applications in the fields of biology and medicine. Signal Transduct Target Ther 2023; 8:115. [PMID: 36918529 PMCID: PMC10011802 DOI: 10.1038/s41392-023-01381-z] [Citation(s) in RCA: 182] [Impact Index Per Article: 91.0] [Received: 10/29/2022] [Revised: 12/27/2022] [Accepted: 02/16/2023] [Indexed: 03/16/2023] Open
Abstract
AlphaFold2 (AF2) is an artificial intelligence (AI) system developed by DeepMind that can predict three-dimensional (3D) structures of proteins from amino acid sequences with atomic-level accuracy. Protein structure prediction is one of the most challenging problems in computational biology and chemistry, and has puzzled scientists for 50 years. The advent of AF2 represents unprecedented progress in protein structure prediction and has attracted much attention. The subsequent release of structures of more than 200 million proteins predicted by AF2 further aroused great enthusiasm in the science community, especially in the fields of biology and medicine. AF2 is thought to have a significant impact on structural biology and research areas that need protein structure information, such as drug discovery, protein design, prediction of protein function, etc. Although AF2 was developed only recently, there are already quite a few application studies of AF2 in the fields of biology and medicine, many of which have given preliminary evidence of its potential. To better understand AF2 and promote its applications, in this article we summarize the principle and system architecture of AF2 as well as the recipe for its success, and particularly focus on reviewing its applications in the fields of biology and medicine. Limitations of current AF2 prediction are also discussed.
Affiliation(s)
- Zhenyu Yang
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Xiaoxi Zeng
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Yi Zhao
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
  - Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Runsheng Chen
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
  - Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
  - Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China
|
32
|
Mohamed AR, Ochsenkühn MA, Kazlak AM, Moustafa A, Amin SA. The coral microbiome: towards an understanding of the molecular mechanisms of coral-microbiota interactions. FEMS Microbiol Rev 2023; 47:fuad005. [PMID: 36882224 PMCID: PMC10045912 DOI: 10.1093/femsre/fuad005] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 10/07/2022] [Revised: 02/10/2023] [Accepted: 02/15/2023] [Indexed: 03/09/2023] Open
Abstract
Corals live in a complex, multipartite symbiosis with diverse microbes across kingdoms, some of which are implicated in vital functions, such as those related to resilience against climate change. However, knowledge gaps and technical challenges limit our understanding of the nature and functional significance of complex symbiotic relationships within corals. Here, we provide an overview of the complexity of the coral microbiome focusing on taxonomic diversity and functions of well-studied and cryptic microbes. Mining the coral literature indicates that while corals collectively harbour a third of all marine bacterial phyla, known bacterial symbionts and antagonists of corals represent a minute fraction of this diversity and that these taxa cluster into select genera, suggesting that selective evolutionary mechanisms enabled these bacteria to gain a niche within the holobiont. Recent advances in coral microbiome research aimed at leveraging microbiome manipulation to increase coral fitness and help mitigate heat stress-related mortality are discussed. Then, insights into the potential mechanisms through which microbiota can communicate with and modify host responses are examined by describing known recognition patterns, potential microbially derived coral epigenome effector proteins and coral gene regulation. Finally, the power of omics tools used to study corals is highlighted with emphasis on an integrated host-microbiota multiomics framework to understand the underlying mechanisms during symbiosis and climate change-driven dysbiosis.
Affiliation(s)
- Amin R Mohamed
  - Biology Program, New York University Abu Dhabi, Abu Dhabi 129188, United Arab Emirates
- Michael A Ochsenkühn
  - Biology Program, New York University Abu Dhabi, Abu Dhabi 129188, United Arab Emirates
- Ahmed M Kazlak
  - Systems Genomics Laboratory, American University in Cairo, New Cairo 11835, Egypt
  - Biotechnology Graduate Program, American University in Cairo, New Cairo 11835, Egypt
- Ahmed Moustafa
  - Systems Genomics Laboratory, American University in Cairo, New Cairo 11835, Egypt
  - Biotechnology Graduate Program, American University in Cairo, New Cairo 11835, Egypt
  - Department of Biology, American University in Cairo, New Cairo 11835, Egypt
- Shady A Amin
  - Biology Program, New York University Abu Dhabi, Abu Dhabi 129188, United Arab Emirates
  - Center for Genomics and Systems Biology (CGSB), New York University Abu Dhabi, Abu Dhabi 129188, United Arab Emirates
|