1. Yuan Y, Zhao G, Lu J, Wang L, Shi Y, Zhang J. Enhancing the Thermostability of Bacillus licheniformis Alkaline Protease 2709 by Computation-Based Rational Design. Molecules 2025; 30:1160. [PMID: 40076384 PMCID: PMC11901772 DOI: 10.3390/molecules30051160] [Received: 01/16/2025] [Revised: 02/16/2025] [Accepted: 03/03/2025] [Indexed: 03/14/2025]
Abstract
The alkaline protease from Bacillus licheniformis strain 2709 (AprE 2709) is widely used in Chinese industries but faces stability challenges under high-temperature conditions. This study employed molecular modeling and mutagenesis to identify Asn residues at positions 61, 160, and 211 as key sites affecting the stability of AprE 2709. By leveraging the additive and cooperative effects of mutations, the mutant enzyme AprE 2709 (N61G/N160G/N211G) was engineered, exhibiting enhanced thermostability and catalytic activity. The mutant demonstrated a 2.89-fold increase in half-life at 60 °C and a 1.56-fold improvement in catalytic efficiency compared to the wild-type enzyme. Structural analysis revealed that the improved thermostability was due to altered electrostatic interactions and strengthened hydrophobic contacts. Targeting Asn residues prone to deamidation presents a promising strategy for improving protein heat tolerance. These findings not only enhance our understanding of enzyme stability but also lay a foundation for future research aimed at optimizing alkaline proteases for diverse industrial applications, particularly in high-temperature processes.
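A fold change in half-life is easiest to read as a ratio of inactivation rate constants. The sketch below assumes first-order thermal inactivation, A(t) = A0·exp(−kt), which gives t1/2 = ln 2 / k. The abstract does not state which kinetic model was used. The residual-activity series are synthetic and are not data from the study.

```python
import math

def fit_first_order_rate(times_min, residual_fraction):
    """Least-squares slope of ln(residual activity) vs time, fitted through the origin.

    Assumes first-order inactivation A(t) = A0 * exp(-k t), so ln(A/A0) = -k t.
    """
    num = sum(t * math.log(a) for t, a in zip(times_min, residual_fraction))
    den = sum(t * t for t in times_min)
    return -num / den

def half_life(k):
    # t1/2 = ln 2 / k for first-order decay
    return math.log(2) / k

# Synthetic residual-activity series at 60 °C (NOT measurements from the paper)
times = [0, 10, 20, 30, 40]
k_wt = 0.05            # hypothetical wild-type inactivation rate, 1/min
k_mut = k_wt / 2.89    # a 2.89-fold slower rate gives a 2.89-fold longer half-life
wt = [math.exp(-k_wt * t) for t in times]
mut = [math.exp(-k_mut * t) for t in times]

fold = half_life(fit_first_order_rate(times, mut)) / half_life(fit_first_order_rate(times, wt))
print(f"half-life fold change: {fold:.2f}")
```

Under this model the fold change in half-life equals k_wild-type / k_mutant. The reported 2.89-fold figure therefore corresponds to a mutant that loses activity about 2.89 times more slowly at 60 °C.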
Affiliation(s)
- Yuan Yuan
  - College of Chemistry and Chemical Engineering, Shanxi University, Taiyuan 030006, China
- Guowei Zhao
  - College of Chemistry and Chemical Engineering, Shanxi University, Taiyuan 030006, China
- Jing Lu
  - College of Life Sciences, Shanxi University, Taiyuan 030006, China
- Lei Wang
  - Key Laboratory of Chemical Biology and Molecular Engineering, Ministry of Education, Institute of Biotechnology, Shanxi University, Taiyuan 030006, China
- Yawei Shi
  - College of Life Sciences, Shanxi University, Taiyuan 030006, China
  - Shanxi Province Detergent Alkaline Protease Industrialization Key Technology and Application Engineering Research Center, Taiyuan 030006, China
- Jian Zhang
  - College of Chemistry and Chemical Engineering, Shanxi University, Taiyuan 030006, China
  - Shanxi Province Detergent Alkaline Protease Industrialization Key Technology and Application Engineering Research Center, Taiyuan 030006, China

2. Rosignoli S, Pacelli M, Manganiello F, Paiardini A. An outlook on structural biology after AlphaFold: tools, limits and perspectives. FEBS Open Bio 2025; 15:202-222. [PMID: 39313455 PMCID: PMC11788754 DOI: 10.1002/2211-5463.13902] [Received: 03/13/2024] [Revised: 08/19/2024] [Accepted: 09/13/2024] [Indexed: 09/25/2024]
Abstract
AlphaFold and similar groundbreaking AI-based tools have revolutionized structural bioinformatics through their remarkable accuracy in ab initio protein structure prediction. This success has catalyzed the development of new software and pipelines that incorporate AlphaFold's predictions, many of them aimed at the algorithm's remaining challenges. Here, we present the current landscape of structural bioinformatics as shaped by AlphaFold and discuss how the field is responding to this revolution with new software, methods, and pipelines. The excitement around AI-based tools has led to their widespread application. However, their practical success hinges on integration into established structural bioinformatics protocols, an aspect often neglected in discussions of AI-driven advances. Indeed, user-driven intervention remains as pivotal in the structure prediction process itself as it is in complementing state-of-the-art algorithms with functional and biological knowledge.
Affiliation(s)
- Serena Rosignoli
  - Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza Università di Roma, Italy
- Maddalena Pacelli
  - Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza Università di Roma, Italy
- Francesca Manganiello
  - Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza Università di Roma, Italy
- Alessandro Paiardini
  - Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza Università di Roma, Italy

3. Li Y, Duan Z, Li Z, Xue W. Data and AI-driven synthetic binding protein discovery. Trends Pharmacol Sci 2025; 46:132-144. [PMID: 39755458 DOI: 10.1016/j.tips.2024.12.002] [Received: 10/12/2024] [Revised: 12/02/2024] [Accepted: 12/06/2024] [Indexed: 01/06/2025]
Abstract
Synthetic binding proteins (SBPs) are artificially created protein binders that do not exist in nature. Their broad applications in research, diagnostics, and therapeutics have attracted significant interest. Traditional protein engineering has been pivotal to SBP discovery. Recently, computational approaches such as molecular modeling and artificial intelligence (AI) have substantially accelerated this work. Numerous bioinformatics databases also offer a wealth of resources that fuel SBP discovery, but the potential of these data has not yet been fully exploited. In this review, we present a comprehensive overview of the SBP data ecosystem and of methodologies for SBP discovery. We highlight the critical role of high-quality data and AI technologies in accelerating the discovery of innovative SBPs with promising applications in pharmacological sciences.
Affiliation(s)
- Yanlin Li
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Zixin Duan
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Zhenwen Li
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
- Weiwei Xue
  - School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
  - Western (Chongqing) Collaborative Innovation Center for Intelligent Diagnostics and Digital Medicine, Chongqing National Biomedicine Industry Park, Chongqing 401329, China

4. Wang A, Liu Y, Yan Y, Jiang Y, Shi S, Wang J, Qiao K, Yang L, Wang S, Li S, Gui W. Chlorpyrifos Influences Tadpole Development by Disrupting Thyroid Hormone Signaling Pathways. Environ Sci Technol 2025; 59:142-151. [PMID: 39718545 DOI: 10.1021/acs.est.4c07890] [Indexed: 12/25/2024]
Abstract
Chlorpyrifos (CPF) is a widely used organophosphate insecticide with serious toxicological effects on aquatic animals. Although extensively studied for neurotoxicity and endocrine disruption, its stage-specific effects on amphibian metamorphosis and receptor-level interactions remain unclear. This study investigated the effects of CPF on Xenopus laevis metamorphosis at environmentally relevant concentrations (1.8 and 18 μg/L) across key developmental stages, with end points including premetamorphic progression, thyroid hormone (TH)-responsive gene expression, and levels of triiodothyronine (T3) and thyroxine (T4). Additionally, molecular docking, surface plasmon resonance (SPR), and luciferase reporter gene assays were employed to elucidate CPF's interaction with the thyroid hormone receptor alpha (TRα). CPF accelerated premetamorphic development and upregulated TH-responsive genes but delayed later-stage metamorphosis. After 21 days of exposure to 18 μg/L CPF, T3 and T4 levels were reduced by 28% and 39.4%, respectively, compared to controls. Cotreatment with T3 and CPF slowed tadpole development, indicating that CPF affects thyroid signaling in a stage-dependent manner. CPF competed with T3 for TRα binding and stimulated TRα-mediated luciferase activity when administered alone, but this activity decreased when CPF was coexposed to T3. These findings suggest that CPF functions as a partial agonist of TRα, disrupting thyroid signaling and adversely affecting amphibian development.
Affiliation(s)
- Aoxue Wang
  - Institute of Pesticide and Environmental Toxicology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, P. R. China
- Yuanyuan Liu
  - Institute of Pesticide and Environmental Toxicology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, P. R. China
- Yujia Yan
  - Institute of Pesticide and Environmental Toxicology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, P. R. China
- Yuyao Jiang
  - Institute of Pesticide and Environmental Toxicology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, P. R. China
- Shiyao Shi
  - Institute of Pesticide and Environmental Toxicology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, P. R. China
- Jie Wang
  - Institute of Pesticide and Environmental Toxicology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, P. R. China
- Kun Qiao
  - Institute of Pesticide and Environmental Toxicology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, P. R. China
  - Research Centre for the Oceans and Human Health, City University of Hong Kong Shenzhen Research Institute, Shenzhen 518057, P. R. China
- Long Yang
  - Guizhou Institute of Subtropical Crops, Guizhou 562400, P. R. China
- Shuting Wang
  - Hangzhou Center for Disease Control and Prevention, Hangzhou Health Supervision Institution, Zhejiang 310016, P. R. China
- Shuying Li
  - Institute of Pesticide and Environmental Toxicology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, P. R. China
  - Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Ministry of Agriculture Key Laboratory of Molecular Biology of Crop Pathogens and Insect Pests, Zhejiang University, Hangzhou 310058, P. R. China
- Wenjun Gui
  - Institute of Pesticide and Environmental Toxicology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, P. R. China
  - Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Ministry of Agriculture Key Laboratory of Molecular Biology of Crop Pathogens and Insect Pests, Zhejiang University, Hangzhou 310058, P. R. China

5. Mi Y, Marcu SB, Tabirca S, Yallapragada VV. PS-GO parametric protein search engine. Comput Struct Biotechnol J 2024; 23:1499-1509. [PMID: 38633387 PMCID: PMC11021831 DOI: 10.1016/j.csbj.2024.04.003] [Received: 01/11/2024] [Revised: 04/01/2024] [Accepted: 04/01/2024] [Indexed: 04/19/2024]
Abstract
With the explosive growth of protein-related data, we face a critical scientific question: how can we effectively retrieve, compare, and understand protein structures so as to make full use of these data resources? PS-GO, a parametric protein search engine, was designed and developed to meet this need for effective retrieval, comparison, and in-depth understanding of protein structures. By integrating computational biology, bioinformatics, and data science, PS-GO can manage large-scale data and accurately predict and compare protein structures and functions. The engine is built on the concept of parametric protein design, a computer-aided method that adjusts and optimizes protein structures and sequences to achieve desired biological functions and structural stability. PS-GO uses key parameters that strongly influence protein structure and function, such as amino acid sequence, side-chain angles, and solvent accessibility. It also uses computationally derived parameters that are crucial for understanding and predicting protein behavior. The development of PS-GO underscores the potential of parametric protein design in a variety of applications, including enhancing enzyme activity, improving antibody affinity, and designing novel functional proteins. This advance provides a robust theoretical foundation for protein engineering and biotechnology and also offers practical guidelines for future progress in the field.
Affiliation(s)
- Yanlin Mi
  - School of Computer Science and Information Technology, University College Cork, Cork, Ireland
  - SFI Centre for Research Training in Artificial Intelligence, University College Cork, Cork, Ireland
- Stefan-Bogdan Marcu
  - School of Computer Science and Information Technology, University College Cork, Cork, Ireland
- Sabin Tabirca
  - School of Computer Science and Information Technology, University College Cork, Cork, Ireland
  - Faculty of Mathematics and Informatics, Transylvania University of Brasov, Brasov, Romania
- Venkata V.B. Yallapragada
  - Centre for Advanced Photonics and Process Analytics, Munster Technological University, Cork, Ireland

6. Hu J, Chen KX, Rao B, Ni JY, Thafar MA, Albaradei S, Arif M. Protein-peptide binding residue prediction based on protein language models and cross-attention mechanism. Anal Biochem 2024; 694:115637. [PMID: 39121938 DOI: 10.1016/j.ab.2024.115637] [Received: 04/27/2024] [Revised: 07/28/2024] [Accepted: 08/06/2024] [Indexed: 08/12/2024]
Abstract
Accurate identification of protein-peptide binding residues is essential for understanding protein-peptide interactions and for advancing drug discovery. To address this problem, extensive research has sought more discriminative feature representations. However, extracting these explicit features usually depends on third-party tools, which lowers computational efficiency and limits predictive performance. In this study, we design an end-to-end deep learning-based method, E2EPep, that predicts protein-peptide binding residues from protein sequence alone. E2EPep first employs and fine-tunes two state-of-the-art pre-trained protein language models, which extract two different latent feature representations from protein sequences that are relevant to protein structure and function. A novel feature fusion module in E2EPep then fuses and optimizes these two representations of binding residues. We also designed E2EPep+, which integrates the E2EPep and PepBCL models, to improve prediction performance. On two independent test data sets, E2EPep and E2EPep+ achieve average AUC values of 0.846 and 0.842, respectively. Their average Matthews correlation coefficients are significantly higher than those of most existing sequence-based methods and comparable to those of state-of-the-art structure-based predictors. Detailed analysis shows that the primary strength of E2EPep is its feature representation, in which a cross-attention mechanism fuses the embeddings generated by the two fine-tuned protein language models. The standalone packages of E2EPep and E2EPep+ can be obtained at https://github.com/ckx259/E2EPep.git for academic use only.
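A minimal NumPy sketch can illustrate how cross-attention fuses two per-residue embeddings. It is not E2EPep's code. The embedding sizes, the single attention head, the random (untrained) projection matrices and the final concatenation step are all illustrative assumptions. Rows of the first embedding act as queries, and the second embedding supplies keys and values.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(emb_a, emb_b, d_k=64, seed=0):
    """Fuse two per-residue embeddings of one sequence (L x d_a and L x d_b).

    Hypothetical single-head cross-attention: queries come from emb_a,
    keys and values from emb_b. Projections are random here; a trained
    model would learn them.
    """
    rng = np.random.default_rng(seed)
    d_a, d_b = emb_a.shape[1], emb_b.shape[1]
    w_q = rng.standard_normal((d_a, d_k)) / np.sqrt(d_a)
    w_k = rng.standard_normal((d_b, d_k)) / np.sqrt(d_b)
    w_v = rng.standard_normal((d_b, d_k)) / np.sqrt(d_b)

    q, k, v = emb_a @ w_q, emb_b @ w_k, emb_b @ w_v
    attn = softmax(q @ k.T / np.sqrt(d_k), axis=-1)  # L x L weights; each row sums to 1
    attended = attn @ v                                # L x d_k
    # Concatenate the query stream with the attended features, residue by residue
    return np.concatenate([emb_a, attended], axis=1), attn

# Toy usage: a 12-residue sequence with two illustrative embedding widths
L = 12
rng = np.random.default_rng(1)
emb_plm1 = rng.standard_normal((L, 1024))
emb_plm2 = rng.standard_normal((L, 1280))
fused, attn = cross_attention_fuse(emb_plm1, emb_plm2)
print(fused.shape)  # (12, 1088)
```

In a full predictor, a small per-residue classifier (for instance a feed-forward layer with a sigmoid output) would then score each row of the fused matrix as binding or non-binding. That layer is likewise an assumption, not a detail taken from the abstract.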
Affiliation(s)
- Jun Hu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
  - Center for AI and Computational Biology, Suzhou Institution of Systems Medicine, Suzhou, 215123, China
- Kai-Xin Chen
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Bing Rao
  - School of Information & Electrical Engineering, Hangzhou City University, Hangzhou, 310015, China
- Jing-Yuan Ni
  - NUIST Reading Academy, Nanjing University of Information Science & Technology, Nanjing, 210044, China
- Maha A Thafar
  - Department of Computer Science, College of Computers and Information Technology, Taif University, Taif, 21944, Saudi Arabia
- Somayah Albaradei
  - Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Muhammad Arif
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha, 34110, Qatar

7. Lin X, Xu T, Hou W, Dong X, Sun Y. Cationic Surface Charge Engineering of Recombinant Transthyretin Remarkably Increases the Inhibitory Potency Against Amyloid β-Protein Fibrillogenesis. Molecules 2024; 29:5023. [PMID: 39519665 PMCID: PMC11547489 DOI: 10.3390/molecules29215023] [Received: 09/21/2024] [Revised: 10/15/2024] [Accepted: 10/22/2024] [Indexed: 11/16/2024]
Abstract
Deposition of amyloid β-protein (Aβ) in the brain is the main pathogenic mechanism of Alzheimer's disease (AD). Developing potent inhibitors of Aβ aggregation is one effective strategy to combat AD. Endogenous transthyretin (TTR) can inhibit Aβ fibrillization through hydrophobic interactions, but its weak inhibitory potency hinders its use in AD therapy. Here, different recombinant TTRs were designed by cationic surface charge engineering. Compared with TTR, all positively charged recombinant TTRs inhibited Aβ aggregation more strongly. The recombinant proteins obtained by mutating acidic amino acids in TTR to arginine (TTR-nR) were particularly effective. Among them, TTR-7R remarkably increased the inhibitory potency against Aβ and effectively inhibited Aβ40 fibrillization at a very low concentration (0.5 μM). In addition, TTR-7R increased cultured cell viability from 62% to 89%, cleared amyloid plaques in AD nematodes, and prolonged nematode lifespan by 5 days at 2 μM. Thermodynamic studies showed that TTR-7R, being enriched in positive charges, interacted with Aβ40 through both hydrophobic interactions and enhanced electrostatic interactions, which significantly enhanced its inhibitory capacity. This research provides insights into the development of efficient recombinant protein inhibitors for AD treatment.
Affiliation(s)
- Xiaoding Lin
  - Key Laboratory of Systems Bioengineering and Frontiers Science Center for Synthetic Biology (Ministry of Education), Department of Biochemical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China
- Ting Xu
  - Key Laboratory of Systems Bioengineering and Frontiers Science Center for Synthetic Biology (Ministry of Education), Department of Biochemical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China
- Wenqi Hou
  - Key Laboratory of Systems Bioengineering and Frontiers Science Center for Synthetic Biology (Ministry of Education), Department of Biochemical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China
- Xiaoyan Dong
  - Key Laboratory of Systems Bioengineering and Frontiers Science Center for Synthetic Biology (Ministry of Education), Department of Biochemical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China
- Yan Sun
  - Key Laboratory of Systems Bioengineering and Frontiers Science Center for Synthetic Biology (Ministry of Education), Department of Biochemical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China

8. Tzfadia O, Gijsbers A, Vujkovic A, Snobre J, Vargas R, Dewaele K, Meehan CJ, Farhat M, Hakke S, Peters PJ, de Jong BC, Siroy A, Ravelli RBG. Single nucleotide variation catalog from clinical isolates mapped on tertiary and quaternary structures of ESX-1-related proteins reveals critical regions as putative Mtb therapeutic targets. Microbiol Spectr 2024; 12:e0381623. [PMID: 38874407 PMCID: PMC11302016 DOI: 10.1128/spectrum.03816-23] [Received: 10/31/2023] [Accepted: 05/02/2024] [Indexed: 06/15/2024]
Abstract
Proteins encoded by the ESX-1 genes of interest are essential for full virulence in all lineages of the Mycobacterium tuberculosis complex (Mtbc), the pathogens causing the highest mortality worldwide. Identifying critical regions in these ESX-1-related proteins could provide preventive or therapeutic targets for Mtb infection, the game changer needed for tuberculosis control. We analyzed a compendium of whole-genome sequences of clinical Mtb isolates from all lineages, obtained from >32,000 patients, and identified single nucleotide polymorphisms. When the mutations corresponding to all non-synonymous single nucleotide polymorphisms were mapped onto structural models of the ESX-1 proteins, fully conserved regions emerged. Some could be assigned to known quaternary structures, whereas others were predicted to be involved in yet-to-be-discovered interactions. Some mutations had undergone clonal expansion (found in >1% of the isolates). These were mostly located on the surface of globular domains, remote from known intra- and intermolecular protein-protein interactions. Fully conserved intrinsically disordered regions were also found, suggesting that these regions are crucial for the pathogenicity of the Mtbc. Altogether, our findings highlight fully conserved protein regions as attractive vaccine antigens and drug targets to control Mtb virulence. Extending this approach to the whole Mtb genome and to other microorganisms will enhance vaccine development for various pathogens.
IMPORTANCE
We mapped all non-synonymous single nucleotide polymorphisms onto each of the experimental and predicted structural models of the ESX-1 proteins and inspected their placement. Conserved regions of varying sizes were found. Next, we analyzed predicted intrinsically disordered regions within our set of proteins. We found two putative long stretches that are fully conserved and discussed their potential essential role in immunological recognition. Combined, our findings highlight new targets for interfering with Mycobacterium tuberculosis complex virulence.
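Finding fully conserved regions can be illustrated in one dimension: scan the protein for maximal runs of residues that carry no non-synonymous variant. This is a simplified sequence-level sketch. It ignores the 3D structural mapping used in the study, and the protein length, variant positions and minimum-length cutoff are hypothetical.

```python
def conserved_stretches(length, variant_positions, min_len=10):
    """Return 1-based inclusive (start, end) runs that carry no non-synonymous variant.

    length: number of residues in the protein
    variant_positions: 1-based residue positions with a non-synonymous SNP
    min_len: hypothetical minimum run length worth reporting
    """
    mutated = set(variant_positions)
    runs, start = [], None
    for pos in range(1, length + 1):
        if pos in mutated:
            # A variant ends the current conserved run, if there is one
            if start is not None and pos - start >= min_len:
                runs.append((start, pos - 1))
            start = None
        elif start is None:
            start = pos
    # Close a run that extends to the C-terminus
    if start is not None and length + 1 - start >= min_len:
        runs.append((start, length))
    return runs

# Hypothetical 60-residue protein with variants at five positions
print(conserved_stretches(60, [5, 8, 30, 31, 47], min_len=10))
```

In the study itself, conservation was judged on tertiary and quaternary structure models. A structural version of this idea would group spatially adjacent variant-free residues instead of sequence-contiguous ones.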
Affiliation(s)
- Oren Tzfadia
  - Mycobacteriology Unit, Institute of Tropical Medicine, Antwerp, Belgium
- Abril Gijsbers
  - Departamento de Bioquímica, Facultad de Medicina, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Alexandra Vujkovic
  - Clinical Virology Unit, Institute of Tropical Medicine, Antwerp, Belgium
  - ADReM Data Lab, University of Antwerp, Antwerp, Belgium
- Jihad Snobre
  - Mycobacteriology Unit, Institute of Tropical Medicine, Antwerp, Belgium
- Roger Vargas
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Klaas Dewaele
  - Mycobacteriology Unit, Institute of Tropical Medicine, Antwerp, Belgium
- Conor J. Meehan
  - Mycobacteriology Unit, Institute of Tropical Medicine, Antwerp, Belgium
  - Department of Biosciences, Nottingham Trent University, Nottingham, United Kingdom
- Maha Farhat
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Sneha Hakke
  - Division of Nanoscopy, Maastricht Multimodal Imaging Institute (M4i), Maastricht University, Maastricht, the Netherlands
- Peter J. Peters
  - Division of Nanoscopy, Maastricht Multimodal Imaging Institute (M4i), Maastricht University, Maastricht, the Netherlands
- Bouke C. de Jong
  - Mycobacteriology Unit, Institute of Tropical Medicine, Antwerp, Belgium
- Axel Siroy
  - Unité de soutien à l'Institut Européen de Chimie et Biologie (IECB), CNRS, INSERM, IECB, US1, Université de Bordeaux, Pessac, France
- Raimond B. G. Ravelli
  - Division of Nanoscopy, Maastricht Multimodal Imaging Institute (M4i), Maastricht University, Maastricht, the Netherlands

9. Lai JS, Burley SK, Duarte JM. ZMPY3D: accelerating protein structure volume analysis through vectorized 3D Zernike moments and Python-based GPU integration. Bioinform Adv 2024; 4:vbae111. [PMID: 39100546 PMCID: PMC11297494 DOI: 10.1093/bioadv/vbae111] [Received: 05/14/2024] [Revised: 07/12/2024] [Accepted: 07/25/2024] [Indexed: 08/06/2024]
Abstract
Motivation
Volumetric 3D object analyses are being applied in fields such as structural bioinformatics, biophysics, and structural biology, with potential integration of artificial intelligence/machine learning (AI/ML) techniques. One such method, 3D Zernike moments, has proven valuable for analyzing protein structures (e.g., in protein fold classification, protein-protein interaction analysis, and molecular dynamics simulations). Their compactness and efficiency make them amenable to large-scale analyses. However, established methods for deriving 3D Zernike moments can be inefficient, particularly when higher-order terms are required, which hinders broader application. As the volume of experimental and computationally predicted protein structure information continues to grow, structural biology has become a "big data" science that requires more efficient analysis tools.
Results
This application note presents ZMPY3D, a Python-based software package that accelerates the computation of 3D Zernike moments by vectorizing the mathematical formulae and using graphics processing units (GPUs). The package offers implementations based on popular GPU-supported libraries, such as CuPy and TensorFlow, alongside a NumPy implementation, to improve computational efficiency, adaptability, and flexibility in future algorithm development. The ZMPY3D package can be installed via PyPI, and the source code is available from GitHub. Volumetric 3D structural similarity scores for proteins and superposition via transformation matrices have both been implemented. Together they create a powerful computational tool that will allow the research community to combine 3D Zernike moments with existing AI/ML tools to advance research and education in protein structure bioinformatics.
Availability and implementation
ZMPY3D, implemented in Python, is available on GitHub (https://github.com/tawssie/ZMPY3D) and PyPI, released under the GPL License.
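The kind of vectorization described here can be illustrated on raw geometric moments of a voxel grid, M_pqr = Σ x^p y^q z^r f(x, y, z). A common route to 3D Zernike moments expresses each Zernike moment as a linear combination of these geometric moments. The sketch below shows only that building block in NumPy. It is not ZMPY3D's API, and both the centered [-1, 1] coordinate normalization and the maximum order are assumptions.

```python
import numpy as np

def geometric_moments(voxels, max_order):
    """Geometric moments M[p, q, r] for all p + q + r <= max_order.

    voxels: 3D array of densities f(x, y, z). Each axis is mapped to the
    centered range [-1, 1] (an illustrative normalization choice).
    Entries with p + q + r > max_order are set to zero.
    """
    nx, ny, nz = voxels.shape
    x = np.linspace(-1.0, 1.0, nx)
    y = np.linspace(-1.0, 1.0, ny)
    z = np.linspace(-1.0, 1.0, nz)
    orders = np.arange(max_order + 1)
    # Power tables: X[p, i] = x_i ** p, built with one broadcast per axis
    X = x[None, :] ** orders[:, None]
    Y = y[None, :] ** orders[:, None]
    Z = z[None, :] ** orders[:, None]
    # Contract the grid against all three power tables in a single einsum
    M = np.einsum("pi,qj,rk,ijk->pqr", X, Y, Z, voxels, optimize=True)
    p, q, r = np.meshgrid(orders, orders, orders, indexing="ij")
    M[p + q + r > max_order] = 0.0
    return M

# Toy usage: a solid sphere on a 32^3 grid
n = 32
g = np.linspace(-1.0, 1.0, n)
xx, yy, zz = np.meshgrid(g, g, g, indexing="ij")
sphere = (xx**2 + yy**2 + zz**2 <= 0.5).astype(float)
M = geometric_moments(sphere, max_order=4)
print(np.isclose(M[0, 0, 0], sphere.sum()))  # True: the zeroth moment is the total mass
```

Because the whole computation is array algebra, the same code can in principle target a GPU by swapping the array library. The abstract reports that ZMPY3D offers CuPy and TensorFlow back ends, but how it organizes them internally is not described here.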
Affiliation(s)
- Jhih-Siang Lai
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, United States
- Stephen K Burley
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, United States
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, United States
  - Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, United States
  - Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, United States
- Jose M Duarte
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, United States

10. Szulc NA, Stefaniak F, Piechota M, Soszyńska A, Piórkowska G, Cappannini A, Bujnicki JM, Maniaci C, Pokrzywa W. DEGRONOPEDIA: a web server for proteome-wide inspection of degrons. Nucleic Acids Res 2024; 52:W221-W232. [PMID: 38567734 PMCID: PMC11223883 DOI: 10.1093/nar/gkae238] [Received: 02/02/2024] [Revised: 03/12/2024] [Accepted: 03/20/2024] [Indexed: 07/06/2024]
Abstract
E3 ubiquitin ligases recognize substrates through short linear motifs termed degrons. Although degron signaling has been studied extensively, resources for screening it systematically are limited. To bridge this gap, we developed DEGRONOPEDIA, a web server that searches for degrons and maps them to nearby residues that can undergo ubiquitination. It also maps them to disordered regions, which may act as seeds for protein unfolding. Along with an evolutionary assessment of degron conservation, the server reports post-translational modifications and mutations that may modulate degron availability. Because degrons are prevalent at protein termini, DEGRONOPEDIA uses machine learning to assess N-/C-terminal stability and simulates proteolysis to identify degrons in newly formed termini. Experimental validation of a predicted C-terminal destabilizing motif, together with confirmation of a post-proteolytic degron in another case, exemplifies the server's practical application. DEGRONOPEDIA can be freely accessed at degronopedia.com.
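A much-simplified version of the degron search and lysine mapping can be written with regular expressions. This is not DEGRONOPEDIA's implementation. It uses only two textbook APC/C degron consensus patterns, the KEN box and the minimal D-box (RxxL), and a hypothetical ±20-residue window for nearby lysines. The server itself draws on a far larger motif set and on structure, disorder and conservation data.

```python
import re

# Illustrative motif set (textbook APC/C degrons); real resources use many more.
DEGRON_PATTERNS = {
    "KEN box": "KEN",
    "D-box (minimal)": "R..L",
}

def find_degrons(seq, window=20):
    """Report motif hits and lysines near each hit (candidate ubiquitination sites).

    Positions are 1-based. `window` is a hypothetical search radius in residues.
    A zero-width lookahead lets overlapping matches be reported.
    """
    hits = []
    for name, pattern in DEGRON_PATTERNS.items():
        for m in re.finditer(f"(?=({pattern}))", seq):
            start = m.start() + 1
            end = start + len(m.group(1)) - 1
            lo, hi = max(1, start - window), min(len(seq), end + window)
            # Lysines inside the window but outside the motif itself
            lysines = [i for i in range(lo, hi + 1)
                       if seq[i - 1] == "K" and not start <= i <= end]
            hits.append({"motif": name, "start": start, "end": end,
                         "match": m.group(1), "nearby_K": lysines})
    return hits

for hit in find_degrons("MSTKENLPRAALQDGK"):
    print(hit)
```

A plain regex scan like this over-predicts heavily. That is why the server layers accessibility, disorder and conservation evidence on top of motif matches.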
Affiliation(s)
- Natalia A Szulc
  - Laboratory of Protein Metabolism, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str., 02-109 Warsaw, Poland
- Filip Stefaniak
  - Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str., 02-109 Warsaw, Poland
- Małgorzata Piechota
  - Laboratory of Protein Metabolism, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str., 02-109 Warsaw, Poland
- Anna Soszyńska
  - Laboratory of Protein Metabolism, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str., 02-109 Warsaw, Poland
- Gabriela Piórkowska
  - Laboratory of Protein Metabolism, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str., 02-109 Warsaw, Poland
- Andrea Cappannini
  - Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str., 02-109 Warsaw, Poland
- Janusz M Bujnicki
  - Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str., 02-109 Warsaw, Poland
- Chiara Maniaci
  - Medical Research Council (MRC) Protein Phosphorylation and Ubiquitylation Unit, School of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, UK
- Wojciech Pokrzywa
  - Laboratory of Protein Metabolism, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Str., 02-109 Warsaw, Poland
11
Qi J, Feng C, Shi Y, Yang J, Zhang F, Li G, Han R. FP-Zernike: An Open-source Structural Database Construction Toolkit for Fast Structure Retrieval. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae007. [PMID: 38894604 PMCID: PMC11423855 DOI: 10.1093/gpbjnl/qzae007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 08/16/2023] [Accepted: 09/20/2023] [Indexed: 06/21/2024]
Abstract
The release of AlphaFold2 has sparked a rapid expansion in protein model databases. Efficient protein structure retrieval is crucial for the analysis of structure models, and measuring the similarity between structures is the key challenge in structural retrieval. Although existing structure alignment algorithms can address this challenge, they are often time-consuming. Currently, the state-of-the-art approach involves converting protein structures into three-dimensional (3D) Zernike descriptors and assessing similarity using Euclidean distance. However, the methods for computing 3D Zernike descriptors mainly rely on structural surfaces and are predominantly web-based, which limits their application to custom datasets. To overcome this limitation, we developed FP-Zernike, a user-friendly toolkit for computing different types of Zernike descriptors based on feature points. Users need to enter only a single command to calculate the Zernike descriptors of all structures in a customized dataset. FP-Zernike outperforms the leading method in retrieval accuracy and binary classification accuracy across diverse benchmark datasets. In addition, we demonstrate how FP-Zernike can be used to construct a descriptor database and describe the protocol used for the Protein Data Bank (PDB) dataset to help interested readers deploy the tool locally. Our demonstration contained 590,685 structures, and at this scale our system required only 4-9 s to complete a retrieval while achieving state-of-the-art accuracy. FP-Zernike is an open-source toolkit, with the source code and related data accessible at https://ngdc.cncb.ac.cn/biocode/tools/BT007365/releases/0.1, as well as through a webserver at http://www.structbioinfo.cn/.
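After each structure has been reduced to a fixed-length descriptor vector, retrieval by Euclidean distance becomes a ranking problem. The minimal sketch below assumes the descriptors have already been computed and are stored as plain lists of floats. The `retrieve` function and the toy vectors are hypothetical and are not part of FP-Zernike.

```python
import heapq
import math


def retrieve(query: list[float], database: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k database entries closest to `query` by Euclidean distance.

    A plain linear scan: one distance computation per database entry.
    math.dist raises ValueError if the vector lengths differ.
    """
    scored = ((math.dist(query, vec), name) for name, vec in database.items())
    return [(name, dist) for dist, name in heapq.nsmallest(k, scored)]


# Toy 2-dimensional "descriptors"; real 3D Zernike descriptors are much longer.
db = {"A": [0.0, 0.0], "B": [3.0, 4.0], "C": [1.0, 0.0]}
print(retrieve([0.0, 0.0], db, k=2))  # [('A', 0.0), ('C', 1.0)]
```

Because each comparison involves only a short vector, even a brute-force scan like this grows linearly with the size of the database. That linear scaling is what makes descriptor-based search attractive compared with running a structural alignment against every entry.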
Affiliation(s)
- Junhai Qi
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
- BioMap Research, Menlo Park, CA 94025, USA
- Chenjie Feng
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
- College of Medical Information and Engineering, Ningxia Medical University, Yinchuan 750004, China
- Yulin Shi
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
- Jianyi Yang
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
- Fa Zhang
- Institute of Engineering Medicine, Beijing Institute of Technology, Beijing 100081, China
- Guojun Li
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
- Renmin Han
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
12
Wang Y, Liu W, Sun Y, Dong X. Transthyretin-Penetratin: A Potent Fusion Protein Inhibitor against Alzheimer's Amyloid-β Fibrillogenesis with High Blood Brain Barrier Crossing Capability. Bioconjug Chem 2024; 35:419-431. [PMID: 38450606 DOI: 10.1021/acs.bioconjchem.4c00073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2024]
Abstract
The design of a potent amyloid-β protein (Aβ) inhibitor plays a pivotal role in the prevention and treatment of Alzheimer's disease (AD). Although endogenous transthyretin (TTR) is recognized as an Aβ inhibitor, its weak inhibitory activity and poor blood-brain barrier (BBB) crossing capability limit its use for inhibiting Aβ aggregation and transporting Aβ. We therefore designed a recombinant TTR by conjugating a cationic cell-penetrating peptide (penetratin, Pen). The resulting fusion protein, TTR-Pen (TP), not only showed high BBB penetration but also inhibited Aβ much more potently. Specifically, the protein fusion made TP positively charged, leading to potent suppression of Aβ40 fibrillization at a low concentration (1.5 μM), whereas TTR required a concentration as high as 12.5 μM to achieve a similar effect. Moreover, TP mitigated Aβ-induced neuronal death, increasing cultured cell viability from 72% to 92% at 2.5 μM, and extended the lifespan of AD nematodes from 14 to 18 d. Thermodynamic studies revealed that TP, enriched in positive charges, engaged in extensive electrostatic interactions with Aβ40. Importantly, TP showed excellent BBB penetration, with 10-fold higher BBB permeability than TTR. This would allow TP to enter the brain of AD patients and participate in transporting Aβ species out of the brain. Thus, the fusion protein is expected to have great potential for drug development in AD treatment.
Affiliation(s)
- Ying Wang
- Department of Biochemical Engineering, School of Chemical Engineering and Technology and Key Laboratory of Systems Bioengineering and Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin University, Tianjin 300350, China
- Wei Liu
- Tianjin Key Laboratory of Radiation Medicine and Molecular Nuclear Medicine, Institute of Radiation Medicine, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300192, China
- Yan Sun
- Department of Biochemical Engineering, School of Chemical Engineering and Technology and Key Laboratory of Systems Bioengineering and Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin University, Tianjin 300350, China
- Xiaoyan Dong
- Department of Biochemical Engineering, School of Chemical Engineering and Technology and Key Laboratory of Systems Bioengineering and Frontiers Science Center for Synthetic Biology (Ministry of Education), Tianjin University, Tianjin 300350, China
13
Christoffer C, Harini K, Archit G, Kihara D. Assembly of Protein Complexes in and on the Membrane with Predicted Spatial Arrangement Constraints. J Mol Biol 2024; 436:168486. [PMID: 38336197 PMCID: PMC10942765 DOI: 10.1016/j.jmb.2024.168486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 01/17/2024] [Accepted: 02/05/2024] [Indexed: 02/12/2024]
Abstract
Membrane proteins play crucial roles in various cellular processes, and their interactions with other proteins in and on the membrane are essential for their proper functioning. Although structures are being determined for an increasing number of membrane proteins, the available structural data remain sparse. Because experimental structure determination is challenging, computational docking methods are needed to gain insight into the mechanisms of membrane protein complexes. Here, we introduce Mem-LZerD, a rigid-body membrane docking algorithm designed to take advantage of modern membrane modeling and protein docking techniques to facilitate the docking of membrane protein complexes. Mem-LZerD is based on the LZerD protein docking algorithm, which has consistently ranked among the top servers in many rounds of the CAPRI protein docking assessment. By combining geometric hashing, newly constrained by the predicted membrane height and tilt angle, with model scoring that accounts for the energy of membrane insertion, we demonstrate the capability of Mem-LZerD to model diverse membrane protein-protein complexes. Mem-LZerD successfully performed unbound docking on 13 of 21 (61.9%) transmembrane complexes in an established benchmark, more than previous approaches have achieved. It was additionally tested on new datasets of 44 transmembrane complexes and 92 peripheral membrane protein complexes, successfully modeling 35 (79.5%) and 15 (16.3%) complexes, respectively. When non-blind orientations of peripheral targets were included, the number of successes increased to 54 (58.7%). We further demonstrate that Mem-LZerD produces complex models that are suitable for molecular dynamics simulation. Mem-LZerD is available at https://lzerd.kiharalab.org.
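Each percentage in the abstract follows directly from the corresponding success count and dataset size. The short arithmetic check below confirms that they agree. It is a sketch for the reader and is not part of Mem-LZerD.

```python
# Success counts and percentages as stated in the Mem-LZerD abstract.
REPORTED = [
    ("established transmembrane benchmark", 13, 21, 61.9),
    ("new transmembrane dataset", 35, 44, 79.5),
    ("peripheral dataset", 15, 92, 16.3),
    ("peripheral dataset, non-blind orientations", 54, 92, 58.7),
]


def success_rate(successes: int, total: int) -> float:
    """Percentage of successfully modeled targets, rounded to one decimal place."""
    return round(100 * successes / total, 1)


for label, successes, total, stated in REPORTED:
    computed = success_rate(successes, total)
    print(f"{label}: {successes}/{total} = {computed}% (stated {stated}%)")
```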
Affiliation(s)
- Charles Christoffer
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
- Kannan Harini
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India; Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
- Gupta Archit
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA; Department of Genetic Engineering, SRM Institute of Science and Technology, Kattankulathur 603203, India
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA; Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA; Purdue University Center for Cancer Research, Purdue University, West Lafayette, IN 47907, USA
14
Greener JG, Jamali K. Fast protein structure searching using structure graph embeddings. BIOINFORMATICS ADVANCES 2024; 5:vbaf042. [PMID: 40196750 PMCID: PMC11974391 DOI: 10.1093/bioadv/vbaf042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2025] [Revised: 02/11/2025] [Accepted: 03/03/2025] [Indexed: 04/09/2025]
Abstract
Comparing and searching protein structures independent of primary sequence has proved useful for remote homology detection, function annotation, and protein classification. Fast and accurate methods to search with structures will be essential to make use of the vast databases that have recently become available, in the same way that fast protein sequence searching underpins much of bioinformatics. We train a simple graph neural network using supervised contrastive learning to learn a low-dimensional embedding of protein domains.
Availability and implementation: The method, called Progres, is available as software at https://github.com/greener-group/progres and as a web server at https://progres.mrc-lmb.cam.ac.uk. It has accuracy comparable to the best current methods and can search the AlphaFold database TED domains in a 10th of a second per query on CPU.
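Once domains have been embedded, searching with them is a nearest-neighbour lookup in a low-dimensional vector space. The sketch below makes three assumptions: each domain has already been embedded, the embeddings are represented here by toy two-dimensional vectors, and hits are ranked by cosine similarity. The function names are hypothetical, and Progres's own scoring and indexing may differ.

```python
import heapq
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_index(embeddings: dict[str, list[float]]) -> dict[str, list[float]]:
    """Normalize every database embedding once, ahead of any queries."""
    return {name: normalize(vec) for name, vec in embeddings.items()}


def search(query: list[float], index: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Return the k entries with the highest cosine similarity to `query`."""
    q = normalize(query)
    scored = ((sum(a * b for a, b in zip(q, vec)), name) for name, vec in index.items())
    return [(name, score) for score, name in heapq.nlargest(k, scored)]


index = build_index({"d1": [1.0, 0.0], "d2": [0.0, 2.0], "d3": [1.0, 1.0]})
print(search([2.0, 0.0], index, k=2))
```

Normalizing the database once, when the index is built, means each query needs only one dot product per entry. That is the property that keeps per-query cost low when the embeddings are small.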
Affiliation(s)
- Joe G Greener
- Medical Research Council Laboratory of Molecular Biology, Cambridge, CB2 0QH, United Kingdom
- Kiarash Jamali
- Medical Research Council Laboratory of Molecular Biology, Cambridge, CB2 0QH, United Kingdom
15
Shin WH, Kihara D. PL-PatchSurfer3: Improved Structure-Based Virtual Screening for Structure Variation Using 3D Zernike Descriptors. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.22.581511. [PMID: 38464318 PMCID: PMC10925112 DOI: 10.1101/2024.02.22.581511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Structure-based virtual screening (SBVS) is a widely used method in in silico drug discovery. It requires a receptor structure or binding site to predict the binding pose and fitness of a ligand, so SBVS performance is affected by the protein conformation. The most frequently used SBVS approach is protein-ligand docking, which relies on atomic distance-based scoring functions. Docking programs are therefore highly sensitive to variations in receptor structure, and conformational changes have been reported to significantly reduce their performance. To address this problem, we introduced an SBVS program named PL-PatchSurfer, which makes use of molecular surface patches and the Zernike descriptor. The program segments the surfaces of the pocket and the ligand into several patches. These patches are mapped with physicochemical properties such as shape and electrostatic potential and then converted into the rotationally invariant Zernike descriptor. Complementarity between the protein and the ligand is assessed by comparing the descriptors and the geometric distribution of the patches in the molecules. A benchmarking study showed that PL-PatchSurfer2 could rapidly screen active molecules regardless of changes in receptor structure. However, the program did not achieve high performance for targets in which hydrogen bonding is important, such as nuclear hormone receptors. In this paper, we present the newer version of PL-PatchSurfer, PL-PatchSurfer3, which incorporates two new features: a revised definition of hydrogen bond complementarity and consideration of visibility, which encodes curvature information of a patch. Our evaluation demonstrates that the new program outperforms its predecessor and other SBVS methods while retaining its characteristic tolerance to receptor structure changes.
Interested individuals can access the program at kiharalab.org/plps3.
Affiliation(s)
- Woong-Hee Shin
- Department of Biomedical Informatics, Korea University College of Medicine, Seoul, Republic of Korea
- Daisuke Kihara
- Department of Biological Science, Purdue University, West Lafayette, IN, USA
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Center for Cancer Research, Purdue University, West Lafayette, IN, USA
16
Wuyun Q, Chen Y, Shen Y, Cao Y, Hu G, Cui W, Gao J, Zheng W. Recent Progress of Protein Tertiary Structure Prediction. Molecules 2024; 29:832. [PMID: 38398585 PMCID: PMC10893003 DOI: 10.3390/molecules29040832] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 12/30/2023] [Revised: 02/06/2024] [Accepted: 02/08/2024] [Indexed: 02/25/2024]
Abstract
The prediction of three-dimensional (3D) protein structure from amino acid sequences has stood as a significant challenge in computational and structural bioinformatics for decades. Recently, the widespread integration of artificial intelligence (AI) algorithms has substantially expedited advancements in protein structure prediction, yielding numerous significant milestones. In particular, the end-to-end deep learning method AlphaFold2 raised structure prediction performance to new heights, producing models regularly competitive with experimental structures in the 14th Critical Assessment of Protein Structure Prediction (CASP14). To give researchers a comprehensive understanding and to guide future research in protein structure prediction, this review describes various methodologies, assessments, and databases in protein structure prediction, including traditionally used protein structure prediction methods, such as template-based modeling (TBM) and template-free modeling (FM) approaches; recently developed deep learning-based methods, such as contact/distance-guided methods, end-to-end folding methods, and protein language model (PLM)-based methods; multi-domain protein structure prediction methods; the CASP experiments and related assessments; and the recently released AlphaFold Protein Structure Database (AlphaFold DB). We discuss their advantages, disadvantages, and application scopes, aiming to provide researchers with insights through which to understand the limitations, contexts, and effective selections of protein structure prediction methods in protein-related fields.
Affiliation(s)
- Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Yihan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Yifeng Shen
- Faculty of Environment and Information Studies, Keio University, Fujisawa 252-0882, Kanagawa, Japan
- Yang Cao
- College of Life Sciences, Sichuan University, Chengdu 610065, China
- Gang Hu
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Wei Cui
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
17
Liu Z, Zhang C, Zhang Q, Zhang Y, Yu DJ. TM-search: An Efficient and Effective Tool for Protein Structure Database Search. J Chem Inf Model 2024; 64:1043-1049. [PMID: 38270339 DOI: 10.1021/acs.jcim.3c01455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]
Abstract
The quickly increasing size of the Protein Data Bank is challenging biologists to develop a more scalable protein structure alignment tool for fast structure database search. Although many protein structure search algorithms and programs have been designed and implemented for this purpose, most require a large amount of computational time. We propose a novel protein structure search approach, TM-search, which is based on the pairwise structure alignment program TM-align and a new iterative clustering algorithm. Benchmark tests demonstrate that TM-search is 27 times faster than a TM-align full database search while still being able to identify ∼90% of all high TM-score hits, which is 2-10 times more than other existing programs such as Foldseek, Dali, and PSI-BLAST.
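TM-search builds on TM-align, and its hits are ranked by TM-score. For a fixed alignment and superposition, the TM-score is the sum of 1/(1 + (d_i/d0)^2) over the aligned residue pairs, divided by the target length L. Here d_i is the distance between the i-th aligned pair, and d0 = 1.24 (L - 15)^(1/3) - 1.8 Å. Scores above about 0.5 are commonly read as indicating the same fold. The minimal sketch below evaluates only that sum. It omits the search over superpositions that TM-align performs, and it is not TM-search's code.

```python
def d0_from_length(target_length: int) -> float:
    """Length-dependent distance scale d0 (in Å) used in the TM-score."""
    if target_length <= 21:
        # The formula yields very small or undefined d0 for short chains;
        # such chains need special handling, which this sketch does not attempt.
        raise ValueError("this sketch handles only targets longer than 21 residues")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances: list[float], target_length: int) -> float:
    """TM-score for one fixed superposition.

    `distances` holds the distances (Å) between aligned residue pairs;
    unaligned target residues contribute nothing to the sum but still
    count in the normalization by `target_length`.
    """
    d0 = d0_from_length(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


print(tm_score([0.0] * 100, 100))  # 1.0: every residue aligned and superposed exactly
print(tm_score([0.0] * 50, 100))   # 0.5: only half of the target aligned
```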
Affiliation(s)
- Zi Liu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw, Ann Arbor, Michigan 48109-2218, United States
- Qidi Zhang
- Computer Department, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw, Ann Arbor, Michigan 48109-2218, United States
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
18
Schierholz L, Brown CR, Helena-Bueno K, Uversky VN, Hirt RP, Barandun J, Melnikov SV. A Conserved Ribosomal Protein Has Entirely Dissimilar Structures in Different Organisms. Mol Biol Evol 2024; 41:msad254. [PMID: 37987564 PMCID: PMC10764239 DOI: 10.1093/molbev/msad254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 10/23/2023] [Accepted: 11/16/2023] [Indexed: 11/22/2023]
Abstract
Ribosomes from different species can differ markedly in composition, including dozens of ribosomal proteins that are unique to specific lineages but absent in others. However, it remains unknown how ribosomes acquire new proteins throughout evolution. Here, to help answer this question, we describe the evolution of the ribosomal protein msL1/msL2 that was recently found in ribosomes from the parasitic microorganism clade, microsporidia. We show that this protein has a conserved location in the ribosome but entirely dissimilar structures in different organisms: in each of the analyzed species, msL1/msL2 exhibits an altered secondary structure, an inverted orientation of the N-termini and C-termini on the ribosomal binding surface, and a completely transformed 3D fold. We then show that this fold switching is likely caused by changes in the ribosomal msL1/msL2-binding site, specifically, by variations in rRNA. These observations allow us to infer an evolutionary scenario in which a small, positively charged, de novo-born unfolded protein was first captured by rRNA to become part of the ribosome and subsequently underwent complete fold switching to optimize its binding to its evolving ribosomal binding site. Overall, our work provides a striking example of how a protein can switch its fold in the context of a complex biological assembly while retaining its specificity for its molecular partner. This finding will help us better understand the origin and evolution of new protein components of complex molecular assemblies, thereby enhancing our ability to engineer biological molecules, identify protein homologs, and peer into the history of life on Earth.
Affiliation(s)
- Léon Schierholz
- Department of Molecular Biology, Laboratory for Molecular Infection Medicine Sweden, Umeå Centre for Microbial Research, Science for Life Laboratory, Umeå University, Umeå 901 87, Sweden
- Charlotte R Brown
- Biosciences Institute, Newcastle University School of Medicine, Newcastle upon Tyne NE2 4HH, UK
- Karla Helena-Bueno
- Biosciences Institute, Newcastle University School of Medicine, Newcastle upon Tyne NE2 4HH, UK
- Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Robert P Hirt
- Biosciences Institute, Newcastle University School of Medicine, Newcastle upon Tyne NE2 4HH, UK
- Jonas Barandun
- Department of Molecular Biology, Laboratory for Molecular Infection Medicine Sweden, Umeå Centre for Microbial Research, Science for Life Laboratory, Umeå University, Umeå 901 87, Sweden
- Sergey V Melnikov
- Biosciences Institute, Newcastle University School of Medicine, Newcastle upon Tyne NE2 4HH, UK
19
Zhang Y, Wang X, Zhang Z, Huang Y, Kihara D. Assessment of Protein-Protein Docking Models Using Deep Learning. Methods Mol Biol 2024; 2780:149-162. [PMID: 38987469 DOI: 10.1007/978-1-0716-3985-6_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Protein-protein interactions are involved in almost all processes in a living cell and determine the biological functions of proteins. To obtain a mechanistic understanding of protein-protein interactions, the tertiary structures of protein complexes have been determined by biophysical experimental methods, such as X-ray crystallography and cryogenic electron microscopy. However, because experimental methods are resource-intensive, many computational methods have been developed to model protein complex structures. One of the difficulties in computational protein complex modeling (protein docking) is selecting the most accurate models from the many models that a docking method usually generates. This article reviews advances in protein docking model assessment methods, focusing on recent developments that apply deep learning with several network architectures.
Affiliation(s)
- Yuanyuan Zhang
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Xiao Wang
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Zicong Zhang
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Yunhan Huang
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
20
Banach M. Structural Outlier Detection and Zernike-Canterakis Moments for Molecular Surface Meshes-Fast Implementation in Python. Molecules 2023; 29:52. [PMID: 38202635 PMCID: PMC10779519 DOI: 10.3390/molecules29010052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 12/06/2023] [Accepted: 12/12/2023] [Indexed: 01/12/2024]
Abstract
Object retrieval systems measure the degree of similarity between the shapes of 3D models. They search 3D model databases for entries that resemble the query model. In structural bioinformatics, the query model is a protein tertiary/quaternary structure, and the objective is to find similarly shaped molecules in the Protein Data Bank. With the ever-growing size of the PDB, a direct atomic coordinate comparison with all its members is impractical. To overcome this problem, the shape of the molecules can be encoded by fixed-length feature vectors. The distance of a protein to the entire PDB can then be measured in this low-dimensional domain in linear time. The state-of-the-art approaches utilize Zernike-Canterakis moments for the shape encoding and supply the retrieval process with geometric data of the input structures. The BioZernike descriptors have been a standard utility of the PDB since 2020. However, anyone trying to calculate ZC moments locally encounters a shortage of libraries readily usable in custom programs (i.e., without relying on external binaries), in particular programs written in Python. Here, a fast and well-documented Python implementation of the Pozo-Koehl algorithm is presented. In contrast to the more popular algorithm by Novotni and Klein, which is based on the voxelized volume, the PK algorithm produces ZC moments directly from the triangular surface meshes of 3D models. In particular, it can accept the molecular surfaces of proteins as its input. In the presented PK-Zernike library, owing to Numba's just-in-time compilation, a mesh with 50,000 facets is processed by a single thread in a second at moment order 20. Since this is the first time the PK algorithm has been used in structural bioinformatics, it is employed in a novel, simple, but efficient protein structure retrieval pipeline.
The elimination of outlying chain fragments via a fast PCA-based subroutine improves the discrimination ability, allowing this pipeline to achieve an area under the ROC curve of 0.961 in the BioZernike validation suite (0.997 for the assemblies). The correlation between the results of the proposed approach and those of the 3D Surfer program reaches values up to 0.99.
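The 0.961 and 0.997 figures are areas under the ROC curve (AUC). The minimal sketch below shows what that number measures by computing AUC from its rank interpretation. The scores are made up. This quadratic-time version is written for clarity; a validation suite the size of BioZernike's would call for a sorting-based method instead.

```python
def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Area under the ROC curve via its rank interpretation.

    Returns the probability that a randomly chosen positive pair scores
    higher than a randomly chosen negative pair, with ties counted as half.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical similarity scores (higher = more similar). For descriptor
# distances, pass negated distances so that larger still means "more similar".
same_class = [0.9, 0.3]
different_class = [0.5, 0.1]
print(roc_auc(same_class, different_class))  # 0.75
```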
Affiliation(s)
- Mateusz Banach
- Department of Bioinformatics and Telemedicine, Faculty of Medicine, Jagiellonian University Medical College, Medyczna 7, 30-688 Kraków, Poland
21
Al-Fatlawi A, Menzel M, Schroeder M. Is Protein BLAST a thing of the past? Nat Commun 2023; 14:8195. [PMID: 38081865 PMCID: PMC10713564 DOI: 10.1038/s41467-023-44082-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Accepted: 11/21/2023] [Indexed: 12/18/2023]
Affiliation(s)
- Ali Al-Fatlawi
- Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Dresden, Germany
- Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden, Germany
- Martin Menzel
- Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Dresden, Germany
- Michael Schroeder
- Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Dresden, Germany
- Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden, Germany
22
Christoffer C, Harini K, Archit G, Kihara D. Assembly of Protein Complexes In and On the Membrane with Predicted Spatial Arrangement Constraints. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.20.563303. [PMID: 37961264 PMCID: PMC10634698 DOI: 10.1101/2023.10.20.563303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Membrane proteins play crucial roles in various cellular processes, and their interactions with other proteins in and on the membrane are essential for their proper functioning. Although structures are being determined for an increasing number of membrane proteins, the available structural data remain sparse. Because experimental structure determination is challenging, computational docking methods are needed to gain insight into the mechanisms of membrane protein complexes. Here, we introduce Mem-LZerD, a rigid-body membrane docking algorithm designed to take advantage of modern membrane modeling and protein docking techniques to facilitate the docking of membrane protein complexes. Mem-LZerD is based on the LZerD protein docking algorithm, which has consistently ranked among the top servers in many rounds of the CAPRI protein docking assessment. By combining geometric hashing, newly constrained by the predicted membrane height and tilt angle, with model scoring that accounts for the energy of membrane insertion, we demonstrate the capability of Mem-LZerD to model diverse membrane protein-protein complexes. Mem-LZerD successfully performed unbound docking on 13 of 21 (61.9%) transmembrane complexes in an established benchmark, more than previous approaches have achieved. It was additionally tested on new datasets of 44 transmembrane complexes and 92 peripheral membrane protein complexes, successfully modeling 35 (79.5%) and 15 (16.3%) complexes, respectively. When non-blind orientations of peripheral targets were included, the number of successes increased to 54 (58.7%). We further demonstrate that Mem-LZerD produces complex models that are suitable for molecular dynamics simulation. Mem-LZerD is available at https://lzerd.kiharalab.org.
Affiliation(s)
- Charles Christoffer
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Kannan Harini
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Gupta Archit
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Department of Genetic Engineering, SRM Institute of Science and Technology, Kattankulathur 603203, India
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Purdue University Center for Cancer Research, Purdue University, West Lafayette, IN, 47907, USA
23
Kurgan L, Hu G, Wang K, Ghadermarzi S, Zhao B, Malhis N, Erdős G, Gsponer J, Uversky VN, Dosztányi Z. Tutorial: a guide for the selection of fast and accurate computational tools for the prediction of intrinsic disorder in proteins. Nat Protoc 2023; 18:3157-3172. [PMID: 37740110 DOI: 10.1038/s41596-023-00876-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/13/2022] [Accepted: 06/21/2023] [Indexed: 09/24/2023]
Abstract
Intrinsic disorder is instrumental for a wide range of protein functions, and its analysis, using computational predictions from primary structures, complements secondary and tertiary structure-based approaches. In this Tutorial, we provide an overview and comparison of 23 publicly available computational tools with complementary parameters useful for intrinsic disorder prediction, partly relying on results from the Critical Assessment of protein Intrinsic Disorder prediction experiment. We consider factors such as accuracy, runtime, availability and the need for functional insights. The selected tools are available as web servers and downloadable programs, offer state-of-the-art predictions and can be used in a high-throughput manner. We provide examples and instructions for the selected tools to illustrate practical aspects related to the submission, collection and interpretation of predictions, as well as their timing and limitations. We highlight two predictors for intrinsically disordered proteins, flDPnn as accurate and fast and IUPred as very fast and moderately accurate, while suggesting ANCHOR2 and MoRFchibi as two of the best-performing predictors for intrinsically disordered region binding. We link these tools to additional resources, including databases of predictions and web servers that integrate multiple predictive methods. Altogether, this Tutorial provides a hands-on guide to comparatively evaluating multiple predictors, submitting sequences and collecting predictions, and reading and interpreting results. It is suitable for experimentalists and computational biologists interested in accurately and conveniently identifying intrinsic disorder, facilitating the functional characterization of the rapidly growing collections of protein sequences.
Affiliation(s)
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
- Gang Hu
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Kui Wang
- School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Sina Ghadermarzi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Nawar Malhis
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Gábor Erdős
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary
- Jörg Gsponer
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada.
- Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, USA.
- Byrd Alzheimer's Center and Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA.
- Zsuzsanna Dosztányi
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary.
24
El Khoury G, Azzam W, Rebehmed J. PyProtif: a PyMol plugin to retrieve and visualize protein motifs for structural studies. Amino Acids 2023; 55:1429-1436. [PMID: 37698713 DOI: 10.1007/s00726-023-03323-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 08/24/2023] [Indexed: 09/13/2023]
Abstract
Proteins often possess several motifs, and proteins with similar motifs have been found to have similar biochemical properties and thus related biological functions. Accordingly, multiple databases have been developed to store information on such motifs. For instance, PDBsum stores the structural motifs generated by Promotif, and Pfam stores precomputed patterns of functional domains. This stored information is already extremely useful, and its value can be further increased by integrating these motifs into visualization software. In this work, we have developed PyProtif, a plugin for the PyMOL molecular visualization program, which automatically retrieves protein structural and functional motifs from different databases and integrates them into PyMOL for visualization and analysis. Through an expandable menu and a user-friendly interface, the plugin allows users to study multiple proteins simultaneously and to select and manipulate each motif separately. Thus, this plugin will be of great interest for structural, evolutionary and classification studies of proteins.
Affiliation(s)
- Gilbert El Khoury
- Department of Computer Science and Mathematics, Lebanese American University, Beirut, Lebanon
- Wael Azzam
- Department of Computer Science and Mathematics, Lebanese American University, Beirut, Lebanon
- Joseph Rebehmed
- Department of Computer Science and Mathematics, Lebanese American University, Beirut, Lebanon.
25
Gollapalli P, Rudrappa S, Kumar V, Santosh Kumar HS. Domain Architecture Based Methods for Comparative Functional Genomics Toward Therapeutic Drug Target Discovery. J Mol Evol 2023; 91:598-615. [PMID: 37626222 DOI: 10.1007/s00239-023-10129-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Accepted: 08/06/2023] [Indexed: 08/27/2023]
Abstract
Novel functions arise during evolution when genes duplicate, mutate, recombine, fuse or undergo fission to produce new genes, or when genes form de novo. Researchers have tried to quantify the causes of these molecular diversification processes to understand how these genes increase molecular complexity over time, for instance through protein domain organization. In contrast to global sequence similarity, protein domain architectures can capture key structural and functional characteristics, making them better proxies for describing functional equivalence. In prokaryotes and eukaryotes, domain architectures have been shown to be retained over large evolutionary distances. Protein domain architectures are now being used to categorize and distinguish evolutionarily related proteins and to find homologs among evolutionarily distant species. Additionally, structural information stored in domain structures has accelerated homology identification and sequence search methods. Tools for functional protein annotation have been developed to determine domain content, domain order, domain recurrence and domain position, all of which contribute to accurate prediction of protein function. In this review, we summarize facts and speculations regarding the use of protein domain architecture and modularity to identify possible therapeutic targets among cellular activities, based on an understanding of their linked biological processes.
Affiliation(s)
- Pavan Gollapalli
- Center for Bioinformatics and Biostatistics, Nitte (Deemed to be University), Mangalore, Karnataka, 575018, India
- Sushmitha Rudrappa
- Department of Biotechnology and Bioinformatics, Jnana Sahyadri Campus, Kuvempu University, Shankaraghatta, Shivamogga, Karnataka, 577451, India
- Vadlapudi Kumar
- Department of Biochemistry, Davangere University, Shivagangothri, Davangere, Karnataka, 577007, India
- Hulikal Shivashankara Santosh Kumar
- Department of Biotechnology and Bioinformatics, Jnana Sahyadri Campus, Kuvempu University, Shankaraghatta, Shivamogga, Karnataka, 577451, India.
26
Varadi M, Velankar S. The impact of AlphaFold Protein Structure Database on the fields of life sciences. Proteomics 2023; 23:e2200128. [PMID: 36382391 DOI: 10.1002/pmic.202200128] [Citation(s) in RCA: 48] [Impact Index Per Article: 24.0] [Received: 07/29/2022] [Revised: 11/07/2022] [Accepted: 11/08/2022] [Indexed: 09/06/2023]
Abstract
Arguably, 2020 was the year of high-accuracy protein structure predictions, with AlphaFold 2.0 achieving previously unseen accuracy in the Critical Assessment of Protein Structure Prediction (CASP). In 2021, DeepMind and EMBL-EBI developed the AlphaFold Protein Structure Database to make an unprecedented number of reliable protein structure predictions easily accessible to the broad scientific community. We provide a brief overview and describe the latest developments in the AlphaFold database. We highlight how the fields of data services, bioinformatics, structural biology, and drug discovery are directly affected by the influx of protein structure data. We also show examples of cutting-edge research that took advantage of the AlphaFold database. It is apparent that connections between various fields through protein structures are now possible, but the amount of data poses new challenges. Finally, we give an outlook regarding the future direction of the database, both in terms of data sets and new functionalities.
Affiliation(s)
- Mihaly Varadi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Sameer Velankar
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
27
Bittrich S, Bhikadiya C, Bi C, Chao H, Duarte JM, Dutta S, Fayazi M, Henry J, Khokhriakov I, Lowe R, Piehl DW, Segura J, Vallat B, Voigt M, Westbrook JD, Burley SK, Rose Y. RCSB Protein Data Bank: Efficient Searching and Simultaneous Access to One Million Computed Structure Models Alongside the PDB Structures Enabled by Architectural Advances. J Mol Biol 2023; 435:167994. [PMID: 36738985 PMCID: PMC11514064 DOI: 10.1016/j.jmb.2023.167994] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 11/18/2022] [Revised: 01/27/2023] [Accepted: 01/28/2023] [Indexed: 02/05/2023]
Abstract
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) provides open access to experimentally-determined three-dimensional (3D) structures of biomolecules. The RCSB PDB research-focused web portal, RCSB.org, is used annually by many millions of users around the world. They access biostructure information, run complex queries utilizing various search services (e.g., full-text, structural and chemical attribute, chemical, sequence, and structure similarity searches), and visualize macromolecules in 3D, all at no charge and with no limitations on data usage. Notwithstanding more than 24,000-fold growth of the PDB over the past five decades, experimentally-determined structures are only available for a small subset of the millions of proteins of known sequence. Recently developed machine learning software tools can predict 3D structures of proteins at accuracies comparable to lower-resolution experimental methods. The RCSB PDB now provides access to ∼1,000,000 Computed Structure Models (CSMs) of proteins coming from AlphaFold DB and the ModelArchive alongside ∼200,000 experimentally-determined PDB structures. Both CSMs and PDB structures are available on RCSB.org and via well-established RCSB PDB Data, Search, and 1D-Coordinates application programming interfaces (APIs). Simultaneous delivery of PDB data and CSMs provides users with access to complementary structural information across the human proteome and those of model organisms and selected pathogens. API enhancements are backwards-compatible, and programmatic users can "opt in" to access CSMs with minimal effort. Herein, we describe modifications to RCSB PDB cyberinfrastructure required to support sixfold scaling of 3D biostructure data delivery and lay the groundwork for scaling to accommodate hundreds of millions of CSMs.
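The abstract notes that programmatic users can "opt in" to Computed Structure Models. A minimal sketch of what that might look like for the RCSB Search API follows. It assumes the v2 search endpoint and a `results_content_type` request option as we understand them from the RCSB Search API documentation; both should be checked against the current docs, and the helper `build_full_text_query` is hypothetical. The code only builds the JSON payload and makes no network call.

```python
import json

# Assumed RCSB Search API v2 endpoint (check the current RCSB documentation).
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def build_full_text_query(text: str, include_csm: bool = True) -> dict:
    """Build a full-text search payload (hypothetical helper).

    When include_csm is True, the request also asks for Computed Structure
    Models in addition to experimentally determined PDB entries.
    """
    payload = {
        "query": {
            "type": "terminal",
            "service": "full_text",
            "parameters": {"value": text},
        },
        "return_type": "entry",
    }
    if include_csm:
        # Opt in to CSMs; omitting this option is assumed to return
        # experimental structures only.
        payload["request_options"] = {
            "results_content_type": ["experimental", "computational"]
        }
    return payload


if __name__ == "__main__":
    # The JSON printed here would be sent as the body of a POST to SEARCH_URL.
    print(json.dumps(build_full_text_query("hemoglobin"), indent=2))
```

Keeping the opt-in behind a flag mirrors the backwards-compatible design described in the abstract: a query that omits the option is unchanged.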
Affiliation(s)
- Sebastian Bittrich
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA.
- Charmi Bhikadiya
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
- Chunxiao Bi
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
- Henry Chao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Jose M Duarte
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
- Shuchismita Dutta
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA
- Maryam Fayazi
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Jeremy Henry
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
- Igor Khokhriakov
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
- Robert Lowe
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Dennis W Piehl
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Joan Segura
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
- Brinda Vallat
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA
- Maria Voigt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- John D Westbrook
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA
- Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA; Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Yana Rose
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA
28
Ma Z, Sun Z, Lv X, Chen H, Geng Y, Geng Z. Sensitivity-enhanced nanoplasmonic biosensor using direct immobilization of two engineered nanobodies for SARS-CoV-2 spike receptor-binding domain detection. Sens Actuators B Chem 2023; 383:133575. [PMID: 36873859 PMCID: PMC9957344 DOI: 10.1016/j.snb.2023.133575] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/06/2022] [Revised: 02/19/2023] [Accepted: 02/23/2023] [Indexed: 06/18/2023]
Abstract
Sensitive, rapid, and easy-to-implement biosensors are critical for responding to highly contagious and fast-spreading severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants, enabling early infection screening for appropriate isolation and treatment measures to prevent the spread of the virus. Based on the sensing principle of localized surface plasmon resonance (LSPR) and nanobody immunological techniques, a sensitivity-enhanced nanoplasmonic biosensor was developed to quantify the SARS-CoV-2 spike receptor-binding domain (RBD) in serum within 30 min. Through direct immobilization of two engineered nanobodies, the lower limit of the linear range reaches 0.01 ng/mL. Both the sensor fabrication process and the immune strategy are facile and inexpensive, with potential for large-scale application. The designed nanoplasmonic biosensor achieved excellent specificity and sensitivity for the SARS-CoV-2 spike RBD, providing a potential option for accurate early screening of coronavirus disease 2019 (COVID-19).
Affiliation(s)
- Zhengtai Ma
- State Key Laboratory for Integrated Optoelectronics, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, China
- College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Sciences, Beijing, China
- Zengchao Sun
- The Chinese Academy of Sciences Key Laboratory of Receptor Research, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Xiaoqing Lv
- State Key Laboratory for Integrated Optoelectronics, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, China
- Hongda Chen
- State Key Laboratory for Integrated Optoelectronics, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, China
- Yong Geng
- The Chinese Academy of Sciences Key Laboratory of Receptor Research, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Zhaoxin Geng
- School of Information Engineering, Minzu University of China, Beijing, China
29
VanderWal AR, Park JU, Polevoda B, Nicosia JK, Vargas AMM, Kellogg EH, O’Connell MR. Csx28 is a membrane pore that enhances CRISPR-Cas13b-dependent antiphage defense. Science 2023; 380:410-415. [PMID: 37104586 PMCID: PMC10228660 DOI: 10.1126/science.abm1184] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/20/2021] [Accepted: 03/27/2023] [Indexed: 04/29/2023]
Abstract
Type VI CRISPR-Cas systems use the RNA-guided ribonuclease (RNase) Cas13 to defend bacteria against viruses, and some of these systems encode putative membrane proteins that have unclear roles in Cas13-mediated defense. We show that Csx28, of type VI-B2 systems, is a transmembrane protein that helps slow cellular metabolism upon viral infection, increasing antiviral defense. High-resolution cryo-electron microscopy reveals that Csx28 forms an octameric pore-like structure. These Csx28 pores localize to the inner membrane in vivo. Csx28's antiviral activity in vivo requires sequence-specific cleavage of viral messenger RNAs by Cas13b, which subsequently results in membrane depolarization, slowed metabolism, and inhibition of sustained viral infection. Our work suggests a mechanism by which Csx28 acts as a downstream, Cas13b-dependent effector protein that uses membrane perturbation as an antiviral defense strategy.
Affiliation(s)
- Arica R. VanderWal
- Department of Biochemistry and Biophysics, School of Medicine and Dentistry, University of Rochester; Rochester, USA
- Center for RNA Biology, University of Rochester; Rochester, USA
- Jung-Un Park
- Department of Molecular Biology and Genetics, Cornell University; Ithaca, USA
- Bogdan Polevoda
- Department of Biochemistry and Biophysics, School of Medicine and Dentistry, University of Rochester; Rochester, USA
- Center for RNA Biology, University of Rochester; Rochester, USA
- Julia K. Nicosia
- Department of Biochemistry and Biophysics, School of Medicine and Dentistry, University of Rochester; Rochester, USA
- Center for RNA Biology, University of Rochester; Rochester, USA
- Adrian M. Molina Vargas
- Department of Biochemistry and Biophysics, School of Medicine and Dentistry, University of Rochester; Rochester, USA
- Center for RNA Biology, University of Rochester; Rochester, USA
- Mitchell R. O’Connell
- Department of Biochemistry and Biophysics, School of Medicine and Dentistry, University of Rochester; Rochester, USA
- Center for RNA Biology, University of Rochester; Rochester, USA
30
Bruley A, Bitard-Feildel T, Callebaut I, Duprat E. A sequence-based foldability score combined with AlphaFold2 predictions to disentangle the protein order/disorder continuum. Proteins 2023; 91:466-484. [PMID: 36306150 DOI: 10.1002/prot.26441] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/01/2022] [Revised: 10/14/2022] [Accepted: 10/18/2022] [Indexed: 11/11/2022]
Abstract
Order and disorder govern protein functions, but disorder is highly diverse, ranging from regions that are, and remain, fully disordered to conditional order. This diversity is still difficult to decipher, even though it is encoded in the amino acid sequences. Here, we developed an analytic Python package, named pyHCA, to estimate the foldability of a protein segment from its amino acid sequence alone, based on a measure of its density in regular secondary structures associated with hydrophobic clusters, as defined by the hydrophobic cluster analysis (HCA) approach. The tool was designed by optimizing how well it separates foldable segments from databases of disorder (DisProt) and of order (SCOPe [soluble domains] and OPM [transmembrane domains]). It makes it possible to quantify the ratio between order, embodied by regular secondary structures (either participating in the hydrophobic core of well-folded 3D structures or conditionally formed in intrinsically disordered regions), and disorder. We illustrated the relevance of pyHCA with several examples and applied it to the proteome sequences of 21 species ranging from prokaryotes and archaea to unicellular and multicellular eukaryotes, for which structure models are provided in the AlphaFold protein structure database. Cases of low-confidence scores related to disorder were distinguished from those of sequences that we identified as foldable but that are still excluded from accurate modeling by AlphaFold2 due to a lack of sequence homologs or to compositional biases. Overall, our approach is complementary to AlphaFold2, providing guides to map structural innovations through evolutionary processes at proteome and gene scales.
Affiliation(s)
- Apolline Bruley
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Tristan Bitard-Feildel
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Isabelle Callebaut
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Elodie Duprat
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
31
Bordin N, Dallago C, Heinzinger M, Kim S, Littmann M, Rauer C, Steinegger M, Rost B, Orengo C. Novel machine learning approaches revolutionize protein knowledge. Trends Biochem Sci 2023; 48:345-359. [PMID: 36504138 PMCID: PMC10570143 DOI: 10.1016/j.tibs.2022.11.001] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 07/14/2022] [Revised: 10/24/2022] [Accepted: 11/17/2022] [Indexed: 12/10/2022]
Abstract
Breakthrough methods in machine learning (ML), protein structure prediction, and novel ultrafast structural aligners are revolutionizing structural biology. Obtaining accurate models of proteins and annotating their functions on a large scale is no longer limited by time and resources. The most recent method to be top ranked by the Critical Assessment of Structure Prediction (CASP) assessment, AlphaFold 2 (AF2), is capable of building structural models with an accuracy comparable to that of experimental structures. Annotations of 3D models are keeping pace with the deposition of the structures due to advancements in protein language models (pLMs) and structural aligners that help validate these transferred annotations. In this review we describe how recent developments in ML for protein science are making large-scale structural bioinformatics available to the general scientific community.
Affiliation(s)
- Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, Gower St, WC1E 6BT London, UK
- Christian Dallago
- Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany; VantAI, 151 W 42nd Street, New York, NY 10036, USA
- Michael Heinzinger
- Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748 Garching, Germany
- Stephanie Kim
- School of Biological Sciences, Seoul National University, Seoul, South Korea; Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Maria Littmann
- Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- Clemens Rauer
- Institute of Structural and Molecular Biology, University College London, Gower St, WC1E 6BT London, UK
- Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, South Korea; Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Burkhard Rost
- Technical University of Munich (TUM) Department of Informatics, Bioinformatics and Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany; Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching/Munich, Germany; TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, Gower St, WC1E 6BT London, UK.
32
Aubel M, Eicholt L, Bornberg-Bauer E. Assessing structure and disorder prediction tools for de novo emerged proteins in the age of machine learning. F1000Res 2023; 12:347. [PMID: 37113259 PMCID: PMC10126731 DOI: 10.12688/f1000research.130443.1] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Accepted: 03/17/2023] [Indexed: 03/31/2023]
Abstract
Background: De novo protein-coding genes emerge from scratch in the non-coding regions of the genome and have, by definition, no homology to other genes. Therefore, their encoded de novo proteins belong to the so-called "dark protein space". So far, only four de novo protein structures have been experimentally approximated. Low homology, presumed high disorder and limited structural data result in low-confidence structural predictions for de novo proteins in most cases. Here, we look at the most widely used structure and disorder predictors and assess their applicability to de novo emerged proteins. Since AlphaFold2 is based on the generation of multiple sequence alignments and was trained on solved structures of largely conserved and globular proteins, its performance on de novo proteins remains unknown. More recently, natural language models of proteins have been used for alignment-free structure predictions, potentially making them more suitable for de novo proteins than AlphaFold2. Methods: We applied different disorder predictors (IUPred3 short/long, flDPnn) and structure predictors, AlphaFold2 on the one hand and language-based models (Omegafold, ESMfold, RGN2) on the other hand, to four de novo proteins with experimental evidence on structure. We compared the resulting predictions between the different predictors as well as to the existing experimental evidence. Results: Results from IUPred, the most widely used disorder predictor, depend heavily on the choice of parameters and differ significantly from those of flDPnn, which a recent comparative assessment study found to outperform most other predictors. Similarly, different structure predictors yielded varying results and confidence scores for de novo proteins.
Conclusions: We suggest that, while in some cases protein language model based approaches might be more accurate than AlphaFold2, the structure prediction of de novo emerged proteins remains a difficult task for any predictor, be it disorder or structure.
Affiliation(s)
- Margaux Aubel
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
- Lars Eicholt
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
- Department Protein Evolution, Max Planck-Institute for Biology, Tuebingen, 72076, Germany
33
Shin WH, Kumazawa K, Imai K, Hirokawa T, Kihara D. Quantitative comparison of protein-protein interaction interface using physicochemical feature-based descriptors of surface patches. Front Mol Biosci 2023; 10:1110567. [PMID: 36814641 PMCID: PMC9939524 DOI: 10.3389/fmolb.2023.1110567] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/29/2022] [Accepted: 01/24/2023] [Indexed: 02/09/2023]
Abstract
Physical interactions between proteins drive many biological functions in the cell. Because protein-protein interactions (PPIs) are also important in disease development, they have been highlighted in recent years in the pharmaceutical industry as possible therapeutic targets. To understand the variety of PPIs in a proteome, it is essential to establish a method that can identify similarity and dissimilarity between PPIs, so that the binding of similar molecules, including drugs and other proteins, can be inferred. In this study, we developed a novel method, PPI-Surfer, which compares and quantifies the similarity of local surface regions of PPIs. PPI-Surfer represents a PPI surface with overlapping surface patches, each of which is described with a three-dimensional Zernike descriptor (3DZD), a compact mathematical representation of a 3D function. The 3DZD captures both the 3D shape and the physicochemical properties of the protein surface. PPI-Surfer was benchmarked on PPI datasets, where we showed that it finds similar potential drug-binding regions that do not share sequence or structure similarity. PPI-Surfer is available at https://kiharalab.org/ppi-surfer.
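As a rough illustration of comparing surfaces described by patch descriptors, the sketch below treats each surface as a list of fixed-length descriptor vectors and scores a pair of surfaces by the mean Euclidean distance from each patch to its closest patch on the other surface, averaged over both directions. This scoring rule is an assumption for illustration, not PPI-Surfer's published scheme; the three-component vectors stand in for real 3DZDs, which are much longer, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Descriptor = Sequence[float]


def euclidean(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two descriptor vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def directed_patch_distance(query: List[Descriptor], target: List[Descriptor]) -> float:
    """Mean distance from each query patch to its nearest target patch."""
    return sum(min(euclidean(p, q) for q in target) for p in query) / len(query)


def surface_distance(a: List[Descriptor], b: List[Descriptor]) -> float:
    """Symmetric surface distance; 0.0 means every patch has an exact match."""
    return 0.5 * (directed_patch_distance(a, b) + directed_patch_distance(b, a))


# Toy three-component "descriptors" standing in for real 3DZD vectors.
surface_a = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
surface_b = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 3.0, 4.0]]
surface_c = [[3.0, 4.0, 0.0]]

print(surface_distance(surface_a, surface_b))  # ~0.707: b contains all of a plus one extra patch
print(surface_distance(surface_a, surface_c))  # larger: no close patch matches
```

Averaging nearest-patch distances in both directions keeps the score symmetric, so a small surface that is fully contained in a larger one is still penalized for the unmatched extra patches.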
Affiliation(s)
- Woong-Hee Shin
- Department of Chemistry Education, Sunchon National University, Suncheon, South Korea
- Department of Advanced Components and Materials Engineering, Sunchon National University, Suncheon, South Korea
- Keiko Kumazawa
- Pharmaceutical Discovery Research Laboratories, Teijin Pharma Limited, Tokyo, Japan
- Kenichiro Imai
- Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Takatsugu Hirokawa
- Division of Biomedical Science, Faculty of Medicine, University of Tsukuba, Tsukuba, Japan
- Transborder Medical Research Center, University of Tsukuba, Tsukuba, Japan
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Center for Cancer Research, Purdue University, West Lafayette, IN, United States
- *Correspondence: Daisuke Kihara
34
Ponlachantra K, Suginta W, Robinson RC, Kitaoku Y. AlphaFold2: A versatile tool to predict the appearance of functional adaptations in evolution: Profilin interactions in uncultured Asgard archaea. Bioessays 2023; 45:e2200119. [PMID: 36461738 DOI: 10.1002/bies.202200119] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/17/2022] [Revised: 11/07/2022] [Accepted: 11/09/2022] [Indexed: 12/05/2022]
Abstract
The release of AlphaFold2 (AF2), DeepMind's open-source, deep-learning-based protein structure prediction program, opened a new era of molecular biology. The astonishing improvement in the accuracy of structure predictions provides an opportunity to characterize protein systems from uncultured Asgard archaea, key organisms in evolutionary biology. Despite the accumulation of metagenomics-derived eukaryotic-like protein sequences from Asgard archaea, limited structural and biochemical information has restricted insight into their potential functions. In this review, we focus on profilin, an actin-dynamics-regulating protein that, in eukaryotes, modulates actin polymerization through (1) direct actin interaction, (2) polyproline binding, and (3) phospholipid binding. We assess AF2-predicted profilin structures for their potential to participate in these activities. We demonstrate that AF2 is a powerful new tool for understanding the emergence of functional biological traits in evolution.
Affiliation(s)
- Khongpon Ponlachantra
- School of Biomolecular Science and Engineering (BSE), Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong, Thailand
- Wipa Suginta
- School of Biomolecular Science and Engineering (BSE), Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong, Thailand
- Robert C Robinson
- School of Biomolecular Science and Engineering (BSE), Vidyasirimedhi Institute of Science and Technology (VISTEC), Rayong, Thailand
- Research Institute for Interdisciplinary Science (RIIS), Okayama University, Okayama, Japan
- Yoshihito Kitaoku
- Research Institute for Interdisciplinary Science (RIIS), Okayama University, Okayama, Japan
35
Holm L, Laiho A, Törönen P, Salgado M. DALI shines a light on remote homologs: One hundred discoveries. Protein Sci 2023; 32:e4519. [PMID: 36419248 PMCID: PMC9793968 DOI: 10.1002/pro.4519] [Citation(s) in RCA: 305] [Impact Index Per Article: 152.5] [Received: 07/18/2022] [Revised: 11/15/2022] [Accepted: 11/20/2022] [Indexed: 11/25/2022]
Abstract
Structural comparison reveals remote homology that sequence comparison often fails to detect. The DALI web server (http://ekhidna2.biocenter.helsinki.fi/dali) is a platform for structural analysis that provides database searches and interactive visualization, including structural alignments annotated with secondary structure, protein families and sequence logos, and 3D structure superimposition supported by color-coded sequence and structure conservation. Here, we use DALI to mine version 1 of the AlphaFold Database, which increased the structural coverage of protein families by 20%. We found 100 remote homologous relationships not previously reported in the current reference database for protein domains, Pfam 35.0. In particular, we linked 35 domains of unknown function (DUFs) to previously characterized families, generating functional hypotheses that can be explored in downstream structural biology studies. Other findings include gene fusions, tandem duplications, and adjustments to domain boundaries. The evidence for homology can be browsed interactively through live examples on DALI's website.
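DALI compares proteins through their intramolecular Cα-Cα distance matrices rather than their sequences. The sketch below scores one fixed residue correspondence between two structures with an elastic similarity term of the kind used in the original DALI method: each residue pair contributes θ minus the relative deviation of its two distances, down-weighted by an envelope exp(-(d*/α)²) so that long-range pairs count less. The values θ = 0.2 and α = 20 Å are stated here as assumptions, the function names are hypothetical, and the sketch omits the alignment search and Z-score normalization that the DALI server performs.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_matrix(coords: Sequence[Coord]) -> List[List[float]]:
    """Intramolecular distance matrix of C-alpha coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def elastic_score(coords_a: Sequence[Coord], coords_b: Sequence[Coord],
                  theta: float = 0.2, alpha: float = 20.0) -> float:
    """Elastic similarity of two equally long, already-aligned C-alpha traces."""
    if len(coords_a) != len(coords_b):
        raise ValueError("aligned traces must have the same length")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    score = 0.0
    for i in range(len(da)):
        for j in range(len(da)):
            if i == j:
                score += theta  # diagonal terms contribute the constant theta
                continue
            d_star = (da[i][j] + db[i][j]) / 2  # mean of the two distances
            deviation = abs(da[i][j] - db[i][j]) / d_star if d_star else 0.0
            envelope = math.exp(-((d_star / alpha) ** 2))  # damp long-range pairs
            score += (theta - deviation) * envelope
    return score


# Toy four-residue trace (3.8 Å spacing), a rigidly shifted copy and a 2x stretched copy.
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
shifted = [(x + 10.0, y, z) for x, y, z in trace]
stretched = [(2 * x, 2 * y, 2 * z) for x, y, z in trace]

print(elastic_score(trace, trace))
print(elastic_score(trace, shifted))    # same as above up to rounding: distances are unchanged
print(elastic_score(trace, stretched))  # lower: every distance deviates by the same relative amount
```

Because only internal distances enter the score, a rigid translation (or rotation) leaves it unchanged, which is what lets distance-matrix comparison find similar folds without first superimposing the structures.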
Affiliation(s)
- Liisa Holm
- Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences & Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
- Aleksi Laiho
- Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences & Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
- Petri Törönen
- Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences & Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
- Marco Salgado
- Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences & Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
36
Terashi G, Wang X, Kihara D. Protein model refinement for cryo-EM maps using AlphaFold2 and the DAQ score. Acta Crystallogr D Struct Biol 2023; 79:10-21. [PMID: 36601803 PMCID: PMC9815095 DOI: 10.1107/s2059798322011676] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/24/2022] [Accepted: 12/06/2022] [Indexed: 12/24/2022]
Abstract
As more protein structure models are determined from cryogenic electron microscopy (cryo-EM) density maps, establishing how to evaluate model accuracy and how to correct models that contain errors is becoming crucial to ensure the quality of structural models deposited in the public database, the PDB. Here, a new protocol is presented for evaluating a protein model built from a cryo-EM map and applying local structure refinement where the model has potential errors. First, the model is evaluated using DAQ, a recently developed deep-learning-based score that assesses the local agreement between the model and the map. The subsequent local refinement is performed by a modified AlphaFold2 procedure, in which a trimmed template model and a trimmed multiple sequence alignment are provided as input to control which structural regions are refined while leaving other, more confident regions of the model intact. A benchmark study showed that this protocol, DAQ-refine, consistently improves low-quality regions of the initial models. Among 18 refined models generated for an initial structure, DAQ shows a high correlation with model quality and can identify the most accurate model in most of the tested cases. The improvements obtained by DAQ-refine were on average larger than those of other existing methods.
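The idea of refining only low-confidence regions while keeping confident regions fixed can be sketched as a segment-selection step. In the hypothetical helper below, residues whose per-residue score falls below a cutoff are collected, each is widened by a few flanking residues, and overlapping or adjacent ranges are merged. The assumption that lower scores flag likely errors, the cutoff of 0.0, the padding width and the scores themselves are illustrative choices, not parameters taken from DAQ-refine.

```python
from typing import List, Sequence, Tuple


def low_score_segments(scores: Sequence[float], cutoff: float,
                       pad: int = 0) -> List[Tuple[int, int]]:
    """Return merged, inclusive 0-based (start, end) ranges of residues to refine.

    A residue is flagged when its score is below `cutoff`; each flagged residue
    is widened by `pad` residues on both sides, clipped to the chain ends.
    """
    last = len(scores) - 1
    segments: List[Tuple[int, int]] = []
    for i, score in enumerate(scores):
        if score >= cutoff:
            continue
        start, end = max(0, i - pad), min(last, i + pad)
        # Starts never decrease, so only the most recent segment can overlap.
        if segments and start <= segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            segments.append((start, end))
    return segments


# Invented per-residue scores for a ten-residue model.
scores = [0.3, 0.2, -0.4, -0.1, 0.5, 0.6, 0.4, -0.2, 0.3, 0.5]

print(low_score_segments(scores, cutoff=0.0))         # [(2, 3), (7, 7)]
print(low_score_segments(scores, cutoff=0.0, pad=1))  # [(1, 4), (6, 8)]
```

Under these assumptions, residues outside the returned ranges are the ones that would stay fixed in a trimmed template.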
Affiliation(s)
- Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
- Xiao Wang
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
37
Nussinov R, Zhang M, Liu Y, Jang H. AlphaFold, Artificial Intelligence (AI), and Allostery. J Phys Chem B 2022; 126:6372-6383. [PMID: 35976160 PMCID: PMC9442638 DOI: 10.1021/acs.jpcb.2c04346] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Received: 06/22/2022] [Revised: 08/03/2022] [Indexed: 02/08/2023]
Abstract
AlphaFold has burst into our lives. It is a powerful algorithm that underscores the strength of biological sequence data and artificial intelligence (AI). AlphaFold has appended projects and research directions. The database it has been creating promises an untold number of applications with vast potential impacts that are still difficult to surmise. AI approaches can revolutionize personalized treatments and usher in better-informed clinical trials. They promise to make giant leaps toward reshaping and revamping drug discovery strategies, selecting and prioritizing combinations of drug targets. Here, we briefly review AI in structural biology, including molecular dynamics simulations and the prediction of microbiota-human protein-protein interactions. We highlight the advances achieved by the deep-learning-powered AlphaFold in protein structure prediction and their powerful impact on the life sciences. At the same time, AlphaFold does not resolve the decades-long protein folding challenge, nor does it identify folding pathways. The models that AlphaFold provides do not capture conformational mechanisms such as frustration and allostery, which are rooted in ensembles and controlled by their dynamic distributions. Allostery and signaling are properties of populations. AlphaFold also does not generate ensembles of intrinsically disordered proteins and regions; instead, it describes them by their low structural probabilities. Since AlphaFold generates single ranked structures rather than conformational ensembles, it cannot elucidate the mechanisms of allosteric activating driver hotspot mutations or of allosteric drug resistance. However, by capturing key features, deep learning techniques can use the single predicted conformation as the basis for generating a diverse ensemble.
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Mingzhen Zhang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
- Yonglan Liu
- Cancer Innovation Laboratory, National Cancer Institute, Frederick, Maryland 21702, United States
- Hyunbum Jang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21702, United States
38
Alnabati E, Esquivel-Rodriguez J, Terashi G, Kihara D. MarkovFit: Structure Fitting for Protein Complexes in Electron Microscopy Maps Using Markov Random Field. Front Mol Biosci 2022; 9:935411. [PMID: 35959463 PMCID: PMC9358042 DOI: 10.3389/fmolb.2022.935411] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/03/2022] [Accepted: 06/13/2022] [Indexed: 11/13/2022]
Abstract
An increasing number of protein complex structures are being determined by cryo-electron microscopy (cryo-EM). When the structures of the individual proteins are available, an important modeling task is to fit them into the density map. Here, we designed a method that fits atomic protein structures into cryo-EM maps of medium to low resolution using Markov random fields, which allow probabilistic evaluation of fitted models. Our method, MarkovFit, performed better than existing methods on datasets of 31 simulated cryo-EM maps of 10 Å resolution, nine experimentally determined cryo-EM maps of resolution less than 4 Å, and 28 experimentally determined cryo-EM maps of 6 to 20 Å resolution.
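In a Markov random field formulation of multi-subunit fitting, each subunit can be treated as a node whose label is one of its candidate placements in the map, and a configuration is scored by summing per-node (unary) and per-pair (pairwise) energy terms. The sketch below assumes that lower energy is better, that unary terms reflect fit to the density and that pairwise terms penalize incompatible placements such as clashes; all numbers and function names are invented, and the exhaustive search is feasible only for toy cases. MarkovFit's actual potentials and inference procedure are not reproduced here.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

Unary = List[List[float]]  # unary[node][label]
Pairwise = Dict[Tuple[int, int], List[List[float]]]  # (a, b) -> table[label_a][label_b]


def total_energy(labels: Sequence[int], unary: Unary, pairwise: Pairwise) -> float:
    """Energy of one labeling: sum of unary terms plus pairwise terms."""
    energy = sum(unary[node][label] for node, label in enumerate(labels))
    for (a, b), table in pairwise.items():
        energy += table[labels[a]][labels[b]]
    return energy


def best_labeling(unary: Unary, pairwise: Pairwise) -> Tuple[int, ...]:
    """Exhaustively find the minimum-energy labeling (toy-sized problems only)."""
    choices = [range(len(node_costs)) for node_costs in unary]
    return min(itertools.product(*choices),
               key=lambda labels: total_energy(labels, unary, pairwise))


# Two subunits with two candidate placements each (lower = better fit to the map).
unary = [[-0.9, -0.5], [-0.8, -0.6]]
# Placement 0 of subunit 0 clashes with placement 0 of subunit 1.
clash = {(0, 1): [[5.0, 0.0], [0.0, 0.0]]}

print(best_labeling(unary, {}))     # (0, 0): best individual fits
print(best_labeling(unary, clash))  # (0, 1): the clash penalty changes the choice
```

The pairwise term is what makes the fit a joint problem: the best placement for each subunit on its own can be incompatible with its neighbors, and the field weighs that against the density fit.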
Affiliation(s)
- Eman Alnabati
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
39
Kagaya Y, Flannery ST, Jain A, Kihara D. ContactPFP: Protein Function Prediction Using Predicted Contact Information. Front Bioinform 2022; 2. [PMID: 35875419 PMCID: PMC9302406 DOI: 10.3389/fbinf.2022.896295] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/13/2022]
Abstract
Computational function prediction is one of the most important problems in bioinformatics, as elucidating the function of genes is a central task in molecular biology and genomics. Most existing function prediction methods use protein sequences as their primary input because the sequence is the most readily available information for query proteins. There have been attempts to consider other attributes of query proteins. Among these, the three-dimensional (3D) structure of proteins is known to be very useful for identifying evolutionary relationships between proteins, from which functional similarity can be inferred. Here, we report a novel protein function prediction method, ContactPFP, which uses predicted residue-residue contact maps as structural features of query proteins. Although 3D structure information is known to be useful, it has not been routinely used in function prediction because the 3D structure has not been experimentally determined for many proteins. ContactPFP overcomes this limitation by using residue-residue contact prediction, which has become increasingly accurate owing to rapid developments in the protein structure prediction field. ContactPFP takes a query protein sequence as input and uses predicted residue-residue contacts as a proxy for the 3D protein structure. To characterize how predicted contacts contribute to function prediction accuracy, we compared the performance of ContactPFP with several well-established sequence-based function prediction methods. The comparative study revealed the advantages and weaknesses of ContactPFP relative to contemporary sequence-based methods; in many cases, ContactPFP showed higher prediction accuracy. We examined factors that affected the accuracy of ContactPFP using several illustrative cases that highlight the strength of our method.
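ContactPFP uses predicted residue-residue contacts as a stand-in for 3D structure. As a simplified illustration of working with such predictions, the sketch below converts a matrix of contact probabilities into a set of contacts, using an assumed probability cutoff of 0.5 and an assumed minimum sequence separation of 6, and compares two equally long proteins by the Jaccard overlap of their contact sets. ContactPFP's actual comparison of contact maps is not specified in this abstract; the helper names and numbers are hypothetical, and the overlap assumes the residues of the two proteins are already in correspondence.

```python
from typing import Dict, FrozenSet, List, Tuple

Contact = Tuple[int, int]


def matrix_from(n: int, entries: Dict[Contact, float]) -> List[List[float]]:
    """Build a symmetric n x n probability matrix from sparse entries."""
    matrix = [[0.0] * n for _ in range(n)]
    for (i, j), p in entries.items():
        matrix[i][j] = matrix[j][i] = p
    return matrix


def contacts(prob: List[List[float]], cutoff: float = 0.5,
             min_sep: int = 6) -> FrozenSet[Contact]:
    """Residue pairs (i < j) at least min_sep apart with probability >= cutoff."""
    n = len(prob)
    return frozenset((i, j) for i in range(n) for j in range(i + min_sep, n)
                     if prob[i][j] >= cutoff)


def contact_overlap(a: FrozenSet[Contact], b: FrozenSet[Contact]) -> float:
    """Jaccard overlap of two contact sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Invented predictions for two eight-residue proteins.
query = contacts(matrix_from(8, {(0, 6): 0.9, (0, 7): 0.8, (1, 7): 0.4}))
template = contacts(matrix_from(8, {(0, 6): 0.7, (1, 7): 0.6}))

print(sorted(query))                     # [(0, 6), (0, 7)]
print(sorted(template))                  # [(0, 6), (1, 7)]
print(contact_overlap(query, template))  # 0.333...: one shared contact out of three
```

The minimum sequence separation drops trivially close pairs along the chain, so the comparison focuses on longer-range contacts that carry information about the fold.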
Affiliation(s)
- Yuki Kagaya
- Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
- Sean T. Flannery
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Aashish Jain
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- *Correspondence: Daisuke Kihara