1
Liu J, Neupane P, Cheng J. Improving AlphaFold2- and AlphaFold3-Based Protein Complex Structure Prediction With MULTICOM4 in CASP16. Proteins 2025. [PMID: 40452318 DOI: 10.1002/prot.26850] [Received: 03/09/2025] [Revised: 05/06/2025] [Accepted: 05/23/2025] [Indexed: 06/11/2025]
Abstract
With AlphaFold achieving high-accuracy tertiary structure prediction for most single-chain proteins (monomers), the next major challenge in protein structure prediction is to accurately model multichain protein complexes (multimers). We developed MULTICOM4, the latest version of the MULTICOM system, to improve protein complex structure prediction by integrating transformer-based AlphaFold2, diffusion model-based AlphaFold3, and our in-house techniques. These include protein complex stoichiometry prediction, diverse multiple sequence alignment (MSA) generation leveraging both sequence and structure comparison, modeling exception handling, and deep learning-based protein model quality assessment. MULTICOM4 was blindly evaluated in the 16th Critical Assessment of Techniques for Protein Structure Prediction (CASP16) in 2024. In Phase 0 of CASP16, where stoichiometry information was unavailable, MULTICOM predictors performed best, with MULTICOM_human achieving a TM-score of 0.752 and a DockQ score of 0.584 for top-ranked predictions on average. In Phase 1 of CASP16, with stoichiometry information provided, MULTICOM_human remained among the top predictors, attaining a TM-score of 0.797 and a DockQ score of 0.558 on average. The CASP16 results demonstrate that integrating complementary AlphaFold2 and AlphaFold3 with enhanced MSA inputs, comprehensive model ranking, exception handling, and accurate stoichiometry prediction can effectively improve protein complex structure prediction.
Affiliation(s)
- Jian Liu
  - Department of Electrical Engineering & Computer Science, NextGen Precision Health, University of Missouri, Columbia, Missouri, USA
- Pawan Neupane
  - Department of Electrical Engineering & Computer Science, NextGen Precision Health, University of Missouri, Columbia, Missouri, USA
- Jianlin Cheng
  - Department of Electrical Engineering & Computer Science, NextGen Precision Health, University of Missouri, Columbia, Missouri, USA
2
Ma X, Zhu K, Wang K, Liao W, Yang X, Yu W, Wang W, Han F. Improvement of Catalytic Activity and Thermostability of Alginate Lyase VxAly7B-CM via Rational Computational Design Strategies. Mar Drugs 2025; 23:198. [PMID: 40422788 DOI: 10.3390/md23050198] [Received: 03/26/2025] [Revised: 04/26/2025] [Accepted: 04/26/2025] [Indexed: 05/28/2025] [Open Access]
Abstract
Alginate lyase degrades alginate through the β-elimination mechanism to produce alginate oligosaccharides (AOS) with notable biochemical properties and diverse biological activities. However, its poor thermostability limits large-scale industrial production. In this study, we employed a rational computational design strategy combining computer-aided evolutionary coupling analysis and ΔΔGfold evaluation to enhance both the thermostability and catalytic activity of the alginate lyase VxAly7B-CM. Among ten single-point mutants, the E188N and S204G mutants exhibited increases in Tm from 47.0 °C to 48.9 °C and 50.2 °C, respectively, with specific activities of 3701.02 U/mg and 2812.01 U/mg at 45 °C. Notably, the combinatorial mutant E188N/S204G demonstrated a ΔTm of 5 °C and an optimal reaction temperature up to 50 °C, where its specific activity reached 3823.80 U/mg, a 31% increase. Moreover, its half-life at 50 °C was 38.4 h, which is 7.0 times that of the wild-type enzyme. Protein structural analysis and molecular dynamics simulations suggested that the enhanced catalytic performance and thermostability of the E188N/S204G mutant may be attributed to optimized surface charge distribution, strengthened hydrophobic interactions, and increased tertiary structure stability. Overall, our findings provided valuable insights into enzyme stabilization strategies and supported the industrial production of functional AOS.
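The thermostability figures quoted in this abstract can be cross-checked with a short calculation. The sketch below uses only numbers stated above (a wild-type Tm of 47.0 °C, a ΔTm of 5 °C, and a mutant half-life of 38.4 h that is 7.0 times the wild-type value). The first-order inactivation model and all function names are assumptions added for illustration; the abstract does not state which kinetic model was used, and the derived values are approximate because the inputs are rounded.

```python
import math

# Values quoted in the abstract.
WT_TM_C = 47.0             # wild-type melting temperature (deg C)
DELTA_TM_C = 5.0           # reported delta-Tm of E188N/S204G (deg C)
MUTANT_HALF_LIFE_H = 38.4  # half-life of E188N/S204G at 50 deg C (h)
FOLD_OVER_WT = 7.0         # mutant half-life divided by wild-type half-life


def implied_wild_type_half_life(mutant_half_life_h, fold):
    """Back out the wild-type half-life implied by the reported fold change."""
    return mutant_half_life_h / fold


def residual_activity(t_h, half_life_h):
    """Fraction of activity remaining after t_h hours.

    ASSUMPTION: first-order (single-exponential) thermal inactivation,
    so the rate constant is k = ln(2) / half-life.
    """
    k = math.log(2) / half_life_h
    return math.exp(-k * t_h)


mutant_tm = WT_TM_C + DELTA_TM_C
wt_half_life = implied_wild_type_half_life(MUTANT_HALF_LIFE_H, FOLD_OVER_WT)

print(f"Implied mutant Tm: {mutant_tm:.1f} deg C")
print(f"Implied wild-type half-life: {wt_half_life:.1f} h")
print(f"Mutant activity left after 10 h: {residual_activity(10, MUTANT_HALF_LIFE_H):.2f}")
print(f"Wild-type activity left after 10 h: {residual_activity(10, wt_half_life):.2f}")
```

Under the stated assumption, the same 10 h incubation leaves most of the mutant's activity intact while the wild-type enzyme has gone through almost two half-lives, which is one way to read the reported 7.0-fold difference.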
Affiliation(s)
- Xin Ma
  - School of Medicine and Pharmacy, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Drugs and Bioproducts, Qingdao Marine Science and Technology Center, Qingdao 266237, China
  - Key Laboratory of Marine Drugs, Ministry of Education, Qingdao 266003, China
  - Shandong Key Laboratory of Glycoscience and Glycotherapeutics, Qingdao 266003, China
- Ke Zhu
  - School of Medicine and Pharmacy, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Drugs and Bioproducts, Qingdao Marine Science and Technology Center, Qingdao 266237, China
  - Key Laboratory of Marine Drugs, Ministry of Education, Qingdao 266003, China
  - Shandong Key Laboratory of Glycoscience and Glycotherapeutics, Qingdao 266003, China
- Kaiyang Wang
  - School of Medicine and Pharmacy, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Drugs and Bioproducts, Qingdao Marine Science and Technology Center, Qingdao 266237, China
  - Key Laboratory of Marine Drugs, Ministry of Education, Qingdao 266003, China
  - Shandong Key Laboratory of Glycoscience and Glycotherapeutics, Qingdao 266003, China
- Wenhui Liao
  - School of Medicine and Pharmacy, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Drugs and Bioproducts, Qingdao Marine Science and Technology Center, Qingdao 266237, China
  - Key Laboratory of Marine Drugs, Ministry of Education, Qingdao 266003, China
  - Shandong Key Laboratory of Glycoscience and Glycotherapeutics, Qingdao 266003, China
- Xiaohan Yang
  - State Key Laboratory of Microbial Diversity and Innovative Utilization, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Wengong Yu
  - School of Medicine and Pharmacy, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Drugs and Bioproducts, Qingdao Marine Science and Technology Center, Qingdao 266237, China
  - Key Laboratory of Marine Drugs, Ministry of Education, Qingdao 266003, China
  - Shandong Key Laboratory of Glycoscience and Glycotherapeutics, Qingdao 266003, China
- Weishan Wang
  - School of Medicine and Pharmacy, Ocean University of China, Qingdao 266003, China
  - State Key Laboratory of Microbial Diversity and Innovative Utilization, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Beijing Key Laboratory of Genetic Element Biosourcing & Intelligent Design for Biomanufacturing, Beijing 100101, China
- Feng Han
  - School of Medicine and Pharmacy, Ocean University of China, Qingdao 266003, China
  - Laboratory for Marine Drugs and Bioproducts, Qingdao Marine Science and Technology Center, Qingdao 266237, China
  - Key Laboratory of Marine Drugs, Ministry of Education, Qingdao 266003, China
  - Shandong Key Laboratory of Glycoscience and Glycotherapeutics, Qingdao 266003, China
3
Kumar SP, Nadendla EK, Malireddi RKS, Haque SA, Mall R, Neuwald AF, Kanneganti TD. Evolutionary and Functional Analysis of Caspase-8 and ASC Interactions to Drive Lytic Cell Death, PANoptosis. Mol Biol Evol 2025; 42:msaf096. [PMID: 40277230 PMCID: PMC12066828 DOI: 10.1093/molbev/msaf096] [Received: 12/09/2024] [Revised: 04/09/2025] [Accepted: 04/15/2025] [Indexed: 04/26/2025] [Open Access]
Abstract
Caspases are evolutionarily conserved proteins essential for driving cell death in development and host defense. Caspase-8, a key member of the caspase family, is implicated in nonlytic apoptosis, as well as lytic forms of cell death. Recently, caspase-8 has been identified as an integral component of PANoptosomes, multiprotein complexes formed in response to innate immune sensor activation. Several innate immune sensors can nucleate caspase-8-containing PANoptosome complexes to drive inflammatory lytic cell death, PANoptosis. However, how the evolutionarily conserved and diverse functions of caspase-8 drive PANoptosis remains unclear. To address this, we performed evolutionary, sequence, structural, and functional analyses to decode caspase-8's complex-forming abilities and its interaction with the PANoptosome adaptor ASC. Our study distinguished distinct subgroups within the death domain superfamily based on their evolutionary and functional relationships, identified homotypic traits among subfamily members, and captured key events in caspase evolution. We also identified critical residues defining the heterotypic interaction between caspase-8's death effector domain and ASC's pyrin domain, validated through cross-species analyses, dynamic simulations, and in vitro experiments. Overall, our study elucidated recent evolutionary adaptations of caspase-8 that allowed it to interact with ASC, improving our understanding of critical molecular associations in PANoptosome complex formation and the underlying PANoptotic responses in host defense and inflammation. These findings have implications for understanding mammalian immune responses and developing new therapeutic strategies for inflammatory diseases.
Affiliation(s)
- Sivakumar Prasanth Kumar
  - Department of Immunology, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA
- Eswar Kumar Nadendla
  - Department of Immunology, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA
- R K Subbarao Malireddi
  - Department of Immunology, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA
- Syed Asfarul Haque
  - Cryo-Electron Microscopy Center, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA
- Raghvendra Mall
  - Department of Immunology, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA
- Andrew F Neuwald
  - Institute for Genome Sciences and Department of Biochemistry and Molecular Biology, University of Maryland School of Medicine, 670 W. Baltimore Street, Baltimore, MD 21201, USA
- Thirumala-Devi Kanneganti
  - Department of Immunology, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA
4
Sanyal D, Shivram A, Pandey D, Banerjee S, Uversky VN, Muzata D, Chivukula AS, Jasuja R, Chattopadhyay K, Chowdhury S. Mapping dihydropteroate synthase evolvability through identification of a novel evolutionarily critical substructure. Int J Biol Macromol 2025; 311:143325. [PMID: 40254194 DOI: 10.1016/j.ijbiomac.2025.143325] [Received: 12/29/2024] [Revised: 03/28/2025] [Accepted: 04/17/2025] [Indexed: 04/22/2025]
Abstract
Protein evolution shapes pathogen adaptation-landscape, particularly in developing drug resistance. The rapid evolution of target proteins under antibiotic pressure leads to escape mutations, resulting in antibiotic resistance. A deep understanding of the evolutionary dynamics of antibiotic target proteins presents a plausible intervention strategy for disrupting the resistance trajectory. Mutations in Dihydropteroate synthase (DHPS), an essential folate pathway protein and sulfonamide antibiotic target, reduce antibiotic binding leading to anti-folate resistance. Deploying statistical analyses on the DHPS sequence-space and integrating deep mutational analysis with structure-based network-topology models, we identified critical DHPS subsequences. Our frustration landscape analysis suggests how conformational and mutational changes redistribute energy within DHPS substructures. We present an epistasis-based fitness prediction model that simulates DHPS adaptive walks, identifying residue positions that shape evolutionary constraints. Our optimality analysis revealed a substructure central to DHPS evolvability, and we assessed its druggability. Combining evolution and structure, this integrated framework identifies a DHPS substructure with significant evolutionary and structural impact. Targeting this region may constrain DHPS evolvability and slow resistance emergence, offering new directions for antibiotic development.
Affiliation(s)
- Dwipanjan Sanyal
  - Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata, India
- A Shivram
  - Department of Computer Science and Information Systems, Birla Institute of Technology and Science-Pilani, Hyderabad, India
- Deeptanshu Pandey
  - Department of Biological Sciences, Birla Institute of Technology and Science-Pilani, Hyderabad, India
- Vladimir N Uversky
  - USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
- Danny Muzata
  - Department of Computer Science and Information Systems, Birla Institute of Technology and Science-Pilani, Hyderabad, India
- Aneesh Sreevallabh Chivukula
  - Department of Computer Science and Information Systems, Birla Institute of Technology and Science-Pilani, Hyderabad, India
- Ravi Jasuja
  - Research Program in Men's Health: Aging and Metabolism, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Krishnananda Chattopadhyay
  - Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata, India
- Sourav Chowdhury
  - Department of Biological Sciences, Birla Institute of Technology and Science-Pilani, Hyderabad, India
5
Miller ST, Macdonald CB, Raman S. Understanding, inhibiting, and engineering membrane transporters with high-throughput mutational screens. Cell Chem Biol 2025; 32:529-541. [PMID: 40168989 DOI: 10.1016/j.chembiol.2025.03.003] [Received: 09/07/2024] [Revised: 01/20/2025] [Accepted: 03/10/2025] [Indexed: 04/03/2025]
Abstract
Promiscuous membrane transporters play vital roles across domains of life, mediating the uptake and efflux of structurally and chemically diverse substrates. Although many transporter structures have been solved, the fundamental rules of polyspecific transport remain inscrutable. In recent years, high-throughput genetic screens have solidified as powerful tools for comprehensive, unbiased measurements of variant function and hypothesis generation, but have had infrequent application and limited impact in the transporter field. In this primer, we describe the principles of high-throughput screening methods available for studying polyspecific transporters and comment on the necessity and potential of high-throughput methods for deciphering these transporters in particular. We present several screening approaches which could provide a fundamental understanding of the molecular basis of function and promiscuity in transporters. We further posit how this knowledge can be leveraged to design inhibitors that combat multidrug resistance and engineer transporters as needed tools for synthetic biology and biotechnology applications.
Affiliation(s)
- Silas T Miller
  - Cellular and Molecular Biology Graduate Program, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
  - DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI 53706, USA
- Christian B Macdonald
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Srivatsan Raman
  - DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA
6
Baeza J, Bedoya M, Cruz P, Ojeda P, Adasme-Carreño F, Cerda O, González W. Main methods and tools for peptide development based on protein-protein interactions (PPIs). Biochem Biophys Res Commun 2025; 758:151623. [PMID: 40121967 DOI: 10.1016/j.bbrc.2025.151623] [Received: 09/29/2024] [Revised: 03/05/2025] [Accepted: 03/10/2025] [Indexed: 03/25/2025]
Abstract
Protein-protein interactions (PPIs) regulate essential physiological and pathological processes. Due to their large and shallow binding surfaces, PPIs are often considered challenging drug targets for small molecules. Peptides offer a viable alternative, as they can bind these targets, acting as regulators or mimicking interaction partners. This review focuses on competitive peptides, a class of orthosteric modulators that disrupt PPI formation. We provide a concise yet comprehensive overview of recent advancements in in-silico peptide design, highlighting computational strategies that have improved the efficiency and accuracy of PPI-targeting peptides. Additionally, we examine cutting-edge experimental methods for evaluating PPI-based peptides. By exploring the interplay between computational design and experimental validation, this review presents a structured framework for developing effective peptide therapeutics targeting PPIs in various diseases.
Affiliation(s)
- Javiera Baeza
  - Centro de Bioinformática, Simulación y Modelado (CBSM), Facultad de Ingeniería. Universidad de Talca, Talca, Chile
  - Millennium Nucleus of Ion Channel-Associated Diseases (MiNICAD), Chile
- Mauricio Bedoya
  - Centro de Investigación de Estudios Avanzados del Maule (CIEAM), Vicerrectoría de Investigación y Postgrado, Universidad Católica del Maule, Talca, Chile
  - Laboratorio de Bioinformática y Química Computacional (LBQC), Departamento de Medicina Traslacional, Facultad de Medicina, Universidad Católica del Maule, Talca, Chile
- Pablo Cruz
  - Millennium Nucleus of Ion Channel-Associated Diseases (MiNICAD), Chile
  - Programa de Biología Celular y Molecular, Instituto de Ciencias Biomédicas (ICBM), Facultad de Medicina, Universidad de Chile, Santiago, Chile
- Paola Ojeda
  - Carrera de Química y Farmacia, Facultad de Medicina y Ciencia, Universidad San Sebastián, General Lagos 1163, 5090000, Valdivia, Chile
- Francisco Adasme-Carreño
  - Centro de Investigación de Estudios Avanzados del Maule (CIEAM), Vicerrectoría de Investigación y Postgrado, Universidad Católica del Maule, Talca, Chile
  - Laboratorio de Bioinformática y Química Computacional (LBQC), Departamento de Medicina Traslacional, Facultad de Medicina, Universidad Católica del Maule, Talca, Chile
- Oscar Cerda
  - Millennium Nucleus of Ion Channel-Associated Diseases (MiNICAD), Chile
  - Programa de Biología Celular y Molecular, Instituto de Ciencias Biomédicas (ICBM), Facultad de Medicina, Universidad de Chile, Santiago, Chile
- Wendy González
  - Centro de Bioinformática, Simulación y Modelado (CBSM), Facultad de Ingeniería. Universidad de Talca, Talca, Chile
  - Millennium Nucleus of Ion Channel-Associated Diseases (MiNICAD), Chile
7
Liu J, Neupane P, Cheng J. Improving AlphaFold2 and 3-based protein complex structure prediction with MULTICOM4 in CASP16. bioRxiv 2025:2025.03.06.641913. [PMID: 40161604 PMCID: PMC11952293 DOI: 10.1101/2025.03.06.641913] [Indexed: 04/02/2025]
Abstract
With AlphaFold achieving high-accuracy tertiary structure prediction for most single-chain proteins (monomers), the next major challenge in protein structure prediction is accurately modeling multi-chain protein complexes (multimers). We developed MULTICOM4, the latest version of the MULTICOM system, to improve protein complex structure prediction by integrating transformer-based AlphaFold2, diffusion model-based AlphaFold3, and our in-house techniques. These include protein complex stoichiometry prediction, diverse multiple sequence alignment (MSA) generation leveraging both sequence and structure comparison, modeling exception handling, and deep learning-based model quality assessment. MULTICOM4 was blindly evaluated in the 16th community-wide Critical Assessment of Techniques for Protein Structure Prediction (CASP16) in 2024. In Phase 0 of CASP16, where stoichiometry information was unavailable, MULTICOM predictors performed best, with MULTICOM_human achieving a TM-score of 0.752 and a DockQ score of 0.584 for top-ranked predictions on average. In Phase 1 of CASP16, with stoichiometry information provided, MULTICOM_human remained among the top predictors, attaining a TM-score of 0.797 and a DockQ score of 0.558 on average. The CASP16 results demonstrate that integrating complementary AlphaFold2 and 3 with enhanced MSA inputs, comprehensive model ranking, exception handling, and accurate stoichiometry prediction can effectively improve protein complex structure prediction.
8
Luppino F, Lenz S, Chow CFW, Toth-Petroczy A. Deep learning tools predict variants in disordered regions with lower sensitivity. BMC Genomics 2025; 26:367. [PMID: 40221640 PMCID: PMC11992697 DOI: 10.1186/s12864-025-11534-9] [Received: 01/14/2025] [Accepted: 03/27/2025] [Indexed: 04/14/2025] [Open Access]
Abstract
BACKGROUND: The recent AI breakthrough of AlphaFold2 has revolutionized 3D protein structural modeling, proving crucial for protein design and variant effect prediction. However, intrinsically disordered regions, known for their lack of well-defined structure and lower sequence conservation, often yield low-confidence models. The latest variant effect predictor (VEP), AlphaMissense, leverages AlphaFold2 models, achieving over 90% sensitivity and specificity in predicting variant effects. However, the effectiveness of such tools for variants in disordered regions, which account for 30% of the human proteome, remains unclear. RESULTS: In this study, we found that predicting pathogenicity for variants in disordered regions is less accurate than in ordered regions, particularly for mutations at the first N-Methionine site. Investigations into the efficacy of variant effect predictors on intrinsically disordered regions (IDRs) indicated that mutations in IDRs are predicted with lower sensitivity and that the gap between sensitivity and specificity is largest in disordered regions, especially for AlphaMissense and VARITY. CONCLUSIONS: The prevalence of IDRs within the human proteome, coupled with the increasing repertoire of biological functions they are known to perform, necessitated an investigation into the efficacy of state-of-the-art VEPs on such regions. This analysis revealed their consistently reduced sensitivity and a prediction performance profile that differs from that of ordered regions, indicating that new IDR-specific features and paradigms are needed to accurately classify disease mutations within those regions.
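The comparison in this abstract rests on sensitivity and specificity computed separately for ordered and disordered regions. The sketch below shows the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)) applied per region. The variant records are invented toy data, not results from the study, and the function and field names are hypothetical.

```python
from collections import defaultdict


def sensitivity_specificity(pairs):
    """Return (sensitivity, specificity) for (is_pathogenic, predicted_pathogenic) pairs."""
    tp = fn = tn = fp = 0
    for truth, pred in pairs:
        if truth and pred:
            tp += 1
        elif truth and not pred:
            fn += 1
        elif not truth and not pred:
            tn += 1
        else:
            fp += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# TOY DATA (made up for illustration): (region, is_pathogenic, predicted_pathogenic)
variants = [
    ("ordered", True, True), ("ordered", True, True),
    ("ordered", True, True), ("ordered", True, True),
    ("ordered", False, False), ("ordered", False, False),
    ("ordered", False, False), ("ordered", False, True),
    ("disordered", True, True), ("disordered", True, True),
    ("disordered", True, False), ("disordered", True, False),
    ("disordered", False, False), ("disordered", False, False),
    ("disordered", False, False), ("disordered", False, False),
]

# Group the labels by region, then score each region independently.
grouped = defaultdict(list)
for region, truth, pred in variants:
    grouped[region].append((truth, pred))

by_region = {region: sensitivity_specificity(pairs) for region, pairs in grouped.items()}

for region, (sens, spec) in by_region.items():
    print(f"{region}: sensitivity={sens:.2f} specificity={spec:.2f} gap={spec - sens:+.2f}")
```

Reporting the two metrics per region, rather than a single pooled accuracy, is what exposes a pattern like the one described above, where a predictor misses pathogenic variants in disordered regions while still classifying benign ones correctly.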
Affiliation(s)
- Federica Luppino
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307, Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstrasse 108, 01307, Dresden, Germany
- Swantje Lenz
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307, Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstrasse 108, 01307, Dresden, Germany
- Chi Fung Willis Chow
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307, Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstrasse 108, 01307, Dresden, Germany
  - Cluster of Excellence Physics of Life, TU Dresden, 01062, Dresden, Germany
- Agnes Toth-Petroczy
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307, Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstrasse 108, 01307, Dresden, Germany
  - Cluster of Excellence Physics of Life, TU Dresden, 01062, Dresden, Germany
9
Meng Y, Zhang Z, Zhou C, Tang X, Hu X, Tian G, Yang J, Yao Y. Protein structure prediction via deep learning: an in-depth review. Front Pharmacol 2025; 16:1498662. [PMID: 40248099 PMCID: PMC12003282 DOI: 10.3389/fphar.2025.1498662] [Received: 09/30/2024] [Accepted: 02/28/2025] [Indexed: 04/19/2025] [Open Access]
Abstract
The application of deep learning algorithms in protein structure prediction has greatly influenced drug discovery and development. Accurate protein structures are crucial for understanding biological processes and designing effective therapeutics. Traditionally, experimental methods like X-ray crystallography, nuclear magnetic resonance, and cryo-electron microscopy have been the gold standard for determining protein structures. However, these approaches are often costly, inefficient, and time-consuming. At the same time, the number of known protein sequences far exceeds the number of experimentally determined structures, creating a gap that necessitates the use of computational approaches. Deep learning has emerged as a promising solution to address this challenge over the past decade. This review provides a comprehensive guide to applying deep learning methodologies and tools in protein structure prediction. We initially outline the databases related to the protein structure prediction, then delve into the recently developed large language models as well as state-of-the-art deep learning-based methods. The review concludes with a perspective on the future of predicting protein structure, highlighting potential challenges and opportunities.
Affiliation(s)
- Yajie Meng
  - College of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China
- Zhuang Zhang
  - College of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China
- Chang Zhou
  - College of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China
- Xianfang Tang
  - College of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China
- Xinrong Hu
  - College of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China
- Yuhua Yao
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Ministry of Education, Hainan Normal University, Haikou, China
  - Key Laboratory of Computational Science and Application of Hainan Province, Hainan Normal University, Haikou, China
10
Mychack A, Evans D, Gilles T, James MJ, Walker S. Staphylococcus aureus uses a GGDEF protein to recruit diacylglycerol kinase to the membrane for lipid recycling. Proc Natl Acad Sci U S A 2025; 122:e2414696122. [PMID: 40100631 PMCID: PMC11962490 DOI: 10.1073/pnas.2414696122] [Received: 07/21/2024] [Accepted: 02/03/2025] [Indexed: 03/20/2025] [Open Access]
Abstract
Staphylococcus aureus is a Gram-positive pathogen responsible for numerous antibiotic-resistant infections. Identifying vulnerabilities in S. aureus is crucial for developing new antibiotics to treat these infections. With this in mind, we probed the function of GdpS, a conserved Staphylococcal membrane protein containing a cytoplasmic GGDEF domain. These domains are canonically involved in cyclic-di-GMP signaling processes, but S. aureus is not known to make cyclic-di-GMP. Using a transposon screen, we found that loss of GdpS is lethal when combined with disruption in synthesis of the glycolipid anchor of a cell surface polymer called lipoteichoic acid (LTA) or with deletion of genes important in cell division. Taking advantage of a small molecule that inhibits LTA glycolipid anchor synthesis, we selected for suppressors of ΔgdpS lethality. The most prevalent suppressors were hypermorphic alleles of dgkB, which encodes a soluble diacylglycerol (DAG) kinase required to recycle DAG to phosphatidylglycerol. By following up on these suppressors, we found that the GGDEF domain of GdpS interacts directly with DgkB, orienting its active site at the membrane to promote DAG recycling. DAG kinase hypermorphs also suppressed the lethality caused by combined loss of gdpS and cell division factors, highlighting the importance of lipid homeostasis for cell division. GdpS' positive regulation of DAG kinase function was dependent on the GGDEF domain but not its catalytic residues. As the sole conserved GGDEF-domain protein in Staphylococci, GdpS promotes an enzymatic process independent of cyclic-di-GMP signaling, revealing a new function for the ubiquitously conserved GGDEF domain.
Affiliation(s)
- Aaron Mychack
  - Department of Microbiology, Blavatnik Institute, Harvard Medical School, Boston, MA 02115
- Dwayne Evans
  - Department of Microbiology, Blavatnik Institute, Harvard Medical School, Boston, MA 02115
- Tarah Gilles
  - Department of Molecular and Cellular Biology, Harvard College, Cambridge, MA 02138
- Michael J. James
  - Department of Biological Chemistry and Molecular Pharmacology, Harvard Institute of Medicine, Harvard Medical School, Boston, MA 02115
- Suzanne Walker
  - Department of Microbiology, Blavatnik Institute, Harvard Medical School, Boston, MA 02115
11
Xing E, Zhang J, Wang S, Cheng X. Leveraging Sequence Purification for Accurate Prediction of Multiple Conformational States with AlphaFold2. Research Square 2025:rs.3.rs-6087969. [PMID: 40092441 PMCID: PMC11908349 DOI: 10.21203/rs.3.rs-6087969/v1] [Indexed: 03/19/2025]
Abstract
AlphaFold2 (AF2) has transformed protein structure prediction by harnessing co-evolutionary constraints embedded in multiple sequence alignments (MSAs). MSAs not only encode static structural information, but also hold critical details about protein dynamics, which underpin biological functions. However, these subtle coevolutionary signatures, which dictate conformational state preferences, are often obscured by noise within MSA data and thus remain challenging to decipher. Here, we introduce AF-ClaSeq, a systematic framework that isolates these co-evolutionary signals through sequence purification and iterative enrichment. By extracting sequence subsets that preferentially encode distinct structural states, AF-ClaSeq enables high-confidence predictions of alternative conformations. Our findings reveal that the successful sampling of alternative states depends not on MSA depth but on sequence purity. Intriguingly, purified sequences encoding specific structural states are distributed across phylogenetic clades and superfamilies, rather than confined to specific lineages. Expanding upon AF2's transformative capabilities, AF-ClaSeq provides a powerful approach for uncovering hidden structural plasticity, advancing allosteric protein and drug design, and facilitating dynamics-based protein function annotation.
Affiliation(s)
- Enming Xing
- Division of Medicinal Chemistry and Pharmacognosy, College of Pharmacy, The Ohio State University, Columbus, OH 43210, USA
- Junjie Zhang
- Division of Medicinal Chemistry and Pharmacognosy, College of Pharmacy, The Ohio State University, Columbus, OH 43210, USA
- Shen Wang
- Division of Medicinal Chemistry and Pharmacognosy, College of Pharmacy, The Ohio State University, Columbus, OH 43210, USA
- Xiaolin Cheng
- Division of Medicinal Chemistry and Pharmacognosy, College of Pharmacy, The Ohio State University, Columbus, OH 43210, USA
- Translational Data Analytics Institute, The Ohio State University, Columbus, OH 43210, USA
12
Han B, Zhang Y, Li L, Gong X, Xia K. TopoQA: a topological deep learning-based approach for protein complex structure interface quality assessment. Brief Bioinform 2025; 26:bbaf083. [PMID: 40062613 PMCID: PMC11891663 DOI: 10.1093/bib/bbaf083] [Received: 11/19/2024] [Revised: 01/11/2025] [Accepted: 02/17/2025] [Indexed: 05/13/2025]
Abstract
Even with the significant advances of AlphaFold-Multimer (AF-Multimer) and AlphaFold3 (AF3) in protein complex structure prediction, their accuracy is still not comparable with that of monomer structure prediction. Efficient and effective quality assessment (QA) methods, also called estimation of model accuracy methods, that can evaluate the quality of predicted protein complexes without knowing their native structures are of key importance for protein structure generation and model selection. In this paper, we leverage persistent homology (PH) to capture the atomic-level topological information around residues and design a topological deep learning-based QA method, TopoQA, to assess the accuracy of protein complex interfaces. We integrate PH from topological data analysis into graph neural networks (GNNs) to characterize complex higher-order structures that GNNs might overlook, enhancing the learning of the relationship between the topological structure of complex interfaces and quality scores. Our TopoQA model is extensively validated on the two most widely used benchmark datasets, Docking Benchmark5.5 AF2 (DBM55-AF2) and Heterodimer-AF2 (HAF2), along with our newly constructed ABAG-AF3 dataset to facilitate comparisons with AF3. For all three datasets, TopoQA outperforms AF-Multimer-based AF2Rank and shows an advantage over AF3 in nearly half of the targets. In particular, on the DBM55-AF2 dataset, TopoQA obtains a ranking loss 73.6% lower than that of AF-Multimer-based AF2Rank. Further, beyond AF-Multimer and AF3, we have also compared TopoQA extensively with nearly all state-of-the-art models known to us, and found that it achieves the highest Top 10 Hit-rate on the DBM55-AF2 dataset and the lowest ranking loss on the HAF2 dataset. Ablation experiments show that our topological features significantly improve the model's performance. At the same time, our method also provides a new paradigm for protein structure representation learning.
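The two evaluation metrics named in this abstract, ranking loss and Top 10 Hit-rate, can be illustrated with a short Python sketch. It assumes the definitions commonly used in model quality assessment: ranking loss is the true quality of the best available model minus the true quality of the model ranked first by the predictor, and a top-N hit means that at least one of the N highest-ranked models reaches a quality threshold. The function names, the toy scores, and the DockQ cutoff of 0.23 (the usual boundary for an "acceptable" model) are illustrative assumptions, not details taken from the TopoQA paper.

```python
from typing import Sequence


def ranking_loss(predicted: Sequence[float], true_quality: Sequence[float]) -> float:
    """True quality of the best model minus that of the top-ranked model.

    Assumed definition (not confirmed by the abstract): 0.0 means the
    predictor placed the genuinely best model first.
    """
    if len(predicted) != len(true_quality) or len(predicted) == 0:
        raise ValueError("need two non-empty score lists of equal length")
    # Index of the model the predictor ranks first.
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true_quality) - true_quality[top]


def top_n_hit(
    predicted: Sequence[float],
    true_quality: Sequence[float],
    n: int = 10,
    threshold: float = 0.23,  # assumed DockQ "acceptable" cutoff
) -> bool:
    """True if any of the n highest-ranked models reaches the threshold."""
    if len(predicted) != len(true_quality):
        raise ValueError("score lists must have equal length")
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    return any(true_quality[i] >= threshold for i in order[:n])


# Toy target with three candidate models (hypothetical numbers).
predicted_scores = [0.9, 0.4, 0.7]
true_dockq = [0.25, 0.75, 0.5]

loss = ranking_loss(predicted_scores, true_dockq)          # 0.75 - 0.25
hit_top1 = top_n_hit(predicted_scores, true_dockq, n=1)    # model 0 has DockQ 0.25
```

Averaging `ranking_loss` over all targets in a benchmark gives a dataset-level ranking loss, and the fraction of targets for which `top_n_hit` is true gives a hit rate, under the assumed definitions above.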
Affiliation(s)
- Bingqing Han
- Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
- Yipeng Zhang
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
- Longlong Li
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
- School of Mathematics, Shandong University, Jinan 250100, China
- Data Science Institute, Shandong University, Jinan 250100, China
- Xinqi Gong
- Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
13
Song J, Qiao J, Cheng Z, Guo J, Wang Q, Zhou Z, Han L. Computational design of coevolutionary residues for improved stability and activity of nitrile hydratase. Biochem Biophys Res Commun 2025; 750:151400. [PMID: 39889624 DOI: 10.1016/j.bbrc.2025.151400] [Received: 01/18/2025] [Accepted: 01/25/2025] [Indexed: 02/03/2025]
Abstract
Nitrile Hydratase (NHase), an industrially significant enzyme, catalyzes the conversion of nitriles into amides. High activity and thermostability are crucial for its broad applications. Compared with classical evaluation and subsequent combination of single-point mutations, redesigning coevolutionary residues offers a more precise approach by targeting key functional sites and facilitating efficient computational design and iteration. Here, we proposed an optimized strategy for redesigning coevolutionary residues to enhance the robustness of NHase, a heterotetrameric protein. We conducted an extensive analysis of 80 coevolutionary residue pairs in NHase from Pseudonocardia thermophila JCM3095 (PtNHase) and identified 21 hotspot designable residue pairs lacking explicit interactions. Virtual saturating combinatorial mutations were applied to these pairs, yielding 27 positive candidates from 8379 theoretical mutations based on changes in folding free energy. After screening and iterative combinations, the optimal mutant A3 (αG86Y/αK57L/αE183F) was obtained, whose specific activity toward acrylonitrile and half-life at 65 °C were increased from 1656.8 ± 21.2 U/mg and 20.1 min in WT to 2370.1 ± 102.7 U/mg and 62.3 min, respectively. Benefiting from higher activity and thermostability, the whole-cell catalyst of A3 significantly facilitated the bioconversion of acrylonitrile to acrylamide. Molecular dynamics simulations further revealed that the newly formed inter-residue interactions stabilized the active site and enhanced the flexibility of the substrate channel, thereby improving both activity and thermostability. This study not only developed a highly robust NHase, but also established a framework for the design of other industrial enzymes.
Affiliation(s)
- Jiaen Song
- Key Laboratory of Industrial Biotechnology (Ministry of Education), School of Biotechnology, Jiangnan University, Wuxi, Jiangsu, China
- Jun Qiao
- Ningbo Institute of Marine Medicine, Peking University, Ningbo, Zhejiang, China
- Zhongyi Cheng
- Key Laboratory of Industrial Biotechnology (Ministry of Education), School of Biotechnology, Jiangnan University, Wuxi, Jiangsu, China
- Junling Guo
- Key Laboratory of Industrial Biotechnology (Ministry of Education), School of Biotechnology, Jiangnan University, Wuxi, Jiangsu, China
- Qiong Wang
- Key Laboratory of Industrial Biotechnology (Ministry of Education), School of Biotechnology, Jiangnan University, Wuxi, Jiangsu, China
- Zhemin Zhou
- Key Laboratory of Industrial Biotechnology (Ministry of Education), School of Biotechnology, Jiangnan University, Wuxi, Jiangsu, China; Jiangnan University (Rugao) Food Biotechnology Research Institute, Rugao, Jiangsu, China
- Laichuang Han
- Key Laboratory of Industrial Biotechnology (Ministry of Education), School of Biotechnology, Jiangnan University, Wuxi, Jiangsu, China
14
Sun J, Ru J, Cribbs AP, Xiong D. PyPropel: a Python-based tool for efficiently processing and characterising protein data. BMC Bioinformatics 2025; 26:70. [PMID: 40025421 PMCID: PMC11871610 DOI: 10.1186/s12859-025-06079-3] [Received: 10/21/2024] [Accepted: 02/10/2025] [Indexed: 03/04/2025]
Abstract
BACKGROUND: The volume of protein sequence data has grown exponentially in recent years, driven by advancements in metagenomics. Despite this, a substantial proportion of these sequences remain poorly annotated, underscoring the need for robust bioinformatics tools to facilitate efficient characterisation and annotation for functional studies.
RESULTS: We present PyPropel, a Python-based computational tool developed to streamline the large-scale analysis of protein data, with a particular focus on applications in machine learning. PyPropel integrates sequence and structural data pre-processing, feature generation, and post-processing for model performance evaluation and visualisation, offering a comprehensive solution for handling complex protein datasets.
CONCLUSION: PyPropel provides added value over existing tools by offering a unified workflow that encompasses the full spectrum of protein research, from raw data pre-processing to functional annotation and model performance analysis, thereby supporting efficient protein function studies.
Affiliation(s)
- Jianfeng Sun
- Botnar Research Centre, University of Oxford, Headington, Oxford, OX3 7LD, UK
- Jinlong Ru
- Chair of Prevention of Microbial Diseases, School of Life Sciences Weihenstephan, Technical University of Munich, 85354, Freising, Germany
- Adam P Cribbs
- Botnar Research Centre, University of Oxford, Headington, Oxford, OX3 7LD, UK
- Dapeng Xiong
- Department of Computational Biology, Cornell University, Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, USA
15
Nelson MG, Talavera D. Identification of coevolving positions by ancestral reconstruction. Commun Biol 2025; 8:329. [PMID: 40021815 PMCID: PMC11871020 DOI: 10.1038/s42003-025-07676-x] [Received: 04/03/2024] [Accepted: 02/05/2025] [Indexed: 03/03/2025]
Abstract
Coevolution within proteins occurs when changes in one position affect the selective pressure in another position to preserve the protein structure or function. The identification of coevolving positions within proteins remains contentious, with most methods disregarding the phylogenetic information. Here, we present a time-efficient approach for detecting coevolving pairs, which is almost perfect in terms of precision and specificity. It is based on maximum parsimony-based ancestral reconstruction followed by the identification of pairs with a depletion of separate changes relative to their number of concurrent changes. Our analysis of a previously characterised biological dataset shows that the coevolving pairs that we identified tend to be close in the protein sequence and structure, slightly less solvent exposed, and have a higher mutation rate. We also show how the ancestral reconstruction can be used to detect favourable and unfavourable amino acid combinations. Altogether, we demonstrate how this approach is essential for identifying pairs of positions with weak covariation patterns.
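The counting step described in this abstract, comparing concurrent changes with separate changes for each pair of positions, can be sketched in Python as follows. The sketch assumes that an ancestral reconstruction has already been run and summarised as one set of changed alignment positions per tree branch. That input format, the function name, and the toy data are hypothetical, and the statistical test the authors apply to the counts is not given in the abstract, so it is left out.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def count_pair_changes(
    branch_changes: Iterable[Set[int]],
) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Return {(i, j): (concurrent, separate)} for pairs that co-change at least once.

    concurrent: branches on which both positions change.
    separate:   branches on which exactly one of the two positions changes.
    """
    per_site = Counter()    # branches on which each position changes
    concurrent = Counter()  # branches on which both positions of a pair change
    for changed in branch_changes:
        changed = set(changed)
        per_site.update(changed)
        for i, j in combinations(sorted(changed), 2):
            concurrent[(i, j)] += 1

    counts = {}
    for (i, j), both in concurrent.items():
        # Changes of i or j that were not shared with the partner position.
        separate = per_site[i] + per_site[j] - 2 * both
        counts[(i, j)] = (both, separate)
    return counts


# Toy input: positions changed on each of five branches (hypothetical).
branches = [{3, 7}, {3, 7}, {3, 7, 12}, {12}, {5}]
pair_counts = count_pair_changes(branches)
# Positions 3 and 7 always change together; 12 mostly changes on its own.
```

A pair such as (3, 7) in the toy data, with several concurrent changes and no separate ones, is the kind of pattern the abstract describes as a depletion of separate changes. Deciding how strong that depletion must be would require the scoring or test from the paper itself.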
Affiliation(s)
- Michael G Nelson
- Division of Cardiovascular Sciences, School of Medical Sciences, The University of Manchester, Oxford Road, Manchester, UK
- David Talavera
- Division of Cardiovascular Sciences, School of Medical Sciences, The University of Manchester, Oxford Road, Manchester, UK
16
Forrest B, Derbel H, Zhao Z, Liu Q. MMRT: MultiMut Recursive Tree for predicting functional effects of high-order protein variants from low-order variants. Comput Struct Biotechnol J 2025; 27:672-681. [PMID: 40070521 PMCID: PMC11894328 DOI: 10.1016/j.csbj.2025.02.012] [Received: 11/25/2024] [Revised: 02/10/2025] [Accepted: 02/17/2025] [Indexed: 03/14/2025]
Abstract
Protein sequences primarily determine their stability and functions. Mutations may occur at one, two, or three positions at the same time (low-order variants) or at multiple positions simultaneously (high-order variants), and in either case they can affect protein function. So far, low-order variants, such as single, double, and triple variants, have been well studied through high-throughput experimental scanning techniques and computational prediction methods. However, research on high-order variants remains limited because of the difficulty of scanning an exponentially large number of potential variant combinations. Nonetheless, studying high-order variants is crucial for understanding the pathogenesis of complex diseases, advancing protein engineering, and driving precision medicine. In this work, we introduce a novel deep learning model, MultiMut Recursive Tree (MMRT), to address the challenge of predicting the functional effects of high-order variants. MMRT integrates deep learning with a recursive tree framework to leverage the information from low-order variants to predict functional effects of high-order variants. We evaluated MMRT on datasets comprising 685,593 high-order variants. Our results (mean Spearman's correlation coefficient 0.55) demonstrated that MMRT outperformed three existing state-of-the-art methods: ESM (evolutionary scale modeling), DeepSequence, and ECNet (evolutionary context-integrated neural network). MMRT thus provides more accurate prediction of the functional effects of high-order protein variants, offering great potential for aiding the interpretation of variants in human disease studies.
Affiliation(s)
- Bryce Forrest
- Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, 4505 S Maryland Pkwy, Las Vegas, NV 89154, USA
- Houssemeddine Derbel
- Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, 4505 S Maryland Pkwy, Las Vegas, NV 89154, USA
- Zhongming Zhao
- Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Qian Liu
- Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, 4505 S Maryland Pkwy, Las Vegas, NV 89154, USA
- School of Life Sciences, College of Sciences, University of Nevada, Las Vegas, 4505 S Maryland Pkwy, Las Vegas, NV 89154, USA
17
Thornton BW, Weissman RF, Tran RV, Duong BT, Rodriguez JE, Terrace CI, Groover ED, Park JU, Tartaglia J, Doudna JA, Savage DF. Latent activity in TnpB revealed by mutational scanning. bioRxiv 2025:2025.02.11.637750. [PMID: 39990302 PMCID: PMC11844463 DOI: 10.1101/2025.02.11.637750] [Indexed: 02/25/2025]
Abstract
TnpB is an evolutionarily diverse family of RNA-guided endonucleases associated with prokaryotic transposons. Due to their small size and putative evolutionary relationship to Cas12s, TnpBs hold significant potential for genome editing and mechanistic exploration. However, most TnpBs lack robust gene-editing activity, and unbiased profiling of mutational effects on editing activity has not been experimentally explored. Here, we mapped comprehensive sequence-function landscapes of a TnpB ribonucleoprotein and discovered many activating mutations in both the protein and RNA. Single-position changes in the RNA outperform existing variants, highlighting the utility of systematic RNA scaffold mutagenesis. Leveraging the mutational landscape of the TnpB protein, we identified enhanced protein variants from a combinatorial library of activating mutations. These variants increased editing in human cells and N. benthamiana by over two-fold and fifty-fold relative to wild-type TnpB, respectively. In total, this study highlights unknown elements critical for regulation of endonuclease activity in both the TnpB protein and the RNA, and reveals a surprising amount of latent activity accessible through mutation.
Affiliation(s)
- Brittney W. Thornton
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
- Rachel F. Weissman
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
- Ryan V. Tran
- Department of Chemistry, University of California, Berkeley, Berkeley, CA, 94720, USA
- Scribe Therapeutics, Alameda, CA, 94501, USA
- Brenda T. Duong
- Department of Chemistry, University of California, Berkeley, Berkeley, CA, 94720, USA
- Jorge E. Rodriguez
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
- Cynthia I. Terrace
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
- Evan D. Groover
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, USA
- Jung-Un Park
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
- Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
- Julia Tartaglia
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
- Jennifer A. Doudna
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
- Department of Chemistry, University of California, Berkeley, Berkeley, CA, 94720, USA
- Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
- California Institute for Quantitative Biosciences (QB3), University of California, Berkeley, CA, USA
- Gladstone-UCSF Institute of Genomic Immunology, San Francisco, CA, USA
- Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- David F. Savage
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
- Howard Hughes Medical Institute, University of California, Berkeley, Berkeley, CA, 94720, USA
18
Gelman S, Johnson B, Freschlin C, Sharma A, D'Costa S, Peters J, Gitter A, Romero PA. Biophysics-based protein language models for protein engineering. bioRxiv 2025:2024.03.15.585128. [PMID: 38559182 PMCID: PMC10980077 DOI: 10.1101/2024.03.15.585128] [Indexed: 04/04/2024]
Abstract
Protein language models trained on evolutionary data have emerged as powerful tools for predictive problems involving protein sequence, structure, and function. However, these models overlook decades of research into biophysical factors governing protein function. We propose Mutational Effect Transfer Learning (METL), a protein language model framework that unites advanced machine learning and biophysical modeling. Using the METL framework, we pretrain transformer-based neural networks on biophysical simulation data to capture fundamental relationships between protein sequence, structure, and energetics. We finetune METL on experimental sequence-function data to harness these biophysical signals and apply them when predicting protein properties like thermostability, catalytic activity, and fluorescence. METL excels in challenging protein engineering tasks like generalizing from small training sets and position extrapolation, although existing methods that train on evolutionary signals remain powerful for many types of experimental assays. We demonstrate METL's ability to design functional green fluorescent protein variants when trained on only 64 examples, showcasing the potential of biophysics-based protein language models for protein engineering.
Affiliation(s)
- Sam Gelman
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
- Bryce Johnson
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
- Arnav Sharma
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
- Sameer D'Costa
- Department of Biochemistry, University of Wisconsin-Madison
- John Peters
- Morgridge Institute for Research
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison
- Anthony Gitter
- Department of Computer Sciences, University of Wisconsin-Madison
- Morgridge Institute for Research
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison
- Philip A Romero
- Department of Biochemistry, University of Wisconsin-Madison
- Department of Biomedical Engineering, Duke University
19
Hermans P, Tsishyn M, Schwersensky M, Rooman M, Pucci F. Exploring Evolution to Uncover Insights Into Protein Mutational Stability. Mol Biol Evol 2025; 42:msae267. [PMID: 39786559 PMCID: PMC11721782 DOI: 10.1093/molbev/msae267] [Received: 05/28/2024] [Revised: 11/27/2024] [Accepted: 11/28/2024] [Indexed: 01/12/2025]
Abstract
Determining the impact of mutations on the thermodynamic stability of proteins is essential for a wide range of applications such as rational protein design and genetic variant interpretation. Since protein stability is a major driver of evolution, evolutionary data are often used to guide stability predictions. Many state-of-the-art stability predictors extract evolutionary information from multiple sequence alignments of proteins homologous to a query protein, and leverage it to predict the effects of mutations on protein stability. To evaluate the power and the limitations of such methods, we used the massive amount of stability data recently obtained by deep mutational scanning to study how best to construct multiple sequence alignments and optimally extract evolutionary information from them. We tested different evolutionary models and found that, unexpectedly, independent-site models achieve similar accuracy to more complex epistatic models. A detailed analysis of the latter models suggests that their inference often results in noisy couplings, which do not appear to add predictive power over the independent-site contribution, at least in the context of stability prediction. Interestingly, by combining any of the evolutionary features with a simple structural feature, the relative solvent accessibility of the mutated residue, we achieved similar prediction accuracy to supervised, machine learning-based, protein stability change predictors. Our results provide new insights into the relationship between protein evolution and stability, and show how evolutionary information can be exploited to improve the performance of mutational stability prediction.
Affiliation(s)
- Pauline Hermans
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium
- Matsvei Tsishyn
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium
- Martin Schwersensky
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium
- Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium
- Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium
20
Chu HY, Peng J, Mou Y, Wong ASL. Quantifying Protein-Nucleic Acid Interactions for Engineering Useful CRISPR-Cas9 Genome-Editing Variants. Methods Mol Biol 2025; 2870:227-243. [PMID: 39543038 DOI: 10.1007/978-1-0716-4213-9_12] [Indexed: 11/17/2024]
Abstract
Numerous high-specificity Cas9 variants have been engineered for precision genome editing. These variants typically harbor multiple mutations designed to alter the Cas9-single guide RNA (sgRNA)-DNA complex interactions for reduced off-target cleavage. By dissecting the contributions of individual mutations, we attempt to derive principles for designing high-specificity Cas9 variants. Here, we computationally modeled the specificity harnessing mutations of the widely used Cas9 isolated from Streptococcus pyogenes (SpCas9) and investigated their individual mutational effects. We quantified the mutational effects in terms of energy and contact changes by comparing the wild-type and mutant structures. We found that these mutations disrupt the protein-protein or protein-DNA contacts within the Cas9-sgRNA-DNA complex. We also identified additional impacted amino acid sites via energy changes that constitute the structural microenvironment encompassing the focal mutation, giving insights into how the mutations contribute to the high-specificity phenotype of SpCas9. Our method outlines a strategy to evaluate mutational effects that can facilitate rational design for Cas9 optimization.
Affiliation(s)
- Hoi Yee Chu
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
- Jiaxing Peng
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
- Yuanbiao Mou
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
- Alan S L Wong
- Laboratory of Combinatorial Genetics and Synthetic Biology, School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Centre for Oncology and Immunology, Hong Kong Science Park, Hong Kong SAR, China
21
Mikołajczyk K, Wróblewski K, Kmiecik S. Delving into human α1,4-galactosyltransferase acceptor specificity: The role of enzyme dimerization. Biochem Biophys Res Commun 2024; 736:150486. [PMID: 39111055 DOI: 10.1016/j.bbrc.2024.150486] [Received: 06/03/2024] [Revised: 07/26/2024] [Accepted: 07/30/2024] [Indexed: 11/10/2024]
Abstract
Human α1,4-galactosyltransferase (A4galt), a Golgi apparatus-resident GT, synthesizes Gb3 glycosphingolipid (GSL) and P1 glycotope on glycoproteins (GPs), which are receptors for Shiga toxin types 1 and 2. Despite the significant role of A4galt in glycosylation processes, the molecular mechanisms underlying its varied acceptor specificities remain poorly understood. Here, we attempted to elucidate A4galt specificity towards GSLs and GPs by exploring its interaction with GTs with various acceptor specificities, GP-specific β1,4-galactosyltransferase 1 (B4galt1) and GSL-specific β1,4-galactosyltransferase isoenzymes 5 and 6 (B4galt5 and B4galt6). Using a novel NanoBiT assay, we found that A4galt can form homodimers and heterodimers with B4galt1 and B4galt5 in two cell lines, human embryonic kidney cells (HEK293T) and Chinese hamster ovary cells (CHO-Lec2). We found that A4galt-B4galts heterodimers preferred N-terminally tagged interactions, while in A4galt homodimers, the favored localization of the fused tag depended on the cell line used. Furthermore, by employing AlphaFold for state-of-the-art structural prediction, we analyzed the interactions and structures of these enzyme complexes. Our analysis highlighted that the A4galt-B4galt5 heterodimer exhibited the highest prediction confidence, indicating a significant role of A4galt heterodimerization in determining enzyme specificity toward GSLs and GPs. These findings enhance our knowledge of A4galt acceptor specificity and may contribute to a better comprehension of pathomechanisms of the Shiga toxin-related diseases.
Affiliation(s)
- Krzysztof Mikołajczyk
- Laboratory of Glycobiology, Hirszfeld Institute of Immunology and Experimental Therapy, Polish Academy of Sciences, Rudolfa Weigla St. 12, 53-114, Wroclaw, Poland
- Karol Wróblewski
- Biological and Chemical Research Center, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093, Warsaw, Poland
- Sebastian Kmiecik
- Biological and Chemical Research Center, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093, Warsaw, Poland
22
Yehorova D, Di Geronimo B, Robinson M, Kasson PM, Kamerlin SCL. Using residue interaction networks to understand protein function and evolution and to engineer new proteins. Curr Opin Struct Biol 2024; 89:102922. [PMID: 39332048 DOI: 10.1016/j.sbi.2024.102922] [Received: 06/22/2024] [Revised: 08/21/2024] [Accepted: 09/02/2024] [Indexed: 09/29/2024]
Abstract
Residue interaction networks (RINs) provide graph-based representations of interaction networks within proteins, providing important insight into the factors driving protein structure, function, and stability relationships. There exists a wide range of tools with which to perform RIN analysis, taking into account different types of interactions, input (crystal structures, simulation trajectories, single proteins, or comparative analysis across proteins), as well as formats, including standalone software, web server, and a web application programming interface (API). In particular, the ability to perform comparative RIN analysis across protein families using "metaRINs" provides a valuable tool with which to dissect protein evolution. This, in turn, highlights hotspots to avoid (or target) for in vitro evolutionary studies, providing a powerful framework that can be exploited to engineer new proteins.
Affiliation(s)
- Dariia Yehorova
- School of Chemistry and Biochemistry, Georgia Institute of Technology, 901 Atlantic Drive NW, Atlanta, GA-30332, USA
- Bruno Di Geronimo
- School of Chemistry and Biochemistry, Georgia Institute of Technology, 901 Atlantic Drive NW, Atlanta, GA-30332, USA
- Michael Robinson
- Department of Chemistry - BMC, Uppsala University, BMC Box 576, S-751 23 Uppsala, Sweden
- Peter M Kasson
- School of Chemistry and Biochemistry, Georgia Institute of Technology, 901 Atlantic Drive NW, Atlanta, GA-30332, USA; Department of Biomedical Engineering, Georgia Institute of Technology, 313 Fersht Dr NW, Atlanta GA 30332, USA; Department of Cell and Molecular Biology, Uppsala University, BMC Box 596, S-751 24 Uppsala, Sweden
- Shina C L Kamerlin
- School of Chemistry and Biochemistry, Georgia Institute of Technology, 901 Atlantic Drive NW, Atlanta, GA-30332, USA; Department of Chemistry - BMC, Uppsala University, BMC Box 576, S-751 23 Uppsala, Sweden.
23
Zhang C, Wang Q, Li Y, Teng A, Hu G, Wuyun Q, Zheng W. The Historical Evolution and Significance of Multiple Sequence Alignment in Molecular Structure and Function Prediction. Biomolecules 2024; 14:1531. [PMID: 39766238 PMCID: PMC11673352 DOI: 10.3390/biom14121531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2024] [Revised: 11/24/2024] [Accepted: 11/27/2024] [Indexed: 01/11/2025] Open
Abstract
Multiple sequence alignment (MSA) has evolved into a fundamental tool in the biological sciences, playing a pivotal role in predicting molecular structures and functions. With broad applications in protein and nucleic acid modeling, MSAs continue to underpin advancements across a range of disciplines. MSAs are not only foundational for traditional sequence comparison techniques but also increasingly important in the context of artificial intelligence (AI)-driven advancements. Recent breakthroughs in AI, particularly in protein and nucleic acid structure prediction, rely heavily on the accuracy and efficiency of MSAs to enhance remote homology detection and guide spatial restraints. This review traces the historical evolution of MSA, highlighting its significance in molecular structure and function prediction. We cover the methodologies used for protein monomers, protein complexes, and RNA, while also exploring emerging AI-based alternatives, such as protein language models, as complementary or replacement approaches to traditional MSAs in application tasks. By discussing the strengths, limitations, and applications of these methods, this review aims to provide researchers with valuable insights into MSA's evolving role, equipping them to make informed decisions in structural prediction research.
Affiliation(s)
- Chenyue Zhang
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China; (C.Z.); (Y.L.); (G.H.)
- Qinxin Wang
- Suzhou New & High-Tech Innovation Service Center, Suzhou 215011, China;
- Yiyang Li
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China; (C.Z.); (Y.L.); (G.H.)
- Anqi Teng
- Bioscience and Biomedical Engineering Thrust, Systems Hub, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511453, China;
- Gang Hu
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China; (C.Z.); (Y.L.); (G.H.)
- Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Wei Zheng
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China; (C.Z.); (Y.L.); (G.H.)
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
24
Gao W, Jing Z, Meng Y, Liu Q, Wang H, Wei D. Inside-Out Rational Design of Ornithine Cyclodeaminase RlOCD from Rhizobium leguminosarum by a Multiregion Synergy Strategy for Efficient Synthesis of L-Pipecolic Acid. Journal of Agricultural and Food Chemistry 2024; 72:25782-25790. [PMID: 39387484 DOI: 10.1021/acs.jafc.4c06331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2024]
Abstract
Lysine cyclodeaminase (LCD)-mediated synthesis of L-pipecolic acid (L-PA) from L-lysine (L-Lys) is a promising approach. However, only one LCD has been reported, and its inadequate activity limits industrial applications. To address this problem, a substrate analogue-guided enzyme mining strategy was employed. A novel ornithine cyclodeaminase (OCD) from Rhizobium leguminosarum (RlOCD) was identified in combination with directed macrogenomic approaches. RlOCD displayed a conversion rate of 28% at a substrate loading as high as 1000 mM. A multiregion synergy strategy consisting of pocket reshaping, dynamical cross-correlation matrix-guided coevolutionary design, and surface modification was used to design RlOCD from the inside-out. A quadruple mutant (V93C/L119C/I170T/R90L) designated Mu4 with significantly increased activity was obtained, which showed a 28.46-fold increase in the catalytic efficiency. The conversion of Mu4 was 91% within 10 h at 1000 mM (146.19 g L⁻¹) loading. The space-time yield of 282.1 g L⁻¹ d⁻¹ is the highest level ever reported. Molecular dynamics simulations and interaction analyses revealed that efficient pocket expansion and unique conformational rearrangements increased the affinity for the substrate, resulting in a more catalytically active conformation. This study expands the toolbox for the production of L-PA and demonstrates the effectiveness and potential of Mu4 for its production.
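The loading and space-time yield figures in this abstract can be cross-checked with a short calculation. The sketch below is an illustration only: it assumes 1:1 molar conversion of L-lysine to L-pipecolic acid at constant volume and uses standard molar masses (146.19 and 129.16 g/mol), none of which are stated in the abstract; the function name `space_time_yield` is hypothetical.

```python
# Standard molar masses in g/mol (assumed here, not quoted in the abstract).
M_LYS = 146.19  # L-lysine, C6H14N2O2
M_PA = 129.16   # L-pipecolic acid, C6H11NO2

def space_time_yield(substrate_molar, conversion, hours, product_molar_mass):
    """Return (product titer in g/L, space-time yield in g/L/day).

    Assumes one mole of product per mole of substrate converted and
    no change in reaction volume.
    """
    titer = substrate_molar * conversion * product_molar_mass
    return titer, titer * 24.0 / hours

loading = 1.0 * M_LYS  # 1000 mM substrate expressed in g/L
titer, sty = space_time_yield(1.0, 0.91, 10.0, M_PA)
print(f"loading {loading:.2f} g/L, titer {titer:.1f} g/L, STY {sty:.1f} g/L/d")
```

Under these assumptions the calculation reproduces both reported values: 146.19 g/L loading and a space-time yield of about 282.1 g/L per day.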
Affiliation(s)
- Weijie Gao
- State Key Laboratory of Bioreactor Engineering New World Institute of Biotechnology, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Zijian Jing
- State Key Laboratory of Bioreactor Engineering New World Institute of Biotechnology, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Yifang Meng
- State Key Laboratory of Bioreactor Engineering New World Institute of Biotechnology, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Qinghai Liu
- State Key Laboratory of Bioreactor Engineering New World Institute of Biotechnology, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Hualei Wang
- State Key Laboratory of Bioreactor Engineering New World Institute of Biotechnology, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Dongzhi Wei
- State Key Laboratory of Bioreactor Engineering New World Institute of Biotechnology, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
25
Sverak HE, Yaeger LN, Worrall LJ, Vacariu CM, Glenwright AJ, Vuckovic M, Al Azawi ZD, Lamers RP, Marko VA, Skorupski C, Soni AS, Tanner ME, Burrows LL, Strynadka NC. Cryo-EM characterization of the anydromuropeptide permease AmpG central to bacterial fitness and β-lactam antibiotic resistance. Nat Commun 2024; 15:9936. [PMID: 39548104 PMCID: PMC11568325 DOI: 10.1038/s41467-024-54219-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2024] [Accepted: 11/03/2024] [Indexed: 11/17/2024] Open
Abstract
Bacteria invest significant resources into the continuous creation and tailoring of their essential protective peptidoglycan (PG) cell wall. Several soluble PG biosynthesis products in the periplasm are transported to the cytosol for recycling, leading to enhanced bacterial fitness. GlcNAc-1,6-anhydroMurNAc and peptide variants are transported by the essential major facilitator superfamily importer AmpG in Gram-negative pathogens including Escherichia coli, Klebsiella pneumoniae, Acinetobacter baumannii, and Pseudomonas aeruginosa. Accumulation of GlcNAc-1,6-anhydroMurNAc-pentapeptides also results from β-lactam antibiotic induced cell wall damage. In some species, these products upregulate the β-lactamase AmpC, which hydrolyzes β-lactams to allow for bacterial survival and drug-resistant infections. Here, we have used cryo-electron microscopy and chemical synthesis of substrates in an integrated structural, biochemical, and cellular analysis of AmpG. We show how AmpG accommodates the large GlcNAc-1,6-anhydroMurNAc peptides, including a unique hydrophobic vestibule to the substrate binding cavity, and characterize residues involved in binding that inform the mechanism of proton-mediated transport.
Affiliation(s)
- Helena E Sverak
- Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, Canada
- Centre for Blood Research, University of British Columbia, Vancouver, Canada
- Luke N Yaeger
- Department of Biochemistry and Biomedical Sciences and the Michael G. DeGroote Institute of Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
- Liam J Worrall
- Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, Canada
- Centre for Blood Research, University of British Columbia, Vancouver, Canada
- High Resolution Macromolecular Cryo-Electron Microscopy (HRMEM) Facility, University of British Columbia, Vancouver, Canada
- Amy J Glenwright
- Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, Canada
- Centre for Blood Research, University of British Columbia, Vancouver, Canada
- Marija Vuckovic
- Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, Canada
- Centre for Blood Research, University of British Columbia, Vancouver, Canada
- Zayni-Dean Al Azawi
- Department of Biochemistry and Biomedical Sciences and the Michael G. DeGroote Institute of Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
- Ryan P Lamers
- Department of Biochemistry and Biomedical Sciences and the Michael G. DeGroote Institute of Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
- Victoria A Marko
- Department of Biochemistry and Biomedical Sciences and the Michael G. DeGroote Institute of Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
- Clarissa Skorupski
- Department of Biochemistry and Biomedical Sciences and the Michael G. DeGroote Institute of Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
- Arvind S Soni
- Department of Chemistry, University of British Columbia, Vancouver, Canada
- Martin E Tanner
- Department of Chemistry, University of British Columbia, Vancouver, Canada
- Lori L Burrows
- Department of Biochemistry and Biomedical Sciences and the Michael G. DeGroote Institute of Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
- Natalie CJ Strynadka
- Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, Canada.
- Centre for Blood Research, University of British Columbia, Vancouver, Canada.
- High Resolution Macromolecular Cryo-Electron Microscopy (HRMEM) Facility, University of British Columbia, Vancouver, Canada.
26
Souza Amado de Carvalho R, Rasel MSI, Khandelwal NK, Tomasiak TM. Cryo-EM reveals a phosphorylated R-domain envelops the NBD1 catalytic domain in an ABC transporter. Life Sci Alliance 2024; 7:e202402779. [PMID: 39209537 PMCID: PMC11361370 DOI: 10.26508/lsa.202402779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Revised: 08/05/2024] [Accepted: 08/06/2024] [Indexed: 09/04/2024] Open
Abstract
Many ATP-binding cassette transporters are regulated by phosphorylation on long, disordered loops, which are challenging to visualize with structural methods. We have trapped an activated state of the regulatory domain (R-domain) of yeast cadmium factor 1 (Ycf1) by enzymatically enriching the phosphorylated state. A 3.23 Å cryo-EM structure reveals an R-domain structure with four phosphorylated residues and the position for the entire R-domain. The structure reveals key R-domain interactions including a bridging interaction between NBD1 and NBD2 and an interaction with the R-insertion, another regulatory region. We scanned these interactions by systematically replacing segments along the entire R-domain with scrambled combinations of alanine, glycine, and glutamine and probing function under cellular conditions that require the Ycf1 function. We find a close match with these interactions and interacting regions on our R-domain structure that points to the importance of most well-structured segments for function. We propose a model where the R-domain stabilizes a transport-competent state upon phosphorylation by enveloping NBD1 entirely.
Affiliation(s)
- Nitesh K Khandelwal
- Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ, USA
- Thomas M Tomasiak
- Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ, USA
27
Karagöl A, Karagöl T, Li M, Zhang S. Inhibitory Potential of the Truncated Isoforms on Glutamate Transporter Oligomerization Identified by Computational Analysis of Gene-Centric Isoform Maps. Pharm Res 2024; 41:2173-2187. [PMID: 39487385 PMCID: PMC11599315 DOI: 10.1007/s11095-024-03786-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2024] [Accepted: 10/14/2024] [Indexed: 11/04/2024]
Abstract
OBJECTIVE: Glutamate transporters play a key role in central nervous system physiology by maintaining excitatory neurotransmitter homeostasis. Biological assemblies of the transporters, consisting of cyclic homotrimers, emerge as a crucial aspect of glutamate transporter modulation. Hence, targeting heteromerization promises an effective approach for modulator design. On the other hand, the dynamic nature of transcription allows for the generation of transporter isoforms in structurally distinct manners. METHODS: The potential isoforms were identified through the analysis of computationally generated gene-centric isoform maps. The conserved features of isoform sequences were revealed by computational chemistry methods and subsequent structural analysis of AlphaFold2 predictions. Truncated isoforms were further subjected to a wide range of docking analyses, 50 ns molecular dynamics simulations, and evolutionary coupling analyses. RESULTS: Energetic landscapes of isoform-canonical transporter complexes suggested an inhibitory potential of truncated isoforms on glutamate transporter bio-assembly. Moreover, isoforms that mimic the trimerization domain (in particular, TM2 helices) exhibited stronger interactions with canonical transporters, underscoring the role of transmembrane helices in isoform interactions. Additionally, self-assembly dynamics observed in truncated isoforms mimicking canonical TM5 helices indicate a potential protective role against unwanted interactions with canonical transporters. CONCLUSION: Our computational studies on glutamate transporters offer insights into the roles of alternative splicing on protein interactions and identify potential drug targets for physiological or pathological processes.
Affiliation(s)
- Alper Karagöl
- Istanbul University Istanbul Medical Faculty, Istanbul, Turkey
- Taner Karagöl
- Istanbul University Istanbul Medical Faculty, Istanbul, Turkey
- Mengke Li
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Shuguang Zhang
- Laboratory of Molecular Architecture, Media Lab, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA, 02139, USA.
28
Li Y, Wang Y, Tan YQ, Yue Q, Guo Y, Yan R, Meng L, Zhai H, Tong L, Yuan Z, Li W, Wang C, Han S, Ren S, Yan Y, Wang W, Gao L, Tan C, Hu T, Zhang H, Liu L, Yang P, Jiang W, Ye Y, Tan H, Wang Y, Lu C, Li X, Xie J, Yuan G, Cui Y, Shen B, Wang C, Guan Y, Li W, Shi Q, Lin G, Ni T, Sun Z, Ye L, Vourekas A, Guo X, Lin M, Zheng K. The landscape of RNA binding proteins in mammalian spermatogenesis. Science 2024; 386:eadj8172. [PMID: 39208083 DOI: 10.1126/science.adj8172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 04/08/2024] [Accepted: 08/20/2024] [Indexed: 09/04/2024]
Abstract
Despite continuous expansion of the RNA binding protein (RBP) world, there is a lack of systematic understanding of RBPs in the mammalian testis, which harbors one of the most complex tissue transcriptomes. We adapted RNA interactome capture to mouse male germ cells, building an RBP atlas characterized by multiple layers of dynamics along spermatogenesis. Trapping of RNA-cross-linked peptides showed that the glutamic acid-arginine (ER) patch, a residue-coevolved polyampholytic element present in coiled coils, enhances RNA binding of its host RBPs. Deletion of this element in NONO (non-POU domain-containing octamer-binding protein) led to a defective mitosis-to-meiosis transition due to compromised NONO-RNA interactions. Whole-exome sequencing of over 1000 infertile men revealed a prominent role of RBPs in the human genetic architecture of male infertility and identified risk ER patch variants.
Affiliation(s)
- Yang Li
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Yuanyuan Wang
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Department of Neurobiology, School of Basic Medical Science, Nanjing Medical University, Nanjing 211166, China
- Yue-Qiu Tan
- Institute of Reproductive and Stem Cell Engineering, NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, School of Basic Medical Science, Central South University, Changsha 410083, China
- Clinical Research Center for Reproduction and Genetics in Hunan Province, Reproductive and Genetic Hospital of CITIC-Xiangya, Changsha 410008, China
- Qiuling Yue
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Department of Andrology, Nanjing Drum Tower Hospital, the Affiliated Hospital of Nanjing University, Nanjing 210008, China
- Yueshuai Guo
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Ruoyu Yan
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- College of Life Sciences, Northwest A&F University, Yangling 712100, China
- Lanlan Meng
- Institute of Reproductive and Stem Cell Engineering, NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, School of Basic Medical Science, Central South University, Changsha 410083, China
- Clinical Research Center for Reproduction and Genetics in Hunan Province, Reproductive and Genetic Hospital of CITIC-Xiangya, Changsha 410008, China
- Huicong Zhai
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Lingxiu Tong
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Zihan Yuan
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Wu Li
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Cuicui Wang
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Shenglin Han
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Sen Ren
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Yitong Yan
- Department of Neurobiology, School of Basic Medical Science, Nanjing Medical University, Nanjing 211166, China
- Weixu Wang
- Institute of Computational Biology, Helmholtz Center Munich, Munich 85764, Germany
- Lei Gao
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Chen Tan
- Institute of Reproductive and Stem Cell Engineering, NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, School of Basic Medical Science, Central South University, Changsha 410083, China
- Tongyao Hu
- Institute of Reproductive and Stem Cell Engineering, NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, School of Basic Medical Science, Central South University, Changsha 410083, China
- Hao Zhang
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Liya Liu
- Department of Neurobiology, School of Basic Medical Science, Nanjing Medical University, Nanjing 211166, China
- Pinglan Yang
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Wanyin Jiang
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Yiting Ye
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Huanhuan Tan
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Yanfeng Wang
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Chenyu Lu
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Xin Li
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Jie Xie
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Gege Yuan
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Yiqiang Cui
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Bin Shen
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Cheng Wang
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Department of Bioinformatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Yichun Guan
- Center for Reproductive Medicine, the Third Affiliated Hospital of Zhengzhou University, Zhengzhou 450052, China
- Wei Li
- Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou 510623, China
- Qinghua Shi
- Division of Reproduction and Genetics, First Affiliated Hospital of USC, Hefei National Laboratory for Physical Sciences at Microscale, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei 230027, China
- Ge Lin
- Institute of Reproductive and Stem Cell Engineering, NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, School of Basic Medical Science, Central South University, Changsha 410083, China
- Clinical Research Center for Reproduction and Genetics in Hunan Province, Reproductive and Genetic Hospital of CITIC-Xiangya, Changsha 410008, China
- Ting Ni
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Human Phenome Institute, Shanghai Engineering Research Center of Industrial Microorganisms, School of Life Sciences and Huashan Hospital, Fudan University, Shanghai 200438, China
- Zheng Sun
- Department of Medicine, Baylor College of Medicine, Houston, TX 77030, USA
- Lan Ye
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Anastasios Vourekas
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Xuejiang Guo
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
- Mingyan Lin
- Department of Neurobiology, School of Basic Medical Science, Nanjing Medical University, Nanjing 211166, China
- Changzhou Medical Center, The Affiliated Changzhou Second People's Hospital of Nanjing Medical University, Changzhou 213000, China
- Division of Birth Cohort Study, Fujian Maternity and Child Health Hospital, Fuzhou 350014, China
- Ke Zheng
- State Key Laboratory of Reproductive Medicine and Offspring Health, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
29
Draizen EJ, Veretnik S, Mura C, Bourne PE. Deep generative models of protein structure uncover distant relationships across a continuous fold space. Nat Commun 2024; 15:8094. [PMID: 39294145 PMCID: PMC11410806 DOI: 10.1038/s41467-024-52020-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Accepted: 08/23/2024] [Indexed: 09/20/2024] Open
Abstract
Our views of fold space implicitly rest upon many assumptions that impact how we analyze, interpret and understand protein structure, function and evolution. For instance, is there an optimal granularity in viewing protein structural similarities (e.g., architecture, topology or some other level)? Similarly, the discrete/continuous dichotomy of fold space is central, but remains unresolved. Discrete views of fold space bin similar folds into distinct, non-overlapping groups; unfortunately, such binning can miss remote relationships. While hierarchical systems like CATH are indispensable resources, less heuristic and more conceptually flexible approaches could enable more nuanced explorations of fold space. Building upon an Urfold model of protein structure, here we present a deep generative modeling framework, termed DeepUrfold, for analyzing protein relationships at scale. DeepUrfold's learned embeddings occupy high-dimensional latent spaces that can be distilled for a given protein in terms of an amalgamated representation uniting sequence, structure and biophysical properties. This approach is structure-guided, versus being purely structure-based, and DeepUrfold learns representations that, in a sense, define superfamilies. Deploying DeepUrfold with CATH reveals evolutionarily-remote relationships that evade existing methodologies, and suggests a mostly-continuous view of fold space, a view that extends beyond simple geometric similarity, towards the realm of integrated sequence ↔ structure ↔ function properties.
Affiliation(s)
- Eli J Draizen
- School of Data Science, University of Virginia, Charlottesville, VA, USA.
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA.
- Stella Veretnik
- School of Data Science, University of Virginia, Charlottesville, VA, USA
- Cameron Mura
- School of Data Science, University of Virginia, Charlottesville, VA, USA.
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA.
- Philip E Bourne
- School of Data Science, University of Virginia, Charlottesville, VA, USA
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA
30
Zhang T, Tian S, Gao Z, Li Y, Jia H. Engineering an Ancestral Glycosyltransferase for Biosynthesis of 2-Phenylethyl-β-D-Glucopyranoside and Salidroside. Journal of Agricultural and Food Chemistry 2024; 72:19966-19976. [PMID: 39189841 DOI: 10.1021/acs.jafc.4c04381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/28/2024]
Abstract
Phenylethanoid glycosides (PhGs) are naturally occurring glycosides derived from plants with various biological activities. Glycosyltransferases catalyze the production of PhGs from phenylethanols via a transglycosylation reaction. The low activity and stability of glycosyltransferase limit its industrial application. An ancestral glycosyltransferase, UGTAn85, with heat resistance, alkali resistance, and high stability was resurrected using ancestral sequence reconstruction technology. This enzyme can efficiently convert phenylethanols to PhGs. The optimal reaction temperature and pH for UGTAn85 were found to be 70 °C and pH 10.0, respectively. This study employed a combination of structure-guided rational design and co-evolution analysis to enhance its catalytic activity. Potential mutation sites were identified through computer-aided design, including homology modeling, molecular docking, Rosetta dock design, molecular dynamics simulation, and co-evolution analysis. By targeted mutagenesis, the UGTAn85 mutant Q23E/N65D exhibited a 2.2-fold increase in enzyme activity (11.85 U/mg) and elevated affinity (Km = 0.11 mM) for 2-phenylethanol compared to UGTAn85. Following a fed-batch reaction, 36.16 g/L 2-phenylethyl-β-D-glucopyranoside and 51.49 g/L salidroside could be produced within 24 h, respectively. The findings in this study provide a new perspective on enhancing the stability and activity of glycosyltransferases, as well as a potential biocatalyst for the industrial production of PhGs.
Affiliation(s)
- Ting Zhang
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing 211816, China
- Shaowei Tian
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing 211816, China
- Zhen Gao
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing 211816, China
- Yan Li
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing 211816, China
- Honghua Jia
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing 211816, China
31
Luo J, Song C, Cui W, Wang Q, Zhou Z, Han L. Precise redesign for improving enzyme robustness based on coevolutionary analysis and multidimensional virtual screening. Chem Sci 2024:d4sc02058h. [PMID: 39257856 PMCID: PMC11382147 DOI: 10.1039/d4sc02058h] [Received: 03/28/2024] [Accepted: 06/27/2024] [Indexed: 09/12/2024] Open
Abstract
Natural enzymes are able to function effectively under optimal physiological conditions, but the intrinsic performance often fails to meet the demands of industrial production. Existing strategies are based mainly on the evaluation and subsequent combination of single-point mutations; however, this approach often suffers from a limited number of designable residues and from low accuracy. Here, we propose a strategy (Co-MdVS) based on coevolutionary analysis and multidimensional virtual screening for precise design to improve enzyme robustness, employing nattokinase as a model. Using this strategy, we efficiently screened 8 dual mutants with enhanced thermostability from a virtual mutation library containing 7980 mutants. After further iterative combination, the optimal mutant M6 exhibited a 31-fold increase in half-life at 55 °C, significantly enhanced acid resistance, and improved catalytic efficiency with different substrates. Molecular dynamics simulations indicated that the reduced flexibility of thermal and acid-sensitive regions resulted in a significantly increased robustness of M6. Furthermore, the potential of multidimensional virtual screening in enhancing design precision has been validated on l-rhamnose isomerase and PETase. Therefore, the Co-MdVS strategy introduced in this research may offer a viable approach for developing enzymes with enhanced robustness.
Affiliation(s)
- Jie Luo
- Key Laboratory of Industrial Biotechnology (Ministry of Education), School of Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, China
- Chenshuo Song
- Key Laboratory of Industrial Biotechnology (Ministry of Education), School of Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, China
- Wenjing Cui
- Key Laboratory of Industrial Biotechnology (Ministry of Education), School of Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, China
- Qiong Wang
- Key Laboratory of Industrial Biotechnology (Ministry of Education), School of Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, China
- Zhemin Zhou
- Key Laboratory of Industrial Biotechnology (Ministry of Education), School of Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, China
- Laichuang Han
- Key Laboratory of Industrial Biotechnology (Ministry of Education), School of Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, China
32
Tang X, Ortner NJ, Nikonishyna YV, Fernández-Quintero ML, Kokot J, Striessnig J, Liedl KR. Pathogenicity of de novo CACNA1D Ca2+ channel variants predicted from sequence co-variation. Eur J Hum Genet 2024; 32:1065-1073. [PMID: 38553610 PMCID: PMC11369236 DOI: 10.1038/s41431-024-01594-y] [Received: 10/02/2023] [Revised: 03/02/2024] [Accepted: 03/12/2024] [Indexed: 09/04/2024] Open
Abstract
Voltage-gated L-type Cav1.3 Ca2+ channels support numerous physiological functions including neuronal excitability, sinoatrial node pacemaking, hearing, and hormone secretion. De novo missense mutations in the gene of their pore-forming α1-subunit (CACNA1D) induce severe gating defects which lead to autism spectrum disorder and a more severe neurological disorder with and without endocrine symptoms. The number of CACNA1D variants reported is constantly rising, but their pathogenic potential often remains unclear, which complicates clinical decision-making. Since functional tests are time-consuming and not always available, bioinformatic tools that further improve prediction of the pathogenic potential of novel variants are needed. Here we employed evolutionary analysis considering sequences of the Cav1.3 α1-subunit throughout the animal kingdom to predict the pathogenicity of human disease-associated CACNA1D missense variants. Co-variation analyses of evolutionary information revealed residue-residue couplings and allowed us to generate a score, which correctly predicted previously identified pathogenic variants, supported pathogenicity in variants previously classified as likely pathogenic and even led to the re-classification or re-examination of 18 out of 80 variants previously assessed with clinical and electrophysiological data. Based on the prediction score, we electrophysiologically tested one variant (V584I) and found significant gating changes associated with pathogenic risks. Thus, our co-variation model represents a valuable addition to complement the assessment of the pathogenicity of CACNA1D variants completely independent of clinical diagnoses, electrophysiology, structural or biophysical considerations, and solely based on evolutionary analyses.
Affiliation(s)
- Xuechen Tang
- Department of General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck, University of Innsbruck, A-6020, Innsbruck, Austria
- Nadine J Ortner
- Department of Pharmacology and Toxicology, Center for Molecular Biosciences Innsbruck, University of Innsbruck, A-6020, Innsbruck, Austria
- Yuliia V Nikonishyna
- Department of Pharmacology and Toxicology, Center for Molecular Biosciences Innsbruck, University of Innsbruck, A-6020, Innsbruck, Austria
- Monica L Fernández-Quintero
- Department of General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck, University of Innsbruck, A-6020, Innsbruck, Austria
- Janik Kokot
- Department of General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck, University of Innsbruck, A-6020, Innsbruck, Austria
- Jörg Striessnig
- Department of Pharmacology and Toxicology, Center for Molecular Biosciences Innsbruck, University of Innsbruck, A-6020, Innsbruck, Austria
- Klaus R Liedl
- Department of General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck, University of Innsbruck, A-6020, Innsbruck, Austria
33
Xu R, Pan Q, Zhu G, Ye Y, Xin M, Wang Z, Wang S, Li W, Wei Y, Guo J, Zheng L. ThermoLink: Bridging disulfide bonds and enzyme thermostability through database construction and machine learning prediction. Protein Sci 2024; 33:e5097. [PMID: 39145402 PMCID: PMC11325166 DOI: 10.1002/pro.5097] [Received: 04/20/2024] [Revised: 05/27/2024] [Accepted: 06/15/2024] [Indexed: 08/16/2024]
Abstract
Disulfide bonds, covalently formed by sulfur atoms in cysteine residues, play a crucial role in protein folding and structure stability. Considering their significance, artificial disulfide bonds are often introduced to enhance protein thermostability. Although an increasing number of tools can assist with this task, significant amounts of time and resources are often wasted owing to inadequate consideration. To enhance the accuracy and efficiency of designing disulfide bonds for protein thermostability improvement, we initially collected disulfide bond and protein thermostability data from extensive literature sources. Thereafter, we extracted various sequence- and structure-based features and constructed machine-learning models to predict whether disulfide bonds can improve protein thermostability. Among all models, the neighborhood context model based on the Adaboost-DT algorithm performed the best, yielding "area under the receiver operating characteristic curve" and accuracy scores of 0.773 and 0.714, respectively. Furthermore, we also found AlphaFold2 to exhibit high superiority in predicting disulfide bonds, and to some extent, the coevolutionary relationship between residue pairs potentially guided artificial disulfide bond design. Moreover, several mutants of imine reductase 89 (IR89) with artificially designed thermostable disulfide bonds were experimentally proven to be considerably efficient for substrate catalysis. The SS-bond data have been integrated into an online server, namely, ThermoLink, available at guolab.mpu.edu.mo/thermoLink.
Affiliation(s)
- Ran Xu
- Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao, China
- Qican Pan
- Zelixir Biotech Company Ltd, Shanghai, China
- Yilin Ye
- Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao, China
- Minghui Xin
- School of Physics, Shandong University, Jinan, China
- Zechen Wang
- School of Physics, Shandong University, Jinan, China
- Sheng Wang
- Zelixir Biotech Company Ltd, Shanghai, China
- Weifeng Li
- School of Physics, Shandong University, Jinan, China
- Yanjie Wei
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Jingjing Guo
- Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao, China
- Liangzhen Zheng
- Zelixir Biotech Company Ltd, Shanghai, China
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
34
Almhjell PJ, Johnston KE, Porter NJ, Kennemur JL, Bhethanabotla VC, Ducharme J, Arnold FH. The β-subunit of tryptophan synthase is a latent tyrosine synthase. Nat Chem Biol 2024; 20:1086-1093. [PMID: 38744987 PMCID: PMC11288773 DOI: 10.1038/s41589-024-01619-z] [Received: 11/21/2023] [Accepted: 04/04/2024] [Indexed: 05/16/2024]
Abstract
Aromatic amino acids and their derivatives are diverse primary and secondary metabolites with critical roles in protein synthesis, cell structure and integrity, defense and signaling. All de novo aromatic amino acid production relies on a set of ancient and highly conserved chemistries. Here we introduce a new enzymatic transformation for L-tyrosine synthesis by demonstrating that the β-subunit of tryptophan synthase (which natively couples indole and L-serine to form L-tryptophan) can act as a latent 'tyrosine synthase'. A single substitution of a near-universally conserved catalytic residue unlocks activity toward simple phenol analogs and yields exclusive para carbon-carbon bond formation to furnish L-tyrosines. Structural and mechanistic studies show how a new active-site water molecule orients phenols for a nonnative mechanism of alkylation, with additional directed evolution resulting in a net >30,000-fold rate enhancement. This new biocatalyst can be used to efficiently prepare valuable L-tyrosine analogs at gram scales and provides the missing chemistry for a conceptually different pathway to L-tyrosine.
Affiliation(s)
- Patrick J Almhjell
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA
- Department of Biochemistry, Stanford University, Stanford, CA, USA
- Kadina E Johnston
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Merck & Co., Inc, South San Francisco, CA, USA
- Nicholas J Porter
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA
- Codexis, Inc., Redwood City, CA, USA
- Jennifer L Kennemur
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA
- Vignesh C Bhethanabotla
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA
- Julie Ducharme
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA
- Quebec Government Office, Los Angeles, CA, USA
- Frances H Arnold
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
35
Ding K, Chin M, Zhao Y, Huang W, Mai BK, Wang H, Liu P, Yang Y, Luo Y. Machine learning-guided co-optimization of fitness and diversity facilitates combinatorial library design in enzyme engineering. Nat Commun 2024; 15:6392. [PMID: 39080249 PMCID: PMC11289365 DOI: 10.1038/s41467-024-50698-y] [Received: 05/29/2024] [Accepted: 07/19/2024] [Indexed: 08/02/2024] Open
Abstract
The effective design of combinatorial libraries to balance fitness and diversity facilitates the engineering of useful enzyme functions, particularly those that are poorly characterized or unknown in biology. We introduce MODIFY, a machine learning (ML) algorithm that learns from natural protein sequences to infer evolutionarily plausible mutations and predict enzyme fitness. MODIFY co-optimizes predicted fitness and sequence diversity of starting libraries, prioritizing high-fitness variants while ensuring broad sequence coverage. In silico evaluation shows that MODIFY outperforms state-of-the-art unsupervised methods in zero-shot fitness prediction and enables ML-guided directed evolution with enhanced efficiency. Using MODIFY, we engineer generalist biocatalysts derived from a thermostable cytochrome c to achieve enantioselective C-B and C-Si bond formation via a new-to-nature carbene transfer mechanism, leading to biocatalysts six mutations away from previously developed enzymes while exhibiting superior or comparable activities. These results demonstrate MODIFY's potential in solving challenging enzyme engineering problems beyond the reach of classic directed evolution.
Affiliation(s)
- Kerr Ding
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
- Michael Chin
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, CA, 93106, USA
- Yunlong Zhao
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, CA, 93106, USA
- Wei Huang
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, CA, 93106, USA
- Binh Khanh Mai
- Department of Chemistry, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Huanan Wang
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, CA, 93106, USA
- Peng Liu
- Department of Chemistry, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Yang Yang
- Department of Chemistry and Biochemistry, University of California, Santa Barbara, CA, 93106, USA
- Biomolecular Science and Engineering (BMSE) Program, University of California, Santa Barbara, CA, 93106, USA
- Yunan Luo
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
36
Otteson L, Nagy G, Kunkel J, Kodis G, Zheng W, Bignon C, Longhi S, Grubmüller H, Vaiana AC, Vaiana SM. Transient Non-local Interactions Dominate the Dynamics of Measles Virus NTAIL. bioRxiv 2024:2024.07.22.604679. [PMID: 39091801 PMCID: PMC11291014 DOI: 10.1101/2024.07.22.604679] [Indexed: 08/04/2024]
Abstract
The RNA genome of measles virus is encapsidated by the nucleoprotein within a helical nucleocapsid that serves as template for both transcription and replication. The intrinsically disordered domain of the nucleoprotein (NTAIL), partly protruding outward from the nucleocapsid, is essential for binding the polymerase complex responsible for viral transcription and replication. As for many IDPs, binding of NTAIL occurs through a short molecular recognition element (MoRE) that folds upon binding, with the majority of NTAIL remaining disordered. Though NTAIL regions far from the MoRE influence the binding affinity, interactions between them and the MoRE have not been investigated in depth. Using an integrated approach, relying on photo-induced electron transfer (PET) experiments between tryptophan and cysteine pairs placed at different positions in the protein under varying salt and pH conditions, combined with simulations and analytical models, we identified transient interactions between two disordered regions distant in sequence, which dominate NTAIL dynamics, and regulate the conformational preferences of both the MoRE and the entire NTAIL domain. Co-evolutionary analysis corroborates our findings, and suggests an important functional role for the same intramolecular interactions. We propose mechanisms by which these non-local interactions may regulate binding to the phosphoprotein, polymerase recruitment, and ultimately viral transcription and replication. Our findings may be extended to other IDPs, where non-local intra-protein interactions affect the conformational preferences of intermolecular binding sites.
Affiliation(s)
- Lillian Otteson
- Center for Biological Physics, Arizona State University, Tempe, AZ, USA
- Department of Physics, Arizona State University, Tempe, AZ 85287, USA
- Gabor Nagy
- Theoretical and Computational Biophysics, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- John Kunkel
- Center for Biological Physics, Arizona State University, Tempe, AZ, USA
- Department of Physics, Arizona State University, Tempe, AZ 85287, USA
- Gerdenis Kodis
- Center for Biological Physics, Arizona State University, Tempe, AZ, USA
- Department of Physics, Arizona State University, Tempe, AZ 85287, USA
- Wenwei Zheng
- College of Integrative Sciences and Arts, Arizona State University, Mesa, AZ 85212, USA
- Sonia Longhi
- Aix Marseille Univ, CNRS, AFMB, UMR 7257, Marseille, France
- Helmut Grubmüller
- Theoretical and Computational Biophysics, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Andrea C Vaiana
- Theoretical and Computational Biophysics, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Present address: Nature's Toolbox, Inc. (NTx), Rio Rancho, NM 87144, USA
- Sara M Vaiana
- Center for Biological Physics, Arizona State University, Tempe, AZ, USA
- Department of Physics, Arizona State University, Tempe, AZ 85287, USA
37
Basu S, Subedi U, Tonelli M, Afshinpour M, Tiwari N, Fuentes EJ, Chakravarty S. Assessing the functional roles of coevolving PHD finger residues. Protein Sci 2024; 33:e5065. [PMID: 38923615 PMCID: PMC11201814 DOI: 10.1002/pro.5065] [Received: 10/31/2023] [Revised: 04/21/2024] [Accepted: 05/16/2024] [Indexed: 06/28/2024]
Abstract
Although in silico folding based on coevolving residue constraints in the deep-learning era has transformed protein structure prediction, the contributions of coevolving residues to protein folding, stability, and other functions in physical contexts remain to be clarified and experimentally validated. Herein, the PHD finger module, a well-known histone reader with distinct subtypes containing subtype-specific coevolving residues, was used as a model to experimentally assess the contributions of coevolving residues and to clarify their specific roles. The results of the assessment, including proteolysis and thermal unfolding of wildtype and mutant proteins, suggested that coevolving residues have varying contributions, despite their large in silico constraints. Residue positions with large constraints were found to contribute to stability in one subtype but not others. Computational sequence design and generative model-based energy estimates of individual structures were also implemented to complement the experimental assessment. Sequence design and energy estimates distinguish coevolving residues that contribute to folding from those that do not. The results of proteolytic analysis of mutations at positions contributing to folding were consistent with those suggested by sequence design and energy estimation. Thus, we report a comprehensive assessment of the contributions of coevolving residues, as well as a strategy based on a combination of approaches that should enable detailed understanding of the residue contributions in other large protein families.
Affiliation(s)
- Shraddha Basu
- Department of Chemistry & Biochemistry, South Dakota State University, Brookings, South Dakota, USA
- Ujwal Subedi
- Department of Chemistry & Biochemistry, South Dakota State University, Brookings, South Dakota, USA
- Marco Tonelli
- National Magnetic Resonance Facility at Madison (NMRFAM), University of Wisconsin‐Madison, Madison, Wisconsin, USA
- Maral Afshinpour
- Department of Chemistry & Biochemistry, South Dakota State University, Brookings, South Dakota, USA
- Nitija Tiwari
- Department of Biochemistry & Molecular Biology, University of Iowa, Iowa City, Iowa, USA
- Ernesto J. Fuentes
- Department of Biochemistry & Molecular Biology, University of Iowa, Iowa City, Iowa, USA
- Suvobrata Chakravarty
- Department of Chemistry & Biochemistry, South Dakota State University, Brookings, South Dakota, USA
38
Martí JM, Hsu C, Rochereau C, Xu C, Blazejewski T, Nisonoff H, Leonard SP, Kang-Yun CS, Chlebek J, Ricci DP, Park D, Wang H, Listgarten J, Jiao Y, Allen JE. GENTANGLE: integrated computational design of gene entanglements. Bioinformatics 2024; 40:btae380. [PMID: 38905502 PMCID: PMC11251573 DOI: 10.1093/bioinformatics/btae380] [Received: 11/15/2023] [Revised: 06/01/2024] [Accepted: 06/14/2024] [Indexed: 06/23/2024] Open
Abstract
SUMMARY: The design of two overlapping genes in a microbial genome is an emerging technique for adding more reliable control mechanisms in engineered organisms for increased stability. The design of functional overlapping gene pairs is a challenging procedure, and computational design tools are used to improve the efficiency to deploy successful designs in genetically engineered systems. GENTANGLE (Gene Tuples ArraNGed in overLapping Elements) is a high-performance containerized pipeline for the computational design of two overlapping genes translated in different reading frames of the genome. This new software package can be used to design and test gene entanglements for microbial engineering projects using arbitrary sets of user-specified gene pairs.
AVAILABILITY AND IMPLEMENTATION: The GENTANGLE source code and its submodules are freely available on GitHub at https://github.com/BiosecSFA/gentangle. The DATANGLE (DATA for genTANGLE) repository contains related data and results and is freely available on GitHub at https://github.com/BiosecSFA/datangle. The GENTANGLE container is freely available on Singularity Cloud Library at https://cloud.sylabs.io/library/khyox/gentangle/gentangle.sif. The GENTANGLE repository wiki (https://github.com/BiosecSFA/gentangle/wiki), website (https://biosecsfa.github.io/gentangle/), and user manual contain detailed instructions on how to use the different components of software and data, including examples and reproducing the results. The code is licensed under the GNU Affero General Public License version 3 (https://www.gnu.org/licenses/agpl.html).
Affiliation(s)
- Jose Manuel Martí
- Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, United States
- Chloe Hsu
- Center for Computational Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Charlotte Rochereau
- Department of Systems Biology, Columbia University, New York, NY 10023, United States
- Chenling Xu
- Biosciences & Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, United States
- Tomasz Blazejewski
- Department of Systems Biology, Columbia University, New York, NY 10023, United States
- Hunter Nisonoff
- Center for Computational Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Sean P Leonard
- Biosciences & Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, United States
- Christina S Kang-Yun
- Biosciences & Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, United States
- Jennifer Chlebek
- Biosciences & Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, United States
- Dante P Ricci
- Biosciences & Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, United States
- Dan Park
- Biosciences & Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, United States
- Harris Wang
- Department of Systems Biology, Columbia University, New York, NY 10023, United States
- Jennifer Listgarten
- Center for Computational Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Yongqin Jiao
- Biosciences & Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, United States
- Jonathan E Allen
- Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, United States
39
Fram B, Su Y, Truebridge I, Riesselman AJ, Ingraham JB, Passera A, Napier E, Thadani NN, Lim S, Roberts K, Kaur G, Stiffler MA, Marks DS, Bahl CD, Khan AR, Sander C, Gauthier NP. Simultaneous enhancement of multiple functional properties using evolution-informed protein design. Nat Commun 2024; 15:5141. [PMID: 38902262 PMCID: PMC11190266 DOI: 10.1038/s41467-024-49119-x] [Received: 09/28/2023] [Accepted: 05/24/2024] [Indexed: 06/22/2024] Open
Abstract
A major challenge in protein design is to augment existing functional proteins with multiple property enhancements. Altering several properties likely necessitates numerous primary sequence changes, and novel methods are needed to accurately predict combinations of mutations that maintain or enhance function. Models of sequence co-variation (e.g., EVcouplings), which leverage extensive information about various protein properties and activities from homologous protein sequences, have proven effective for many applications including structure determination and mutation effect prediction. We apply EVcouplings to computationally design variants of the model protein TEM-1 β-lactamase. Nearly all the 14 experimentally characterized designs were functional, including one with 84 mutations from the nearest natural homolog. The designs also had large increases in thermostability, increased activity on multiple substrates, and nearly identical structure to the wild type enzyme. This study highlights the efficacy of evolutionary models in guiding large sequence alterations to generate functional diversity for protein design applications.
Affiliation(s)
- Benjamin Fram
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Yang Su
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Ian Truebridge
- Institute for Protein Innovation, Boston, MA, USA
- Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- AI Proteins, Boston, MA, USA
- Adam J Riesselman
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Program in Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- John B Ingraham
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Alessandro Passera
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Campus-Vienna-Biocenter 1, 1030, Vienna, Austria
- Eve Napier
- School of Biochemistry and Immunology, Trinity College Dublin, Dublin 2, Ireland
- Nicole N Thadani
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Apriori Bio, Cambridge, MA, USA
- Samuel Lim
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Kristen Roberts
- Selux Diagnostics Inc., 56 Roland Street, Charlestown, MA, USA
- Gurleen Kaur
- Selux Diagnostics Inc., 56 Roland Street, Charlestown, MA, USA
- Michael A Stiffler
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Dyno Therapeutics, 343 Arsenal Street, Watertown, MA, USA
- Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Christopher D Bahl
- Institute for Protein Innovation, Boston, MA, USA
- Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- AI Proteins, Boston, MA, USA
- Amir R Khan
- School of Biochemistry and Immunology, Trinity College Dublin, Dublin 2, Ireland
- Division of Newborn Medicine, Boston Children's Hospital, Boston, MA, USA
- Chris Sander
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Nicholas P Gauthier
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
40
Chakraborty J, Poddar S, Dutta S, Bahulekar V, Harne S, Srinivasan R, Gayathri P. Dynamics of interdomain rotation facilitates FtsZ filament assembly. J Biol Chem 2024; 300:107336. [PMID: 38718863 PMCID: PMC11157280 DOI: 10.1016/j.jbc.2024.107336] [Received: 03/09/2024] [Revised: 04/22/2024] [Accepted: 04/23/2024] [Indexed: 05/31/2024] Open
Abstract
FtsZ, the tubulin homolog essential for bacterial cell division, assembles as the Z-ring at the division site and directs peptidoglycan synthesis by treadmilling. It is unclear how FtsZ achieves the kinetic polarity that drives treadmilling. To obtain insights into fundamental features of FtsZ assembly dynamics independent of peptidoglycan synthesis, we carried out structural and biochemical characterization of FtsZ from the cell wall-less bacterium Spiroplasma melliferum (SmFtsZ). Interestingly, the structures of SmFtsZ bound to GDP and GMPPNP, respectively, were captured as domain-swapped dimers. SmFtsZ was found to be a slower GTPase with a higher critical concentration (CC) than Escherichia coli FtsZ (EcFtsZ). In FtsZs, a conformational switch from the R-state (closed) to the T-state (open) favors polymerization. We identified that Phe224, located at the interdomain cleft of SmFtsZ, is crucial for the R- to T-state transition. SmFtsZF224M exhibited higher GTPase activity and lower CC, whereas the corresponding EcFtsZM225F resulted in cell division defects in E. coli. Our results demonstrate that relative rotation of the domains is a rate-limiting step of polymerization. Our structural analysis suggests that the rotation is plausibly triggered upon addition of a GTP-bound monomer to the filament through interaction of the preformed N-terminal domain (NTD). Hence, addition of monomers to the NTD-exposed end of the filament is slower than addition to the C-terminal domain (CTD) end, which explains kinetic polarity. In summary, the study highlights the importance of interdomain interactions and conformational changes in regulating FtsZ assembly dynamics.
Affiliation(s)
- Joyeeta Chakraborty
  - Biology Division, Indian Institute of Science Education and Research, Pune, India
- Sakshi Poddar
  - School of Biological Sciences, National Institute of Science Education and Research, Bhubaneswar, India; Homi Bhabha National Institutes (HBNI), Training School Complex, Mumbai, India
- Soumyajit Dutta
  - Biology Division, Indian Institute of Science Education and Research, Pune, India
- Vaishnavi Bahulekar
  - Biology Division, Indian Institute of Science Education and Research, Pune, India
- Shrikant Harne
  - Biology Division, Indian Institute of Science Education and Research, Pune, India
- Ramanujam Srinivasan
  - School of Biological Sciences, National Institute of Science Education and Research, Bhubaneswar, India; Homi Bhabha National Institutes (HBNI), Training School Complex, Mumbai, India
- Pananghat Gayathri
  - Biology Division, Indian Institute of Science Education and Research, Pune, India.
41
Nixon C, Lim SA, Sternke M, Barrick D, Harms MJ, Marqusee S. The importance of input sequence set to consensus-derived proteins and their relationship to reconstructed ancestral proteins. Protein Sci 2024; 33:e5011. [PMID: 38747388 PMCID: PMC11094778 DOI: 10.1002/pro.5011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 04/02/2024] [Accepted: 04/23/2024] [Indexed: 05/19/2024]
Abstract
A protein sequence encodes its energy landscape: all the accessible conformations, energetics, and dynamics. The evolutionary relationship between sequence and landscape can be probed phylogenetically by compiling a multiple sequence alignment of homologous sequences and generating either common ancestors via Ancestral Sequence Reconstruction or a consensus protein containing the most common amino acid at each position. Both ancestral and consensus proteins are often more stable than their extant homologs, which raises the question of how they differ and suggests that both approaches serve as general methods to engineer thermostability. We used the Ribonuclease H family to compare these approaches and evaluate how the evolutionary relationship of the input sequences affects the properties of the resulting consensus protein. While the consensus protein derived from our full Ribonuclease H sequence alignment is structured and active, it neither shows properties of a well-folded protein nor has enhanced stability. In contrast, the consensus protein derived from a phylogenetically restricted set of sequences is significantly more stable and cooperatively folded, suggesting that cooperativity may be encoded by different mechanisms in separate clades and lost when too many diverse clades are combined to generate a consensus protein. To explore this, we compared pairwise covariance scores using a Potts formalism as well as higher-order sequence correlations using singular value decomposition (SVD). We find the SVD coordinates of a stable consensus sequence are close to the coordinates of the analogous ancestor sequence and its descendants, whereas the unstable consensus sequences are outliers in SVD space.
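As an illustration of the consensus construction described in the abstract (the most common amino acid at each alignment position), here is a minimal Python sketch. It is not the authors' pipeline: the function name `consensus_sequence`, the gap handling (gaps ignored, all-gap columns dropped), the tie-breaking rule, and the toy alignment are all assumptions made for this example.

```python
from collections import Counter


def consensus_sequence(alignment, gap="-"):
    """Return the most common non-gap residue at each alignment column.

    Assumptions for this sketch: every aligned sequence has the same length,
    gap characters are ignored when counting, columns containing only gaps
    are dropped, and ties go to the residue seen first in the column.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*alignment):  # iterate over alignment columns
        counts = Counter(res for res in column if res != gap)
        if not counts:
            continue  # all-gap column: contributes nothing
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


# Hypothetical four-sequence alignment, for illustration only.
toy_alignment = ["MKV-L", "MRVAL", "MKIAL", "MKVAI"]
print(consensus_sequence(toy_alignment))
```

Restricting `alignment` to sequences from a single clade before calling the function corresponds to the phylogenetically restricted consensus that the abstract contrasts with the full-alignment consensus.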
Affiliation(s)
- Charlotte Nixon
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California, USA
- Shion A. Lim
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California, USA
- Matt Sternke
  - The T.C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland, USA
- Doug Barrick
  - The T.C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland, USA
- Michael J. Harms
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA
- Susan Marqusee
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California, USA
  - Department of Chemistry, University of California, Berkeley, Berkeley, California, USA
  - California Institute for Quantitative Biosciences (QB3), Berkeley, California, USA
42
Frank HM, Walujkar S, Walsh RM, Laursen WJ, Theobald DL, Garrity PA, Gaudet R. Structural basis of ligand specificity and channel activation in an insect gustatory receptor. Cell Rep 2024; 43:114035. [PMID: 38573859 PMCID: PMC11100771 DOI: 10.1016/j.celrep.2024.114035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Revised: 02/26/2024] [Accepted: 03/18/2024] [Indexed: 04/06/2024] Open
Abstract
Gustatory receptors (GRs) are critical for insect chemosensation and are potential targets for controlling pests and disease vectors, making their structural investigation a vital step toward such applications. We present structures of Bombyx mori Gr9 (BmGr9), a fructose-gated cation channel, in agonist-free and fructose-bound states. BmGr9 forms a tetramer similar to distantly related insect odorant receptors (ORs). Upon fructose binding, BmGr9's channel gate opens through helix S7b movements. In contrast to ORs, BmGr9's ligand-binding pocket, shaped by a kinked helix S4 and a shorter extracellular S3-S4 loop, is larger and solvent accessible in both agonist-free and fructose-bound states. Also, unlike ORs, fructose binding by BmGr9 involves helix S5 and a pocket lined with aromatic and polar residues. Structure-based sequence alignments reveal distinct patterns of ligand-binding pocket residue conservation in GR subfamilies associated with different ligand classes. These data provide insight into the molecular basis of GR ligand specificity and function.
Affiliation(s)
- Heather M Frank
  - Department of Molecular and Cellular Biology, Harvard University, 52 Oxford Street, Cambridge, MA 02138, USA
- Sanket Walujkar
  - Department of Molecular and Cellular Biology, Harvard University, 52 Oxford Street, Cambridge, MA 02138, USA
- Richard M Walsh
  - The Harvard Cryo-EM Center for Structural Biology, Harvard Medical School, Boston, MA 02115, USA; Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
- Willem J Laursen
  - Department of Biology and Volen Center for Complex Systems, Brandeis University, Waltham, MA 02453, USA
- Paul A Garrity
  - Department of Biology and Volen Center for Complex Systems, Brandeis University, Waltham, MA 02453, USA.
- Rachelle Gaudet
  - Department of Molecular and Cellular Biology, Harvard University, 52 Oxford Street, Cambridge, MA 02138, USA.
43
Dibley K, Jost M, McIntosh R, Lagudah E, Zhang P. The wheat stripe rust resistance gene YrNAM is Yr10. Nat Commun 2024; 15:3291. [PMID: 38632235 PMCID: PMC11024124 DOI: 10.1038/s41467-024-47513-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/24/2023] [Accepted: 04/03/2024] [Indexed: 04/19/2024] Open
Affiliation(s)
- Katherine Dibley
  - CSIRO Agriculture and Food, Commonwealth Scientific and Industrial Research Organisation, GPO Box 1700, Canberra, ACT 2601, NSW, Australia
- Matthias Jost
  - CSIRO Agriculture and Food, Commonwealth Scientific and Industrial Research Organisation, GPO Box 1700, Canberra, ACT 2601, NSW, Australia
- Robert McIntosh
  - The University of Sydney, School of Life and Environmental Sciences, Plant Breeding Institute, Cobbitty, NSW 2570, NSW, Australia.
- Evans Lagudah
  - CSIRO Agriculture and Food, Commonwealth Scientific and Industrial Research Organisation, GPO Box 1700, Canberra, ACT 2601, NSW, Australia.
- Peng Zhang
  - The University of Sydney, School of Life and Environmental Sciences, Plant Breeding Institute, Cobbitty, NSW 2570, NSW, Australia.
44
MacGowan SA, Madeira F, Britto-Borges T, Barton GJ. A unified analysis of evolutionary and population constraint in protein domains highlights structural features and pathogenic sites. Commun Biol 2024; 7:447. [PMID: 38605212 PMCID: PMC11009406 DOI: 10.1038/s42003-024-06117-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 03/27/2024] [Indexed: 04/13/2024] Open
Abstract
Protein evolution is constrained by structure and function, creating patterns in residue conservation that are routinely exploited to predict structure and other features. Similar constraints should affect variation across individuals, but it is only with the growth of human population sequencing that this has been tested at scale. Now, human population constraint has established applications in pathogenicity prediction, but it has not yet been explored for structural inference. Here, we map 2.4 million population variants to 5885 protein families and quantify residue-level constraint with a new Missense Enrichment Score (MES). Analysis of 61,214 structures from the PDB spanning 3661 families shows that missense depleted sites are enriched in buried residues or those involved in small-molecule or protein binding. MES is complementary to evolutionary conservation and a combined analysis allows a new classification of residues according to a conservation plane. This approach finds functional residues that are evolutionarily diverse, which can be related to specificity, as well as family-wide conserved sites that are critical for folding or function. We also find a possible contrast between lethal and non-lethal pathogenic sites, and a surprising clinical variant hot spot at a subset of missense enriched positions.
Affiliation(s)
- Stuart A MacGowan
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
- Fábio Madeira
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Thiago Britto-Borges
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
  - Section of Bioinformatics and Systems Cardiology, Department of Internal Medicine III and Klaus Tschira Institute for Integrative Computational Cardiology, Heidelberg University Hospital, Heidelberg, Germany
- Geoffrey J Barton
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK.
45
Bibik P, Alibai S, Pandini A, Dantu SC. PyCoM: a python library for large-scale analysis of residue-residue coevolution data. Bioinformatics 2024; 40:btae166. [PMID: 38532297 PMCID: PMC11009027 DOI: 10.1093/bioinformatics/btae166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 02/02/2024] [Accepted: 03/25/2024] [Indexed: 03/28/2024] Open
Abstract
MOTIVATION: Computational methods to detect correlated amino acid positions in proteins have become a valuable tool to predict intra- and inter-residue protein contacts, protein structures, and effects of mutation on protein stability and function. While there are many tools and webservers to compute coevolution scoring matrices, there is no central repository of alignments and coevolution matrices for large-scale studies and pattern detection that leverage the biological and structural annotations already available in UniProt.
RESULTS: We present a Python library, PyCoM, which enables users to query and analyze coevolution matrices and sequence alignments of 457 622 proteins, selected from the UniProtKB/Swiss-Prot database (length ≤ 500 residues), from a precompiled coevolution matrix database (PyCoMdb). PyCoM facilitates the development of statistical analyses of residue coevolution patterns using filters on biological and structural annotations from UniProtKB/Swiss-Prot, with simple access to PyCoMdb for both novice and advanced users, supporting Jupyter Notebooks, Python scripts, and web API access. The resource is open source and will help in generating data-driven computational models and methods to study and understand protein structures, stability, function, and design.
AVAILABILITY AND IMPLEMENTATION: PyCoM code is freely available from https://github.com/scdantu/pycom, and PyCoMdb and the Jupyter Notebook tutorials are freely available from https://pycom.brunel.ac.uk.
Affiliation(s)
- Philipp Bibik
  - Department of Computer Science, Brunel University London, Uxbridge UB8 3PH, United Kingdom
- Sabriyeh Alibai
  - Department of Computer Science, Brunel University London, Uxbridge UB8 3PH, United Kingdom
- Alessandro Pandini
  - Department of Computer Science, Brunel University London, Uxbridge UB8 3PH, United Kingdom
- Sarath Chandra Dantu
  - Department of Computer Science, Brunel University London, Uxbridge UB8 3PH, United Kingdom
46
de Carvalho RSA, Rasel SI, Khandelwal NK, Tomasiak TM. Cryo-EM structure of the tetra-phosphorylated R-domain in Ycf1 reveals key interactions for transport regulation. bioRxiv 2024:2024.03.06.583773. [PMID: 38496555 PMCID: PMC10942426 DOI: 10.1101/2024.03.06.583773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Many ATP-binding cassette (ABC) transporters are regulated by phosphorylation on long, disordered loops, which are challenging to visualize with structural methods. We have trapped an activated state of the regulatory domain (R-domain) of Yeast Cadmium Factor 1 (Ycf1) by enzymatically enriching the phosphorylated state. A 3.2 Å cryo-EM structure reveals an R-domain structure with four phosphorylated residues and a position for the entire R-domain. The structure reveals key R-domain interactions, including a bridging interaction between NBD1 and NBD2 as well as an interaction with the R-insertion, another regulatory region. We systematically probe these interactions with a linker substitution strategy along the R-domain and find a close match between these interactions and survival under Ycf1-dependent growth conditions. We propose a model in which four overlapping phosphorylation sites bridge several regions of Ycf1 to engage a transport-competent state.
Affiliation(s)
- Shamiul I Rasel
  - Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ 85721
- Nitesh K Khandelwal
  - Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ 85721
  - Current Address: Department of Biochemistry and Biophysics, University of California - San Francisco, San Francisco, CA 94
- Thomas M Tomasiak
  - Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ 85721
47
Perreault M, Means J, Gerson E, James M, Cotton S, Bergeron CG, Simon M, Carlin DA, Schmidt N, Moore TC, Blasbalg J, Sondheimer N, Ndugga-Kabuye K, Denney WS, Isabella VM, Lubkowicz D, Brennan A, Hava DL. The live biotherapeutic SYNB1353 decreases plasma methionine via directed degradation in animal models and healthy volunteers. Cell Host Microbe 2024; 32:382-395.e10. [PMID: 38309259 DOI: 10.1016/j.chom.2024.01.005] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 08/24/2023] [Revised: 12/07/2023] [Accepted: 01/12/2024] [Indexed: 02/05/2024]
Abstract
Methionine is an essential proteinogenic amino acid, but its excess can lead to deleterious effects. Inborn errors of methionine metabolism resulting from loss of function in cystathionine β-synthase (CBS) cause classic homocystinuria (HCU), which is managed by a methionine-restricted diet. Synthetic biotics are gastrointestinal tract-targeted live biotherapeutics that can be engineered to replicate the benefits of dietary restriction. In this study, we assess whether SYNB1353, an E. coli Nissle 1917 derivative, impacts circulating methionine and homocysteine levels in animals and healthy volunteers. In both mice and nonhuman primates (NHPs), SYNB1353 blunts the appearance of plasma methionine and plasma homocysteine in response to an oral methionine load. A phase 1 clinical study conducted in healthy volunteers subjected to an oral methionine challenge demonstrates that SYNB1353 is well tolerated and blunts plasma methionine by 26%. Overall, SYNB1353 represents a promising approach for methionine reduction with potential utility for the treatment of HCU.
48
Landerer C, Poehls J, Toth-Petroczy A. Fitness Effects of Phenotypic Mutations at Proteome-Scale Reveal Optimality of Translation Machinery. Mol Biol Evol 2024; 41:msae048. [PMID: 38421032 PMCID: PMC10939442 DOI: 10.1093/molbev/msae048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 01/30/2024] [Accepted: 02/23/2024] [Indexed: 03/02/2024] Open
Abstract
Errors in protein translation can lead to non-genetic, phenotypic mutations, including amino acid misincorporations. While phenotypic mutations can increase protein diversity, the systematic characterization of their proteome-wide frequencies and their evolutionary impact has been lacking. Here, we developed a mechanistic model of translation errors to investigate how selection acts on protein populations produced by amino acid misincorporations. We fitted the model to empirical observations of misincorporations obtained from over a hundred mass spectrometry datasets of E. coli and S. cerevisiae. We found that on average 20% to 23% of proteins synthesized in the cell are expected to harbor at least one amino acid misincorporation, and that deleterious misincorporations are less likely to occur. Combining misincorporation probabilities and the estimated fitness effects of amino acid substitutions in a population genetics framework, we found 74% of mistranslation events in E. coli and 94% in S. cerevisiae to be neutral. We further show that the set of available synonymous tRNAs is subject to evolutionary pressure, as the presence of missing tRNAs would increase codon-anticodon cross-reactivity and misincorporation error rates. Overall, we find that the translation machinery is likely optimal in E. coli and S. cerevisiae and that both local solutions at the level of codons and a global solution such as the tRNA pool can mitigate the impact of translation errors. We provide a framework to study the evolutionary impact of codon-specific translation errors and a method for their proteome-wide detection across organisms and conditions.
Affiliation(s)
- Cedric Landerer
  - Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
  - Center for Systems Biology Dresden, 01307 Dresden, Germany
- Jonas Poehls
  - Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
  - Center for Systems Biology Dresden, 01307 Dresden, Germany
- Agnes Toth-Petroczy
  - Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
  - Center for Systems Biology Dresden, 01307 Dresden, Germany
  - Cluster of Excellence Physics of Life, TU Dresden, 01062 Dresden, Germany
49
Yehorova D, Crean RM, Kasson PM, Kamerlin SCL. Key interaction networks: Identifying evolutionarily conserved non-covalent interaction networks across protein families. Protein Sci 2024; 33:e4911. [PMID: 38358258 PMCID: PMC10868456 DOI: 10.1002/pro.4911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 01/08/2024] [Accepted: 01/10/2024] [Indexed: 02/16/2024]
Abstract
Protein structure (and thus function) is dictated by non-covalent interaction networks. These can be highly evolutionarily conserved across protein families, the members of which can diverge in sequence and evolutionary history. Here we present KIN, a tool to identify and analyze conserved non-covalent interaction networks across evolutionarily related groups of proteins. KIN is available for download under a GNU General Public License, version 2, from https://www.github.com/kamerlinlab/KIN. KIN can operate on experimentally determined structures, predicted structures, or molecular dynamics trajectories, providing insight into both conserved and missing interactions across evolutionarily related proteins. This provides both useful insight into protein evolution and a tool that can be exploited for protein engineering efforts. As a showcase system, we demonstrate applications of this tool to understanding the evolutionarily relevant conserved interaction networks across the class A β-lactamases.
Affiliation(s)
- Dariia Yehorova
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, USA
- Rory M. Crean
  - Department of Chemistry - BMC, Uppsala University, Uppsala, Sweden
- Peter M. Kasson
  - Department of Molecular Physiology, University of Virginia, Charlottesville, Virginia, USA
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, USA
  - Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Shina C. L. Kamerlin
  - School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, USA
  - Department of Chemistry - BMC, Uppsala University, Uppsala, Sweden
50
Ektefaie Y, Shen A, Bykova D, Marin M, Zitnik M, Farhat M. Evaluating generalizability of artificial intelligence models for molecular datasets. bioRxiv 2024:2024.02.25.581982. [PMID: 38464295 PMCID: PMC10925170 DOI: 10.1101/2024.02.25.581982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Deep learning has made rapid advances in modeling molecular sequencing data. Despite achieving high performance on benchmarks, it remains unclear to what extent deep learning models learn general principles and generalize to previously unseen sequences. Benchmarks traditionally interrogate model generalizability by generating metadata-based (MB) or sequence-similarity-based (SB) train and test splits of input data before assessing model performance. Here, we show that this approach mischaracterizes model generalizability by failing to consider the full spectrum of cross-split overlap, i.e., similarity between train and test splits. We introduce Spectra, a spectral framework for comprehensive model evaluation. For a given model and input data, Spectra plots model performance as a function of decreasing cross-split overlap and reports the area under this curve as a measure of generalizability. We apply Spectra to 18 sequencing datasets with associated phenotypes, ranging from antibiotic resistance in tuberculosis to protein-ligand binding, to evaluate the generalizability of 19 state-of-the-art deep learning models, including large language models, graph neural networks, diffusion models, and convolutional neural networks. We show that SB and MB splits provide an incomplete assessment of model generalizability. With Spectra, we find that as cross-split overlap decreases, deep learning models consistently exhibit a reduction in performance in a task- and model-dependent manner. Although no model consistently achieved the highest performance across all tasks, we show that deep learning models can generalize to previously unseen sequences on specific tasks. Spectra paves the way toward a better understanding of how foundation models generalize in biology.
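The summary statistic described in the abstract (area under the curve of model performance versus decreasing cross-split overlap) can be sketched in a few lines of Python. This is an illustrative trapezoidal-rule calculation only, not the Spectra implementation: the function name `area_under_curve`, the choice of x-axis values, and the performance numbers are hypothetical.

```python
def area_under_curve(xs, ys):
    """Trapezoidal-rule area under a curve sampled at points (xs[i], ys[i]).

    In this sketch, xs is an assumed parameter that grows as train/test
    overlap shrinks, and ys is the model performance measured at each value.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) points of equal length")
    points = sorted(zip(xs, ys))  # integrate in order of increasing x
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical measurements: performance drops as overlap decreases.
overlap_parameter = [0.0, 0.5, 1.0]
performance = [0.9, 0.7, 0.5]
print(area_under_curve(overlap_parameter, performance))
```

Under this sketch, a model whose performance stays flat as overlap shrinks receives a larger area than one whose performance falls off quickly, which is the sense in which the area serves as a generalizability measure.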
Affiliation(s)
- Yasha Ektefaie
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Andrew Shen
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Department of Computer Science, Northwestern University, Evanston, IL, USA
- Daria Bykova
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Maximillian Marin
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Marinka Zitnik
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Data Science Initiative, Cambridge, MA, USA
- Maha Farhat
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Division of Pulmonary and Critical Care, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA