1
Peng FZ, Wang C, Chen T, Schussheim B, Vincoff S, Chatterjee P. PTM-Mamba: a PTM-aware protein language model with bidirectional gated Mamba blocks. Nat Methods 2025; 22:945-949. [PMID: 40211004 PMCID: PMC12074982 DOI: 10.1038/s41592-025-02656-9] [Received: 02/27/2024] [Accepted: 03/05/2025] [Indexed: 04/12/2025]
Abstract
Current protein language models (LMs) accurately encode protein properties but have yet to represent post-translational modifications (PTMs), which are crucial for proteomic diversity and influence protein structure, function and interactions. To address this gap, we develop PTM-Mamba, a PTM-aware protein LM that integrates PTM tokens using bidirectional Mamba blocks fused with ESM-2 protein LM embeddings via a newly developed gating mechanism. PTM-Mamba uniquely models both wild-type and PTM sequences, enabling downstream tasks such as disease association and druggability prediction, PTM effect prediction on protein-protein interactions and zero-shot PTM discovery. In total, our work establishes PTM-Mamba as a foundational tool for PTM-aware protein modeling and design.
Affiliation(s)
- Chentong Wang
  - School of Life Sciences, Westlake University, Hangzhou, China
- Tong Chen
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Sophia Vincoff
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Pranam Chatterjee
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
  - Department of Computer Science, Duke University, Durham, NC, USA
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
2
Hong L, Ye T, Wang TZ, Srijay D, Liu H, Zhao L, Watson R, Vincoff S, Chen T, Kholina K, Goel S, DeLisa MP, Chatterjee P. Programmable protein stabilization with language model-derived peptide guides. Nat Commun 2025; 16:3555. [PMID: 40229275 PMCID: PMC11997201 DOI: 10.1038/s41467-025-58872-6] [Received: 07/21/2024] [Accepted: 04/02/2025] [Indexed: 04/16/2025]
Abstract
Dysregulated protein degradation via the ubiquitin-proteasomal pathway can induce numerous disease phenotypes, including cancer, neurodegeneration, and diabetes. While small molecule-based targeted protein degradation (TPD) and targeted protein stabilization (TPS) platforms can address this dysregulation, they rely on structured and stable binding pockets, which do not exist to classically "undruggable" targets. Here, we expand the TPS target space by engineering "deubiquibodies" (duAbs) via fusion of computationally-designed peptide binders to the catalytic domain of the potent OTUB1 deubiquitinase. In human cells, duAbs effectively stabilize exogenous and endogenous proteins in a DUB-dependent manner. Using protein language models to generate target-binding peptides, we engineer duAbs to conformationally diverse target proteins, including key tumor suppressor proteins p53 and WEE1, and heavily-disordered fusion oncoproteins, such as PAX3::FOXO1. We further encapsulate p53-targeting duAbs as mRNA in lipid nanoparticles and demonstrate effective intracellular delivery, p53 stabilization, and apoptosis activation, motivating further in vivo translation.
Affiliation(s)
- Lauren Hong
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Tianzheng Ye
  - Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY, USA
- Tian Z Wang
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Divya Srijay
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Howard Liu
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Lin Zhao
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Rio Watson
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Sophia Vincoff
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Tianlai Chen
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Kseniia Kholina
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Shrey Goel
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Matthew P DeLisa
  - Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY, USA
  - Nancy E. and Peter C. Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, USA
  - Cornell Institute of Biotechnology, Cornell University, Ithaca, NY, USA
- Pranam Chatterjee
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
  - Department of Computer Science, Duke University, Durham, NC, USA
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
3
Heinzinger M, Rost B. Teaching AI to speak protein. Curr Opin Struct Biol 2025; 91:102986. [PMID: 39985945 DOI: 10.1016/j.sbi.2025.102986] [Received: 10/01/2024] [Revised: 12/30/2024] [Accepted: 01/02/2025] [Indexed: 02/24/2025]
Abstract
Large Language Models for proteins, namely protein Language Models (pLMs), have begun to provide an important alternative to capturing the information encoded in a protein sequence in computers. Arguably, pLMs have advanced importantly to understanding aspects of the language of life as written in proteins, and through this understanding, they are becoming an increasingly powerful means of advancing protein prediction, e.g., in the prediction of molecular function as expressed by identifying binding residues or variant effects. While benefitting from the same technology, protein structure prediction remains one of the few applications for which only using pLM embeddings from single sequences appears not to improve over or match the state-of-the-art. Fine-tuning foundation pLMs enhances efficiency and accuracy of solutions, in particular in cases with few experimental annotations. pLMs facilitate the integration of computational and experimental biology, of AI and wet-lab, in particular toward a new era of protein design.
Affiliation(s)
- Michael Heinzinger
  - TUM (Technical University of Munich), School of Computation, Information and Technology (CIT), Faculty of Informatics, Chair of Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching, Munich, Germany
- Burkhard Rost
  - TUM (Technical University of Munich), School of Computation, Information and Technology (CIT), Faculty of Informatics, Chair of Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching, Munich, Germany; Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching, Munich, Germany; TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany
4
Tang S, Han EL, Mitchell MJ. Peptide-functionalized nanoparticles for brain-targeted therapeutics. Drug Deliv Transl Res 2025:10.1007/s13346-025-01840-w. [PMID: 40164912 DOI: 10.1007/s13346-025-01840-w] [Accepted: 03/17/2025] [Indexed: 04/02/2025]
Abstract
Despite the rapid development of nanoparticle (NP)-based drug delivery systems, intravenous delivery of drugs to the brain remains a major challenge due to various biological barriers. To achieve therapeutic effects, NP-encapsulated drugs must avoid accumulation in off-target organs and selectively deliver to the brain, successfully cross the blood-brain barrier (BBB), and reach the target cells in the brain. Conjugating receptor-specific ligands to the surface of NPs is a promising technique for engineering NPs to overcome these barriers. Specifically, peptides as brain-targeting ligands have been of increasing interest given their ease of synthesis, low cytotoxicity, and strong affinity to target proteins. The success of peptides as targeting ligands is largely due to the diverse strategies of designing and modifying peptides with favorable properties, including membrane permeability and multi-receptor targeting. Here, we review the design and implementation of peptide-functionalized NP systems for neurological disease applications. We also explore advances in rational peptide design strategies for brain targeting, including using generative deep-learning models to computationally design new peptides.
Affiliation(s)
- Sophia Tang
  - Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Emily L Han
  - Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Michael J Mitchell
  - Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Abramson Cancer Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Center for Cellular Immunotherapies, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Penn Institute for RNA Innovation, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Institute for Immunology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Cardiovascular Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Institute for Regenerative Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
5
Ou L, Setegne MT, Elliot J, Shen F, Dassama LMK. Protein-Based Degraders: From Chemical Biology Tools to Neo-Therapeutics. Chem Rev 2025; 125:2120-2183. [PMID: 39818743 PMCID: PMC11870016 DOI: 10.1021/acs.chemrev.4c00595] [Received: 08/08/2024] [Revised: 12/26/2024] [Accepted: 12/30/2024] [Indexed: 01/19/2025]
Abstract
The nascent field of targeted protein degradation (TPD) could revolutionize biomedicine due to the ability of degrader molecules to selectively modulate disease-relevant proteins. A key limitation to the broad application of TPD is its dependence on small-molecule ligands to target proteins of interest. This leaves unstructured proteins or those lacking defined cavities for small-molecule binding out of the scope of many TPD technologies. The use of proteins, peptides, and nucleic acids (otherwise known as "biologics") as the protein-targeting moieties in degraders addresses this limitation. In the following sections, we provide a comprehensive and critical review of studies that have used proteins and peptides to mediate the degradation and hence the functional control of otherwise challenging disease-relevant protein targets. We describe existing platforms for protein/peptide-based ligand identification and the drug delivery systems that might be exploited for the delivery of biologic-based degraders. Throughout the Review, we underscore the successes, challenges, and opportunities of using protein-based degraders as chemical biology tools to spur discoveries, elucidate mechanisms, and act as a new therapeutic modality.
Affiliation(s)
- Lisha Ou
  - Department of Chemistry, Stanford University, Stanford, California 94305, United States
  - Sarafan ChEM-H Institute, Stanford University, Stanford, California 94305, United States
- Mekedlawit T. Setegne
  - Department of Chemistry, Stanford University, Stanford, California 94305, United States
  - Sarafan ChEM-H Institute, Stanford University, Stanford, California 94305, United States
- Jeandele Elliot
  - Department of Chemical Engineering, Stanford University, Stanford, California 94305, United States
- Fangfang Shen
  - Department of Chemistry, Stanford University, Stanford, California 94305, United States
- Laura M. K. Dassama
  - Department of Chemistry, Stanford University, Stanford, California 94305, United States
  - Sarafan ChEM-H Institute, Stanford University, Stanford, California 94305, United States
  - Department of Microbiology & Immunology, Stanford School of Medicine, Stanford, California 94305, United States
6
Vincoff S, Goel S, Kholina K, Pulugurta R, Vure P, Chatterjee P. FusOn-pLM: a fusion oncoprotein-specific language model via adjusted rate masking. Nat Commun 2025; 16:1436. [PMID: 39920196 PMCID: PMC11806025 DOI: 10.1038/s41467-025-56745-6] [Received: 06/03/2024] [Accepted: 01/24/2025] [Indexed: 02/09/2025]
Abstract
Fusion oncoproteins, a class of chimeric proteins arising from chromosomal translocations, are major drivers of various pediatric cancers. These proteins are intrinsically disordered and lack druggable pockets, making them highly challenging therapeutic targets for both small molecule-based and structure-based approaches. Protein language models (pLMs) have recently emerged as powerful tools for capturing physicochemical and functional protein features but have yet to be trained on fusion oncoprotein sequences. We introduce FusOn-pLM, a fine-tuned pLM trained on a newly curated, comprehensive set of fusion oncoprotein sequences, FusOn-DB. Employing a unique cosine-scheduled masked language modeling strategy, FusOn-pLM dynamically adjusts masking rates (15%-40%) to optimize feature extraction and representation quality, surpassing baseline embeddings in fusion-specific tasks, including localization, puncta formation, and disorder prediction. FusOn-pLM uniquely predicts drug-resistant mutations, providing insights for therapeutic design that anticipates resistance mechanisms. In total, FusOn-pLM provides biologically relevant representations for advancing therapeutic discovery in fusion-driven cancers.
Affiliation(s)
- Sophia Vincoff
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Shrey Goel
  - Department of Computer Science, Duke University, Durham, NC, USA
- Kseniia Kholina
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Rishab Pulugurta
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Pranay Vure
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Pranam Chatterjee
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
  - Department of Computer Science, Duke University, Durham, NC, USA
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
7
Bhat S, Palepu K, Hong L, Mao J, Ye T, Iyer R, Zhao L, Chen T, Vincoff S, Watson R, Wang TZ, Srijay D, Kavirayuni VS, Kholina K, Goel S, Vure P, Deshpande AJ, Soderling SH, DeLisa MP, Chatterjee P. De novo design of peptide binders to conformationally diverse targets with contrastive language modeling. Sci Adv 2025; 11:eadr8638. [PMID: 39841846 PMCID: PMC11753435 DOI: 10.1126/sciadv.adr8638] [Received: 07/18/2024] [Accepted: 12/20/2024] [Indexed: 01/24/2025]
Abstract
Designing binders to target undruggable proteins presents a formidable challenge in drug discovery. In this work, we provide an algorithmic framework to design short, target-binding linear peptides, requiring only the amino acid sequence of the target protein. To do this, we propose a process to generate naturalistic peptide candidates through Gaussian perturbation of the peptidic latent space of the ESM-2 protein language model and subsequently screen these novel sequences for target-selective interaction activity via a contrastive language-image pretraining (CLIP)-based contrastive learning architecture. By integrating these generative and discriminative steps, we create a Peptide Prioritization via CLIP (PepPrCLIP) pipeline and validate highly ranked, target-specific peptides experimentally, both as inhibitory peptides and as fusions to E3 ubiquitin ligase domains. PepPrCLIP-derived constructs demonstrate functionally potent binding and degradation of conformationally diverse, disease-driving targets in vitro. In total, PepPrCLIP empowers the modulation of previously inaccessible proteins without reliance on stable and ordered tertiary structures.
Affiliation(s)
- Suhaas Bhat
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Kalyan Palepu
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Lauren Hong
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Joey Mao
  - Department of Cell Biology, Duke University, Durham, NC, USA
- Tianzheng Ye
  - Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY, USA
- Rema Iyer
  - Cancer Genome and Epigenetics Program, Sanford Burnham Prebys Institute, San Diego, CA, USA
- Lin Zhao
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Tianlai Chen
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Sophia Vincoff
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Rio Watson
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Tian Z. Wang
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Divya Srijay
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Kseniia Kholina
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Shrey Goel
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Pranay Vure
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Aniruddha J. Deshpande
  - Cancer Genome and Epigenetics Program, Sanford Burnham Prebys Institute, San Diego, CA, USA
- Matthew P. DeLisa
  - Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY, USA
  - Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, USA
  - Cornell Institute of Biotechnology, Cornell University, Ithaca, NY, USA
- Pranam Chatterjee
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
  - Department of Computer Science, Duke University, Durham, NC, USA
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
8
Guan C, Fernandes FC, Franco OL, de la Fuente-Nunez C. Leveraging large language models for peptide antibiotic design. Cell Rep Phys Sci 2025; 6:102359. [PMID: 39949833 PMCID: PMC11823563 DOI: 10.1016/j.xcrp.2024.102359] [Indexed: 02/16/2025]
Abstract
Large language models (LLMs) have significantly impacted various domains of our society, including recent applications in complex fields such as biology and chemistry. These models, built on sophisticated neural network architectures and trained on extensive datasets, are powerful tools for designing, optimizing, and generating molecules. This review explores the role of LLMs in discovering and designing antibiotics, focusing on peptide molecules. We highlight advancements in drug design and outline the challenges of applying LLMs in these areas.
Affiliation(s)
- Changge Guan
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
  - These authors contributed equally
- Fabiano C. Fernandes
  - Centro de Análises Proteômicas e Bioquímicas, Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, Brazil
  - Departamento de Ciência da Computação, Instituto Federal de Brasília, Campus Taguatinga, Brasília, Brazil
  - These authors contributed equally
- Octavio L. Franco
  - Centro de Análises Proteômicas e Bioquímicas, Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, Brazil
  - S-Inova Biotech, Programa de Pós-Graduação em Biotecnologia, Universidade Católica Dom Bosco, Campo Grande, Brazil
- Cesar de la Fuente-Nunez
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
9
Bhargav P, Mukherjee A. AlphaMut: A Deep Reinforcement Learning Model to Suggest Helix-Disrupting Mutations. J Chem Theory Comput 2025; 21:463-473. [PMID: 39702999 DOI: 10.1021/acs.jctc.4c01387] [Indexed: 12/21/2024]
Abstract
Helices are important secondary structural motifs within proteins and are pivotal in numerous physiological processes. While amino acids (AA) such as alanine and leucine are known to promote helix formation, proline and glycine disfavor it. Helical structure formation, however, also depends on its environment, and hence, prior prediction of a mutational effect on a helical structure is difficult. Here, we employ a reinforcement learning algorithm to develop a predictive model for helix-disrupting mutations. We start with a model to disrupt helices independent of their protein environment. Our results show that only a few mutations lead to a drastic disruption of the target helix. We further extend our approach to helices in proteins and validate the results using rigorous free energy calculations. Our strategy identifies amino acids crucial for maintaining structural integrity and predicts key mutations that could alter protein structure. Through our work, we present a new use case for reinforcement learning in protein structure disruption.
Affiliation(s)
- Prathith Bhargav
  - Department of Chemistry, Indian Institute of Science Education and Research Pune, Dr Homi Bhabha Road, Pashan, Pune, Maharashtra 411008, India
- Arnab Mukherjee
  - Department of Chemistry, Indian Institute of Science Education and Research Pune, Dr Homi Bhabha Road, Pashan, Pune, Maharashtra 411008, India
  - Department of Data Science, Indian Institute of Science Education and Research Pune, Dr Homi Bhabha Road, Pashan, Pune, Maharashtra 411008, India
10
Rettie SA, Juergens D, Adebomi V, Bueso YF, Zhao Q, Leveille AN, Liu A, Bera AK, Wilms JA, Üffing A, Kang A, Brackenbrough E, Lamb M, Gerben SR, Murray A, Levine PM, Schneider M, Vasireddy V, Ovchinnikov S, Weiergräber OH, Willbold D, Kritzer JA, Mougous JD, Baker D, DiMaio F, Bhardwaj G. Accurate de novo design of high-affinity protein binding macrocycles using deep learning. bioRxiv 2024:2024.11.18.622547. [PMID: 39605685 PMCID: PMC11601608 DOI: 10.1101/2024.11.18.622547] [Indexed: 11/29/2024]
Abstract
The development of macrocyclic binders to therapeutic proteins typically relies on large-scale screening methods that are resource-intensive and provide little control over binding mode. Despite considerable progress in physics-based methods for peptide design and deep-learning methods for protein design, there are currently no robust approaches for de novo design of protein-binding macrocycles. Here, we introduce RFpeptides, a denoising diffusion-based pipeline for designing macrocyclic peptide binders against protein targets of interest. We test 20 or fewer designed macrocycles against each of four diverse proteins and obtain medium to high-affinity binders against all selected targets. Designs against MCL1 and MDM2 demonstrate KD between 1-10 μM, and the best anti-GABARAP macrocycle binds with a KD of 6 nM and a sub-nanomolar IC50 in vitro. For one of the targets, RbtA, we obtain a high-affinity binder with KD < 10 nM despite starting from the target sequence alone due to the lack of an experimentally determined target structure. X-ray structures determined for macrocycle-bound MCL1, GABARAP, and RbtA complexes match very closely with the computational design models, with three out of the four structures demonstrating Cα RMSD of less than 1.5 Å to the design models. In contrast to library screening approaches for which determining binding mode can be a major bottleneck, the binding modes of RFpeptides-generated macrocycles are known by design, which should greatly facilitate downstream optimization. RFpeptides thus provides a powerful framework for rapid and custom design of macrocyclic peptides for diagnostic and therapeutic applications.
Affiliation(s)
- Stephen A. Rettie
  - Department of Medicinal Chemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
- David Juergens
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Graduate Program in Molecular Engineering, University of Washington, Seattle, WA, USA
- Victor Adebomi
  - Department of Medicinal Chemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Yensi Flores Bueso
  - Department of Medicinal Chemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Cancer Research @UCC, University College Cork, Cork, Ireland
- Qinqin Zhao
  - Department of Microbiology, University of Washington, Seattle, WA, USA
- Andi Liu
  - Department of Microbiology, University of Washington, Seattle, WA, USA
- Asim K. Bera
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Joana A. Wilms
  - Heinrich-Heine-Universität Düsseldorf, Institut für Physikalische Biologie, Düsseldorf, Germany
  - Forschungszentrum Jülich, Institute of Biological Information Processing, Structural Biochemistry (IBI-7), Jülich, Germany
- Alina Üffing
  - Heinrich-Heine-Universität Düsseldorf, Institut für Physikalische Biologie, Düsseldorf, Germany
  - Forschungszentrum Jülich, Institute of Biological Information Processing, Structural Biochemistry (IBI-7), Jülich, Germany
- Alex Kang
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Mila Lamb
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Stacey R. Gerben
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Analisa Murray
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Paul M. Levine
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Maika Schneider
  - Department of Medicinal Chemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Department of Chemistry, University of Washington, Seattle, WA, USA
- Vibha Vasireddy
  - Department of Medicinal Chemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Sergey Ovchinnikov
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Oliver H. Weiergräber
  - Forschungszentrum Jülich, Institute of Biological Information Processing, Structural Biochemistry (IBI-7), Jülich, Germany
- Dieter Willbold
  - Heinrich-Heine-Universität Düsseldorf, Institut für Physikalische Biologie, Düsseldorf, Germany
  - Forschungszentrum Jülich, Institute of Biological Information Processing, Structural Biochemistry (IBI-7), Jülich, Germany
- Joshua A. Kritzer
  - Department of Chemistry, Tufts University, 62 Talbot Avenue, Medford, MA, USA
- Joseph D. Mougous
  - Department of Microbiology, University of Washington, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- David Baker
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Frank DiMaio
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
- Gaurav Bhardwaj
  - Department of Medicinal Chemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
11
Ye T, Alamgir A, Robertus CM, Colina D, Monticello C, Donahue TC, Hong L, Vincoff S, Goel S, Fekkes P, Camargo LM, Lam K, Heyes J, Putnam D, Alabi CA, Chatterjee P, DeLisa MP. Programmable protein degraders enable selective knockdown of pathogenic β-catenin subpopulations in vitro and in vivo. bioRxiv 2024:2024.11.10.622803. [PMID: 39605463 PMCID: PMC11601283 DOI: 10.1101/2024.11.10.622803] [Indexed: 11/29/2024]
Abstract
Aberrant activation of Wnt signaling results in unregulated accumulation of cytosolic β-catenin, which subsequently enters the nucleus and promotes transcription of genes that contribute to cellular proliferation and malignancy. Here, we sought to eliminate pathogenic β-catenin from the cytosol using designer ubiquibodies (uAbs), chimeric proteins composed of an E3 ubiquitin ligase and a target-binding domain that redirect intracellular proteins to the proteasome for degradation. To accelerate uAb development, we leveraged a protein language model (pLM)-driven algorithm called SaLT&PepPr to computationally design "guide" peptides with affinity for β-catenin, which were subsequently fused to the catalytic domain of a human E3 called C-terminus of Hsp70-interacting protein (CHIP). Expression of the resulting peptide-guided uAbs in colorectal cancer cells led to the identification of several designs that significantly reduced the abnormally stable pool of free β-catenin in the cytosol and nucleus while preserving the normal membrane-associated subpopulation. This selective knockdown of pathogenic β-catenin suppressed Wnt/β-catenin signaling and impaired tumor cell survival and proliferation. Furthermore, one of the best degraders selectively decreased cytosolic but not membrane-associated β-catenin levels in livers of BALB/c mice following delivery as a lipid nanoparticle (LNP)-encapsulated mRNA. Collectively, these findings reveal the unique ability of uAbs to selectively eradicate abnormal proteins in vitro and in vivo and open the door to peptide-programmable biologic modulators of other disease-causing proteins.
Affiliation(s)
- Tianzheng Ye
  - Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY 14853 USA
- Azmain Alamgir
  - Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY 14853 USA
- Cara M. Robertus
  - Nancy E. and Peter C. Meinig School of Biomedical Engineering, Cornell University, Ithaca, New York 14853 USA
- Darianna Colina
  - Biochemistry, Molecular and Cell Biology, Cornell University, Ithaca, NY 14853 USA
- Connor Monticello
  - Nancy E. and Peter C. Meinig School of Biomedical Engineering, Cornell University, Ithaca, New York 14853 USA
- Thomas Connor Donahue
  - Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY 14853 USA
- Lauren Hong
  - Department of Biomedical Engineering, Duke University, Durham, NC 27708 USA
- Sophia Vincoff
  - Department of Biomedical Engineering, Duke University, Durham, NC 27708 USA
- Shrey Goel
  - Department of Biomedical Engineering, Duke University, Durham, NC 27708 USA
- Peter Fekkes
  - UbiquiTx, 750 Main Street, Cambridge, MA 02139 USA
- Kieu Lam
  - Genevant Sciences Corporation, 887 Great Northern Way, Vancouver, BC, V5T 4T5 Canada
- James Heyes
  - Genevant Sciences Corporation, 887 Great Northern Way, Vancouver, BC, V5T 4T5 Canada
- David Putnam
  - Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY 14853 USA
  - Nancy E. and Peter C. Meinig School of Biomedical Engineering, Cornell University, Ithaca, New York 14853 USA
- Christopher A. Alabi
  - Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY 14853 USA
  - Nancy E. and Peter C. Meinig School of Biomedical Engineering, Cornell University, Ithaca, New York 14853 USA
- Pranam Chatterjee
  - Department of Biomedical Engineering, Duke University, Durham, NC 27708 USA
  - Department of Computer Science, Duke University, Durham, NC 27708 USA
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708 USA
- Matthew P. DeLisa
  - Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY 14853 USA
  - Nancy E. and Peter C. Meinig School of Biomedical Engineering, Cornell University, Ithaca, New York 14853 USA
  - Biochemistry, Molecular and Cell Biology, Cornell University, Ithaca, NY 14853 USA
  - Cornell Institute of Biotechnology, Cornell University, Ithaca, NY 14853 USA
12
Huang J, Li W, Xiao B, Zhao C, Zheng H, Li Y, Wang J. PepCA: Unveiling protein-peptide interaction sites with a multi-input neural network model. iScience 2024; 27:110850. [PMID: 39391726 PMCID: PMC11465048 DOI: 10.1016/j.isci.2024.110850] [Received: 04/16/2024] [Revised: 06/13/2024] [Accepted: 08/27/2024] [Indexed: 10/12/2024] Open
Abstract
The protein-peptide interaction plays a pivotal role in fields such as drug development, yet remains underexplored experimentally and challenging to model computationally. Herein, we introduce PepCA, a sequence-based approach for predicting peptide-binding sites on proteins. A primary obstacle in predicting peptide-protein interactions is the difficulty in acquiring precise protein structures, coupled with the uncertainty of polypeptide configurations. To address this, we first encode protein sequences using the Evolutionary Scale Modeling 2 (ESM-2) pre-trained model to extract latent structural information. Additionally, we have developed a multi-input coattention mechanism to concurrently update the encoding of both peptide and protein residues. PepCA integrates this module within an encoder-decoder structure. This model's high precision in identifying binding sites significantly advances the field of computational biology, offering vital insights for peptide drug development and protein science.
Affiliation(s)
- Junxiong Huang
  - iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
  - iCarbonX (Shenzhen) Pharmaceutical Technology Co, Shenzhen, Guangdong, China
- Weikang Li
  - iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
  - iCarbonX (Shenzhen) Pharmaceutical Technology Co, Shenzhen, Guangdong, China
- Bin Xiao
  - iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
  - iCarbonX (Shenzhen) Pharmaceutical Technology Co, Shenzhen, Guangdong, China
- Chunqing Zhao
  - iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
  - iCarbonX (Shenzhen) Pharmaceutical Technology Co, Shenzhen, Guangdong, China
- Hancheng Zheng
  - iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
  - Shenzhen Digital Life Institute, Shenzhen, Guangdong, China
- Yingrui Li
  - iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
  - Faculty of Health and Medical Sciences, University of Surrey, Guildford, Surrey, UK
  - Shenzhen Digital Life Institute, Shenzhen, Guangdong, China
  - iCarbonX (Shenzhen) Pharmaceutical Technology Co, Shenzhen, Guangdong, China
- Jun Wang
  - iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
  - State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa, Macau, China
  - Shenzhen Digital Life Institute, Shenzhen, Guangdong, China
  - iCarbonX (Shenzhen) Pharmaceutical Technology Co, Shenzhen, Guangdong, China
13
Chen T, Chatterjee P. Synergizing sequence and structure representations to predict protein variants. Cell Res 2024; 34:597-598. [PMID: 39090184 PMCID: PMC11369258 DOI: 10.1038/s41422-024-01010-6] [Indexed: 08/04/2024] Open
Affiliation(s)
- Tong Chen
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Pranam Chatterjee
  - Department of Biomedical Engineering, Duke University, Durham, NC, USA.
  - Department of Computer Science, Duke University, Durham, NC, USA.
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA.
14
Chen T, Zhang Y, Chatterjee P. moPPIt: De Novo Generation of Motif-Specific Binders with Protein Language Models. bioRxiv 2024:2024.07.31.606098. [PMID: 39131360 PMCID: PMC11312608 DOI: 10.1101/2024.07.31.606098] [Indexed: 08/13/2024]
Abstract
The ability to precisely target specific motifs on disease-related proteins, whether conserved epitopes on viral proteins, intrinsically disordered regions within transcription factors, or breakpoint junctions in fusion oncoproteins, is essential for modulating their function while minimizing off-target effects. Current methods struggle to achieve this specificity without reliable structural information. In this work, we introduce a motif-specific PPI targeting algorithm, moPPIt, for de novo generation of motif-specific peptide binders from the target protein sequence alone. At the core of moPPIt is BindEvaluator, a transformer-based model that interpolates protein language model embeddings of two proteins via a series of multi-headed self-attention blocks, with a key focus on local motif features. Trained on over 510,000 annotated PPIs, BindEvaluator accurately predicts target binding sites given protein-protein sequence pairs with a test AUC > 0.94, improving to AUC > 0.96 when fine-tuned on peptide-protein pairs. By combining BindEvaluator with our PepMLM peptide generator and genetic algorithm-based optimization, moPPIt generates peptides that bind specifically to user-defined residues on target proteins. We demonstrate moPPIt's efficacy in computationally designing binders to specific motifs, first on targets with known binding peptides and then extending to structured and disordered targets with no known binders. In total, moPPIt serves as a powerful tool for developing highly specific peptide therapeutics without relying on target structure or structure-dependent latent spaces.
Affiliation(s)
- Tong Chen
  - Department of Biomedical Engineering, Duke University
- Yinuo Zhang
  - Department of Biostatistics and Bioinformatics, Duke University
- Pranam Chatterjee
  - Department of Biomedical Engineering, Duke University
  - Department of Biostatistics and Bioinformatics, Duke University
  - Department of Computer Science, Duke University
15
Hong L, Ye T, Wang TZ, Srijay D, Zhao L, Watson R, Vincoff S, Chen T, Kholina K, Goel S, DeLisa MP, Chatterjee P. Programmable Protein Stabilization with Language Model-Derived Peptide Guides. Research Square 2024:rs.3.rs-4670386. [PMID: 39108486 PMCID: PMC11302690 DOI: 10.21203/rs.3.rs-4670386/v1] [Indexed: 08/12/2024]
Abstract
Dysregulated protein degradation via the ubiquitin-proteasomal pathway can induce numerous disease phenotypes, including cancer, neurodegeneration, and diabetes. Stabilizing improperly ubiquitinated proteins via target-specific deubiquitination is thus a critical therapeutic goal. Building off the major advances in targeted protein degradation (TPD) using bifunctional small-molecule degraders, targeted protein stabilization (TPS) modalities have been described recently. However, these rely on a limited set of chemical linkers and warheads, which are difficult to generate de novo for new targets and do not exist for classically "undruggable" targets. To address the limited reach of small molecule-based degraders, we previously engineered ubiquibodies (uAbs) by fusing computationally-designed "guide" peptides to E3 ubiquitin ligase domains for modular, CRISPR-analogous TPD. Here, we expand the TPS target space by engineering "deubiquibodies" (duAbs) via fusion of computationally-designed guides to the catalytic domain of the potent OTUB1 deubiquitinase. In human cells, duAbs effectively stabilize exogenous and endogenous proteins in a DUB-dependent manner. To demonstrate duAb modularity, we swap in new target-binding peptides designed via our generative language models to stabilize diverse target proteins, including key tumor suppressor proteins such as p53 and WEE1, as well as heavily-disordered fusion oncoproteins, such as PAX3::FOXO1. In total, our duAb system represents a simple, programmable, CRISPR-analogous strategy for TPS.
Affiliation(s)
- Lauren Hong
  - Department of Biomedical Engineering, Duke University
- Tianzheng Ye
  - Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY, USA
- Tian Zi Wang
  - Department of Biomedical Engineering, Duke University
- Divya Srijay
  - Department of Biomedical Engineering, Duke University
- Lin Zhao
  - Department of Biomedical Engineering, Duke University
- Rio Watson
  - Department of Biomedical Engineering, Duke University
- Tianlai Chen
  - Department of Biomedical Engineering, Duke University
- Shrey Goel
  - Department of Biomedical Engineering, Duke University
- Matthew P. DeLisa
  - Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY, USA
  - Nancy E. and Peter C. Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, USA
  - Cornell Institute of Biotechnology, Cornell University, Ithaca, NY, USA
- Pranam Chatterjee
  - Department of Biomedical Engineering, Duke University
  - Department of Computer Science, Duke University
  - Department of Biostatistics and Bioinformatics, Duke University
16
Vincoff S, Goel S, Kholina K, Pulugurta R, Vure P, Chatterjee P. FusOn-pLM: A Fusion Oncoprotein-Specific Language Model via Focused Probabilistic Masking. bioRxiv 2024:2024.06.03.597245. [PMID: 38895377 PMCID: PMC11185609 DOI: 10.1101/2024.06.03.597245] [Indexed: 06/21/2024]
Abstract
Fusion oncoproteins, a class of chimeric proteins arising from chromosomal translocations, drive and sustain various cancers, particularly those impacting children. Unfortunately, due to their intrinsically disordered nature, large size, and lack of well-defined, druggable pockets, they have been historically challenging to target therapeutically: neither small molecule-based methods nor structure-based approaches for binder design are strong options for this class of molecules. Recently, protein language models (pLMs) have demonstrated success at representing protein sequences with information-rich embeddings, enabling downstream design applications from sequence alone. However, no current pLM has been trained on fusion oncoprotein sequences and thus may not produce optimal representations for these proteins. In this work, we introduce FusOn-pLM, a novel pLM that fine-tunes the state-of-the-art ESM-2 model on fusion oncoprotein sequences. We specifically introduce a novel masked language modeling (MLM) strategy, employing a binding-site probability predictor to focus masking on key amino acid residues, thereby generating more optimal fusion oncoprotein-aware embeddings. Our model improves performance on both fusion oncoprotein-specific benchmarks and disorder prediction tasks in comparison to baseline ESM-2 representations, as well as manually-constructed biophysical embeddings, motivating downstream usage of FusOn-pLM embeddings for therapeutic design tasks targeting these fusions. We have made our model publicly available to the community at https://huggingface.co/ChatterjeeLab/FusOn-pLM.
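As a rough illustration of the focused masking idea described in this abstract, the sketch below picks mask positions with probability weighted by a per-residue binding-site score. It is not the authors' implementation: the function name `focused_mask`, the 15% mask rate, the `<mask>` token string, and the weighted-sampling scheme are all assumptions made for illustration, and the probabilities are supplied by hand rather than by a trained predictor.

```python
import random


def focused_mask(sequence, site_probs, mask_rate=0.15, mask_token="<mask>", seed=0):
    """Mask residues, favoring positions with a high binding-site probability.

    sequence   : amino-acid string
    site_probs : one probability per residue (hypothetical predictor output)
    Returns (tokens, masked_positions).
    """
    if len(sequence) != len(site_probs):
        raise ValueError("need exactly one probability per residue")
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(sequence)))
    # Weighted sampling without replacement (Efraimidis-Spirakis keys):
    # each position draws u ** (1 / weight); the largest keys are selected,
    # so high-probability residues are masked more often.
    keys = []
    for i, p in enumerate(site_probs):
        weight = max(p, 1e-9)  # avoid division by zero for p == 0
        keys.append((rng.random() ** (1.0 / weight), i))
    chosen = sorted(i for _, i in sorted(keys, reverse=True)[:n_mask])
    tokens = [mask_token if i in chosen else aa for i, aa in enumerate(sequence)]
    return tokens, chosen


if __name__ == "__main__":
    # Toy input: the last five residues are scored as likely binding sites.
    tokens, positions = focused_mask("MKTAYIAKQR", [0.01] * 5 + [0.9] * 5, mask_rate=0.3)
    print(positions, tokens)
```

Sampling rather than always masking the top-scoring residues keeps some randomness in the training signal, which is one plausible reading of "probabilistic" in the title; the abstract itself does not state the exact scheme.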
Affiliation(s)
- Shrey Goel
  - Department of Computer Science, Duke University
- Pranay Vure
  - Department of Biomedical Engineering, Duke University
- Pranam Chatterjee
  - Department of Biomedical Engineering, Duke University
  - Department of Computer Science, Duke University
  - Department of Biostatistics and Bioinformatics, Duke University
17
Peng Z, Schussheim B, Chatterjee P. PTM-Mamba: A PTM-Aware Protein Language Model with Bidirectional Gated Mamba Blocks. bioRxiv 2024:2024.02.28.581983. [PMID: 38464112 PMCID: PMC10925343 DOI: 10.1101/2024.02.28.581983] [Indexed: 03/12/2024]
Abstract
Proteins serve as the workhorses of living organisms, orchestrating a wide array of vital functions. Post-translational modifications (PTMs) of their amino acids greatly influence the structural and functional diversity of different protein types and uphold proteostasis, allowing cells to swiftly respond to environmental changes and intricately regulate complex biological processes. To this point, efforts to model the complex features of proteins have involved the training of large and expressive protein language models (pLMs) such as ESM-2 and ProtT5, which accurately encode structural, functional, and physicochemical properties of input protein sequences. However, the over 200 million sequences that these pLMs were trained on merely scratch the surface of proteomic diversity, as they neither input nor account for the effects of PTMs. In this work, we fill this major gap in protein sequence modeling by introducing PTM tokens into the pLM training regime. We then leverage recent advancements in structured state space models (SSMs), specifically Mamba, which utilizes efficient hardware-aware primitives to overcome the quadratic time complexities of Transformers. After adding a comprehensive set of PTM tokens to the model vocabulary, we train bidirectional Mamba blocks whose outputs are fused with state-of-the-art ESM-2 embeddings via a novel gating mechanism. We demonstrate that our resultant PTM-aware pLM, PTM-Mamba, improves upon ESM-2's performance on various PTM-specific tasks. PTM-Mamba is the first and only pLM that can uniquely input and represent both wild-type and PTM sequences, motivating downstream modeling and design applications specific to post-translationally modified proteins. To facilitate PTM-aware protein language modeling applications, we have made our model available at: https://huggingface.co/ChatterjeeLab/PTM-Mamba.
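The abstract says the Mamba block outputs are fused with ESM-2 embeddings through a gating mechanism but does not give its form. One common way to build such a gate, shown here purely as an assumption, is a sigmoid computed from the concatenated embeddings that blends the two vectors element-wise. The names `gated_fusion`, `gate_weights`, and `gate_bias` are hypothetical, and the toy vectors stand in for real model outputs.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(h_mamba, h_esm, gate_weights, gate_bias):
    """Blend two embeddings of one token with an element-wise sigmoid gate.

    gate_weights has one row per output dimension and 2*d columns; each row
    is applied to the concatenation [h_mamba; h_esm] (an assumed design).
    """
    d = len(h_mamba)
    if len(h_esm) != d:
        raise ValueError("embeddings must have the same dimension")
    concat = list(h_mamba) + list(h_esm)
    fused = []
    for k in range(d):
        z = sum(w * x for w, x in zip(gate_weights[k], concat)) + gate_bias[k]
        g = sigmoid(z)
        # g near 1 keeps the Mamba feature, g near 0 keeps the ESM-2 feature.
        fused.append(g * h_mamba[k] + (1.0 - g) * h_esm[k])
    return fused


if __name__ == "__main__":
    # With zero weights and bias the gate is 0.5, so the result is the mean.
    print(gated_fusion([1.0, 2.0], [3.0, 4.0], [[0.0] * 4] * 2, [0.0, 0.0]))
```

In a trained model the gate parameters would be learned, letting the network lean on the pretrained ESM-2 features for unmodified residues and on the PTM-aware features elsewhere; that behavior is an interpretation, not a claim made in the abstract.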
Affiliation(s)
- Zhangzhi Peng
  - Department of Biomedical Engineering, Duke University
- Pranam Chatterjee
  - Department of Biomedical Engineering, Duke University
  - Department of Computer Science, Duke University
  - Department of Biostatistics and Bioinformatics, Duke University