1
Geometric deep learning methods and applications in 3D structure-based drug design. Drug Discov Today 2024; 29:104024. [PMID: 38759948 DOI: 10.1016/j.drudis.2024.104024] [Received: 02/23/2024] [Revised: 05/02/2024] [Accepted: 05/10/2024] [Indexed: 05/19/2024]
Abstract
3D structure-based drug design (SBDD) is considered a challenging and rational way for innovative drug discovery. Geometric deep learning is a promising approach that solves the accurate model training of 3D SBDD through building neural network models to learn non-Euclidean data, such as 3D molecular graphs and manifold data. Here, we summarize geometric deep learning methods and applications that contain 3D molecular representations, equivariant graph neural networks (EGNNs), and six generative model methods [diffusion model, flow-based model, generative adversarial networks (GANs), variational autoencoder (VAE), autoregressive models, and energy-based models]. Our review provides insights into geometric deep learning methods and advanced applications of 3D SBDD that will be of relevance for the drug discovery community.
2
AI-driven design of customized 3D-printed multi-layer capsules with controlled drug release profiles for personalized medicine. Int J Pharm 2024; 656:124114. [PMID: 38615804 DOI: 10.1016/j.ijpharm.2024.124114] [Received: 12/12/2023] [Revised: 03/25/2024] [Accepted: 04/10/2024] [Indexed: 04/16/2024]
Abstract
Personalized medicine aims to effectively and efficiently provide customized drugs that cater to diverse populations, which is a significant yet challenging task. Recently, the integration of artificial intelligence (AI) and three-dimensional (3D) printing technology has transformed the medical field, and is expected to facilitate the efficient design and development of customized drugs through the synergy of their respective advantages. In this study, we present an innovative method that combines AI and 3D printing technology to design and fabricate customized capsules. Initially, we discretized and encoded the geometry of the capsule, simulated the dissolution process of the capsule with a classical drug dissolution model, and verified it by experiments. Subsequently, we employed a genetic algorithm to explore the capsule geometric structure space and generate a complex multi-layer structure that satisfies the target drug release profiles, including stepwise release and zero-order release. Finally, two model drugs, isoniazid and acetaminophen, were selected, and fused deposition modeling (FDM) 3D printing technology was utilized to precisely print the AI-designed capsule. The reliability of the method was verified by comparing the in vitro release curve of the printed capsules with the target curve, and the f2 similarity factor exceeded 50. Notably, accurate and autonomous design of the drug release curve was achieved mainly by changing the geometry of the capsule. This approach is expected to be applied to different drug needs and facilitate the development of customized oral dosage forms.
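The f2 value mentioned in this abstract is the similarity factor commonly used to compare two dissolution profiles: f2 = 50 · log10(100 / sqrt(1 + (1/n) · Σ (R_t − T_t)²)), where R_t and T_t are the percent released at time point t for the reference and test curves. The abstract gives neither the formula nor the release data, so the following is only a minimal sketch of the standard definition; the function name and the sample profiles are hypothetical.

```python
import math


def f2_similarity(reference, test):
    """Similarity factor f2 between two dissolution profiles.

    Both arguments are sequences of percent-released values sampled at the
    same time points. Values of 50 or more are conventionally read as
    "similar" profiles.
    """
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(reference)
    # Mean squared difference between the two curves, in (percent released)^2.
    mean_sq_diff = sum((r - t) ** 2 for r, t in zip(reference, test)) / n
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq_diff))


if __name__ == "__main__":
    # Hypothetical target and measured curves (percent released), not paper data.
    target = [20.0, 40.0, 60.0, 80.0]
    printed = [22.0, 43.0, 58.0, 79.0]
    print(f"f2 = {f2_similarity(target, printed):.1f}")
```

Identical profiles give f2 = 100, and a uniform 10-point offset at every time point gives about 49.9, which is why 50 is the usual acceptance threshold.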
3
The role and future prospects of artificial intelligence algorithms in peptide drug development. Biomed Pharmacother 2024; 175:116709. [PMID: 38713945 DOI: 10.1016/j.biopha.2024.116709] [Received: 03/10/2024] [Revised: 05/01/2024] [Accepted: 05/02/2024] [Indexed: 05/09/2024]
Abstract
Peptide medications have become better known in recent years due to their many benefits, including low side effects, high biological activity, specificity, and effectiveness. Over 100 peptide medications have been introduced to the market to treat a variety of illnesses. Most of these peptide medications are developed on the basis of endogenous peptides or natural peptides, which frequently require expensive, time-consuming, and extensive tests to confirm. As artificial intelligence advances quickly, it is now possible to build machine learning or deep learning models that screen a large number of candidate sequences for therapeutic peptides. Therapeutic peptides, such as those with antibacterial or anticancer properties, have been developed by the application of artificial intelligence algorithms. The process of finding and developing peptide drugs is outlined in this review, along with a few related cases that were helped by AI and conventional methods. These resources will open up new avenues for peptide drug development and discovery, helping to meet the pressing needs of clinical patients for disease treatment. Although peptide drugs are a new class of biopharmaceuticals distinct from chemical and small-molecule drugs, their clinical purpose and value cannot be ignored. However, traditional peptide drug research and development has a long development cycle and requires high investment; the AI-assisted (AI+) mode will substantially accelerate the creation of peptide medications, offering a new boost for combating diseases.
4
Computational tools for plant genomics and breeding. SCIENCE CHINA. LIFE SCIENCES 2024:10.1007/s11427-024-2578-6. [PMID: 38676814 DOI: 10.1007/s11427-024-2578-6] [Received: 02/05/2024] [Accepted: 03/25/2024] [Indexed: 04/29/2024]
Abstract
Plant genomics and crop breeding are at the intersection of biotechnology and information technology. Driven by a combination of high-throughput sequencing, molecular biology and data science, great advances have been made in omics technologies at every step along the central dogma, especially in genome assembling, genome annotation, epigenomic profiling, and transcriptome profiling. These advances further revolutionized three directions of development. One is genetic dissection of complex traits in crops, along with genomic prediction and selection. The second is comparative genomics and evolution, which open up new opportunities to depict the evolutionary constraints of biological sequences for deleterious variant discovery. The third direction is the development of deep learning approaches for the rational design of biological sequences, especially proteins, for synthetic biology. All three directions of development serve as the foundation for a new era of crop breeding where agronomic traits are enhanced by genome design.
5
Computational Approaches to Predict Protein-Protein Interactions in Crowded Cellular Environments. Chem Rev 2024; 124:3932-3977. [PMID: 38535831 PMCID: PMC11009965 DOI: 10.1021/acs.chemrev.3c00550] [Received: 07/31/2023] [Revised: 02/20/2024] [Accepted: 02/21/2024] [Indexed: 04/11/2024]
Abstract
Investigating protein-protein interactions is crucial for understanding cellular biological processes because proteins often function within molecular complexes rather than in isolation. While experimental and computational methods have provided valuable insights into these interactions, they often overlook a critical factor: the crowded cellular environment. This environment significantly impacts protein behavior, including structural stability, diffusion, and ultimately the nature of binding. In this review, we discuss theoretical and computational approaches that allow the modeling of biological systems to guide and complement experiments and can thus significantly advance the investigation, and possibly the predictions, of protein-protein interactions in the crowded environment of cell cytoplasm. We explore topics such as statistical mechanics for lattice simulations, hydrodynamic interactions, diffusion processes in high-viscosity environments, and several methods based on molecular dynamics simulations. By synergistically leveraging methods from biophysics and computational biology, we review the state of the art of computational methods to study the impact of molecular crowding on protein-protein interactions and discuss its potential revolutionizing effects on the characterization of the human interactome.
6
Opportunities and challenges in design and optimization of protein function. Nat Rev Mol Cell Biol 2024:10.1038/s41580-024-00718-y. [PMID: 38565617 DOI: 10.1038/s41580-024-00718-y] [Accepted: 02/27/2024] [Indexed: 04/04/2024]
Abstract
The field of protein design has made remarkable progress over the past decade. Historically, the low reliability of purely structure-based design methods limited their application, but recent strategies that combine structure-based and sequence-based calculations, as well as machine learning tools, have dramatically improved protein engineering and design. In this Review, we discuss how these methods have enabled the design of increasingly complex structures and therapeutically relevant activities. Additionally, protein optimization methods have improved the stability and activity of complex eukaryotic proteins. Thanks to their increased reliability, computational design methods have been applied to improve therapeutics and enzymes for green chemistry and have generated vaccine antigens, antivirals and drug-delivery nano-vehicles. Moreover, the high success of design methods reflects an increased understanding of basic rules that govern the relationships among protein sequence, structure and function. However, de novo design is still limited mostly to α-helix bundles, restricting its potential to generate sophisticated enzymes and diverse protein and small-molecule binders. Designing complex protein structures is a challenging but necessary next step if we are to realize our objective of generating new-to-nature activities.
7
Discovery of bioactive natural products of microbial origin as inhibitors of the PD-1/PD-L1 protein-protein interaction. Int J Biol Macromol 2024; 264:130458. [PMID: 38423421 DOI: 10.1016/j.ijbiomac.2024.130458] [Received: 01/17/2024] [Accepted: 02/24/2024] [Indexed: 03/02/2024]
Abstract
The PD-1/PD-L1 protein-protein interaction (PPI) controls an adaptive immune resistance mechanism exerted by tumor cells to evade immune responses. The large-molecule nature of current commercial monoclonal antibodies against this PPI hampers their effectiveness by limiting tumor penetration and inducing severe immune-related side effects. Synthetic small-molecule inhibitors may overcome such limitations and have demonstrated promising clinical translation, but their design is challenging. Microbial natural products (NPs) are a source of small molecules with vast chemical diversity and proven anti-tumoral activities, but whose immunotherapeutic properties as PD-1/PD-L1 inhibitors have remained uncharacterized so far. Here, we have developed the first cell-based PD-1/PD-L1 blockade reporter assay to screen NP libraries. In this study, 6000 microbial extracts of maximum biosynthetic diversity were screened. A secondary metabolite called alpha-cyclopiazonic acid (α-CPA) from a bioactive fungal extract was confirmed as a new PD-1/PD-L1 inhibitor with activity in the low micromolar range in the cellular assay and in an additional cell-free competitive assay. Thermal denaturation experiments with PD-1 confirmed that the mechanism of inhibition is based on its stabilization upon binding to α-CPA. The identification of α-CPA as a novel PD-1 stabilizer proves the unprecedented resolution of this methodology at capturing specific PD-1/PD-L1 PPI inhibitors from chemically diverse NP libraries.
8
ProNet DB: a proteome-wise database for protein surface property representations and RNA-binding profiles. Database (Oxford) 2024; 2024:baae012. [PMID: 38557634 PMCID: PMC10984565 DOI: 10.1093/database/baae012] [Received: 09/11/2023] [Revised: 01/08/2024] [Accepted: 02/17/2024] [Indexed: 04/04/2024]
Abstract
The rapid growth in the number of experimental and predicted protein structures and more complicated protein structures poses a significant challenge for computational biology in leveraging structural information and accurate representation of protein surface properties. Recently, AlphaFold2 released the comprehensive proteomes of various species, and protein surface property representation plays a crucial role in protein-molecule interaction predictions, including those involving proteins, nucleic acids and compounds. Here, we proposed the first extensive database, namely ProNet DB, that integrates multiple protein surface representations and RNA-binding landscape for 326 175 protein structures. This collection encompasses the 16 model organism proteomes from the AlphaFold Protein Structure Database and experimentally validated structures from the Protein Data Bank. For each protein, ProNet DB provides access to the original protein structures along with the detailed surface property representations encompassing hydrophobicity, charge distribution and hydrogen bonding potential as well as interactive features such as the interacting face and RNA-binding sites and preferences. To facilitate an intuitive interpretation of these properties and the RNA-binding landscape, ProNet DB incorporates visualization tools like Mol* and an Online 3D Viewer, allowing for the direct observation and analysis of these representations on protein surfaces. The availability of pre-computed features enables instantaneous access for users, significantly advancing computational biology research in areas such as molecular mechanism elucidation, geometry-based drug discovery and the development of novel therapeutic approaches. Database URL: https://proj.cse.cuhk.edu.hk/aihlab/pronet/.
9
A suite of designed protein cages using machine learning and protein fragment-based protocols. Structure 2024:S0969-2126(24)00056-X. [PMID: 38513658 DOI: 10.1016/j.str.2024.02.017] [Received: 10/19/2023] [Revised: 01/22/2024] [Accepted: 02/23/2024] [Indexed: 03/23/2024]
Abstract
Designed protein cages and related materials provide unique opportunities for applications in biotechnology and medicine, but their creation remains challenging. Here, we apply computational approaches to design a suite of tetrahedrally symmetric, self-assembling protein cages. For the generation of docked conformations, we emphasize a protein fragment-based approach, while for sequence design of the de novo interface, a comparison of knowledge-based and machine learning protocols highlights the power and increased experimental success achieved using ProteinMPNN. An analysis of design outcomes provides insights for improving interface design protocols, including prioritizing fragment-based motifs, balancing interface hydrophobicity and polarity, and identifying preferred polar contact patterns. In all, we report five structures for seven protein cages, along with two structures of intermediate assemblies, with the highest resolution reaching 2.0 Å using cryo-EM. This set of designed cages adds substantially to the body of available protein nanoparticles, and to methodologies for their creation.
10
Atomically accurate de novo design of single-domain antibodies. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.14.585103. [PMID: 38562682 PMCID: PMC10983868 DOI: 10.1101/2024.03.14.585103] [Indexed: 04/04/2024]
Abstract
Despite the central role that antibodies play in modern medicine, there is currently no way to rationally design novel antibodies to bind a specific epitope on a target. Instead, antibody discovery currently involves time-consuming immunization of an animal or library screening approaches. Here we demonstrate that a fine-tuned RFdiffusion network is capable of designing de novo antibody variable heavy chains (VHH's) that bind user-specified epitopes. We experimentally confirm binders to four disease-relevant epitopes, and the cryo-EM structure of a designed VHH bound to influenza hemagglutinin is nearly identical to the design model both in the configuration of the CDR loops and the overall binding pose.
11
A Synthetic Multivalent Lipopeptide Derived from Pam3CSK4 with Irreversible Influenza Inhibition and Immuno-Stimulating Effects. SMALL (WEINHEIM AN DER BERGSTRASSE, GERMANY) 2024:e2307709. [PMID: 38438885 DOI: 10.1002/smll.202307709] [Received: 09/04/2023] [Revised: 02/23/2024] [Indexed: 03/06/2024]
Abstract
The activation of the host adaptive immune system is crucial for eliminating viruses. However, influenza infection often suppresses the innate immune response that precedes adaptive immunity, and the adaptive immune responses are typically delayed. Dendritic cells, serving as professional antigen-presenting cells, have a vital role in initiating the adaptive immune response. In this study, an immuno-stimulating antiviral system (ISAS) is introduced, which is composed of the immuno-stimulating adjuvant lipopeptide Pam3CSK4 acting as a scaffold onto which 3 to 4 influenza-inhibiting peptides are covalently bound. The multivalent display of peptides on the scaffold leads to a potent inhibition against H1N1 (EC50 = 20 nM). Importantly, the resulting lipopeptide, Pam3FDA, shows an irreversible inhibition mechanism. The chemical modification of peptides on the scaffold maintains Pam3CSK4's ability to stimulate dendritic cell maturation, thereby rendering Pam3FDA a unique antiviral. This is attributed to its immune activation capability, which also acts in synergy to expedite viral elimination.
12
Geometric deep learning for the prediction of magnesium-binding sites in RNA structures. Int J Biol Macromol 2024; 262:130150. [PMID: 38365157 DOI: 10.1016/j.ijbiomac.2024.130150] [Received: 11/30/2023] [Revised: 01/24/2024] [Accepted: 02/11/2024] [Indexed: 02/18/2024]
Abstract
Magnesium ions (Mg2+) are essential for the folding, functional expression, and structural stability of RNA molecules. However, predicting Mg2+-binding sites in RNA molecules based solely on RNA structures is still challenging. The molecular surface, characterized by a continuous shape with geometric and chemical properties, is important for RNA modelling and carries essential information for understanding the interactions between RNAs and Mg2+ ions. Here, we propose an approach named RNA-magnesium ion surface interaction fingerprinting (RMSIF), a geometric deep learning-based conceptual framework to predict magnesium ion binding sites in RNA structures. To evaluate the performance of RMSIF, we systematically enumerated decoy Mg2+ ions across a full-space grid within the range of 2 to 10 Å from the RNA molecule and made predictions accordingly. Visualization techniques were used to validate the prediction results and calculate success rates. Comparative assessments against state-of-the-art methods like MetalionRNA, MgNet, and Metal3DRNA revealed that RMSIF achieved superior success rates and accuracy in predicting Mg2+-binding sites. Additionally, in terms of the spatial distribution of Mg2+ ions within the RNA structures, a majority were situated in the deep grooves, while a minority occupied the shallow grooves. Collectively, the conceptual framework developed in this study holds promise for advancing insights into drug design, RNA co-transcriptional folding, and structure prediction.
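The evaluation step in this abstract, enumerating decoy Mg2+ positions on a grid 2 to 10 Å from the RNA, can be illustrated with a small sketch. This is not the RMSIF implementation: the grid spacing, the reading of "distance from the RNA molecule" as distance to the nearest atom, and the function name are all assumptions made for illustration.

```python
import itertools
import math


def enumerate_decoy_sites(atoms, spacing=1.0, d_min=2.0, d_max=10.0):
    """Return grid points whose nearest-atom distance lies in [d_min, d_max].

    `atoms` is a list of (x, y, z) coordinates in angstroms. The 2-10 A
    window follows the abstract; `spacing` is an assumed parameter.
    """
    if not atoms:
        raise ValueError("at least one atom coordinate is required")
    axes = []
    for k in range(3):
        # Pad the bounding box by d_max so every candidate shell point is covered.
        lo = min(a[k] for a in atoms) - d_max
        hi = max(a[k] for a in atoms) + d_max
        steps = int(math.floor((hi - lo) / spacing)) + 1
        axes.append([lo + i * spacing for i in range(steps)])
    sites = []
    for point in itertools.product(*axes):
        # Brute-force nearest-atom search; fine for a sketch, slow for large RNAs.
        nearest = min(math.dist(point, atom) for atom in atoms)
        if d_min <= nearest <= d_max:
            sites.append(point)
    return sites


if __name__ == "__main__":
    # Two hypothetical atom positions standing in for an RNA structure.
    toy_atoms = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    decoys = enumerate_decoy_sites(toy_atoms, spacing=2.0)
    print(len(decoys), "candidate decoy positions")
```

For real structures a spatial index such as a k-d tree would replace the brute-force distance search; each retained grid point would then be scored by the trained model.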
13
Comprehensive assessment of TECENTRIQ® and OPDIVO®: analyzing immunotherapy indications withdrawn in triple-negative breast cancer and hepatocellular carcinoma. Cancer Metastasis Rev 2024:10.1007/s10555-024-10174-x. [PMID: 38409546 DOI: 10.1007/s10555-024-10174-x] [Received: 08/17/2023] [Accepted: 02/05/2024] [Indexed: 02/28/2024]
Abstract
Atezolizumab (TECENTRIQ®) and nivolumab (OPDIVO®) are immunotherapeutic antibodies targeting programmed cell death 1 ligand 1 (PD-L1) and programmed cell death 1 (PD-1), respectively. These inhibitors hold promise as therapies for triple-negative breast cancer (TNBC) and hepatocellular carcinoma (HCC) and have demonstrated encouraging results in reducing the progression and spread of tumors. However, due to their adverse effects and low response rates, the US Food and Drug Administration (FDA) has withdrawn the approval of atezolizumab in TNBC and nivolumab in HCC treatment. The withdrawals of atezolizumab and nivolumab have raised concerns regarding their effectiveness and the ability to predict treatment responses. Therefore, the current study investigates the withdrawal of these PD-1/PD-L1 inhibitors, specifically atezolizumab for TNBC and nivolumab for HCC, examining both structural and clinical aspects. This review provides detailed insights into the structure of the PD-1 receptor and its ligands, the interactions between PD-1 and PD-L1, and their interactions with the withdrawn antibodies (atezolizumab and nivolumab) as well as PD-1 and PD-L1 modifications. In addition, this review further assesses these antibodies in the context of TNBC and HCC. It seeks to elucidate the factors that contribute to diverse responses to PD-1/PD-L1 therapy in different types of cancer and propose approaches for predicting responses, mitigating the potential risks linked to therapy withdrawals, and optimizing patient outcomes. By better understanding the mechanisms underlying responses to PD-1/PD-L1 therapy and developing strategies to predict these responses, it is possible to create more efficient treatments for TNBC and HCC.
14
Bioluminescent detection of viral surface proteins using branched multivalent protein switches. RSC Chem Biol 2024; 5:148-157. [PMID: 38333197 PMCID: PMC10849123 DOI: 10.1039/d3cb00164d] [Received: 09/03/2023] [Accepted: 11/22/2023] [Indexed: 02/10/2024]
Abstract
Fast and reliable virus diagnostics is key to prevent the spread of viruses in populations. A hallmark of viruses is the presence of multivalent surface proteins, a property that can be harnessed to control conformational switching in sensor proteins. Here, we introduce a new sensor platform (dark-LUX) for the detection of viral surface proteins consisting of a general bioluminescent framework that can be post-translationally functionalized with separately expressed binding domains. The platform relies on (1) plug-and-play bioconjugation of different binding proteins via SpyTag/SpyCatcher technology to create branched protein structures, (2) an optimized turn-on bioluminescent switch based on complementation of the split-luciferase NanoBiT upon target binding and (3) straightforward exploration of the protein linker space. The influenza A virus (IAV) surface proteins hemagglutinin (HA) and neuraminidase (NA) were used as relevant multivalent targets to establish proof of principle and optimize relevant parameters such as linker properties, choice of target binding domains and the optimal combination of the competing NanoBiT components SmBiT and DarkBiT. The sensor framework allows rapid conjugation and exchange of various binding domains including scFvs, nanobodies and de novo designed binders for a variety of targets, including the construction of a heterobivalent switch that targets the head and stem region of hemagglutinin. The modularity of the platform thus allows straightforward optimization of binding domains and scaffold properties for existing viral targets, and is well suited to quickly adapt bioluminescent sensor proteins to effectively detect newly evolving viral epitopes.
15
Design of human ACE2 mimic miniprotein binders that interact with RBD of SARS-CoV-2 variants of concerns. J Biomol Struct Dyn 2024:1-13. [PMID: 38315516 DOI: 10.1080/07391102.2024.2310789] [Received: 09/12/2023] [Accepted: 01/20/2024] [Indexed: 02/07/2024]
Abstract
The world of medicine demands from the research community solutions to the emerging problem of SARS-CoV-2 variants and other such potential global pandemics. With advantages of specificity over small-molecule drugs and designability over antibodies, miniprotein therapeutics offer a unique solution to the threats of rapidly emerging SARS-CoV-2 variants. Unfortunately, most of the promising miniprotein binders are de novo designed, and it is not viable to generate molecules for each new variant. Therefore, in this study, we demonstrate a method for the design of miniprotein mimics from the interaction interface of human angiotensin converting enzyme 2 (ACE2). ACE2 is the natural interacting partner for the SARS-CoV-2 spike receptor binding domain (RBD) and acts as a recognition molecule for viral entry into the host cells. Starting with the ACE2 N-terminal triple-helix interaction interface, we generated more than 70 miniprotein sequences. Employing Rosetta folding and docking scores, we selected 10 promising miniprotein candidates, among which 3 were found to be soluble in lab studies. Further, using molecular mechanics (MM) calculations on molecular dynamics (MD) trajectories, we tested the interaction of miniproteins with RBD from various variants of concern (VOC). We report two key findings. First, the miniproteins in this study were generated using fewer than 10 lab testing experiments, yet when tested through in vitro experiments, they show sub-micromolar to nanomolar affinities towards SARS-CoV-2 RBD. Second, in simulation studies, when compared with previously developed therapeutics, our miniproteins display a remarkable ability to mimic the ACE2 interface, making them an ideal solution to the ever-evolving problem of VOCs. Communicated by Ramaswamy H. Sarma.
16
Abstract
Recent breakthroughs in AI coupled with the rapid accumulation of protein sequence and structure data have radically transformed computational protein design. New methods promise to escape the constraints of natural and laboratory evolution, accelerating the generation of proteins for applications in biotechnology and medicine. To make sense of the exploding diversity of machine learning approaches, we introduce a unifying framework that classifies models on the basis of their use of three core data modalities: sequences, structures and functional labels. We discuss the new capabilities and outstanding challenges for the practical design of enzymes, antibodies, vaccines, nanomachines and more. We then highlight trends shaping the future of this field, from large-scale assays to more robust benchmarks, multimodal foundation models, enhanced sampling strategies and laboratory automation.
17
Abstract
Information in proteins flows from sequence to structure to function, with each step causally driven by the preceding one. Protein design is founded on inverting this process: specify a desired function, design a structure executing this function, and find a sequence that folds into this structure. This 'central dogma' underlies nearly all de novo protein-design efforts. Our ability to accomplish these tasks depends on our understanding of protein folding and function and our ability to capture this understanding in computational methods. In recent years, deep learning-derived approaches for efficient and accurate structure modeling and enrichment of successful designs have enabled progression beyond the design of protein structures and towards the design of functional proteins. We examine these advances in the broader context of classical de novo protein design and consider implications for future challenges to come, including fundamental capabilities such as sequence and structure co-design and conformational control considering flexibility, and functional objectives such as antibody and enzyme design.
18
De novo protein design-From new structures to programmable functions. Cell 2024; 187:526-544. [PMID: 38306980 PMCID: PMC10990048 DOI: 10.1016/j.cell.2023.12.028] [Received: 10/27/2023] [Revised: 12/03/2023] [Accepted: 12/19/2023] [Indexed: 02/04/2024]
Abstract
Methods from artificial intelligence (AI) trained on large datasets of sequences and structures can now "write" proteins with new shapes and molecular functions de novo, without starting from proteins found in nature. In this Perspective, I will discuss the state of the field of de novo protein design at the juncture of physics-based modeling approaches and AI. New protein folds and higher-order assemblies can be designed with considerable experimental success rates, and difficult problems requiring tunable control over protein conformations and precise shape complementarity for molecular recognition are coming into reach. Emerging approaches incorporate engineering principles-tunability, controllability, and modularity-into the design process from the beginning. Exciting frontiers lie in deconstructing cellular functions with de novo proteins and, conversely, constructing synthetic cellular signaling from the ground up. As methods improve, many more challenges are unsolved.
19
Antimicrobial resistance crisis: could artificial intelligence be the solution? Mil Med Res 2024; 11:7. [PMID: 38254241 PMCID: PMC10804841 DOI: 10.1186/s40779-024-00510-1] [Received: 05/18/2023] [Accepted: 01/08/2024] [Indexed: 01/24/2024]
Abstract
Antimicrobial resistance is a global public health threat, and the World Health Organization (WHO) has announced a priority list of the most threatening pathogens against which novel antibiotics need to be developed. The discovery and introduction of novel antibiotics are time-consuming and expensive. According to WHO's report of antibacterial agents in clinical development, only 18 novel antibiotics have been approved since 2014. Therefore, novel antibiotics are critically needed. Artificial intelligence (AI) has been rapidly applied to drug development since its recent technical breakthrough and has dramatically improved the efficiency of the discovery of novel antibiotics. Here, we first summarized recently marketed novel antibiotics, and antibiotic candidates in clinical development. In addition, we systematically reviewed the involvement of AI in antibacterial drug development and utilization, including small molecules, antimicrobial peptides, phage therapy, essential oils, as well as resistance mechanism prediction, and antibiotic stewardship.
|
20
|
Abstract
Thalidomide and its derivatives are powerful cancer therapeutics that are among the best-understood molecular glue degraders (MGDs). These drugs selectively reprogram the E3 ubiquitin ligase cereblon (CRBN) to commit target proteins for degradation by the ubiquitin-proteasome system. MGDs create novel recognition interfaces on the surface of the E3 ligase that engage in induced protein-protein interactions with neosubstrates. Molecular insight into their mechanism of action opens exciting opportunities to engage a plethora of targets through a specific recognition motif, the G-loop. Our analysis shows that current CRBN-based MGDs can in principle recognize over 2,500 proteins in the human proteome that contain a G-loop. We review recent advances in tuning the specificity between CRBN and its MGD-induced neosubstrates and deduce a set of simple rules that govern these interactions. We conclude that rational MGD design efforts will enable selective degradation of many more proteins, expanding this therapeutic modality to more disease areas.
|
21
|
De novo-designed minibinders expand the synthetic biology sensing repertoire. bioRxiv 2024:2024.01.12.575267. [PMID: 38293112 PMCID: PMC10827046 DOI: 10.1101/2024.01.12.575267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
Synthetic and chimeric receptors capable of recognizing and responding to user-defined antigens have enabled "smart" therapeutics based on engineered cells. These cell engineering tools depend on antigen sensors which are most often derived from antibodies. Advances in the de novo design of proteins have enabled the design of protein binders with the potential to target epitopes with unique properties and faster production timelines compared to antibodies. Building upon our previous work combining a de novo-designed minibinder of the Spike protein of SARS-CoV-2 with the synthetic receptor synNotch (SARSNotch), we investigated whether minibinders can be readily adapted to a diversity of cell engineering tools. We show that the Spike minibinder LCB1 easily generalizes to a next-generation proteolytic receptor SNIPR that performs similarly to our previously reported SARSNotch. LCB1-SNIPR successfully enables the detection of live SARS-CoV-2, an improvement over SARSNotch which can only detect cell-expressed Spike. To test the generalizability of minibinders to diverse applications, we tested LCB1 as an antigen sensor for a chimeric antigen receptor (CAR). LCB1-CAR enabled CD8+ T cells to cytotoxically target Spike-expressing cells. Our findings suggest that minibinders represent a novel class of antigen sensors that have the potential to dramatically expand the sensing repertoire of cell engineering tools.
|
22
|
Advances in ligand-specific biosensing for structurally similar molecules. Cell Syst 2023; 14:1024-1043. [PMID: 38128482 PMCID: PMC10751988 DOI: 10.1016/j.cels.2023.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Revised: 08/23/2023] [Accepted: 10/19/2023] [Indexed: 12/23/2023]
Abstract
The specificity of biological systems makes it possible to develop biosensors targeting specific metabolites, toxins, and pollutants in complex medical or environmental samples without interference from structurally similar compounds. For the last two decades, great efforts have been devoted to creating proteins or nucleic acids with novel properties through synthetic biology strategies. Beyond augmenting biocatalytic activity, expanding target substrate scopes, and enhancing enzymes' enantioselectivity and stability, an increasing research area is the enhancement of molecular specificity for genetically encoded biosensors. Here, we summarize recent advances in the development of highly specific biosensor systems and their essential applications. First, we describe the rational design principles required to create libraries containing potential mutants with less promiscuity or better specificity. Next, we review the emerging high-throughput screening techniques to engineer biosensing specificity for the desired target. Finally, we examine the computer-aided evaluation and prediction methods to facilitate the construction of ligand-specific biosensors.
|
23
|
In silico evolution of autoinhibitory domains for a PD-L1 antagonist using deep learning models. Proc Natl Acad Sci U S A 2023; 120:e2307371120. [PMID: 38032933 DOI: 10.1073/pnas.2307371120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 09/24/2023] [Indexed: 12/02/2023] Open
Abstract
There has been considerable progress in the development of computational methods for designing protein-protein interactions, but engineering high-affinity binders without extensive screening and maturation remains challenging. Here, we test a protein design pipeline that uses iterative rounds of deep learning (DL)-based structure prediction (AlphaFold2) and sequence optimization (ProteinMPNN) to design autoinhibitory domains (AiDs) for a PD-L1 antagonist. With the goal of creating an anticancer agent that is inactive until reaching the tumor environment, we sought to create autoinhibited (or masked) forms of the PD-L1 antagonist that can be unmasked by tumor-enriched proteases. Twenty-three de novo designed AiDs, varying in length and topology, were fused to the antagonist with a protease-sensitive linker, and binding to PD-L1 was measured with and without protease treatment. Nine of the fusion proteins demonstrated conditional binding to PD-L1, and the top-performing AiDs were selected for further characterization as single-domain proteins. Without any experimental affinity maturation, four of the AiDs bind to the PD-L1 antagonist with equilibrium dissociation constants (KDs) below 150 nM, with the lowest KD equal to 0.9 nM. Our study demonstrates that DL-based protein modeling can be used to rapidly generate high-affinity protein binders.
|
24
|
ProteinMAE: masked autoencoder for protein surface self-supervised learning. Bioinformatics 2023; 39:btad724. [PMID: 38019955 PMCID: PMC10713117 DOI: 10.1093/bioinformatics/btad724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 10/27/2023] [Accepted: 11/28/2023] [Indexed: 12/01/2023] Open
Abstract
SUMMARY: The biological functions of proteins are determined by the chemical and geometric properties of their surfaces. Recently, with the rapid progress of deep learning, a series of learning-based surface descriptors have been proposed and have achieved strong performance in many tasks such as protein design and protein-protein interaction prediction. However, they are still limited by the problem of label scarcity, since the labels are typically obtained through wet-lab experiments. Inspired by the great success of self-supervised learning in natural language processing and computer vision, we introduce ProteinMAE, a self-supervised framework specifically designed for protein surface representation to mitigate label scarcity. Specifically, we propose an efficient network and pretrain it by self-supervised learning on a large amount of accessible unlabeled protein data. Then we use the pretrained weights as initialization and fine-tune the network on downstream tasks. To demonstrate the effectiveness of our method, we conduct experiments on three different downstream tasks: binding site identification on the protein surface, ligand-binding protein pocket classification, and protein-protein interaction prediction. The extensive experiments show that our method not only improves the network's performance on all downstream tasks, but also achieves competitive performance with state-of-the-art methods. Moreover, our proposed network also exhibits significant advantages in terms of computational cost, requiring less than a tenth of the memory cost of previous methods. AVAILABILITY AND IMPLEMENTATION: https://github.com/phdymz/ProteinMAE.
|
25
|
Genetically encoded protein crystals by hierarchical design. Nat Mater 2023; 22:1439-1440. [PMID: 38017040 DOI: 10.1038/s41563-023-01719-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2023]
|
26
|
PepMLM: Target Sequence-Conditioned Generation of Peptide Binders via Masked Language Modeling. arXiv 2023:arXiv:2310.03842v2. [PMID: 37873004 PMCID: PMC10593082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Target proteins that lack accessible binding pockets and conformational stability have posed increasing challenges for drug development. Induced proximity strategies, such as PROTACs and molecular glues, have thus gained attention as pharmacological alternatives, but still require small molecule docking at binding pockets for targeted protein degradation (TPD). The computational design of protein-based binders presents unique opportunities to access undruggable targets, but has often relied on stable 3D structures or predictions for effective binder generation. Recently, we have leveraged the expressive latent spaces of protein language models (pLMs) for the prioritization of peptide binders from sequence alone, which we have then fused to E3 ubiquitin ligase domains, creating a CRISPR-analogous TPD system for target proteins. However, our methods rely on training discriminator models for ranking heuristically or unconditionally-derived guide peptides for their target binding capability. In this work, we introduce PepMLM, a purely target sequence-conditioned de novo generator of linear peptide binders. By employing a novel masking strategy that uniquely positions cognate peptide sequences at the terminus of target protein sequences, PepMLM tasks the state-of-the-art ESM-2 pLM to fully reconstruct the binder region, achieving low perplexities matching or improving upon previously-validated peptide-protein sequence pairs. After successful in silico benchmarking with AlphaFold-Multimer, we experimentally verify PepMLM's efficacy via fusion of model-derived peptides to E3 ubiquitin ligase domains, demonstrating endogenous degradation of target substrates in cellular models. In total, PepMLM enables the generative design of candidate binders to any target protein, without the requirement of target structure, empowering downstream programmable proteome editing applications.
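The masking strategy described in this abstract (binder placed at the terminus of the target sequence, binder region fully masked for reconstruction) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the per-residue token list, and the use of `<mask>` as the mask string are assumptions made for illustration, and no language model is called.

```python
# Minimal sketch of a "mask the whole binder at the terminus" input layout.
# All names here are hypothetical; "<mask>" is an assumed mask token string.
MASK = "<mask>"


def build_masked_input(target_seq: str, binder_len: int) -> list[str]:
    """Return per-residue tokens: the target, then binder_len mask tokens."""
    if binder_len <= 0:
        raise ValueError("binder_len must be positive")
    return list(target_seq) + [MASK] * binder_len


def masked_positions(tokens: list[str]) -> list[int]:
    """Indices a masked language model would be asked to reconstruct."""
    return [i for i, tok in enumerate(tokens) if tok == MASK]


tokens = build_masked_input("MKT", binder_len=2)
positions = masked_positions(tokens)
```

In a real pipeline, the token list would be passed through a protein language model's tokenizer and the predictions at `positions` would be decoded into a candidate peptide; that step is deliberately left out here.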
|
27
|
A new age in protein design empowered by deep learning. Cell Syst 2023; 14:925-939. [PMID: 37972559 DOI: 10.1016/j.cels.2023.10.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 06/22/2023] [Accepted: 10/11/2023] [Indexed: 11/19/2023]
Abstract
The rapid progress in the field of deep learning has had a significant impact on protein design. Deep learning methods have recently produced a breakthrough in protein structure prediction, leading to the availability of high-quality models for millions of proteins. Along with novel architectures for generative modeling and sequence analysis, they have revolutionized the protein design field in the past few years by improving the accuracy of design and the ability to identify novel protein sequences and structures. Deep neural networks can now learn and extract the fundamental features of protein structures, predict how they interact with other biomolecules, and have the potential to create new effective drugs for treating disease. As their applicability in protein design is rapidly growing, we review the recent developments and technology in deep learning methods and provide examples of their performance in generating novel functional proteins.
|
28
|
Peptide binder design with inverse folding and protein structure prediction. Commun Chem 2023; 6:229. [PMID: 37880344 PMCID: PMC10600234 DOI: 10.1038/s42004-023-01029-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 10/13/2023] [Indexed: 10/27/2023] Open
Abstract
The computational design of peptide binders towards a specific protein interface can aid diagnostic and therapeutic efforts. Here, we design peptide binders by combining the known structural space searched with Foldseek, the protein design method ESM-IF1, and AlphaFold2 (AF) in a joint framework. Foldseek generates backbone seeds for a modified version of ESM-IF1 adapted to protein complexes. The resulting sequences are evaluated with AF using an MSA representation for the receptor structure and a single sequence for the binder. We show that AF can accurately evaluate protein binders and that our bind score can select these (ROC AUC = 0.96 for the heterodimeric case). We find that designs created from seeds with more contacts per residue are more successful and tend to be short. There is a relationship between the sequence recovery in interface positions and the pLDDT of the designs, where designs with ≥80% recovery have an average pLDDT of 84 compared to 55 at 0%. Designed sequences have 60% higher median pLDDT values towards intended receptors than non-intended ones. Successful binders (predicted interface RMSD ≤ 2 Å) are designed towards 185 (6.5%) heteromeric and 42 (3.6%) homomeric protein interfaces with ESM-IF1 compared with 18 (1.5%) using ProteinMPNN from 100 samples.
|
29
|
A Suite of Designed Protein Cages Using Machine Learning Algorithms and Protein Fragment-Based Protocols. bioRxiv 2023:2023.10.09.561468. [PMID: 37873110 PMCID: PMC10592684 DOI: 10.1101/2023.10.09.561468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Designed protein cages and related materials provide unique opportunities for applications in biotechnology and medicine, while methods for their creation remain challenging and unpredictable. In the present study, we apply new computational approaches to design a suite of new tetrahedrally symmetric, self-assembling protein cages. For the generation of docked poses, we emphasize a protein fragment-based approach, while for de novo interface design, a comparison of computational protocols highlights the power and increased experimental success achieved using the machine learning program ProteinMPNN. In relating information from docking and design, we observe that agreement between fragment-based sequence preferences and ProteinMPNN sequence inference correlates with experimental success. Additional insights for designing polar interactions are highlighted by experimentally testing larger and more polar interfaces. In all, using X-ray crystallography and cryo-EM, we report five structures for seven protein cages, with atomic resolution in the best case reaching 2.0 Å. We also report structures of two incompletely assembled protein cages, providing unique insights into one type of assembly failure. The new set of designed cages and their structures add substantially to the body of available protein nanoparticles, and to methodologies for their creation.
|
30
|
De novo prediction of explicit water molecule positions by a novel algorithm within the protein design software MUMBO. Sci Rep 2023; 13:16680. [PMID: 37794104 PMCID: PMC10550942 DOI: 10.1038/s41598-023-43659-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Accepted: 09/26/2023] [Indexed: 10/06/2023] Open
Abstract
By mediating interatomic interactions, water molecules play a major role in protein-protein, protein-DNA and protein-ligand interfaces, significantly affecting affinity and specificity. This notwithstanding, explicit water molecules are usually not considered in protein design software because of high computational costs. To challenge this situation, we analyzed the binding characteristics of 60,000 waters from high-resolution crystal structures and used the observed parameters to implement the prediction of water molecules in the protein design and side chain-packing software MUMBO. To reduce the complexity of the problem, we incorporated water molecules through the solvation of rotamer pairs instead of relying on solvated rotamer libraries. Our validation demonstrates the potential of our algorithm by achieving recovery rates of 67% for bridging water molecules and up to 86% for fully coordinated waters. The efficacy of our algorithm is highlighted further by the prediction of three different protein-ligand complexes. Here, 91% of water-mediated interactions between protein and ligand are correctly predicted. These results suggest that the new algorithm could prove highly beneficial for structure-based protein design, particularly for the optimization of ligand-binding pockets or protein-protein interfaces.
|
31
|
Rational design of small-molecule responsive protein switches. Protein Sci 2023; 32:e4774. [PMID: 37656809 PMCID: PMC10510469 DOI: 10.1002/pro.4774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 08/26/2023] [Accepted: 08/29/2023] [Indexed: 09/03/2023]
Abstract
Small-molecule responsive protein switches are powerful tools for controlling cellular processes. These switches are designed to respond rapidly and specifically to their inducer. They have been used in numerous applications, including the regulation of gene expression, post-translational protein modification, and signal transduction. Typically, small-molecule responsive protein switches consist of two proteins that interact with each other in the presence or absence of a small molecule. Recent advances in computational protein design have already contributed to the development of protein switches with an expanded range of small-molecule inducers and increasingly sophisticated switch mechanisms. Further progress in the engineering of small-molecule responsive switches is fueled by cutting-edge computational design approaches, which will enable more complex and precise control over cellular processes and advance synthetic biology applications in biotechnology and medicine. Here, we discuss recent milestones and how technological advances are impacting the development of chemical switches.
|
32
|
Recent Advances in the Biosynthesis of Natural Sugar Substitutes in Yeast. J Fungi (Basel) 2023; 9:907. [PMID: 37755015 PMCID: PMC10533046 DOI: 10.3390/jof9090907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 08/29/2023] [Accepted: 09/01/2023] [Indexed: 09/28/2023] Open
Abstract
Natural sugar substitutes are safe, stable, and nearly calorie-free. Thus, they are gradually replacing the traditional high-calorie and artificial sweeteners in the food industry. Currently, the majority of natural sugar substitutes are extracted from plants, which often requires high levels of energy and causes environmental pollution. Recently, biosynthesis via engineered microbial cell factories has emerged as a green alternative for producing natural sugar substitutes. In this review, recent advances in the biosynthesis of natural sugar substitutes in yeasts are summarized. The metabolic engineering approaches reported for the biosynthesis of oligosaccharides, sugar alcohols, glycosides, and rare monosaccharides in various yeast strains are described. Meanwhile, some unresolved challenges in the bioproduction of natural sugar substitutes in yeast are discussed to offer guidance for future engineering.
|
33
|
Physics-supervised deep learning-based optimization (PSDLO) with accuracy and efficiency. Proc Natl Acad Sci U S A 2023; 120:e2309062120. [PMID: 37603744 PMCID: PMC10466106 DOI: 10.1073/pnas.2309062120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 07/21/2023] [Indexed: 08/23/2023] Open
Abstract
Identifying efficient and accurate optimization algorithms is a long-desired goal for the scientific community. At present, a combination of evolutionary and deep-learning methods is widely used for optimization. In this paper, we demonstrate three cases involving different physics and conclude that no matter how accurate a deep-learning model is for a single, specific problem, a simple combination of evolutionary and deep-learning methods cannot achieve the desired optimization because of the intrinsic nature of the evolutionary method. We begin by using a physics-supervised deep-learning optimization algorithm (PSDLO) to supervise the results from the deep-learning model. We then intervene in the evolutionary process to eventually achieve simultaneous accuracy and efficiency. PSDLO is successfully demonstrated using both sufficient and insufficient datasets. PSDLO offers a perspective for solving optimization problems and can tackle complex science and engineering problems having many features. This approach to optimization algorithms holds tremendous potential for application in real-world engineering domains.
|
34
|
Growing ecosystem of deep learning methods for modeling protein-protein interactions. Protein Eng Des Sel 2023; 36:gzad023. [PMID: 38102755 DOI: 10.1093/protein/gzad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023] Open
Abstract
Numerous cellular functions rely on protein-protein interactions. Efforts to comprehensively characterize them remain challenged, however, by the diversity of molecular recognition mechanisms employed within the proteome. Deep learning has emerged as a promising approach for tackling this problem by exploiting both experimental data and basic biophysical knowledge about protein interactions. Here, we review the growing ecosystem of deep learning methods for modeling protein interactions, highlighting the diversity of these biophysically informed models and their respective trade-offs. We discuss recent successes in using representation learning to capture complex features pertinent to predicting protein interactions and interaction sites, geometric deep learning to reason over protein structures and predict complex structures, and generative modeling to design de novo protein assemblies. We also outline some of the outstanding challenges and promising new directions. Opportunities abound to discover novel interactions, elucidate their physical mechanisms, and engineer binders to modulate their functions using deep learning and, ultimately, unravel how protein interactions orchestrate complex cellular behaviors.
|
35
|
A machine learning strategy for the identification of key in silico descriptors and prediction models for IgG monoclonal antibody developability properties. MAbs 2023; 15:2248671. [PMID: 37610144 PMCID: PMC10448975 DOI: 10.1080/19420862.2023.2248671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 07/28/2023] [Accepted: 08/11/2023] [Indexed: 08/24/2023] Open
Abstract
Identification of favorable biophysical properties for protein therapeutics as part of developability assessment is a crucial part of the preclinical development process. Successful prediction of such properties and bioassay results from calculated in silico features has the potential to reduce the time and cost of delivering clinical-grade material to patients, but nevertheless has remained an ongoing challenge to the field. Here, we demonstrate an automated and flexible machine learning workflow designed to compare and identify the most powerful features from computationally derived physicochemical feature sets, generated from popular commercial software packages. We implement this workflow with medium-sized datasets of human and humanized IgG molecules to generate predictive regression models for two key developability endpoints, hydrophobicity and poly-specificity. The most important features discovered through the automated workflow corroborate several previous literature reports, and newly discovered features suggest directions for further research and potential model improvement.
|