1
Atkinson T, Barrett TD, Cameron S, Guloglu B, Greenig M, Tan CB, Robinson L, Graves A, Copoiu L, Laterre A. Protein sequence modelling with Bayesian flow networks. Nat Commun 2025; 16:3197. [PMID: 40180946 PMCID: PMC11968962 DOI: 10.1038/s41467-025-58250-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2024] [Accepted: 03/11/2025] [Indexed: 04/05/2025] Open
Abstract
Exploring the vast and largely uncharted territory of amino acid sequences is crucial for understanding complex protein functions and the engineering of novel therapeutic proteins. Whilst generative machine learning has advanced protein sequence modelling, no existing approach is proficient in both unconditional and conditional generation. In this work, we propose that Bayesian Flow Networks (BFNs), a recently introduced framework for generative modelling, can address these challenges. We present ProtBFN, a 650M parameter model trained on protein sequences curated from UniProtKB, which generates natural-like, diverse, structurally coherent, and novel protein sequences, significantly outperforming leading autoregressive and discrete diffusion models. Further, we fine-tune ProtBFN on heavy chains from the Observed Antibody Space to obtain an antibody-specific model, AbBFN, which we use to evaluate zero-shot conditional generation capabilities. AbBFN is found to be competitive with or better than antibody-specific BERT-style models when applied to predicting individual framework or complementarity-determining regions.
Affiliation(s)
- Scott Cameron
- InstaDeep, 5 Merchant Square, London, W2 1AY, England
- Bora Guloglu
- InstaDeep, 5 Merchant Square, London, W2 1AY, England
- Charlie B Tan
- InstaDeep, 5 Merchant Square, London, W2 1AY, England
- Alex Graves
- InstaDeep, 5 Merchant Square, London, W2 1AY, England
- Liviu Copoiu
- InstaDeep, 5 Merchant Square, London, W2 1AY, England
2
Gu J, Mu W, Xu Y, Nie Y. From discovery to application: Enabling technology-based optimizing carbonyl reductases biocatalysis for active pharmaceutical ingredient synthesis. Biotechnol Adv 2025; 79:108496. [PMID: 39647674 DOI: 10.1016/j.biotechadv.2024.108496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Revised: 10/04/2024] [Accepted: 11/30/2024] [Indexed: 12/10/2024]
Abstract
The catalytic conversion of chiral alcohols and corresponding carbonyl compounds by carbonyl reductases (alcohol dehydrogenases), which are NAD(P) or NAD(P)H-dependent oxidoreductases, has attracted considerable attention. However, existing carbonyl reductases are insufficient to meet the demands of diverse industrial applications; hence, new enzymes with functions that can expand the toolbox of biocatalysts are urgently required. Developing precisely controlled chiral biocatalysts is of great significance for the efficient development of a broad spectrum of active pharmaceutical ingredients via biosynthesis. In this review, we summarize methods for discovering novel natural carbonyl reductases from various perspectives. Furthermore, advances in protein engineering, utilizing known sequence and structural information as well as catalytic dynamics mechanisms to improve potential functions, are also addressed. The exponential growth in data-driven tools over the past decade has made the de novo design of carbonyl reductases possible. Additionally, various applications of these high-performance carbonyl reductases and different strategies for coenzyme regeneration involving photocatalysis during the reaction process are reviewed. These advancements will bring new opportunities and challenges to the fields of green chemistry and biosynthesis in the future.
Affiliation(s)
- Jie Gu
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China; School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Wanmeng Mu
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China; State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Yan Xu
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China; State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Yao Nie
- Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China
3
Pitarch B, Pazos F. Deep Learning Approaches for the Prediction of Protein Functional Sites. Molecules 2025; 30:214. [PMID: 39860084 PMCID: PMC11767512 DOI: 10.3390/molecules30020214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2024] [Revised: 12/20/2024] [Accepted: 01/01/2025] [Indexed: 01/27/2025] Open
Abstract
Knowing which residues of a protein are important for its function is of paramount importance for understanding the molecular basis of this function and devising ways of modifying it for medical or biotechnological applications. Due to the difficulty in detecting these residues experimentally, prediction methods are essential to cope with the sequence deluge that is filling databases with uncharacterized protein sequences. Deep learning approaches are especially well suited for this task due to the large amounts of protein sequences for training them, the trivial codification of this sequence data to feed into these systems, and the intrinsic sequential nature of the data that makes them suitable for language models. As a consequence, deep learning-based approaches are being applied to the prediction of different types of functional sites and regions in proteins. This review aims to give an overview of the current landscape of methodologies so that interested users can have an idea of which kind of approaches are available for their proteins of interest. We also try to give an idea of how these systems work, as well as explain their limitations and high dependence on the training set so that users are aware of the quality of expected results.
Affiliation(s)
- Florencio Pazos
- Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), 28049 Madrid, Spain
4
Bergeron JJM. Proteomics Impact on Cell Biology to Resolve Cell Structure and Function. Mol Cell Proteomics 2024; 23:100758. [PMID: 38574860 PMCID: PMC11070594 DOI: 10.1016/j.mcpro.2024.100758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 03/23/2024] [Accepted: 03/26/2024] [Indexed: 04/06/2024] Open
Abstract
The acceleration of advances in proteomics has enabled integration with imaging at the EM and light microscopy levels, cryo-EM of protein structures, and artificial intelligence with proteins comprehensively and accurately resolved for cell structures at nanometer to subnanometer resolution. Proteomics continues to outpace experimentally based structural imaging, but their ultimate integration is a path toward the goal of a compendium of all proteins to understand mechanistically cell structure and function.
Affiliation(s)
- John J M Bergeron
- Department of Medicine, McGill University Hospital Research Institute, Montreal, Quebec, Canada.